XML, and therefore PreTeXt, is a markup language. But by and large, what you type into your source will be what you see in your output. So there is not much to say. Except that Subsection 8.1 eventually will be essential. However we do test various tricky situations here (which have technical explanations we avoid). See the Author’s Guide for a superior treatment of the topics addressed here.
One of the goals of PreTeXt is to relieve an author of managing the numerous conflicts when mixing languages that use different characters for special purposes. But, of course, XML has its own special characters.
If you type a “less-than” symbol in your source, the XML processor thinks you are starting an opening, or closing, tag. So how do you get a less-than sign into your source so that it survives into your output, like this: < ? You use an escaped version. Type, literally, the four characters &lt; in your source. Then the XML processor will know you want the character and will not mistake it for a tag. But now we want to get an ampersand into our source like: &. How? Another escaped version of a character, literally the five characters &amp;.
Otherwise, keys on your keyboard, even international versions, should be fine in your source and behave as expected. WYTIWYG = What You Type Is What You Get. So the principal concession to using XML markup is the following very simple rule.
Rather than pressing the < and & keys on your keyboard, instead always enter the escape sequences &lt; and &amp; as replacements.
Simple. And it will work in “running text,” verbatim text (like when authoring the content of <c> or <pre> elements), and mixed into LaTeX syntax to describe mathematical expressions. XML has three other escape sequences, &gt;, &apos;, and &quot;, for the characters >, ’, and " (respectively). But they seem largely unnecessary for authoring in PreTeXt, as we now demonstrate by typing them directly from our keyboard into our source: >, ’, and ".
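The rule above can be sketched in source form. This is a minimal illustration, not taken from the original source; the sentences themselves are invented for the example.

```xml
<!-- Running text: the XML parser turns each escape back into one character. -->
<p>If 3 &lt; 5 then the Smith &amp; Jones lemma applies.</p>

<!-- Verbatim text follows the same rule. -->
<p>A C-style test reads <c>if (x &lt; y &amp;&amp; y &lt; z)</c>.</p>

<!-- The other three characters can be typed directly in element content. -->
<p>So 5 > 3, and it's "fine" to type these as-is.</p>
```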
The <q> tag will provide beginning and ending double quotations, while the <sq> tag will behave similarly but provide single quotes. Given the complexity of quotations, the different symbols used in different languages, and the over-simplified versions provided on keyboards, it is necessary to use markup.
“The roots of education are bitter, but the fruit is sweet.” (Aristotle)
‘It is always wise to look ahead, but difficult to look further than you can see.’ (Winston Churchill)
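As a sketch, the two quotations above might be authored as follows, assuming each sits in its own paragraph:

```xml
<p><q>The roots of education are bitter, but the fruit is sweet.</q> (Aristotle)</p>

<p><sq>It is always wise to look ahead, but difficult to look further
than you can see.</sq> (Winston Churchill)</p>
```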
A large quote can be accommodated with the <blockquote> tag, which can carry within itself an <attribution> element.
The problem with writing a book in verse is, to be successful, it has to sound like you knocked it off on a rainy Friday afternoon. It has to sound easy. When you can do it, it helps tremendously because it’s a thing that forces kids to read on. You have this unconsummated feeling if you stop.
―Dr. Seuss
We say that again, to test a multiline attribution of a block quotation. Notice how the dash appears automatically, and that it is a quotation dash in HTML, distinct from other sorts of dashes.
The problem with writing a book in verse is, to be successful, it has to sound like you knocked it off on a rainy Friday afternoon. It has to sound easy. When you can do it, it helps tremendously because it’s a thing that forces kids to read on. You have this unconsummated feeling if you stop.
―Dr. Seuss Children’s Author
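A minimal sketch of how the two block quotations might be authored. The quotation is shortened here to keep the example small, and the use of <line> elements for the multiline attribution is our understanding of the markup rather than something shown in the text above.

```xml
<!-- Single-line attribution; the quotation dash is supplied automatically. -->
<blockquote>
  <p>It has to sound easy.</p>
  <attribution>Dr. Seuss</attribution>
</blockquote>

<!-- Multiline attribution, one <line> per line of the attribution. -->
<blockquote>
  <p>It has to sound easy.</p>
  <attribution>
    <line>Dr. Seuss</line>
    <line>Children's Author</line>
  </attribution>
</blockquote>
```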
Sometimes a quote may extend across several paragraphs. Or a balanced pair of quotation marks crosses an XML boundary, so we need left, right, single and double versions. (For example, see Section 28 on poetry.) Here are all four in a haphazard order: ”, ‘, “, ’. These should be a last resort, and not a replacement for the <q> and <sq> tags. The left/right versions are used for the following quote from Abraham Lincoln, which we have edited into two paragraphs.
“I am not bound to win, but I am bound to be true. I am not bound to succeed, but I am bound to live by the light that I have.
I must stand with anybody that stands right, and stand with him while he is right, and part with him when he goes wrong.”
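A sketch of the two-paragraph quotation in source form, assuming the empty elements <lq/> and <rq/> for the left and right double quotation marks (with <lsq/> and <rsq/> as the single-quote counterparts). A <q> element cannot be used here because it would have to open in one paragraph and close in another.

```xml
<p><lq/>I am not bound to win, but I am bound to be true. I am not bound
to succeed, but I am bound to live by the light that I have.</p>

<p>I must stand with anybody that stands right, and stand with him while
he is right, and part with him when he goes wrong.<rq/></p>
```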
And as a test, we try some crazy combinations of quotes, which would normally give LaTeX some trouble where the quotation marks are adjacent.
“we use ‘single quotes inside of double quotes’”
‘“double quotes inside of single quotes” with more’
“‘single quotes tight inside of double quotes’”
‘“double quotes tight inside of single quotes”’
An “‘‘“absurd test”’’” of two adjacent single quotes inside a pair of double quotes
you would never do this, but a ‘‘pair of single quotes’’
N.B. We have taken no special care to protect against interactions of the actual quote characters (described above) in LaTeX with themselves, or with the grouping tags.
It is possible to make some other groupings like quotations, such as {some emphasized text grouped within braces}, or [a Book Title inside brackets], an “Article Title”, 〈some foreign words inside angle brackets〉, or ⟦just a bit of text within double brackets⟧. Some of these are used extensively by scholars who study texts to note various restorations or deletions. Note that the <foreign> element may have an xml:lang attribute.
Note that the angle brackets, 〈 and 〉, are not the keyboard characters, < and >. Your best bet is to use the provided <angles> element when constructing a balanced pair. Similarly, <dblbrackets> is provided to make the double-bracket characters easily available, since they are likely not on your keyboard.
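A sketch of the balanced grouping elements in use. The Latin phrase and the language code "la" are illustrative choices, not taken from the original source.

```xml
<p>The scribe wrote <angles>some <foreign xml:lang="la">verba aliena</foreign>
inside angle brackets</angles>, and the editor restored
<dblbrackets>just a bit of text within double brackets</dblbrackets>.</p>
```

Because each element supplies both the opening and the closing character, the pair cannot become unbalanced.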
Subsection 8.4 Characters, Symbols, and Constructions
Some keyboard characters are ambiguous. Is the character ' an apostrophe or a right single quote? We presume the former, ’, and provide markup as an alternative for the latter (described above). Is / used to separate words, or to form a fraction? We presume the former, /, and provide <solidus/>, ⁄, for the latter. We test some other characters straight from our US keyboard (with two being escape sequences).
Note that for a long time PreTeXt had empty elements for many of these characters, as a consequence of naïveté. So you might see <dollar/>, <ampersand/>, or others in old source. They will be deprecated and will raise warnings.
Now, when a character is nowhere to be found on your keyboard, we provide conveniences as markup. Or a keyboard character may have a different variant which we implement as an empty element. Here we test many of these. Read the Author’s Guide for tags and more detail.
There are a few common abbreviations of Latin phrases that can be achieved in HTML one way, and in LaTeX with a slightly different mechanism. These are due to LaTeX’s treatment of a period (full stop), depending on its surroundings. So not reserved characters, but just divergent treatment. Using these will lead to the best quality in all your outputs. See Will Robertson’s informative and arcane blog post (latex-alive.tumblr.com/post/827168808) on the topic if you want the full story for the treatment of a full stop in LaTeX.
| Tag | Realization | Meaning |
|------|-------------|---------|
| ad | AD | anno Domini, in the year of the Lord |
| am | AM | ante meridiem, before midday |
| bc | BC | English, before Christ |
| ca | ca. | circa, about |
| eg | e.g. | exempli gratia, for example |
| etal | et al. | et alia, and others |
| etc | etc. | et caetera, and the rest |
| ie | i.e. | id est, in other words |
| nb | NB | nota bene, note well |
| pm | PM | post meridiem, after midday |
| ps | PS | post scriptum, after what has been written |
| vs | vs. | versus, against |
| viz | viz. | videlicet, namely |
We also distinguish between abbreviations (vs.), acronyms (SCUBA) and initialisms (XML). This is a test of the text version of a multiplication symbol: 2 × 4.
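As a sketch, the abbreviation tags are authored as empty elements, while abbreviations, acronyms and initialisms wrap their text. The sentences are invented for illustration, and the element names <abbr>, <acro>, <init> and <times/> reflect our understanding of the markup.

```xml
<!-- Empty elements from the table; each output format handles the periods. -->
<p>See Smith <etal/> for details, <eg/> Chapter 3, <ie/> the introduction.</p>

<!-- Wrapping elements for the three kinds of shortened forms. -->
<p>We compare <abbr>vs.</abbr>, <acro>SCUBA</acro>, and <init>XML</init>,
and compute 2 <times/> 4.</p>
```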
Simple coordinates with degrees, minutes, seconds, or temperature, or distance in feet and inches. “We parked the car at 36°16′0.83″N, 122°35′47.27″W, and since it was 93°F, we walked 505′3.6″ so we could swim in the bay.”
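A sketch of how such coordinates might be authored, assuming the empty elements <degree/>, <prime/> and <dblprime/> for the degree, minute (or foot) and second (or inch) marks:

```xml
<p><q>We parked the car at 36<degree/>16<prime/>0.83<dblprime/>N,
122<degree/>35<prime/>47.27<dblprime/>W, and since it was 93<degree/>F,
we walked 505<prime/>3.6<dblprime/> so we could swim in the bay.</q></p>
```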
An em dash is the long dash used much like parentheses (not an en dash used to denote a range, such as a range of page numbers). It should not have spaces around it, but some style guides allow for a thin space, which—we test right now. A publication file entry can be set to none or thin to control this.
For best results, be certain the right Unicode characters are in your source. If you only need a certain symbol rarely, you can enter it in your source via its Unicode number. For example, to obtain a peso, type &#x20b1;. This table has been tested with our default fonts, and should be fine for HTML output. Please report any difficulties with different LaTeX fonts, as there are extra measures we can take to make these more robust. (We’ve already done this for the Paraguayan guaraní.)
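A short sketch of entering a character by its Unicode number. Both forms below are standard XML character references for the peso sign, U+20B1; the sentence is invented for illustration.

```xml
<!-- Hexadecimal reference, then the equivalent decimal reference. -->
<p>The fare is &#x20b1;50, which may also be typed as &#8369;50.</p>
```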
A limited supply of icons can be used when explaining how to use some computer application. The empty element is <icon/> and the attribute is @name.
We sprinkle a few into a few sentences to check baselines and font sizing.
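A sketch of the <icon/> element in a sentence. The values of @name used here, wrench and gear, are assumptions for illustration; consult the Author’s Guide for the names actually supported.

```xml
<!-- The names "wrench" and "gear" are assumed, not confirmed by this text. -->
<p>Click the <icon name="wrench"/> button, then choose <icon name="gear"/>
to open the settings.</p>
```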
When processing a LaTeX file with xelatex the Font Awesome 5 icons are expected to be in a system font whose name is Font Awesome 5 Free. This is not a filename, and installing the LaTeX fontawesome5 package into your LaTeX installation does not always guarantee that this font will automatically be available as a system font.
The Publisher’s Guide contains some discussion about installing fonts into a system, as part of the documentation of creating a LaTeX style, and has particular warnings about only using the LaTeX fontawesome5 package as a vehicle for installing and accessing these fonts.
Your text can include specialized text meant to look like a key on the keyboard of a calculator or other device. So you can go b Enter < or F1. Or maybe a sequence as: Tab > Ctrl > T. Use the <kbd> element, with the label of the key as content.
There is a growing supply of keys which are labeled with graphics rather than text, such as a left arrow ←, right arrow →, up arrow ↑, down arrow ↓, and Enter ⮠. See The PreTeXt Guide for the definitive list. In 8.6 the literal column means the symbol/character is the content of a <kbd> element, while the named column means the symbol/character has been chosen via the value of the @name attribute of an empty <kbd/> element.
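A sketch of both forms of <kbd>. The literal form puts the key label in the element content; the named form uses an empty element with @name. The names left and enter are assumptions for illustration; The PreTeXt Guide has the definitive list.

```xml
<!-- Literal form: the content is the label printed on the key. -->
<p>Press <kbd>Tab</kbd>, then <kbd>Ctrl</kbd> and <kbd>T</kbd>, or <kbd>F1</kbd> for help.</p>

<!-- Named form: the values "left" and "enter" are assumed names. -->
<p>Move back with <kbd name="left"/> and confirm with <kbd name="enter"/>.</p>
```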
The <url> element can be used to create an external reference. The mandatory @href attribute is the actual URL complete with the protocol (e.g. https://). Content for the element is optional, and if provided will be the “clickable” text. In this case, a @visual attribute can be provided, and this will become a footnote with a more friendly version of the URL. When no content is provided, the “clickable” text will be the URL, with a preference for an optional @visual. This subsection has some (extreme) tests and we leave complete documentation and full details for the PreTeXt Guide.
A <url> element with content will get a footnote, by containing the (simplified) URL in the highly-recommended @visual attribute. If you do not provide the @visual attribute in this case, then you will get the @href value repeated, possibly with some editing. If you insist, you can make the @visual attribute identical to the @href attribute. Some tests:
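The three combinations described above can be sketched as follows. The address example.com is a placeholder, not a URL from the original source.

```xml
<!-- No content: the URL itself becomes the clickable text. -->
<p>Visit <url href="https://example.com/docs/index.html"/>.</p>

<!-- No content, with @visual: the friendlier version is preferred. -->
<p>Visit <url href="https://example.com/docs/index.html" visual="example.com/docs"/>.</p>

<!-- Content plus @visual: clickable text, with the visual version in a footnote. -->
<p>Read <url href="https://example.com/docs/index.html"
visual="example.com/docs">the documentation</url>.</p>
```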
Here is a totally bogus URL, which contains every possible legal character, so if this fails to convert there is some problematic character. In order to test the use of a percent sign (%) in a URL, we follow it by two hex digits, specifically, 58, which is a way to represent the character X in a URL. Normal text, monospace text, <url> with just @href, <url> with @href and @visual, <url> with @href, @visual and content. Notice how the various versions do, or do not, line-break in LaTeX/PDF output, including the (potentially confusing) use of a hyphen in the normal text version.
This paragraph has two footnotes, one with a real URL from Jesse Oldroyd, another with a fake URL from the above suite. For good measure, we repeat the URL found in the first footnote (which lacks its own footnote by design).
And if you do provide a link like Carleson’s Theorem in a footnote, provide an easy-to-read reference, such as en.wikipedia.org/wiki/Carleson%27s_theorem since the extra footnote and any visual will be squelched.
The taxon element can be used all by itself to get an italicized scientific name, as in Escherichia coli. It can also be structured with the elements genus and species, as in using both together in Cyclops kolensis. Or the subelements can be used individually. Rules for capitalization are presently your responsibility as an author. Possible improvements include new subelements, attributes for database identifiers, and checks on capitalization. Also, we might automatically abbreviate the genus after first use.
There is an attribute, @ncbi, that you can use on the taxon element to precisely identify the organism you are discussing using an identification number from the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/taxonomy). Right now, we do not do anything with this attribute, but things like links are certainly possible. See the source of this document to see it in use with Drosophila miranda, which could be used to construct a link to further information via its id number.
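A sketch of the taxon element in its simple and structured forms. The value of @ncbi below is a made-up placeholder, not the real identifier for any organism.

```xml
<!-- Unstructured: the whole name is italicized. -->
<p>A culture of <taxon>Escherichia coli</taxon> was prepared.</p>

<!-- Structured with genus and species. -->
<p>We collected <taxon><genus>Cyclops</genus> <species>kolensis</species></taxon>.</p>

<!-- With an identifier; "00000" is a placeholder, not a real NCBI id. -->
<p>We studied <taxon ncbi="00000"><genus>Drosophila</genus>
<species>miranda</species></taxon>.</p>
```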
Sage defines a nice syntax for generators of algebraic structures, but we must remember to use an escape sequence for the < symbol (see Subsection 8.1).
There is an alternate Sage syntax, which avoids the less-than and greater-than symbols.
Ampersands, less-than, and greater-than symbols are likely to be necessary in source code, such as Sage code (think generators of field extensions) or TikZ code (think arrowheads), and in matrices (think separating entries). If you have a big matrix, or a huge chunk of TikZ code, you can protect it all at once from the XML processor by wrapping it in <![CDATA[]]>. It should be possible to write without ever using the “CDATA” mechanism, but it might get tedious in places to use the supplied macros or XML escape sequences. This construction is often misunderstood as a solution to problems that are better remedied by reading Subsection 8.1 again.
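A sketch of both approaches inside Sage cells. The polynomial ring is an illustrative choice, not the example from the original source.

```xml
<!-- Generator syntax: the less-than sign must be escaped. -->
<sage>
  <input>
  R.&lt;x> = QQ[]
  x^2 - 2
  </input>
</sage>

<!-- Alternate syntax with no angle brackets at all. -->
<sage>
  <input>
  R = PolynomialRing(QQ, 'x')
  x = R.gen()
  x^2 - 2
  </input>
</sage>
```

Only the < needs the escape sequence; the > can be typed directly.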
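A sketch of the CDATA mechanism protecting a small piece of TikZ code. The picture itself is invented for illustration, and we assume it is authored in a <latex-image> element inside an <image>.

```xml
<image>
  <latex-image><![CDATA[
\begin{tikzpicture}
  % <-> requests arrowheads at both ends; no escaping is needed inside CDATA
  \draw[<->] (0,0) -- (2,0);
\end{tikzpicture}
  ]]></latex-image>
</image>
```

Everything between the opening <![CDATA[ and the closing ]]> is passed through untouched, so the XML processor never sees the < and > as markup.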
We test the three pre-defined LaTeX macros for &, <, and > with a pair of aligned equations:
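A sketch of the three macros in a pair of aligned equations. The inequalities are illustrative, not the ones from the original source; here \amp marks the alignment point, while \lt and \gt produce the relation symbols.

```xml
<md>
  <mrow>a + b \amp \lt c</mrow>
  <mrow>x^2 \amp \gt y - 1</mrow>
</md>
```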
A Jupyter notebook allows a mix of HTML (our logistical preference for a conversion) and Markdown (another set of special characters and their escaped versions). Certain pairs of delimiters, when appearing in consecutive HTML <code> elements, require extraordinary care. But the one nut we cannot crack is pairs of dollar signs. So the next paragraph is known to render badly in a Jupyter notebook, but should otherwise be a bit boring.
$ and $
Subsection 8.13 LaTeX Characters, Ligatures, and More
This section is just for testing, and the more you know about LaTeX, the more we would encourage you not to read this. Look to the Author’s Guide for the right way to author your source.
The ten reserved characters, directly in the source: # $ % ^ & _ { } ~ \. And again: X#X$X%X^X&X_X{X}X~X\X, but smashed up tight to intermediate characters.
In a verbatim presentation: # $ % ^ & _ { } ~ \.
And X#X$X%X^X&X_X{X}X~X\X. (These verbatim versions are authored in different paragraphs to work around the Jupyter notebook bug described above.)
We also disrupt certain constructions from LaTeX. Attempting to sneak-in any traditional macro for the purposes of LaTeX-only output, such as, say a \newpage, will fail since the leading backslash will be caught and converted to \textbackslash. (See? It just happened twice.) For technical reasons we want to particularly test \textbackslash, \textbraceleft, and \textbraceright.
Four “TeX ligatures”, --, ---, ``, and '', authored in running text, --, ---, ``, ’’. It may be hard to tell that the two consecutive apostrophes have not coalesced into a curly right smart quote, but see below, the spacing is subtly different.
We want the double quote mark from your keyboard, ", to not morph into some other character: " .
More testing: runs of hyphens. Such as: - (one), -- (two), --- (three), ---- (four), ----- (five), ------ (six), ------- (seven). Use the empty elements <ndash/> and <mdash/> for the longer dashes/hyphens.
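A sketch of the two dash elements in running text, with invented sentences. Typed hyphens stay hyphens, so the longer dashes must come from markup.

```xml
<!-- An en dash for a range, an em dash for a parenthetical break. -->
<p>See pages 12<ndash/>15 for the proof<mdash/>it is short<mdash/>and
then try the exercises.</p>
```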
Runs of apostrophes should not become smart right double quotes: ’ (one), ’’ (two), ’’’ (three), ’’’’ (four), ’’’’’ (five), ’’’’’’ (six), ’’’’’’’ (seven). You might want to cut-and-paste these into a text file to convince yourself there are the right number of characters. Here are two smart right double quotes, separated by a non-breaking space, for visual comparison: ” ”. Or 30 apostrophes on a line of their own (longer) followed by 15 smart right double quotes (shorter).
’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’.
”””””””””””””””.
Runs of backticks (accent grave) should not become smart left double quotes when the output is processed by LaTeX: ` (one), `` (two), ``` (three), ```` (four), ````` (five), `````` (six), ``````` (seven). Furthermore, in a context where Markdown syntax is recognized as well (e.g. a Jupyter notebook), paired backticks should not produce `inline verbatim text`.
The next paragraph has a long run of words separated/joined by the keyboard forward-slash character. With this input, LaTeX will not line-break at the slash, nor will it hyphenate anywhere. PreTeXt automatically provides an improved slash, which will line-break, as you should see below in LaTeX output. There is a bad right margin, but that is due to the absurdity of this test. This sort of problem should be no better or worse for the use of this character. Further refinements (zero-width space) and packages can be used to get hyphenation. HTML will line-break rationally with no extra help. Remember the <solidus/> character for super-simple text fractions like 7⁄32 (which will not line-break), and math elements or SI unit markup for technical work.
We render mathematics in web pages with the fantastic MathJax JavaScript library. Simplifying just a bit, it recognizes LaTeX syntax within a page, takes control of that text, and replaces it with nice fonts and formatting. Now, if you write about LaTeX you might well have some mathematics in your examples. Best practice would be to use verbatim text for that, and we mark off such text as being off-limits to MathJax.
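A sketch contrasting the keyboard slash with the <solidus/> element; the sentences are invented for illustration.

```xml
<!-- Keyboard slash: separates words and may line-break in LaTeX output. -->
<p>Choose the input/output/error stream.</p>

<!-- Solidus: forms a simple text fraction that will not line-break. -->
<p>Drill a 7<solidus/>32 inch hole.</p>
```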
But if you are writing running text, then you can (accidentally) author some text which MathJax recognizes and converts to something (unintended). And if you are doing this intentionally, then you have ignored PreTeXt markup for mathematics, and are missing out on some features.
Double backticks are a common LaTeX construction, which in LaTeX/PDF output should not become an opening quote-mark. Also, a single backtick in HTML is a signal for MathJax to interpret AsciiMath, and then a double backtick causes “random” pieces of mathematics on a page to not render at all. So we have a quotation authored LaTeX-style: ``We have nothing to fear, but fear itself.’’