Section 8 Entering Text in Paragraphs, Titles, Captions

XML, and therefore PreTeXt, is a markup language. But by and large, what you type into your source will be what you see in your output. So there is not much to say. Except that Subsection 8.1 eventually will be essential. However we do test various tricky situations here (which have technical explanations we avoid). See the Author's Guide for a superior treatment of the topics addressed here.

Subsection 8.1 Special XML characters

One of the goals of PreTeXt is to relieve an author of managing the numerous conflicts when mixing languages that use different characters for special purposes. But, of course, XML has its own special characters.

If you type a “less-than” symbol in your source, the XML processor thinks you are starting an opening, or closing, tag. So how do you get a less-than sign into your source so that it survives into your output, like this: < ? You use an escaped version. Type, literally, the four characters &lt; in your source. Then the XML processor will know you want the character and will not mistake it for a tag. But now we want to get an ampersand into our source like: &. How? Another escaped version of a character, literally the five characters &amp;.

Otherwise, keys on your keyboard, even international versions, should be fine in your source and behave as expected. WYTIWYG = What You Type Is What You Get. So the principal concession to using XML markup is the following very simple rule.

Rather than pressing the < and & keys on your keyboard, instead always enter the escape sequences &lt; and &amp; as replacements.

Simple. And it will work in “running text,” verbatim text (like when authoring the content of <c> or <pre> elements), and mixed into syntax to describe mathematical expressions. XML has three other escape sequences, &gt;, &apos;, and &quot;, for the characters >, ', and " (respectively). But they seem largely unnecessary for authoring in PreTeXt, as we now demonstrate by typing them directly from our keyboard into our source: >, ', and ".
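The rule can be checked with any generic XML parser. The following is a minimal sketch using only the Python standard library; the <p> fragment is an illustrative stand-in for real source, and this is not the PreTeXt toolchain itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative source fragment: < and & are escaped, while > ' " are typed directly.
source = "<p>a &lt; b &amp; c > d, 'single' and \"double\"</p>"

# The parser turns each escape sequence back into the character it stands for.
parsed_text = ET.fromstring(source).text

# Going the other way: escape() replaces &, < and > in raw text.
escaped_text = escape("x < y & y > z")

# An unescaped < in running text is rejected by the parser.
try:
    ET.fromstring("<p>a < b</p>")
    unescaped_is_accepted = True
except ET.ParseError:
    unescaped_is_accepted = False

print(parsed_text)
print(escaped_text)
print(unescaped_is_accepted)
```

Note that escape() also converts >, which is harmless but, as described above, not required.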

How was &amp; authored? Work it out, and then check the source here for the answer.

Subsection 8.2 Quotations

The <q> tag will provide beginning and ending double quotations, while the <sq> tag will behave similarly but provide single quotes. Given the complexity of quotations, the different symbols used in different languages, and the over-simplified versions provided on keyboards, it is necessary to use markup.

“The roots of education are bitter, but the fruit is sweet.” (Aristotle)

‘It is always wise to look ahead, but difficult to look further than you can see.’ (Winston Churchill)

A large quote can be accommodated with the <blockquote> tag, which can carry within itself an <attribution> element.

The problem with writing a book in verse is, to be successful, it has to sound like you knocked it off on a rainy Friday afternoon. It has to sound easy. When you can do it, it helps tremendously because it's a thing that forces kids to read on. You have this unconsummated feeling if you stop.

―Dr. Seuss

We say that again, to test a multiline attribution of a block quotation. Notice how the dash appears automatically, and that it is a quotation dash in HTML, distinct from other sorts of dashes.

The problem with writing a book in verse is, to be successful, it has to sound like you knocked it off on a rainy Friday afternoon. It has to sound easy. When you can do it, it helps tremendously because it's a thing that forces kids to read on. You have this unconsummated feeling if you stop.

―Dr. Seuss
Children's Author

Sometimes a quote may extend across several paragraphs. Or a balanced pair of quotation marks crosses an XML boundary, so we need left, right, single and double versions. (For example, see Section 27 on poetry.) Here are all four in a haphazard order: ”, ‘, “, ’. These should be a last resort, and not a replacement for the <q> and <sq> tags. The left/right versions are used for the following quote from Abraham Lincoln, which we have edited into two paragraphs.

“I am not bound to win, but I am bound to be true. I am not bound to succeed, but I am bound to live by the light that I have.

I must stand with anybody that stands right, and stand with him while he is right, and part with him when he goes wrong.”

And as a test, we try some crazy combinations of quotes, which would normally give some trouble where the quotation marks are adjacent.

  • “we use ‘single quotes inside of double quotes’”

  • ‘“double quotes inside of single quotes” with more’

  • “‘single quotes tight inside of double quotes’”

  • ‘“double quotes tight inside of single quotes”’

  • An “‘‘“absurd test”’’” of two adjacent single quotes inside a pair of double quotes

  • you would never do this, but a ‘‘pair of single quotes’’

N.B. We have taken no special care to protect against interactions of the actual quote characters (described above) with themselves, or with the grouping tags.
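To make the nesting behavior concrete, here is a minimal sketch of how <q> and <sq> markup could be flattened to text with curly quotes. The render function and the QUOTES table are hypothetical names for illustration, and the logic is an assumption about the idea, not PreTeXt's actual conversion.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from tag name to (opening, closing) quotation marks.
QUOTES = {"q": ("\u201c", "\u201d"), "sq": ("\u2018", "\u2019")}

def render(element):
    """Flatten an element to text, wrapping <q> and <sq> in curly quotes."""
    opening, closing = QUOTES.get(element.tag, ("", ""))
    parts = [opening, element.text or ""]
    for child in element:
        parts.append(render(child))      # nested quotations recurse
        parts.append(child.tail or "")   # text that follows the child
    parts.append(closing)
    return "".join(parts)

nested = ET.fromstring("<p><q>we use <sq>single quotes inside of double quotes</sq></q></p>")
tight = ET.fromstring("<p><sq><q>double quotes tight inside of single quotes</q></sq></p>")

print(render(nested))
print(render(tight))
```

Because the marks come from the markup rather than from keyboard characters, adjacent opening or closing quotes need no special handling in this sketch.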

Subsection 8.3 Groupings

It is possible to make some other groupings like quotations, such as {some emphasized text grouped within braces}, or [a Book Title inside brackets], an “Article Title”, 〈some foreign words inside angle brackets〉, or ⟦just a bit of text within double brackets⟧. Some of these are used extensively by scholars who study texts to note various restorations or deletions. Note that the <foreign> element may have an xml:lang attribute.

Note that the angle brackets, 〈 and 〉, are not the keyboard characters, < and >. Your best bet is to use the provided <angles> element when constructing a balanced pair. Similarly, <dblbrackets> is provided to make the double-bracket characters easily available, since they are likely not on your keyboard.

Subsection 8.4 Characters, Symbols, and Constructions

Some keyboard characters are ambiguous. Is the character ' an apostrophe or a right single quote? We presume the former, ', and provide markup as an alternative for the latter (described above). Is / used to separate words, or to form a fraction? We presume the former, /, and provide <solidus/>, ⁄, for the latter. We test some other characters straight from our US keyboard (with two being escape sequences).

~ ` ! @ # $ % ^ & * ( ) _ - + = [ ] { } | \ ; : ' " , < . > ? /

And again as verbatim text.

~ ` ! @ # $ % ^ & * ( ) _ - + = [ ] { } | \ ; : ' " , < . > ? /

Note that for a long time PreTeXt had empty elements for many of these characters, as a consequence of naïveté. So you might see <dollar/>, <ampersand/>, or others in old source. They will be deprecated and will raise warnings.

Now, when a character is nowhere to be found on your keyboard, we provide conveniences as markup. Or a keyboard character may have a different variant which we implement as an empty element. Here we test many of these. Read the Author's Guide for tags and more detail.

©   ℗   🄯   ®   ™   ℠   …   ·   ⁓   ‰   ¶   §   −   ×   ⁄   ÷   ±

There are a few common abbreviations of Latin phrases that can be achieved in HTML one way, and in LaTeX with a slightly different mechanism. These are due to LaTeX's treatment of a period (full stop), depending on its surroundings. So not reserved characters, but just divergent treatment. Using these will lead to the best quality in all your outputs. See Will Robertson's informative and arcane blog post 1  on the topic if you want the full story for the treatment of a full stop in LaTeX.

Tag    Realization  Meaning
ad     AD           anno Domini, in the year of the Lord
am     AM           ante meridiem, before midday
bc     BC           English, before Christ
ca     ca.          circa, about
eg     e.g.         exempli gratia, for example
etal   et al.       et alia, and others
etc    etc.         et caetera, and the rest
ie     i.e.         id est, in other words
nb     NB           nota bene, note well
pm     PM           post meridiem, after midday
ps     PS           post scriptum, after what has been written
vs     vs.          versus, against
viz    viz.         videlicet, namely

We also distinguish between abbreviations (vs.), acronyms (SCUBA) and initialisms (XML). This is a test of the text version of a multiplication symbol: 2 × 4.

Simple coordinates with degrees, minutes, seconds, or temperature, or distance in feet and inches. “We parked the car at 36°16′0.83″N, 122°35′47.27″W, and since it was 93°F, we walked 505′3.6″ so we could swim in the bay.”

An em dash is the long dash used much like parentheses (not an en dash used to denote a range, such as a range of page numbers). It should not have spaces around it, but some style guides allow for a thin space, which—we test right now. The command line stringparam can be set to none or thin to control this.

Subsection 8.5 Currency

For best results, be certain the right Unicode characters are in your source. If you only need a certain symbol rarely, you can enter it in your source via its Unicode number. For example, to obtain a peso, type &#x20B1;. This table has been tested with our default fonts, and should be fine for HTML output. Please report any difficulties with different fonts, as there are extra measures we can take to make these more robust. (We've already done this for the Paraguayan guaraní.)

Table 8.2. Supported Currency
Sign  Unicode  Name
$     U+0024   dollar
¢     U+00A2   cent
£     U+00A3   sterling
¤     U+00A4   currency
¥     U+00A5   yen
ƒ     U+0192   florin
฿     U+0E3F   baht
₡     U+20A1   colon
₤     U+20A4   lira
₦     U+20A6   naira
₩     U+20A9   won
₫     U+20AB   dong
€     U+20AC   euro
₱     U+20B1   peso
₲     U+20B2   guarani
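Entering a symbol by its Unicode number can be illustrated with a short sketch. It uses only the Python standard library to show that the reference &#x20B1; and the code point U+20B1 name the same character; the <p> wrapper is illustrative only.

```python
import unicodedata
import xml.etree.ElementTree as ET

# A numeric character reference names a character by its Unicode number.
peso = ET.fromstring("<p>&#x20B1;</p>").text

# The same character built directly from its code point.
same_peso = chr(0x20B1)

# A few code points taken from the table of supported currency.
code_points = [0x0024, 0x00A2, 0x00A3, 0x20AC, 0x20B1]
for cp in code_points:
    print(f"U+{cp:04X}  {chr(cp)}  {unicodedata.name(chr(cp))}")
```

Whether the printed glyphs display correctly depends on the fonts available where the script runs, which mirrors the font caveat in the text.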

Subsection 8.6 Icons in Text

A limited supply of icons can be used when explaining how to use some computer application. The empty element is <icon/> and the attribute is @name.

We sprinkle a few into a few sentences to check baselines and font sizing. We sprinkle a few into a few sentences to check baselines and font sizing. We sprinkle a few into a few sentences to check baselines and font sizing. We sprinkle a few into a few sentences to check baselines and font sizing.

Table 8.3. User-Interface Icons
Name Icon Name Icon Name Icon
arrow-down arrow-left arrow-right
arrow-up file-save gear
menu wrench

Nominations of new icons must

  • Have a Unicode character representation.

  • Be in the HTML/CSS/JS Font Awesome catalog.

  • Be in the fontawesome package.

  • Have a reasonably semantic PreTeXt name.

Please supply all this information, including the official Unicode name, with your request. Better yet, form a pull request.

Warning 8.4. Icons, xelatex, and Fonts.

When processing a file with xelatex the FontAwesome icons are expected to be in a system font whose name is FontAwesome. This is not a filename, and installing the fontawesome package into your installation does not mean you have made this font available as a system font.

The Publisher's Guide contains some discussion about installing fonts into a system, as part of the documentation of creating a style, and has particular warnings about only using the fontawesome package as a vehicle for installing and accessing these fonts.

Subsection 8.7 Keyboard Keys

Your text can include specialized text meant to look like a key on the keyboard of a calculator or other device. So you can go b Enter < or F1. Or maybe a sequence as: Tab > Ctrl > T. Use the <kbd> element, with the label of the key as content.

There is a growing supply of keys which are labeled with graphics rather than text, such as a left arrow, right arrow, up arrow, down arrow, and Enter. See The PreTeXt Guide for the definitive list. In Table 8.5 the literal column means the symbol/character is the content of a <kbd> element, while the named column means the symbol/character has been chosen via the value of the @name attribute of an empty <kbd/> element.

Table 8.5. Named keys
Literal Named
Ampersand & &
Less than < <
Greater than > >
Dollar $ $
Per cent % %
Open brace { {
Close brace } }
Hash # #
Backslash \ \
Tilde ~ ~
Circumflex ^ ^
Underscore _ _
Table 8.6. Upper Case
~ ! @ # $ % ^ & * ( ) _ +
Tab Q W E R T Y U I O P { } |
CapsLock A S D F G H J K L : ' Enter
Shift Z X C V B N M < > ? Shift
Table 8.7. Lower Case
` 1 2 3 4 5 6 7 8 9 0 - =
Tab q w e r t y u i o p [ ] \
CapsLock a s d f g h j k l ; ' Enter
Shift z x c v b n m , . / Shift

Subsection 8.8 URLs, such as

The <url> element can be used to create an external reference. The mandatory @href attribute is the actual URL complete with the protocol (e.g. https://). Content for the element is optional, and if provided will be the “clickable” text. In this case, a @visual attribute can be provided, and this will become a footnote with a more friendly version of the URL. When no content is provided, the “clickable” text will be the URL, with a preference for an optional @visual. This subsection has some (extreme) tests and we leave complete documentation and full details for the PreTeXt Guide.
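The rules for choosing the clickable text and the footnote can be sketched as a small function. This is a minimal illustration, not PreTeXt's implementation: link_parts is a hypothetical name, example.org is a placeholder, and the fallback from @visual to @href for the footnote is an assumption of the sketch. Only plain text content is considered.

```python
import xml.etree.ElementTree as ET

def link_parts(source):
    """Return (clickable text, footnote text or None) for one <url> element."""
    url = ET.fromstring(source)
    href = url.get("href")              # mandatory, includes the protocol
    visual = url.get("visual")          # optional, friendlier version
    content = (url.text or "").strip()  # optional clickable text
    if content:
        # Content is the clickable text; the footnote shows @visual,
        # falling back to @href (an assumption of this sketch).
        return content, visual or href
    # No content: the URL itself is clickable, preferring @visual.
    return visual or href, None

# example.org is a placeholder; note &amp; inside the attribute value.
with_content = '<url href="https://example.org/search?a=1&amp;b=2" visual="example.org/search">Search page</url>'
no_content = '<url href="https://example.org/search?a=1&amp;b=2" visual="example.org/search"/>'
bare = '<url href="https://example.org/search?a=1&amp;b=2"/>'

for source in (with_content, no_content, bare):
    print(link_parts(source))
```

The ampersand in the query string must be written as &amp; inside the attribute, and the parser hands back a plain & in the resulting value.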

A long URL for testing: Notice in the source that you do not put any tags inside the @href or @visual attributes, but you may need to provide XML escape sequences (see Subsection 8.1).

A <url> element with content will get a footnote, by containing the (simplified) URL in the highly-recommended @visual attribute. If you do not provide the @visual attribute in this case, then you will get the @href value repeated, possibly with some editing. If you insist, you can make the @visual attribute identical to the @href attribute. Some tests:

Here is a totally bogus URL, which contains every possible legal character, so if this fails to convert there is some problematic character. In order to test the use of a percent sign (%) in a URL, we follow it by two hex digits, specifically, 58, which is a way to represent the character X in a URL. Normal text, monospace text, <url> with just @href, <url> with @href and @visual, <url> with @href, @visual and content. Notice how the various versions do, or do not, line-break in LaTeX/PDF output, including the (potentially confusing) use of a hyphen in the normal text version.



Characters to a footnote 5 

Line-breaking in LaTeX/PDF output is specialized, using path separators (slashes) as candidates for splitting across lines:

We are not fans of footnotes; they are totally unstructured. 6  A URL in a footnote migrates around, and so care must be taken. 7  This paragraph has two footnotes, one with a real URL from Jesse Oldroyd, another with a fake URL from the above suite. For good measure, we repeat the URL found in the first footnote (which lacks its own footnote by design): Carleson's Theorem. And we include a no-content version of the same link, with a visual version provided and employed:

Subsection 8.9 Biological Names

The taxon element can be used all by itself to get an italicized scientific name, as in Escherichia coli. It can also be structured with the elements genus and species, as in using both together in Cyclops kolensis. Or the subelements can be used individually. Rules for capitalization are presently your responsibility as an author. Possible improvements include new subelements, attributes for database identifiers, and checks on capitalization. Also, we might automatically abbreviate the genus after first use.

There is an attribute, @ncbi that you can use on the taxon element to precisely identify the organism you are discussing using an identification number from the National Center for Biotechnology Information 8 . Their taxonomy 9  is at Right now, we do not do anything with this attribute, but things like links are certainly possible. See the source of this document to see it in use with Drosophila miranda which could be used to construct a link to further information via id number 10  or even further information via just the name 11 .
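As a minimal sketch of what could be done with this markup, the following reads a structured taxon element with a generic XML parser. The fragment is illustrative source (562 is the NCBI taxonomy identifier for Escherichia coli), and the automatic abbreviation is only the possible improvement mentioned above, not an existing feature.

```python
import xml.etree.ElementTree as ET

# Illustrative source; 562 is the NCBI taxonomy identifier for Escherichia coli.
source = '<taxon ncbi="562"><genus>Escherichia</genus> <species>coli</species></taxon>'

taxon = ET.fromstring(source)
genus = taxon.findtext("genus")
species = taxon.findtext("species")
ncbi_id = taxon.get("ncbi")

# Full scientific name, plus the abbreviated form often used after first mention.
full_name = f"{genus} {species}"
short_name = f"{genus[0]}. {species}"

print(full_name, short_name, ncbi_id)
```

With the identifier available as a plain attribute value, constructing a link to an external database becomes a matter of string formatting.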

Subsection 8.10 Verbatim in titles, \a&b#c%d~e{f}g$h_i^j, OK

You can test the migration of the special characters in this section title by requesting a 2-deep Table of Contents with --stringparam toc.level 2.

Subsection 8.11 Special Situations

Sage defines a nice syntax for generators of algebraic structures, but we must remember to use an escape sequence for the < symbol (see Subsection 8.1).

There is an alternate Sage syntax, which avoids the less-than and greater-than symbols.

Ampersands, less-than, and greater-than symbols are likely to be necessary in source code, such as Sage code (think generators of field extensions) or TikZ code (think arrowheads), and in matrices (think separating entries). If you have a big matrix, or a huge chunk of TikZ code, you can protect it all at once from the XML processor by wrapping it in <![CDATA[   ]]>. It should be possible to write without ever using the “CDATA” mechanism, but it might get tedious in places to use the supplied macros or XML escape sequences. This construction is often misunderstood as the solution to problems that are better remedied by reading Subsection 8.1 again.
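That the two approaches are equivalent to an XML processor can be checked directly. In this minimal sketch, the same TikZ-like line is authored once with escape sequences and once inside a CDATA section; the <pre> wrapper and the line itself are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same TikZ-like line authored twice: once with escape sequences,
# once wrapped whole in a CDATA section. <pre> is only a stand-in element.
escaped = "<pre>\\draw[&lt;-&gt;] (0,0) -- (1,1); % a &amp; b</pre>"
cdata = "<pre><![CDATA[\\draw[<->] (0,0) -- (1,1); % a & b]]></pre>"

text_from_escaped = ET.fromstring(escaped).text
text_from_cdata = ET.fromstring(cdata).text

print(text_from_escaped)
print(text_from_cdata == text_from_escaped)
```

The parser delivers identical text in both cases, so the choice between them is purely a matter of authoring convenience.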

We test the three pre-defined macros for &, <, and > with a pair of aligned equations:

\begin{align*} a^2 + b^2\amp\lt c^2\\ c^2\amp\gt a^2 + b^2 \end{align*}

Subsection 8.12 Jupyter Notebook, Markdown, MathJax, Delimiters

A Jupyter notebook allows a mix of HTML (our logistical preference for a conversion) and Markdown (another set of special characters and their escaped versions). Certain pairs of delimiters, when appearing in consecutive HTML <code> elements, require extraordinary care. But the one nut we cannot crack is pairs of dollar signs. So the next paragraph is known to render badly in a Jupyter notebook, but should otherwise be a bit boring.

$ and $

Subsection 8.13 Characters, Ligatures, and More

This section is just for testing, and the more you know about LaTeX, the more we would encourage you not to read this. Look to the Author's Guide for the right way to author your source.

The ten reserved LaTeX characters, directly in the source: # $ % ^ & _ { } ~ \. And again: X#X$X%X^X&X_X{X}X~X\X, but smashed up tight to intermediate characters.

In a verbatim presentation: # $ % ^ & _ { } ~ \.

And X#X$X%X^X&X_X{X}X~X\X. (These verbatim versions are authored in different paragraphs to work around the Jupyter notebook bug described above.)

We also disrupt certain constructions from LaTeX. Attempting to sneak in any traditional macro for the purposes of LaTeX-only output, such as, say, a \newpage, will fail since the leading backslash will be caught and converted to \textbackslash. (See? It just happened twice.) For technical reasons we want to particularly test \textbackslash, \textbraceleft, and \textbraceright.
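The idea of catching each reserved character before it reaches LaTeX can be sketched as follows. This is a minimal illustration and not PreTeXt's implementation: escape_latex and LATEX_ESCAPES are hypothetical names, the three \text... macros are the ones named above, and the remaining replacements are common choices assumed for the example.

```python
# Assumed replacements for the ten characters LaTeX reserves; the three
# \text... macros are the ones named above, the rest are common choices.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\textbraceleft{}",
    "}": r"\textbraceright{}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}

def escape_latex(text):
    """Escape each reserved character in a single pass over the text."""
    # One pass matters: escaping backslashes first and braces afterwards
    # would mangle the braces that the earlier replacement introduced.
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)

print(escape_latex(r"\newpage"))
print(escape_latex("X#X$X%X"))
```

Processing character by character, rather than chaining string replacements, is what keeps the braces added by one replacement from being escaped again by another.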

Four “LaTeX ligatures”, --, ---, ``, and '', authored in running text, --, ---, ``, ''. It may be hard to tell that the two consecutive apostrophes have not coalesced into a curly left smart quote, but see below, the spacing is subtly different.

We want the double quote mark from your keyboard, ", to not morph into some other character: " .

More testing: runs of hyphens. Such as: - (one), -- (two), --- (three), ---- (four), ----- (five), ------ (six), ------- (seven). Use the empty elements <ndash/> and <mdash/> for the longer dashes/hyphens.

Runs of apostrophes should not become smart right double quotes: ' (one), '' (two), ''' (three), '''' (four), ''''' (five), '''''' (six), ''''''' (seven). You might want to cut-and-paste these into a text file to convince yourself there are the right number of characters. Here are two smart right double quotes, separated by a non-breaking space, for visual comparison: ” ”. Or 30 apostrophes on a line of their own (longer) followed by 15 smart right double quotes (shorter).



Runs of backticks (accent grave) should not become smart left double quotes when the output is processed by LaTeX: ` (one), `` (two), ``` (three), ```` (four), ````` (five), `````` (six), ``````` (seven). Furthermore, in a context where Markdown syntax is recognized as well (e.g. a Jupyter notebook), paired backticks should not produce `inline verbatim text`.

The next paragraph has a long run of words separated/joined by the keyboard forward-slash character. With this input, LaTeX will not line-break at the slash, nor will it hyphenate anywhere. PreTeXt automatically provides an improved slash, which will line-break, as you should see below in output. There is a bad right margin, but that is due to the absurdity of this test. This sort of problem should be no better or worse for the use of this character. Further refinements (zero-width space) and packages can be used to get hyphenation. HTML will line-break rationally with no extra help. Remember the <solidus/> character for super-simple text fractions like 7⁄32 (which will not line-break), and math elements or SI unit markup for technical work.


Subsection 8.14 HTML and accidental mathematics

We render mathematics in web pages with the fantastic MathJax Javascript library. Simplifying just a bit, it recognizes LaTeX syntax within a page, takes control of that text, and replaces it with nice fonts and formatting. Now, if you write about LaTeX you might well have some mathematics in your examples. Best practice would be to use verbatim text for that, and we mark off such text as being off-limits to MathJax.

But if you are writing running text, then you can (accidentally) author some text which MathJax recognizes and converts to something (unintended). And if you are doing this intentionally, then you have ignored PreTeXt markup for mathematics, and are missing out on some features.

A few tests that we can prevent any accidents.

Inline mathematics: \(x^2\).

Display mathematics: \begin{align}x^2+y^2=z^2\end{align}

Double backticks are a common construction, which in LaTeX/PDF output should not become an opening quote-mark. Also, a single backtick in HTML is a signal for MathJax to interpret ASCIIMath, and then a double backtick causes “random” pieces of mathematics on a page to not render at all. So we have a quotation authored LaTeX-style: ``We have nothing to fear, but fear itself.''