Creating a Web Page: a Basic Guide

Specifying the Typographic Style of a Page

This is one of a series of pages that present the basic principles of using HTML to create a user-friendly web site. This page describes the use of tags to produce typographical variants such as italic or bold type, and the use of entities to produce special characters like accented letters (e.g. é), special symbols (e.g. £) and symbols that cannot be typed normally because they have special meanings in HTML (& < and >).

Contents

* Principles of friendly web pages

* The structure of an HTML document

* Specifying the layout

* Specifying the typography

* Logical or physical tags?

* Punctuation

* The <em> (emphatic) and <strong> (strongly emphatic) tags

* The <i> (italic) and <b> (bold) tags

* Other logical tags

* Accented letters and other special characters

* Linking with other locations

* Images

* Organization of web files

* Style sheets

* Refinements: using additional features

* Further refinements: improving accessibility

* Trouble-shooting: why doesn’t it come out as expected?

* Detailed list of contents

Specifying the typography

Logical or physical tags?

There are two basic approaches to typographical control. It is best to assume that the readers of your pages have set their browsers the way they want them, so that if you indicate which words or passages fulfil certain kinds of logical functions you can trust the user’s browser to interpret your indications in a way that satisfies the user (not necessarily you). Unfortunately, however, many web authors take the view that they know better than any user which bits of their pages ought to be in italics or bold, and so they specify these in the HTML with physical tags. The first approach is far closer to the spirit in which HTML was conceived, and it works even if the user is not using a visual browser at all – for example a browser for blind people, which will know how to speak differently for emphasis but cannot represent italics in speech. In practice physical tags remain widely used, often for reasons that have not been clearly thought out.

I always try to use logical tags when it is a matter of indicating the logical structure of the page (i.e. which bits need emphasis or fulfil defined functions like indicating someone’s address), but physical tags where there are generally recognized conventions to be followed: e.g. biological convention dictates that names of species such as Escherichia coli should be italicized, and it is common practice in mathematics to represent symbols for matrices in bold.
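As a rough illustration of this division of labour, the fragment below uses logical tags where the text has a function (emphasis, an address) and physical tags only for the two conventions just mentioned. The sentences, the name and the address are invented for the example.

```html
<!-- Logical tags: say what the text is and let the browser decide how to show it -->
<p>Please <em>do not</em> send samples by ordinary post.</p>
<address>A. N. Author, 1 Example Street, Exampletown</address>

<!-- Physical tags: follow an established typographic convention -->
<p>The strain of <i>Escherichia coli</i> used is described below.</p>
<p>The product of the matrices <b>A</b> and <b>B</b> is written <b>AB</b>.</p>
```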

The commonest logical tags are <em> and <strong>, which will be described shortly (after a brief note about punctuation), but others also exist.

Punctuation

In ordinary typography, i.e. on the printed page, it is conventional that a point of punctuation is in the same style as the text that immediately precedes it. This means, for example, that if a sentence ends with a word in italics then the full stop should be in italics as well. You may well wonder why this matters, given that an italic full stop in isolation looks much the same as a roman full stop. However, computer software tends to balance the space around an italic passage rather badly, and some web browsers do it particularly badly. Compare the following cases, in which the first line shows the HTML code and the second shows the way your browser handles it:

roman, <i>italic</i>, <b>bold</b>.

The above code produces this on your browser: roman, italic, bold.

With the following code, however,

roman, <i>italic,</i> <b>bold.</b>

the result is: roman, italic, bold.

The balance will usually be better if the comma is within the range of the italic element, and, depending on your browser, it may be very much better.

The <em> (emphatic) and <strong> (strongly emphatic) tags

The <em> and <strong> tags are used to add different degrees of emphasis to words or sentences in your page. Most browsers represent emphatic sections as italic and strongly emphatic sections as bold. You can tell what the browser you are using at this moment does by seeing whether text marked with <em> is shown in the same style as text marked with <i>, and whether text marked with <strong> is shown in the same style as text marked with <b>. It is not good practice to combine these tags, as the results are not predictable: <strong><em> and <em><strong> may be interpreted by some browsers as equivalent to one another, but they may also be treated as <em> alone or as <strong> alone, or as just plain wrong and so displayed as normal text. If you decide to ignore this advice you should still nest the tags correctly, closing the inner one before the outer: <em><strong> ... </em></strong> is definitely an error and may have unexpected results.
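A small sketch of the nesting rule, using an invented sentence: the first paragraph combines the two tags but closes them in the right order, and the second shows the more predictable choice of a single level of emphasis.

```html
<!-- Correctly nested: the inner element (strong) is closed before the outer one (em) -->
<p><em><strong>Read this first.</strong></em></p>

<!-- More predictable: choose one level of emphasis only -->
<p><strong>Read this first.</strong></p>
```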

The range of a typographic tag like <em> should be wholly contained within the range of a layout tag such as <p> or <h2>. This means that if you want to extend an emphasized passage beyond the end of one paragraph and into the next you should close it at the end of one paragraph and reopen it at the beginning of the next.
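For instance, an emphasized passage that runs across a paragraph boundary might be coded along these lines (the wording of the paragraphs is invented for the example):

```html
<!-- The emphasis is closed before the paragraph ends... -->
<p>This paragraph is ordinary until <em>the emphasis begins here and runs to the end of the paragraph.</em></p>

<!-- ...and reopened at the start of the next one -->
<p><em>It continues at the start of this paragraph</em> and then stops.</p>
```

Each <em> element lies wholly inside one <p> element, so neither range crosses a paragraph boundary.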

The <i> (italic) and <b> (bold) tags

These tags allow you to specify italic or bold explicitly. They have some recommended uses that have nothing to do with emphasis. For example, it is common typographic practice to print words that are in a different language from the rest in italics, and similarly with words that are being considered just as words. A sentence coded as the word <i>and</i> is often written as &amp; is easy to read, but it would be difficult to make sense of if it were coded without the <i> tags, as the word and is often written as &amp;. (If you want to avoid italics you can of course put the word in question in quotation marks: the word "and" is often written as &amp;.) It is likewise usual to put algebraic symbols in italics, and something coded as <i>y</i> = <i>a</i> + <i>bx</i> is much easier to recognize as a piece of algebra than the same expression with every letter in roman type.
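The three conventions just described might be coded as in the following sketch; the sentences are invented, and only the placing of the <i> tags matters.

```html
<!-- A phrase in a different language from the surrounding text -->
<p>The argument was rejected as a <i>non sequitur</i>.</p>

<!-- Words considered just as words -->
<p>It is easy to confuse <i>its</i> with <i>it's</i>.</p>

<!-- Algebraic symbols: the letters are italic, the operators are not -->
<p>The velocity is given by <i>v</i> = <i>u</i> + <i>at</i>.</p>
```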

Other logical tags

There are several other logical tags apart from <em> and <strong> already described. Most of these need not concern us for a basic HTML file, but there are two others that are useful even in a first document. These are as follows (listed together with the two already mentioned):

| Tag pair | Meaning/Use | Usual interpretation |
|---|---|---|
| <em> ... </em> | Emphatic | italic |
| <strong> ... </strong> | Strongly emphatic | bold |
| <cite> ... </cite> | Citation or reference to a source | italic |
| <address> ... </address> | Addresses | italic |

It is obvious – and would be even if this list did not illustrate it – that we can imagine many more logically different kinds of text than there are different typographical styles available to the browser, so it is inevitable that the same physical style must be used to represent more than one logical style. You may ask what is the point in using <em>, <cite> and <address> tags in an HTML document if they are all going to look the same in the browser. There are two answers. First, even if present-day browsers treat them all alike, the browsers of the future may distinguish between them, and it does no harm to include information in the HTML that future browsers may be able to use. Second, when you are revising your HTML – for example because some addresses have changed – it is useful to be able to pick out the <address> tags rapidly without having to wade through a sea of <i> tags.
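A brief sketch of the two additional tags in use. The book title, name and address are invented; here <cite> marks the title of a work being referred to, which is one common use of the element. Note that <address> is a block-level element, so it is written on its own rather than inside a <p>.

```html
<!-- cite: marks a reference to a source, here the title of a work -->
<p>The method is described in <cite>An Imaginary Handbook of Web Typography</cite>.</p>

<!-- address: contact details, easy to find again when they change -->
<address>
A. N. Author, 1 Example Street, Exampletown
</address>
```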

Accented letters and other special characters

If a page contains accented letters (é, ö, Å, ü, etc.) or certain mathematical symbols like > and <, then they should not be typed just like that in the HTML file. Typing them directly should work correctly with most characters if your server supplies correct information about the coding used and if the user’s browser interprets this correctly, but although you can ensure that the first requirement is satisfied there isn’t much you can do about the second. The essential point is that it will not work in general: some characters will be replaced by quite different special characters, whereas others, most notably <, will be misinterpreted by the browsers as HTML codes and may produce quite bizarre results in the browser. To avoid this, you need to replace such characters by entities in the HTML, which consist of the sequence &code;, where & and ; define where the entity begins and ends, and code specifies which character to insert.

For most characters alternative numerical and mnemonic code sequences exist. For example, both &#176; and &deg; define the same character, the degree sign °. However, the mnemonic versions are much easier to remember, and for the common accented letters they take consistent forms. Thus é is written as &eacute; and the other letters with acute accents are expressed analogously; í is &iacute;, and so on. These codes are case-sensitive: &eacute; cannot be written &eAcute;, for example. All of the accented letters normally available can be listed quite concisely, together with the most important other entities:

| Examples | Entities | Analogous cases | Used in |
|---|---|---|---|
| é É | &eacute; &Eacute; | á í ó ú Á Í Ó Ú | French, Spanish, Portuguese |
| è È | &egrave; &Egrave; | à ì ò ù À Ì Ò Ù | French, Italian, Portuguese |
| ê Ê | &ecirc; &Ecirc; | â î ô û Â Î Ô Û | French, Portuguese |
| ä Ä | &auml; &Auml; | ë ï ö ü ÿ Ë Ï Ö Ü | German, Swedish |
| ç Ç | &ccedil; &Ccedil; | | French, Portuguese |
| ñ Ñ | &ntilde; &Ntilde; | ã õ Ã Õ | Spanish, Portuguese |
| å Å | &aring; &Aring; | | Swedish, Danish, Norwegian |
| ø Ø | &oslash; &Oslash; | | Danish, Norwegian |
| ß | &szlig; | | German |
| æ Æ | &aelig; &AElig; | | Danish, Norwegian |
| & (space) | &amp; &nbsp; | | General |
| < > | &lt; &gt; | | Mathematics |
| ° µ | &deg; &micro; | | Science |
| £ ¢ ¥ | &pound; &cent; &yen; | | Finance |
| α β | &alpha; &beta; | γ δ ε ζ ... | Mathematics, chemistry |

The character listed as (space) is a non-breaking space, i.e. a fixed-width space at which a line will not be broken. So, for example, if you want to ensure that a quantity like 3 cm does not get broken between the 3 and the cm at the end of a line you can write it as 3&nbsp;cm. Although most browsers interpret a bare & (with white space following it) as such, this is not correct HTML, and &amp; should be used instead. Some HTML authors use an entity &quot; for a quotation mark " if it does not occur within a tag; however, this is not necessary unless you want to include a " within some text that is already enclosed between a pair of quotation marks within a tag. No corresponding problems arise with the semicolon (fortunately!), as this cannot be misinterpreted as the end of an entity unless there has been an opening &.
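To show several of these entities in context, here is a short sketch; the sentences are invented, and only the entities themselves are taken from the discussion above.

```html
<!-- Accented letter, currency symbol and ampersand -->
<p>The caf&eacute; charges &pound;3 for tea &amp; cake.</p>

<!-- Non-breaking space keeps the number and its unit on one line -->
<p>The sample was kept at 37&nbsp;&deg;C overnight.</p>

<!-- Less-than signs must not be typed directly -->
<p>The result holds when 0 &lt; <i>x</i> &lt; 1.</p>

<!-- Numerical and mnemonic forms of the same character -->
<p>Both &#176; and &deg; produce the degree sign.</p>
```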

A specific question that arises with & is that it frequently occurs as part of a URL, so you may need to include it in a link. For example, if you enter ampersand as a search term in a search engine you are likely to be taken to a page with a URL something like http://www.google.com/search?q=ampersand&btnG=Google+Search. If you need to include this as a link, you may wonder whether to write this:

<a href="http://www.google.com/search?q=ampersand&btnG=Google+Search">

or this:

<a href="http://www.google.com/search?q=ampersand&amp;btnG=Google+Search">

The answer is that you must write the second in order to have valid HTML. The browser is then responsible for converting &amp; to & before sending it to the remote server as a request.

Another point arises if you want to use curly (or smart) quotation marks (as in a printed book) instead of straight ones ". The first point to bear in mind is that straight quotation marks are normally just as clear for the reader as curly ones, so you gain little by using curly ones. However, if you think it matters you should use <q> as an opening quotation mark and </q> as a closing one. A modern browser will convert these to the appropriate curly quotation marks, and an older one will either ignore them or show both as straight quotation marks. What you should not do is just put the curly quotation marks in your HTML. These may look fine on your local system, but they will be converted to garbage characters on some others. The <q> tags can be nested within one another, so it is legitimate to write something like this:

<q>A quotation can contain <q>another quotation</q> within it.</q>

On your system this code is displayed like this: “A quotation can contain ‘another quotation’ within it.”

You cannot assume that the less common characters exist on all systems and will always be reproduced as you expect: if not, they may be replaced by ? or by the entity itself. The lower-case accented letters used in Western European languages are likely to be safe (as long as your readers are working on a system designed for one of these languages), but the capital letters may not be. Moreover, you cannot safely generalize from the examples above to accented letters in other languages; for example, you cannot assume that &cacute; will put an acute accent on a letter c or that &scedil; will put a cedilla under a letter s.

If you only need to use accented letters occasionally – for example if your page is in English or Dutch but you need to include a few foreign words, as in names of people or places – then the simplest thing is just to type in each entity when you come to it. However, if your whole page is in another language or for some other reason you need to include a large number of accented letters, it is rather cumbersome and error-prone to deal with each case separately. An easier method is just to write the page as you normally would in a word-processor, writing é rather than &eacute;, etc.; later on, after checking it carefully for errors in this form, you can use the find-and-replace function of your text editor to replace all instances of é by &eacute;, etc.

Not all current browsers interpret the entities for Greek letters (&alpha; etc.) correctly, and so they are not as useful as they might be. Some web authors try to get around this by defining little images, but these rarely look right except on exactly the system used by the author, or else they try to specify use of a symbol font that contains the characters wanted, but this also does not work reliably. Incidentally, these Greek entities are not intended for representing text in the Greek language. If you want to prepare a web page in Greek or one that contains more than occasional isolated Greek letters you need to refer to a more specialized source of information.

A complete table has been compiled by Martin Ramsch.