This is one of a series of pages that present the basic principles of using HTML to create a user-friendly web site. This page is concerned with refinements that allow pages to be more intelligible.
This page describes some more refinements to HTML, but unlike those in the previous page they are little used in the great majority of pages found on the web today. Nonetheless, when properly used they can make a major contribution to making pages more intelligible to readers.
Abbreviations present a problem in any kind of text you may be writing, whether for print or for the web, but especially for the web, because you have very little knowledge of your potential audience. Just about everyone knows that USA means the United States of America, for example, so you would not need to say so unless writing for small children. But what about UAE? If you are writing for experts on the Middle East they will know that this stands for the United Arab Emirates, and they would find it tedious if you wrote it out in full twenty times in the same document; but if your readers are not experts they will find it helpful to be told. One solution is to define your abbreviations just once at first mention, but it may not always be easy for a reader to find the first mention, and in any case readers of web pages often jump into the middle of a document without reading the introductory paragraphs.
HTML provides a convenient solution to this problem, allowing you to use UAE throughout your text, while at the same time telling readers what it stands for at any point in the page. To do this, you need to write
<acronym title="United Arab Emirates">UAE</acronym>
instead of
UAE
at every mention. To see what effect this produces in your browser, examine the following instances: UAE, UAE and UAE. Here the first UAE is marked with <acronym> ... </acronym>, the second is not marked at all, and the third is marked in a way I shall come to shortly. Depending on your browser, they may look exactly the same, or the first (and third) may be marked as a known abbreviation by the browser, for example with a dotted underline. Then see what happens if you place your mouse pointer over each one in turn. In the second case nothing should happen, but in the first case a browser that understands the <acronym> ... </acronym> tags will display the meaning of the abbreviation in some way, for example as a tool tip (a piece of text that appears at the mouse pointer) or in the status line at the bottom of the window.
Unfortunately, instead of defining just one element of this kind, the W3C proposed two with overlapping functions that are not clearly differentiated from one another. As well as the above code, therefore, you could use
<abbr title="United Arab Emirates">UAE</abbr>
instead. In ordinary use an acronym is an abbreviation that is commonly pronounced as a word, like NATO, but unlike FBI, which is not pronounceable, and also unlike CIA, which is pronounceable but is not usually treated as a word. This is a rather unnecessary distinction to bother with in HTML, and it would seem most logical just to use <abbr> ... </abbr> for all abbreviations, and ignore <acronym> ... </acronym>. Unfortunately, however, some widely used current browsers understand <acronym> ... </acronym> but not <abbr> ... </abbr>. For this reason I use <acronym> ... </acronym> tags in these pages.
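As a minimal sketch of how the two elements sit in a real page, the fragment below marks one abbreviation with each of them and adds a style rule so that both are flagged with a dotted underline, even in a browser that does not flag them by default. The style rule, the page title and the example sentence are my own illustrative assumptions, not markup taken from these pages.

```html
<html>
<head>
<title>Marked abbreviations (illustrative sketch)</title>
<style type="text/css">
  /* Assumed styling: underline marked abbreviations and show a help cursor. */
  acronym, abbr {
    border-bottom: 1px dotted;
    cursor: help;
  }
</style>
</head>
<body>
<p>
  <!-- The title attribute holds the expansion shown to the reader. -->
  <acronym title="North Atlantic Treaty Organization">NATO</acronym>
  is pronounced as a word, whereas
  <abbr title="Federal Bureau of Investigation">FBI</abbr>
  is spelled out letter by letter.
</p>
</body>
</html>
```

A browser that does not recognize one of the two elements should simply display its contents as ordinary text, so nothing is lost by marking the abbreviation.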
We have already met the link element as a way of telling the browser where to find the style sheet, but here we are concerned with a different use of it. Unless your site is very small you probably have several pages, and in most cases these are not all totally independent of one another but related either as parts of different multi-part documents or as elements in a hierarchy. For browsers that can make use of the information, therefore, it is helpful to include in the head section of each page an indication of where it fits into the structure of the site. The head section of this page, for example, includes the following lines:
<link rel=home href="homepage.htm" title="Home page">
<link rel=contents href="contents.htm" title="Contents">
<link rel=first href="basichtm.htm" title="First page in this series">
<link rel=prev href="refine.htm" title="Previous page in this series">
<link rel=next href="trouble.htm" title="Next page in this series">
<link rel=last href="trouble.htm" title="Last page in this series">
<link rel=up href="basichtm.htm" title="Parent page in the hierarchy">
<link rel=copyright href="copy.htm" title="Copyright information">
<link rel=help href="help.htm" title="Navigation help">
<link rel=author href="mailto:athel@ibsm.cnrs-mrs.fr" title="Email the author">
What does all this mean? Each of these link elements has a rel attribute, which uses a standard value to specify the function of the page defined by the href attribute in the organization of the site, and the title attribute provides the same information in plain English. Thus, by analysing all these lines the browser can deduce that the page you are looking at is part of a series that goes from a file basichtm.htm to a file trouble.htm, that trouble.htm is also the next page in the series, whereas refine.htm is the previous one in the series. The other lines specify where the full list of contents for the series can be found, where the home page of the site is, where copyright information is stored, where general information about navigating the site is located, as well as an email link to the author.
Unfortunately most current browsers make no use of this information unless special toolbars are installed, even though the link element defined in exactly this way has been part of HTML since about 1995. NCSA Mosaic, which was the most popular browser before it was supplanted, first by Netscape and then by Internet Explorer, put the link element to good use, as does the text-only browser Lynx, but their good example has been very little followed, though an honourable exception is the minority browser iCab. For the moment, therefore, only a minority of your visitors will benefit from any link elements in your pages, but they won’t do any harm to the others.
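Because so few browsers act on link elements, one possible approach is to repeat the most important relationships as ordinary visible links in the body. The sketch below assumes a hypothetical three-page series (intro.htm, the present page, and end.htm); the file names, titles and visible link text are all invented for illustration.

```html
<html>
<head>
<title>Middle page of a hypothetical series</title>
<!-- Relationships for browsers that can use them. -->
<link rel="first" href="intro.htm" title="First page in this series">
<link rel="prev" href="intro.htm" title="Previous page in this series">
<link rel="next" href="end.htm" title="Next page in this series">
<link rel="last" href="end.htm" title="Last page in this series">
</head>
<body>
<!-- The same relationships as visible links, for every other browser. -->
<p>
  <a href="intro.htm" rel="prev">Previous page</a> |
  <a href="end.htm" rel="next">Next page</a>
</p>
</body>
</html>
```

The rel attribute on the a elements is optional here; it merely records the same relationship that the corresponding link element declares.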
If you scroll to the bottom of this page you will see a list of Access keys. These allow you to navigate the site without using your mouse. How you use them depends on the browser you are using: in some versions of Internet Explorer, for example, you need to press the ALT key as well as the key specified, and then ENTER. If you are using Internet Explorer, try this now with ALT-T ENTER. If it works it will take you to the top of this page. The lines of HTML that specify two of the keys defined for this page look like this:
<strong>T</strong>: <a href="#top" accesskey="T">Top of this page</a><br>
<strong>S</strong>: <a href="sitemap.htm" accesskey="S">Site map </a>
Links, whether inside the same page or to another page, are specified with the href attribute of the a element, like other links. (Note that the first line implies the existence of an appropriate a element elsewhere in the page with the name attribute assigned a value of top.) The accesskey attribute defines the key that produces the desired effect. Unfortunately no conventions exist for assigning particular keys to particular functions: this means, first of all, that you need to tell readers somewhere on each page what keys are defined, and, more seriously, that you cannot assume that keys used on one site will work the same way on another. The best you can do is to ensure that you follow a consistent strategy in all your pages. In all of mine, for example, the T key is defined as the access key for reaching the top of the page.
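To make the relationship between the access key and its target explicit, here is a minimal sketch of a page body containing both the named anchor and the visible key list. The heading text and the placement of the anchor are assumptions of mine; only the two access-key lines follow the ones quoted above.

```html
<body>
<!-- Target of the "#top" link: an a element whose name attribute is "top". -->
<a name="top"></a>
<h1>Page title (placeholder)</h1>

<p>Main content of the page goes here.</p>

<!-- Visible list telling readers which keys are defined on this page. -->
<p>Access keys:<br>
<strong>T</strong>: <a href="#top" accesskey="T">Top of this page</a><br>
<strong>S</strong>: <a href="sitemap.htm" accesskey="S">Site map</a>
</p>
</body>
```

Keeping the list in the same position on every page makes it easier for readers to find out which keys your site defines.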
Your visitors need to have a way of contacting you if they want to comment on anything in your pages, and they will almost certainly prefer to do this by email. So you ought to include both your name and your email address on every page in your site. However, everyone nowadays is plagued by junk email, or spam, and there is little doubt that one of the ways the spammers find out email addresses is to harvest them from web pages by searching for a elements that contain the string mailto: or just searching for anything that looks like an email address, for example any long string that contains the character @ but no spaces.
The first thing to be very clear about is that there is no completely reliable solution to this problem. If you don’t put contact information on your pages you will seriously inconvenience some of your readers; if you do put it there, then it is available to spammers as well as to legitimate readers. Nonetheless, it is becoming increasingly common to disguise email addresses in some way. If you do this, you should aim to inconvenience your human readers as little as possible (preferably not at all), so what they see on their screen should look like an email address, and if it is marked as a mailto link then clicking on it should launch their email program. If it is not a link it should still be possible to select it as text so that it can be pasted into an email program. Moreover, none of this should be affected by whether the browser has images and Javascript enabled: if your visitors want to visit your pages without images or Javascript that is their choice.
So, what you need is a method that does not depend on Javascript, and does not display a little image that looks like text but cannot be selected and copied as text. At the same time you want to make it difficult (though not, unfortunately, impossible) to harvest by machine. The simplest way is to encode some of the characters (especially @) so that your browser will display them exactly as they would be if not encoded, but any person or machine that examines the source will see the coding. Suppose that your email address is falsename@yahoo.com. When people visit your page you want them to see
Email address: falsename@yahoo.com
And if you’ve marked it as a mailto link then it needs to behave like a mailto link if a visitor clicks on it.
However, if you code it as
Email address: <a href="mailto:falsename@yahoo.com">falsename@yahoo.com</a>
then although it will work fine for your legitimate visitors it will also work fine for email-harvesting programs. Instead, therefore, you can code it like this:
Email address: <a href="mailto:f&#97;lsen&#97;me&#64;y&#97;hoo.com">f&#97;lsen&#97;me&#64;y&#97;hoo.com</a>
When a browser sees this it will automatically convert &#97; to a, &#64; to @ (etc.), not only for displaying it on your screen, but also for sending the information to your email program, so a human user won’t notice any difference at all between this code and the version above. A harvesting program that examines the source of the page, however, will see f&#97;lsen&#97;me&#64;y&#97;hoo.com, which is not a valid email address.
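How many characters you encode is a matter of choice. As one possible variant, the sketch below encodes every character of the example address falsename@yahoo.com as a decimal character reference, both in the href attribute and in the visible text, so that nothing resembling an address appears in the source. The surrounding p element is my own assumption; the codes are the ordinary decimal character codes (for example 102 for f, 97 for a, 64 for @ and 46 for the full stop).

```html
<!-- Every character of falsename@yahoo.com written as a decimal reference. -->
<p>Email address:
<a href="mailto:&#102;&#97;&#108;&#115;&#101;&#110;&#97;&#109;&#101;&#64;&#121;&#97;&#104;&#111;&#111;&#46;&#99;&#111;&#109;">&#102;&#97;&#108;&#115;&#101;&#110;&#97;&#109;&#101;&#64;&#121;&#97;&#104;&#111;&#111;&#46;&#99;&#111;&#109;</a>
</p>
```

A browser should display this exactly as falsename@yahoo.com and pass the same decoded address to the email program. The price is that the source becomes much harder for you to read and maintain.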
Nonetheless, if you have any programming knowledge you will say, but surely it would be easy for email harvesting programs to decode this and extract the true address? Yes, it would indeed be easy, but fortunately current evidence indicates that spammers do not in fact take this easy step, probably because they can get a huge harvest of email addresses without bothering. It would also be technically possible (though less easy) to defeat Javascript solutions, and even displaying little images could be defeated by a sufficiently sophisticated character-recognition program. Ultimately, therefore, the only way to keep information from spammers is not to put it on the web at all.