
Why does iCab scowl at this page (even though the W3C HTML validator says it contains no errors)?

This page contains some examples of code that satisfies the W3C and other validators but does not follow the W3C's own recommendations about what HTML 4 is, and in consequence does not get a favourable rating from the browser iCab. This is an updated version of a page originally written in 2000, and it notes some changes in the behaviour of iCab over the intervening five years. The original comments were based on the behaviour of version 2.2 under MacOS 8.5; the recent tests used version β337 (3.0.0) of 8th August 2005 under MacOS 10.4.1.

Introduction

If you are looking at this page with the browser iCab you will see a scowling face in the top-right corner of the page (like the ones in the caption to this paragraph). This means that iCab considers that it contains some HTML errors. However, if you test it with the W3C validator, it will report no errors. This is what it said five years ago about the original version of this page:

Below are the results of attempting to parse this document with an SGML parser.

  No errors found!

Congratulations, this document validates as HTML 4.0 Transitional!

At that time the page was declared as HTML 4.0 Transitional. However, only one minor change was needed to make it conform to the HTML 4.01 Strict DOCTYPE, so it is now declared as Strict, and the validator reports as follows:

The document located at <http://bip.cnrs-mrs.fr/bip10/scowl.htm> was checked and found to be valid HTML 4.01 Strict. This means that the resource in question identified itself as "HTML 4.01 Strict" and that we successfully performed a formal validation using an SGML or XML Parser (depending on the markup language used).

Does this mean that iCab has got it wrong, or that the W3C validator has? In this page I shall examine some examples to get an idea of whether the W3C validator or iCab comes closer in practice to following W3C guidelines.

Note. This page was originally prepared in December 2000, at a time when I was more actively interested in learning what constituted valid HTML than I am now. Indeed, I had completely forgotten about its existence until I noticed some recent accesses in the logfile and wondered what page they referred to. The original tests were done with iCab version β 2.2, probably under MacOS 8.6, but I have now checked whether a recent version of iCab (β300 (3.0.0), July 2005) behaves in the same way under MacOS 10.4. Curiously, although it is still true that iCab scowls at this page, it reports only a single error instead of the several that earlier versions found. Details are noted below.

All of the comments referring to version β300 (3.0.0) have been checked with the newest version (β337 (3.0.0) of 8th August 2005), and all still apply.

Page Valet. As well as iCab and the W3C validator, I have checked the revised page with Page Valet. On the two remaining disagreements, it agrees with iCab about one and with the W3C validator about the other, as noted in the appropriate contexts.

Is white space required between attributes?

The recommendations say

Elements may have associated properties, called attributes, which may have values (by default, or set by authors or scripts). Attribute/value pairs appear before the final > of an element's start tag. Any number of (legal) attribute value pairs, separated by spaces, may appear in an element's start tag. They may appear in any order.

(strong emphasis on "separated by spaces" added by me), which means that putting the quotation mark right up against the href in the following example,

<A class="internal "href="http://ir2lcb.cnrs-mrs.fr/~athel/dummy.htm">dummy URL</a>

is an error, and that's what iCab reports, if we put this dummy URL into the code for this page.

Update. The recommendations still say the same as quoted above, but iCab β337 (3.0.0) no longer reports this as an error.
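
The rule quoted above can be approximated mechanically. The following is only a rough sketch in Python, not how iCab or any validator actually works: the function name attributes_run_together is made up for this illustration, and the regular expression simply looks for a quoted attribute value followed directly by a letter. It is not an SGML parser and will be fooled by unusual markup.

```python
import re

# Hypothetical pattern for this illustration: "=" then a quoted value,
# with a letter (the start of the next attribute name) right after the
# closing quote, i.e. no white space in between.
MISSING_SPACE = re.compile(r'''=\s*(?:"[^"]*"|'[^']*')(?=[A-Za-z])''')


def attributes_run_together(start_tag: str) -> bool:
    """Return True if a quoted attribute value is immediately followed by a letter."""
    return MISSING_SPACE.search(start_tag) is not None


# The example from this page, and the same tag with the space restored.
bad = '<A class="internal "href="http://ir2lcb.cnrs-mrs.fr/~athel/dummy.htm">'
good = '<A class="internal" href="http://ir2lcb.cnrs-mrs.fr/~athel/dummy.htm">'
print(attributes_run_together(bad))   # True
print(attributes_run_together(good))  # False
```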

When must an entity end with a semicolon?

The recommendations say

Note. In SGML, it is possible to eliminate the final ; after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ; in all cases to avoid problems with user agents that require this character to be present.

(emphasis in the original), which implies that

<a name="dummy">L'&eacute;te&eacute
dernier </a>

and

<a name="dummy">L'&eacute;te&eacute</a> dernier

are OK, but

L'&eacute;te&eacute dernier

is not: the first missing semicolon is excused by a line break and the second by the start of the </a> tag, but the third is required. The note is a bit vague, however, as it implies the existence of other exceptions without giving any hint of what they might be. Some consider it acceptable to replace the semicolon with an equals sign (something one may find convenient to do in a link to a CGI script), but the note doesn’t make it clear whether it agrees or not. In the following example, therefore,

value&=1

iCab reports an error, but the W3C validator does not.

Update. iCab β337 (3.0.0) no longer reports this as an error.
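
For anyone who prefers to follow the strong suggestion and use the semicolon in all cases, a check is easy to sketch. The Python below is an assumption-laden illustration, not any validator's method: the name find_unterminated_references is invented here, and the pattern reports every reference that lacks its semicolon, including the two cases that the note excuses. It does not look at a bare & followed by something other than a letter or digit (as in value&=1), and it will also flag text such as AT&T.

```python
import re

# "&" plus an optional "#" and a run of letters or digits that is NOT
# followed by ";". The lookahead also excludes letters and digits so the
# pattern cannot match only part of a properly terminated name.
UNTERMINATED = re.compile(r'&#?[A-Za-z0-9]+(?![A-Za-z0-9;])')


def find_unterminated_references(text: str) -> list:
    """Return every character or entity reference in text that lacks a final ';'."""
    return UNTERMINATED.findall(text)


print(find_unterminated_references("L'&eacute;te&eacute dernier"))   # ['&eacute']
print(find_unterminated_references("L'&eacute;te&eacute; dernier"))  # []
```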

Must & be coded as &amp;?

The recommendations say

Authors should use &amp; (ASCII decimal 38) instead of & to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use &amp; in attribute values since character references are allowed within CDATA attribute values.

This is not very strongly expressed: does should imply that it’s an error if you don’t do it? iCab thinks so, as here: &. However, the W3C validator does not.

Update. iCab β337 (3.0.0) no longer reports this as an error.

In discussion on the iCab list, Sander Tekelenburg reported that iCab still objected to unescaped & characters in URLs, giving the page at http://santek.no-ip.org/~st/lom/Newsletter.php as an example. He is right: both iCab β337 (3.0.0) and the W3C validator treat unescaped & in URLs as an error, though neither of them does in ordinary text. The error message from the validator is worth quoting:

Entity references start with an ampersand (&) and end with a semicolon (;). If you want to use a literal ampersand in your document you must encode it as "&amp;" (even inside URLs!)

Note the word must. To my eyes the parenthesis at the end is just emphasizing the point about URLs, with no implication that unescaped & is acceptable in other contexts.
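
When links are generated by a script, the escaping can be left to a library. As a small sketch, Python's standard html.escape function replaces &, < and > (and, with quote=True, the quotation marks) by the corresponding references. The helper name href_attribute and the example.org address are made up for this illustration.

```python
from html import escape


def href_attribute(url: str) -> str:
    """Build an href attribute in which & and quotation marks in the URL are escaped."""
    return 'href="' + escape(url, quote=True) + '"'


# A made-up CGI-style URL with two parameters.
print(href_attribute("http://example.org/cgi?a=1&b=2"))
# href="http://example.org/cgi?a=1&amp;b=2"
```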

Should > be coded as &gt;?

The recommendations say

Authors wishing to put the < character in text should use &lt; (ASCII decimal 60) to avoid possible confusion with the beginning of a tag (start tag open delimiter). Similarly, authors should use &gt; (ASCII decimal 62) in text instead of > to avoid problems with older user agents that incorrectly perceive this as the end of a tag (tag close delimiter) when it appears in quoted attribute values.

This is even less forcefully expressed than with & and &amp;, but it’s still clearly a recommendation that > ought to be coded as &gt;, and iCab reports it (e.g. here: >) as a warning (not as an error), which seems just right. The W3C validator says nothing.

Update. iCab β337 (3.0.0) no longer flags this with a warning.

Do height and width attributes have to be numbers?

The recommendations define the height and width as %Length;, and if you follow the link to find out what this means, you find the following cryptic explanation:

<!ENTITY % Length "CDATA" -- nn for pixels or nn% for percentage length -->

It is not clear to me how ordinary readers of the recommendations are expected to know what this means, but it seems to imply the common-sense idea that these attributes should be numbers of pixels or percentages. However, if you put something that is clearly not a number and makes no sense, such as

<img src="images/icabu.gif" width="Not too wide" height="moderate" alt="iCab scowls">

then iCab does indeed scowl, but the W3C validator seems quite happy.

Update. iCab β337 (3.0.0) no longer reports this as an error.
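
The common-sense reading of the comment in the definition (nn for pixels or nn% for a percentage) is simple to express as a test. The sketch below is my own reading of that comment, not something the DTD itself enforces (it declares the value only as CDATA, which may be why the validator accepts anything); the function name is_length is hypothetical.

```python
import re

# One or more ASCII digits, optionally followed by a percent sign.
LENGTH = re.compile(r'[0-9]+%?')


def is_length(value: str) -> bool:
    """Return True if value looks like 'nn' (pixels) or 'nn%' (percentage)."""
    return LENGTH.fullmatch(value.strip()) is not None


for value in ("120", "50%", "Not too wide", "moderate"):
    print(repr(value), is_length(value))
```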

Can a tag end with />?

Some pages earn a scowl from iCab because they end tags with />. I cannot find anything in the recommendations that says that this is OK in HTML, and it seems to be something that has flowed over from XHTML, which not only allows but in some cases requires tags to end like this. Of course, if a document is written in XHTML it ought not to declare itself as HTML 4.01 in the DOCTYPE: that is probably the source of some of the problems. iCab doesn’t like it (as here) but the W3C validator does not object. However, Page Valet agrees with iCab, with the following helpful comment:

stray slash found in tag; are you confusing SGML and XML (or HTML and XHTML) syntax?

Update. This is now the only error reported by iCab β337 (3.0.0).

Is the attribute bgcolor allowed in the body element?

As originally written this page contained the line

<body bgcolor=white>

This was acceptable when the document was declared as HTML 4.0 Transitional, and is still acceptable to iCab β337 (3.0.0) when the document is declared as HTML 4.01 Strict, but produces the following error message from the W3C validator:

Line 19, column 14: there is no attribute "BGCOLOR"

<body bgcolor=white>

You have used the attribute named above in your document, but the document type you are using does not support that attribute for this element. This error is often caused by incorrect use of the "Strict" document type...

The recommendations list the bgcolor attribute as one that may apply to the body element, but say that it is deprecated. This is the only instance I have found of code that iCab β337 (3.0.0) accepts but the validator treats as an error. Page Valet also reports this as an error.

Summary

In all of the examples illustrated here it seems clear to me that when the W3C validator and the older version of iCab disagreed, with iCab reporting an error or a warning for code that the validator accepted, a case could always be made that iCab was correct and understood the recommendations better than the validator did. The example given immediately above is the only one I have found that goes the other way.

Update. Changes to iCab over the nearly five years since this page was first written have brought it much closer in line with what the W3C validator reports. However, I am far from convinced that this is a good thing. The problem probably results from the fact that the official definition of HTML resides in the coding built into SGML parsers, the recommendations being just a non-normative attempt to express the definition in natural language. In other words, when the recommendations say something different from what the parser requires it is the recommendations that are wrong, not the parser. That is all very well, but HTML was designed to be written by people, not by machines.