This section details common errors in HTML composition that may lead
to documents which are not fully device-independent. The behavior of
these errors is undefined, so some browsers may render them as
intended, but not all browsers are guaranteed to do
so. These mistakes should therefore be avoided, even if
your browser of choice renders your documents correctly.
This is probably the most prevalent kind of error, and is the number
one culprit in cases of ugly HTML rendering. If you fix nothing else,
fix these!
Perhaps the biggest misconception about the <P> element is that
it signals an end-of-paragraph, rather than a paragraph break. According
to the specification, "<P> is used between two pieces of
text which otherwise would be flowed together".
In most cases this is not important -- functionally, the <P>
serves as an end-of-paragraph marker. However, in certain contexts,
use of <P> should be avoided, such as directly before or after
any other element which already implies a paragraph break. To wit,
the <P> element should not be placed either before or after the
headings, HR (can I get a ruling on this? people don't handle HR
consistently... X Mosaic has no white space before or after, and Lynx
appears to put white space after), ADDRESS, BLOCKQUOTE, or PRE.
It should also not be placed immediately before or after a list
element of any stripe. That is, a <P> should not be used to mark the
end-of-text for <LI>, <DT> or <DD>. These elements
already imply paragraph breaks.
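
To make the rules above concrete, here is a small sketch under the paragraph-break reading of <P> described in this section. The heading, sentences, and list items are invented for illustration only.

```html
<!-- Avoid: <P> next to elements that already imply a paragraph break -->
<H2>Gardening Notes</H2>
<P>
Tomatoes need full sun.
<P>
<UL>
<LI>Water daily.<P>
<LI>Stake early.<P>
</UL>

<!-- Prefer: <P> only between two pieces of text that would otherwise flow together -->
<H2>Gardening Notes</H2>
Tomatoes need full sun.
<P>
Peppers prefer warm soil.
<UL>
<LI>Water daily.
<LI>Stake early.
</UL>
```

In the second version the heading and the list supply their own breaks, so the only <P> left is the one separating the two sentences.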
Some clarifications on the above might be in order. One concerns
the difficulty browsers have in rendering appropriate white space.
While it is true that all of the elements mentioned above imply a
paragraph break, this only occasionally means that they also imply
white space between sections -- this depends on the browser. So,
while you might feel inclined to add a <P> in order to fix white
space problems, please think twice and avoid it if you can.
Also, when using the glossary list (DL),
please try to avoid using multiple DD's (definitions of terms) in
order to provide multiple entries for a term (DT). Instead, use a
<P> marker between paragraphs in a definition. The use of a DD
(definition) without a matching DT (term) is illegal, although a DT
without a DD can be used without dire consequences.
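
A minimal sketch of the glossary advice above might look like this. The terms and definitions are made up for the example; the point is only where the <P> and the DD elements go.

```html
<!-- Avoid: a second DD used as a second paragraph of the same definition -->
<DL>
<DT>Lynx
<DD>A text-mode browser.
<DD>It renders documents on character terminals.
</DL>

<!-- Prefer: one DD per term, with <P> between paragraphs of the definition -->
<DL>
<DT>Lynx
<DD>A text-mode browser.
<P>
It renders documents on character terminals.
<DT>X Mosaic
<DD>A graphical browser for the X Window System.
</DL>
```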
All clear now?
Simply put, a character reference and an entity reference are ways to
represent information that might otherwise be interpreted as a markup
tag. For instance, in order to represent <P> in this text, I had to
use &lt;P&gt; in my raw HTML. There are currently five entities for
this purpose in HTML, as well as several entities which allow encoding
of the ISO Latin-1 Character Set.
The most common error in the use of references is to leave off the
trailing semicolon. Also, no additional spaces are needed before or
after the entity/character reference.
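
The two mistakes just described can be sketched side by side. The sample sentences are invented; &lt;, &gt;, &amp; and &eacute; are standard entity references, and &#60; and &#62; are the character references for the same angle brackets.

```html
<!-- Avoid: no trailing semicolon, and padding spaces around the reference -->
Use the &ltP&gt element at the caf &eacute; run by AT &amp; T.

<!-- Prefer: every reference ends in a semicolon, with no extra spaces -->
Use the &lt;P&gt; element at the caf&eacute; run by AT&amp;T.

<!-- The same angle brackets written as character references -->
Use the &#60;P&#62; element.
```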
Another misunderstood aspect of HTML is the composition of
URLs.
One grey area involves references to directories. It is
possible to request an index of a directory from an HTTP server. The
typical response from the server is to either return a pregenerated
index document (which is often the document "index.html" in the
referenced directory), or to construct an HTML document on the fly
which contains a listing of all files in the directory. However, when
making such a directory reference, it is important to make sure to
have a trailing slash on the URL. That is, if you were to
request the index of the directory which this document resides in, you
would want to refer to it as
http://www.willamette.edu/html-composition/, not as
http://www.willamette.edu/html-composition.
Some servers are able to catch these errors, and provide
redirection to the proper URL, but it's best to get the URL right in
the first place -- notably because not all browsers support
transparent redirection.
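
As a quick illustration of the directory case, here is the example URL from above placed in an anchor both ways. The link text is a placeholder of my own.

```html
<!-- Avoid: directory reference without the trailing slash -->
<A HREF="http://www.willamette.edu/html-composition">Composition guide</A>

<!-- Prefer: the trailing slash marks the reference as a directory -->
<A HREF="http://www.willamette.edu/html-composition/">Composition guide</A>
```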
Problems can arise when the hostnames in URLs aren't fully qualified.
In local networks, you can usually refer to your own machines simply
by their names -- for instance, here at Willamette we refer to our
local WWW server as "www". However, the server's FQDN (fully qualified
domain name) is "www.willamette.edu". The FQDN provides enough
information that any host, anywhere on the Internet, can find this
particular machine. (It's like trying to find all the Vermeers in New
York :).
What happens is that an HTML author might construct a link that looks like
this:
<A HREF="http://www/~jtilton/metanoia/">Metanoia -- A
Change In Spirit</A>
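
For comparison, here is a sketch of the same link written with the fully qualified hostname given earlier. It assumes the path on the server stays the same; only the hostname changes.

```html
<!-- Fully qualified: resolvable from any host on the Internet -->
<A HREF="http://www.willamette.edu/~jtilton/metanoia/">Metanoia -- A
Change In Spirit</A>
```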