There was once a time when HTML was king. Browser manufacturers moved hastily to shove it full of features as quickly as web developers demanded them. Unfortunately, these features often fell outside the original purview of HTML, and in many cases they were carried out in proprietary ways. Beyond the well-known problems of interoperability among browsers that bedeviled web pages for many years, the pumping up of HTML seduced web developers into relying on it as more than just a way to describe what a page contained. They began to use it for how parts of a page should look and behave.
In large web applications, the use of HTML for such commingled responsibilities creates a tangled mess that prevents you from being nimble in structuring your site or guiding visitors through it. Conceptually, this is because doing layout in HTML obscures a page’s information architecture, a model or concept of data that makes the data more readily understandable and digestible in a variety of contexts. When the information architecture of a large web application is not clear, it adversely affects reusability, maintainability, and reliability. Well-constructed HTML does not obscure information architecture, but instead reflects it.
Tenet 3: Large-scale HTML is semantic, devoid of presentation elements other than those inherent in the information architecture, and pluggable into a wide variety of contexts in the form of easily identifiable sections.
This chapter addresses Tenet 3, restated here from the complete list of tenets for developing large web applications provided in Chapter 1. This chapter begins by looking at the HTML for a simple module and presenting alternative examples that reflect its information architecture from worst to best. Next, we’ll examine a detailed list of reasons why a good information architecture for a module is important, and expound on a set of tags to avoid along with a set of tags with good semantic value (some of which don’t see widespread use yet). Finally, we’ll look at how the rigor of XHTML (Extensible Hypertext Markup Language) is beneficial to information architecture, and explore RDFa (Resource Description Framework with Attributes) for adding further meaning to our markup. We’ll conclude with a bit about HTML 5, the latest version of HTML; however, HTML 5 is still in the working draft stage as this book goes to print, so it’s not yet supported across the major browsers.
When assembling a web page, you should make choices that increase the capability of components and portions of the page to be repurposed and reused in as wide a range of scenarios as possible. Even if you don’t plan to reuse pieces, assembling the page from smaller individual components will make it more reliable and easier to maintain. A component can be anything from a small control (e.g., a paginator) to an entire section of a page (e.g., a list of search results).
To determine the potential components on a page, you need to deconstruct the page into a reasonable set of modules. Chapter 7 presents a more unified concept of a module as an entity that includes everything you need to make a component independently functioning and cohesive. For now, let’s look at one module as an example (see Figure 3-1) and focus on how to make its HTML a good reflection of its information architecture.
Example 3-1 presents an ill-fated but not altogether uncommon attempt to create an information architecture for the New Car Reviews module. The problem in this example is that it uses HTML markup more to define the layout than to reveal the information architecture of the module.
<table>
  <thead>
    <tr>
      <th>
        <big>New Car Reviews</big>
      </th>
      <th>
        <small>
          <a href="http://...">The Car Connection</a>
        </small>
      </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td colspan="2">
        <p>
          <b>2009 Honda Accord</b>
          <i>(from $21,905)</i>.
        </p>
        <a href="http://.../reviews/00001/">Read the review</a>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <p>
          <b>2009 Toyota Prius</b>
          <i>(from $22,000)</i>.
        </p>
        <a href="http://.../reviews/00002/">Read the review</a>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <p>
          <b>2009 Nissan Altima</b>
          <i>(from $19,900)</i>.
        </p>
        <a href="http://.../reviews/00003/">Read the review</a>
      </td>
    </tr>
    <tr>
      <td colspan="2">
        <form method="post" action="http://.../email/">
          <p>
            Get our most recent reviews each month:
          </p>
          <small>Email</small><br />
          <input type="text" name="nwcreveml" value="" /><br />
          <p>
            <input type="submit" name="nwcrevsub" value="Sign Up" />
          </p>
        </form>
      </td>
    </tr>
  </tbody>
</table>
First, everything has been placed within a table to achieve a presentation in which some elements appear side by side while others appear above or below each other. In fact, the developer has gone to a lot of trouble to create a two-column table simply to left-justify “New Car Reviews” while right-justifying “The Car Connection” on the top line; all the other tr and td elements are redundant. Naturally, the two th columns in the header have nothing to do with the “rows” underneath them; the table is being used simply for visual effect.
From a standpoint of information architecture, tables are for tabular data; we’ll look at better ways to achieve a side-by-side presentation in Chapter 4. The limitations of misusing table elements will bite you if the page has to be used for new purposes, such as for display on a mobile device or providing information in a web service.
Another problem is the heavy use of the purely presentational elements b, i, big, and small. These describe how we want things to appear, not what information the elements enclose. Markup like this is also often accompanied by HTML attributes such as width and border that also serve only a presentational purpose (although I have avoided showing them here). The HTML standard has accumulated an astounding number of such attributes, but they have no place in large-scale HTML.
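Because these elements and attributes describe appearance rather than content, a program can find them mechanically. The following Python sketch does this with the standard library's html.parser module.

- The function name lint is hypothetical.
- The tag set is limited to the elements named above.
- The attribute set is the two named above plus bgcolor.
- It is an illustration, not a complete list of presentational markup.

```python
from html.parser import HTMLParser

# Illustrative subsets only: the elements named in the text, plus a few
# presentational attributes (bgcolor is an assumed addition).
PRESENTATIONAL_TAGS = {"b", "i", "big", "small"}
PRESENTATIONAL_ATTRS = {"width", "border", "bgcolor"}


class PresentationLinter(HTMLParser):
    """Collects (line, description) pairs for presentational markup."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        line, _offset = self.getpos()
        if tag in PRESENTATIONAL_TAGS:
            self.problems.append((line, f"<{tag}> element"))
        for name, _value in attrs:
            if name in PRESENTATIONAL_ATTRS:
                self.problems.append((line, f"{name} attribute on <{tag}>"))


def lint(markup):
    """Returns every presentational element or attribute found in markup."""
    linter = PresentationLinter()
    linter.feed(markup)
    linter.close()
    return linter.problems


sample = ('<table border="1"><tr><td><b>2009 Honda Accord</b> '
          '<i>(from $21,905)</i></td></tr></table>')
for line, problem in lint(sample):
    print(f"line {line}: {problem}")
```

Run on the semantic version of the same content (strong and em in a p), the linter reports nothing.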
Example 3-2 presents a better attempt to reflect the information architecture for the New Car Reviews module.
<div id="nwcrev">
  <span class="head">New Car Reviews</span>
  <span class="name">
    <a href="http://...">The Car Connection</a>
  </span>
  <div class="list">
    <div class="item">
      <p>
        <strong>2009 Honda Accord</strong>
        <em>(from $21,905)</em>.
      </p>
      <a href="http://.../reviews/00001/">Read the review</a>
    </div>
    <div class="item">
      <p>
        <strong>2009 Toyota Prius</strong>
        <em>(from $22,000)</em>.
      </p>
      <a href="http://.../reviews/00002/">Read the review</a>
    </div>
    <div class="item">
      <p>
        <strong>2009 Nissan Altima</strong>
        <em>(from $19,900)</em>.
      </p>
      <a href="http://.../reviews/00003/">Read the review</a>
    </div>
  </div>
  <form method="post" action="http://.../email/">
    <p>
      Get our most recent reviews each month:
    </p>
    <span class="label">Email</span>
    <input type="text" id="nwcreveml" name="nwcreveml" value="" />
    <p class="action">
      <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
    </p>
  </form>
</div>
First, we have replaced the various table-related elements with div and span elements. These immediately reveal a more systematic hierarchy that reflects the true relationship between the elements. In addition, we have added IDs and classes with good, semantic names that tell us a bit more about what the elements enclose. These also provide the hooks that we will require to achieve the desired presentation using CSS.

Finally, we have changed the purely presentational elements b and i to strong and em. The meanings of “strong” and “emphasis” avoid specific connotations of presentation, whereas “boldface” and “italic” do not. For example, suppose we later decide to present the strong element in red instead of boldface. We could do it in just one place using CSS, without touching the HTML at all, and the markup would still describe its content accurately. A b element styled in red would not.
In the previous section, Example 3-2 leverages the power of CSS to overload generic div and span elements. But div and span don’t indicate that we’re delivering a heading followed by a list, even though HTML has elements that express exactly that. Example 3-3, therefore, presents the best example of how we can use HTML to reflect the information architecture of the New Car Reviews module.
<div id="nwcrev">
  <h3>New Car Reviews</h3>
  <cite>
    <a href="http://...">The Car Connection</a>
  </cite>
  <ul>
    <li class="beg">
      <p>
        <strong>2009 Honda Accord</strong>
        <em>(from $21,905)</em>.
      </p>
      <a href="http://.../reviews/00001/">Read the review</a>
    </li>
    <li class="mid">
      <p>
        <strong>2009 Toyota Prius</strong>
        <em>(from $22,000)</em>.
      </p>
      <a href="http://.../reviews/00002/">Read the review</a>
    </li>
    <li class="end">
      <p>
        <strong>2009 Nissan Altima</strong>
        <em>(from $19,900)</em>.
      </p>
      <a href="http://.../reviews/00003/">Read the review</a>
    </li>
  </ul>
  <form method="post" action="http://.../email/">
    <p>
      Get our most recent reviews each month:
    </p>
    <label for="nwcreveml">Email</label>
    <input type="text" id="nwcreveml" name="nwcreveml" value="" />
    <p class="action">
      <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
    </p>
  </form>
</div>
First, we have replaced the span elements at the top of the module with an h3 element for the header and a cite element for the name of the provider of the reviews. These elements carry more meaning about your information than simply calling the elements “spans.”

In addition, we have used a ul element and multiple li elements to construct the list of reviews, since these also are a better reflection of what the elements really contain.

Finally, we have used a label element for the label associated with the email address input. The label element in HTML is often overlooked, but it is an important element that conveys valuable information and enables useful features in many browsers; for example, clicking a label moves focus to its input. Its for attribute specifies the ID of the element to which the label pertains, in this case the input element for the email address.
We have also added IDs and classes to several of the elements to add more meaning about their roles and to provide the hooks that we may require to achieve a desired presentation using CSS. For example, we have added beg, mid, and end to the list items based on their places in the list. The hooks are also useful in JavaScript.
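One way to see the payoff of Example 3-3 is to consume the module with a program instead of a browser. The Python sketch below uses the standard html.parser module and recovers each review's title and link using only the ul, li, strong, and a elements. It ignores the beg, mid, and end classes entirely, since those are presentation hooks.

A few assumptions:

- The class name ReviewExtractor is hypothetical.
- The markup is an abbreviated copy of Example 3-3, with the form and one review omitted.

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Pulls (title, review URL) pairs from the module's list.

    Relies only on semantic structure: each li holds one review, its
    strong element names the car, and its a element links to the review.
    """

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_li = False
        self._in_strong = False
        self._title = ""
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li, self._title, self._href = True, "", None
        elif tag == "strong" and self._in_li:
            self._in_strong = True
        elif tag == "a" and self._in_li:
            # Links outside list items (such as the one in cite) are skipped.
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False
        elif tag == "li" and self._in_li:
            self.reviews.append((self._title.strip(), self._href))
            self._in_li = False

    def handle_data(self, data):
        if self._in_strong:
            self._title += data


MODULE = """
<div id="nwcrev">
  <h3>New Car Reviews</h3>
  <cite><a href="http://...">The Car Connection</a></cite>
  <ul>
    <li class="beg"><p><strong>2009 Honda Accord</strong> <em>(from $21,905)</em>.</p>
      <a href="http://.../reviews/00001/">Read the review</a></li>
    <li class="mid"><p><strong>2009 Toyota Prius</strong> <em>(from $22,000)</em>.</p>
      <a href="http://.../reviews/00002/">Read the review</a></li>
  </ul>
</div>
"""

extractor = ReviewExtractor()
extractor.feed(MODULE)
extractor.close()
print(extractor.reviews)
```

The same extraction against Example 3-1 would have to guess which table cells mean what. That is the kind of limitation that surfaces when a page must feed a web service or a mobile view.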
One way to test how well we have reflected the information architecture for a module is to observe how a browser renders it, with none of our styles applied. Figure 3-2 shows how the New Car Reviews module of Example 3-3 is rendered by most browsers by default. As you can see, the default appearance reveals the information architecture of the module rather well.
The original developer of Example 3-1 may have rejected the use of a ul list because she didn’t think the bullet-list layout style was appropriate. But for several years, all major browsers have supported CSS powerful enough to make a list look almost any way you want.
Example 3-4 presents the CSS that makes the New Car Reviews module look like Figure 3-1. It assumes you have first applied the browser reset and font normalization CSS presented at the end of Chapter 4.
#nwcrev {
  position: relative;
  width: 280px;
  padding: 9px;
  border: 1px #333 solid;
}
#nwcrev h3 {
  position: absolute;
  font: normal 123.1% arial;
}
#nwcrev cite {
  display: block;
  width: 280px;
  margin-top: 3px;
  font: normal 85% verdana;
  text-align: right;
}
#nwcrev cite a {
  color: #999;
}
#nwcrev ul {
  margin: 10px 0;
}
#nwcrev li.beg {
  margin-bottom: 8px;
}
#nwcrev li.mid {
  margin-bottom: 8px;
}
#nwcrev li p {
  text-align: left;
}
#nwcrev li em {
  color: #999;
  font-style: italic;
}
#nwcrev li a {
  color: #999;
  font: normal 85% verdana;
}
#nwcrev label {
  display: block;
  margin: 5px 0 2px;
  font: normal 85% verdana;
}
#nwcrev #nwcreveml {
  width: 274px;
}
#nwcrev .action {
  margin-top: 10px;
  text-align: right;
}
A module whose HTML reflects its information architecture is more reusable, maintainable, and reliable because it is more descriptive and honest. The following list, adapted from research at Yahoo! by Nate Koechley et al., describes why well-constructed HTML offers these benefits, and lists a number of others. Well-constructed HTML has the following characteristics:
A module built using well-constructed HTML does not have to be enclosed by certain other structures. It encapsulates everything it needs within a single outer div. As such, the module can be used safely in a variety of contexts.
Many web developers report seeing savings of 70 to 80 percent in page weight. When Microsoft made the transformation away from tables on its home page in 2004, for example, it saw an improvement of over 72 percent. When Yahoo! made this transformation, it found a savings of around 30 percent, since many pages were already optimized. A savings of 50 percent seems to be more common.
A pure reflection of information architecture means less for the browser to download, less to parse, and generally fewer elements to organize in the DOM (Document Object Model), another model of a page’s information architecture.

In addition, when you avoid tables for the primary layout of a page, the page appears to load faster. The browser does not need to delay rendering until the entire table is processed. (You can also make tables render without delay in modern browsers by setting table-layout to fixed via CSS.)
The terseness of well-constructed HTML directly affects the browser’s creation of the page’s DOM. A DOM that is smaller and easier to map to your information architecture helps you access and manipulate the page more easily using JavaScript, which is especially important for Ajax.
Backward compatibility means that you can be reasonably certain that things you build today will work for browsers of yesterday. All browsers support a core subset of HTML fairly consistently. At the very least, a good information architecture that uses these core elements can help your applications work reasonably well in browsers that you haven’t been able to test and fully support, even if the presentation has some inconsistencies. In the worst cases, you can disable parts of the presentation more easily because it is isolated in its own layer.
Clean, concise HTML is more likely to continue working as visitors upgrade their browsers or adopt new ones.
A lighter page weight, the normal outcome of making HTML a pure reflection of the information architecture, reduces bandwidth requirements. In addition, a good information architecture makes it easy to place most JavaScript and CSS in separate files that browsers can cache. When these files are shared across many pages, the bandwidth savings across a large web application can be very significant.
Internationalization touches many parts of an application. By writing clean, concise HTML, you can be confident that the changes needed to support different locales will have fewer and more predictable effects on other parts of your application.
A good information architecture lets you use the media attribute on the link or style elements that bring in your CSS to delineate the types of media to which various presentations apply. The two most common types of media are computer screens (browsers) and print. A common approach is to provide one set of styles for all media types and hide certain presentational aspects (e.g., large graphics and ads) for printed pages.
Search engines are primarily concerned with information architecture because it ought to describe what a page contains. The less you do to obscure your information architecture through the use of HTML for presentation, the more accessible your pages are to various search engines, which can find the key elements that your potential readers search for.
Accessibility describes the extent to which any visitor can use your website, particularly visitors with difficulty seeing or scrolling around. Assistive devices, such as screen readers, rely on good information architecture to communicate effectively about what a page contains, especially as visitors navigate a page.
A good information architecture provides more options to layer presentation across many elements in a consistent way. CSS selectors and stylesheets, which require a good information architecture to be effective, help you apply certain presentational elements to entire sets of elements as easily as to individual ones. This promotes visual consistency across a large web application.
Presentation was always a hack in HTML; CSS is the modern and powerful means of providing presentation in browsers. When you use HTML to try to do more than just reflect information architecture, you risk inconsistency in support and absent features within the major browsers, as well as information architecture and presentation that are difficult to untangle when you want to reuse modules in different contexts. In short, you lose visual control.
Better efficiency means that you can produce redesigns faster, more cheaply, and with less bug fixing. Clean, concise HTML helps you predict the ways you’ll have to change your modules as you redesign your application’s information architecture for new purposes and content.
A good information architecture degrades gracefully and is accessible to more users operating in more environments. Even today, there are still plenty of people around the world accessing sites from old browsers. With a good information architecture in place, upon which you can layer other capabilities, your site has more reach, even if in a slightly degraded form. When your HTML tries to do too much, you risk your site appearing completely broken and unreadable.
Taken together, the characteristics of well-constructed HTML presented in this section make large web applications easier to use, easier to maintain, and more accessible to a wide variety of visitors. This gives you an advantage in the market.
Although all web developers are thoroughly familiar with HTML, many of us can benefit from a simple review of tags that offer sound, semantic descriptions for what they enclose. When we think about large-scale HTML primarily as a means of describing a module’s information architecture, it becomes clear that there are some tags that we should avoid as well.
Table 3-1 presents a list of HTML tags to avoid; these tags are generally either presentational in nature or deprecated. We could present a number of tag attributes as well (e.g., bgcolor, border, etc.), but that list would be very long indeed, given the formatting options that have made their way into HTML over the years. Suffice it to say that if you find yourself considering any attribute with more presentational implications than uses for information architecture, you can be sure there is a better option in CSS. To learn more about any of the tags in Table 3-1, look at the index of detailed descriptions provided by the W3C at http://www.w3.org/TR/html401/index/elements.html.
Tag | Explanation |
| Presentational. Use |
| Deprecated. |
| Presentational. Use CSS instead. |
| Deprecated. |
| Deprecated. |
| Deprecated. |
| Presentational. Use CSS instead. |
| Presentational. Use |
| Deprecated. |
| Deprecated. |
| Deprecated. |
| Presentational. Use CSS instead. |
| Deprecated. Use |
| Presentational. Use CSS instead. |
| Deprecated. |
Table 3-2 presents a list of HTML tags that offer sound, semantic descriptions about what they enclose along with brief explanations of where to apply them. Examples of useful tags that tend to be forgotten include label, cite, dl, dt, and dd.

By making good use of meanings that HTML provides for tags intrinsically, you can take advantage of the markup itself to provide meaning where you may have otherwise needed to use a class name. A rich use of tags also frequently can give you the uniquely identifiable hooks on which to hang your CSS. To learn more about the tags in Table 3-2, look at the index of descriptions provided by the W3C at http://www.w3.org/TR/html401/index/elements.html.
Tag | Explanation |
| Anchor for linking to another page or a point within a page. |
| Abbreviations of any type. If an
acronym, use the |
| Abbreviations that are acronyms (i.e., pronounceable words). |
| Address information about the document author or company. |
| Client-side image map area. |
| Document base URI (Uniform Resource Identifier). Relative URIs are resolved from this point. |
| Long quotation. |
| Document body. |
| Line break. Avoid for presentation; may be a semantic separator. |
| Button in a form. |
| Caption for a table. |
| Citation or reference to a source. |
| Computer code fragment. |
| Column for a table. Allows attribute specification for columns. |
| Explicit grouping of columns for a table. |
| Definition description within a
|
| Deleted text with respect to another version of the document. |
| Defining instance of a term or phrase. |
| Generic, block-level container. Be more specific if possible. |
| List of definitions constructed using
|
| Definition term within a |
| Emphasis. Rendered by default as italics, but do not treat as such. |
| Group of form fields. Give a title
with a |
| Form for data entered by the user via
|
| Document head section. Use exactly one per document. |
| Document root element. Use exactly one per document. |
| Section headers. Ideally, nest in
order from |
| Inline subwindow containing its own document. |
| Embedded image. |
| Input for a form, which can be of many different types. |
| Inserted text with respect to another version of the document. |
| Literal text to be typed by the user. |
| Label for a form input. Every input
should have one with |
| Gives a title to a |
| List item within an |
| Link conveying relationship information, typically for stylesheets. |
| Client-side image map. |
| Metadata for the document, typically keywords, a description, etc. |
| Alternate content for nonscript rendering. |
| Embedded object that may be rendered by an external application. |
| Ordered list of items constructed
using |
| Group of options within a |
| Selectable option within a |
| Paragraph. |
| Named property value. |
| Preformatted text (via indentation). Can be overridden in CSS. |
| Short quotation. |
| Sample text. |
| Text to be interpreted as a script for dynamic control in a browser. |
| Selection list in a form. |
| Generic, inline container. Often, a more specific tag is better. |
| Strong text. Rendered by default as bold, but do not treat as such. |
| Embedded CSS. |
| Think of this in a semantic way. Avoid using it just to lower text. |
| Think of this in a semantic way. Avoid using it just to raise text. |
| Table for data that is truly tabular in nature. Avoid for layout. |
| Body, or main content, of a table. |
| Table data cell (cells other than those in the header of the table). |
| Text input with multiple lines. |
| Footer for a table. |
| Table header cell. |
| Header for a table. |
| Document title. |
| Row within a table. |
| Unordered list of items constructed
using |
|
You can assign any element in HTML an ID using the id attribute and a class using the class attribute.

Because an ID should be used only once per page, IDs provide a great way to create a unique scope of sorts for working with a module. For instance, refer to the outermost div element with the ID of nwcrev for the New Car Reviews module in Example 3-3. By giving this module an ID, you can focus specific CSS on that module, as shown in Example 3-4, and easily access the module’s elements in the DOM via JavaScript (see Chapter 5).

Classes let you do the same for collections of semantically similar elements all at once, because a class is intended to be used on a page any number of times.
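The once-per-page rule for IDs is easy to break when the same module is placed on a page twice, and it is also easy to check. The following is a minimal Python sketch using the standard html.parser and collections modules; the function name duplicate_ids is hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Records the value of every id attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        self.ids.extend(value for name, value in attrs if name == "id" and value)


def duplicate_ids(markup):
    """Returns the sorted list of id values that appear more than once."""
    collector = IdCollector()
    collector.feed(markup)
    collector.close()
    counts = Counter(collector.ids)
    return sorted(value for value, count in counts.items() if count > 1)


# Pasting the same module onto a page twice breaks ID uniqueness.
module = ('<div id="nwcrev"><input type="text" id="nwcreveml" '
          'name="nwcreveml" value="" /></div>')
print(duplicate_ids(module))           # []
print(duplicate_ids(module + module))  # ['nwcrev', 'nwcreveml']
```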
Don’t confuse the id attribute with the name attribute. In various form inputs, the name attribute lets you give names to input values; these names are passed along with the values for scripts on the server side.
There are a lot of opinions about naming conventions for IDs, classes, and names, but everyone can agree that establishing some sort of convention is important. In large-scale HTML, a good naming convention is key to modularity.

One convention, demonstrated earlier in Example 3-3, is to use short groups of three to six characters for naming (e.g., nwcrev is the ID for the New Car Reviews module). From here, you can append other name segments of three or four characters to create further qualified names for use deeper within the module (e.g., nwcreveml for the id and name attributes of the email address text field).
Using fully qualified names like this promotes modularity because you can be assured that anywhere you use this module, its names will not conflict with those used by other modules. For example, if you were to place the New Car Reviews module on a page with another module that also contained a similar form input field for an email address, this naming convention would ensure that the inputs of the two modules would be passed to the server-side script with different names.
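The qualification scheme amounts to prefixing every segment with the module's ID. The Python sketch below shows that composition and a check that two modules share no names.

- The helper qualified_names is hypothetical.
- usdcrev is an invented ID for a second module with its own email form.
- Only nwcrev, eml, and sub come from the New Car Reviews example.

```python
def qualified_names(module_id, segments):
    """Prefixes each short segment with the module's ID, as in nwcrev + eml."""
    return {module_id + segment for segment in segments}


# "nwcrev", "eml", and "sub" come from the New Car Reviews module;
# "usdcrev" is a hypothetical ID for a second module on the same page.
new_car = qualified_names("nwcrev", ["eml", "sub"])
used_car = qualified_names("usdcrev", ["eml", "sub"])

print(sorted(new_car))       # ['nwcreveml', 'nwcrevsub']
print(new_car & used_car)    # set(): the modules share no names

# Without qualification, both modules would use the very same names.
print({"eml", "sub"} & {"eml", "sub"})
```

Plain concatenation could in principle produce the same string from two different splits. This sketch therefore assumes module IDs are chosen so that no ID plus segment spells another module's name.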
Because using short, augmentable name segments is compact and works well, it’s the convention that we employ throughout this book. That said, the exact convention is not what is important here; whatever conventions you prefer, establishing a system of unique qualification that ensures modularity is the key.
For quite some time, HTML has implied HTML 4.01, but browsers have been very forgiving of code that did not conform precisely to this specification. In fact, many egregious transgressions are politely rendered by the browsers in a reasonably elegant way. That said, this forgiving attitude by browsers has been a double-edged sword. On the one hand, it plays an essential role in ensuring that older documents can survive on the Web with little or no modification. On the other hand, it gives web developers a lot of room to be sloppy. XHTML establishes a more rigorous definition of HTML that formally helps web developers alleviate some of this sloppiness.
XHTML 1.0, a W3C Recommendation, is a reformulation of HTML 4.01 in XML 1.0. This reformulation provides additional rigor and formality that earlier versions of HTML were never intended to have.

Because XHTML conforms to XML, it offers web developers several benefits. First and foremost, XHTML’s strictness results in cleaner, more consistent code that promotes better maintainability and reliability. Next, XHTML is readily viewed, edited, and validated with standard XML tools. In addition, XHTML can utilize applications that rely upon either the HTML DOM or the XML DOM. Finally, XHTML is more likely to interoperate within various XHTML environments in the future should XHTML continue to advance.

Since XHTML can be written to operate in older browsers as well as in XHTML-conforming browsers, there are few reasons not to start writing HTML using this higher standard.
Fortunately, it is relatively easy to make the HTML that we write conform to the higher standards of XHTML. The examples of HTML in this chapter, as well as in the rest of the book, are actually XHTML, for the most part. Most HTML is compatible with XHTML, but there are a few guidelines that you need to follow to ensure your code conforms to XHTML while continuing to render properly in older and XHTML-conforming browsers alike. A list of these guidelines is presented below.
In XHTML, tags must be nested in such a way that tags are closed in the exact reverse order that they were opened. For example, Example 3-3 contains the following, where the tags are properly nested:
<!-- Yes, XHTML --> <strong>2009 Nissan Altima</strong> <em>(from $19,900)</em>.
Consider, in contrast, the following example, where the strong tag is closed before the em tag. This does not conform to XHTML:
<!-- Not XHTML! --> <strong>2009 Nissan Altima<em> (from $19,900)</strong></em>.
In XHTML, every tag must have a corresponding end tag. In HTML, web developers frequently leave off closing tags for elements such as list items and paragraphs because browsers can infer where these tags should be closed. In XHTML, you must provide the end tags explicitly. Example 3-3 includes the following text, where we have correctly closed all list items:
<!-- Yes, XHTML -->
<li class="mid">
  <p>
    <strong>2009 Toyota Prius</strong>
    <em>(from $22,000)</em>.
  </p>
  <a href="http://.../reviews/00002/">Read the review</a>
</li>
<li class="end">
  <p>
    <strong>2009 Nissan Altima</strong>
    <em>(from $19,900)</em>.
  </p>
  <a href="http://.../reviews/00003/">Read the review</a>
</li>
Contrast that with the following example, where there are no end tags for the list items. This does not conform to XHTML:
<!-- Not XHTML! -->
<li class="mid">
  <p>
    <strong>2009 Toyota Prius</strong>
    <em>(from $22,000)</em>.
  </p>
  <a href="http://.../reviews/00002/">Read the review</a>
<li class="end">
  <p>
    <strong>2009 Nissan Altima</strong>
    <em>(from $19,900)</em>.
  </p>
  <a href="http://.../reviews/00003/">Read the review</a>
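Proper nesting and explicit end tags are both XML well-formedness rules, so any XML parser can enforce them. The sketch below uses Python's standard xml.etree.ElementTree module.

- Each fragment is wrapped in a placeholder element named root, an arbitrary choice, because XML requires a single root element.
- The list items are shortened from the examples above.
- Passing this check shows only that markup is well-formed XML, not that it is valid XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Returns True if the fragment parses as XML.

    The fragment is wrapped in a placeholder root element because XML
    requires exactly one root, while these snippets have several.
    """
    try:
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


nested = "<strong>2009 Nissan Altima</strong> <em>(from $19,900)</em>."
misnested = "<strong>2009 Nissan Altima<em> (from $19,900)</strong></em>."
closed = "<li><p>2009 Toyota Prius</p></li><li><p>2009 Nissan Altima</p></li>"
unclosed = "<li><p>2009 Toyota Prius</p><li><p>2009 Nissan Altima</p>"

for label, fragment in [("nested", nested), ("misnested", misnested),
                        ("closed", closed), ("unclosed", unclosed)]:
    print(label, is_well_formed(fragment))
```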
The requirement for every tag to have a corresponding end tag can make tags like br rather tedious to use; you would need to use <br></br> in XHTML wherever you had been using <br> in HTML.
Fortunately, there is a shorthand for tags that enclose no content: include a forward slash before the closing bracket. Although XHTML allows a construct such as <br/> to accomplish this, it is advisable to put a space between the tag name and the forward slash, like <br />, to protect against compatibility problems in HTML browsers.
In Example 3-3, we use an input tag (which always has no content), correctly terminated with a space and a forward slash:
<!-- Yes, XHTML --> <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
Contrast this with the following example, where there is no forward slash before the closing bracket. This does not conform to XHTML:
<!-- Not XHTML! --> <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up">
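An XML parser makes the difference between these two input tags concrete. The following Python sketch uses the standard xml.etree.ElementTree module and shows three things:

- The self-closing form parses as a complete, childless element.
- The form without the slash is rejected because the parser is still waiting for an end tag.
- <br /> and <br></br> produce the same empty element.

```python
import xml.etree.ElementTree as ET

with_slash = '<input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />'
without_slash = '<input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up">'

# The self-closing form parses as a complete element with no children.
element = ET.fromstring(with_slash)
print(element.tag, element.get("value"), len(element))  # input Sign Up 0

# Without the slash, the input reaches its end while </input> is still missing.
try:
    ET.fromstring(without_slash)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True

# To an XML parser, <br /> and <br></br> are the same empty element.
short_form, long_form = ET.fromstring("<br />"), ET.fromstring("<br></br>")
print(short_form.tag == long_form.tag, short_form.text, long_form.text)  # True None None
```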
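The shorthand notation is legal XML for any element with empty content; to an XML parser, <p /> and <p></p> mean the same thing. For compatibility with HTML browsers, however, reserve the shorthand for elements that never have content, and write an empty paragraph as <p></p>. An HTML browser ignores the slash in <p /> and treats it as an unclosed start tag. The following HTML tags appear in Table 3-2 and never have content: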
<area />
<base />
<br />
<col />
<img />
<input />
<link />
<meta />
<param />
In XHTML, every tag and tag attribute is case-sensitive and defined in lowercase. In Example 3-3, we have the following for the label tag, where we see lowercase for the tag and its for attribute:
<!-- Yes, XHTML --> <label for="nwcreveml">Email</label>
In contrast, the following example puts the tag and its attribute in uppercase. This does not conform to XHTML:
<!-- Not XHTML! --> <LABEL FOR="nwcreveml">Email</LABEL>
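Note that the uppercase version above is still well-formed XML: XML is case-sensitive, so <LABEL> is simply a different element from XHTML's label. Catching it therefore takes a separate check. The following is a minimal Python sketch with xml.etree.ElementTree; the helper name uppercase_names is hypothetical.

```python
import xml.etree.ElementTree as ET


def uppercase_names(fragment):
    """Lists element and attribute names that are not all lowercase.

    The fragment must already be well-formed XML; this check only flags
    names that cannot be XHTML because XHTML defines them in lowercase.
    """
    found = []
    for element in ET.fromstring(fragment).iter():  # includes the root itself
        if element.tag != element.tag.lower():
            found.append(element.tag)
        found.extend(name for name in element.attrib if name != name.lower())
    return found


print(uppercase_names('<label for="nwcreveml">Email</label>'))  # []
print(uppercase_names('<LABEL FOR="nwcreveml">Email</LABEL>'))  # ['LABEL', 'FOR']
```

Mixing cases within one element, as in <LABEL>Email</label>, fails even earlier. The parser sees mismatched start and end tags and rejects it as not well-formed.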
In XHTML, all attribute values must be quoted. XML permits either double or single quotes, but double quotes are the usual convention. In Example 3-3, we have the following, where we used double quotes:
<!-- Yes, XHTML --> <input type="text" id="nwcreveml" name="nwcreveml" value="" />
In the following example, most of the attribute values omit the quotes altogether, which does not conform to XHTML. (The empty value in apostrophes, or single quotes, is legal XML, but mixing quote styles within a tag is inconsistent.)
<!-- Not XHTML! --> <input type=text id=nwcreveml name=nwcreveml value='' />
Furthermore, you must specify an explicit value for every attribute that you use. This means that attributes often written without values in HTML (so-called minimized attributes) must be assigned something in XHTML, even though this may feel pedantic. Set each such attribute’s value to the attribute’s own name (e.g., checked="checked").
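Both attribute rules, quoted values and explicit values, can be checked with an XML parser. The following Python sketch uses xml.etree.ElementTree; the helper name parses is hypothetical.

Note that the parser accepts single-quoted values, since XML permits them. In the earlier non-conforming input tag, it is the unquoted values, not the apostrophes, that make it fail.

```python
import xml.etree.ElementTree as ET


def parses(fragment):
    """True if an XML parser accepts the fragment as well-formed."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(parses('<input type="text" id="nwcreveml" name="nwcreveml" value="" />'))  # True
print(parses("<input type=text id=nwcreveml name=nwcreveml value='' />"))        # False: unquoted values
print(parses("<input type='text' />"))                                           # True: XML allows single quotes
print(parses('<input type="checkbox" checked />'))                               # False: minimized attribute
print(parses('<input type="checkbox" checked="checked" />'))                     # True
```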
JavaScript, CSS, and the special characters that these may contain require some special treatment in XHTML. Whereas in HTML you can wrap sections of embedded JavaScript and CSS between <!-- and -->, XML browsers may ignore sections wrapped this way. On the other hand, if you place these sections in a CDATA block, HTML browsers will ignore the CDATA contents.

The ideal solution is to link JavaScript and CSS via external files, which is a good practice anyway. However, there may be times when you cannot do this entirely. In these cases, your document will not conform to XHTML.
XHTML is also sensitive to certain special characters. In XHTML, you need to replace less-than signs, greater-than signs, and ampersands wherever they appear in text nodes, JavaScript, and CSS with their character entities (&lt;, &gt;, and &amp;, respectively) or their numeric equivalents.
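Python's standard library can perform the replacement for you. This sketch uses xml.sax.saxutils.escape, which handles exactly these three characters. It then uses xml.etree.ElementTree to confirm that both the named entities and their numeric equivalents parse back to the original text. The sample string is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "Prices < $25,000 & ratings > 4 stars"  # hypothetical text node
escaped = escape(text)  # replaces &, <, and > with &amp;, &lt;, and &gt;
print(escaped)  # Prices &lt; $25,000 &amp; ratings &gt; 4 stars

# Numeric character references are equivalent: &#60; is "<", &#62; is ">", &#38; is "&".
numeric = "Prices &#60; $25,000 &#38; ratings &#62; 4 stars"

# Both forms parse back to the original characters.
print(ET.fromstring("<p>" + escaped + "</p>").text == text)  # True
print(ET.fromstring("<p>" + numeric + "</p>").text == text)  # True
```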
As a result of these issues, many developers set their document types to the HTML 4.01 Strict DTD, even if coding to take advantage of XHTML’s benefits. This lets you continue to validate your document using HTML validators while coding to the higher XHTML standard, albeit with a few compromises for now.
Even when you have created a good information architecture for a module in HTML, there is only so much meaning that you can communicate in a standard way using the small collection of elements that HTML provides. RDFa (Resource Description Framework with Attributes) is an emerging technology for extending your HTML to provide additional meaning. It has special significance for the Semantic Web. The Semantic Web is an evolving extension of the World Wide Web in which web developers define the semantics of information and services so that the Web can understand and satisfy requests for content made by people and machines.
A key characteristic of RDFa is that it defines a standard way for web developers to annotate information further within pages that have been built for visual consumption. In this sense, RDFa attempts to unify the “human Web” (the one we see published as web pages) and the “data Web” (the one increasingly consumed by applications via web services). If we are part of the growing web community that believes that websites should be open for humans and machines to consume alike, we should consider extending the information architecture of our modules with RDFa.
Microformats were an earlier attempt to add meaning beyond what HTML was able to provide. Microformats define standard structures using HTML tags and classes to represent certain commonly occurring data structures. RDFa has much loftier and more extensible goals in mind.
RDFa is fundamentally about creating triples that consist of a subject, predicate, and object to form statements. The subject is what you are making a statement about. The predicate is the relationship that the statement defines. The object is the resource with which the subject forms a relationship. You form these triples by adding attributes to your HTML. Some attributes are already defined as part of XHTML (see Table 3-3), while others are specific to RDFa (Table 3-4).
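Here is a minimal Python sketch that models one statement as a named tuple, to make the three parts concrete. The class and field names are our own (obj is used because object is a Python built-in). The values are the first statement that Example 3-5 produces.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A single statement in the subject-predicate-object form that RDFa produces."""

    subject: str    # what the statement is about (a URI)
    predicate: str  # the relationship the statement defines (a URI, here written as a CURIE)
    obj: str        # the partner resource (a URI) or a literal value


title = Triple(
    subject="http://.../reviews/00001/",
    predicate="dc:title",
    obj="2009 Honda Accord",
)
print(f"{title.subject} --{title.predicate}--> {title.obj!r}")
```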
Table 3-3. XHTML attributes used by RDFa
Attribute | Explanation |
rel | A predicate URI used for expressing a relationship between two resources. |
rev | A predicate URI used for expressing a relationship between two resources in reverse. |
content | An object literal used for supplying machine-readable content for a literal. |
href | An object URI used for expressing the partner resource of a relationship. |
src | An object URI used for expressing the partner resource of a relationship when the resource is embedded (e.g., an image). |
Table 3-4. Attributes specific to RDFa
Attribute | Explanation |
about | A URI subject used for expressing what the data is about. By default, the document's base URI serves as the subject of all statements. |
property | A URI predicate used for expressing a relationship between the subject and some literal text. |
resource | A URI object used to express a resource that is not visible in the document. |
datatype | A URI for expressing a literal’s datatype. The datatype is defined as part of a vocabulary. |
typeof | A URI for expressing the type of a subject. The type is defined as part of a vocabulary. |
Because XHTML is extensible while HTML is not, RDFa has only been specified in the working draft for XHTML 1.1. Web developers can use RDFa markup inside HTML 4.01 without experiencing adverse effects in various browsers, since the designers of RDFa expected this use case. However, RDFa will not validate in HTML 4.01. RDFa attributes validate using the XHTML1.1+RDFa DTD.
RDFa statements built from a subject, predicate, and object are based on a vocabulary to help convey certain meanings. You can define a vocabulary yourself or use existing vocabularies that RDFa processors are likely to understand. One such vocabulary is the Dublin Core vocabulary. This vocabulary defines properties about common resources found in documents, such as title, creator, and subject.
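RDFa writes vocabulary terms as compact URIs (CURIEs), such as dc:title, where the dc prefix is bound to a namespace URI. This hypothetical Python helper shows how expansion works for the Dublin Core namespace used in this chapter (http://purl.org/dc/elements/1.1/). A real RDFa processor reads prefix bindings from the document's xmlns attributes rather than from a hard-coded table.

```python
from typing import Dict

# Hard-coded prefix binding, mirroring xmlns:dc="http://purl.org/dc/elements/1.1/".
DC_PREFIXES: Dict[str, str] = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand_curie(curie: str, prefixes: Dict[str, str] = DC_PREFIXES) -> str:
    """Expand a compact URI such as 'dc:title' by appending its local part to the namespace URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no namespace declared for {curie!r}")
    return prefixes[prefix] + local


for term in ("dc:title", "dc:creator", "dc:subject"):
    print(expand_curie(term))  # e.g. http://purl.org/dc/elements/1.1/title
```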
Example 3-5 uses RDFa to enhance the information architecture that we presented in Example 3-3 for the New Car Reviews module. In Example 3-5, we have added RDFa attributes to annotate the three new car reviews. This produces three triples (see Table 3-5). For each statement, the subject (a URI) is defined by the about attribute added to each list item. The property attribute for each strong element specifies dc:title for each statement’s predicate. The object for each statement is the literal enclosed within each strong element itself. The value dc:title for the predicate comes from the Dublin Core vocabulary. To use this vocabulary, we have to define a namespace and refer to it using the xmlns:dc attribute, typically within a higher-level element of the page, such as the body element (see Example 3-6).
<div id="nwcrev">
  <h3>
    New Car Reviews
  </h3>
  <cite>
    <a href="http://...">The Car Connection</a>
  </cite>
  <ul>
    <li class="beg" about="http://.../reviews/00001/">
      <p>
        <strong property="dc:title">2009 Honda Accord</strong>
        <em>(from $21,905)</em>.
      </p>
      <a href="http://.../reviews/00001/">Read the review</a>
    </li>
    <li class="mid" about="http://.../reviews/00002/">
      <p>
        <strong property="dc:title">2009 Toyota Prius</strong>
        <em>(from $22,000)</em>.
      </p>
      <a href="http://.../reviews/00002/">Read the review</a>
    </li>
    <li class="end" about="http://.../reviews/00003/">
      <p>
        <strong property="dc:title">2009 Nissan Altima</strong>
        <em>(from $19,900)</em>.
      </p>
      <a href="http://.../reviews/00003/">Read the review</a>
    </li>
  </ul>
  <form method="post" action="http://.../email/">
    <p>
      Get our most recent reviews each month:
    </p>
    <label for="nwcreveml">Email</label>
    <input type="text" id="nwcreveml" name="nwcreveml" value="" />
    <p class="action">
      <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
    </p>
  </form>
</div>
<body xmlns:dc="http://purl.org/dc/elements/1.1/">
  .
  .
  .
</body>
Table 3-5. Triples produced by Example 3-5
Subject | Predicate | Object |
http://.../reviews/00001/ | dc:title | 2009 Honda Accord |
http://.../reviews/00002/ | dc:title | 2009 Toyota Prius |
http://.../reviews/00003/ | dc:title | 2009 Nissan Altima |
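To see how a processor arrives at Table 3-5, here is a deliberately small Python sketch. It walks a trimmed copy of Example 3-5 with xml.etree.ElementTree and emits one triple per property attribute. It is an illustration, not an RDFa processor. It honors only about, property, and content, and it leaves predicates as CURIEs. An empty string stands in for the document's base URI. The function name extract_triples is our own.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Trimmed copy of Example 3-5: prices, links, and the sign-up form omitted.
EXAMPLE_3_5 = """
<div id="nwcrev">
  <h3>New Car Reviews</h3>
  <ul>
    <li class="beg" about="http://.../reviews/00001/">
      <p><strong property="dc:title">2009 Honda Accord</strong></p>
    </li>
    <li class="mid" about="http://.../reviews/00002/">
      <p><strong property="dc:title">2009 Toyota Prius</strong></p>
    </li>
    <li class="end" about="http://.../reviews/00003/">
      <p><strong property="dc:title">2009 Nissan Altima</strong></p>
    </li>
  </ul>
</div>
"""


def extract_triples(elem: ET.Element, subject: str = "") -> List[Triple]:
    """Collect (subject, predicate, literal) statements from about/property/content.

    A small subset of RDFa only: no rel/rev/href/resource/typeof/datatype handling.
    """
    subject = elem.get("about", subject)  # about here overrides the inherited subject
    triples: List[Triple] = []
    predicate = elem.get("property")
    if predicate is not None:
        # content, when present, supplies the literal; otherwise use the element's text.
        literal = elem.get("content")
        if literal is None:
            literal = " ".join("".join(elem.itertext()).split())
        triples.append((subject, predicate, literal))
    for child in elem:
        triples.extend(extract_triples(child, subject))
    return triples


for s, p, o in extract_triples(ET.fromstring(EXAMPLE_3_5)):
    print(s, p, o)
```

Because a content attribute takes precedence over the element's text, the same function should yield the five triples of Table 3-6 when run against the markup of Example 3-7.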
Example 3-7 presents a further enhancement to the information architecture presented in Example 3-3 for the New Car Reviews module. In Example 3-7, we have annotated the title and creator for the reviews as a whole. In addition, we have used the content attribute to change the object of the triple for each review. By doing so, we can provide something more descriptive than what appears in the markup. Altering the content like this can be useful when you need to use different representations of information in the human Web and the data Web (the human Web did not require this clarification in the title, for example). The enhancements in Example 3-7 produce five triples (see Table 3-6).
<div id="nwcrev" about="http://.../reviews/">
  <h3 property="dc:title">
    New Car Reviews
  </h3>
  <cite property="dc:creator">
    <a href="http://...">The Car Connection</a>
  </cite>
  <ul>
    <li class="beg" about="http://.../reviews/00001/">
      <p>
        <strong property="dc:title" content="Review for 2009 Honda Accord">2009 Honda Accord</strong>
        <em>(from $21,905)</em>.
      </p>
      <a href="http://.../reviews/00001/">Read the review</a>
    </li>
    <li class="mid" about="http://.../reviews/00002/">
      <p>
        <strong property="dc:title" content="Review for 2009 Toyota Prius">2009 Toyota Prius</strong>
        <em>(from $22,000)</em>.
      </p>
      <a href="http://.../reviews/00002/">Read the review</a>
    </li>
    <li class="end" about="http://.../reviews/00003/">
      <p>
        <strong property="dc:title" content="Review for 2009 Nissan Altima">2009 Nissan Altima</strong>
        <em>(from $19,900)</em>.
      </p>
      <a href="http://.../reviews/00003/">Read the review</a>
    </li>
  </ul>
  <form method="post" action="http://.../email/">
    <p>
      Get our most recent reviews each month:
    </p>
    <label for="nwcreveml">Email</label>
    <input type="text" id="nwcreveml" name="nwcreveml" value="" />
    <p class="action">
      <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
    </p>
  </form>
</div>
Table 3-6. Triples produced by Example 3-7
Subject | Predicate | Object |
http://.../reviews/ | dc:title | New Car Reviews |
http://.../reviews/ | dc:creator | The Car Connection |
http://.../reviews/00001/ | dc:title | Review for 2009 Honda Accord |
http://.../reviews/00002/ | dc:title | Review for 2009 Toyota Prius |
http://.../reviews/00003/ | dc:title | Review for 2009 Nissan Altima |
In the end, both Example 3-5 and Example 3-7 provide additional meaning for our module because they go beyond the HTML markup to add annotations that tell a processor exactly what certain pieces of the markup are. So, instead of having to guess what the cite element represents (for instance, only that The Car Connection has been cited as having something to do with the division that encloses it), a processor reading Example 3-7 knows specifically that The Car Connection is the creator of the content at http://.../reviews/, which is a resource with the title New Car Reviews.
The value of RDFa depends on the presence of processors that do something useful with the RDFa statements in web applications. With a groundswell of interest in using RDFa data at major websites such as Yahoo! and Google, it’s possible that modern web applications will soon be expected to provide relevant annotations as a matter of course.
As mentioned previously, at the time of this book’s publication, HTML 5 is still in its working draft form, so a consistent implementation hasn’t been agreed upon. However, it’s worth keeping in mind that HTML 5, whenever it does settle down, is likely to bring with it a new set of semantic tags for creating good information architecture using HTML. Table 3-7 presents some of the structural tags being proposed.
Table 3-7. Structural tags proposed for HTML 5
Tag | Explanation |
article | Independent piece of content of a document. |
aside | Content only slightly related to the rest of the page. |
dialog | Marks up a conversation between multiple parties. |
figure | Associates a caption with content. |
footer | Groups content that typically appears at the bottom of a section. |
header | Groups content that typically appears at the top of a section. |
hgroup | Groups parts of a header when it has multiple levels. |
nav | Section of a document intended for navigation. |
section | Generic document or application section. |
HTML 5 proposes a number of other important changes to elements. Some examples of these changes include the following:
New elements for common types of data (e.g., canvas, meter, progress, and time)
New values for the type attribute of input elements to support common user interface components (e.g., url, email, and datetime)
New attributes for many elements
Changes in meanings for some elements and attributes to help reflect how they are used today
The removal of many elements and attributes that were deprecated in earlier HTML versions
HTML 5 also proposes a number of changes and additions to various interfaces. For example, it introduces useful APIs (application programming interfaces) for creating web applications. These include a drawing API for use with the canvas element, an API for controlling audio and video, and an API for drag and drop, among others. HTML 5 also proposes extensions to some of the existing DOM interfaces.
Unfortunately, the lack of support for HTML 5 in the major browsers at this time makes it primarily something to keep an eye on for later. Although it may be tempting to start using some of the new elements in your markup, the risk of inconsistencies among the major browsers is still too great to rely on them for now. However, look forward to HTML 5, because it is likely to include many features that will help make an information architecture created in HTML more descriptive.