Chapter 3. Large-Scale HTML

There was once a time when HTML was king. Browser manufacturers moved hastily to shove it full of features as quickly as web developers demanded them. Unfortunately, these features often fell outside the original purview of HTML, and in many cases they were carried out in proprietary ways. Beyond the well-known problems of interoperability among browsers that bedeviled web pages for many years, the pumping up of HTML seduced web developers into relying on it as more than just a way to describe what a page contained. They began to use it for how parts of a page should look and behave.

In large web applications, the use of HTML for such commingled responsibilities creates a tangled mess that prevents you from being nimble in structuring your site or guiding visitors through it. Conceptually, this is because doing layout in HTML obscures a page’s information architecture, a model or concept of data that makes the data more readily understandable and digestible in a variety of contexts. When the information architecture of a large web application is not clear, it adversely affects reusability, maintainability, and reliability. Well-constructed HTML does not obscure information architecture, but instead reflects it.

Tenet 3: Large-scale HTML is semantic, devoid of presentation elements other than those inherent in the information architecture, and pluggable into a wide variety of contexts in the form of easily identifiable sections.

This chapter addresses Tenet 3, restated here from the complete list of tenets for developing large web applications provided in Chapter 1. This chapter begins by looking at the HTML for a simple module and presenting alternative examples that reflect its information architecture from worst to best. Next, we’ll examine a detailed list of reasons why a good information architecture for a module is important, and expound on a set of tags to avoid along with a set of tags with good semantic value (some of which don’t see widespread use yet). Finally, we’ll look at how the rigor of XHTML (Extensible Hypertext Markup Language) is beneficial to information architecture, and explore RDFa (Resource Description Framework in Attributes) for adding further meaning to our markup. We’ll conclude with a bit about HTML 5, the latest version of HTML; however, HTML 5 is still in the working draft stage as this book goes to print, so it’s not yet supported across the major browsers.

Modular HTML

When assembling a web page, you should make choices that increase the capability of components and portions of the page to be repurposed and reused in as wide a range of scenarios as possible. Even if you don’t plan to reuse pieces, assembling the page from smaller individual components will make it more reliable and easier to maintain. A component can be anything from a small control (e.g., a paginator) to an entire section of a page (e.g., a list of search results).

To determine the potential components on a page, you need to deconstruct the page into a reasonable set of modules. Chapter 7 presents a more unified concept of a module as an entity that includes everything you need to make a component independently functioning and cohesive. For now, let’s look at one module as an example (see Figure 3-1) and focus on how to make its HTML a good reflection of its information architecture.

Figure 3-1. The New Car Reviews module

A Bad Example: Using a Table and Presentation Markup

Example 3-1 presents an ill-fated but not altogether uncommon attempt to create an information architecture for the New Car Reviews module. The problem in this example is that it uses HTML markup more to define the layout than to reveal the information architecture of the module.

Example 3-1. A bad example of HTML for the New Car Reviews module
<table>
   <thead>
      <tr>
         <th>
            <big>New Car Reviews</big>
         </th>
         <th>
            <small>
               <a href="http://...">The Car Connection</a>
            </small>
         </th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td colspan="2">
            <p>
               <b>2009 Honda Accord</b> <i>(from $21,905)</i>.
            </p>
            <a href="http://.../reviews/00001/">Read the review</a>
         </td>
      </tr>
      <tr>
         <td colspan="2">
            <p>
               <b>2009 Toyota Prius</b> <i>(from $22,000)</i>.
            </p>
            <a href="http://.../reviews/00002/">Read the review</a>
         </td>
      </tr>
      <tr>
         <td colspan="2">
            <p>
               <b>2009 Nissan Altima</b> <i>(from $19,900)</i>.
            </p>
            <a href="http://.../reviews/00003/">Read the review</a>
         </td>
      </tr>
      <tr>
         <td colspan="2">
            <form method="post" action="http://.../email/">
               <p>
                  Get our most recent reviews each month:
               </p>
               <small>Email</small><br />
               <input type="text" name="nwcreveml" value="" /><br />
               <p>
                  <input type="submit" name="nwcrevsub" value="Sign Up" />
               </p>
            </form>
         </td>
      </tr>
   </tbody>
</table>

First, everything has been placed within a table to achieve a presentation in which some elements appear side by side while others appear above or below each other. In fact, the developer has gone to a lot of trouble to create a two-column table simply to left-justify “New Car Reviews” while right-justifying “The Car Connection” on the top line; all the other tr and td elements are redundant. Naturally, the columns in the th header have nothing to do with the “rows” underneath it; the table is being used simply for visual effect.

From a standpoint of information architecture, tables are for tabular data; we’ll look at better ways to achieve a side-by-side presentation in Chapter 4. The limitations of misusing table elements will bite you if the page has to be used for new purposes, such as for display on a mobile device or providing information in a web service.

Another problem is the heavy use of the purely presentational elements b, i, big, and small. These describe how we want things to appear, not what information the elements enclose. Markup like this is also often accompanied by HTML attributes such as width and border that also serve only a presentational purpose (although I have avoided showing them here). The HTML standard has accumulated an astounding number of such attributes, but they have no place in large-scale HTML.
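For contrast, the following is a minimal sketch of a table used for data that really is tabular. It recasts the models and starting prices from Example 3-1 as rows and columns; the caption text and the summary value are assumptions added only for illustration, and this is not how the module itself is marked up later in the chapter.

```html
<!-- A true data table: every header cell describes the column below it. -->
<table summary="Starting prices for recently reviewed cars">
   <caption>Starting Prices</caption>
   <thead>
      <tr>
         <th scope="col">Model</th>
         <th scope="col">Starting price</th>
      </tr>
   </thead>
   <tbody>
      <tr>
         <td>2009 Honda Accord</td>
         <td>$21,905</td>
      </tr>
      <tr>
         <td>2009 Toyota Prius</td>
         <td>$22,000</td>
      </tr>
      <tr>
         <td>2009 Nissan Altima</td>
         <td>$19,900</td>
      </tr>
   </tbody>
</table>
```

Here each header cell has a genuine relationship with the cells beneath it, which is precisely the relationship missing from Example 3-1.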

A Better Example: Using CSS

Example 3-2 presents a better attempt to reflect the information architecture for the New Car Reviews module.

Example 3-2. A better example of HTML for the New Car Reviews module
<div id="nwcrev">
   <span class="head">
      New Car Reviews
   </span>
   <span class="name">
      <a href="http://...">The Car Connection</a>
   </span>
   <div class="list">
      <div class="item">
         <p>
            <strong>2009 Honda Accord</strong>
            <em>(from $21,905)</em>.
         </p>
         <a href="http://.../reviews/00001/">Read the review</a>
      </div>
      <div class="item">
         <p>
            <strong>2009 Toyota Prius</strong>
            <em>(from $22,000)</em>.
         </p>
         <a href="http://.../reviews/00002/">Read the review</a>
      </div>
      <div class="item">
         <p>
            <strong>2009 Nissan Altima</strong>
            <em>(from $19,900)</em>.
         </p>
         <a href="http://.../reviews/00003/">Read the review</a>
      </div>
   </div>
   <form method="post" action="http://.../email/">
      <p>
         Get our most recent reviews each month:
      </p>
      <span class="label">Email</span>
      <input type="text" id="nwcreveml" name="nwcreveml" value="" />
      <p class="action">
         <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
      </p>
   </form>
</div>

First, we have replaced the various table-related elements with div and span elements. These immediately reveal a more systematic hierarchy that reflects the true relationship between the elements. In addition, we have added IDs and classes with good, semantic names that tell us a bit more about what the elements enclose. These also provide the hooks that we will require to achieve the desired presentation using CSS. Finally, we have changed the purely presentational elements b and i to strong and em because the meanings of “strong” and “emphasis” avoid specific connotations of presentation whereas “boldface” and “italic” do not. A good example of the fact that strong is better than b is that if we were to change the presentation of the strong element from boldface to red in the future, we could do it in just one place using CSS and wouldn’t have to touch the HTML at all.
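As a minimal sketch of that last point, the rule below changes the presentation of every strong element within the module of Example 3-2 from boldface to red without touching the module's HTML. It is embedded in a style element only to keep the illustration in one place (normally it would live in a separate stylesheet), and the specific color is an assumption.

```html
<style type="text/css">
/* Hypothetical restyling: strong text in the module is red, not bold. */
#nwcrev strong
{
   color: #c00;
   font-weight: normal;
}
</style>
```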

The Best Example: Semantically Meaningful HTML

In the previous section, Example 3-2 leverages the power of CSS to overload generic div and span elements. But div and span don’t indicate that we’re delivering a heading followed by a list, which HTML allows us to do. Example 3-3, therefore, presents the best example of how we can use HTML to reflect the information architecture of the New Car Reviews module.

Example 3-3. The best example of HTML for the New Car Reviews module
<div id="nwcrev">
   <h3>
      New Car Reviews
   </h3>
   <cite>
      <a href="http://...">The Car Connection</a>
   </cite>
   <ul>
      <li class="beg">
         <p>
            <strong>2009 Honda Accord</strong>
            <em>(from $21,905)</em>.
         </p>
         <a href="http://.../reviews/00001/">Read the review</a>
      </li>
      <li class="mid">
         <p>
            <strong>2009 Toyota Prius</strong>
            <em>(from $22,000)</em>.
         </p>
         <a href="http://.../reviews/00002/">Read the review</a>
      </li>
      <li class="end">
         <p>
            <strong>2009 Nissan Altima</strong>
            <em>(from $19,900)</em>.
         </p>
         <a href="http://.../reviews/00003/">Read the review</a>
      </li>
   </ul>
   <form method="post" action="http://.../email/">
      <p>
         Get our most recent reviews each month:
      </p>
      <label for="nwcreveml">Email</label>
      <input type="text" id="nwcreveml" name="nwcreveml" value="" />
      <p class="action">
         <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
      </p>
   </form>
</div>

First, we have replaced the span elements at the top of the module with an h3 element for the header and a cite element for the name of the provider of the reviews. These elements carry more meaning about your information than simply calling the elements “spans.” In addition, we have used a ul element and multiple li elements to construct the list of reviews, since these also are a better reflection of what the elements really contain. Finally, we have used a label element for the label associated with the email address input. The label element in HTML is often overlooked, but is an important element that conveys valuable information and enables useful features in many browsers; its for attribute specifies the ID of the element to which the label pertains, in this case the input element for the email address.

We have also added IDs and classes to several of the elements to add more meaning about their roles and to provide the hooks that we may require to achieve a desired presentation using CSS. For example, we have added beg, mid, and end to the list items based on their places in the list. The hooks are also useful in JavaScript.

One way to test how well we have reflected the information architecture for a module is to observe how a browser renders it, with none of our styles applied. Figure 3-2 shows how the New Car Reviews module of Example 3-3 is rendered by most browsers by default. As you can see, the default appearance reveals the information architecture of the module rather well.

Figure 3-2. The default rendering of Example 3-3

The original developer of Example 3-1 may have rejected the use of a ul list because she didn’t think the bullet-list layout style was appropriate. But all browsers have supported powerful enough CSS for several years to let you make a list look almost any way you want.
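As a minimal sketch of this point, the rule below removes the default bullets and indentation from the list in Example 3-3. It is embedded in a style element only to keep the illustration self-contained, and it is an assumption about one way to achieve the effect rather than the approach the chapter itself takes.

```html
<style type="text/css">
/* Remove the default bullets and indentation from the module's list. */
#nwcrev ul
{
   margin: 0;
   padding: 0;
   list-style: none;
}
</style>
```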

Example 3-4 presents the CSS that makes the New Car Reviews module look like Figure 3-1. It assumes you have first applied the browser reset and font normalization CSS presented at the end of Chapter 4.

Example 3-4. CSS for adding presentation to the New Car Reviews module
#nwcrev
{
   position: relative;
   width: 280px;
   padding: 9px;
   border: 1px #333 solid;
}
#nwcrev h3
{
   position: absolute;
   font: normal 123.1% arial;
}
#nwcrev cite
{
   display: block;
   width: 280px;
   margin-top: 3px;
   font: normal 85% verdana;
   text-align: right;
}
#nwcrev cite a
{
   color: #999;
}
#nwcrev ul
{
   margin: 10px 0;
}
#nwcrev li.beg
{
   margin-bottom: 8px;
}
#nwcrev li.mid
{
   margin-bottom: 8px;
}
#nwcrev li p
{
   text-align: left;
}
#nwcrev li em
{
   color: #999;
   font-style: italic;
}
#nwcrev li a
{
   color: #999;
   font: normal 85% verdana;
}
#nwcrev label
{
   display: block;
   margin: 5px 0 2px;
   font: normal 85% verdana;
}
#nwcrev #nwcreveml
{
   width: 274px;
}
#nwcrev .action
{
   margin-top: 10px;
   text-align: right;
}

Benefits of Good HTML

A module whose HTML reflects its information architecture is more reusable, maintainable, and reliable because it is more descriptive and honest. The following list, adapted from research at Yahoo! by Nate Koechley et al., describes why well-constructed HTML offers these benefits, and lists a number of others. Well-constructed HTML has the following characteristics:

Modularity

A module built using well-constructed HTML does not have to be enclosed by certain other structures. It encapsulates everything it needs within a single outer div. As such, the module can be used safely in a variety of contexts.

Lighter

Many web developers report seeing savings of 70 to 80 percent in page weight. When Microsoft made the transformation away from tables on its home page in 2004, for example, it saw an improvement of over 72 percent. When Yahoo! made this transformation, it found a savings of around 30 percent, since many pages were already optimized. A savings of 50 percent seems to be more common.

Faster rendering

A pure reflection of information architecture means less for the browser to download, less to parse, and generally fewer elements to organize in the DOM (Document Object Model), another model of a page’s information architecture. In addition, when you avoid tables for the primary layout of a page, the perceived time to load a page is faster, because the browser does not need to delay rendering until the entire table is processed (although you can also make tables render without delay in modern browsers by setting table-layout to fixed via CSS).

Support for Ajax development

The terseness of well-constructed HTML directly affects the browser’s creation of the page’s DOM. A DOM that is smaller and easier to map to your information architecture helps you access and manipulate the page more easily using JavaScript, which is especially important for Ajax.

Backward compatibility

Backward compatibility means that you can be reasonably certain that things you build today will work for browsers of yesterday. All browsers support a core subset of HTML fairly consistently. At the very least, a good information architecture that uses these core elements can help your applications work reasonably well in browsers that you haven’t been able to test and fully support, even if the presentation has some inconsistencies. In the worst cases, you can disable parts of the presentation more easily because it is isolated in its own layer.

Forward compatibility

Clean, concise HTML is more likely to continue working as visitors upgrade their browsers or adopt new ones.

Reduced bandwidth requirements

A lighter page weight, the normal outcome of making HTML a pure reflection of the information architecture, reduces bandwidth requirements. In addition, a good information architecture makes it easy to place most JavaScript and CSS in separate files that browsers can cache. When these files are shared across many pages, the bandwidth savings across a large web application can be very significant.

Better internationalization support

Internationalization touches many parts of an application. By writing clean, concise HTML, you can be confident that the changes needed to support different locales will have fewer and more predictable effects on other parts of your application.

Support for multiple types of media

A good information architecture lets you use the media attribute of the link and style elements, or @media rules within CSS, to delineate the types of media to which various presentations apply. The two most common types of media are computer screens (browsers) and print. A common approach is to provide one set of styles for all media types and hide certain presentational aspects (e.g., large graphics, ads, etc.) for printed pages.

Better search engine optimization

Search engines are primarily concerned with information architecture because it ought to describe what a page contains. The less you do to obscure your information architecture through the use of HTML for presentation, the more accessible your pages are to various search engines, which can find the key elements that your potential readers search for.

Better accessibility

Accessibility describes the extent to which any visitor can use your website, particularly visitors with difficulty seeing or scrolling around. Assistive devices, such as screen readers, rely on good information architecture to communicate effectively about what a page contains, especially as visitors navigate a page.

Visual consistency

A good information architecture provides more options to layer presentation across many elements in a consistent way. CSS selectors and stylesheets, which require a good information architecture to be effective, help you apply certain presentational elements to entire sets of elements as easily as to individual ones. This promotes visual consistency across a large web application.

Precise visual control

Presentation was always a hack in HTML; CSS is the modern and powerful means of providing presentation in browsers. When you use HTML to try to do more than just reflect information architecture, you risk inconsistency in support and absent features within the major browsers, as well as information architecture and presentation that are difficult to untangle when you want to reuse modules in different contexts. In short, you lose visual control.

More efficient redesigns

Better efficiency means that you can produce redesigns faster, more cheaply, and with less bug fixing. Clean, concise HTML helps you predict the ways you’ll have to change your modules as you redesign your application’s information architecture for new purposes and content.

Expanded audience

A good information architecture degrades gracefully and is accessible to more users operating in more environments. Even today, there are still plenty of people around the world accessing sites from old browsers. With a good information architecture in place, upon which you can layer other capabilities, your site has more reach, even if in a slightly degraded form. When your HTML tries to do too much, you risk your site appearing completely broken and unreadable.

A competitive edge

Taken together, the characteristics of well-constructed HTML presented in this section make large web applications easier to use, easier to maintain, and more accessible to a wide variety of visitors. This gives you an advantage in the market.
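To make the earlier point about multiple types of media concrete, here is a minimal sketch in which one stylesheet applies to all media and a second set of rules adjusts the page for print. The stylesheet filename is hypothetical, and hiding the sign-up form of Example 3-3 in print is just one assumption about what a printed page might omit.

```html
<!-- Hypothetical filename: styles shared by every media type. -->
<link rel="stylesheet" type="text/css" media="all" href="nwcrev.css" />

<!-- Overrides applied only when the page is printed. -->
<style type="text/css" media="print">
/* A form is of no use on paper, so leave it out of the printout. */
#nwcrev form
{
   display: none;
}
</style>
```

Because the HTML carries no presentation of its own, the same markup serves both media without modification.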

HTML Tags

Although all web developers are thoroughly familiar with HTML, many of us can benefit from a simple review of tags that offer sound, semantic descriptions for what they enclose. When we think about large-scale HTML primarily as a means of describing a module’s information architecture, it becomes clear that there are some tags that we should avoid as well.

Bad HTML Tags

Table 3-1 presents a list of HTML tags to avoid; these tags are generally either presentational in nature or deprecated. We could present a number of tag attributes as well (e.g., bgcolor, border, etc.), but that list would be very long indeed, given the formatting options that have made their way into HTML over the years. Suffice it to say that if you find yourself considering any attribute with more presentational implications than uses for information architecture, you can be sure there is a better option in CSS. To learn more about any of the tags in Table 3-1, look at the index of detailed descriptions provided by the W3C at http://www.w3.org/TR/html401/index/elements.html.

Table 3-1. Bad HTML tags

| Tag | Explanation |
| --- | --- |
| b | Presentational. Use strong instead. |
| basefont | Deprecated. |
| big | Presentational. Use CSS instead. |
| center | Deprecated. |
| dir | Deprecated. |
| font | Deprecated. |
| hr | Presentational. Use CSS instead. |
| i | Presentational. Use em instead. |
| isindex | Deprecated. |
| menu | Deprecated. |
| s | Deprecated. |
| small | Presentational. Use CSS instead. |
| strike | Deprecated. Use del instead. |
| tt | Presentational. Use CSS instead. |
| u | Deprecated. |

Good HTML Tags

Table 3-2 presents a list of HTML tags that offer sound, semantic descriptions about what they enclose along with brief explanations of where to apply them. Examples of useful tags that tend to be forgotten include label, cite, dl, dt, and dd. By making good use of meanings that HTML provides for tags intrinsically, you can take advantage of the markup itself to provide meaning where you may have otherwise needed to use a class name. A rich use of tags also frequently can give you the uniquely identifiable hooks on which to hang your CSS. To learn more about the tags in Table 3-2, look at the index of descriptions provided by the W3C at http://www.w3.org/TR/html401/index/elements.html.

Table 3-2. Good HTML tags

| Tag | Explanation |
| --- | --- |
| a | Anchor for linking to another page or a point within a page. |
| abbr | Abbreviations of any type. If an acronym, use the acronym tag. |
| acronym | Abbreviations that are acronyms (i.e., pronounceable words). |
| address | Address information about the document author or company. |
| area | Client-side image map area. |
| base | Document base URI (Uniform Resource Identifier). Relative URIs are resolved from this point. |
| blockquote | Long quotation. |
| body | Document body. |
| br | Line break. Avoid for presentation; may be a semantic separator. |
| button | Button in a form. |
| caption | Caption for a table. |
| cite | Citation or reference to a source. |
| code | Computer code fragment. |
| col | Column for a table. Allows attribute specification for columns. |
| colgroup | Explicit grouping of columns for a table. |
| dd | Definition description within a dl tag. |
| del | Deleted text with respect to another version of the document. |
| dfn | Defining instance of a term or phrase. |
| div | Generic, block-level container. Be more specific if possible. |
| dl | List of definitions constructed using dt and dd tags. |
| dt | Definition term within a dl tag. |
| em | Emphasis. Rendered by default as italics, but do not treat as such. |
| fieldset | Group of form fields. Give a title with a legend tag. |
| form | Form for data entered by the user via input, select, etc. tags. |
| head | Document head section. Use exactly one per document. |
| html | Document root element. Use exactly one per document. |
| h1 ... h6 | Section headers. Ideally, nest in order from h1 to h6 with one h1. |
| iframe | Inline subwindow containing its own document. |
| img | Embedded image. |
| input | Input for a form, which can be of many different types. |
| ins | Inserted text with respect to another version of the document. |
| kbd | Literal text to be typed by the user. |
| label | Label for a form input. Every input should have one with for set. |
| legend | Gives a title to a fieldset tag. |
| li | List item within an ol or ul tag. |
| link | Link conveying relationship information, typically for stylesheets. |
| map | Client-side image map. |
| meta | Metadata for the document, typically keywords, a description, etc. |
| noscript | Alternate content for nonscript rendering. |
| object | Embedded object that may be rendered by an external application. |
| ol | Ordered list of items constructed using li tags. |
| optgroup | Group of options within a select tag. |
| option | Selectable option within a select tag. |
| p | Paragraph. |
| param | Named property value. |
| pre | Preformatted text (via indentation). Can be overridden in CSS. |
| q | Short quotation. |
| samp | Sample text. |
| script | Text to be interpreted as a script for dynamic control in a browser. |
| select | Selection list in a form. |
| span | Generic, inline container. Often, a more specific tag is better. |
| strong | Strong text. Rendered by default as bold, but do not treat as such. |
| style | Embedded CSS. |
| sub | Think of this in a semantic way. Avoid using it just to lower text. |
| sup | Think of this in a semantic way. Avoid using it just to raise text. |
| table | Table for data that is truly tabular in nature. Avoid for layout. |
| tbody | Body, or main content, of a table. |
| td | Table data cell (cells other than those in the header of the table). |
| textarea | Text input with multiple lines. |
| tfoot | Footer for a table. |
| th | Table header cell. |
| thead | Header for a table. |
| title | Document title. |
| tr | Row within a table. |
| ul | Unordered list of items constructed using li tags. |
| var | Instance of a program argument or variable. |
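The following sketch shows a few of the often-forgotten tags from Table 3-2 in use: a dl list whose terms are marked up with abbr, and a fieldset with a legend grouping the email field from Example 3-3. The definitions paraphrase this chapter, while the legend text is an assumption made for illustration.

```html
<!-- A definition list; abbr supplies the expansion of each term. -->
<dl>
   <dt><abbr title="Extensible Hypertext Markup Language">XHTML</abbr></dt>
   <dd>A reformulation of HTML 4.01 in XML 1.0.</dd>
   <dt><abbr title="Cascading Style Sheets">CSS</abbr></dt>
   <dd>The means of providing presentation in browsers.</dd>
</dl>

<!-- A fieldset groups related fields; legend gives the group a title. -->
<form method="post" action="http://.../email/">
   <fieldset>
      <legend>Monthly Reviews</legend>
      <label for="nwcreveml">Email</label>
      <input type="text" id="nwcreveml" name="nwcreveml" value="" />
   </fieldset>
</form>
```

In each case the tag itself conveys meaning that would otherwise have to be carried by a class name on a div or span.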

IDs, Classes, and Names

You can assign any element in HTML an ID using the id attribute and a class using the class attribute. Because you should use an ID only once per page, IDs provide a great way to create a unique scope of sorts for working with a module. For instance, refer to the outermost div element with the ID of nwcrev for the New Car Reviews module in Example 3-3. By giving this module an ID, you can focus specific CSS on that module, as shown in Example 3-4, and easily access the module’s elements in the DOM via JavaScript (see Chapter 5). Classes let you do the same for collections of semantically similar elements all at once, because classes are intended to be used on a page any number of times.

Don’t confuse the id attribute with the name attribute. In various form inputs, the name attribute lets you give names to input values; these names are passed along with the values for scripts on the server side.

Conventions for Naming

There are a lot of opinions about naming conventions for IDs, classes, and names, but everyone can agree that establishing some sort of convention is important. In large-scale HTML, a good naming convention is key to modularity. One convention, demonstrated earlier in Example 3-3, is to use short groups of three to six characters for naming (e.g., nwcrev is the ID for the New Car Reviews module). From here, you can append other name segments of three or four characters to create further qualified names for use deeper within the module (e.g., nwcreveml for the id and name attributes of the email address text field).

Using fully qualified names like this promotes modularity because you can be assured that anywhere you use this module, its names will not conflict with those used by other modules. For example, if you were to place the New Car Reviews module on a page with another module that also contained a similar form input field for an email address, this naming convention would ensure that the inputs of the two modules would be passed to the server-side script with different names.
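As a minimal sketch of this convention, the markup below places the email field from Example 3-3 alongside a similar field in a second module. The second module (a Used Car Reviews module with the prefix uscrev) is hypothetical and exists only to show that the qualified names do not collide.

```html
<!-- New Car Reviews module (names taken from Example 3-3). -->
<div id="nwcrev">
   <form method="post" action="http://.../email/">
      <p>
         <label for="nwcreveml">Email</label>
         <input type="text" id="nwcreveml" name="nwcreveml" value="" />
      </p>
   </form>
</div>

<!-- Hypothetical Used Car Reviews module with its own prefix. -->
<div id="uscrev">
   <form method="post" action="http://.../email/">
      <p>
         <label for="uscreveml">Email</label>
         <input type="text" id="uscreveml" name="uscreveml" value="" />
      </p>
   </form>
</div>
```

Because each name begins with its module's prefix, every id remains unique on the page, each label refers to exactly one input, and a server-side script can tell the two inputs apart by name.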

Because using short, augmentable name segments is compact and works well, it’s the convention that we employ throughout this book. That said, the exact convention is not what is important here; whatever conventions you prefer, establishing a system of unique qualification that ensures modularity is the key.

XHTML

For quite some time, HTML has implied HTML 4.01, but browsers have been very forgiving of code that did not meet precisely with this specification. In fact, many egregious transgressions are politely rendered by the browsers in a reasonably elegant way. That said, this forgiving attitude by browsers has been a double-edged sword. On the one hand, it plays an essential role in ensuring that older documents can survive on the Web with little or no modification. On the other hand, it gives web developers a lot of room to be sloppy. XHTML establishes a more rigorous definition of HTML that formally helps web developers alleviate some of this sloppiness.

Benefits of XHTML

XHTML 1.0, a W3C Recommendation, is a reformulation of HTML 4.01 in XML 1.0. This reformulation provides additional rigor and formality that earlier versions of HTML were never intended to have. Because XHTML conforms to XML, it offers web developers several benefits. First and foremost, XHTML’s strictness results in cleaner, more consistent code that promotes better maintainability and reliability. Next, XHTML is readily viewed, edited, and validated with standard XML tools. In addition, XHTML documents can work with applications that rely upon either the HTML DOM or the XML DOM. Finally, XHTML is more likely to interoperate within various XHTML environments in the future should XHTML continue to advance. Since XHTML can be written to operate in older browsers as well as in XHTML-conforming browsers, there are few reasons not to start writing HTML using this higher standard.
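As a minimal sketch of what this looks like in practice, the following is the skeleton of a page that declares itself as XHTML 1.0. The choice of the Strict DTD, the UTF-8 encoding, and the page title are assumptions; the chapter itself does not prescribe them.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
   <title>New Car Reviews</title>
</head>
<body>
   <!-- Modules such as the one in Example 3-3 are placed here. -->
   <div id="nwcrev">
      <h3>New Car Reviews</h3>
   </div>
</body>
</html>
```

The DOCTYPE and the xmlns attribute are what allow standard XML tools and validators to check the document against the XHTML rules.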

XHTML Guidelines

Fortunately, it is relatively easy to make the HTML that we write conform to the higher standards of XHTML. The examples of HTML in this chapter, as well as in the rest of the book, are actually XHTML, for the most part. Most HTML is compatible with XHTML, but there are a few guidelines that you need to follow to ensure your code conforms to XHTML while continuing to render properly in older and XHTML-conforming browsers alike. A list of these guidelines is presented below.

Proper nesting of tags

In XHTML, tags must be nested in such a way that tags are closed in the exact reverse order that they were opened. For example, Example 3-3 contains the following, where the tags are properly nested:

<!-- Yes, XHTML -->
<strong>2009 Nissan Altima</strong>
<em>(from $19,900)</em>.

Consider, in contrast, the following example, where the strong tag is closed before the em tag. This does not conform to XHTML:

<!-- Not XHTML! -->
<strong>2009 Nissan Altima<em>
(from $19,900)</strong></em>.

End tags and empty tags

In XHTML, every tag must have a corresponding end tag. In HTML, web developers frequently leave off closing tags for elements such as list items and paragraphs because browsers can infer where these tags should be closed. In XHTML, you must provide the end tags explicitly. Example 3-3 includes the following text, where we have correctly closed all list items:

<!-- Yes, XHTML -->
<li class="mid">
   <p>
      <strong>2009 Toyota Prius</strong>
      <em>(from $22,000)</em>.
   </p>
   <a href="http://.../reviews/00002/">Read the review</a>
</li>
<li class="end">
   <p>
      <strong>2009 Nissan Altima</strong>
      <em>(from $19,900)</em>.
   </p>
   <a href="http://.../reviews/00003/">Read the review</a>
</li>

Contrast that with the following example, where there are no end tags for the list items. This does not conform to XHTML:

<!-- Not XHTML! -->
<li class="mid">
   <p>
      <strong>2009 Toyota Prius</strong>
      <em>(from $22,000)</em>.
   </p>
   <a href="http://.../reviews/00002/">Read the review</a>
<li class="end">
   <p>
      <strong>2009 Nissan Altima</strong>
      <em>(from $19,900)</em>.
   </p>
   <a href="http://.../reviews/00003/">Read the review</a>

The requirement for every tag to have a corresponding end tag can make tags like br rather tedious to use; you would need to use <br></br> in XHTML wherever you had been using <br> in HTML. Fortunately, there is a shorthand for tags that enclose no content: include a forward slash before the closing bracket. Although XHTML allows a construct such as <br/> to accomplish this, it is advisable to put a space between the tag and the forward slash, like <br />, to protect against compatibility problems in HTML browsers.

In Example 3-3, we use an input tag (which never has content), correctly terminated with a space and a forward slash:

<!-- Yes, XHTML -->
<input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />

Contrast this with the following example, where there is no forward slash before the closing bracket. This does not conform to XHTML:

<!-- Not XHTML! -->
<input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up">

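Using the shorthand notation can be a handy way to denote empty content for any tag. For example, if you had an empty paragraph, you could write <p /> instead of writing <p></p>. Be aware, however, that HTML browsers may misread the shorthand on tags that normally have content, so it is safest to reserve it for tags that never do. The following HTML tags appear in Table 3-2 and never have content: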

  • <area />

  • <base />

  • <br />

  • <col />

  • <img />

  • <input />

  • <link />

  • <meta />

  • <param />

Case sensitivity

In XHTML, every tag and tag attribute is case-sensitive and defined in lowercase. In Example 3-3, we have the following for the label tag, where we see lowercase for the tag and its for attribute:

<!-- Yes, XHTML -->
<label for="nwcreveml">Email</label>

In contrast, the following example puts the tag and its attribute in uppercase. This does not conform to XHTML:

<!-- Not XHTML! -->
<LABEL FOR="nwcreveml">Email</LABEL>

Attribute values

In XHTML, all attribute values must be quoted. The examples in this chapter use double quotes consistently, as in the following from Example 3-3:

<!-- Yes, XHTML -->
<input type="text" id="nwcreveml" name="nwcreveml" value="" />

In the following example, most of the attribute values omit the quotes altogether, which does not conform to XHTML. The apostrophes (single quotes) around the empty value are legal in XML, but mixing quote styles makes markup harder to scan, so prefer double quotes throughout:

<!-- Not XHTML! -->
<input type=text id=nwcreveml name=nwcreveml value='' />

Furthermore, you must specify an explicit value for every attribute that you use. This means that attributes that often are shown without values in HTML must be assigned something in XHTML, even though this may feel pedantic. Set each of these attributes to a value that matches its own name (e.g., checked="checked").
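As a small illustration of explicit attribute values, a checkbox and a select list might be written as follows. The field names and options here are hypothetical and are not part of Example 3-3:

```html
<!-- Hypothetical fields for illustration; not part of Example 3-3 -->
<!-- Yes, XHTML: each attribute repeats its own name as its value -->
<input type="checkbox" id="nwcrevopt" name="nwcrevopt" checked="checked" />
<select id="nwcrevfrq" name="nwcrevfrq">
   <option value="monthly" selected="selected">Monthly</option>
   <option value="weekly">Weekly</option>
</select>

<!-- Not XHTML! The checked attribute has no value -->
<input type="checkbox" id="nwcrevopt" name="nwcrevopt" checked />
```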

JavaScript, CSS, and special characters

JavaScript, CSS, and the special characters that these may contain require some special treatment in XHTML. Whereas in HTML you can wrap sections of embedded JavaScript and CSS between <!-- and -->, XML browsers may ignore the sections. On the other hand, if you place these sections in a CDATA block, HTML browsers will ignore the CDATA contents. The ideal solution is to link JavaScript and CSS via external files, which is a good practice anyway. However, there may be times that you cannot do this entirely. In these cases, your document will not conform to XHTML.
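A minimal sketch of the external-file approach might look like the following. The file names are assumptions chosen for illustration; substitute whatever paths your application uses:

```html
<!-- Hypothetical file names; not taken from the examples in this chapter -->
<head>
   <title>New Car Reviews</title>
   <!-- link never has content, so the shorthand ending applies -->
   <link rel="stylesheet" type="text/css" href="nwcrev.css" />
   <!-- script can have content, so give it an explicit end tag -->
   <script type="text/javascript" src="nwcrev.js"></script>
</head>
```

Because nothing is embedded, there is no script or style content in the page for an HTML or XML browser to interpret differently.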

XHTML is also sensitive to certain special characters. In XHTML, you need to replace less-than signs, greater-than signs, and ampersands wherever they appear in text nodes, JavaScript, and CSS with their character entities (&lt;, &gt;, and &amp;, respectively, or their numeric equivalents).
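The following sketch shows the substitution in practice. The text and the query string are hypothetical, and note that an ampersand inside an attribute value, such as a URL, needs the same treatment:

```html
<!-- Hypothetical text and query string for illustration -->
<!-- Yes, XHTML -->
<p>
   Compare prices &amp; features for cars priced &lt; $25,000.
</p>
<a href="http://.../search?make=honda&amp;model=accord">Search</a>

<!-- Not XHTML! -->
<p>
   Compare prices & features for cars priced < $25,000.
</p>
<a href="http://.../search?make=honda&model=accord">Search</a>
```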

As a result of these issues, many developers set their document types to the HTML 4.01 Strict DTD, even if coding to take advantage of XHTML’s benefits. This lets you continue to validate your document using HTML validators while coding to the higher XHTML standard, albeit with a few compromises for now.
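For reference, a page that takes this approach would begin with the HTML 4.01 Strict document type declaration, roughly as sketched here. The title and the placeholder comment are assumptions for illustration:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
   <title>New Car Reviews</title>
</head>
<body>
   <!-- Markup written to XHTML's rules, such as Example 3-3, goes here -->
</body>
</html>
```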

RDFa

Even when you have created a good information architecture for a module in HTML, there is only so much meaning that you can communicate in a standard way using the small collection of elements that HTML provides. RDFa (Resource Description Framework with Attributes) is an emerging technology for extending your HTML to provide additional meaning. It has special significance for the Semantic Web. The Semantic Web is an evolving extension of the World Wide Web in which web developers define the semantics of information and services so that the Web can understand and satisfy requests for content made by people and machines.

A key characteristic of RDFa is that it defines a standard way for web developers to annotate information further within pages that have been built for visual consumption. In this sense, RDFa attempts to unify the “human Web” (the one we see published as web pages) and the “data Web” (the one increasingly consumed by applications via web services). If we are part of the growing web community that believes that websites should be open for humans and machines to consume alike, we should consider extending the information architecture of our modules with RDFa.

Note

Microformats were an earlier attempt to add meaning beyond what HTML was able to provide. Microformats define standard structures using HTML tags and classes to represent certain commonly occurring data structures. RDFa has much loftier and more extensible goals in mind.

RDFa Triples

RDFa is fundamentally about creating triples that consist of a subject, predicate, and object to form statements. The subject is what you are making a statement about. The predicate is the relationship that the statement defines. The object is the resource with which the subject forms a relationship. You form these triples by adding attributes to your HTML. Some attributes are already defined as part of XHTML (see Table 3-3), while others are specific to RDFa (Table 3-4).

Table 3-3. XHTML attributes relevant to RDFa

Attribute | Explanation
rel | A predicate URI used for expressing a relationship between two resources.
rev | A predicate URI used for expressing a relationship between two resources in reverse.
content | An object literal used for supplying machine-readable content for a literal.
href | An object URI used for expressing the partner resource of a relationship.
src | An object URI used for expressing the partner resource of a relationship when the resource is embedded (e.g., an image).

Table 3-4. Attributes specific to RDFa

Attribute | Explanation
about | A subject URI used for expressing what the data is about. By default, the base URI for the document is the root URI for all statements.
property | A predicate URI used for expressing a relationship between the subject and some literal text.
resource | An object URI used to express a resource that is not visible in the document.
datatype | A URI for expressing a literal’s datatype. The datatype is defined as part of a vocabulary.
typeof | A URI for expressing the type of a subject. The type is defined as part of a vocabulary.

Because XHTML is extensible while HTML is not, RDFa has only been specified in the working draft for XHTML 1.1. Web developers can use RDFa markup inside HTML 4.01 without experiencing adverse effects in various browsers, since the designers of RDFa expected this use case. However, RDFa will not validate in HTML 4.01. RDFa attributes validate using the XHTML1.1+RDFa DTD.
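If you do want RDFa attributes to validate, the page needs to declare the RDFa-aware document type. The following is a sketch of what that declaration looks like, assuming the public and system identifiers published by the W3C for XHTML+RDFa 1.0; check the current specification before relying on them:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
   "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" version="XHTML+RDFa 1.0">
<head>
   <title>New Car Reviews</title>
</head>
<body>
   <!-- Markup annotated with RDFa attributes goes here -->
</body>
</html>
```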

RDFa statements built from a subject, predicate, and object are based on a vocabulary to help convey certain meanings. You can define a vocabulary yourself or use existing vocabularies that RDFa processors are likely to understand. One such vocabulary is the Dublin Core vocabulary. This vocabulary defines properties about common resources found in documents, such as title, creator, and subject.

Applying RDFa

Example 3-5 uses RDFa to enhance the information architecture that we presented in Example 3-3 for the New Car Reviews module. In Example 3-5, we have added RDFa attributes to annotate the three new car reviews. This produces three triples (see Table 3-5). For each statement, the subject (a URI) is defined by the about attribute added to each list item. The property attribute for each strong element specifies dc:title for each statement’s predicate. The object for each statement is the literal enclosed within each strong element itself. The value dc:title for the predicate comes from the Dublin Core vocabulary. To use this vocabulary, we have to define a namespace and refer to it using the xmlns:dc attribute, typically within a higher-level element of the page, such as the body element (see Example 3-6).

Example 3-5. The New Car Reviews module annotated using RDFa
<div id="nwcrev">
   <h3>
      New Car Reviews
   </h3>
   <cite>
      <a href="http://...">The Car Connection</a>
   </cite>
   <ul>
      <li class="beg" about="http://.../reviews/00001/">
         <p>
            <strong property="dc:title">2009 Honda Accord</strong>
            <em>(from $21,905)</em>.
         </p>
         <a href="http://.../reviews/00001/">Read the review</a>
      </li>
      <li class="mid" about="http://.../reviews/00002/">
         <p>
            <strong property="dc:title">2009 Toyota Prius</strong>
            <em>(from $22,000)</em>.
         </p>
         <a href="http://.../reviews/00002/">Read the review</a>
      </li>
      <li class="end" about="http://.../reviews/00003/">
         <p>
            <strong property="dc:title">2009 Nissan Altima</strong>
            <em>(from $19,900)</em>.
         </p>
         <a href="http://.../reviews/00003/">Read the review</a>
      </li>
   </ul>
   <form method="post" action="http://.../email/">
      <p>
         Get our most recent reviews each month:
      </p>
      <label for="nwcreveml">Email</label>
      <input type="text" id="nwcreveml" name="nwcreveml" value="" />
      <p class="action">
         <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
      </p>
   </form>
</div>
Example 3-6. Namespace definition for the Dublin Core vocabulary
<body xmlns:dc="http://purl.org/dc/elements/1.1/">
.
.
.
</body>
Table 3-5. Triples from the RDFa attributes in Example 3-5

Subject | Predicate | Object
http://.../reviews/00001/ | dc:title | 2009 Honda Accord
http://.../reviews/00002/ | dc:title | 2009 Toyota Prius
http://.../reviews/00003/ | dc:title | 2009 Nissan Altima

Example 3-7 presents a further enhancement to the information architecture presented in Example 3-3 for the New Car Reviews module. In Example 3-7, we have annotated the title and creator for the reviews as a whole. In addition, we have used the content attribute to change the object of the triple for each review. By doing so, we can provide something more descriptive than what appears in the markup. Altering the content like this can be useful when you need to use different representations of information in the human Web and the data Web (the human Web did not require this clarification in the title, for example). The enhancements in Example 3-7 produce five triples (see Table 3-6).

Example 3-7. The New Car Reviews module annotated further using RDFa
<div id="nwcrev" about="http://.../reviews/">
   <h3 property="dc:title">
      New Car Reviews
   </h3>
   <cite property="dc:creator">
      <a href="http://...">The Car Connection</a>
   </cite>
   <ul>
      <li class="beg" about="http://.../reviews/00001/">
         <p>
            <strong property="dc:title" content="Review for 2009 Honda Accord">2009 Honda Accord</strong>
            <em>(from $21,905)</em>.
         </p>
         <a href="http://.../reviews/00001/">Read the review</a>
      </li>
      <li class="mid" about="http://.../reviews/00002/">
         <p>
            <strong property="dc:title" content="Review for 2009 Toyota Prius">2009 Toyota Prius</strong>
            <em>(from $22,000)</em>.
         </p>
         <a href="http://.../reviews/00002/">Read the review</a>
      </li>
      <li class="end" about="http://.../reviews/00003/">
         <p>
            <strong property="dc:title" content="Review for 2009 Nissan Altima">2009 Nissan Altima</strong>
            <em>(from $19,900)</em>.
         </p>
         <a href="http://.../reviews/00003/">Read the review</a>
      </li>
   </ul>
   <form method="post" action="http://.../email/">
      <p>
         Get our most recent reviews each month:
      </p>
      <label for="nwcreveml">Email</label>
      <input type="text" id="nwcreveml" name="nwcreveml" value="" />
      <p class="action">
         <input type="submit" id="nwcrevsub" name="nwcrevsub" value="Sign Up" />
      </p>
   </form>
</div>
Table 3-6. Triples from the RDFa attributes in Example 3-7

Subject | Predicate | Object
http://.../reviews/ | dc:title | New Car Reviews
http://.../reviews/ | dc:creator | The Car Connection
http://.../reviews/00001/ | dc:title | Review for 2009 Honda Accord
http://.../reviews/00002/ | dc:title | Review for 2009 Toyota Prius
http://.../reviews/00003/ | dc:title | Review for 2009 Nissan Altima

In the end, both Example 3-5 and Example 3-7 provide additional meaning for our module because they go beyond the HTML markup to add annotations that tell a processor exactly what certain pieces of the markup are. So, instead of having to make an assumption about what the cite element represents in Example 3-7 (e.g., The Car Connection has been cited as having something to do with the division that encloses it), you now know specifically that The Car Connection is the creator of the content at http://.../reviews/, which is a resource with the title New Car Reviews.
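The triples so far all have literal objects. As a hedged sketch of the other case, the rel and href attributes from Table 3-3 can form a triple whose object is a URI. The choice of dc:publisher here is an assumption made for illustration and does not appear in the examples above:

```html
<!-- Assumes xmlns:dc is declared on an enclosing element, as in Example 3-6 -->
<div id="nwcrev" about="http://.../reviews/">
   <h3 property="dc:title">
      New Car Reviews
   </h3>
   <cite>
      <!-- rel supplies the predicate; href supplies a URI as the object -->
      <a rel="dc:publisher" href="http://...">The Car Connection</a>
   </cite>
   <!-- Remainder of the module as in Example 3-7 -->
</div>
```

Under these assumptions, a processor would read the link as a statement with the subject http://.../reviews/, the predicate dc:publisher, and the link's href as the object.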

The value of RDFa depends on the presence of processors that do something useful with the RDFa statements in web applications. With a groundswell of interest in using RDFa data at major websites such as Yahoo! and Google, it’s possible that modern web applications will soon be expected to provide relevant annotations as a matter of course.

HTML 5

As mentioned previously, at the time of this book’s publication, HTML 5 is still in its working draft form, so a consistent implementation hasn’t been agreed upon. However, it’s worth keeping in mind that HTML 5, whenever it does settle down, is likely to bring with it a new set of semantic tags for creating good information architecture using HTML. Table 3-7 presents some of the structural tags being proposed.

Table 3-7. Tags proposed for HTML 5 to help with structure

Tag | Explanation
article | Independent piece of content of a document.
aside | Content only slightly related to the rest of the page.
dialog | Marks up a conversation between multiple parties.
figure | Associates a caption with content.
footer | Groups content that typically appears at the bottom of a section.
header | Groups content that typically appears at the top of a section.
hgroup | Groups parts of a header when it has multiple levels.
nav | Section of a document intended for navigation.
section | Generic document or application section.
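To give a sense of how these tags might read in practice, here is a speculative sketch of the New Car Reviews module using section and header in place of the generic div. It is based on the working draft, so the tags and their meanings could still change:

```html
<!-- A sketch based on the HTML 5 working draft; support may vary or change -->
<section id="nwcrev">
   <header>
      <h3>
         New Car Reviews
      </h3>
      <cite>
         <a href="http://...">The Car Connection</a>
      </cite>
   </header>
   <ul>
      <li class="beg">
         <p>
            <strong>2009 Honda Accord</strong>
            <em>(from $21,905)</em>.
         </p>
         <a href="http://.../reviews/00001/">Read the review</a>
      </li>
      <!-- Remaining list items follow the same pattern as Example 3-3 -->
   </ul>
</section>
```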

HTML 5 proposes a number of other important changes to elements. Some examples of these changes include the following:

  • New elements for common types of data (e.g., canvas, meter, progress, and time)

  • New values for the type attribute of input elements to support common user interface components (e.g., url, email, and datetime)

  • New attributes for many elements

  • Changes in meanings for some elements and attributes to help reflect how they are used today

  • The removal of many elements and attributes that were deprecated in earlier HTML versions
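As one illustration of the new input types, the email field from Example 3-3 might be declared as follows. This is a sketch based on the working draft; only the value of the type attribute differs from the original markup:

```html
<!-- A sketch based on the HTML 5 working draft -->
<label for="nwcreveml">Email</label>
<input type="email" id="nwcreveml" name="nwcreveml" value="" />
```

Browsers that do not recognize a value for type generally treat the field as an ordinary text input, so this particular change tends to degrade gracefully.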

HTML 5 also proposes a number of changes and additions to various interfaces. For example, it introduces useful APIs (application programming interfaces) for creating web applications. These include a drawing API for use with the canvas element, an API for controlling audio and video, and an API for drag and drop, among others. HTML 5 also proposes extensions to some of the existing DOM interfaces.

Unfortunately, the lack of support for HTML 5 in the major browsers at this time makes it primarily something to keep an eye on for later. As tempting as it may be to start using some of the new elements in your markup, the risk of inconsistent behavior among the major browsers is still too great to employ them for now. However, look forward to HTML 5, because it is likely to include many features that will help you make an information architecture created in HTML more descriptive.
