Chapter 5. Semantics

Previous versions of HTML didn’t provide us with a huge number of elements to work with. We’ve had elements for paragraphs, lists, and headlines, but no way to mark up a defined article with ancillary information or footnotes. Nor have we had a way to declare that a certain collection of links and content was a main navigation feature of our site.

Clearly, this limitation hasn’t been a showstopper; just look at the amazing variety of websites out there. Even though HTML might not provide a specific element for marking up a particular piece of content, it provides just enough flexibility for us to get by.

To paraphrase Winston Churchill, HTML is the worst form of markup, except for all the others that have been tried.

EXTENSIBILITY

Other markup languages allow you to invent any element you want. In XML, if you want an event element or a price element, you just go right ahead and create it. The downside to this freedom is that you then have to teach a parser what event or price means. The advantage to HTML’s limited set of elements is that every user agent knows about every element. Browsers have a built-in knowledge of HTML. That wouldn’t be possible if we were allowed to make up element names.

HTML provides a handy escape clause that allows web designers to add more semantic value to elements: the class attribute. This attribute allows us to label specific instances of an element as being a special class or type of that element. The fact that browsers don’t understand the vocabulary we use in our class attributes doesn’t affect the rendering of our documents.

If, at this point, you’re thinking, “Wait a minute; aren’t classes for CSS?” then you’re half-right. The CSS class selector is one example of a technology that makes use of the class attribute, but it isn’t the only reason for using classes. Classes can also be used in DOM scripting. They can even be used by browsers if the class names follow an agreed-upon convention, as is the case with microformats.

Microformats and microdata

Microdata and microformats are two ways of describing our content, each with its own vocabularies, allowing parsers to better understand and use the content of our documents.

Microformats are a set of formatting conventions agreed upon by a community. These formats use the class attribute to plug some of the more glaring holes in HTML: hCard for contact details, hCalendar for events, hAtom for news stories. Because there is a community consensus on what class names to use, there are now parsers and browser extensions that work with those specific patterns.
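To make that concrete, here is a minimal sketch of contact details marked up with hCard. The class names vcard, fn, org, and url come from the hCard format; the pairing of person and organization is only illustrative, and the choice of div, a, and span is an assumption, since hCard cares about the class names rather than the tags:

```html
<!-- A minimal hCard sketch: the class names carry the meaning. -->
<div class="vcard">
  <!-- fn = formatted name; url = the person's website -->
  <a class="fn url" href="http://adactio.com/">Jeremy Keith</a>
  <!-- org = the organization the person belongs to (illustrative) -->
  <span class="org">Clearleft</span>
</div>
```

A browser renders this exactly as it would without the class names; only a parser that knows the hCard convention sees a contact record.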

Microformats are limited by design. They don’t attempt to solve every possible use case. Instead, they aim for the low-hanging fruit. They solve 80% of the use cases with 20% of the effort. Deciding what qualifies as low-hanging fruit is pretty straightforward: just look at what kind of content people are already marking up. In other words, pave the cowpaths.

Sound familiar? Microformats and HTML5 are built on very similar philosophies—they can both be defined as conventions agreed upon by a community.

Boiling the ocean

The way the microformats process has been used as a template for developing HTML5 isn’t to everyone’s taste. While the 80/20 rule is good enough for the rough ’n’ ready world of class names, is it really good enough for the most important markup language in the world?

Some people feel that HTML needs to be infinitely extensible. That means it isn’t enough to provide solutions for the majority of use cases; the language must provide a solution for any possible use case.

Perhaps the most eloquent argument for this kind of extensibility came from John Allsopp in his superb A List Apart article, “Semantics in HTML5” (http://bkaprt.com/html52e/04-01/):

We don’t need to add specific terms to the vocabulary of HTML, we need to add a mechanism that allows semantic richness to be added to a document as required.

Technologies already exist to do just that. RDFa allows authors to embed custom vocabularies within HTML documents. But unlike microformats, which simply use an agreed-upon set of class names, RDFa uses namespaces to allow an infinite variety of formats. So where a microformat might use markup such as h1 class="summary", RDFa would use h1 property="myformat:summary".

There’s no doubt that RDFa is potentially very powerful, but its expressiveness comes at a price. Namespaces introduce an extra layer of complexity that doesn’t sit well with the relatively simple nature of HTML.

The namespace debate isn’t new. In a blog post from a few years back, Mark Nottingham mused on the potentially destructive side effects (http://bkaprt.com/html52e/04-02/):

What I found interesting about HTML extensibility was that namespaces weren’t necessary; Netscape added blink, MSFT added marquee, and so on. I’d put forth that having namespaces in HTML from the start would have had the effect of legitimizing and institutionalizing the differences between different browsers instead of (eventually) converging on the same solution.

Rather than infinite extensibility, that’s a powerful argument for a limited vocabulary based on community consensus.

The WHATWG developed microdata, which serves a similar purpose to microformats but uses dedicated attributes (itemscope, itemtype, and itemprop) rather than class names; it did not become part of the W3C snapshot of the specification. There is also a microformats2 specification, which updates the original microformats. You might also come across the vocabulary detailed on schema.org, which can be expressed in microdata.

At first glance, this all perhaps seems rather academic, and a mess of competing formats. After all, marking up content this way may have no benefit at all to site visitors, unless they happen to be using a browser plugin that can detect these formats. Microformats and microdata vocabularies are interesting once parsers are written for them; simply marking up content this way only has use once something is looking for it.

One avenue through which we will all encounter the use of these vocabularies is through search engines. Google encourages developers to use schema.org vocabularies to add “structured data markup” for better page indexing (http://bkaprt.com/html52e/04-03/).
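As a rough sketch of what that structured data can look like, here is an event described with microdata and the schema.org Event type. The itemscope, itemtype, and itemprop attributes and the name and startDate properties are real; the event name and date are hypothetical placeholders:

```html
<!-- Microdata sketch using the schema.org Event type. -->
<div itemscope itemtype="https://schema.org/Event">
  <!-- name: the human-readable title of the event (hypothetical) -->
  <h2 itemprop="name">An Event Apart Seattle</h2>
  <!-- startDate: read from the machine-readable datetime value (hypothetical date) -->
  <time itemprop="startDate" datetime="2010-04-05">April 5th, 2010</time>
</div>
```

As with microformats, the markup only pays off once something, such as a search engine's crawler, goes looking for it.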

NEW ELEMENTS

HTML5 introduces a handful of new inline elements to augment our existing arsenal of span, strong, em, abbr, and so on. Oh, and we don’t call them “inline” anymore. Instead, these elements describe text-level semantics.

mark

When browsing a list of search results, you’ll often see the search term highlighted within each result. You could mark up each instance of the search term with a span element, but span is a semantically meaningless crutch, good for little more than attaching classes for styling.

You could use em or strong, but that wouldn’t be semantically accurate. You don’t want to place any importance on the search term; you simply want it to be highlighted somehow.

Enter the mark element:

<h1>Search results for 'unicorn'</h1>

<ol>

  <li><a href="http://clearleft.com/">

  Riding the UX <mark>unicorn</mark> across the rainbow of the web.

  </a></li>

</ol>

The mark element doesn’t attach any importance to the content within it, other than to show that it’s currently of interest. As the specification says, mark denotes “a run of text in one document marked or highlighted for reference purposes, due to its relevance in another context.”

The mark element is permitted in contexts other than search results, but we’re hard-pressed to think of an example.

time

hCalendar is one of the most popular microformats because it scratches a very common itch: marking up events so that users can add them straight to their calendar.

The only tricky bit in hCalendar is describing dates and times in a machine-readable way. Humans like to describe dates as “May 25th” or “next Wednesday,” but parsers expect a nicely formatted ISO date: YYYY-MM-DDThh:mm:ss.

The microformats community came up with some clever solutions to this problem, such as using the abbr element:

<abbr class="dtstart" title="1992-01-12">

  January 12th, 1992

</abbr>

If using the abbr element in this way makes you feel a little queasy, there are plenty of other ways of marking up machine-readable dates and times in microformats using the value-class pattern. In HTML5, the issue is solved with the new time element:

<time class="dtstart" datetime="1992-01-12">

  January 12th, 1992

</time>

The time element can be used for dates, times, or combinations of both:

<time datetime="17:00">5pm</time>

<time datetime="2010-04-07">April 7th</time>

<time datetime="2010-04-07T17:00">5pm on April 7th</time>

You don’t have to put the datetime value inside the datetime attribute, but if you don’t, then you must expose the value to the end user:

<time>2010-04-07</time>

meter

The meter element can be used to mark up measurements, provided that those measurements are part of a scale with minimum and maximum values.

<meter>9 out of 10 cats</meter>

You don’t have to expose the maximum value if you don’t want to. You can use the max attribute instead:

<meter max="10">9 cats</meter>

There’s a corresponding min attribute. You also get high, low, and optimum attributes to play with. If you want, you can even hide the measurement itself inside a value attribute.

<meter min="-273" max="100" low="12" high="30" optimum="21" value="25">

  It's quite warm for this time of year.

</meter>

progress

While meter is good for describing something that has already been measured, the progress element allows you to mark up a value that is in the process of changing:

Your profile is <progress>60%</progress> complete.

Here you have max and value attributes if you want to use them. Unlike meter, progress has no min attribute; its range always starts at zero:

<progress max="100" value="60"></progress>

The progress element is most useful when used in combination with DOM scripting. You can use JavaScript to dynamically update the value, allowing the browser to communicate that change to the user—very handy for Ajax file uploads.
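Here is a minimal sketch of that kind of scripted update. It assumes a hypothetical upload that reports progress in steps of ten; a timer stands in for the real upload events, and the id upload-progress is made up for this example:

```html
<p>Upload: <progress id="upload-progress" max="100" value="0">0%</progress></p>

<script>
  // Hypothetical stand-in for a real upload: a timer that bumps the value.
  var bar = document.getElementById('upload-progress');

  var timer = setInterval(function () {
    // Never let the value run past the maximum.
    bar.value = Math.min(bar.value + 10, bar.max);

    // Keep the fallback text in step for browsers without progress support.
    bar.textContent = bar.value + '%';

    // Stop once the bar is full.
    if (bar.value >= bar.max) {
      clearInterval(timer);
    }
  }, 500);
</script>
```

In a real page you would set the value from whatever progress information your upload code provides, rather than from a timer.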

STRUCTURE

Back in 2005, Google did some research to find out what kind of low-hanging fruit could be found on the cowpaths of the web (http://bkaprt.com/html52e/04-04/).

A parser looked at over a billion web pages and tabulated the most common class names. The results were unsurprising. Class names such as “header,” “footer,” and “nav” were prevalent. These emergent semantics map nicely to some of the new structural elements introduced in HTML5.

section

The section element is used for grouping together thematically related content. That sounds a lot like the div element, which is often used as a generic content container. The difference is that div has no semantic meaning; it doesn’t tell you anything about the content within. The section element, on the other hand, is used explicitly for grouping related content.

You might be able to replace some of your div elements with section elements, but remember to always ask yourself, “Is all of the content related?”

<section>

  <h1>DOM Scripting</h1>

  <p>The book is aimed at designers rather than programmers.</p>

  <p>By Jeremy Keith</p>

</section>

If your chunk of content does not have a natural heading identifying the theme of the section, then it is likely that it is not really a section, and a div might be a better choice.

If the chunk of content could be independently syndicated—for example, a blog post or article that would still make sense if it were sent in an email with no other context surrounding it—then it is likely the article element would be a better choice.

header

The HTML5 spec describes the header element as a container for “a group of introductory or navigational aids.” That sounds reasonable. That’s the kind of content one would expect to find in a masthead, and the word header is often used as a synonym for masthead.

There’s a crucial difference between the header element in HTML5 and the generally accepted use of the word header or masthead. There’s usually only one masthead in a page, but a document can have multiple header elements. You can use the header element within a section element, for example. In fact, you probably should use a header within a section. The specification describes the section element as “a thematic grouping of content, typically with a heading.”

<section>

  <header>

    <h1>DOM Scripting</h1>

  </header>

  <p>The book is aimed at designers rather than programmers.</p>

  <p>By Jeremy Keith</p>

</section>

A header will usually appear at the top of a document or section, but it doesn’t have to. It is defined by its content—introductory or navigational aids—rather than its position.

footer

Like the header element, footer sounds like it’s a description of position but, as with header, this isn’t the case. Instead, the footer element should contain information about its containing element: who wrote it, copyright information, links to related content, etc.

That maps quite nicely onto the mental model that web designers have for the word footer. The difference is that, whereas we are used to having one footer for an entire document, HTML5 also allows us to have footers within sections.

<section>

  <header>

    <h1>DOM Scripting</h1>

  </header>

  <p>The book is aimed at designers rather than programmers.</p>

  <footer>

    <p>By Jeremy Keith</p>

  </footer>

</section>

aside

Just as the header element matches the concept of a masthead, the aside element matches the concept of a sidebar. But sidebar doesn’t refer to position; just because some content appears to the left or to the right of the main content isn’t reason enough to use the aside element. Once again, it’s the content that matters, not the position.

The aside element should be used for tangentially related content. If you have a chunk of content that you consider to be separate from the main content, then the aside element is probably the right container for it. Ask yourself if the content within an aside could be removed without reducing the meaning of the main content of the document or section.

The aside element takes on a different meaning if it is nested inside an article. In that case, it should be content that directly relates to the article, such as a pull quote.

Remember, just because your visual design calls for some content to appear in a sidebar doesn’t necessarily mean that aside is the correct containing element. It’s quite common, for example, to place an author bio in a sidebar. That kind of data is best suited to the footer element—the specification explicitly mentions authorship information as being suitable for footers (FIG 5.01).

FIG 5.01: The “about the author” text in this screenshot should be marked up with footer, not aside.
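A short sketch may help separate the two cases. The review text and reviewer name are borrowed from the article example later in this chapter; the pull quote and the exact arrangement are assumptions made for illustration:

```html
<article>
  <h1>DOM Scripting review</h1>

  <p>A small lighthouse for what has been a long and sometimes dark voyage for JavaScript.</p>

  <!-- Nested in the article, so it relates directly to it: a pull quote. -->
  <aside>
    <q>A small lighthouse</q>
  </aside>

  <!-- Authorship information belongs in a footer, wherever the design places it. -->
  <footer>
    <p>Reviewed by Glenn Jones</p>
  </footer>
</article>
```

Your stylesheet is free to float both the aside and the footer into a visual sidebar; the markup describes what the content is, not where it sits.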

Ninety percent of the time, header elements will be positioned at the top of your content, footer elements will be positioned at the end of your content, and aside elements will be positioned to one side. But don’t get complacent. Stay on your toes and watch out for the remaining ten percent.

nav

The nav element does exactly what you think it does: it contains navigation information, usually a list of links.

Actually, let’s clarify that. The nav element is intended for major navigation information. Just because a fistful of links are grouped together in a list isn’t enough reason to use the nav element. Site-wide navigation, on the other hand, almost certainly belongs in a nav element.

Quite often, a nav element will appear within a header element. That makes sense when you consider that the header element can be used for “navigational aids.”
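A minimal sketch of that arrangement might look like this. The site title is borrowed from a later example in this chapter, and the link targets are hypothetical:

```html
<header>
  <h1>My awesome blog</h1>

  <!-- Site-wide navigation: a strong candidate for nav. -->
  <nav>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/archive/">Archive</a></li>
      <li><a href="/about/">About</a></li>
    </ul>
  </nav>
</header>
```

A handful of related links at the end of a single blog post, by contrast, would probably stay as a plain list.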

article

It’s helpful to think of header, footer, nav, and aside as being specialized forms of the section element. A section is a generic chunk of related content, while headers, footers, navs, and asides are chunks of specific kinds of related content.

The article element is another specialized kind of section. Use it for self-contained related content. Now the tricky part is deciding what constitutes “self-contained.”

Ask yourself if you would syndicate the content in an RSS or Atom feed, or send it in an email to someone without other context from the page or site. If the content still makes sense in that context, then article is probably the right element to use. In fact, the article element is specifically designed for syndication.

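Earlier drafts of HTML5 let you add an optional Boolean pubdate attribute to a time element within an article, to indicate that it contains the date of publication. The attribute has since been dropped from the specification, but you may still come across it in markup like this: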

<article>

  <header>

    <h1>DOM Scripting review</h1>

  </header>

  <p>A small lighthouse for what has been a long and sometimes dark voyage for JavaScript.</p>

  <footer>

    <p>Published

      <time datetime="2005-10-08T15:13" pubdate>

      3:13pm on October 8th, 2005

      </time>

    by Glenn Jones</p>

  </footer>

</article>

If you have more than one time element within an article, only one of them can have the pubdate attribute.

The article element is useful for blog posts, news stories, comments, reviews, and forum posts. It covers exactly the same use cases as the hAtom microformat.

The HTML5 specification goes further than that. It also declares that the article element should be used for self-contained widgets: stock tickers, calculators, clocks, weather widgets, and the like.

It may seem unintuitive that an element named “article” should apply to the construct known as “widget.” Then again, both articles and widgets are self-contained, syndicatable kinds of content.

More problematic is that article and section are so very similar. All that separates them is the concept of self-containment. Deciding which element to use would be easy if there were some hard and fast rules. Instead, it’s a matter of interpretation. The WHATWG says:

A section forms part of something else. An article is its own thing. But how does one know which is which? Mostly the real answer is “it depends on author intent.” (http://bkaprt.com/html52e/04-05/)

You can have multiple article elements within a section; you can have multiple section elements within an article; you can nest sections within sections and articles within articles. It’s up to you to decide which element is the most semantically appropriate in any given situation.
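As one possible reading of those rules, here is a sketch of a blog post with comments. The comment text and the commenter are hypothetical; treating each comment as a nested article follows the list of use cases given above:

```html
<article>
  <h1>Cheese sandwich</h1>

  <p>My cat ate a cheese sandwich.</p>

  <!-- A section: part of the post, not something you would syndicate alone. -->
  <section>
    <h2>Comments</h2>

    <!-- Each comment is its own thing, so each one is an article. -->
    <article>
      <p>Was it a toasted sandwich?</p>

      <footer>
        <p>Posted by a hypothetical reader</p>
      </footer>
    </article>
  </section>
</article>
```

Another author might reasonably mark up the same content differently; as the quote above says, it depends on intent.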

A cure for div-itis?

HTML5 gives us the handful of new structural elements described above. They’re especially handy if you’re putting together a conventional site, such as a blog. Most blog designs consist of a header followed by a series of articles, with some tangential content in an aside, finished off with a footer (FIG 5.02).

You can now replace some of your div elements with more semantically precise structural elements. Don’t go overboard, though. Chances are, if you are using a div today, you will still be using a div tomorrow. Don’t swap your div elements for shiny new HTML5 elements just for the sake of it. Think about the content. These new elements weren’t created just to replace div elements. They provide web browsers with a completely new way of understanding your content.

FIG 5.02: Jeremy Keith’s adactio.com, showing typical elements included in a blog.

CONTENT MODELS

Previous flavors of markup divided elements into two categories: inline and block. HTML5 uses a more fine-grained approach, dividing elements into a wider range of categories.

Inline elements now have a content model of “text-level semantics.” Many block-level elements now fall under the banner of “grouping content”: paragraphs, list items, divs, and so on. Forms have their own separate content model. Images, audio, video, and canvas are all “embedded content.” The new structural elements introduce a completely new content model called “sectioning content.”

Sectioning content

It’s possible to create an outline of an HTML document using the heading elements, h1 through h6. Take a look at this markup, for example:

<h1>An Event Apart</h1>

<h2>Cities</h2>

<p>Join us in these cities in 2010.</p>

<h3>Seattle</h3>

<p>Follow the yellow brick road to the emerald city.</p>

<h3>Boston</h3>

<p>That's Beantown to its friends.</p>

<h3>Minneapolis</h3>

<p>It's so <em>nice</em>.</p>

<small>Accommodation not provided.</small>

That gives us this outline:

An Event Apart
  Cities
    Seattle
    Boston
    Minneapolis

This works well enough. Any content that follows a heading element is presumed to be associated with that heading.

Now look at the final small element. That should be associated with the entire document. But a browser has no way of knowing that. There’s no way of knowing that the small element shouldn’t fall under the heading “Minneapolis.”

The new sectioning content in HTML5 allows you to explicitly demarcate the start and the end of related content:

<h1>An Event Apart</h1>

<section>

  <header>

    <h2>Cities</h2>

  </header>

  <p>Join us in these cities in 2010.</p>

  <h3>Seattle</h3>

  <p>Follow the yellow brick road.</p>

  <h3>Boston</h3>

  <p>That's Beantown to its friends.</p>

  <h3>Minneapolis</h3>

  <p>It's so <em>nice</em>.</p>

</section>

<small>Accommodation not provided.</small>

Now it’s clear that the small element falls under the heading “An Event Apart” rather than “Minneapolis.”

We can subdivide this content even further, placing each city in its own section:

<h1>An Event Apart</h1>

<section>

  <header>

    <h2>Cities</h2>

  </header>

  <p>Join us in these cities in 2010.</p>

  <section>

    <header>

      <h3>Seattle</h3>

    </header>

    <p>Follow the yellow brick road.</p>

  </section>

  <section>

    <header>

      <h3>Boston</h3>

    </header>

    <p>That's Beantown to its friends.</p>

  </section>

  <section>

    <header>

      <h3>Minneapolis</h3>

    </header>

    <p>It's so <em>nice</em>.</p>

  </section>

</section>

<small>Accommodation not provided.</small>

That still gives us the same outline:

An Event Apart
  Cities
    Seattle
    Boston
    Minneapolis

The outline algorithm

So far, the new sectioning content isn’t giving us much more than what we could do with previous versions of HTML. Here’s the kicker: in HTML5, each piece of sectioning content has its own self-contained outline. That means that, according to the specification, you don’t have to keep track of what heading level you should be using—you can just start from h1 each time:

<h1>An Event Apart</h1>

<section>

  <header>

    <h1>Cities</h1>

  </header>

  <p>Join us in these cities in 2010.</p>

  <section>

    <header>

      <h1>Seattle</h1>

    </header>

    <p>Follow the yellow brick road.</p>

  </section>

  <section>

    <header>

      <h1>Boston</h1>

    </header>

    <p>That’s Beantown to its friends.</p>

  </section>

  <section>

    <header>

      <h1>Minneapolis</h1>

    </header>

    <p>It's so <em>nice</em>.</p>

  </section>

</section>

<small>Accommodation not provided.</small>

In previous versions of HTML, this would have produced an inaccurate outline:

An Event Apart
Cities
Seattle
Boston
Minneapolis

In HTML5, the outline is accurate:

An Event Apart
  Cities
    Seattle
    Boston
    Minneapolis

Sectioning roots

Some elements are invisible to the generated outline. In other words, it doesn’t matter how many headings you use within these elements, they won’t appear in the document’s outline.

The blockquote, fieldset, and td elements are all immune to the outline algorithm. These elements are called “sectioning roots”—not to be confused with sectioning content.
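To illustrate, here is a sketch based on the earlier cities markup, with a hypothetical quotation added. As the specification describes the algorithm, the heading inside the blockquote should not show up in the surrounding document's outline:

```html
<h1>An Event Apart</h1>

<section>
  <h2>Cities</h2>

  <p>Join us in these cities in 2010.</p>

  <!-- blockquote is a sectioning root: its headings stay out of the outer outline. -->
  <blockquote>
    <h1>A quoted headline</h1>
    <p>This heading belongs to the quoted source, not to our document.</p>
  </blockquote>
</section>
```

Under that reading, the outline contains only “An Event Apart” and, nested beneath it, “Cities.”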

Portability

Because each piece of sectioning content generates its own outline, you can now get far more heading levels than simply h1 through h6. There is no limit to how deep your heading levels can go. More importantly, you can start to think about your content in a truly modular way.

Suppose you have a blog post entitled “Cheese sandwich.” Before HTML5, you would need to know the context of the blog post in order to decide which heading level to use for the title of the post. If the post were on the front page, then it would appear after an h1 element containing the title of the blog:

<h1>My awesome blog</h1>

<h2><a href="cheese.html">Cheese sandwich</a></h2>

<p>My cat ate a cheese sandwich.</p>

But if you were publishing the blog post on its own page, then you would want the title of the blog post to be a level-one heading:

<h1>Cheese sandwich</h1>

<p>My cat ate a cheese sandwich.</p>

In HTML5, you don’t have to worry about which heading level to use. You just need to use sectioning content—an article element, in this case:

<article>

  <h1>Cheese sandwich</h1>

  <p>My cat ate a cheese sandwich.</p>

</article>

Now the content is truly portable. It doesn’t matter whether it’s appearing on its own page or on the homepage:

<h1>My awesome blog</h1>

<article>

  <h1>Cheese sandwich</h1>

  <p>My cat ate a cheese sandwich.</p>

</article>

HTML5’s new outline algorithm produces the correct result:

My awesome blog
  Cheese sandwich

Scoped styles

The fact that each piece of sectioning content has its own outline makes it the perfect match for Ajax. Yet again, HTML5 displays its provenance as a specification for web applications.

Trying to port a piece of content from one document into another introduces some problems. The CSS rules being applied to the parent document will also apply to the inserted content. That’s currently one of the challenges in distributing widgets on the web.

HTML5 offers a solution to this problem in the shape of the scoped attribute, which can be applied to a style element. Any styles declared within that style element will only be applied to the containing sectioning content:

<h1>My awesome blog</h1>

<article>

  <style scoped>

    h1 { font-size: 75% }

  </style>

  <h1>Cheese sandwich</h1>

  <p>My cat ate a cheese sandwich.</p>

</article>

In that example, only the second h1 element will have a font-size value of 75%. That’s the theory, anyway. Firefox is the only browser to support the scoped attribute at the present time; it was added into Chrome behind the Experimental Web Platform features flag, yet has now been removed due to code complexity. And, at present, no browsers support the outline functionality.

Therein lies the rub. Before you can start using a new addition to HTML5, you need to consider the browser support for that feature. As HTML5 is a living standard, there will always be new features being added; this will always be an ongoing process. We have a few strategies to help you use HTML5 in a forward- and backward-compatible way. We’ll share them with you in the next and final chapter.
