2. Structure and Semantics for Documents

Both the previously mentioned MAMA survey conducted by Opera and Google’s study of Web Authoring Statistics of 2005 (http://code.google.com/webstats) conclude that it was common practice at that time to mark up the page structure of websites with the class or id attribute. Frequently used attribute values were footer, content, menu, title, header, top, main, and nav. It therefore made sense to incorporate this practice into the new HTML5 specification and to create new elements for structuring pages.

The result is a compact set of new structural elements (header, hgroup, article, section, aside, footer, and nav, among others) that make a clear page structure possible without detours via class or id. To illustrate this, we will use a fictitious and not entirely serious HTML5 blog entry that ventures a look ahead to the year 2022 (see Figure 2.1). Please concentrate less on the content of the post and more on the document structure.

Figure 2.1 The fictitious HTML5 blog

Before we analyze the source code of the HTML5 blog in detail, here are a few important links. The first is to the specification HTML: The Markup Language Reference, referred to below simply as the markup specification, at http://www.w3.org/TR/html-markup.

In it, Mike Smith, editor and team contact of the W3C HTML WG, lists each element’s definition, any restrictions, valid attributes and DOM interfaces, plus default formatting rules in CSS notation where applicable. It is a valuable resource that we will use repeatedly. The HTML5 specification itself describes the new structural elements in the following chapter: http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html

The .html and .css files to go with the HTML5 blog are of course also available online at:

• http://html5.komplett.cc/code/chap_structure/blog_en.html

• http://html5.komplett.cc/code/chap_structure/blog.css

At first glance, Figure 2.1 shows four distinct areas: a header, the article, the footer, and a sidebar. All of the new structural elements appear in these four areas and, together with a few short rules in the stylesheet blog.css, determine the page structure and layout.
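A minimal sketch of how the four areas might nest is shown below. It is a simplified illustration, not the actual content of blog_en.html: the wrapper div, the comments, the "..." placeholders, and the placement of the classes shadow and rounded-bottom-right (introduced in section 2.4) are assumptions. The aside comes before the article here because the outline in section 2.5 lists the sidebar blocks first.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>The HTML5 blog!</title>
  <link rel="stylesheet" href="blog.css">
</head>
<body>
  <!-- Header: logo plus hgroup with title and subtitle -->
  <header class="shadow">...</header>

  <!-- Sidebar: floated right, loosely related blocks -->
  <aside class="shadow rounded-bottom-right">...</aside>

  <!-- Content: wrapped in a div, since HTML5 has no content element -->
  <div>
    <article class="shadow">...</article>
  </div>

  <!-- Footer: copyright and navigation -->
  <footer class="shadow">...</footer>
</body>
</html>
```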

2.1 Header with “header” and “hgroup”

In the header we encounter the first two new elements: header and hgroup. Figure 2.2 shows the markup and the presentation of the header:

Figure 2.2 The basic structure of the HTML5 blog header

In the markup specification, header denotes a container for headings plus additional introductory content or navigational aids. A header is not restricted to the top of the page; it can also be used elsewhere in the document. What is not allowed is nesting one header inside another or placing a header inside an address or footer element.

In our case the header of the HTML5 blog consists of a header element containing the logo as an img element and an hgroup element that groups two headings: an h1 for the blog title and an h2 for the subtitle.

Whereas it used to be common practice to place h1 and h2 elements directly one after the other to indicate a title and subtitle, this is no longer allowed in HTML5; such headings now have to be grouped with hgroup. The rank of the hgroup element in the outline is determined by its highest-ranked heading. The hgroup element may contain only heading elements, that is, a combination of h1 to h6.
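The header described above could be marked up roughly as follows. This is a sketch: the logo file name and alt text are hypothetical, and only the title and subtitle are taken from this chapter.

```html
<header>
  <!-- Logo; file name and alt text are placeholders -->
  <img src="images/logo.png" alt="HTML5 logo">
  <!-- hgroup: only the h1 appears in the outline, the h2 acts as subtitle -->
  <hgroup>
    <h1>The HTML5 blog!</h1>
    <h2>Tips, tricks &amp; tidbits for today's web developers</h2>
  </hgroup>
</header>
```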

The markup specification reveals a small but important detail: like all other structural elements, header is to be formatted as display: block in CSS. This rule ensures that even browsers that do not know the new tags can be persuaded to display them correctly. Internet Explorer 8 needs a little extra help, because it ignores styles on unknown elements until they have been created once via JavaScript; a few lines of code are enough to teach it our new header element.

There is, of course, also a more complete JavaScript library for this workaround that covers not only header but many other new HTML5 elements as well. Remy Sharp provides it for Internet Explorer at http://code.google.com/p/html5shim.
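The workaround usually looks like the sketch below: each unknown element is created once with document.createElement before the browser parses the body, and a display: block rule makes it behave like a block element. Restricting the script to older Internet Explorer versions with a conditional comment is one common way of doing this and is an assumption, not necessarily the exact code used in the blog.

```html
<head>
  <style>
    /* Block formatting as recommended by the markup specification */
    header { display: block; }
  </style>
  <!--[if lt IE 9]>
  <script>
    // Creating the element once makes IE 8 and earlier
    // apply CSS rules to <header> tags parsed afterwards
    document.createElement("header");
  </script>
  <![endif]-->
</head>
```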

Note

In computing, the term shim describes a compatibility workaround for an application. The term shiv is often, and wrongly, used instead. It was coined by John Resig, the creator of jQuery, in a post of that title (http://ejohn.org/blog/html5-shiv). Whether he actually meant shim remains unknown.

As far as CSS is concerned, there is nothing special about the header: the logo is positioned with float: left, the vertical space between the h1 and h2 headings is reduced slightly, and the subtitle is italicized.
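The three header rules just described might look like this. The selectors and margin values are assumptions for illustration; blog.css may use different ones.

```html
<style>
  /* Logo on the left, headings flow around it */
  header img { float: left; }
  /* Reduce the vertical gap between title and subtitle */
  header hgroup h1 { margin-bottom: 0; }
  header hgroup h2 { margin-top: 0.2em; font-style: italic; }
</style>
```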

2.2 Content with “article”

The article element represents a self-contained area within a web page, such as a news item, a blog entry, or similar content. In our case the blog entry consists of such an article element containing an img element to liven things up, an h2 heading for the headline, a time element and an address element for the creation date and the copyright, plus three paragraphs that also use q and cite elements for the protagonists’ quotations.

Although content ranked right at the top in the Google and Opera analyses, a content element did not make it into HTML5 for some reason. For lack of such an element, our blog entry is embedded in a surrounding div (see Figure 2.3), so nothing stands in the way of adding further articles:

Figure 2.3 The basic structure of the HTML5 blog content

By definition, the address element contains contact information. Contrary to a common misconception, this does not refer only to a postal address but to any information about the contact person, such as name, company, and position. For other postal addresses, the specification recommends using p. The address element applies to its nearest article ancestor; if there is none, it applies to the whole document. The time element behaves similarly with its attributes pubdate and datetime, which together provide the timestamp for our blog entry. You will find details on this in section 2.7.2, The “time” Element.

If article elements are nested, the inner article should in principle be thematically related to the outer one. In our case, an example of such nesting would be adding comments on the post as sub-articles of the blog entry.
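Put together, the blog entry could be marked up roughly as in the following sketch. The headline, image, author, date, and paragraph text are hypothetical placeholders, not the blog’s actual content.

```html
<div>
  <article>
    <!-- Decorative image, floated left via CSS -->
    <img src="images/article.png" alt="">
    <h2>Headline of the blog entry</h2>
    <p>
      <!-- Machine-readable publication date of this article -->
      <time datetime="2022-02-14" pubdate>February 14, 2022</time>
    </p>
    <!-- Contact information; applies to the nearest article -->
    <address>&copy; 2022 Jane Doe</address>
    <p>First paragraph with a quotation:
      <q>The browser adds the quotation marks.</q></p>
    <p>Second paragraph ...</p>
    <p>Third paragraph ...</p>
  </article>
  <!-- Further articles can follow here -->
</div>
```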
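A comment nested inside the blog entry might be marked up as follows. The grouping section, comment text, author name, and timestamp are hypothetical.

```html
<article>
  <h2>Headline of the blog entry</h2>
  <p>Text of the blog entry ...</p>

  <section>
    <h3>Comments</h3>
    <!-- Each comment is a nested article related to the outer one -->
    <article>
      <footer>
        <!-- pubdate here refers to the comment, its nearest article -->
        <p>Comment by John Smith,
          <time datetime="2022-02-15T09:30+01:00" pubdate>February 15, 2022</time></p>
      </footer>
      <p>Thanks for the summary!</p>
    </article>
  </section>
</article>
```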

Regarding CSS styling, article once again requires display: block; the content width is reduced to 79% via the surrounding div, which also cancels the logo’s float: left with clear: left. The italic author information results from the default formatting of address and is not created with em. The picture is anchored on the left with float: left, the text is justified with text-align: justify, and quotations are marked up with the q element. One interesting detail is that the quotation marks are not part of the markup but are added automatically by the browser via the CSS pseudo-elements :before and :after, in accordance with the style rules for the q element. Once again, the CSS notation reflects the markup specification:

/* Style rule for the q-element: */
q { display: inline; }
q:before { content: '"'; }
q:after { content: '"'; }

2.3 Footer with “footer” and “nav”

In the footer of our HTML5 blog, we find two other new structural elements: footer and nav (see Figure 2.4). The former creates the frame, and the latter provides navigation to other areas of the website. footer contains additional information about its section, such as who wrote it (marked up as address, of course), whether there are related pages, what readers need to be aware of (copyright, disclaimer), and so on.

Figure 2.4 The basic structure of the HTML5 blog footer

Unlike the human body, where the head is at the top and the feet are at the bottom, a footer does not have to appear at the end of the document; it can also, for example, be part of an article element. What is not allowed is nesting footer elements or placing a footer inside a header or address element.

For navigation blocks that link to jump labels within the document or to related external pages, you can use nav. Like footer, nav can also appear in other parts of the document, as you will see in section 2.4, Sidebar with “aside” and “section”. The only restriction is that nav must not appear inside an address element.

As for CSS, our HTML5 blog’s footer has a few special features. The entire footer is colored in the same light gray as the page background, and only the links get background-color: white. The copyright in the first p requires float: left, the navigation requires text-align: right, and the h3 heading in the nav block is hidden with display: none. Why there is an h3 element in there at all will become clear in section 2.5, The Outline Algorithm. To improve the style of the links, each of them is wrapped in a div. And of course we have display: block for footer and nav, plus a reduction of the footer element’s width to 79%.
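A footer along these lines might be written as follows. The copyright text and link targets are hypothetical; the hidden h3 heading Navigation matches the outline shown in section 2.5, and the div wrappers around the links correspond to the styling described for the blog.

```html
<footer>
  <!-- Floated left via CSS -->
  <p>&copy; 2022 The HTML5 blog</p>
  <nav>
    <!-- Hidden with display: none, but names the nav in the outline -->
    <h3>Navigation</h3>
    <div><a href="#">Home</a></div>
    <div><a href="#">Archive</a></div>
    <div><a href="#">Contact</a></div>
  </nav>
</footer>
```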

2.4 Sidebar with “aside” and “section”

For areas of a page that are only loosely related to the main content and can therefore be regarded as separate entities, we can use the aside element. In our example, it creates a classic sidebar on the right with three blocks: Questionnaire, Login, and Quick Links. While the link list is implemented as nav, as you would expect, the first two blocks are embedded in another new element: section.

The section element marks up a thematically coherent part of a document, typically with a heading, for example, the chapters of an essay or the individual tabs of a tabbed page. If section is used within footer, it usually holds appendices, indexes, license agreements, or the like. As a rule of thumb, section is appropriate if the content would also belong in a table of contents. In our example, as shown in Figure 2.5, the Questionnaire and the Login are marked up with section, and the links with nav, as mentioned earlier:

Figure 2.5 The basic structure of the HTML5 blog sidebar

For the same reason as with the nav block in the footer (see section 2.5), the sidebar contains an h2 heading directly before the first block, Questionnaire, hidden via CSS with display: none. The sidebar is formatted with float: right, width: 20%, and font-size: 0.9em. Its striking feature is the rounded bottom-right corner, which brings us to a confession: the HTML5 blog also uses CSS3, and the rounded corner is one of the two CSS3 features it employs. The CSS syntax for the class rounded-bottom-right looks like this:
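Based on the description above, the sidebar could be sketched like this. The questionnaire question, form fields, and link list are hypothetical; the hidden h2 Link Block and the h3 headings follow the outline in section 2.5, and the two classes are the CSS3 classes used by the blog.

```html
<aside class="rounded-bottom-right shadow">
  <!-- Hidden via display: none; names the aside in the outline -->
  <h2>Link Block</h2>
  <section>
    <h3>Questionnaire</h3>
    <p>Will HTML5 be finished by 2022?</p>
    <form>
      <label><input type="radio" name="answer" value="yes"> Yes</label>
      <label><input type="radio" name="answer" value="no"> No</label>
    </form>
  </section>
  <section>
    <h3>Login</h3>
    <form>
      <label>User <input type="text" name="user"></label>
      <label>Password <input type="password" name="password"></label>
      <input type="submit" value="Log in">
    </form>
  </section>
  <nav>
    <h3>Quick Links</h3>
    <ul>
      <li><a href="http://www.w3.org/TR/html-markup">Markup specification</a></li>
      <li><a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html">HTML5 sections</a></li>
    </ul>
  </nav>
</aside>
```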

.rounded-bottom-right {
  -moz-border-radius: 0px 0px 20px 0px;
  -webkit-border-radius: 0px 0px 20px 0px;
  border-radius: 0px 0px 20px 0px;
}

The second feature is responsible for the subtle shadow of the four areas and is defined as follows in the CSS file:

.shadow {
  -moz-box-shadow: 4px 0px 10px -3px silver;
  -webkit-box-shadow: 4px 0px 10px -3px silver;
  box-shadow: 4px 0px 10px -3px silver;
}

The tripling of each declaration with the vendor prefixes -moz- and -webkit- is conspicuous. It is due to the fact that the relevant CSS3 module has not yet reached the Candidate Recommendation phase; only at that stage of the standardization process is it certain that border-radius and box-shadow will no longer change. Until then, the prefixes signal that an implementation may still deviate slightly from the standard.

Note

If you want to learn more about these two eagerly awaited features of the CSS3 specification, you will find further information here:

• http://www.w3.org/TR/css3-background/#the-border-radius

• http://www.w3.org/TR/css3-background/#box-shadow

2.5 The Outline Algorithm

Although the specification’s details on outlining a document sound rather complicated, the idea behind the outline is simple: it is a machine-readable summary of the document’s structure. This structure is determined by a combination of so-called sectioning content (article, aside, nav, and section, with the body element as the sectioning root) and heading content (h1 to h6 and hgroup), which supplies the actual entries of the outline.

If we check our HTML5 blog with Geoffrey Sneddon’s online HTML5 Outliner (http://gsnedders.html5.org/outliner), we see the following structure:

1. The HTML5 blog!
   1. Link Block
      1. Questionnaire
      2. Login
      3. Quick Links
   2. Tug of war between W3C and WHATWG enters ...
   3. Navigation

The entries Link Block and Navigation are exactly the two headings that were hidden in the layout with display: none. Had we omitted these headings entirely, the outline would show Untitled Section in their place. With them, the structure is complete and the outline is much easier to read.

Regarding the choice of heading ranks h1 to h6, note the following: In principle, any sectioning content can start with an h1 heading, but it does not have to. In our case the heading ranks reflect the hierarchy in the outline: h1 for the blog header; h2 for the article title, the link block, and the footer navigation; and h3 for the other headings. If we used h1 everywhere, we would get the same outline, but the layout would suffer somewhat and we would have to correct it manually in the CSS file.

When using hgroup, remember that the outline includes only the highest-ranked heading of the hgroup. That is why the subtitle Tips, tricks & tidbits for today’s web developers does not appear in the outline.

Although no browser yet uses the outline algorithm directly, it could play a more important role in the future. Possible applications include automatically generated navigation bars, short summaries, or better extraction of relevant content by search-engine crawlers. Until then, it certainly does not hurt to think carefully about the structure of your documents. Checking it is easy, so why not do it?
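To illustrate the point, here is a stripped-down, hypothetical version of the blog in which every heading is an h1. Since the nesting comes from the sectioning elements rather than from the heading ranks, an outliner should report the same structure as above; only the default font sizes change.

```html
<body>
  <!-- header is not sectioning content: this h1 titles the body -->
  <header><h1>The HTML5 blog!</h1></header>
  <aside>
    <h1>Link Block</h1>
    <!-- Nested sectioning elements become nested outline entries -->
    <section><h1>Questionnaire</h1></section>
    <section><h1>Login</h1></section>
    <nav><h1>Quick Links</h1></nav>
  </aside>
  <div>
    <article><h1>Headline of the blog entry</h1></article>
  </div>
  <footer>
    <nav><h1>Navigation</h1></nav>
  </footer>
</body>
```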

2.6 Figures with “figure” and “figcaption”

The elements figure and figcaption do not strictly count among the structural elements, but they are still a welcome addition to our options for embedding self-contained pictures, graphics, diagrams, and code listings. A figure element may contain at most one figcaption element, which the author can place either before or after the rest of the figure’s content. A brief example with markup and its browser rendering (see Figure 2.6) could look like this:

<figure>
<img src="images/tarot_0980.jpg" alt="XXI: The World">
<img src="images/tarot_0963.jpg" alt="VI: The Choice">
<img src="images/tarot_0996.jpg" alt="XVIII: The Moon">
<figcaption>Three magical sculptures in Niki de Saint
Phalle’s <em>Giardino dei Tarocchi</em> near Capalbio in the
Tuscany region of Italy. The tarot cards from left to right:
The World (XXI), The Choice (VI), and The Moon (XVIII)</figcaption>
</figure>

Figure 2.6 Example of “figure” with “figcaption”
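Because figure is also intended for code listings and the figcaption may come first, a listing with its caption above it could be marked up like this. The caption text is hypothetical; the listing reuses the q style rule from section 2.2.

```html
<figure>
  <!-- This time the figcaption precedes the content -->
  <figcaption>Listing 1: Style rule for the q element</figcaption>
  <pre><code>q { display: inline; }
q:before { content: '"'; }
q:after { content: '"'; }</code></pre>
</figure>
```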

2.7 Text-Level Semantics—More New Tags

Apart from clear structures, the HTML5 specification also attaches importance to semantics and tries to assign each element a specific meaning at the text level. It also specifies in which contexts each element may be used and in which it may not. Some elements are new, some have disappeared completely (such as font, center, and big), and the definitions of others have changed slightly. The following sections introduce the new and changed elements. Later, in Table 2.2, we show the typical uses of all elements listed in the specification’s Text-level semantics chapter. Let’s start with the most exotic of the new elements: ruby.

2.7.1 The Elements “ruby,” “rt,” and “rp”

The term ruby refers to a typographic annotation system, meaning “short runs of text alongside the base text, typically used in East Asian documents to indicate pronunciation or to provide a short annotation” (www.w3.org/TR/ruby). Ruby annotation is used in Chinese and Japanese to show the pronunciation of characters, as you can see in the example on the left in Figure 2.7.

Figure 2.7 Two examples of ruby annotation

The markup for ruby annotations uses the elements ruby, rt, and rp. The expression to be annotated is placed inside a ruby element and is followed by an rt element containing the annotation; in browsers with ruby support, the content of this rt element is positioned above the annotated expression. As the Beijing example shows, several characters in a row can be annotated this way.

Browsers without ruby support (such as Firefox and Opera) display the individual components one after the other, which can make the text harder to read. Because it is not necessarily clear that the second part explains the first, the two components need to be separated visually. That is what the rp element is for: it adds optional parentheses that are displayed only if a browser does not support ruby. As Figure 2.7 shows, Google Chrome interprets ruby and displays the annotation separately above the base text. A browser without ruby support would display the examples as běi jīng and HTML N°5 (Web Standard).
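The two examples from Figure 2.7 could be marked up as follows, with rp supplying the fallback parentheses. The markup behind the figure is not reproduced in this text, so treat this as a sketch.

```html
<!-- Beijing: each character gets its own pinyin annotation -->
<ruby>
  北<rp>(</rp><rt>běi</rt><rp>)</rp>
  京<rp>(</rp><rt>jīng</rt><rp>)</rp>
</ruby>

<!-- One annotation for a longer base text -->
<ruby>HTML N°5<rp>(</rp><rt>Web Standard</rt><rp>)</rp></ruby>
```

In a browser with ruby support, the rp content is hidden and the rt text appears above the base text; without ruby support, the parentheses become visible and the annotation follows the base text inline.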

2.7.2 The “time” Element

The time element represents either a time in 24-hour format or a date in the Gregorian calendar with optional time and time-zone components. Its purpose is to provide precise dates and times in a machine-readable format within an HTML5 document. Vague time references, such as in the spring of 2011 or five minutes before the turn of the millennium, are therefore not allowed.

To ensure machine readability, we can use the datetime attribute, whose value can be a time, a date, or a combination of both. The syntax for the individual time components is clearly defined in the specification and is summarized in Table 2.1.

Table 2.1 The Rules for Timestamps for the “time” Element’s “datetime” Attribute

The pubdate attribute is a boolean attribute indicating that the specified date is the publication date of the nearest ancestor article or, if there is none, of the whole document. If you use pubdate, the time element should also have a datetime attribute; otherwise, the text between the time element’s start tag and end tag must contain a valid date.
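A few hypothetical examples of the time element, following the rules described above:

```html
<!-- A time of day in 24-hour format, given directly as content -->
<p>The meeting starts at <time>14:30</time>.</p>

<!-- A date: datetime carries the machine-readable value,
     the content can be any human-readable text -->
<p>Deadline: <time datetime="2022-02-14">Valentine's Day</time></p>

<!-- Date and time with a time-zone offset -->
<p>Live chat: <time datetime="2022-02-14T20:00+01:00">8 p.m. CET</time></p>

<!-- pubdate: publication date of the enclosing article -->
<article>
  <h2>Headline of the blog entry</h2>
  <p>Published on <time datetime="2022-02-14" pubdate>February 14, 2022</time></p>
</article>
```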

Note

Be careful when writing boolean attributes in HTML5: true and false are not valid attribute values! As soon as the parser finds the name of a boolean attribute, the attribute is set to true. So there are three valid notations for setting a boolean attribute to true:

<time pubdate>
<time pubdate="">
<time pubdate="pubdate"> (of course you can also omit the quotation marks)

To switch to false, you only have one option: Omit the attribute altogether!

2.7.3 The “mark” Element

The mark element represents a run of text that is highlighted because of its relevance in another context. That sounds a bit abstract, so here are a few brief examples: If you highlight a particular passage of a quotation that was not emphasized in the original, you change the original text and almost force a new meaning onto it; mark is the right element for this. You can also use mark to highlight words in a document or code listing because they match a search query or because they matter for your explanation of the code.
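Two hypothetical uses of mark that match the cases just described: a highlighted passage in a quotation and a search term in a result list.

```html
<!-- Highlighting part of a quotation that was not emphasized in the original -->
<blockquote>
  <p>HTML5 will be finished <mark>by 2022</mark> at the latest.</p>
</blockquote>

<!-- Marking the search term in a list of results -->
<p>Results for "outline":</p>
<ul>
  <li>2.5 The <mark>Outline</mark> Algorithm</li>
</ul>
```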

2.7.4 The “wbr” Element

Unsurprisingly, the wbr element enables the browser to insert an optional line break in long words. For example, inserting a couple of wbr elements in a rather long word, such as supercalifragilisticexpialidocious, would give the browser the opportunity to break the word over two lines if the layout requires it:

supercali<wbr>fragilistic<wbr>expialidocious

It depends entirely on the layout whether and where the line break occurs: wbr only allows a line break; it does not force it. Possible applications include long URLs and code listings. Like br, wbr is a so-called void element, which means it has no content and must not have an end tag, a quality it shares with 14 other elements in HTML5, including br, hr, img, input, link, and meta.

Void elements may, however, close their start tag with a slash (e.g., <br />), which is useful for meeting the requirements of valid XHTML5 documents.
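For long URLs, one possible approach is to place a wbr after each slash, as in this sketch using the specification link from the beginning of the chapter. The href stays unbroken; only the visible link text receives break opportunities.

```html
<p>See
  <a href="http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html">http://www.whatwg.org/<wbr>specs/<wbr>web-apps/<wbr>current-work/<wbr>multipage/<wbr>sections.html</a>
</p>
```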

2.7.5 Elements with Marginal Changes

The list of elements with marginal changes starts with b and i, two tags that no longer fit the concept of HTML5, partly because of their names: b for bold and i for italic are explicit formatting instructions, which are frowned upon in HTML5. What matters now is meaning, so to stress the importance of a word we should use strong, or em (as in emphasis), instead. Unfortunately, b and i are among the most widely used tags, so abolishing them altogether was out of the question. The solution was a compromise that keeps both but alters their meaning: b now denotes offset text in bold and i offset text in italics. But if you want to write clean HTML5, you should avoid b and i in the future and use strong and em instead.

Among the other small changes, cite now designates the title of a work and explicitly must not be used for people’s names. small no longer just means small print in the typographic sense but also represents side comments and fine print such as legal notices, without implying anything about their importance. hr now signals a thematic break rather than merely a horizontal line that breaks up the layout.
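The following hypothetical fragments contrast the elements just discussed:

```html
<!-- strong: importance; em: stress emphasis -->
<p><strong>Warning:</strong> Boolean attributes do <em>not</em> accept true or false.</p>

<!-- b and i: offset text without extra importance -->
<p>The <b>hgroup</b> element groups headings; <i lang="la">ad hoc</i> is Latin.</p>

<!-- cite: the title of a work, not a person's name -->
<p>Details can be found in <cite>HTML: The Markup Language Reference</cite>.</p>

<!-- small: side comments and fine print -->
<p><small>All trademarks belong to their respective owners.</small></p>

<!-- hr: a thematic break between paragraphs -->
<p>End of the first topic.</p>
<hr>
<p>Start of a new topic.</p>
```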

The specification offers a usage summary of individual tags with examples at the end of the chapter Text-level semantics. To save you from having to look it up, here it is in our Table 2.2.

Table 2.2 Usage of Semantic Text Elements

Summary

HTML5 offers a wealth of new structural elements, such as header, hgroup, article, section, aside, footer, and nav. The detailed example at the beginning of this chapter, a fictitious blog entry, demonstrates how easily and intuitively these elements can be used. Instead of anonymous div elements, which only become meaningful in combination with class attributes, we now have self-explanatory elements, a concept that continues with figure and figcaption for embedding images and graphics. From the comprehensive list of HTML5 text-level semantic elements, summarized with usage examples in Table 2.2, we briefly introduced the most interesting new ones: ruby, rt, and rp for ruby annotations; time for dates and times; mark for highlighting text passages; and wbr for optional line breaks.
