Appendix C.  Microdata

Microdata is another technology that’s rapidly gaining adoption and support, but unlike WAI-ARIA, it’s actually part of HTML5. The Microdata Specification is still early in development, but it’s worth mentioning this technology here, because it provides a peek into what may be the future of document readability and semantics.

In the spec, Microdata is defined as a mechanism that allows machine-readable data to be embedded in HTML documents in an easy-to-write manner, with an unambiguous parsing model.

With Microdata, page authors can add specific labels to HTML elements, annotating them so that machines or bots can read them. This is done by means of a customized vocabulary. For example, you might want a script or other third-party service to be able to access your pages and interact with specific elements in a certain manner. With Microdata, you can extend existing semantics (like article and figure) to give those services specialized access to the annotated content.

This can appear confusing, so let’s think about a real-world example. Let’s say your site includes reviews of movies. You might have each review in an article element, with a number of stars or a percentage score for your review. But when a machine comes along, like Google’s search spider, it has no way of knowing which part of your content is the actual review—all it sees is a bunch of text on the page.

Why would a machine want to know what you thought of a movie? It’s worth considering that Google has recently started displaying richer information in its search results pages, in order to provide searchers with more than just textual matches for their queries. It does this by reading the review information that sites encode in their pages using Microdata or other similar technologies. An example of movie review information is shown in Figure C.1.

Figure C.1. Google leverages Microdata to show additional information in search results

By using Microdata, you can specify exactly which parts of your page correspond to reviews, people, events, and more—in a consistent vocabulary that software applications can understand and make use of.

Aren’t HTML5’s semantics enough?

You might be wondering how useful a piece of semantics can really be if no existing HTML element expresses it. After all, the HTML5 spec now includes a number of new elements to allow for more expressive markup. But the creators of HTML5 have been careful to ensure that the elements in the spec are the ones most likely to be widely used.

It would be counterproductive to add elements to HTML that would only be used by a handful of people. This would unnecessarily bloat the language, making it unmaintainable from the perspective of a specification author or standards body.

Microdata, on the other hand, allows you to create your own custom vocabularies for very specific situations that HTML5’s semantic elements can’t describe. Thus, existing HTML elements and new elements added in HTML5 are kept as a sort of semantic baseline, while specific annotations can be created by developers to target their own particular needs.

The Microdata Syntax

Microdata works with existing, well-formed HTML content, and is added to a document by means of name-value pairs (also called properties). Microdata does not allow you to create new elements; instead it gives you the option to add customized attributes that expand on the semantics of existing elements.

Here’s a simple example:

<aside itemscope>
  <h1 itemprop="name">John Doe</h1>
  <p><img src="http://www.sitepoint.com/bio-photo.jpg" alt="John Doe" itemprop="photo"></p>
  <p><a href="http://www.sitepoint.com" itemprop="url">Author’s website</a></p>
</aside>

In the example above, we have a run-of-the-mill author bio, placed inside an aside element. The first oddity you’ll notice is the itemscope attribute. This identifies the aside element as the container that defines the scope of our Microdata vocabulary. The presence of the itemscope attribute defines what the spec refers to as an item. Each item is characterized by a group of name-value pairs.

The ability to define the scope of our vocabularies allows us to define multiple vocabularies on a single page. In the example above, all name-value pairs inside the aside element are part of a single Microdata vocabulary.
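
To see scoping at work, here’s a minimal sketch of a page fragment that holds two separate items. The second item and its property names (title and summary) are hypothetical, invented purely for illustration; they don’t come from any published vocabulary.

```html
<!-- First item: the author bio. Its properties are name and url. -->
<aside itemscope>
  <h1 itemprop="name">John Doe</h1>
  <p><a href="http://www.sitepoint.com" itemprop="url">Author’s website</a></p>
</aside>

<!-- Second item: an unrelated article teaser (hypothetical property names). -->
<article itemscope>
  <h2 itemprop="title">Getting Started with Microdata</h2>
  <p itemprop="summary">A short tour of itemscope and itemprop.</p>
</article>
```

Because each itemscope attribute starts a new item, the name property in the aside and the title property in the article never mix, even though they share a page.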

After the itemscope attribute, the next thing to note is the itemprop attribute, which has a value of name. At this point, it’s probably a good idea to explain how a script would obtain information from these attributes, as well as what we mean by “name-value pairs.”

Understanding Name-Value Pairs

A name is a property defined with the help of the itemprop attribute. In our example, the first property name happens to be one called name. There are two additional property names in this scope: photo and url.

The values for a given property are defined differently, depending on the element the property is declared on. For most elements, the value is taken from its text content; for instance, the name property in our example would get its value from the text content between the opening and closing <h1> tags. Other elements are treated differently.

The photo property takes its value from the src attribute of the image, so the value consists of a URL pointing to the author’s photo. The url property, although defined on an element that has text content (namely, the phrase “Author’s website”), doesn’t use this text content to determine its value; instead, it gets its value from the href attribute.

Other elements that don’t use their associated text content to define Microdata values include meta, iframe, object, audio, link, and time. For a comprehensive list of elements that obtain their values from somewhere other than the text content, see the Values section of the Microdata specification.
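
As a rough illustration of these rules, the sketch below places properties on a few of the elements just mentioned. The property names (title, wordCount, license, and published) are hypothetical and chosen only for this example; the comments describe where we’d expect each value to come from, based on the Values section of the spec.

```html
<article itemscope>
  <!-- Ordinary element: the value is the text content. -->
  <h1 itemprop="title">Microdata in Practice</h1>

  <!-- meta: the value comes from the content attribute. -->
  <meta itemprop="wordCount" content="1200">

  <!-- link: the value comes from the href attribute, as it does for a. -->
  <link itemprop="license" href="http://example.com/license">

  <!-- time: the value comes from the datetime attribute,
       not from the human-readable text between the tags. -->
  <p>Published on
    <time itemprop="published" datetime="2011-05-01">May 1, 2011</time></p>
</article>
```

The meta and link elements are handy when a value needs to be available to machines without being displayed to readers.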

Microdata Namespaces

What we’ve described so far is acceptable for Microdata that’s not intended to be reused, but that’s a little impractical. The real power of Microdata is unleashed when, as we discussed, third-party scripts and page authors can access our name-value pairs and find beneficial uses for them.

In order for this to happen, each item must define a type by means of the itemtype attribute. Remember that an item in the context of Microdata is the element that has the itemscope attribute set. Every element and name-value pair inside that element is part of that item. The value of the itemtype attribute, therefore, defines the namespace for that item’s vocabulary. Let’s add an itemtype to our example:

<aside itemscope itemtype="http://www.data-vocabulary.org/Person">
  <h1 itemprop="name">John Doe</h1>
  <p><img src="http://www.sitepoint.com/bio-photo.jpg" alt="John Doe" itemprop="photo"></p>
  <p><a href="http://www.sitepoint.com" itemprop="url">Author’s website</a></p>
</aside>

In our item, we’re using a URL on http://www.data-vocabulary.org/, a domain owned by Google. The site houses a number of Microdata vocabularies, including Organization, Person, Review, Breadcrumb, and more.
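
Returning to the movie review scenario from earlier in this appendix, here’s a sketch of how a review might be annotated with the Review vocabulary. The movie, the date, and the review text are made up, and the property names (itemreviewed, reviewer, dtreviewed, summary, and rating) reflect our understanding of that vocabulary rather than anything shown above, so check them against the vocabulary’s own documentation before relying on them.

```html
<!-- A sketch only: property names are assumed from the Review vocabulary. -->
<article itemscope itemtype="http://www.data-vocabulary.org/Review">
  <h1 itemprop="itemreviewed">An Imaginary Movie</h1>
  <p>Reviewed by <span itemprop="reviewer">John Doe</span> on
    <time itemprop="dtreviewed" datetime="2011-05-01">May 1, 2011</time></p>
  <p itemprop="summary">A charming but uneven debut.</p>
  <p>Rating: <span itemprop="rating">4</span> out of 5 stars</p>
</article>
```

With the review’s subject, author, date, and score each labeled, a search spider no longer has to guess which part of the text is the actual review. Whether a given search engine displays that information is, of course, up to the search engine.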

Further Reading

This brief introduction to Microdata barely does the topic justice, but we hope it will provide you with a taste of what’s possible when extending the semantics of your documents with this technology.

It’s a very broad topic that requires reading and research outside of this source. With that in mind, here are a few links to check out if you want to delve deeper into the possibilities offered by Microdata:
