Microdata is another technology that’s rapidly gaining adoption and support, but unlike WAI-ARIA, it’s actually part of HTML5. The Microdata Specification is still early in development, but it’s worth mentioning this technology here, because it provides a peek into what may be the future of document readability and semantics.
In the spec, Microdata is defined as a mechanism that “allows machine-readable data to be embedded in HTML documents in an easy-to-write manner, with an unambiguous parsing model.”
With Microdata, page authors can add specific labels to HTML elements to annotate them so that they are able to be read by machines or bots. This is done by means of a customized vocabulary. For example, you might want a script or other third-party service to be able to access your pages and interact with specific elements on the page in a certain manner. With Microdata, you can extend existing semantics (like article and figure) to allow those services to have specialized access to the annotated content.
This can appear confusing, so let’s think about a real-world example.
Let’s say your site includes reviews of movies. You might have each review in an article element, with a number of stars or a percentage score for your review. But when a machine comes along, like Google’s search spider, it has no way of knowing which part of your content is the actual review; all it sees is a bunch of text on the page.
Why would a machine want to know what you thought of a movie? It’s worth considering that Google has recently started displaying richer information in its search results pages, in order to provide searchers with more than just textual matches for their queries. It does this by reading the review information encoded into those sites’ pages using Microdata or other similar technologies. An example of movie review information is shown in Figure C.1.
By using Microdata, you can specify exactly which parts of your page correspond to reviews, people, events, and more—in a consistent vocabulary that software applications can understand and make use of.
You might be thinking that if no existing HTML element describes a particular kind of content, that content can’t be common enough to be worth marking up. After all, the HTML5 spec now includes a number of new elements to allow for more expressive markup. But the creators of HTML5 have been careful to ensure that the elements that are part of the HTML5 spec are ones that will most likely be used.
It would be counterproductive to add elements to HTML that would only be used by a handful of people. This would unnecessarily bloat the language, making it unmaintainable from the perspective of a specification author or standards body.
Microdata, on the other hand, allows you to create your own custom vocabularies for very specific situations that HTML5’s semantic elements can’t express. Thus, existing HTML elements and new elements added in HTML5 are kept as a sort of semantic baseline, while specific annotations can be created by developers to target their own particular needs.
Microdata works with existing, well-formed HTML content, and is added to a document by means of name-value pairs (also called properties). Microdata does not allow you to create new elements; instead it gives you the option to add customized attributes that expand on the semantics of existing elements.
Here’s a simple example:
<aside itemscope>
  <h1 itemprop="name">John Doe</h1>
  <p><img src="http://www.sitepoint.com/bio-photo.jpg" alt="John Doe" itemprop="photo"></p>
  <p><a href="http://www.sitepoint.com" itemprop="url">Author’s website</a></p>
</aside>
In the example above, we have a run-of-the-mill author bio, placed inside an aside element. The first oddity you’ll notice is the itemscope attribute. This identifies the aside element as the container that defines the scope of our Microdata vocabulary. The presence of the itemscope attribute defines what the spec refers to as an item. Each item is characterized by a group of name-value pairs.
The ability to define the scope of our vocabularies allows us to define multiple vocabularies on a single page. In the example above, all name-value pairs inside the aside element are part of a single Microdata vocabulary.
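As a minimal sketch of that idea, the markup below places two separate items on one page. The second aside, its contents, and the example.com URL are hypothetical and not part of the original example. Each itemscope attribute starts a new item, so the two name properties belong to different sets of name-value pairs.

```html
<!-- First item: a trimmed version of the author bio shown earlier -->
<aside itemscope>
  <h1 itemprop="name">John Doe</h1>
  <p><a href="http://www.sitepoint.com" itemprop="url">Author’s website</a></p>
</aside>

<!-- Second item (hypothetical): a separate scope with its own name-value pairs -->
<aside itemscope>
  <h1 itemprop="name">Jane Roe</h1>
  <p><a href="http://www.example.com" itemprop="url">Editor’s website</a></p>
</aside>
```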
After the itemscope attribute, the next thing to look at is the itemprop attribute, which has a value of name. At this point, it’s probably a good idea to explain how a script would obtain information from these attributes, as well as what we mean by “name-value pairs.”
A name is a property defined with the help of the itemprop attribute. In our example, the first property name happens to be one called name. There are two additional property names in this scope: photo and url.
The values for a given property are defined differently, depending on the element the property is declared on. For most elements, the value is taken from the element’s text content; for instance, the name property in our example would get its value from the text content between the opening and closing <h1> tags. Other elements are treated differently.
The photo property takes its value from the src attribute of the image, so the value consists of a URL pointing to the author’s photo. The url property, although defined on an element that has text content (namely, the phrase “Author’s website”), doesn’t use this text content to determine its value; instead, it gets its value from the href attribute.
Other elements that don’t use their associated text content to define Microdata values include meta, iframe, object, audio, link, and time. For a comprehensive list of elements that obtain their values from somewhere other than the text content, see the Values section of the Microdata specification.
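To make those rules concrete, here’s a small sketch of one item that mixes several of these elements. The property names (nickname, joined, profile, greeting) and the example.com URLs are made up for illustration, and the comments reflect our reading of the Values section, so check the spec for the definitive rules.

```html
<div itemscope>
  <!-- Most elements: the value is the text content, "John Doe" -->
  <p itemprop="name">John Doe</p>

  <!-- meta: the value is read from the content attribute, "JD" -->
  <meta itemprop="nickname" content="JD">

  <!-- time: the value is read from the datetime attribute when it is present -->
  <p>Member since <time itemprop="joined" datetime="2011-05-01">May 1st</time></p>

  <!-- link: the value is the URL in the href attribute -->
  <link itemprop="profile" href="http://www.example.com/profile">

  <!-- audio: the value is the URL in the src attribute -->
  <audio itemprop="greeting" src="http://www.example.com/greeting.ogg" controls></audio>
</div>
```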
What we’ve described so far is acceptable for Microdata that’s not intended to be reused, but that’s a little impractical. The real power of Microdata is unleashed when, as we discussed, third-party scripts and page authors can access our name-value pairs and find beneficial uses for them.
In order for this to happen, each item must define a type by means of the itemtype attribute. Remember that an item in the context of Microdata is the element that has the itemscope attribute set. Every element and name-value pair inside that element is part of that item. The value of the itemtype attribute, therefore, defines the namespace for that item’s vocabulary. Let’s add an itemtype to our example:
<aside itemscope itemtype="http://www.data-vocabulary.org/Person">
  <h1 itemprop="name">John Doe</h1>
  <p><img src="http://www.sitepoint.com/bio-photo.jpg" alt="John Doe" itemprop="photo"></p>
  <p><a href="http://www.sitepoint.com" itemprop="url">Author’s website</a></p>
</aside>
In our item, we’re using the URL http://www.data-vocabulary.org/, a domain owned by Google. It houses a number of Microdata vocabularies, including Organization, Person, Review, Breadcrumb, and more.
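Returning to the movie review scenario from earlier, here’s a rough sketch of how a review might be annotated with the Review vocabulary. The movie title, reviewer, date, and score are hypothetical. The property names (itemreviewed, reviewer, dtreviewed, summary, and rating) follow the Review vocabulary as we understand it, and the itemtype URL simply mirrors the form used in the Person example, so confirm both against the vocabulary’s documentation before relying on them.

```html
<article itemscope itemtype="http://www.data-vocabulary.org/Review">
  <!-- The thing being reviewed -->
  <h1 itemprop="itemreviewed">An Example Movie</h1>

  <!-- Who wrote the review, and when (the date value comes from datetime) -->
  <p>Reviewed by <span itemprop="reviewer">John Doe</span> on
    <time itemprop="dtreviewed" datetime="2011-05-01">May 1, 2011</time></p>

  <!-- A short summary, taken from the paragraph's text content -->
  <p itemprop="summary">A fun, fast-paced adventure.</p>

  <!-- The score, taken from the span's text content -->
  <p>Score: <span itemprop="rating">4</span> out of 5 stars</p>
</article>
```

With markup along these lines, a search spider has an explicit label for each part of the review instead of having to guess from a bunch of text.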
This brief introduction to Microdata barely does the topic justice, but we hope it will provide you with a taste of what’s possible when extending the semantics of your documents with this technology.
It’s a very broad topic that requires reading and research outside of this source. With that in mind, here are a few links to check out if you want to delve deeper into the possibilities offered by Microdata: