11. Microdata

Saturday, October 9th 2010, just before half past eight in the evening. Pat Metheny steps onto the stage of the sold-out Community Theater in Morristown, NJ. The stage is decorated with a Persian rug and heavy red drapes. In the background we spot a piano, two vibraphones, various small instruments, and several strange objects resembling organ pipes, pharmaceutical jars, or rocket launchers.

The setting seems rather strange, because the long-time companions of the ingenious guitarist are missing: no Antonio Sanchez on the drums, no Steve Rodby on the bass, and no Lyle Mays on the piano. Instead of the Pat Metheny Group in flesh and blood, we now see an army of machines, small hammers, and LEDs which are activated in turn to bring the surrogate human artists to life. By the time we have listened to the obligatory solo on the 42-string guitar, just as the red curtain is lifting, the whole dimension of the enterprise Orchestrion hits home: This will be a fascinating evening full of awe and wonder.

This could be the beginning of a fictitious review of a concert in an equally fictitious blog: two paragraphs full of information, filtered and combined automatically by the reader whilst reading. The event is defined in terms of time and location; objects, instruments, and events on stage are recognized and people mentioned in the text are identified as a matter of course as musicians with their respective instruments. The human brain is trained to filter information efficiently. Computers are not and require help to filter information. This help basically boils down to marking and correlating the relevant information.

Which information is relevant depends entirely on what we want to filter out of the text. For a diary it would be the name of the event, its time, and place; for an address book, the contact details of the musicians; and for finding new CDs to add to your music collection, the names of the artists and bands. One option for offering the quintessence of a text in the relevant context and in machine-readable form is microdata, a very young and hotly debated feature of HTML5.

In the eyes of many critics, microdata stands in direct competition with RDFa (Resource Description Framework in attributes), another option for embedding metadata. RDFa's close connection to XHTML makes it especially difficult to fit in with the concept of HTML5, which lacks the namespaces used abundantly in RDFa. The result of the tug-of-war between the two approaches is, not surprisingly, two specifications, with microdata present both as an integrated WHATWG version and as a W3C stand-alone version, whereas RDFa is specified only at the W3C. The links to the specifications are:

http://www.w3.org/TR/microdata

http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#microdata

http://www.w3.org/TR/rdfa-in-html

The a in RDFa stands for attributes, which brings us to the feature both techniques have in common: both RDFa and microdata use a set of attributes to define metadata. In RDFa, this metadata is present as a triple of subject, predicate, and object. As explained in Wikipedia with regard to the Resource Description Framework, the subject denotes the resource (Pat Metheny), and the predicate denotes a trait or aspect of the resource (musician) and expresses the relationship between the subject and the object (Orchestrion). With microdata, the information ends up as name-value pairs, such as Pat Metheny : musician or Pat Metheny : Orchestrion. Which of the two approaches will ultimately prevail is uncertain. Both techniques have advantages and disadvantages, and they could also coexist. But because microdata can already be integrated seamlessly into HTML5, we will concentrate on microdata in this chapter.

11.1 The Syntax of Microdata

If we take the text at the beginning of the chapter and add a few links, an image, and the signature of the blog author, we end up with a complete, fictitious blog entry, as shown in Figure 11.1. It will accompany us through an explanation of the microdata syntax.

Figure 11.1 Screen shot of fictitious blog entry about Pat Metheny’s Orchestrion Tour

image

11.1.1 The Attributes “itemscope” and “itemprop”

We first need to define the area relevant for microdata. Structuring elements are suitable, as are container elements, such as div or p. In our example, we chose an article element, which encloses the entire blog entry. The required attribute for defining the scope starts with item—just as the other four microdata attributes—has the type boolean, and is called, rather logically, itemscope:

<article itemscope>
...
</article>

The itemscope defines a new group of name-value pairs, also called items in the specification. The corresponding values are supplied by itemprop attributes, where prop means properties. If we want to mark all musicians in the text as musicians, we therefore require four itemprop attributes, which we insert in a suitable place. If no suitable elements are readily available, we first need to create them as a span or div element. So “Pat Metheny” becomes “<span>Pat Metheny</span>” in the HTML code, an addition that does not affect the text layout and allows us to specify an itemprop attribute for the span element. Unlike itemscope, itemprop is not a boolean attribute, but defines the name of the corresponding property via its attribute value:

<article itemscope>
... <span itemprop=musician>Pat Metheny</span> steps onto the stage ...
... <span itemprop=musician>Antonio Sanchez</span> on the drums ...
... <span itemprop=musician>Steve Rodby</span> on the bass ...
... <span itemprop=musician>Lyle Mays</span> on the piano ...
</article>

Our first microdata example is now complete, and the question arises as to how this metadata could be interpreted by a search engine spider that indexes the blog entry. Philip Jägenstedt’s Live Microdata viewer, from now on referred to as microdata viewer, helps us visualize the data structure. This is an online application where we can copy code fragments with microdata content into a text field, making hidden microdata visible in JSON notation. You should probably save the link http://foolip.org/microdatajs/live as a bookmark: You will need it for testing all code examples.


Tip


To avoid having to painstakingly retype the example texts, all HTML code fragments in this chapter are available as a plain text file online so they can easily be copied into the microdata viewer. The individual fragments are listed in the same order as they appear in the chapter. The link to the file is http://html5.komplett.cc/code/chap_microdata/fragments_en.txt.


If we copy the second HTML fragment from the file fragments_en.txt into Philip Jägenstedt’s microdata viewer, the JSON notation shows the following structure:

{
  "items":[{
      "properties":{
        "musician":["Pat Metheny",
          "Antonio Sanchez",
          "Steve Rodby",
          "Lyle Mays"
        ]
      }
    }
  ]
}

At first glance, the many curly and square brackets may seem confusing, but they disclose the metadata structure very clearly if you look closer. The “items” array contains one entry per item. Each entry has a “properties” object made up of name-value pairs, with the name of the property (“musician”) and the corresponding values (“Pat Metheny,” “Antonio Sanchez,” “Steve Rodby,” “Lyle Mays”) as an array.
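To make this structure more tangible, here is a minimal sketch that walks the JSON output and flattens it into individual name-value pairs. The helper name listPairs is our own invention for illustration and is not part of microdatajs or any browser API.

```javascript
// JSON as printed by the microdata viewer for the "musician" fragment.
const viewerOutput = `{
  "items": [{
    "properties": {
      "musician": ["Pat Metheny", "Antonio Sanchez", "Steve Rodby", "Lyle Mays"]
    }
  }]
}`;

// Hypothetical helper: flatten every item into a list of [name, value] pairs.
function listPairs(json) {
  const pairs = [];
  for (const item of JSON.parse(json).items) {
    // "properties" maps each property name to an array of values.
    for (const [name, values] of Object.entries(item.properties)) {
      for (const value of values) {
        pairs.push([name, value]);
      }
    }
  }
  return pairs;
}

const musicianPairs = listPairs(viewerOutput);
for (const [name, value] of musicianPairs) {
  console.log(name + ': ' + value);
}
```

Each of the four span elements ends up as one pair with the name musician, in the order in which the elements appear in the fragment.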

Some HTML elements automatically define the value of the specified property as soon as an itemprop attribute is assigned to them. Let’s use the blog entry’s descriptive picture to test the microdata viewer and give it an itemprop attribute image:

<article itemscope>
  <img itemprop=image src=icons/orchestrion.jpg alt=...>
</article>

This automatically gives us the value of the src attribute as the value for the property image. In addition to the img element, there are a number of other elements to which this behavior applies. You can see them in Table 11.1.

Table 11.1 Elements with special “itemprop” values

image

Let’s turn back to the spider, which is now indexing our microdata-filled blog. It won’t know what to make of the items musician and image. The reason is that we have defined our own microdata terms that have meaning only to us. To be able to use microdata sensibly, we need standardized dialects that can be comprehended by our spider, just as by an intelligent mail program that automatically extracts e-mail addresses encoded as microdata if you drag a URL into its address book, or by a diary able to recognize diary dates by the same method.

11.1.2 The “itemtype” Attribute

We do not have far to go in our search for standardized dialects. The WHATWG’s Microdata specification already contains three of them: vCard for contact information, vEvent for dates of events, and a third one for specifying licenses of a work. A multitude of other dialects can be found in the microformats community at http://microformats.org. But in contrast to microdata, these dialects are defined in the microformats scheme, making lavish use of class and rel attributes for determining metadata structure.

With the attribute itemtype, you determine that the existing microdata follows a standardized vocabulary. As an attribute value, itemtype expects a URL for the corresponding standard. vCard and vEvent reflect the close link between microdata and microformats, because both profiles in the specification refer directly to microformats.org:

http://microformats.org/profile/hcard

http://microformats.org/profile/hcalendar#vevent

Let’s try to code a vEvent for the concert in our blog entry. We need to add the correct itemtype and then specify the itemprop attributes in accordance with the hCalendar specification:

<article itemscope
 itemtype=http://microformats.org/profile/hcalendar#vevent>
 <time itemprop=dtstart
  datetime="2010-10-09T20:30:00-04:00">
  Saturday, October 9th 2010, just before half past eight in the
evening
 </time> ...
 <span itemprop=location>Community Theater</span> in
 <span itemprop=location>Morristown, NJ</span>...
 <span itemprop=summary>Orchestrion</span> ...
</article>

If we copy this microdata fragment into the microdata viewer, we can see another output option next to the JSON notation, this time in iCal format, which could be seamlessly imported into your own calendar:

BEGIN:VCALENDAR
PRODID:jQuery Microdata
VERSION:2.0
BEGIN:VEVENT
DTSTAMP;VALUE=DATE-TIME:20101227T205755Z
DTSTART;VALUE=DATE-TIME:20101009T2030000400
LOCATION:Community Theater
LOCATION:Morristown, NJ
SUMMARY:Orchestrion
END:VEVENT
END:VCALENDAR

The conversion of microdata to the iCal format is based on Philip Jägenstedt’s JavaScript library microdatajs, which is also the core of the microdata viewer. You can download it from http://gitorious.org/microdatajs.
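How such a conversion might work in principle can be sketched in a few lines. The function below is our own simplified illustration and not the microdatajs implementation: it assumes an item in the JSON shape shown earlier, uses a made-up PRODID, omits DTSTAMP, and simply strips hyphens and colons from the datetime value, which happens to reproduce the DTSTART line of the output above.

```javascript
// Hypothetical helper: turn one vEvent item (in the viewer's JSON shape)
// into iCalendar text. This is NOT how microdatajs does it internally.
function vEventToICal(item) {
  const lines = [
    'BEGIN:VCALENDAR',
    'PRODID:microdata sketch', // made-up product identifier
    'VERSION:2.0',
    'BEGIN:VEVENT'
  ];
  for (const [name, values] of Object.entries(item.properties)) {
    for (const value of values) {
      if (name === 'dtstart') {
        // Assumption: drop "-" and ":" as in the sample output.
        lines.push('DTSTART;VALUE=DATE-TIME:' + value.replace(/[-:]/g, ''));
      } else {
        // Other properties are copied verbatim; no text escaping is done.
        lines.push(name.toUpperCase() + ':' + value);
      }
    }
  }
  lines.push('END:VEVENT', 'END:VCALENDAR');
  return lines.join('\r\n');
}

// The concert from our blog entry, written by hand as an item object.
const concert = {
  properties: {
    dtstart: ['2010-10-09T20:30:00-04:00'],
    location: ['Community Theater', 'Morristown, NJ'],
    summary: ['Orchestrion']
  }
};

const ical = vEventToICal(concert);
console.log(ical);
```

A production-quality converter would also have to escape special characters in text values and handle time zones properly; the sketch deliberately leaves that out.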

On this occasion we can also write the license for this library as a microdata structure. The rules for the vocabulary are in the WHATWG version of the microdata specification in the section Licensing works and require as an itemtype the URL http://n.whatwg.org/work plus the keywords work, title, author, and license as itemprop attributes:

<div itemscope itemtype=http://n.whatwg.org/work>
<a itemprop=work
 href="http://gitorious.org/microdatajs">
 <span itemprop=title>microdatajs</span></a> by
<span itemprop=author>Philip Jägenstedt</span>
<a itemprop=license
 href=http://creativecommons.org/licenses/publicdomain/>
 (<span>Public Domain</span>)</a>
</div>

The next example shows how microdata dialects can be used in combination. As part of a concert review, it makes sense to code the event as vEvent and the author of the review as vCard. The technique for nesting dialects is quite simple. If we want to define the itemprop attribute reviewer in the hReview dialect as vCard, we just have to add an itemscope attribute with the itemtype of the vCard dialect to the same element and then add the desired entries of the vCard. The same applies to embedding vEvent into hReview and can be tested with the following code fragment in the microdata viewer:

<article itemscope
 itemtype=http://microformats.org/wiki/hreview>
 <div
  itemprop=item itemscope
  itemtype=http://microformats.org/profile/hcalendar#vevent>
  <span itemprop=summary>Orchestrion</span>,
  <time itemprop=dtstart
   datetime="2010-10-09T20:30:00-04:00">October 9th 2010
  </time>:
 </div>
 <span itemprop="summary">A fascinating evening</span>
 rated with <span itemprop="rating">5</span> stars out of 5 stars.
 <div itemprop=reviewer itemscope
  itemtype=http://microformats.org/profile/hcard>
  <span itemprop=fn>Nicos Thassofilakas</span>,
  <a href=http://openweb.cc itemprop=url>openWeb.cc</a>
 </div>
</article>
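In the extracted data, a nested item appears as an object in place of a plain string value. The following sketch assumes a hand-written approximation of that structure (the exact JSON printed by the microdata viewer may differ, in particular the type member) and uses a hypothetical helper collectTypes to gather the vocabularies involved.

```javascript
// Hand-written approximation of the nested review structure.
const review = {
  type: 'http://microformats.org/wiki/hreview',
  properties: {
    item: [{
      type: 'http://microformats.org/profile/hcalendar#vevent',
      properties: {
        summary: ['Orchestrion'],
        dtstart: ['2010-10-09T20:30:00-04:00']
      }
    }],
    summary: ['A fascinating evening'],
    rating: ['5'],
    reviewer: [{
      type: 'http://microformats.org/profile/hcard',
      properties: {
        fn: ['Nicos Thassofilakas'],
        url: ['http://openweb.cc']
      }
    }]
  }
};

// Hypothetical helper: collect the itemtype of an item and of every
// item nested inside it, in document order.
function collectTypes(item, found = []) {
  if (item.type) found.push(item.type);
  for (const values of Object.values(item.properties)) {
    for (const value of values) {
      // A nested item is an object; plain values are strings.
      if (typeof value === 'object' && value !== null) {
        collectTypes(value, found);
      }
    }
  }
  return found;
}

const reviewTypes = collectTypes(review);
console.log(reviewTypes.join('\n'));
```

The outer hReview item and the two nested items each contribute one type, so the list contains the hReview, vEvent, and vCard URLs.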

11.1.3 The “itemid” Attribute

As soon as a microdata structure has an itemtype attribute, items in the dialect used can be tagged with unique IDs via the itemid attribute. Examples of such IDs are the ISBN (International Standard Book Number) for books, the EAN (European Article Number) for identifying products, and the ASIN (Amazon Standard Identification Number) for products within Amazon.

Valid values for the itemid attribute are URLs, including Uniform Resource Names (URNs) with the prefix urn:. Using a fictitious vocabulary for describing books, the tablature of Pat Metheny’s solo album One Quiet Night could be identified via its unique ISBN:

<div itemscope
     itemtype=http://vocab.example.net/book
     itemid="urn:isbn:978-0634066634">
<span itemprop=album>One Quiet Night</span> by
<span itemprop=artist>Pat Metheny</span>
(<time itemprop=pubdate datetime=2005-04-01>2005</time>,
<span itemprop=pages>88</span> pages)
</div>
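A consumer of this item could use the itemid to check that it really received a plausible book identifier. The sketch below is one possible approach under our own assumptions: the helper parseIsbnItemId is hypothetical, accepts only the urn:isbn: form used above, and verifies only 13-digit ISBNs.

```javascript
// Hypothetical helper: extract the ISBN from a URN-style itemid and,
// for 13-digit ISBNs, verify the check digit.
function parseIsbnItemId(itemid) {
  const match = /^urn:isbn:([0-9-]+)$/i.exec(itemid);
  if (!match) return null; // not an ISBN URN in the form we expect
  const digits = match[1].replace(/-/g, '');
  if (digits.length !== 13) {
    // Only ISBN-13 is checked in this sketch.
    return { isbn: digits, validIsbn13: false };
  }
  // ISBN-13: weights alternate 1, 3; the weighted sum of all
  // 13 digits must be divisible by 10.
  let sum = 0;
  for (let i = 0; i < 13; i++) {
    sum += Number(digits[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return { isbn: digits, validIsbn13: sum % 10 === 0 };
}

const tablature = parseIsbnItemId('urn:isbn:978-0634066634');
console.log(tablature);
```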

11.1.4 The “itemref” Attribute

Often, it is not possible to accommodate all desired microdata information within one container element. With our blog entry, the itemscope attribute goes with the surrounding article element, and all associated itemprop attributes are within the article. If you want to include itemprop attributes outside of the article element, you can use the itemref attribute. Separated by spaces, it contains a list of IDs of elements also to be searched for microdata contents. The connection between the itemscope attribute and a container element can then be removed completely:

<article>
 <div id=location>
  <span itemprop=member>Pat Metheny</span>
 </div>
 <div id=intro>
  <span itemprop=member>Antonio Sanchez</span>
  <span itemprop=member>Steve Rodby</span>
  <span itemprop=member>Lyle Mays</span>
  <span itemprop=band>Pat Metheny Group</span>
 </div>
</article>
<div itemscope itemref="location intro"></div>

In this example, the two paragraphs of the blog entry are divided into two div elements with the IDs location and intro. Within these div elements, the individual musicians are identified as members of the band Pat Metheny Group via itemprop attributes. The itemscope attribute is outside of the article and refers via the itemref attribute to the IDs of those areas where the actual information can be found. In complicated microdata structures, this option can be very useful.
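The lookup behind itemref can be modeled without a browser. In the following sketch, a plain object stands in for the DOM and maps each element ID to the itemprop name-value pairs found inside that element; both this model and the helper resolveItemref are our own simplifications. The itemref value is split on whitespace, as in itemref="location intro".

```javascript
// Simplified stand-in for the DOM: each ID maps to the itemprop
// name-value pairs found inside that element.
const elementsById = {
  location: [['member', 'Pat Metheny']],
  intro: [
    ['member', 'Antonio Sanchez'],
    ['member', 'Steve Rodby'],
    ['member', 'Lyle Mays'],
    ['band', 'Pat Metheny Group']
  ]
};

// Hypothetical helper: gather the properties of an item whose content
// lives entirely in the elements named by its itemref attribute.
function resolveItemref(itemref, byId) {
  const properties = {};
  for (const id of itemref.trim().split(/\s+/)) {
    // Unknown IDs are skipped silently.
    for (const [name, value] of byId[id] || []) {
      if (!properties[name]) properties[name] = [];
      properties[name].push(value);
    }
  }
  return { properties };
}

const band = resolveItemref('location intro', elementsById);
console.log(JSON.stringify(band, null, 2));
```

The resulting item contains four member values and one band value, although the element carrying itemscope is itself empty.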

11.2 The Microdata DOM API

As you would expect, the microdata structure of a document can also be explored with JavaScript via the microdata DOM API.

Accessing all top-level microdata items (that is, those items that have an itemscope attribute and are not part of another item) takes place via the method document.getItems(). It returns a DOM NodeList of the elements found, in the order in which they appear in the DOM tree. If we are only after items of a particular type, we can pass a list of the desired itemtype values, separated by spaces, in the getItems() call:

var allNodes = document.getItems();
var vCards = document.getItems(
  'http://microformats.org/profile/hcard'
);

Each element of the resulting NodeList allows access to the additional microdata attributes present for each element. Table 11.2 lists the attribute names and their content.

Table 11.2 Attributes of a top-level microdata item

image

Starting from the relevant top-level item, we can then work towards the properties defined with itemprop. We find these in item.properties, the so-called HTMLPropertiesCollection, which allows access to the name-value pairs of each property via additional interfaces. The elements are sorted according to their position in the DOM tree. Table 11.3 shows the required attributes and methods, and their content.

Table 11.3 Attributes and methods of “HTMLPropertiesCollection”

image

The last attribute in the microdata DOM API is itemValue. It allows access to the content of elements that have an itemprop attribute. If an element in a variable elem is a container—for example, article, div, or span—then elem.itemValue represents the text content that can also be changed.

You need to be careful if nested items are involved, because then the element concerned also has an itemscope attribute and the content of the element has to be interpreted independently, almost as a top-level item. The specification takes this into account and requires that in this case elem.itemValue makes a new item object available.

A second and last special case concerns HTML elements that you have already encountered as elements with special status in the section on itemprop attributes. a, img, time, meta, and object belong to this category and assign their href, src, datetime, content, or data attribute to elem.itemValue, in contrast to the usual practice. The list of all representatives in this category is found in Table 11.1.
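The three cases can be summarized in a small model. The sketch below does not touch the real DOM: it describes an element as a plain object with the hypothetical fields tag, attrs, text, and itemscope, covers only the five elements named in the text, and does not resolve URLs. That a missing attribute yields an empty string, except for time, which falls back to its text, is our assumption.

```javascript
// Attribute that supplies the value for the five elements named in the
// text. Table 11.1 lists further elements; they are omitted here.
const VALUE_ATTRIBUTE = {
  a: 'href',
  img: 'src',
  time: 'datetime',
  meta: 'content',
  object: 'data'
};

// Hypothetical model of itemValue for an element described as
// { tag, attrs, text, itemscope }.
function itemValueOf(el) {
  // Nested item: the element itself stands for the new item.
  if (el.itemscope) return el;
  const attr = VALUE_ATTRIBUTE[el.tag];
  // Special elements: the value comes from the attribute.
  if (attr && attr in el.attrs) return el.attrs[attr];
  // Assumption: a missing attribute gives "", except for time.
  if (attr && el.tag !== 'time') return '';
  // Containers such as article, div, or span: the text content.
  return el.text;
}

const picture = { tag: 'img', attrs: { src: 'icons/orchestrion.jpg' }, text: '' };
const musician = { tag: 'span', attrs: {}, text: 'Pat Metheny' };
console.log(itemValueOf(picture));
console.log(itemValueOf(musician));
```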

Summary

In this chapter we take a closer look at the syntax of microdata, a mechanism to add semantic markup to documents using a variety of global attributes. Starting with the boolean attribute itemscope that marks the area relevant for microdata and defines a new, empty group of name-value pairs—the so-called items—we then move on to the itemprop attribute that actually defines the name of the corresponding property via its attribute value.

We use the itemtype attribute to denote standardized vocabularies like vCard for contact information or vEvent for dates of events and the itemid attribute to tag items in these dialects with unique IDs, such as ISBN or EAN numbers. Last but not least we finish the topic of microdata attributes with itemref, enabling us to specify a space-separated list of element IDs also to be searched for microdata contents. A walk through the Microdata DOM API concludes this chapter and shows how you can easily access your microdata structure via JavaScript.

Unfortunately, no browser supports microdata at the time of this writing, so the only way to explore the many examples shown in this chapter is with Philip Jägenstedt’s Live Microdata viewer, which is available at http://foolip.org/microdatajs/live. We can only wait and see if and when microdata will eventually become established.
