Chapter 3

Transforming with Style (Stylesheets, That Is)

In This Chapter

• Introducing stylesheets

• Constructing a stylesheet

• Assigning namespaces

• Looking at XML documents as trees

XSLT plays in a sandbox called a stylesheet, and you stay within that sandbox for the rest of this book. You could skip across this sandbox with childlike fervor, but that would be messy and get sand everywhere. I suggest taking off your shoes and taking one step (or building one part of your castle) at a time. This way, you can divide and conquer the three major areas of XSLT: stylesheets, template rules, and XPath patterns. And that step-by-step approach is exactly the goal of the next three chapters.

In this chapter, you start off by finding out about stylesheets and focusing on the major issues related to stylesheets as a whole.

Structure of a Stylesheet

An XSLT stylesheet has a well-defined structure. Perhaps the easiest way to make sense of this structure is to compare it to something you are familiar with already, such as an ordinary document.

A document is made up of one or more paragraphs. A paragraph is a division of a document that contains one or more sentences that express a unified thought. However, not all sentences in a well-crafted paragraph are created equal. Traditionally, the first sentence holds a unique responsibility to lead the rest of the sentences by introducing a new subject or idea. The rest of the paragraph then expands upon this idea.

When you look at an XSLT stylesheet, you’ll find a comparable structure. At the top level is a stylesheet, which acts as the overall container for XSLT code, much like a document serves as a container for all the sentences inside it. Whereas a paragraph is the primary component of a document, a template rule is the basic building block of a stylesheet. And, like the first sentence in a paragraph, the match pattern sets the direction for its template rule by defining which nodes of the source document the rule applies to. Figure 3-1 highlights these layers of a stylesheet.

Figure 3-1: Structure of an XSLT stylesheet.

Taking this analogy a step further, there are some elements in a document that aren’t paragraphs per se. In a normal business letter, for example, the return address, date, greeting, and signature are all distinct, required elements but do not fit the definition of a paragraph. In the same way, an XSLT stylesheet has additional elements, such as xsl:output, that are perfectly valid but sit alongside template rules rather than inside them.

Constructing Your XSLT Stylesheet

As you read in Chapter 1, an XSLT stylesheet is a well-formed XML document. By convention, it has an .xsl file extension.

xsl:stylesheet element

The xsl:stylesheet element serves as the topmost element (or document element) of an XSLT stylesheet. The shell of any XSLT stylesheet consists of:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- XSLT code goes here -->
</xsl:stylesheet>

As you can see in the preceding snippet, an xsl:stylesheet element must have two parts defined:

• Namespace: In the preceding code, the XSLT namespace is defined as xmlns:xsl="http://www.w3.org/1999/XSL/Transform". (Don’t worry about what a namespace is just yet; you’ll find out about namespaces later in this chapter.)

• Version: The version attribute specifies the version of XSLT that the stylesheet uses, which is 1.0 for every stylesheet in this book.

This information tells the XSLT processor how to process the stylesheet.

Alternatively, you can use the xsl:transform element, which is synonymous with xsl:stylesheet:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- XSLT code goes here -->
</xsl:transform>

Tip

Although both xsl:stylesheet and xsl:transform are valid, xsl:stylesheet is by far the most commonly used of the two elements. I use xsl:stylesheet throughout this book.
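To see the shell rules in action outside an XSLT processor, here is a minimal sketch in Python that uses only the standard library’s xml.etree.ElementTree module. The check_stylesheet_shell function is a hypothetical helper and is not part of any XSLT tool. It confirms three things: that the text is well-formed XML, that the document element is stylesheet or transform in the XSLT namespace, and that a version attribute is present. ElementTree writes a namespaced name as {URI}localname, so the check compares against the full URI rather than the xsl: prefix.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
# The two synonymous document elements, in ElementTree's {URI}name form
ALLOWED_ROOTS = {f"{{{XSLT_NS}}}stylesheet", f"{{{XSLT_NS}}}transform"}


def check_stylesheet_shell(text):
    """Hypothetical helper: return (element name, version) for a valid shell."""
    root = ET.fromstring(text)  # raises ET.ParseError if not well-formed
    if root.tag not in ALLOWED_ROOTS:
        raise ValueError(f"not an XSLT document element: {root.tag}")
    version = root.get("version")
    if version is None:
        raise ValueError("missing version attribute")
    local_name = root.tag.split("}", 1)[1]
    return local_name, version


SHELL = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- XSLT code goes here -->
</xsl:stylesheet>"""

SYNONYM = """<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- XSLT code goes here -->
</xsl:transform>"""

print(check_stylesheet_shell(SHELL))    # ('stylesheet', '1.0')
print(check_stylesheet_shell(SYNONYM))  # ('transform', '1.0')
```

An unprefixed <stylesheet> with no namespace declaration fails the check. That is exactly the case a real XSLT processor would refuse to treat as a stylesheet.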

Top-level elements

An xsl:stylesheet element contains all the XSLT code that appears in the stylesheet. By and large, the basic building block of the stylesheet is the template rule, defined by using the xsl:template element, but you can actually add 11 additional XSLT elements directly inside the xsl:stylesheet element. These elements are called top-level elements and are shown in Table 3-1.

Table 3-1 Top-level XSLT Elements
Element                Definition
xsl:template           Defines a template rule.
xsl:output             Specifies the output format for the result document.
xsl:variable           Defines a variable.
xsl:param              Defines a parameter, which is a special kind of variable.
xsl:import             Loads an external stylesheet whose rules the current stylesheet can override.
xsl:include            Loads an external stylesheet as part of the current stylesheet.
xsl:preserve-space     Preserves whitespace-only text nodes in the specified source elements.
xsl:strip-space        Removes whitespace-only text nodes from the specified source elements.
xsl:key                Defines a key that can be used to look up and link together XML elements.
xsl:decimal-format     Defines the decimal format to use when converting numbers to strings with format-number().
xsl:namespace-alias    Maps a namespace in the stylesheet to another namespace in the result document.
xsl:attribute-set      Defines a named set of attributes for use in the result document.

The following code snippet shows an XSLT stylesheet with some of these top-level elements defined:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:preserve-space elements="chapters"/>
  <xsl:template match="book">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <xsl:include href="moretemplates.xsl"/>
</xsl:stylesheet>
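The top-level elements are simply the direct children of xsl:stylesheet, so a few lines of Python can list them. This sketch uses the standard library’s xml.etree.ElementTree. The top_level_names function is hypothetical, and its set of names comes from Table 3-1. It is stricter than the XSLT 1.0 specification, which also allows (and ignores) top-level elements from other, non-null namespaces. Here, anything outside the XSLT namespace is rejected, so a stray unprefixed <include> would be caught.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
# Local names of the top-level elements listed in Table 3-1
TOP_LEVEL = {"template", "output", "variable", "param", "import", "include",
             "preserve-space", "strip-space", "key", "decimal-format",
             "namespace-alias", "attribute-set"}

STYLESHEET = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:preserve-space elements="chapters"/>
  <xsl:template match="book">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <xsl:include href="moretemplates.xsl"/>
</xsl:stylesheet>"""


def top_level_names(text):
    """Hypothetical helper: local names of the top-level XSLT elements, in order."""
    root = ET.fromstring(text)
    names = []
    for child in root:  # iterating an element visits its direct children only
        uri, _, local = child.tag.partition("}")
        if uri != "{" + XSLT_NS or local not in TOP_LEVEL:
            raise ValueError(f"not a top-level XSLT element: {child.tag}")
        names.append(local)
    return names


print(top_level_names(STYLESHEET))
# ['output', 'preserve-space', 'template', 'include']
```

The p element inside the template rule is not reported, because it is a grandchild of xsl:stylesheet rather than a direct child.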

Generally, you can put the top-level elements in any sequence you wish. The XSLT processor processes these elements the same way regardless of order. For example, if I move the elements around, the code generates the same results as the original:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:include href="moretemplates.xsl"/>
  <xsl:template match="book">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <xsl:preserve-space elements="chapters"/>
  <xsl:output method="html"/>
</xsl:stylesheet>

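However, there are a couple of exceptions to this rule, which tend to occur only in advanced situations. Specifically, if you use the xsl:import element, it must come before every other top-level element under xsl:stylesheet. Also, placement can matter during error recovery. If two template rules match the same node with the same import precedence and priority, an XSLT 1.0 processor may either report an error or use the rule that appears last in the stylesheet.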
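The xsl:import rule is easy to check mechanically. Here is a minimal sketch, again assuming Python’s standard xml.etree.ElementTree. The imports_come_first function and the base.xsl file name are hypothetical, and the function checks only this one ordering rule. ElementTree’s default parser drops comments, so a comment placed before xsl:import does not count as a preceding element.

```python
import xml.etree.ElementTree as ET

IMPORT_TAG = "{http://www.w3.org/1999/XSL/Transform}import"


def imports_come_first(text):
    """Hypothetical check: True if no xsl:import follows another top-level element."""
    root = ET.fromstring(text)
    seen_other = False
    for child in root:
        if child.tag == IMPORT_TAG:
            if seen_other:
                return False  # an import appears after some other element
        else:
            seen_other = True
    return True


GOOD = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- comments are fine here -->
  <xsl:import href="base.xsl"/>
  <xsl:output method="html"/>
</xsl:stylesheet>"""

BAD = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:import href="base.xsl"/>
</xsl:stylesheet>"""

print(imports_come_first(GOOD))  # True
print(imports_come_first(BAD))   # False
```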

Comments

A comment is text included in your stylesheet for behind-the-scenes use; the XSLT processor ignores it during the transformation. In stylesheets, people typically use comments to label a template rule or other part of the code and describe what it does. Just like in HTML, a comment is any text surrounded by a <!-- prefix and --> suffix. For example, the heavily commented XSLT stylesheet shown here produces the same results as the preceding example:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Developed by: R. Wagner -->
  <!-- Last modified: 04/22 -->

  <!-- This stylesheet will output an HTML document using several
       template rules, one defined in this file and the others from
       moretemplates.xsl -->

  <!-- Output document to HTML format -->
  <xsl:output method="html"/>

  <!-- Preserve space for chapters elements -->
  <xsl:preserve-space elements="chapters"/>

  <!-- For each book element, surround its content with
       HTML paragraph tags -->
  <xsl:template match="book">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Include more template rules, which
       are stored in a separate file -->
  <xsl:include href="moretemplates.xsl"/>

</xsl:stylesheet>

When I say the processor ignores any comment, I mean it. You can even insult the processor with <!-- Hey processor, you’re a loser! --> and this still doesn’t impact its performance. Now that’s service.
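Python’s standard xml.etree.ElementTree parser shows the same “comments don’t count” behavior, because it discards comments by default. In this sketch, the structure and count_comments helpers and the two short stylesheets are made up for illustration. The commented and uncommented versions produce identical element trees. The comments show up only when you explicitly ask the parser to keep them with the insert_comments option, which is available since Python 3.8.

```python
import xml.etree.ElementTree as ET

PLAIN = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
</xsl:stylesheet>"""

COMMENTED = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Output document to HTML format -->
  <xsl:output method="html"/>
  <!-- Hey processor, you're a loser! -->
</xsl:stylesheet>"""


def structure(text):
    """Tags and attributes of every element; the default parser skips comments."""
    root = ET.fromstring(text)
    return [(el.tag, el.attrib) for el in root.iter()]


def count_comments(text):
    """Parse again, this time asking the tree builder to keep comments."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(text, parser=parser)
    return sum(1 for el in root.iter() if el.tag is ET.Comment)


print(structure(PLAIN) == structure(COMMENTED))  # True
print(count_comments(COMMENTED))                 # 2
```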

Tip

Use comments freely. As the preceding examples show, comments make XSLT code much more readable than it would be without them. This is especially true when you are trying to read a stylesheet someone else wrote.

What’s in a Name(space)?

You’ve probably noticed that something has come before each of the XSLT elements you have seen and worked with so far in this book. Yes, that xsl: prefix is part of what is known as a namespace. XML uses namespaces to distinguish one set of element names from another set.

The necessity for namespaces becomes clear when you think about the flexibility of the XML language. Because XML has no predefined set of element names, naming is totally up to each developer, whether he or she lives in San Jose, Lucerne, or Ouagadougou. So the possibility of two different developers using an identical element name is very high.

For example, suppose a satellite dish company develops an XML vocabulary using dish as an element, while a housewares company has its own dish element to describe the round thing off which you eat. Now, if these companies never exchange data with anyone outside their companies, they can use their XML vocabulary as is and don’t need to use a namespace. But if they wish to exchange data with outside suppliers, the possibility of duplicate element names exists when this data is merged.

Namespaces were developed to avoid this name collision by linking a namespace identifier with a URI (Uniform Resource Identifier). A URI is the standard way of naming resources on the Web. The most widespread form is the URL (Uniform Resource Locator), also known simply as a Web address (for example, http://www.dummies.com). Because a URI is unique, you can be sure that the namespace associated with it is one of a kind. (The URI serves only as a name; the processor doesn’t need to retrieve anything from that address.)

When you define a namespace, you declare a URI once in a document and then refer to the namespace elsewhere in the document by using a namespace prefix (also known as namespace identifier or abbreviation), as shown in Figure 3-2.

Figure 3-2: A URI is linked to a namespace prefix.
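Here is how that disambiguation looks to an XML parser, sketched with Python’s standard xml.etree.ElementTree. The two namespace URIs (http://example.com/satellite and http://example.com/housewares) and the order document are invented for the dish example. The point is that the parser keeps the full URI with each element name, so two dish elements from different vocabularies never collide.

```python
import xml.etree.ElementTree as ET

# Hypothetical merged data from the satellite company and the housewares company
ORDER = """<order xmlns:sat="http://example.com/satellite"
       xmlns:home="http://example.com/housewares">
  <sat:dish size="90cm"/>
  <home:dish material="porcelain"/>
</order>"""

root = ET.fromstring(ORDER)

# Each tag carries its namespace URI in {URI}localname form
for child in root:
    print(child.tag)
# {http://example.com/satellite}dish
# {http://example.com/housewares}dish

# Searching by prefix needs a prefix-to-URI mapping, just like a namespace declaration
ns = {"sat": "http://example.com/satellite", "home": "http://example.com/housewares"}
satellite_dishes = root.findall("sat:dish", ns)
print(len(satellite_dishes), satellite_dishes[0].get("size"))  # 1 90cm
```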

Although you aren’t required to use namespaces in an XML document, you must use them in an XSLT stylesheet. Without namespaces, for example, it would be impossible to tell whether <key> refers to the XSLT key element or to an item in a hardware store inventory.

XSLT stylesheets use the URI http://www.w3.org/1999/XSL/Transform and, by convention, assign this URI to an xsl: namespace identifier, as shown in Figure 3-3.

The mapping from the namespace identifier to the URI is what is important, not the literal namespace identifier xsl:. The xsl: identifier can actually be any label you choose. For example, the following is a perfectly valid XSLT stylesheet:

<richsmostexcellenttransform:stylesheet xmlns:richsmostexcellenttransform="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <richsmostexcellenttransform:template match="book">
    <p><richsmostexcellenttransform:apply-templates/></p>
  </richsmostexcellenttransform:template>
</richsmostexcellenttransform:stylesheet>

Figure 3-3: XSLT stylesheet namespace.

Tip

Although you are free to use any prefix you like, I recommend sticking with xsl:. It is the prefix you’ll see everywhere else. Using an alternative could easily lead to confusion.
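A quick way to convince yourself that only the URI matters is to parse two stylesheets that differ only in their prefix. This sketch assumes Python’s standard xml.etree.ElementTree, which replaces every prefix with the URI it is bound to. As a result, the xsl: version and the richsmostexcellenttransform: version yield exactly the same element names. The element_names helper is hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_VERSION = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="book">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>"""

RICH_VERSION = """<richsmostexcellenttransform:stylesheet
    xmlns:richsmostexcellenttransform="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <richsmostexcellenttransform:template match="book">
    <p><richsmostexcellenttransform:apply-templates/></p>
  </richsmostexcellenttransform:template>
</richsmostexcellenttransform:stylesheet>"""


def element_names(text):
    """Every element name in document order, with prefixes resolved to URIs."""
    return [el.tag for el in ET.fromstring(text).iter()]


print(element_names(XSL_VERSION) == element_names(RICH_VERSION))  # True
print(element_names(XSL_VERSION)[0])
# {http://www.w3.org/1999/XSL/Transform}stylesheet
```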

Documents as Trees

XSLT processors don’t read a document like you and I do. When I read a document, I start at the top of the page and read from left to right, line by line down the page to the end. I make sense of a document by reading its words, sentences, and paragraphs in sequence. (Okay, I admit it, I love those Dr. Seuss books the best, because I can get by just looking at pictures.)

In contrast, an XSLT processor does not read a document sequentially, but swallows it as a hierarchy of information. This hierarchy is best thought of as a tree, and in fact, XSLT uses tree lingo as a way to describe an XML document.

Remember

A solid grasp of document trees can help you realize that XSLT and XPath don’t do their work by using smoke and mirrors, but actually follow a logical, understandable process. In fact, a good alternative title for this section is “Read This! This Section Is Important.”

Treespeak

A common tree, be it a maple, oak, or elm, has a certain built-in structure or hierarchy to its various parts. At the bottom layer, a root system serves as the hidden foundation, supplying water and nutrients to the rest of the tree. Connected to the roots is the trunk, the most visible part of the support system. At the next level, you have branches of all shapes and sizes, which either connect to smaller branches or else directly to leaves. These smaller branches then lead to leaves or even tinier branches, and so on. Starting at the trunk, you can locate each branch and leaf somewhere in its complex hierarchy of parts.

An XML document follows this same pattern. Every XML document has its own counterpart to a trunk, an element commonly referred to as the document element. The document element encloses all the other elements inside its start and end tags. For example, doc in the snippet below is the document element because it contains all of the other elements in the document:

<doc>
  <para>Text1</para>
  <para>Text2</para>
</doc>

As you have seen already in this chapter, an xsl:stylesheet element contains template rules and all other parts of an XSLT stylesheet, so it acts as the document element of an XSLT stylesheet.

Remember

Just as a tree cannot have two trunks, neither can an XML document have two document elements. A well-formed XML document can have only a single document element that contains all of the other elements.
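XML parsers enforce the one-trunk rule. This small Python sketch uses only the standard library’s xml.etree.ElementTree. It parses the doc snippet, reports its document element, and then shows that a string with two top-level elements is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

ONE_TRUNK = "<doc><para>Text1</para><para>Text2</para></doc>"
TWO_TRUNKS = "<doc><para>Text1</para></doc><doc><para>Text2</para></doc>"

root = ET.fromstring(ONE_TRUNK)
print(root.tag)                       # doc
print([child.tag for child in root])  # ['para', 'para']

try:
    ET.fromstring(TWO_TRUNKS)
except ET.ParseError as err:
    # The underlying expat parser reports the second element as
    # "junk after document element"
    print("not well-formed:", err)
```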

Top-level elements nested directly inside a document element are the equivalent of the first level of branches of a tree. Some of these elements (also called nodes in treespeak) contain additional elements, like smaller branches on a tree.

Even a tree’s roots have an XML counterpart. Each document has something called a root node, which contains everything in the document, including the document element. The root node, however, is invisible in your document’s markup, because no element or text represents it.

Table 3-2 summarizes the comparison between a real tree and an XML one.

Table 3-2 Treespeak
Real World    XML World
Tree          Document
Roots         Root node
Trunk         Document element
Branch        Node that contains more nodes or leaves
Leaf          Node that has no children (also called a leaf)

Familyspeak

Just as a leaf cannot survive apart from a tree, a node cannot exist in isolation. Each node of a tree is related in some way to the other nodes that surround it in the document structure. The terminology used to describe these relationships comes straight from The Waltons or The Simpsons: ancestor, parent, sibling, and child. I like to call this terminology familyspeak.

Using familyspeak, you can say that a tree trunk is the parent of all the branches connected directly to it. Each of these attached branches is a child of the trunk and a sibling to the others. Any given branch typically has children as well, which may be either branches or leaves.

To illustrate the interrelationships of an XML document, consider a family tree expressed in XML:

<!-- familytree.xml -->
<family>
  <member firstname="Peter" surname="Selim" birth="1815">
    <spouse firstname="Maja" surname="Jonsdotter"/>
    <member firstname="Carl" surname="Selim" birth="1845">
      <spouse firstname="Joannah" surname="Lund" birth="1844"/>
      <member firstname="Hannah" surname="Selim"/>
      <member firstname="David" surname="Selim"/>
      <member firstname="Selma" surname="Selim"/>
      <member firstname="Ellen" surname="Selim"/>
      <member firstname="Charlie" surname="Selim" birth="1869">
        <spouse firstname="Hannah" surname="Carlsdotter" birth="1865"/>
        <member firstname="George" surname="Selim" birth="1898">
          <spouse firstname="Dagmar" surname="Selim" birth="1898"/>
        </member>
        <member firstname="Paul" surname="Selim"/>
        <member firstname="Pearl" surname="Rohden" birth="1897"/>
        <member firstname="Frances" surname="Lambert" birth="1903"/>
        <member firstname="Gladys" surname="Carlson" birth="1906">
          <spouse firstname="Hilmer" surname="Carlson" birth="1906"/>
          <member firstname="Patricia" surname="Gustafson">
            <spouse firstname="Lauren" surname="Gustafson"/>
          </member>
          <member firstname="Wayne" surname="Carlson"/>
          <member firstname="Janet" surname="Olsen"/>
          <member firstname="Linda" surname="Zatkalik"/>
          <member firstname="Eunice" surname="Shafer"/>
        </member>
      </member>
    </member>
  </member>
</family>

This family tree has a document element called family, which serves as the container for everything in that family. Peter Selim is the oldest recorded ancestor of this family. His element, <member firstname="Peter" surname="Selim" birth="1815">, is the first member element in the tree and is the ancestor of all the other member elements.

Peter had a spouse named Maja and one son named Carl. Carl and his wife Joannah had five children, one of whom had children himself, and so on, down the family tree. So, for example, Eunice Shafer, the last child element on the tree, had a parent named Gladys Carlson, and Eunice’s siblings were Patricia, Wayne, Janet, and Linda. Peter Selim is a distant ancestor of Eunice.

All of these family relationships, from Peter to Eunice, are interconnected. You find out later in this book that traversing this tree through these interrelationships is an important part of XSLT.
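If you want to experiment with familyspeak before you reach XPath, here is a minimal Python sketch. It uses the standard xml.etree.ElementTree module on a trimmed copy of familytree.xml that keeps only the branch from Peter down to Eunice. The helper names (find_member, parent, siblings, ancestors) are hypothetical. ElementTree elements know their children but not their parent, so the sketch builds a child-to-parent lookup table first. XPath gives you these relationships directly, without that extra step.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of familytree.xml: only the branch from Peter down to Eunice
FAMILY = """<family>
  <member firstname="Peter" surname="Selim" birth="1815">
    <spouse firstname="Maja" surname="Jonsdotter"/>
    <member firstname="Carl" surname="Selim" birth="1845">
      <member firstname="Charlie" surname="Selim" birth="1869">
        <member firstname="Gladys" surname="Carlson" birth="1906">
          <spouse firstname="Hilmer" surname="Carlson" birth="1906"/>
          <member firstname="Patricia" surname="Gustafson"/>
          <member firstname="Wayne" surname="Carlson"/>
          <member firstname="Janet" surname="Olsen"/>
          <member firstname="Linda" surname="Zatkalik"/>
          <member firstname="Eunice" surname="Shafer"/>
        </member>
      </member>
    </member>
  </member>
</family>"""

root = ET.fromstring(FAMILY)
# Elements know their children but not their parent, so map child -> parent once
parent_of = {child: p for p in root.iter() for child in p}


def find_member(firstname):
    """First member element with the given firstname attribute."""
    return root.find(f".//member[@firstname='{firstname}']")


def parent(element):
    return parent_of.get(element)  # None for the document element


def siblings(element):
    """Other member elements with the same parent (spouses are skipped)."""
    return [c for c in parent(element) if c.tag == "member" and c is not element]


def ancestors(element):
    """Member elements from the parent upward, stopping at the family element."""
    chain = []
    current = parent(element)
    while current is not None and current.tag == "member":
        chain.append(current)
        current = parent(current)
    return chain


eunice = find_member("Eunice")
print(parent(eunice).get("firstname"))                  # Gladys
print([s.get("firstname") for s in siblings(eunice)])   # ['Patricia', 'Wayne', 'Janet', 'Linda']
print([a.get("firstname") for a in ancestors(eunice)])  # ['Gladys', 'Charlie', 'Carl', 'Peter']
```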

Nodes ‘R’ us

Now that you’ve got treespeak and familyspeak down, you can take a closer look at what a node is. In a general sense, a node is a point in a larger system. A leaf seems like an obvious node in a tree. However, in XML, each part of an XML document structure is a node, be it the trunk, branch, or leaf. Also, to make matters slightly more complicated, even attributes of these parts are considered nodes. In fact, besides the root node, there are six other node types: element, attribute, namespace, processing instruction, comment, and text.

The XML snippet below contains several of these node types:

<?xml version="1.0" encoding="UTF-8"?>
<film name="Braveheart">
  <!-- Last modified 2/01 -->
  <storyline>William Wallace unites the 13th Century Scots in their battle to overthrow English rule.</storyline>
</film>

This snippet can be expressed as a tree structure, as shown in Figure 3-4.

Figure 3-4: Node hierarchy of the film XML snippet.

You’ll notice one difference between the tree structure shown in Figure 3-4 and the XML code: the additional text nodes. These text nodes represent the “hidden text” between the various elements. No visible text sits between the film element and the comment, or between the comment and the storyline element. However, invisible line-break and indentation characters are there to start each new line of code. By default, these whitespace characters are considered part of the XML document, so they’re added to the document tree as text nodes. (In Chapter 13 you find out how to tweak some of these whitespace settings.)
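You can see these hidden whitespace characters with Python’s standard xml.etree.ElementTree. One caveat: ElementTree does not create separate text-node objects the way the tree in Figure 3-4 does. Instead, it stores the text before an element’s first child in .text and the text after an element in .tail. The whitespace is still kept, as this sketch shows. (The insert_comments option requires Python 3.8 or later.)

```python
import xml.etree.ElementTree as ET

FILM = """<?xml version="1.0" encoding="UTF-8"?>
<film name="Braveheart">
  <!-- Last modified 2/01 -->
  <storyline>William Wallace unites the 13th Century Scots in their battle to overthrow English rule.</storyline>
</film>"""

# Keep comments in the tree so the comment node is visible too
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
# Parse bytes so the parser itself handles the encoding declaration
film = ET.fromstring(FILM.encode("utf-8"), parser=parser)
comment, storyline = list(film)

# Whitespace between film and the comment, between the comment and
# storyline, and between storyline and the end tag of film
whitespace = [film.text, comment.tail, storyline.tail]
print(whitespace)  # ['\n  ', '\n  ', '\n']
```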

Tip

An XML document tree is often called a DOM, after the Document Object Model, the W3C’s standard programming interface for working with a document as a tree of nodes. Although this may sound like technospeak, the idea behind it is simply looking at a document in a tree-like manner.

TechnicalStuff

Because the XSLT processor reads the document as a tree, it has the entire tree available throughout the transformation process. This tree-based approach is very different from that of simpler, event-based XML APIs such as the Simple API for XML (SAX), which read an XML document sequentially and therefore deal with elements one at a time.
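Python’s standard library happens to include both styles of API, so you can compare them directly. In this sketch, xml.sax hands a handler one start or end event at a time, in document order. By contrast, xml.etree.ElementTree builds the whole tree first, so you can jump back to an earlier element whenever you like. The StartEndLogger class is a made-up name for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET

DOC = "<doc><para>Text1</para><para>Text2</para></doc>"


class StartEndLogger(xml.sax.ContentHandler):
    """Records each start/end event as the SAX parser streams through the text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


# Sequential, event-based: the handler sees one element boundary at a time
handler = StartEndLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.events)
# [('start', 'doc'), ('start', 'para'), ('end', 'para'),
#  ('start', 'para'), ('end', 'para'), ('end', 'doc')]

# Tree-based: the whole document is in memory, so the order of access is up to you
root = ET.fromstring(DOC)
print(root[1].text, root[0].text)  # Text2 Text1
```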

Working with trees

You need a solid understanding of document trees so that you know how XSLT works to get information from the XML source document and to output it in the result document. But, fortunately, you never actually have to worry about the mechanics of traversing the document tree (a practice sometimes called tree walking).

Many other programming languages make use of tree structures to describe hierarchical information. Yet working with trees can be a complex task if you have to write the code to actually walk through the tree, making sure every nook and cranny in it is found.

Certainly one of the tremendous benefits of XSLT is that it removes the burden of tree walking by doing all this hard stuff for you. You get to say, “I want all the nodes that match this pattern,” and XSLT then goes off to handle the request. You don’t need to concern yourself with the implementation details of this process.
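To get a feel for the work XSLT saves you, compare two ways of collecting every member element in a small tree. This Python sketch uses the standard xml.etree.ElementTree; the find_by_hand function and the sample data are hypothetical. It first walks the tree by hand with recursion, then simply asks for the nodes with iter("member"), which does the walking for you. The second style is closer in spirit to an XSLT match pattern, although it is only a Python analogy, not XSLT itself.

```python
import xml.etree.ElementTree as ET

FAMILY = """<family>
  <member firstname="Peter">
    <spouse firstname="Maja"/>
    <member firstname="Carl">
      <member firstname="Charlie"/>
      <member firstname="Ellen"/>
    </member>
  </member>
</family>"""


def find_by_hand(element, tag):
    """Explicit tree walk: check this node, then recurse into every child."""
    found = [element] if element.tag == tag else []
    for child in element:
        found.extend(find_by_hand(child, tag))
    return found


root = ET.fromstring(FAMILY)
by_hand = [m.get("firstname") for m in find_by_hand(root, "member")]
by_query = [m.get("firstname") for m in root.iter("member")]

print(by_hand)              # ['Peter', 'Carl', 'Charlie', 'Ellen']
print(by_hand == by_query)  # True: both visit nodes in document order
```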
