Built-in templates do things behind the scenes
Think of XML documents as trees
What a root node is and isn’t
Why the selected node isn’t the same as current node
XPath abbreviations
Location steps and paths
Using xsl:apply-templates, xsl:copy, xsl:copy-of, and xsl:value-of
How the axis dictates the order of the selected node set
Types of XPath expressions
When to use curly brackets
Whitespace in your result documents
XSLT may be logically structured, but it sure does have some peculiarities that can leave you scratching your head if you don’t consider them as you create your stylesheets.
A built-in template rule is like the man behind the screen in the Wizard of Oz — it takes action, but if you don’t realize it, you’ll be confused about what happened and why.
The XSLT processor uses built-in template rules to process any node that is not matched with a template rule you explicitly define in your stylesheet. Each node type has different built-in template rules that are applied to it:
Element nodes (and the root node) have a built-in template that applies templates to their child nodes, wherever the element sits in the tree. The net effect is that the element’s tags are dropped but the text content inside it is preserved.
Text and attribute nodes have a built-in template rule that copies their text straight into the result tree.
Processing instructions, comments, and namespaces have a built-in template rule that strips them from the result document.
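Written out as ordinary template rules, the built-in behavior listed above looks roughly like the following sketch, which follows the rules given in the XSLT 1.0 specification. You never need to add these rules yourself; seeing them spelled out just makes the hidden behavior easier to predict. Namespace nodes have no match pattern, so their do-nothing rule can’t be written out.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root and element nodes: keep walking down to the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy their text to the result tree -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: output nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

Run against any source document, a stylesheet containing only these rules should return the document’s text content with every tag removed, which is the same thing an empty stylesheet does.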
See Chapter 4 for more information on built-in templates.
Always keep in mind that an XSLT processor doesn’t read an XML document sequentially — one tag at a time — as you or I do; instead, the processor treats the source like a tree-like structure of hierarchical information. Within that tree, relationships among the various parts dictate how the processor reads and navigates the document during the transformation process.
Each XML document has a main document element that contains all the other elements inside its open and close tags. An xsl:stylesheet element, for example, contains template rules and all other parts of an XSLT stylesheet, so it acts as the document element of an XSLT stylesheet. Child elements of the document element are the equivalent of the first-level branches of a tree. These child elements may also have children, much like smaller branches. The XSLT processor works its way through the entire tree until it retrieves each leaf and branch and assembles it based on this hierarchy.
Each leaf and branch in the document tree is called a node. Elements are the most common type of node that you work with, but there are actually seven different node types: root, element, attribute, namespace, processing instruction, comment, and text. With that in mind, an element node has children not only when it contains other elements, but also when it contains text, comments, or processing instructions. Attributes and namespaces are attached to their element too, although XPath treats the element as their parent without counting them among its children.
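One way to see the node types for yourself is to count them with the XPath node test that matches each one. The following stylesheet is a minimal sketch that can be run against any source document; the labels in the output are my own, and the seventh type, the root node, is the one matched by / in the template rule.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The root node is matched by "/" -->
  <xsl:template match="/">
    <xsl:text>elements: </xsl:text>
    <xsl:value-of select="count(//*)"/>
    <xsl:text>&#10;attributes: </xsl:text>
    <xsl:value-of select="count(//@*)"/>
    <xsl:text>&#10;text nodes: </xsl:text>
    <xsl:value-of select="count(//text())"/>
    <xsl:text>&#10;comments: </xsl:text>
    <xsl:value-of select="count(//comment())"/>
    <xsl:text>&#10;processing instructions: </xsl:text>
    <xsl:value-of select="count(//processing-instruction())"/>
    <xsl:text>&#10;namespace nodes: </xsl:text>
    <xsl:value-of select="count(//namespace::*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The text node count is often higher than you expect, because the line breaks and indentation between elements are text nodes as well.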
If you want more information on document trees, check out Chapter 3.
At first glance, you may naturally look at the following XML snippet and conclude that animals is the root node of this document tree:
<animals>
  <cats>
    <tigers/>
    <lions/>
    <tabby/>
  </cats>
  <dogs>
    <collie/>
    <doberman/>
  </dogs>
</animals>
Although animals is the highest level element (known as the document element), it is not the root node. The root node is a “built-in” node and automatically serves as the ancestor of all nodes in the document tree. You never actually see the root node show up in your document — it’s just there; a given. Therefore, in the preceding example, animals is a child of the root node.
To demonstrate, the following template rule uses / to retrieve the root node:
<xsl:template match="/">
  <!-- Do something -->
</xsl:template>
When run on the preceding XML snippet, this template rule matches not the animals element but the root node above it in the tree hierarchy.
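To check this for yourself, you can ask the matched node for its name. The following sketch assumes the animals document shown earlier as its source; the output labels are my own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Matches the root node, not the animals element -->
  <xsl:template match="/">
    <!-- The root node has no name, so name() returns an empty string -->
    <xsl:text>Name of the matched node: "</xsl:text>
    <xsl:value-of select="name()"/>
    <!-- The document element is the element child of the root node -->
    <xsl:text>"&#10;Element child of the matched node: </xsl:text>
    <xsl:value-of select="name(*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The first line of output should show an empty pair of quotation marks, and the second should show animals.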
The current node (or context node) of a document tree is the node that the XSLT processor is “on” during its walk through the tree. However, don’t confuse the current node with the selected node or nodes. The current node is only the starting point for a given location step (an XPath expression used to retrieve nodes from a source tree); the location step itself determines which node or set of nodes is selected.
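The following sketch, which assumes the animals document from earlier in the chapter, keeps the two ideas apart: the template rule makes cats the current node, yet the location path that starts from it selects the children of dogs.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Current node: the cats element -->
  <xsl:template match="cats">
    <xsl:text>current node: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;selected nodes:</xsl:text>
    <!-- The path starts at cats but selects collie and doberman.
         Inside xsl:for-each, each selected node becomes the current node in turn. -->
    <xsl:for-each select="../dogs/*">
      <xsl:text> </xsl:text>
      <xsl:value-of select="name()"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Skip dogs so that only the cats rule produces output -->
  <xsl:template match="dogs"/>

</xsl:stylesheet>
```

The output should report cats as the current node and collie and doberman as the selected nodes.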
XPath allows you to use abbreviations to write the axis part of a location step. These shortcuts enable you to write XPath expressions more quickly, but they can also be confusing until you learn the clipped syntax. The ones to memorize appear in Table 18-1.
Axis | Abbreviation |
---|---|
child:: | Doesn’t need to be explicitly defined, so you can leave it off. |
attribute:: | @ |
self::node() | . (single period) |
parent::node() | .. (double period) |
/descendant-or-self::node()/ | // |
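The following sketch puts each long form next to its abbreviation so you can confirm that the pairs are interchangeable. It assumes the animals document from earlier in the chapter as its source. The id attribute is hypothetical (the sample document has no attributes), so that pair simply counts zero nodes both times.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Current node: the cats element -->
  <xsl:template match="cats">
    <!-- child axis: both forms count the tigers child of cats -->
    <xsl:value-of select="count(child::tigers)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count(tigers)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- attribute axis: id is a hypothetical attribute name -->
    <xsl:value-of select="count(attribute::id)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count(@id)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- self: both forms name the current node -->
    <xsl:value-of select="name(self::node())"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name(.)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- parent: both forms name the parent of cats -->
    <xsl:value-of select="name(parent::node())"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name(..)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- descendant-or-self: both forms find collie anywhere in the document -->
    <xsl:value-of select="count(/descendant-or-self::node()/child::collie)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="count(//collie)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Skip dogs so that only the cats rule produces output -->
  <xsl:template match="dogs"/>

</xsl:stylesheet>
```

Each line of the output should show the same value twice, once for the long form and once for the abbreviation.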
For more information on axes, see Chapter 5.
When you create result trees, you may not be sure when to use xsl:apply-templates, xsl:copy, xsl:copy-of, xsl:value-of, and other XSLT instructions inside your template rules. The following guidelines can help you decide what to do:
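Use xsl:apply-templates when you want the processor to find and run the matching template rule for each child of the current node (or for the nodes named in its select attribute). When only the built-in template rules apply, this returns the content and text nodes of the current element and its children, but not the surrounding element tags.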
Use xsl:copy to preserve the current node’s start and end tags during processing, but not its children or attributes. Content inside the tags is included only if you add an xsl:apply-templates instruction inside the xsl:copy element.
Use xsl:copy-of when you want to copy the whole kit ’n caboodle — the current node’s tags, content, attributes, and children. This instruction copies all the nodes returned from its required select attribute.
Use xsl:value-of when you want to convert the result to text. The conversion process removes all tags and elements. If the result is a single node, its content is converted to text. If the result is a node set, the first node in the set is used in the conversion.
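The following sketch runs all four instructions side by side so you can compare what each one contributes to the result. The source document shown in the comment and the wrapper element names (comparison, via-copy, and so on) are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Assumed source document:
       <film name="Henry V">
         <director>Kenneth Branagh</director>
         <runtime>137</runtime>
       </film> -->
  <xsl:template match="film">
    <comparison>
      <!-- No rule matches director, so the built-in rules return
           only its text: Kenneth Branagh -->
      <via-apply-templates>
        <xsl:apply-templates select="director"/>
      </via-apply-templates>

      <!-- Copies the current node only: an empty film element,
           without its name attribute or its children -->
      <via-copy>
        <xsl:copy/>
      </via-copy>

      <!-- Copies film complete with its attribute, children, and text -->
      <via-copy-of>
        <xsl:copy-of select="."/>
      </via-copy-of>

      <!-- Converts the director element to text: Kenneth Branagh -->
      <via-value-of>
        <xsl:value-of select="director"/>
      </via-value-of>
    </comparison>
  </xsl:template>

</xsl:stylesheet>
```

Notice that xsl:apply-templates and xsl:value-of produce the same text here only because no template rule matches director. As soon as you add one, xsl:apply-templates returns whatever that rule generates instead.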
I explain these instructions fully in Chapter 4.
When the XSLT processor walks the tree to select nodes, the axis part of the location step specifies the direction in which the processor walks. Each of the following axes goes top-to-bottom, left-to-right, much like you read a page in this book: child, self, parent, descendant, following-sibling, following, and descendant-or-self. The remaining axes (ancestor, ancestor-or-self, preceding, and preceding-sibling) travel in reverse order. Finally, when working with attribute and namespace axes, the nodes are always unordered.
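The direction matters most when you use a position in a predicate, because position 1 means the first node the processor meets along that axis. The following sketch assumes the animals document from earlier in the chapter; the output labels are my own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Current node: tigers, the first child of cats -->
  <xsl:template match="tigers">
    <!-- Forward axis: position 1 is the next sibling, lions -->
    <xsl:text>following-sibling::*[1] from tigers: </xsl:text>
    <xsl:value-of select="name(following-sibling::*[1])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Current node: tabby, the last child of cats -->
  <xsl:template match="tabby">
    <!-- Reverse axis: position 1 is the nearest sibling going backward, lions -->
    <xsl:text>preceding-sibling::*[1] from tabby: </xsl:text>
    <xsl:value-of select="name(preceding-sibling::*[1])"/>
    <!-- Reverse axis: position 1 is the nearest ancestor, cats -->
    <xsl:text>&#10;ancestor::*[1] from tabby: </xsl:text>
    <xsl:value-of select="name(ancestor::*[1])"/>
    <!-- The last position is the farthest ancestor element, animals -->
    <xsl:text>&#10;ancestor::*[last()] from tabby: </xsl:text>
    <xsl:value-of select="name(ancestor::*[last()])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Skip dogs so that only the rules above produce output -->
  <xsl:template match="dogs"/>

</xsl:stylesheet>
```

If the reverse axes ran top-to-bottom instead, preceding-sibling::*[1] would return tigers and ancestor::*[1] would return animals.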
Chapter 5 gives you more information on axis values.
XPath is used to create expressions, but some types of expressions are more important to XSLT than others. In a generic sense, an expression is a string of XPath instructions that the XSLT processor evaluates to produce a result, which may be a number, string, Boolean value, or a node set. However, XSLT is most interested in a particular kind of expression called a location path, which is a set of instructions that specify what nodes to bring back to the XSLT stylesheet. The location path then consists of a series of smaller parts called location steps. A location step consists of an axis, a node test, and an optional predicate and takes the following form: axis::nodetest[predicate].
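The following sketch shows a location path written out step by step, alongside expressions that produce the other result types. It assumes the animals document from earlier in the chapter as its source, and the output labels are my own.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A location path of three location steps. The last step has all
         three parts: axis (child), node test (*), and predicate.
         The path selects lions, and name() turns that into a string. -->
    <xsl:text>string: </xsl:text>
    <xsl:value-of select="name(/child::animals/child::cats/child::*[position() = 2])"/>

    <!-- The node set of all grandchildren of animals, counted as a number: 5 -->
    <xsl:text>&#10;number: </xsl:text>
    <xsl:value-of select="count(/animals/*/*)"/>

    <!-- A comparison produces a Boolean value: true -->
    <xsl:text>&#10;boolean: </xsl:text>
    <xsl:value-of select="count(/animals/cats/*) &gt; 2"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

In abbreviated syntax, the first location path shrinks to /animals/cats/*[2].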
Check out Chapter 5 for more details on XPath expressions.
Curly braces are used in attribute value templates to tell the XSLT processor to evaluate what’s inside each of them as an expression, rather than as normal text. In the output tree, the curly braces and expression are replaced with a resulting string. However, keep in mind that curly braces only evaluate expressions inside attribute values, not outside them.
Consider the following XML snippet:
<film name="Henry V">
  <director>Kenneth Branagh</director>
  <runtime>137</runtime>
</film>
Suppose I want to transform the preceding source by using the following XSLT code:
<xsl:template match="film">
  <!-- Curly braces work inside of attributes -->
  <movie director="{director}" length="{runtime}"/>
  <movie newlength="{100+60}"/>
  <!-- Curly braces do not work here -->
  The director is {director} and length is {runtime} and newlength is {100+60}
  <!-- Instead, use xsl:value-of outside of attributes -->
  The director is <xsl:value-of select="director"/> and length is <xsl:value-of select="runtime"/> and newlength is <xsl:value-of select="100+60"/>
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="director"/>
<xsl:template match="runtime"/>
The result is:
<movie director="Kenneth Branagh" length="137" /><movie newlength="160" />
The director is {director} and length is {runtime} and newlength is {100+60}
The director is Kenneth Branagh and length is 137 and newlength is 160
In looking at the transformation, notice that the first part of the template surrounds the element name with curly braces to return the value of the director and runtime elements inside attribute values. In this context, XPath then evaluates director and runtime as element names rather than as plain text. Similarly, XPath evaluates 100+60 as an expression.
The second part of the template shows what happens when you try to use curly braces outside attribute values. These are simply treated as literal text in the output document.
The final part of the template illustrates how you use xsl:value-of to evaluate the same XPath expressions outside attribute values.
See Chapter 4 for more information on attribute value templates.
You need to think about several factors as you consider whitespace in your result document, because whitespace has origins in both your XSLT stylesheet and the underlying XML source document.
Inside the XSLT stylesheet, text nodes that contain nothing but whitespace are usually stripped out of the template before any transformation occurs. However, whitespace is preserved in the following cases:
Text nodes that contain nonwhitespace characters, which are kept whole, whitespace included.
Any whitespace text appearing inside an xsl:text element.
Whitespace text whose closest ancestor with an xml:space attribute sets that attribute to preserve.
Whitespace inside the source XML document follows similar rules, except that you can declare default whitespace rules by using the xsl:preserve-space or xsl:strip-space elements at the top level of your stylesheet. Therefore, any whitespace text node whose parent element is named by xsl:preserve-space is preserved.
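The following sketch pulls these rules together. It assumes a source document like the film example from earlier in the chapter, and the choice to preserve whitespace only inside director is arbitrary, made purely for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Source document: strip whitespace-only text nodes everywhere... -->
  <xsl:strip-space elements="*"/>
  <!-- ...except inside director elements (an arbitrary choice) -->
  <xsl:preserve-space elements="director"/>

  <xsl:template match="film">
    <!-- The line breaks and indentation between these instructions are
         whitespace-only text nodes in the stylesheet, so they are stripped -->
    <xsl:value-of select="director"/>
    <!-- Text inside xsl:text is kept, spaces and all -->
    <xsl:text>, </xsl:text>
    <xsl:value-of select="runtime"/>
    <!-- &#10; is a line feed; inside xsl:text it reaches the output -->
    <xsl:text> minutes&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The output should be the single line Kenneth Branagh, 137 minutes, with none of the stylesheet’s indentation carried over.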
See Chapter 13 for more details on whitespace.