Chapter 13

“Gimme Some Space” and Other Output Issues

In This Chapter

• Working with whitespace

• Creating more readable documents through indenting

• Adding XML comments to your result document

• Creating processing instructions

Because the whole purpose of XSLT is to generate new documents from other documents, the transformation language ought to have considerable flexibility in determining what the resulting document structure looks like. After all, if not, why use it? Fortunately, XSLT does have several ways to tweak the result of your transformation. In Chapter 9, I explain how you can sort and number the content. In this chapter, I discuss some of the more advanced issues concerning XSLT output.

Gimme Some Space

Whitespace is a term used to describe those invisible characters inside a document. You know, all those characters that you never see, but you know they’re there, such as spaces, tabs, carriage returns, and line feeds. They’re kind of like those creepy microscopic creatures you see on PBS specials that supposedly crawl all over you and me. Yuck! I’m itching all over just thinking about it . . . let me change the subject and talk about something much more pleasant: whitespace.

Whitespace is one of those tricky issues in XSLT, because several factors determine what whitespace appears in the result tree. Whitespace can originate in the XML source, in the template rules of the XSLT stylesheet, and in specific space-related XSLT elements, such as xsl:strip-space.

And if that isn’t complicated enough, in what must be nothing more than a sick joke, some XSLT processors handle whitespace quite differently from others, which can result in varying outputs.

Warning

msxsl3, the 3.0 version of Microsoft’s msxsl processor, is particularly problematic in how it deals with whitespace. Unlike other processors, such as Saxon, it strips whitespace by default from both the original XML source document and the XSLT stylesheet. In fact, trying to figure out how to add whitespace back in becomes a frustrating exercise. Happily, version 4.0 of the msxsl processor changes this default behavior to reflect what you’d expect from a standard XSLT processor.

Fortunately, in many cases, whitespace in the result document is not that significant of an issue. Nonetheless, when you are trying either to preserve a specific format or to format the result document in a specific manner, then knowing how to work with whitespace becomes important.

Whitespace in XSLT stylesheets

The general rule of thumb is that, inside the XSLT stylesheet, text nodes that contain nothing but whitespace are stripped out of the template before any transformation occurs. However, you can make sure the processor preserves the whitespace based on how you work with the text nodes and xsl:text instructions.

Whitespace in text nodes

A text node that contains nothing but whitespace is normally ignored, but when a text node contains nonwhitespace characters, all of its whitespace characters are automatically preserved. To demonstrate how this works, take a look at my sample XML source in Listing 13-1.

Listing 13-1: afifilms.xml

<!-- American Film Institute Top 10 Films -->

<!-- afifilms.xml -->

<topfilms createdby="AFI">

  <film place="1" date="1941">Citizen Kane</film>

  <film place="2" date="1942">Casablanca</film>

  <film place="3" date="1972">The Godfather</film>

  <film place="4" date="1939">Gone With The Wind</film>

  <film place="5" date="1962">Lawrence Of Arabia</film>

  <film place="6" date="1939">The Wizard Of Oz</film>

  <film place="7" date="1967">The Graduate</film>

  <film place="8" date="1954">On The Waterfront</film>

  <film place="9" date="1993">Schindler’s List</film>

  <film place="10" date="1952">Singin’ In The Rain</film>

</topfilms>

Suppose I want to use this list of the American Film Institute’s top ten films to generate a list of each film and the date it was made. I can create such a result with this code:

  <xsl:template match="film">

   <xsl:apply-templates/>

   <xsl:value-of select="@date"/>

  </xsl:template>

The template rule then generates the following result:

  Citizen Kane1941

  Casablanca1942

  The Godfather1972

  Gone With The Wind1939

  Lawrence Of Arabia1962

  The Wizard Of Oz1939

  The Graduate1967

  On The Waterfront1954

  Schindler’s List1993

  Singin’ In The Rain1952

Although you can easily forget about it, a text node is actually between the xsl:apply-templates and xsl:value-of instructions. However, because all the characters that the text node contains are whitespace (carriage return, line feed), the text node is ignored in the output. Therefore, the following two ways of expressing the code produce the same output:

   <xsl:apply-templates/>

   <xsl:value-of select="@date"/>

and

   <xsl:apply-templates/><xsl:value-of select="@date"/>

To make the output more readable, I can add literal text between these two instructions to make each list item into a sentence. The new template rule looks like:

  <xsl:template match="film">

   <xsl:apply-templates/> was made in <xsl:value-of select="@date"/>

  </xsl:template>

This revised template produces the following result:

  Citizen Kane was made in 1941

  Casablanca was made in 1942

  The Godfather was made in 1972

  Gone With The Wind was made in 1939

  Lawrence Of Arabia was made in 1962

  The Wizard Of Oz was made in 1939

  The Graduate was made in 1967

  On The Waterfront was made in 1954

  Schindler’s List was made in 1993

  Singin’ In The Rain was made in 1952

However, imagine that I alter the text between the xsl:apply-templates and xsl:value-of instructions in the template rule by adding a line break inside the text node:

  <xsl:template match="film">

   <xsl:apply-templates/> was

   made in <xsl:value-of select="@date"/>

  </xsl:template>

The results in this case show the line break:

  Citizen Kane was

   made in 1941

  Casablanca was

   made in 1942

  The Godfather was

   made in 1972

  Gone With The Wind was

   made in 1939

  Lawrence Of Arabia was

   made in 1962

  The Wizard Of Oz was

   made in 1939

  The Graduate was

   made in 1967

  On The Waterfront was

   made in 1954

  Schindler’s List was

   made in 1993

  Singin’ In The Rain was

   made in 1952

The XSLT processor can’t ignore the line break in this template rule because nonwhitespace characters appear in the same text node. The whitespace characters are all preserved along with the adjoining nonwhitespace characters.

Whitespace inside xsl:text

Any whitespace appearing inside the xsl:text element is automatically preserved, making it a good tool to control exactly what whitespace you want to appear in the result document. For example, if I want to add an XML comment to precede each item in the list, I add an xsl:comment instruction to the template. (I discuss the use of xsl:comment later in the chapter.) But if I use the following snippet, the comment appears on the same line as the list entry:

  <xsl:template match="film">

    <xsl:comment>List entry</xsl:comment>

    <xsl:apply-templates/> was made in <xsl:value-of select="@date"/>

  </xsl:template>

The result is:

  <!--List entry-->Citizen Kane was made in 1941

  <!--List entry-->Casablanca was made in 1942

  <!--List entry-->The Godfather was made in 1972

  <!--List entry-->Gone With The Wind was made in 1939

  <!--List entry-->Lawrence Of Arabia was made in 1962

  <!--List entry-->The Wizard Of Oz was made in 1939

  <!--List entry-->The Graduate was made in 1967

  <!--List entry-->On The Waterfront was made in 1954

  <!--List entry-->Schindler’s List was made in 1993

  <!--List entry-->Singin’ In The Rain was made in 1952

Just as you found out in the preceding section, whitespace is ignored between the </xsl:comment> and <xsl:apply-templates/> tags. Therefore, to add a line break between the comment and the line text and after each item, you need to use xsl:text:

  <xsl:template match="film">

  <xsl:comment>List entry</xsl:comment><xsl:text>

  </xsl:text>

  <xsl:apply-templates/> was made in <xsl:value-of select="@date"/><xsl:text>

  </xsl:text>

  </xsl:template>

So, even though the xsl:text instruction contains nothing but a carriage return, the XSLT processor preserves it because whitespace that falls between the start and end tags of xsl:text is considered significant. The text generated is as follows:

  <!--List entry-->

    Citizen Kane was made in 1941

    

  <!--List entry-->

    Casablanca was made in 1942

    

  <!--List entry-->

    The Godfather was made in 1972

    

  <!--List entry-->

    Gone With The Wind was made in 1939

    

  <!--List entry-->

    Lawrence Of Arabia was made in 1962

    

  <!--List entry-->

    The Wizard Of Oz was made in 1939

    

  <!--List entry-->

    The Graduate was made in 1967

    

  <!--List entry-->

    On The Waterfront was made in 1954

    

  <!--List entry-->

    Schindler’s List was made in 1993

    

  <!--List entry-->

    Singin’ In The Rain was made in 1952
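If the leftover indentation in that output bothers you, one option is to spell out the line feed with the &#10; character reference, so that nothing else from the stylesheet’s layout sneaks into xsl:text. The following is a minimal sketch of that idea, not the only way to do it. It assumes the afifilms.xml source from Listing 13-1, and the bare-bones stylesheet wrapper around the template is my own addition so that the example stands alone:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumption: XML output without a declaration, to keep the result short -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="film">
    <xsl:comment>List entry</xsl:comment>
    <!-- &#10; is a single line feed; the indentation of this stylesheet is not copied -->
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates/>
    <!-- Literal text kept inside xsl:text, so its spaces are always preserved -->
    <xsl:text> was made in </xsl:text>
    <xsl:value-of select="@date"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Because every piece of literal text now lives inside xsl:text, you can indent the stylesheet however you like without changing the text that the template itself writes. Whitespace that comes from the source document is a separate matter.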

Whitespace in source XML documents

When creating XML documents, I often want to visually show the hierarchy of the document structure by indenting each level, but I don’t want this whitespace to actually show up in my result tree. Although this seems logical, it causes problems with the XML processor, because it doesn’t know whether or not those whitespace characters are significant. Because preserving information that could be significant is better than deleting it, the XML processor preserves all whitespace outside the start and end tags of the XML elements. For example, I can change the spacing of the source document I’ve been using so that it looks like this:

<!-- American Film Institute Top 10 Films -->

<topfilms createdby="AFI">

  <film place="1" date="1941">Citizen Kane</film>

          <film place="2" date="1942">Casablanca</film>

  <film place="3" date="1972">The Godfather</film>

               <film place="4" date="1939">Gone With The Wind</film><film place="5" date="1962">Lawrence Of Arabia</film>

  <film place="6" date="1939">The Wizard Of Oz</film>

   <film place="7" date="1967">The Graduate</film><film place="8" date="1954">On The Waterfront</film>

  <film place="9" date="1993">Schindler’s List</film>

  <film place="10" date="1952">Singin’ In The Rain</film>

</topfilms>

To show how this whitespace is carried over to the result document, I create a basic template rule:

  <xsl:template match="film">

    <xsl:apply-templates/>

  </xsl:template>

After transformation, the output is as follows:

  Citizen Kane

          Casablanca

  The Godfather

               Gone With The WindLawrence Of Arabia

  The Wizard Of Oz

   The GraduateOn The Waterfront

  Schindler’s List

  Singin’ In The Rain

Using xsl:strip-space and xsl:preserve-space

You can use the xsl:strip-space element to get rid of all this extra whitespace in the source document. This element has a single required attribute named elements. You use the elements attribute to list the names of elements containing whitespace that you want to strip. If you want to add more than one element name, separate the names with (ironically enough) whitespace. You can also use * to specify all elements.

For my example, I want to specify the topfilms element, because all the extra whitespace is part of its content:

<xsl:strip-space elements="topfilms"/>

By adding this as a top-level element to my stylesheet, the transformation now looks quite different:

Citizen KaneCasablancaThe GodfatherGone With The WindLawrence Of ArabiaThe Wizard Of OzThe GraduateOn The WaterfrontSchindler’s ListSingin’ In The Rain

Tip

The xsl:strip-space element removes whitespace only for the elements specified by the elements attribute and doesn’t strip whitespace from the descendants of those elements. In this example, if I want to remove any extra whitespace appearing in film elements, I need to explicitly add film to the elements attribute value: elements="topfilms film".

The xsl:preserve-space element preserves whitespace in the source document. By default, XSLT conserves space already, so this element is needed only to offset the use of xsl:strip-space. A common example of how developers use this element is when you want to remove the space in all elements except one or two. So if I want to remove all the whitespace in the source document, except for the whitespace inside the film elements, I use the following:

<xsl:strip-space elements="*"/>

<xsl:preserve-space elements="film"/>

Remember

The xsl:strip-space and xsl:preserve-space elements are top-level elements for a stylesheet. If you put them inside a template rule, you get an error.
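To see where these top-level elements sit in practice, here is a small sketch of a complete stylesheet that combines them. It assumes the afifilms.xml source used throughout this chapter. The text output method and the line feed after each film are my own choices for the illustration, not something the earlier examples require:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumption: plain text output, so only the film titles appear -->
  <xsl:output method="text"/>

  <!-- Top-level declarations: strip whitespace-only text nodes everywhere... -->
  <xsl:strip-space elements="*"/>

  <!-- ...except inside film elements, because a specific name outranks the * wildcard -->
  <xsl:preserve-space elements="film"/>

  <xsl:template match="film">
    <xsl:value-of select="."/>
    <!-- One line feed after each title -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

With the indentation between the film elements stripped from the source, the only line breaks left are the ones the template adds on purpose.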

Preserving with xml:space

A second way of preserving whitespace in the source document is to add a special XML attribute named xml:space to one or more of the document elements. The xml:space attribute has two possible values:

• xml:space="preserve" tells the processor to keep the whitespace for this element intact.

• xml:space="default" tells the processor to return to its default setting.

The xml:space attribute applies to the element that defines it as well as any of its descendants.

When the XSLT processor encounters an xml:space, it remembers the value as text nodes are processed. Text nodes take on the xml:space value of their closest ancestor.
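As a rough illustration of how the two values interact, consider the following source fragment. The credits, director, studio, and name elements are hypothetical additions of mine and are not part of afifilms.xml; they exist only to give xml:space something to nest inside:

```xml
<topfilms createdby="AFI">
  <film place="1" date="1941">Citizen Kane</film>
  <!-- Hypothetical element: whitespace-only text directly inside credits is kept,
       even if the stylesheet declares xsl:strip-space elements="*" -->
  <credits xml:space="preserve">
    <director>Orson Welles</director>
    <!-- The closer ancestor wins: inside studio the default rules apply again -->
    <studio xml:space="default">
      <name>RKO</name>
    </studio>
  </credits>
</topfilms>
```

In this sketch, the indentation around the director element counts as significant, while the indentation around the name element goes back to being fair game for xsl:strip-space.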

Indenting Your Result Document

The xsl:output element includes the indent attribute, which enables you to specify whether the XSLT processor can indent the result document so that the document displays the hierarchy of the tree.

Indenting your result document can help others read it but doesn’t impact how the document is processed. For example, imagine you have a flat-looking XML file that you want to transform into something more readable. Start with the following source:

<topfilms createdby="AFI">

<film place="1" date="1941">Citizen Kane</film>

<film place="2" date="1942">Casablanca</film>

<film place="3" date="1972">The Godfather</film>

<film place="4" date="1939">Gone With The Wind</film>

<film place="5" date="1962">Lawrence Of Arabia</film>

<film place="6" date="1939">The Wizard Of Oz</film>

<film place="7" date="1967">The Graduate</film>

<film place="8" date="1954">On The Waterfront</film>

<film place="9" date="1993">Schindler’s List</film>

<film place="10" date="1952">Singin’ In The Rain</film>

</topfilms>

You can use the following stylesheet to copy all the elements into an indented output:

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">

    <xsl:copy-of select="*"/>

  </xsl:template>

When the stylesheet is applied to the XML source, the XSLT processor indents each level of the result tree hierarchy, resulting in a much more legible document:

<topfilms createdby="AFI">

   <film place="1" date="1941">Citizen Kane</film>

   <film place="2" date="1942">Casablanca</film>

   <film place="3" date="1972">The Godfather</film>

   <film place="4" date="1939">Gone With The Wind</film>

   <film place="5" date="1962">Lawrence Of Arabia</film>

   <film place="6" date="1939">The Wizard Of Oz</film>

   <film place="7" date="1967">The Graduate</film>

   <film place="8" date="1954">On The Waterfront</film>

   <film place="9" date="1993">Schindler’s List</film>

   <film place="10" date="1952">Singin’ In The Rain</film>

</topfilms>

Warning

By using the indent="yes" option, you tell the XSLT processor that it can indent to show the document hierarchy. But that does not necessarily mean that all processors support indenting. Some processors, like Saxon, provide explicit support for indenting, while others (msxsl) do not.
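If your source document already contains its own haphazard indentation, that existing whitespace can get in the way of the processor’s indenting. One approach worth trying, sketched here under the assumption that your processor honors indent="yes" at all, is to strip the whitespace-only nodes first and then copy everything across with an identity template:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Ask the processor to indent; how (and whether) it does so is up to the processor -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Remove whitespace-only text nodes from the source before indenting -->
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy every attribute and node as is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the xsl:copy-of version, the identity template also carries over comments and processing instructions that sit outside the document element, so use it only when you want those in the result.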

Adding Comments

XSLT includes the xsl:comment instruction to create an XML/HTML-like comment and add it to the result document. After transformation, the comment text you provide is surrounded by <!-- and -->. For example, suppose I’d like to output the afifilms.xml document (refer to Listing 13-1) as an HTML-based numbered list. However, I’d like to add a comment at the top of the document identifying the list along with comments both before and after the list denoting its start and end. The following stylesheet can be used to do this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="html"/>

  <!-- Add HTML element, comments -->

  <xsl:template match="/topfilms">

    <html>

    <xsl:comment>List created by <xsl:value-of select="@createdby"/></xsl:comment>

    <xsl:comment>***** Start List *****</xsl:comment>

    <ol>

    <xsl:apply-templates/>

    </ol>

    <xsl:comment>***** End List *****</xsl:comment>

    </html>

  </xsl:template>

  <!-- Apply to each film -->

  <xsl:template match="film">

   <li><i><xsl:apply-templates/></i></li>

  </xsl:template>

</xsl:stylesheet>

The first template rule demonstrates the use of xsl:comment. Any text placed between the start and end tags of the xsl:comment instruction is added as a comment in the result tree. As you can see, you can include literal text or anything that evaluates to text, such as xsl:value-of. The second template rule adds the HTML code to create an italicized list (the li element creates a list item and i adds italics). The HTML code that is generated after the transformation is shown here:

<html>

   <!--List created by AFI-->

   <!--***** Start List *****-->

   <ol> 

   <li><i>Citizen Kane</i></li>

   <li><i>Casablanca</i></li>

   <li><i>The Godfather</i></li>

   <li><i>Gone With The Wind</i></li>

   <li><i>Lawrence Of Arabia</i></li>

   <li><i>The Wizard Of Oz</i></li>

   <li><i>The Graduate</i></li>

   <li><i>On The Waterfront</i></li>

   <li><i>Schindler’s List</i></li>

   <li><i>Singin’ In The Rain</i></li>

   </ol>

   <!--***** End List *****-->

</html>

Warning

Avoid putting -- in your comment text or ending your text with a -. In XML, the -- sequence is reserved for the end-of-comment delimiter, so either case could goof up the resulting XML document structure. Some XSLT processors handle this syntax problem by adding a space after the offending dash to prevent an error, but other processors may generate a runtime error.
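When the comment text comes from the source document rather than from your own typing, you can’t always know whether it contains dashes. One blunt workaround, offered here as a sketch and not as the standard fix, is to run the text through the XPath translate() function before it reaches xsl:comment. The example assumes the afifilms.xml source and simply reuses each film title as the comment text:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template match="film">
    <!-- translate() swaps every hyphen for a space, so the comment text
         can never contain two dashes in a row or end with one -->
    <xsl:comment>
      <xsl:value-of select="translate(., '-', ' ')"/>
    </xsl:comment>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is that legitimate single hyphens disappear too, so this approach suits comments that are meant as rough labels, not exact copies of the source text.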

You may have noticed that the XML comments from the original source document weren’t copied to the result document. If you read the discussion on built-in templates in Chapter 4, you may recall that XSLT automatically strips out comments. If you want to carry over comments from the source to the result document, you need to add the following template rule to the stylesheet:

<xsl:template match="comment()">

  <xsl:copy/>

</xsl:template>

The <!-- American Film Institute Top 10 Films --> comment is then included in the output.

Adding Processing Instructions

XML processing instructions provide a way for documents to contain instructions for applications that deal with the output document. Although you can use processing instructions for any custom purpose, probably the most common use for a processing instruction today is to attach a stylesheet, whether CSS or XSLT, to an XML document (see Chapter 10 for more information on this instruction):

<?xml-stylesheet href="defaultstyles.css" type="text/css"?>

Processing instructions stand apart from XML elements due to their <? prefix and ?> suffix.

Warning

Although it looks like one, an XML declaration such as <?xml version="1.0"?> is technically not a processing instruction. Therefore, you cannot use xsl:processing-instruction to add an XML declaration to your output document. Instead, use the xsl:output instruction with method="xml" defined to have the processor automatically add an XML declaration to the top of your result document.
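As a quick sketch of that approach, the following stylesheet asks the processor to write the declaration for you. The version, encoding, and omit-xml-declaration attributes are all standard xsl:output attributes; the particular values shown, and the copy-everything template, are just assumptions for the example:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- omit-xml-declaration="no" spells out the usual behavior of the xml method;
       version and encoding supply the values written into the declaration -->
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
              omit-xml-declaration="no"/>

  <!-- Copy the document element and everything inside it -->
  <xsl:template match="/">
    <xsl:copy-of select="*"/>
  </xsl:template>

</xsl:stylesheet>
```

Set omit-xml-declaration to yes instead when you are generating a fragment that will be pasted into another document.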

You can create processing instructions and add them to your result document with the handy xsl:processing-instruction element. It has two parts:

• The name attribute specifies the name of the processing instruction.

• The content of the element is used for adding any other name/value pairs that you need.

Suppose, for example, you have an application that’s using a processing instruction called xsl-my_custom_instruction. To generate the following instruction in the output file:

<?xsl-my_custom_instruction lang="en" customid="kimmers" type="absolute"?>

you use the following XSLT:

<xsl:processing-instruction name="xsl-my_custom_instruction">lang="en" customid="kimmers" type="absolute"</xsl:processing-instruction>

Putting xsl:processing-instruction into action, I use the instruction to add an xml-stylesheet reference to the film list stylesheet I use earlier in this chapter:

<xsl:template match="/topfilms">

  <html><xsl:text>

  </xsl:text>

  <xsl:processing-instruction name="xml-stylesheet">href="liststyle.css" type="text/css"</xsl:processing-instruction>

  <xsl:comment>List created by <xsl:value-of select="@createdby"/></xsl:comment>

  <xsl:comment>***** Start List *****</xsl:comment>

  <xsl:apply-templates/>

  <xsl:comment>***** End List *****</xsl:comment>

  </html>

</xsl:template>

Warning

You can’t include ?> as part of your processing instruction definition or you generate a processing error. The symbol ?> is reserved for the ending of a processing instruction.

Tip

By default, all processing instructions contained in the source document are removed during transformation. However, to override this behavior and copy processing instructions from the source directly to the result document, use the following template rule:

<xsl:template match="processing-instruction()">

  <xsl:copy/>

</xsl:template>
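If you want to carry over both comments and processing instructions, you don’t need two separate template rules. A match pattern can list alternatives with the | operator, so a sketch like the following (assuming the rest of your stylesheet is unchanged) handles both node types in one rule:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="xml"/>

  <!-- One rule for both node types: copy each comment and processing
       instruction from the source straight into the result -->
  <xsl:template match="comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

Elements and text still fall through to the built-in template rules here, so add your own rules for those as usual.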
