XML, the Perl Way

A Closer Look at XSLT

by Michel Rodriguez
Boardwatch Magazine

As XSLT becomes more and more popular as a server-side and client-side tool for displaying XML, it is time to take a closer look at it.

XSLT is a language for transforming the structure of an XML document. It is the first part of XSL (Extensible Stylesheet Language), a two-part standard designed to display XML. The second part, XSL-FO (XSL Formatting Objects), is aimed at high-quality display of XML documents, mostly for printing. XSL-FO is not yet ready, but, as it turns out, XSLT by itself can be used to generate HTML, text or any other text-based format, hence its growing popularity among XML developers.

This column covers what exactly XSLT is, where to use it, some basic examples and a list of available products. It also covers XPath, a language used to select parts of an XML document, which is an important part of XSLT.

What is XSLT?

XSLT is a W3C recommendation that defines a language for transforming one tree into another. The XSLT language itself uses XML syntax.

The conceptual model is:

The XSLT processor builds a tree from the original XML document. The XSLT stylesheet (also an XML document) is also turned into a tree before being executed. The processor then applies the transformations expressed in the XSLT stylesheet to the original tree, to generate another tree. The XSLT processor can then output the resulting tree either as XML, HTML or text.
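Python's standard library has no XSLT processor, but the three stages of this model (parse the source into a tree, build a new result tree, serialize it) can be sketched with xml.etree.ElementTree. This is only an illustration of the model under that assumption, not how an XSLT processor works internally; the transform function and the html/body/h1 result it builds are made up for the example.

```python
import xml.etree.ElementTree as ET


def transform(source_text: str) -> ET.Element:
    """Hypothetical transformation: build a new result tree from a source tree."""
    # 1. Parse the source document into a tree.
    source_root = ET.fromstring(source_text)
    # 2. Build a separate result tree; the source tree is never modified.
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = source_root.text
    return html


result = transform("<doc>Hello World</doc>")

# 3. Serialize the same result tree as XML, as HTML, or as plain text.
as_xml = ET.tostring(result, encoding="unicode")
as_html = ET.tostring(result, encoding="unicode", method="html")
as_text = ET.tostring(result, encoding="unicode", method="text")
print(as_xml)   # <html><body><h1>Hello World</h1></body></html>
print(as_text)  # Hello World
```

As in XSLT, the output format is a property of the serialization step, not of the transformation: the same result tree can be written out as XML, HTML or text.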

XSLT also allows several input documents to be used (through the document() function), and a stylesheet can be split over several files (through xsl:include and xsl:import). At the moment the standard allows only one output document to be generated, but this is expected to change, and some XSLT products already offer multiple outputs as an extension.

First example

Here is an example of an XSLT stylesheet.

For an extremely complex XML document like:

<?xml version="1.0"?>
<doc>Hello World</doc>

A stylesheet to turn this document into an (equally complex) HTML page would be:

<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <head><title>Watch this:</title></head>
  <body>
    <h1>
      <xsl:value-of select="doc"/>
    </h1>
  </body>
</html>

The stylesheet here is a single template, applied to the root of the document. The xsl:value-of element outputs the text content (the "string value") of the node selected by its select attribute. That attribute contains an XPath expression, here a very simple one, "doc", which selects the doc elements that are children of the current node; since the current node is the root of the document, this is the single doc element.
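The string value that xsl:value-of outputs is, for an element, all the text it contains, including the text of its descendants. A rough Python approximation, assuming ElementTree and a hypothetical string_value helper (a sketch of the XPath rule, not a processor's code):

```python
import xml.etree.ElementTree as ET


def string_value(element: ET.Element) -> str:
    """Approximate XPath string value of an element: all its text, in document order."""
    return "".join(element.itertext())


# ElementTree has no separate root node: fromstring() returns the document
# element, i.e. the node that select="doc" picks out in the stylesheet.
doc = ET.fromstring("<doc>Hello World</doc>")
print(string_value(doc))  # Hello World

# Text inside child elements counts too, so the string value is more than
# the element's own immediate text.
mixed = ET.fromstring("<doc>Hello <b>World</b>!</doc>")
print(string_value(mixed))  # Hello World!
print(repr(mixed.text))     # 'Hello '
```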

The xmlns:xsl attribute puts XSLT elements in a separate XML namespace, so that they do not interfere with user elements that have the same names as XSLT elements. It declares xsl as the prefix for the XSLT namespace: all xsl:xxx elements are in the XSLT namespace and will be processed by the XSLT processor. The xsl prefix could actually be replaced by any other prefix, as long as the corresponding xmlns:xxx attribute has the standard value "http://www.w3.org/1999/XSL/Transform".
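A namespace-aware parser shows why the prefix does not matter: elements are identified by namespace URI plus local name. This sketch parses, with Python's ElementTree, three one-element fragments made up for the illustration; the http://example.com/not-xslt URI is an arbitrary non-XSLT namespace.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Two fragments that differ only in the prefix bound to the XSLT namespace.
with_xsl = f'<xsl:value-of xmlns:xsl="{XSLT_NS}" select="doc"/>'
with_t = f'<t:value-of xmlns:t="{XSLT_NS}" select="doc"/>'
# Same prefix, but bound to a different (made-up) namespace URI.
other_ns = '<xsl:value-of xmlns:xsl="http://example.com/not-xslt" select="doc"/>'

# A namespace-aware parser names each element {namespace-URI}local-name,
# so the prefix is gone once the document is parsed.
tag_xsl = ET.fromstring(with_xsl).tag
tag_t = ET.fromstring(with_t).tag
tag_other = ET.fromstring(other_ns).tag

print(tag_xsl)               # {http://www.w3.org/1999/XSL/Transform}value-of
print(tag_xsl == tag_t)      # True
print(tag_xsl == tag_other)  # False
```

The xsl: prefix with the wrong URI is not an XSLT element at all, while t: with the right URI is.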

More Than One Way To Do It

A slightly more complex example turns an XML document containing data about basketball players into an HTML table.

Here is the XML document:

<?xml version="1.0"?>
<stats>
  <player id="grice">
    <name>Rice, Glen</name><ppg>16.1</ppg><apg>1.2</apg>
  </player>
  <player id="tkukoc">
    <name>Kukoc, Toni</name><ppg>15.1</ppg><apg>2.1</apg>
  </player>
  <player id="jrose">
    <name>Rose, Jalen</name><ppg>17.8</ppg><apg>4.5</apg>
  </player>
</stats>

And now an XSLT stylesheet that generates an HTML document listing players and their points per game (ppg):

<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <head><title>Players Points Per Game</title></head>
  <body>
    <h1>Players PPG</h1>
    <table>
      <xsl:for-each select="//player">
        <tr>
          <td><xsl:value-of select="name"/></td>
          <td><xsl:value-of select="ppg"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </body>
</html>

This is only slightly more complex: the xsl:for-each element goes through each player element, instantiating its content once per player, with that player as the current node, so that select="name" and select="ppg" pick the children of that player.
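The pull model maps naturally onto an ordinary loop. A minimal Python sketch, assuming ElementTree and the article's players data (players_table is a hypothetical name; the head and title of the page are left out to keep it short):

```python
import xml.etree.ElementTree as ET

STATS = """<stats>
  <player id="grice"><name>Rice, Glen</name><ppg>16.1</ppg><apg>1.2</apg></player>
  <player id="tkukoc"><name>Kukoc, Toni</name><ppg>15.1</ppg><apg>2.1</apg></player>
  <player id="jrose"><name>Rose, Jalen</name><ppg>17.8</ppg><apg>4.5</apg></player>
</stats>"""


def players_table(xml_text: str) -> ET.Element:
    """Pull model: one 'template' walks the source and pulls values out of it."""
    root = ET.fromstring(xml_text)
    table = ET.Element("table")
    # Like <xsl:for-each select="//player">: every player, in document order.
    for player in root.iter("player"):
        tr = ET.SubElement(table, "tr")
        # player is the current node, so "name" and "ppg" are looked up
        # relative to it, like select="name" and select="ppg".
        for field in ("name", "ppg"):
            ET.SubElement(tr, "td").text = player.findtext(field)
    return table


table = players_table(STATS)
print(ET.tostring(table, encoding="unicode"))
```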

In this stylesheet a single template pulls data out of the original XML document. While this "pull" approach is appropriate for simple stylesheets, a more modular approach makes more sense for complex transformations.

Here is an equivalent stylesheet using a push processing model:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="stats">
  <html>
    <head><title>Players Points Per Game</title></head>
    <body>
      <h1>Players PPG</h1>
      <table><xsl:apply-templates/></table>
    </body>
  </html>
</xsl:template>

<xsl:template match="player"><tr><xsl:apply-templates/></tr></xsl:template>

<xsl:template match="name | ppg"><td><xsl:value-of select="."/></td></xsl:template>

<xsl:template match="apg"/>

</xsl:stylesheet>

This stylesheet includes four templates, each applied to the elements that match its match attribute. As several templates are used, the overall syntax of the stylesheet changes from the simplified, HTML-like one to the normal XSLT one, where the xsl:template elements are wrapped in an xsl:stylesheet element.

The important new element here is xsl:apply-templates, which causes the XSLT processor to take each child of the current element in turn and look in the stylesheet for the template that matches it. Instead of being pulled out by a select expression, the children of the current element are pushed by the processor to whichever templates handle them. Following that model, stylesheets can be developed in a modular way, "one template at a time."

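Note: the empty <xsl:template match="apg"/> element is there only to prevent the apg element from being displayed. Without it, XSLT's built-in rules would apply: an element with no matching template has its children processed, and text is copied to the output, so the text of each apg element would end up in the table.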
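To make the push model concrete, here is a rough Python imitation of the four-template stylesheet, including XSLT's built-in rules for elements and text. It is a sketch under several assumptions: ElementTree stands in for the processor, a dictionary keyed on tag names (TEMPLATES) stands in for XSLT's pattern matching and priorities, and all function names are hypothetical. The sample data is written without indentation so the sketch does not have to deal with whitespace-only text nodes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The article's players data, without indentation.
STATS = (
    "<stats>"
    "<player id='grice'><name>Rice, Glen</name><ppg>16.1</ppg><apg>1.2</apg></player>"
    "<player id='tkukoc'><name>Kukoc, Toni</name><ppg>15.1</ppg><apg>2.1</apg></player>"
    "<player id='jrose'><name>Rose, Jalen</name><ppg>17.8</ppg><apg>4.5</apg></player>"
    "</stats>"
)


def append_text(out: ET.Element, text: Optional[str]) -> None:
    """Built-in rule for text nodes: copy the text into the result tree."""
    if not text:
        return
    if len(out):  # out already has children: the text follows the last one
        out[-1].tail = (out[-1].tail or "") + text
    else:  # no children yet: the text is out's own leading text
        out.text = (out.text or "") + text


def apply_templates(node: ET.Element, out: ET.Element) -> None:
    """Rough equivalent of <xsl:apply-templates/>: push each child to its template."""
    append_text(out, node.text)
    for child in node:
        template = TEMPLATES.get(child.tag, builtin_element_rule)
        template(child, out)
        append_text(out, child.tail)


def builtin_element_rule(node: ET.Element, out: ET.Element) -> None:
    """Built-in rule for elements without a template: process their children."""
    apply_templates(node, out)


def stats_template(node, out):  # <xsl:template match="stats">
    html = ET.SubElement(out, "html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Players Points Per Game"
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "Players PPG"
    apply_templates(node, ET.SubElement(body, "table"))


def player_template(node, out):  # <xsl:template match="player">
    apply_templates(node, ET.SubElement(out, "tr"))


def cell_template(node, out):  # <xsl:template match="name | ppg">
    ET.SubElement(out, "td").text = "".join(node.itertext())


def empty_template(node, out):  # <xsl:template match="apg"/>
    pass  # an empty template outputs nothing


# Hypothetical dispatch table standing in for XSLT's pattern matching.
TEMPLATES = {
    "stats": stats_template,
    "player": player_template,
    "name": cell_template,
    "ppg": cell_template,
    "apg": empty_template,
}


def transform(xml_text: str) -> ET.Element:
    """Apply templates to the document element and return the result's root."""
    document_element = ET.fromstring(xml_text)
    result_root = ET.Element("result-root")  # stand-in for the result's root node
    template = TEMPLATES.get(document_element.tag, builtin_element_rule)
    template(document_element, result_root)
    return result_root[0]


print(ET.tostring(transform(STATS), encoding="unicode"))
```

Deleting the "apg" entry from TEMPLATES lets builtin_element_rule handle apg, and each apg value (1.2, 2.1, 4.5) then shows up as loose text inside its row, which is the built-in behavior the empty template prevents.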

XPath

At this point, it is worth taking a closer look at XPath.

XPath is another W3C recommendation that defines how to address parts of an XML document. XSLT uses XPath expressions to determine which parts of the input document a template should be applied to (through the match attribute), and to pull parts of the input document into the result (through the select attribute).

XPath is used to define the ... path to XML elements, either from the root of the document or from the current element (the one the template is being applied to). XPath is quite intuitive, so here are some examples that give a good feel for its power:

. : select the current element
//player : select all player elements in the document
player : select all player elements that are children of the current element
player/name : select name elements that are direct children of a player element
table//name : select name elements that are descendants of a table element
player[@id="grice"] : select player elements whose id attribute equals "grice"
player[name="Rice, Glen"] : select player elements that have a name child whose text is "Rice, Glen"
name | ppg : select the name and ppg elements that are children of the current element
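Python's ElementTree implements a small subset of XPath, enough to try most of these expressions against the players document. This is a sketch with known gaps: ElementTree is not an XSLT processor, it only accepts relative paths on an element (so .//player plays the role of //player), the [name='...'] text predicate needs Python 3.7 or later, and there is no union operator, so name | ppg is emulated by hand. The sample has no table element, so table//name is not shown.

```python
import xml.etree.ElementTree as ET

stats = ET.fromstring("""<stats>
  <player id="grice">
    <name>Rice, Glen</name><ppg>16.1</ppg><apg>1.2</apg>
  </player>
  <player id="tkukoc">
    <name>Kukoc, Toni</name><ppg>15.1</ppg><apg>2.1</apg>
  </player>
  <player id="jrose">
    <name>Rose, Jalen</name><ppg>17.8</ppg><apg>4.5</apg>
  </player>
</stats>""")

# ".": the current element itself
current = stats.findall(".")
# "//player": written relative to the document element as ".//player"
all_players = stats.findall(".//player")
# "player": player children of the current element
child_players = stats.findall("player")
# "player/name": name elements that are direct children of a player
names = [n.text for n in stats.findall("player/name")]
# 'player[@id="grice"]': filter on an attribute value
grice = stats.findall("player[@id='grice']")
# filter on the text of a child element (a name that occurs in the data)
rice = stats.findall("player[name='Rice, Glen']")
# "name | ppg": no union operator, so keep matching children of one
# player in document order
first = stats.find("player")
name_or_ppg = [e.tag for e in first if e.tag in ("name", "ppg")]
print(names, name_or_ppg)
```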

XSLT processors

A good number of XSLT processors, most of them open source or at least free (as in free beer), have been released in the last year. Big companies like Microsoft (MSXML3) and Oracle (Oracle XSL), the Apache XML project (Xalan, using IBM code), smaller companies like Gingerall (Sablotron) and individual developers such as Michael Kay (Saxon) or James Clark (XT; Clark also edited the XSLT recommendation) all offer XSLT products.

Most of these products are written in Java or C++ and are available on most platforms.

Conclusion

There is a lot more to XSLT than I could discuss in this quick introduction, including instructions to sort elements, to pass parameters to a stylesheet, and advanced string and number formatting.

More information can be found (as usual) on the W3C Web site (www.w3.org), the XML Cover Pages (www.oasis-open.org/cover/) or the Zvon site (www.zvon.org). The book XSLT Programmer's Reference, by Michael Kay (Wrox, ISBN 1-861003-12-9), is also a great resource.

XSLT should not be used indiscriminately: some complex XML transformations are clearly better written in a "real" language such as Java or Perl. Its power and the availability of quality tools make it an ideal candidate for most XML processing, though.

Despite being fairly new, XSLT is already widespread in the XML community. The main reasons for this rapid adoption are the real need for XML transformation tools, the number of independent implementations that ensure that no single company will be able to control the standard and the fact that the recommendation draws upon the experience of similar efforts in the SGML world.


Note: this article was published in 2000 in Boardwatch magazine. More recent articles about XML and especially Perl & XML can be found on www.xmltwig.com