XML, the Perl Way

Using XML Schema Repositories

Designing XML documents through DTDs (document type definitions) or schemas, as described in last month's column, might be necessary. It might even be fun, but it is probably more efficient to avoid reinventing the wheel. Have a look at what kinds of designs are available, especially those designed by XML specialists.

This month I look at the various XML repositories: where to get off-the-shelf XML schemas. (Schema is used here as a generic term to describe DTDs, official W3C schemas and various types of semi-proprietary schemas developed before the official W3C schema recommendation.)

XML schemas can be found in various places, such as:

Formal repositories

Biztalk.org

Biztalk (http://biztalk.org) is a Microsoft service that offers a repository of XML schemas in a Microsoft-specific syntax named XML-Data Reduced (XDR), and an architecture for exchanging those schemas.

Before I chase you away with the words, "Microsoft-specific", it is probably worth talking a bit about Microsoft's XML strategy. In short, Microsoft talks the talk as well as anybody in the area and it walks the walk quite properly, even if it sometimes staggers a bit.

On the plus side, Microsoft was part of the initial XML committee. It is probably fair to say that, without its involvement in the process, XML would never have happened. Since then it has been the first company to offer XML support in its browser, and it offers (for free) a set of XML tools, including a parser and an XML SDK, at http://msdn.microsoft.com/xml/default.asp.

Of course, in typical Microsoft fashion, it also created specific extensions to some standards. It is hard to blame the company, as the pace of standardization, even for an organization as reactive as the W3C, is quite slow. The biggest gripe against Microsoft, though, is that Office 2000 still does not export proper XML, although it provides "XML islands" within documents.

All in all, Microsoft behaves as a responsible, if a little scary, player in the XML area.

Getting back to Biztalk, since the W3C recommendation on XML schemas is not yet ready, and since many users wanted stronger typing for attributes and elements than provided by DTDs, schemas are defined using XDR, a fairly easy-to-understand syntax. Microsoft announced that it will support the W3C recommendation on schemas as soon as it is officially released.
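
To make the typing difference concrete, here is a minimal sketch in Python. The XDR fragment is a hypothetical order schema written for this illustration, not one taken from the Biztalk library, and the helper name declared_types is likewise an assumption. The script only reads the dt:type annotations that XDR attaches to element and attribute declarations; a DTD can describe the same element content only as #PCDATA.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XML-Data Reduced (XDR) schemas.
XDR_NS = "urn:schemas-microsoft-com:xml-data"
DT_NS = "urn:schemas-microsoft-com:datatypes"

# Hypothetical XDR fragment, written for this example only.
XDR_SCHEMA = """\
<Schema name="order"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="currency" dt:type="string" required="yes"/>
  <ElementType name="quantity" content="textOnly" dt:type="int"/>
  <ElementType name="price" content="textOnly" dt:type="float">
    <attribute type="currency"/>
  </ElementType>
  <ElementType name="item" content="eltOnly">
    <element type="quantity"/>
    <element type="price"/>
  </ElementType>
</Schema>
"""


def declared_types(schema_text):
    """Map each typed element or attribute declaration to its dt:type."""
    root = ET.fromstring(schema_text)
    types = {}
    for kind in ("ElementType", "AttributeType"):
        for decl in root.iter("{%s}%s" % (XDR_NS, kind)):
            dt_type = decl.get("{%s}type" % DT_NS)
            if dt_type is not None:  # skip declarations with no datatype
                types[decl.get("name")] = dt_type
    return types


if __name__ == "__main__":
    for name, dt_type in sorted(declared_types(XDR_SCHEMA).items()):
        print("%-10s %s" % (name, dt_type))
```

The item declaration carries no datatype, since it only contains other elements, so it does not appear in the result.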

At the moment, the Biztalk repository, in the "Library" area of the Web site, includes around 300 schemas, submitted by 40 different companies. Note that more than half of these schemas originate from the Open Applications Group, a consortium including the major players in the computer industry (more on it below).

After registration, schemas, as well as their documentation and XML document examples, can be downloaded for free. Unless otherwise specified in the documentation, schemas can be used and modified at will. Biztalk plans to add an area with "protected" schemas, which could be accessed only by agreeing business partners.

Notable schemas here include HR information (such as resumes and job postings), project management, business documents (such as invoices and purchase orders), trouble tickets, relational database descriptions and addresses.

Biztalk is a useful resource, though a little difficult to navigate. So far, few of the schemas seem to be using the "Biztalk Framework", a set of specific elements promoted by Microsoft, which would enable XML documents to be routed and delivered by dedicated Microsoft servers running Microsoft software. This is probably a good thing.

XML.org

XML.org (http://xml.org) is a budding repository of XML schemas created in June by the Organization for the Advancement of Structured Information Standards (OASIS), whose members include Sun, IBM and Xerox. Technically, the repository is very powerful, with options to search for DTDs, W3C schemas and XDR, as well as for a specific element or attribute. Alas, it seems to include only a handful of DTDs and a couple of schemas, among which are the inevitable resume and purchase order.

It is too early at this point to guess whether XML.org will be successful, or how many schemas it will host in the future; however, in its present state, its registry is not very useful.

Standards and trade organizations

Standards and trade organizations have long been busy promoting interoperability and economies of scale, so it is only natural that they would start writing XML schemas. Standards bodies, of course, did not stop there, and most of what they design are XML applications.

The Open Applications Group, at http://www.openapplications.org, is a consortium formed by most of the main players in the computer industry. It has published an impressive list of schemas (in DTD and XDR forms) dealing mainly with e-business and software integration. Those schemas are also available on Biztalk.org.

XML applications

Applications are languages built on top of XML (defined in XML) for various purposes. Successful XML applications attract developers and gain tool support. SVG (Scalable Vector Graphics) and MathML are examples of XML applications; more on them next month.

Docbook (http://www.oasis-open.org/docbook/) is a standard DTD for software documentation, used, for example, for the Linux and KDE docs. Docbook is a complex DTD that can (and should) be customized before use. The good news is that it comes with a number of tools to convert it to various formats, from PDF to LaTeX, including HTML.

An overview of standard XML schemas would not be complete without mentioning the now-famous DESSERT (Document Encoding and Structuring Specification for Electronic Recipe Transfer), a DTD for representing recipes on computers, at http://www.formatdata.com/dessert/index.html.

XML FOR ISPs

Oddly enough, there are very few XML schemas written by or for ISPs, one being Covad's confusingly named xLink (XLink is also a W3C specification that defines links in XML), at http://xlink.covad.com.

The xLink system is used by Covad to receive work orders from its customers and to monitor the status of those orders. It is a good example of "XML at work" at a major ISP. Composed mainly of two DTDs, one for requests and one for responses, it makes heavy use of entities, but is, overall, quite understandable.
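
For readers who have not met entities before, the following Python sketch shows the basic mechanism: a name declared once in the DTD and expanded wherever it is referenced. The document, the element names and the entity name isp are all invented for this illustration and are not taken from Covad's DTDs, which may well rely on parameter entities (used inside the DTD itself) rather than the general entity shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical request document with an internal DTD subset.
# The entity "isp" is declared once and referenced twice in the body.
REQUEST = """\
<!DOCTYPE request [
  <!ELEMENT request (partner, order)>
  <!ELEMENT partner (#PCDATA)>
  <!ELEMENT order (#PCDATA)>
  <!ENTITY isp "Example ISP, Inc.">
]>
<request>
  <partner>&isp;</partner>
  <order>New line for a customer of &isp;</order>
</request>
"""


def expand(document_text):
    """Parse the document and return each child's text, entities expanded."""
    root = ET.fromstring(document_text)
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    for tag, text in expand(REQUEST).items():
        print("%s: %s" % (tag, text))
```

Note that the parser used here expands the entity but does not validate the document against the element declarations; checking a request against a DTD requires a validating parser.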

W3C's P3P, the Platform for Privacy Preferences (http://www.w3.org/P3P/), is not yet complete and will eventually be a full XML application, but it already includes an XML DTD that can serve both as an example of a DTD and as a project that could affect a lot of ISPs.

Of course, for ISPs operating satellite links, SML (Spacecraft Markup Language, at http://www.interfacecontrol.com/sml/) might be useful, but I doubt they are the core market of the language.

Even if no existing schema exactly fits the requirements for a system, all of the resources discussed in this column can still be used. Reading a couple of examples first certainly makes it easier to figure out how to write (and document) schemas.

There is no need to reinvent some of the common structures (like addresses) that can be found in a lot of the publicly available material. Starting from an existing schema and customizing it is a great way to jumpstart an XML project and maybe re-use some of the tools available for the original schema.
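
One quick way to judge whether an existing DTD is a good starting point is to list the elements it declares and compare them with what a project needs. The Python sketch below does this with a regular expression; the DTD text, the function names and the list of needed elements are hypothetical, and the pattern is deliberately naive (it ignores comments and parameter entities, which a real DTD such as Docbook uses extensively).

```python
import re

# Naive pattern for <!ELEMENT name content-model> declarations.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]*?)\s*>")

# Hypothetical DTD, standing in for one downloaded from a repository.
EXISTING_DTD = """\
<!ELEMENT customer (name, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (street, city, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
"""


def declared_elements(dtd_text):
    """Return a dict mapping each declared element to its content model."""
    return {name: model for name, model in ELEMENT_DECL.findall(dtd_text)}


def missing(dtd_text, needed):
    """Return the needed element names that the DTD does not declare."""
    declared = declared_elements(dtd_text)
    return [name for name in needed if name not in declared]


if __name__ == "__main__":
    print(declared_elements(EXISTING_DTD)["address"])
    print(missing(EXISTING_DTD, ["address", "phone"]))
```

Here the address structure is already covered, so only the phone element would have to be added when customizing the DTD.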

RESOURCES

James Tauber's Schema.net (http://www.schema.net) includes links to a number of schemas, sorted by category for easy accessibility. The XML Cover Pages (http://www.oasis-open.org/cover/) include a long list of links to various XML schemas and applications. The Q&A Markup Language (QAML) is a good example of a useful schema that can be found there (although in true XML fashion, a new FAQ schema is being developed by members of the XHTML mailing list).


Note: this article was published in 2000 in Boardwatch magazine. More recent articles about XML and especially Perl & XML can be found on www.xmltwig.com