Dropping in <!DOCTYPE>-Dropping Alternatives

In XML, entities are declared within the Doctype declaration. A parser expands entities recursively as it encounters them, so people hand-authoring XML often use them as a macro facility. In December of 2005, Tim Bray wrote Drop the <!DOCTYPE>, in which he (not surprisingly) argued against the use of Doctype declarations. If you were to follow his advice, you would not be able to use the entity "macro" facility. Personally, I try to avoid depending on any of the features offered by Doctype declarations, but this article isn't about my opinion on that topic; it is about XSLT.

In response to Tim's article, and to some heady push-back against calls for the demise of the Doctype declaration, Norman Walsh wrote a nifty XSLT script that provides a high-level, user-configurable macro facility for XML. This was good stuff; I was ready to be very excited about this approach until I discovered that his work required XSLT 2. Unfortunately, my XML toolkit of choice, 4Suite, does not currently support XSLT 2.

I really like XSLT, so coming across XSLT 2 scripts like Norm's has recently forced me to evaluate different XSLT solutions. Do I try to push for more XSLT 2 support in the toolkits that I use, or do I focus on XSLT 1 enhanced through efforts such as EXSLT? As part of this process, I decided to see if I could reimplement (or rather, modify) Norm's script to use only XSLT 1 with EXSLT extensions. I was successful; my version of ml-macro.xsl uses XSLT 1 plus extensions from the common, regular expression, and function EXSLT modules. (Unfortunately, the use of the regular expression module currently excludes libxml2/libxslt.) My version comes out about 80 lines longer than Norm's.

Not surprisingly, I often learn something new or come up with some interesting ideas about a language whenever I write code in that language, and this probably holds true of the XSL family of languages even more than others. Here's what I learned from this round:

You can't have both a single quote and a double quote in a single XPath string literal, because the literal ends at the first occurrence of its own delimiter and there is no escape mechanism. Thus, if you want both single quotes and double quotes in one string, you're going to need to concat a set of literals. The mess around line 318 of my script shows an example of this.
In XSLT 2, regular expressions use XML Schema regular expression syntax. This syntax provides some nice character classes for XML-specific string sets, such as the '\c' class, which matches any name character. I sure don't want to build that character class by hand, so my script uses an ASCII subset of that class. I think it would be good if EXSLT switched to this regular expression syntax; they are both fundamentally based on Perl regular expressions, so I don't think it would be too much of a problem.
We need a version of the document function that does not quit when it cannot retrieve a resource. Does XSLT 2 offer this? I can't tell from its specification. In any case, I am in favor of such an extension for EXSLT.
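
The first two points can be illustrated outside of XSLT. What follows is a minimal Python sketch, not code taken from ml-macro.xsl: the names xpath_literal and ASCII_NAME_CHAR are made up for illustration, and the character class is only one plausible ASCII subset of '\c', not necessarily the one the script uses.

```python
import re


def xpath_literal(text):
    """Return an XPath 1.0 expression that evaluates to the string `text`."""
    if "'" not in text:
        return "'" + text + "'"      # no single quotes: single-quoted literal
    if '"' not in text:
        return '"' + text + '"'      # no double quotes: double-quoted literal
    # Both quote kinds are present. Split on the single quotes, wrap each
    # non-empty piece in single quotes, and put a double-quoted single quote
    # ("'") between the pieces as a separate concat() argument.
    pieces = []
    for index, part in enumerate(text.split("'")):
        if index > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


# Assumed ASCII subset of the XML Schema '\c' (name character) class:
# letters, digits, and the punctuation '.', '_', ':' and '-'.
ASCII_NAME_CHAR = r"[A-Za-z0-9._:-]"
ASCII_NAME_CHARS = re.compile(ASCII_NAME_CHAR + "+")


print(xpath_literal('He said "it\'s fine"'))
# concat('He said "it', "'", 's fine"')
print(bool(ASCII_NAME_CHARS.fullmatch("xsl:template")))
# True
```

When the resulting expression is placed inside an XML attribute such as select, whichever quote character delimits the attribute still has to be written as &quot; or &apos;, which is a large part of why this kind of expression looks messy in a stylesheet.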

There are definitely some things that I like about the XSLT 2 solution. Sequences seem to be easy to use, and the analyze-string construct is elegant. Clearly XSLT 2 can do the job with slightly less code, and I think my need to convert to node-sets at several points is somewhat costly. But XSLT 1 is here now, and with EXSLT it can do the task. If we wanted, we could even add the analyze-string machinery to the EXSLT specification. Is this a typical task for XSLT 2, though? Is there a bunch of other functionality that will ultimately prove unapproachable by XSLT 1 + EXSLT? Honestly, I don't know. I'd love to hear some feedback on this whole issue. On the other hand, once your toolkit supports XSLT 1, how much more difficult would it be to just go ahead and implement XSLT 2 (or at least the "basic XSLT processor" model)? As we add more and more to EXSLT, does it make the migration to an actual XSLT 2 implementation that much easier?

Others use more traditional, procedural languages as their XML power tool. As I continue this XSLT investigation, I'm going to try to implement ml-macro in Python using 4Suite, as well. That will help me understand the different feel of different approaches. I'll post the results (or at least the code) here when I'm finished. For right now, I'm just glad I can leverage Norm's macro technique using XSLT 1 and EXSLT.
