Some thoughts on XML v.NEXT

XML is 10 years old! Hooray! To celebrate, Norman Walsh has put some more thought into what might go into a new version of XML, and Simon St.Laurent jump-started an xml-dev thread about what might be taken out of the old one. I just wanted to offer some commentary on the features (both extant and imaginary) that people have been discussing.

Let's start with Norm's list. I think most of his suggestions are excellent. The specification should be as readable as ever; XML Namespaces (and, I would add, xml:id, xml:base, and the XML Infoset) should be folded into the core specification, while DTDs should be divorced from it.

I'm particularly excited about #7 on his list: allow XML documents to have multiple root elements. I think this gives the underlying tree model a useful closure property: any arbitrary selection from an XML document (a node-set, in XPath 1.0 terminology) could be treated as an XML document in its own right. I would also like to explore whether it makes sense to let an application process the first N root elements even when there is a well-formedness error in root element N+1. Bob DuCharme mentioned knowing, from SAX events, when a document is complete; I don't think an XML 1.0 parser has a good indication of that anyway, because an XML 1.0 document can have any number of comments or processing instructions after the root element. So I don't worry about losing that feature.
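To make the multiple-roots idea a bit more concrete, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. Since an XML 1.0 parser rejects a second root element, it assumes a workaround: wrap the fragment in a synthetic element (the name hypothetical-wrapper is made up for this example) and collect each completed top-level element as it arrives. Because XMLPullParser reports events incrementally, the roots finished before a well-formedness error are still returned, which is roughly the "first N roots" behavior described above. It is an illustration under those assumptions, not a proposal for how an XML 2.0 parser should work.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper name; any name that cannot clash with the fragment works.
WRAPPER = "hypothetical-wrapper"


def parse_roots(fragment: str) -> tuple[list[ET.Element], bool]:
    """Parse a fragment that may contain several root elements.

    Returns the top-level elements that were completed, plus a flag that is
    False if a well-formedness error was hit somewhere in the fragment.
    Assumes the fragment has no XML declaration or DOCTYPE, since those are
    not allowed inside the synthetic wrapper element.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    # Errors during feed() are queued and re-raised later by read_events().
    parser.feed(f"<{WRAPPER}>{fragment}</{WRAPPER}>")
    well_formed = True
    try:
        parser.close()  # finish parsing; raises if the input is broken
    except ET.ParseError:
        well_formed = False

    roots: list[ET.Element] = []
    depth = 0
    try:
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                # Back at depth 1 means we just closed a direct child of the
                # wrapper, i.e. one of the fragment's "root" elements.
                if depth == 1:
                    roots.append(elem)
    except ET.ParseError:
        # Stop at the first error but keep every root completed before it.
        well_formed = False
    return roots, well_formed


roots, ok = parse_roots("<a/><b>text</b><c></d>")
print([r.tag for r in roots], ok)  # prints: ['a', 'b'] False
```

The wrapper is, of course, exactly the kind of workaround that multiple root elements in the core specification would make unnecessary.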

I'm quite concerned about the problem of mapping QNames to URIs. Generally speaking, I think the practice of concatenating a QName's namespace URI and local name to produce a URI meant to stand for that QName should be discouraged. Plain concatenation is not a reversible mapping: two different QNames can yield the same URI, and it produces truly scary URIs for a number of well-established namespaces. There are a few notable exceptions, such as RDF/XML, in which this procedure is well established and carefully tended, but those should remain the exceptions rather than the rule. We should find a good solution to this problem and replace RDF/XML at our first convenience.
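A tiny Python illustration of why concatenation cannot be inverted: two different (namespace URI, local name) pairs collapse to the same string, and applying it to a well-established namespace such as XHTML's (http://www.w3.org/1999/xhtml) gives an odd-looking result. The function name concat_qname and the example.org URIs are made up for this sketch.

```python
def concat_qname(namespace_uri: str, local_name: str) -> str:
    """Naive QName-to-URI mapping: glue the namespace URI and local name together."""
    return namespace_uri + local_name


# Two distinct (namespace URI, local name) pairs...
first = concat_qname("http://example.org/a", "bc")
second = concat_qname("http://example.org/ab", "c")
# ...map to the same URI, so the original QName cannot be recovered.
print(first == second)  # prints: True

# A well-established namespace yields a URI nobody would choose on purpose.
print(concat_qname("http://www.w3.org/1999/xhtml", "p"))  # prints: http://www.w3.org/1999/xhtmlp
```

Namespace URIs that end in # or / (a common convention in RDF vocabularies) make such collisions less likely, but nothing in the mapping itself rules them out.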

Without DTDs, there would be significant pressure for a replacement for the macro facility that named entities (a DTD feature) currently provide. Norm's most recent suggestion is interesting, but I prefer leaving macro resolution to the application, as with the ml-macro facility Norm experimented with previously. That said, deferring macro resolution would rule out using macros to construct namespace URIs, which I have seen a fair bit of in hand-authored documents in the wild. That, however, might support the voices suggesting that we take a good hard look at XML Namespaces.
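For readers who haven't run into the namespace-URI pattern, here is a minimal Python sketch of it using the standard library's ElementTree (which is built on expat). The entity name base and the example.org URIs are made up for the example. The parser expands &base; inside the xmlns attribute as part of normalizing the attribute value, so the namespace binding sees the full URI; a macro step deferred to the application would run after the binding had already happened.

```python
import xml.etree.ElementTree as ET

# A hand-authored document that builds its namespace URI from an internal
# general entity declared in the internal DTD subset (names are illustrative).
DOC = """<!DOCTYPE doc [
  <!ENTITY base "http://example.org/vocab/">
]>
<doc xmlns="&base;v1"><item/></doc>"""

root = ET.fromstring(DOC)
# ElementTree reports names in {namespace-uri}local-name form, so the
# expanded namespace URI shows up directly in each tag.
print(root.tag)
print(root[0].tag)
```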

I have long been frustrated by the lack of significant whitespace in attribute values, so I strongly favor John Cowan's suggestion to eliminate attribute-value normalization. I think XML 2.0 should preserve all whitespace in a document, but offer hooks that let a processor (such as a validation engine) declare where that whitespace is significant and where it is not.
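The normalization in question is easy to observe. In this Python sketch (standard library ElementTree, with a made-up note element and text attribute), a literal newline and tab inside an attribute value come back as plain spaces, as XML 1.0 section 3.3.3 requires; only whitespace written as character references, such as &#10;, survives intact.

```python
import xml.etree.ElementTree as ET

# Literal newline and tab characters inside the attribute value.
literal = '<note text="line one\nline two\tindented"/>'
# The same whitespace written as character references.
escaped = '<note text="line one&#10;line two&#9;indented"/>'

# Attribute-value normalization turns each literal whitespace character
# into a space, so the line structure is lost.
print(repr(ET.fromstring(literal).get("text")))  # prints: 'line one line two indented'
# Characters produced by character references are kept as-is.
print(repr(ET.fromstring(escaped).get("text")))  # prints: 'line one\nline two\tindented'
```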

I don't know enough about internationalization to make any truly informed statements about the recent discussion surrounding the drafts of XML 1.0 Fifth Edition, so I will leave that analysis to others. I just hope that at the end of it we can reach a workable compromise, and that the W3C continues to provide strong leadership.

I think there is plenty that we can do to both streamline and enhance XML, and it seems like there is a fair amount of enthusiasm for it in the community. I hope we can continue to move forward with this!
