Infinitesque


xml:base vs. RFC 3986 Grudge Match

Principal author:
John L. Clark

Abstract

The spirit of the xml:base Recommendation and that of RFC 3986 are at odds with one another. The solution? A very simple new member of the URI family of specifications.

URIs are useful for identifying resources on the Web, but they can be unwieldy and verbose. Some URI schemes descend from hierarchical filesystem concepts. With such schemes, users can form relative references, which resolve to absolute URIs through a natural path traversal mechanism. These relative references are usually shorter and easier to manage than the corresponding absolute URIs.
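To make the path traversal idea concrete, here is a minimal sketch using Python's standard `urllib.parse.urljoin`. The base URI and the three references are illustrative values I chose for this sketch, not anything mandated by either specification.

```python
from urllib.parse import urljoin

# Illustrative base URI (an assumption for this sketch).
BASE = "http://example.org/today/"

# Resolve a few relative references against the base URI.
resolved = {
    ref: urljoin(BASE, ref)
    for ref in ("new.xml", "../hotpicks/pick1.xml", "/hotpicks/")
}

for ref, target in resolved.items():
    print(f"{ref} -> {target}")
```

Each short reference expands to a full absolute URI, which is exactly the convenience that makes relative references attractive.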

The URI specification, RFC 3986, describes how to determine the URI against which these relative references are resolved. This URI is called the base URI for a given context. RFC 3986 also describes what should happen if a reference resolves to the same URI as the base URI, ignoring any fragment. Such a reference is known as a "Same-Document Reference", and software that processes it should treat it as referring to the current document. As a result, the base URI should be the actual URI of the resource in question.
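A rough way to test for a Same-Document Reference can be sketched as follows. The helper name `is_same_document` is hypothetical, and the check is a plain string comparison after stripping fragments; it performs none of the normalization that a careful implementation might apply before comparing.

```python
from urllib.parse import urljoin, urldefrag


def is_same_document(base_uri, reference):
    """Return True if `reference` resolves to `base_uri`, ignoring fragments.

    Hypothetical helper: simple string comparison, no URI normalization.
    """
    target = urljoin(base_uri, reference)
    return urldefrag(target).url == urldefrag(base_uri).url


base = "http://example.org/today/"  # illustrative base URI
for ref in ("#top", "", "new.xml"):
    print(repr(ref), is_same_document(base, ref))
```

Under this sketch, a bare fragment and the empty reference both count as Same-Document References, while `new.xml` does not.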

The xml:base Recommendation gives users the ability to change the base URI in effect for a given section of an XML document. This is meant as a convenience to allow users to use relative references where they would otherwise need to use an absolute URI. The author of RFC 3986, Roy T. Fielding, asserts that this type of behavior is an abuse of the base URI mechanism.

 

[A] person is deliberately abusing the base URI by assigning it an unrelated URI for the purpose of creating an artificial shorthand notation for external references.

 --Roy T. Fielding

In effect, the contract that RFC 3986 lays down is that the base URI provided to an application is the URI of that context. This is not unreasonable: if you change the base URI in a given context to that of an external resource, then any Same-Document Reference in that context will be treated as a reference to the current document rather than to the external resource. We can see this problem by closely examining the example from section 3 of the xml:base Recommendation, which I reproduce here for discussion.

<?xml version="1.0"?>
<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <paragraph>See <link xlink:type="simple" xlink:href="new.xml">what's
      new</link>!</paragraph>
    <paragraph>Check out the hot picks of the day!</paragraph>
    <olist xml:base="/hotpicks/">
      <item>
        <link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link>
      </item>
      <item>
        <link xlink:type="simple" xlink:href="pick3.xml">Hot Pick #3</link>
      </item>
    </olist>
  </body>
</doc>

Let's assume that the URI for this document is http://example.org/today/. This use of xml:base tries to make it easy to refer to the URIs http://example.org/hotpicks/pick1.xml, http://example.org/hotpicks/pick2.xml, and http://example.org/hotpicks/pick3.xml.

If the xml:base on the olist element were /hotpicks/pick2.xml instead of /hotpicks/, however, then the relative reference in the second item would become a Same-Document Reference according to RFC 3986. Processors should then treat that reference as pointing to the current document, even though it resolves to http://example.org/hotpicks/pick2.xml, which is a separate resource from http://example.org/today/. A similar problem occurs if you keep the original xml:base but change the relative reference in the second item to #pick2 instead of pick2.xml. Another good example and description of this problem can be found in Sjoerd Visscher's How to use base URIs.

These two specifications differ in their understanding of identity. Moreover, any other specification that wanted to use “an artificial shorthand notation for external references” would come into conflict with RFC 3986 in the same way.
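The problem case can be reproduced with a short sketch. It parses a trimmed copy of the example in which the olist xml:base has been changed to /hotpicks/pick2.xml, tracks the base URI in effect for each element, and flags any link that resolves to that base. The function name `resolve_links` is my own invention, and the same-document check is a plain string comparison with fragments stripped, not a full RFC 3986 implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urldefrag

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed copy of the example, with the olist xml:base altered as discussed.
DOC = """<doc xml:base="http://example.org/today/"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <olist xml:base="/hotpicks/pick2.xml">
    <item><link xlink:type="simple" xlink:href="pick1.xml">Hot Pick #1</link></item>
    <item><link xlink:type="simple" xlink:href="pick2.xml">Hot Pick #2</link></item>
  </olist>
</doc>"""


def resolve_links(element, base):
    """Yield (href, target, is_same_document) for each xlink:href found."""
    # An xml:base value is itself resolved against the inherited base URI.
    base = urljoin(base, element.get(XML_BASE, ""))
    href = element.get(XLINK_HREF)
    if href is not None:
        target = urljoin(base, href)
        same = urldefrag(target).url == urldefrag(base).url
        yield href, target, same
    for child in element:
        yield from resolve_links(child, base)


# Assumed document URI, as in the discussion above.
results = list(resolve_links(ET.fromstring(DOC), "http://example.org/today/"))
for href, target, same in results:
    print(href, "->", target, "(same-document)" if same else "")
```

The second link resolves to a URI outside the document, yet it is the one flagged as a Same-Document Reference, which is the conflict in miniature.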

So what do I propose as a solution to this problem? I believe that if users want such a shorthand for quickly specifying external references, they should call this shorthand something different. A different name would put this technology outside the scope of the mechanisms built into RFC 3986. For example, I might call these shorthand references just that, "shorthand references", and specify that they resolve against application-specified "grounding URIs". The resolution mechanism is unchanged, but we no longer assume anything about base URIs because we explicitly don't use them.

As something of a compromise, we might even allow shorthand references to resolve against a grounding URI if one is present, and against the base URI otherwise. That way, users can choose references with either set of properties. To fix the specific problem with xml:base and base URI abuse, I would simply reword the xml:base specification so that it uses shorthand references instead of relative references, and note that these shorthand references ought to be resolved against grounding URIs as defined by the xml:base attributes. If anyone is interested in working on such a specification (minimal as it may be), I would be happy to draft one; otherwise I'll just use these concepts internally in my own projects.
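The compromise described above might look something like the following sketch. No such specification exists, so every name here (`resolve_shorthand`, `grounding_uri`, `is_same_document`) is hypothetical, and the sketch assumes the grounding URI is already absolute. The key point it tries to illustrate is that same-document status is decided against the base URI alone, never against the grounding URI.

```python
from urllib.parse import urljoin, urldefrag


def resolve_shorthand(reference, base_uri, grounding_uri=None):
    """Resolve against the grounding URI if present, else the base URI.

    Hypothetical function: the resolution mechanism is ordinary
    relative-reference resolution; only the anchor URI differs.
    """
    anchor = grounding_uri if grounding_uri is not None else base_uri
    return urljoin(anchor, reference)


def is_same_document(target, base_uri):
    """Compare against the base URI only, ignoring fragments."""
    return urldefrag(target).url == urldefrag(base_uri).url


base = "http://example.org/today/"                    # the document's own URI
grounding = "http://example.org/hotpicks/pick2.xml"   # shorthand anchor

target = resolve_shorthand("pick2.xml", base, grounding)
fallback = resolve_shorthand("new.xml", base)

print(target, is_same_document(target, base))
print(fallback, is_same_document(fallback, base))
```

Because the base URI is left untouched, the reference to pick2.xml is an ordinary external reference here, even though it resolves to the grounding URI itself.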

This page was last modified on 2005-11-19 00:00:00Z.


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
