Infinitesque

Resolving URIs in 4Suite

Principal author:
John L. Clark

Abstract

This article describes how to customize the way that 4Suite finds objects associated with URIs (a process known as resolving URIs).

1 Introduction

As a library for processing XML, 4Suite has to be Web-savvy, because XML documents often rely on Web architecture to identify and locate dependent and otherwise related resources. XInclude, an XML dialect for transclusion, identifies content to transclude using URIs (and other technologies for peeking into documents). XSLT uses URIs to include and import additional stylesheet components. XML uses URIs for DTD system identifiers. In general, the 4Suite APIs often allow an object to be retrieved by resolving its URI; in addition, 4Suite provides full access to the layer that manages this mapping from a URI to the object identified, which can be a very powerful tool.

In 4Suite, the main point of entry to resolving URIs is the InputSource, which descends conceptually from the class of the same name in SAX. In the SAX version, an InputSource wraps up metadata about the object identified by a URI; 4Suite's InputSource extends this by allowing the user to inject novel mechanisms through which the InputSource can find the object associated with its URI and any other resources related to this resource.

In 4Suite, InputSource objects are created using InputSourceFactory objects; the InputSourceFactory class provides two options for customizing the way in which InputSource objects that it creates will resolve URIs into file-like objects. At one level, a user can specify an XML Catalog that will be used to map URIs to alternate URIs before the references are resolved. At a lower level, a user can define a custom resolver class that takes full control of the resolving process. This article describes and illustrates these two approaches.
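The ordering of these two levels (catalog mapping first, then resolution) can be pictured with a small sketch. This is not 4Suite code: it uses only the Python standard library, and the names CATALOG, RESOURCES, resolve, and open_uri are hypothetical stand-ins chosen for illustration.

```python
import io

# Hypothetical stand-ins: a catalog as a plain dict, and resources held in memory.
CATALOG = {"myscheme:target": "mem:identity"}
RESOURCES = {"mem:identity": "<identity transform would go here>"}


def resolve(uri):
    """Lower level: turn a URI into a file-like object, or fail."""
    try:
        return io.StringIO(RESOURCES[uri])
    except KeyError:
        raise LookupError("cannot resolve %r" % uri)


def open_uri(uri, catalog=None, resolver=resolve):
    """Apply the catalog mapping (if any) first, then hand the result to the resolver."""
    if catalog is not None:
        uri = catalog.get(uri, uri)  # URIs without an entry pass through unchanged
    return resolver(uri)


stream = open_uri("myscheme:target", catalog=CATALOG)
print(stream.read())
```

Without the catalog argument, the same call fails in this sketch, because the resolver never sees a URI that it knows how to open.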

2 Using XML Catalogs

XML Catalogs are XML documents that define a mapping from URIs to other URIs. Although they are often used to describe where cached copies of Internet resources live on the local filesystem, they are more generally useful for supplying resolvable URIs in place of URIs that use unsupported schemes, or that you cannot or do not want to resolve directly. 4Suite provides two pre-initialized InputSourceFactory objects that use two different sets of XML Catalogs. The first, Ft.Xml.InputSource.DefaultFactory, uses the set of XML Catalogs named in two environment variables (one of which is XML_CATALOG_FILES), as well as a few other system XML Catalog files. The second, Ft.Xml.InputSource.NoCatalogFactory, uses no XML Catalog files at all. The section of the 4Suite Users' Manual titled "InputSource objects" describes how to use these built-in InputSourceFactory objects to create InputSource objects. As the name implies, Ft.Xml.InputSource.DefaultFactory is used by default in those operations that don't explicitly specify an InputSourceFactory.

When the pre-initialized InputSourceFactory objects are not adequate (such as when you want to set up your XML Catalogs programmatically), you can create a custom InputSourceFactory, which can then be used in the expected way. To have this custom InputSourceFactory use a custom XML Catalog, you create a new Ft.Xml.Catalog object and provide it to the InputSourceFactory using the catalog parameter. This process is illustrated in the following example (which uses an InputSource created from the custom InputSourceFactory to process an import instruction in an XSLT script):

from Ft.Xml.Xslt import Processor
from Ft.Xml import Catalog, InputSource
from Ft.Lib import Uri
import sys

# New processor:
processor = Processor.Processor()

# The XSLT with custom URI references that we want to use:
XSLT = """
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="myscheme:target"/>
</xsl:stylesheet>
"""

# Turn the catalog filename into the corresponding `file` URI.
catalog_URI = Uri.OsPathToUri('catalog.xml')

# Load the catalog.
theCatalog = Catalog.Catalog(catalog_URI)

# Create a new `InputSourceFactory` object using our catalog instead of the
# default set of catalogs.  Note that this will not use any catalogs listed
# in the environment (such as in `XML_CATALOG_FILES`); it is possible to
# augment the default set of catalogs instead of replacing them, and I may
# try to cover that later.
theInputSourceFactory = InputSource.InputSourceFactory(catalog = theCatalog)

# Create input sources for the XSLT and the input document (which will also
# be the XSLT document, for simplicity).
xsltSource = theInputSourceFactory.fromString(XSLT,
  u"tag:jlc6@po.cwru.edu,2007-08-10:exampleDoc")
inputDoc = theInputSourceFactory.fromString(XSLT,
  u"tag:jlc6@po.cwru.edu,2007-08-10:exampleDoc")

processor.appendStylesheet(xsltSource)
processor.run(inputDoc, outputStream=sys.stdout)

This requires an external catalog file creatively named catalog.xml, which maps from our proprietary URI scheme to a URI in the filesystem:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="myscheme:target" uri="identity.xslt"/>
</catalog>
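
To make the mapping in this catalog concrete, here is a minimal sketch of the lookup that a uri entry describes. It does not use 4Suite's Catalog class; it reads the catalog with the standard library's ElementTree, and load_uri_entries and map_uri are hypothetical helper names. It also leaves the uri attribute as written, whereas a real catalog processor would be expected to resolve a relative value against the catalog's own base URI.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# The same single-entry catalog as above, without the DOCTYPE declaration.
CATALOG_XML = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="myscheme:target" uri="identity.xslt"/>
</catalog>
"""


def load_uri_entries(catalog_text):
    """Collect the name -> uri pairs from every <uri> entry in the catalog."""
    root = ET.fromstring(catalog_text)
    entries = {}
    for entry in root.iter("{%s}uri" % CATALOG_NS):
        entries[entry.get("name")] = entry.get("uri")
    return entries


def map_uri(entries, uri):
    """Return the mapped URI, or the original URI when there is no entry."""
    return entries.get(uri, uri)


entries = load_uri_entries(CATALOG_XML)
print(map_uri(entries, "myscheme:target"))                # identity.xslt
print(map_uri(entries, "http://example.org/other.xslt"))  # unchanged
```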

3 Custom URI resolvers

An XML Catalog gives us a lookup table for mapping URIs (and URI prefixes) to custom locations, but sometimes we might want to have full control over how we resolve those URIs. A consumer might have special application-specific rules for how content is returned for particular URIs. The object that resolves URIs in this type of situation should be an instance of a subclass of the Ft.Lib.Uri.FtUriResolver class. Your subclass must define normalize and resolve methods. You install the resolver by passing an instance of your subclass as the resolver parameter to the constructor of the InputSourceFactory that you will use to create InputSource objects that require the custom resolution semantics.

In the following example, we want to resolve a custom URI to the contents of an in-memory string, using an exceedingly naive subclass of FtUriResolver:

from Ft.Xml.Xslt import Processor
from Ft.Xml import InputSource
from Ft.Lib.Uri import UriException, FtUriResolver, Absolutize
import sys, StringIO

# New processor
processor = Processor.Processor()

# The XSLT with custom URI references that we want to use:
XSLT = """
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="myscheme:target"/>
</xsl:stylesheet>
"""

# The "object" containing the resource that we want to obtain when we
# resolve "myscheme:target" (in this case, an identity transform in a
# string, but it could be anything in any sort of object that could present
# a file-like interface).
XSLT_IMPORT = """
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
"""

class UnoResolver(FtUriResolver):
  # For some reason, subclasses of `FtUriResolver` must define a `normalize`
  # method, even though `FtUriResolver` provides such a method.  Thoughts,
  # anyone?
  def normalize(self, uriRef, baseUri):
    return Absolutize(uriRef, baseUri)

  def resolve(self, uri, baseUri=None):
    if uri == "myscheme:target":
      return StringIO.StringIO(XSLT_IMPORT)
    else:
      raise UriException(UriException.RESOURCE_ERROR, loc=uri,
        msg="This resolver only handles one URI, and that wasn't it!")

# Create a new `InputSourceFactory` object using an instance of our
# UnoResolver instead of the default resolver.
theInputSourceFactory = InputSource.InputSourceFactory(
  resolver = UnoResolver())

# Create input sources for the XSLT and the input document (which will also
# be the XSLT document, for simplicity).
xsltSource = theInputSourceFactory.fromString(XSLT,
  u"tag:jlc6@po.cwru.edu,2007-08-10:exampleDoc")
inputDoc = theInputSourceFactory.fromString(XSLT,
  u"tag:jlc6@po.cwru.edu,2007-08-10:exampleDoc")

processor.appendStylesheet(xsltSource)
processor.run(inputDoc, outputStream=sys.stdout)
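
For readers who want to experiment with the normalize/resolve shape without 4Suite installed, the following sketch mirrors the resolver above using only the Python standard library. It is an illustration under stated assumptions, not 4Suite's API: InMemoryResolver and ResolutionError are hypothetical names, and urllib.parse.urljoin stands in for Absolutize.

```python
import io
from urllib.parse import urljoin


class ResolutionError(Exception):
    """Raised when the resolver has no resource for a URI (hypothetical name)."""


class InMemoryResolver:
    """Resolve URIs against a dictionary of in-memory strings."""

    def __init__(self, resources):
        self._resources = dict(resources)

    def normalize(self, uri_ref, base_uri):
        # Turn a possibly relative reference into an absolute URI.
        return urljoin(base_uri, uri_ref)

    def resolve(self, uri, base_uri=None):
        # Return a file-like object for the URI, or raise if it is unknown.
        if base_uri is not None:
            uri = self.normalize(uri, base_uri)
        try:
            return io.StringIO(self._resources[uri])
        except KeyError:
            raise ResolutionError("no resource registered for %r" % uri)


resolver = InMemoryResolver({
    "myscheme:target": "<identity transform would go here>",
    "http://example.org/styles/common.xslt": "<shared templates>",
})

# An absolute URI in a custom scheme is looked up directly.
print(resolver.resolve("myscheme:target").read())

# A relative reference is absolutized against the base URI first.
print(resolver.resolve("common.xslt",
                       "http://example.org/styles/main.xslt").read())
```

Backing the resolver with a dictionary rather than a single hard-coded URI keeps the failure case explicit: any URI that was not registered raises an error instead of silently falling back to the network or filesystem.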

This page was last modified on 2008-02-10 19:07:00-05:00.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.