Infinitesque

A-prefixed URIs: Overview

Principal author:
John L. Clark

Abstract

People want to be able to copy web identifiers, or Uniform Resource Identifiers (URIs), into their web browsers to obtain information about whatever those identifiers name. Identifying non-document resources with URIs that are similar, but not identical, to the URIs used to locate documents may help solve some web architecture problems. This report summarizes those problems and outlines the proposed solution.

 

I think it's fair to assume that most users and programmers and people who don't spend their lives thinking about web architecture have a mental model that can be expressed informally as "URLs point to web pages".

--Norman Walsh

The World Wide Web (or "the Web") is most commonly recognized as a very powerful tool for locating and reading documents (such as web pages) shared across significant distances. These documents are located using Uniform Resource Locators, commonly known as URLs [BerEtAl94]. A URL has a particular syntax that lets a computer program contact the machine that may hold the document, request the document, and then, if it is available, copy it over the Internet for local viewing. For example, the URL http://www.w3.org/ refers to the main web page of the website for the World Wide Web Consortium, or W3C. (The W3C is an international organization with a large role in helping to develop common specifications for communication on the Web.)

Users of the global Web have come to realize that in addition to locating documents, these URLs serve a more general purpose. They give us a concrete way to refer to these documents so that we can discuss them. That is, in addition to locating a given document, a URL also identifies that document. When you say "I want to see a copy of http://www.w3.org/ in my web browser", you are using the URL as a locator. When you say "http://www.w3.org/ has an attractive layout", you are not finding and retrieving the document; you are identifying it in order to comment on it. Not surprisingly, if Uniform Resource Locators are used to locate documents, then we say that Uniform Resource Identifiers, or URIs [BerEtAl04], are used to identify these documents.

The original intent of using these URLs as identifiers as well as locators was to make notes and other annotations about the corresponding documents. People also wanted to make notes about things besides documents on the Web, however. Since URIs proved to be so effective at identifying things on the Web, people started using URIs to identify things besides web documents, so that both web documents and other sources of information could be discussed using the same mechanism. (This is the desire for a uniform mechanism, the "U" in both URL and URI.)

These things that can be discussed on the Web are called resources (the "R" in both URL and URI). While we always say that a resource has an identifier, so that it can be discussed, only some resources actually have a location on the Web. For example, we might want to identify both the W3C and the W3C's home page to discuss each of them (possibly noting the relationship between the two), but while we can certainly look at its home page, there is no way we could download the entire World Wide Web Consortium (at least with current technology). As a result, every URL for a resource is also a URI for the same resource, but there are many URIs which do not provide a location for the resource that they identify, and so are not URLs.

Note

The report Architecture of the World Wide Web, Volume One [JacWal04] provides a very good introduction to many web architecture concepts.

The format, or syntax, of URIs and of several types of URLs is well defined in a number of specifications ([BerEtAl94], [BerEtAl04]). Every URI begins with a scheme. For URLs, the scheme indicates the procedure, or protocol, for obtaining the document to which the URL refers. For example, the common http scheme indicates that the Hypertext Transfer Protocol (HTTP) should be used to retrieve documents located using that scheme.
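As a small illustration of where the scheme sits, the following Python sketch reads it off a few identifiers using the standard library's urlsplit function. The helper name scheme_of and the ftp and mailto example addresses are assumptions made for this illustration; they do not come from the specifications cited above.

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the scheme at the start of a URI (urlsplit lowercases it)."""
    return urlsplit(uri).scheme


# The first URL is the one used in this report; the other two are
# hypothetical examples added only to show different schemes.
for uri in (
    "http://www.w3.org/",
    "ftp://ftp.example.org/pub/file.txt",
    "mailto:someone@example.org",
):
    print(scheme_of(uri), "<-", uri)
```

A program that wants to retrieve a document would look at this value first in order to decide which protocol to speak.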

The problem that this proposal aims to address is very simple. What happens if you use the http scheme, or other schemes commonly used for URLs, to identify resources that do not have a location on the Web? Such URIs could easily also be used as URLs, referring to documents on the Web which would then have the same identifier as the original resource. For example, someone might use http://www.w3.org/ to identify the W3C organization. A problem then arises when one wants to identify the W3C home page, because the same identifier already refers to the organization. People tend to use URLs in this way (to identify organizations and other non-Web resources) in order to make the associated web page easy to find. The ambiguity still exists, however, and causes problems when you try to be precise in your discussions about various resources.
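The ambiguity can be made concrete with a minimal Python sketch of a note-taking store keyed by identifier. The notes dictionary and the wording of the two notes are assumptions made for this illustration, not part of any existing annotation system.

```python
from collections import defaultdict

# Hypothetical annotation store: identifier -> list of notes about it.
notes = defaultdict(list)

# One author means the home page, another means the organization,
# but both reach for the same identifier.
notes["http://www.w3.org/"].append("has an attractive layout")
notes["http://www.w3.org/"].append("is an international organization")

for identifier, remarks in notes.items():
    print(identifier, remarks)
```

Both notes end up attached to the same key, so a reader of the store cannot tell which one describes the web page and which one describes the W3C itself.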

In order to solve this problem, I propose a class of URI schemes closely related to (but purposefully different from) the corresponding URL schemes. I call this class of URI schemes A-prefixed URIs because they are constructed by adding the single character 'a' to the beginning of common URL schemes. The A-prefixed URI scheme corresponding to the http scheme, then, would be the ahttp scheme. For example, one might refer to the W3C organization using the URI ahttp://www.w3.org/; the W3C's main web page would still be identified by and located at http://www.w3.org/. In this way the identifiers are clearly related while still distinct. I suggest that whenever you want to refer to some non-Web resource that has a related web page or other web document, you identify the non-Web resource using the A-prefixed URI that corresponds to the document's URL. I further suggest that authors always use the ordinary URL schemes for Web documents.
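The construction is mechanical enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names to_a_prefixed and to_url are hypothetical, and the set of "common URL schemes" is a guess, since this report names only http and ahttp.

```python
# Assumption: this set of "common URL schemes" is illustrative; the report
# itself only names http (and therefore ahttp).
URL_SCHEMES = {"http", "https", "ftp"}


def to_a_prefixed(url: str) -> str:
    """Return the A-prefixed URI corresponding to a URL."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() not in URL_SCHEMES:
        raise ValueError(f"not a recognized URL scheme: {url!r}")
    return "a" + scheme + ":" + rest


def to_url(uri: str) -> str:
    """Return the URL of the related web document for an A-prefixed URI."""
    scheme, sep, rest = uri.partition(":")
    lowered = scheme.lower()
    if not sep or not lowered.startswith("a") or lowered[1:] not in URL_SCHEMES:
        raise ValueError(f"not an A-prefixed URI: {uri!r}")
    return scheme[1:] + ":" + rest


organization = to_a_prefixed("http://www.w3.org/")
print(organization)          # ahttp://www.w3.org/
print(to_url(organization))  # http://www.w3.org/
```

Because the two identifiers differ only in the leading character of the scheme, a person or a program can recover the web page's URL from the identifier of the non-Web resource, while the two strings remain distinct.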

Note

I am currently working on a second report, which will describe this solution more carefully and inspect its features. That report will also summarize some very informal research which indicates that users will be able to infer URLs from A-prefixed URIs.

References

[BerEtAl04] T. Berners-Lee (W3C/MIT), R. Fielding (Day Software), L. Masinter (Adobe). Uniform Resource Identifier (URI): Generic Syntax. http://gbiv.com/protocols/uri/rev-2002/draft-fielding-uri-rfc2396bis-07.html. 2004-09.

[BerEtAl94] T. Berners-Lee (W3C/MIT), L. Masinter (Xerox PARC), M. McCahill (University of Minnesota, Computer and Information Services). Uniform Resource Locators (URL). http://www.ietf.org/rfc/rfc1738.txt. 1994-12.

[JacWal04] Ian Jacobs (W3C), Norman Walsh (Sun Microsystems, Inc.). Architecture of the World Wide Web, Volume One. http://www.w3.org/TR/webarch/. 2004-12-15.

This page was last modified on 2005-06-12 00:00:00Z.

This page was first published on .

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
