Chapter 10. Content Negotiation

 

“In a defeat there would be a roundabout vindication of himself. He thought it would prove, in a manner, that he had fled earlier because of his superior powers of perception. A serious prophet upon predicting a flood should be the first man to climb a tree. This would demonstrate that he was indeed a seer.”

 
 --The Red Badge of Courage—Stephen Crane

Content negotiation is another phase of URL mapping (see Chapter 9, “URL Mapping”), but sufficiently important to warrant its own chapter. When the client requests a document from the server, if there is any ambiguity as to what document to give to the client, there is negotiation between the parties regarding which document is the best suited to the needs of the client.

In practice, this means that the end user's preferences determine which representation of the document she sees.

With the increasing diversity of your Web site's potential audience it is increasingly important to cater to their needs and provide them with content in a usable format. Content negotiation provides a way to do this seamlessly and invisibly. Two users might visit exactly the same URL on your Web site and get completely different content—perhaps in a different language, for example—based on the preferences they have set in their browsers.

The basic idea behind content negotiation is that the Web site should be available in a variety of different representations, and that clients should be able to select the one that best meets their preferences and needs.

For example, one user might need to view your Web site in English and likes to have documents as HTML, whereas another might prefer French and a plain text document so that it can be read to them by a screen reader.

Many Web sites provide links from their home page to the French or English versions, but content negotiation enables Apache to automatically figure out what version of the site the user needs, and just give it to them.

Content negotiation might better be called content selection, because no actual negotiation takes place. The term negotiation implies a conversation between the server and the client in which some sort of compromise is reached. What really happens is much simpler than that.

Client Preference

The first, and most important, part of content negotiation is for the client to communicate the document types that she prefers to receive. This is done with one or more Accept*[1] headers, each of which might carry quality factors.

Accept Headers

When the client makes a request, she sends with it a list of document types that she is willing to accept, and the relative preference that she places on each. This is done with up to four Accept* HTTP headers: Accept, Accept-Language, Accept-Encoding, and Accept-Charset. Together they list the media types, languages, encodings, and character sets that the client is willing to accept, as well as those that she prefers.

For example, to indicate that a media type of HTML is acceptable to the client, the following Accept header would be sent:

Accept: text/html

The server, on the other side of the equation, maps filenames to media types, language, character sets, and/or encoding method, with file extensions or other techniques discussed in Chapter 8, “MIME and File Types.” These factors are compared to the various Accept* headers that the client has sent, and the most appropriate representation of the document is selected.

Quality Factor

The quality factor is an important consideration in the decision of which representation of the document will be the best for the client. The client has an opportunity to send a quality factor along with each of the document types that she has indicated she will Accept. The quality factor is a number between 0 and 1, which indicates the relative preference of receiving that document type in relation to the other specified document types.

The following example shows a client indicating that it most prefers to receive documents as HTML, but will accept any text document as a fall back if no HTML representation is available.

Accept: text/html; q=1.0, text/*; q=0.5

The quality factors indicate that resources of type text/html are most preferred, at a quality factor of 1.0. That means, if a representation of the resource is available in that media type, it should be preferred over any other media type. The second media type listed, text/*, indicates that any document with a major media type of text should be considered, at a much lower preference. The * is a wildcard, matching any secondary media type. Thus, documents of type text/plain or text/csv would match that wildcard.
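To make the format concrete, here is a minimal sketch in Python of how such a header might be split into media types and quality factors. The function name parse_accept is hypothetical, and this is an illustration only, not the parser that Apache itself uses.

```python
def parse_accept(header):
    """Split an Accept-style header into (value, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        value = parts[0].lower()
        if not value:
            continue
        q = 1.0  # a missing quality factor means full preference
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # assumption: treat an unparsable q as not acceptable
        prefs.append((value, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


prefs = parse_accept("text/html; q=1.0, text/*; q=0.5")
print(prefs)
```

With the header from the example, the HTML entry comes out first with a quality factor of 1.0, followed by the text/* wildcard at 0.5.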

Because the quality factor is an optional argument, most browsers actually omit this part of the header. For example, the browser that I am currently using (Netscape 4.72) passes the following Accept header:

Accept: image/gif, image/jpeg, image/pjpeg, image/png, */*

Presumably, this is intended to indicate an order of preference of the given image formats. However, the specification indicates that types with no quality factor are all considered at the same level of preference, as you will see in the following description of the negotiation process.

The other Accept headers can also attach a quality factor to the various accepted representations. For example, to indicate that the client prefers documents in the French language, but will also accept German documents at a reduced preference, the following header might be sent:

Accept-Language: fr; q=1.0, de; q=0.8

Similarly, preferences can be expressed regarding the content encoding, and the character set used.
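Seen from the client side, a header of this kind could be assembled from a list of preferences. The following Python sketch shows one way to do that; format_accept_language is a hypothetical helper name, and the one-decimal formatting is a choice made for this example.

```python
def format_accept_language(prefs):
    """Build an Accept-Language value from (language, q) pairs."""
    parts = []
    for lang, q in prefs:
        if not 0.0 <= q <= 1.0:
            raise ValueError("quality factor must be between 0 and 1")
        parts.append(f"{lang};q={q:.1f}")
    return ", ".join(parts)


header = format_accept_language([("fr", 1.0), ("de", 0.8)])
print("Accept-Language:", header)
```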

Negotiation Methods

After the client has sent the preferences and the associated weighting factors for each preference to the server, the server then uses this information to determine which of the available representations of a given resource most closely matches the preferences of the user.

This is done in one of two ways: MultiViews, or by using a type-map file created on the server listing the various representations and the associated qualities of those documents.

Type Map File

A type map file, as the name indicates, is a file containing a map of the files in a particular directory, and the various MIME types that should be associated with each. For a given URL, all representations of that URL should be listed by filename, and their respective properties listed with it.

The name of the type map file is specified by using an AddHandler directive, with the type-map handler:

AddHandler type-map .var

With the preceding example, files with an extension of .var will be used as type map files.

An example of a type map file follows:

URI: about

URI: about.html.en
Content-type: text/html
Content-language: en

URI: about.html.fr.de
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de

The map above describes one resource, which can be retrieved with the URL (relative to whatever directory we are currently in) of about.html, or just about. The first entry in the map lists only the base reference of about. The next two entries list two alternate representations of the same document, the first one in English, the second in French and German, with a character set of iso-8859-2.

When a client sends an Accept-Language header indicating that French or German is the preferred language, the second listed variant of the document is returned.
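The structure of the map is easy to see if you read it programmatically. The following Python sketch splits the example into entries separated by blank lines and collects the attributes of each. The function name parse_type_map is hypothetical, and the sketch handles only the simple form shown here, not everything that mod_negotiation accepts.

```python
def parse_type_map(text):
    """Parse type-map text into a list of dicts, one per blank-line-separated entry."""
    entries, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


TYPE_MAP = """\
URI: about

URI: about.html.en
Content-type: text/html
Content-language: en

URI: about.html.fr.de
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
"""

entries = parse_type_map(TYPE_MAP)
# Entries that carry a Content-type are the actual variants.
variants = [e for e in entries if "content-type" in e]
for v in variants:
    langs = [lang.strip() for lang in v["content-language"].split(",")]
    print(v["uri"], langs)
```

The first entry holds only the base name, and the remaining two describe the variants, one in English and one that serves both French and German.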

If you want to indicate that one particular representation of the document is of better quality than another, you can do so with the qs parameter. The following example, from the mod_negotiation documentation, shows a resource that is available as a JPEG image, a GIF image, and as ASCII art, in decreasing order of image quality.

URI: foo

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.gif
Content-type: image/gif; qs=0.5

URI: foo.txt
Content-type: text/plain; qs=0.01

The Content-type attribute can also carry a level parameter, which identifies the version of the specification that the file follows, for those file types where this applies. For example, HTML 2 would be specified as

Content-type: text/html; level=2

In addition to the examples shown above, other attributes of the file can be specified from the list below:

  • URI: The URI of the file containing the particular variant. These are interpreted as URLs relative to the current directory. They cannot be fully qualified URLs pointing at a different server.

  • Content-Type: The media type (or MIME type) of the resource. A quality factor (the qs parameter) may be given, and a character set may be specified.

  • Content-Language: The language of the variant.

  • Content-Encoding: The manner in which the content is encoded, if any.

  • Content-Length: The size of the file in bytes.

  • Description: A description of the file. If no variant of the file matches the client's requirements, a list of available variants is provided, with these descriptions.

Given all of this information, Apache selects the variant that it is going to send to the client in the following way.

For each of the various factors to be considered, the quality factor provided by the client (if any) is multiplied by the quality of service factor from the type-map file (if any) to arrive at the overall quality factor of the document. The document (or documents) with the highest value is selected. If, at any point, there is only one document remaining for consideration, this file is sent to the client.

The various factors are considered in the following order:

  • Media type

  • Language

  • Media-type level

  • Character set

  • Content encoding

  • Smallest content length

If, by the end of this procedure, a single document has not been selected, the user is presented with a list of the available documents, with their descriptions, and is able to choose one.
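The multiplication step can be illustrated with a short Python sketch. It is deliberately simplified: it considers only the media type, not the later factors in the list above, and the client preferences are hypothetical values chosen for the example. The names client_q and best_variant are likewise invented here and are not part of Apache.

```python
def client_q(media_type, accept):
    """Return the client's q for media_type, honoring type/* and */* wildcards."""
    major = media_type.split("/")[0]
    for pattern in (media_type, major + "/*", "*/*"):
        if pattern in accept:
            return accept[pattern]
    return 0.0  # not listed at all: not acceptable


def best_variant(variants, accept):
    """Pick the variant with the highest q * qs, or None if nothing is acceptable."""
    scored = [(client_q(ctype, accept) * qs, uri) for uri, ctype, qs in variants]
    # Simplification: equal scores are settled by URI, not by the later factors.
    score, uri = max(scored)
    return uri if score > 0 else None


# Variants and qs values from the foo type map shown earlier.
VARIANTS = [
    ("foo.jpeg", "image/jpeg", 0.8),
    ("foo.gif", "image/gif", 0.5),
    ("foo.txt", "text/plain", 0.01),
]

# Hypothetical client preferences, already parsed from an Accept header.
accept = {"image/gif": 1.0, "image/jpeg": 0.5, "*/*": 0.1}
print(best_variant(VARIANTS, accept))
```

Here the GIF wins even though the server rates the JPEG more highly, because 1.0 × 0.5 is greater than 0.5 × 0.8. The outcome depends on both sides' figures, not on either one alone.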

MultiViews

Content negotiation by MultiViews is simpler to configure, and does what you want most of the time. It does not allow you the fine level of control that actually writing a type-map file does, and it gives up most of your control over what resource the client gets, deferring entirely to the preferences specified by the client.

MultiViews is turned on via the Options directive:

Options +MultiViews

This can be set for a directory (in a directory section, or in a .htaccess file) or in the main server configuration. Note that setting Options All does not turn on MultiViews. It must be set explicitly.

When MultiViews is turned on, Apache will search the directory for files that match the requested resource, and create a type map for that resource on the fly. For example, if a URL http://www.example.com/testing/index were requested, Apache would look through the testing directory for files with names starting with index. For each one it would create an in-memory type map, using any associations that have been added to the file with one of the various directives used for this purpose.

These associations can be added to files in the following ways:

Media type (MIME type): AddType, ForceType, entries in the mime.types file
Content encoding: AddEncoding
Character set: AddCharset
Language: AddLanguage, DefaultLanguage

For more details on this process see Chapter 8, “MIME and File Types,” in which these directives, and various related ones, are discussed.
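As a rough illustration of the directory search, the following Python sketch builds an in-memory map for the index resource by decoding file extensions. The two lookup tables are hypothetical stand-ins for AddType and AddLanguage settings, the function name build_type_map is invented for this example, and the real module considers more than is shown here.

```python
import os
import tempfile

# Hypothetical extension tables standing in for AddType and AddLanguage.
MEDIA_TYPES = {"html": "text/html", "txt": "text/plain"}
LANGUAGES = {"en": "en", "fr": "fr", "de": "de"}


def build_type_map(directory, basename):
    """Describe every file named basename.* by decoding its extensions."""
    variants = []
    for name in sorted(os.listdir(directory)):
        if not name.startswith(basename + "."):
            continue
        entry = {"uri": name, "type": None, "languages": []}
        for ext in name[len(basename) + 1:].split("."):
            if ext in MEDIA_TYPES:
                entry["type"] = MEDIA_TYPES[ext]
            elif ext in LANGUAGES:
                entry["languages"].append(LANGUAGES[ext])
        variants.append(entry)
    return variants


# Create a throwaway directory that plays the role of the testing directory.
with tempfile.TemporaryDirectory() as testing:
    for name in ("index.html.en", "index.html.fr", "index.txt", "other.html"):
        open(os.path.join(testing, name), "w").close()
    variants = build_type_map(testing, "index")

for v in variants:
    print(v)
```

The file other.html is ignored because its name does not start with index, and index.txt appears as a variant with a media type but no language.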

What you mostly give up when you use MultiViews rather than a type-map file is the capability to indicate the relative quality of the various representations of a file. Control of the selection process is therefore given entirely to the client, by way of the Accept headers.

In the event that a “best” language cannot be determined from the client configuration, the server is able to set a preferred order in which the languages should be considered. This is done with the LanguagePriority directive, which lists, in order, the preferred languages on the server side. Presumably, this order reflects the relative quality of the documents. Perhaps, for example, the site is a French site that has been translated into German. The German documents, being translations and not the originals, might have a lower quality of information. This would be indicated with the LanguagePriority directive below:

LanguagePriority fr de en

In this scenario, the English version of any given document will be served only if the client explicitly requests it, or if there are no French or German variants of the resource.
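The fallback behavior can be sketched in a few lines of Python. This is an illustration of the idea only: pick_language is a hypothetical function, client preferences are assumed to be already parsed into a dictionary of language to quality factor, and the actual selection in Apache involves the other negotiation dimensions as well.

```python
LANGUAGE_PRIORITY = ["fr", "de", "en"]  # mirrors: LanguagePriority fr de en


def pick_language(available, client_prefs):
    """Choose a language: client preference first, server priority as tie-break."""
    def rank(lang):
        # Languages missing from the priority list sort after the listed ones.
        if lang in LANGUAGE_PRIORITY:
            return LANGUAGE_PRIORITY.index(lang)
        return len(LANGUAGE_PRIORITY)

    wanted = [lang for lang in available if client_prefs.get(lang, 0.0) > 0]
    if wanted:
        # Highest client q wins; the server priority settles equal q values.
        return min(wanted, key=lambda lang: (-client_prefs[lang], rank(lang)))
    # No usable client preference: fall back to the server's order alone.
    return min(available, key=rank)


print(pick_language(["en", "de"], {}))                 # no client preference
print(pick_language(["en", "de", "fr"], {"en": 1.0}))  # client asks for English
print(pick_language(["en"], {}))                       # only English exists
```

The three calls correspond to the cases described above: German is chosen over English when the client expresses no preference, while English is served when the client asks for it or when it is the only variant.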

Noncompliant Browsers

As mentioned above, most browsers do not correctly[2] set the quality factor on Accept headers. By passing multiple media types in an Accept header, for example, in the order of preference, they make the assumption that the order has something to do with which media type will in fact be preferred. So you will often see Accept headers like the following.

Accept: text/html, text/plain, text/*, */*

Because of the ambiguity involved in this list, Apache applies the following rules to figure out what is meant:

Fully qualified media types for which no quality factor is specified (such as text/html or image/png) are interpreted as having a quality factor of 1.0.

Media types of the format type/* are interpreted as having a quality factor of 0.02.

The wildcard media type */* is interpreted as having a quality factor of 0.01.

With this scheme, types with the format type/* are preferred over a wildcard of the form */*, and explicit media types are preferred over those. However, there is still no ordering of preference between various fully qualified media types.

These adjustments are made only if there were no quality factors specified anywhere in the Accept header. If any quality factors are specified in the Accept header, then the above scheme is not used, and it is presumed that the specified media types are the ones that should be used. The exception to this is that fully qualified media types (those with both a type and a subtype) are given a quality factor of 1.0 if none was specified.
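The rules above can be expressed as a short Python sketch. The function name effective_qualities is hypothetical. Note one assumption: when the header does contain a quality factor somewhere, the sketch gives every unmarked entry, wildcards included, the value 1.0, which is one reading of the exception described above.

```python
def effective_qualities(header):
    """Return {media_type: q}, applying the wildcard defaults described above."""
    items = []
    any_q = False
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), None
        if not media_type:
            continue
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
                any_q = True
        items.append((media_type, q))

    result = {}
    for media_type, q in items:
        if q is None:
            if any_q:
                q = 1.0   # assumption: header uses q somewhere, so no adjustment
            elif media_type == "*/*":
                q = 0.01  # full wildcard ranks lowest
            elif media_type.endswith("/*"):
                q = 0.02  # type/* ranks just above the full wildcard
            else:
                q = 1.0   # fully qualified media type
        result[media_type] = q
    return result


print(effective_qualities("text/html, text/plain, text/*, */*"))
```

For the header shown earlier, text/html and text/plain both end up at 1.0, text/* at 0.02, and */* at 0.01, so the two explicit types remain tied.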

Caching

Ordinarily, negotiated documents are not cached. This is so that multiple clients behind the same proxy all get the document that is best for them. If the CacheNegotiatedDocs directive is turned on, then negotiated documents might be cached. Although this reduces network traffic, it means that some clients behind a proxy might get the document that is best for someone else in their office, and not necessarily for them.

This configuration is set by adding the following directive:

CacheNegotiatedDocs on

The default value is off.

Summary

Content negotiation provides a transparent way to give clients the documents that are best suited to their needs, without their having to explicitly select one from a list of options. Using this technique eliminates the need to explicitly list the various languages in which your site is available, for example, because your clients will automatically get the version they need based on the language preference configured in their browser.



[1] Accept* is used here to indicate that there are multiple headers starting with Accept.

[2] Or at all!
