Chapter 9. URL Mapping

 

“When you've only got two ducks, they're always in a row.”

 
 --Me

When Apache receives a request for a URL it has to figure out how that URL maps to actual content that it needs to send back to the client. Usually, the URL gets mapped directly to a file that is read off of the disk and sent as-is out to the client. Occasionally, the URL maps to a program of some variety that generates content, which then gets sent out.

This phase of figuring out what to send in response to a request is called the URL mapping phase. There are a number of directives that assist Apache in figuring out how to perform this mapping.

Location

The Location directive defines how a particular URL is to be treated, and does not necessarily indicate a file path, or refer to the file system at all. A Location directive will usually map a URL to a handler. A handler is a process that generates the content that will be displayed in response to the request. Handlers are described in additional detail in Chapter 14, “Handlers and Filters.”

The Location directive creates a section, like the Directory or Files directive. This section contains directives that will apply to requests matching the specified pattern.

For example:

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from .your_domain.com
</Location>

In the previous example, requests starting with /server-status will be answered by the handler server-status. Additional directives might be placed in the section. In this case, access restrictions have been placed on who can get to the content served by this handler.
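
The Order, Deny, and Allow directives in this example are the access-control style used through Apache 2.2. In Apache 2.4 and later, the same restriction is normally written with Require. The older directives survive only in mod_access_compat and are deprecated. A sketch of the 2.4 form, keeping the placeholder domain from the example:

<Location /server-status>
    SetHandler server-status
    Require host .your_domain.com
</Location>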

Alias

The Alias directive, and its close relative the AliasMatch directive, tell Apache to map URLs beginning with a given prefix to a particular part of the file system. Typically, this is used to map a URL to somewhere outside of the DocumentRoot, the directory from which documents are normally served. (See the DocumentRoot section later in the chapter for more information.)

The syntax of the Alias directive is as follows:

Alias /icons/ /usr/local/apache/icons/

If a server called www.apacheadmin.com had a directive such as this one, then a request for http://www.apacheadmin.com/icons/unknown.gif would result in Apache attempting to serve a file called unknown.gif out of the directory /usr/local/apache/icons rather than out of the main document root.
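
The trailing slash on the URL argument matters. With the directive above, a request for /icons/unknown.gif is aliased. A request for /icons with no trailing slash is not, because it does not begin with /icons/. If you want both forms to reach the same directory, one option is to leave the trailing slash off both arguments:

Alias /icons /usr/local/apache/icons

With this form, /icons/unknown.gif still maps to /usr/local/apache/icons/unknown.gif, and /icons itself maps to the directory.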

ScriptAlias

The ScriptAlias directive is a special case of the Alias directive. Like Alias, it maps a URL prefix to a directory. It also tells Apache to treat every file in that directory as a CGI program and to execute it, rather than send it to the client as-is. (See Chapter 15, “CGI Programs,” for more details.)

The syntax of the ScriptAlias directive is as follows:

ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/

The previous example is the equivalent of the following set of directives:

Alias /cgi-bin/ /usr/local/apache/cgi-bin/
<Directory /usr/local/apache/cgi-bin/>
    Options +ExecCGI
    SetHandler cgi-script
</Directory>

See the Options +ExecCGI discussion in Chapter 4, “Configuration Directives,” and Chapter 14, “Handlers and Filters,” for more detail on these other directives.

Usually, the ScriptAlias directory will be where you will put all your CGI programs, although this is not required. Also, you might have more than one ScriptAlias directory.
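
As an illustration, the following sketch sets up two ScriptAlias directories. It also marks one ordinary directory in which only files ending in .cgi are executed, which shows that CGI programs do not have to live under a ScriptAlias. The /admin-cgi/ alias and the scripts directory are hypothetical names:

ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/
ScriptAlias /admin-cgi/ /usr/local/apache/admin-cgi/

<Directory /usr/local/apache/htdocs/scripts>
    Options +ExecCGI
    AddHandler cgi-script .cgi
</Directory>

In the scripts directory, files with other extensions are still served as ordinary documents.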

AliasMatch and ScriptAliasMatch

In addition to the regular form of the Alias and ScriptAlias directives, there are also versions that use regular expressions to match a whole class of URLs, rather than just one particular string. For example, if you find that a large number of your users are misspelling a URL, you can compensate for that:

AliasMatch ^/[dD]rbacc?h?[ui]s(.*) /usr/local/apache/vhosts/drbacchus$1

This regular expression will match URLs that look like /drbacchus: with an upper- or lowercase D, with one or two c's, perhaps missing the h, and with either a u or an i before the s. Each of those four points has two possible spellings. This one directive therefore takes the place of the 16 (2 × 2 × 2 × 2) Alias directives that would otherwise be needed to cover every combination.
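
ScriptAliasMatch works the same way for CGI directories. For instance, the ScriptAlias /cgi-bin/ example from earlier in the chapter could be written roughly as follows:

ScriptAliasMatch ^/cgi-bin(.*) /usr/local/apache/cgi-bin$1

Here (.*) captures everything after /cgi-bin, and $1 inserts that captured text into the file-system path.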

See Appendix C, “Regular Expressions,” for a more complete treatment of the regular expression syntax available in Apache directives.

Redirect

The various Redirect directives (Redirect, RedirectMatch, RedirectPermanent, and RedirectTemp) serve a rather different purpose than the Alias directives. The Alias directives define an accepted URL and tell Apache where to serve the content from for that URL. The Redirect directives say that a particular resource is no longer at a given location, or never was, and tell the client to go elsewhere to get it.

Redirect /apache http://www.apacheadmin.com/

This directive actually sends a redirect response back to the browser, telling it to go to the new location. Because the redirect is handled by the browser, and not by the server, the browser will (usually) display the new URL in the Location box, and the user will be able to see that they have been taken to a different site. Likewise, if they bookmark the page they have arrived at, the bookmark will hold the new URL, rather than the one they typed.

Redirect directives are very useful, and important, if and when you redesign your Web site. Your old URLs should continue to work, so that you don't confuse your loyal users. So, you should provide redirects from all the old URLs to the new places where that information is kept.
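
When a whole directory moves, a single Redirect can cover every document in it. This works because Apache appends any path beyond the matched prefix to the target URL. In this sketch, /olddocs/ and /docs/ are hypothetical paths:

Redirect /olddocs/ http://www.apacheadmin.com/docs/

A request for /olddocs/install.html would be redirected to http://www.apacheadmin.com/docs/install.html. Keeping the trailing slashes consistent on both arguments avoids doubled or missing slashes in the resulting URL.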

The Redirect directive takes an optional additional argument, which can set the status of the redirect. The status argument can be one of permanent, temp (the default), seeother, or gone. These arguments cause mod_alias to send different HTTP status codes, as shown in Table 9.1.

Table 9.1. HTTP Status Codes

Argument     Status code   Description
permanent    301           The resource has moved permanently.
temp         302           The resource has moved temporarily.
seeother     303           The resource has been replaced by another resource.
gone         410           The resource has been permanently removed. In this case, the URL argument should be omitted.

For example:

Redirect seeother /apache http://www.apacheadmin.com/

This informs the browser that the resource that was at the URL /apache on your server has been replaced with the new resource which is at http://www.apacheadmin.com/. Whether the browser chooses to do anything about this or not is a separate issue. For example, an intelligent browser would use this as a hint to update your bookmarks with the new location.
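
The gone status is the one case that takes no target URL. For a page that has been removed with no replacement, the directive might look like this (old-press-release.html is a hypothetical name):

Redirect gone /old-press-release.html

Clients requesting that URL receive a 410 status instead of being sent elsewhere.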

RedirectMatch

RedirectMatch, much like the AliasMatch directive described previously, accepts a regular expression, rather than the literal path, as its first argument. Otherwise, the syntax is the same as for Redirect.

For example:

RedirectMatch permanent ^/[dD]r[Bb]acc?h?us http://www.drbacchus.com/

The previous example will redirect any URL that looks like /drbacchus to the new URL http://www.drbacchus.com. This includes variants with the d and the b optionally uppercase, and with the second c or the h (or both) missing. The directive also tells the client that the redirection is a permanent one.
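
You may want to keep the rest of the path, rather than sending every matching request to the front page. In that case, capture it in parentheses and insert it into the target with $1. A sketch extending the previous example:

RedirectMatch permanent ^/[dD]r[Bb]acc?h?us(.*) http://www.drbacchus.com$1

With this version, a request for /DrBacus/recipes.html (a hypothetical page) would be redirected to http://www.drbacchus.com/recipes.html.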

RedirectTemp and RedirectPermanent

The RedirectTemp and RedirectPermanent directives are exactly equivalent to Redirect temp and Redirect permanent, respectively. That is, using the RedirectTemp directive is no different from using Redirect with the optional temp argument. These directives are for convenience only.
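
For example, these two lines have the same effect:

RedirectPermanent /apache http://www.apacheadmin.com/
Redirect permanent /apache http://www.apacheadmin.com/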

DocumentRoot

Apache first runs through the various Alias, Redirect, and Location directives. If none of them provides another way to handle the request, Apache assumes that the requested resource is simply a file that should be loaded from the file system and sent to the client.

The DocumentRoot directive tells Apache where in the file system it should start looking for this file. The syntax of the directive is as follows:

DocumentRoot /usr/local/apache/htdocs

Apache will take the path of the requested document, prepend the value of DocumentRoot, and attempt to serve that file.

For example, suppose the requested URL is http://www.apacheadmin.com/services/apache/index.html and the DocumentRoot is set to /usr/local/apache/vhosts/apacheadmin/htdocs. Apache will then attempt to serve the file /usr/local/apache/vhosts/apacheadmin/htdocs/services/apache/index.html.
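
A DocumentRoot like the one in this example is typically set per virtual host. The following is a minimal sketch, assuming name-based virtual hosting on port 80 (Apache 2.2 and earlier also need a matching NameVirtualHost *:80 line):

<VirtualHost *:80>
    ServerName www.apacheadmin.com
    DocumentRoot /usr/local/apache/vhosts/apacheadmin/htdocs
</VirtualHost>

The Apache documentation recommends specifying DocumentRoot without a trailing slash, as shown.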

If that file exists, one of two things will happen. If there is no handler associated with the file, it will simply be sent to the client with the appropriate MIME headers. If there is a handler associated with the file, the handler will be called, with the path to the file as an argument. A handler, simply stated, is a process defined for preprocessing a resource before it is sent to the client. Handlers will be treated in more detail in Chapter 14.

If the file is not there, then the client will receive an error message of some description, telling them that the document was not found. This is a 404 (document not found) error. If there is no ErrorDocument defined for this type of error, they will receive a simple, dynamically generated error message, which will tell them that the document could not be found. See the following section for how to deal with this more elegantly.

Error Documents

When something goes wrong, end users typically get an unhelpful message. It is not helpful to them because it does not give them any information that they can actually use. It is not helpful to you as the server admin either, because you don't get any useful report about what went wrong.

If they request a resource that does not exist, they will typically get a message that says something like:

Not Found

The requested URL /foo was not found on this server.

Apache/1.3.20 Server at www.apacheadmin.com Port 80

This does not help them. Sure, they know that what they were looking for is not there, but they don't know what to do about it. They don't know where the document has moved to, or if the information is just not available, and they don't know how to ask you, the site admin, where to go look for it. And, worse yet, they think that it is their fault.

And, as the admin, it does not tell you anything useful. In your logs, you'll see an entry like this:

[Sat Aug 11 22:32:25 2001] [error] [client 192.168.1.3] File does not exist: /usr/local/apache/vhosts/apache/htdocs/foo

This tells you that someone was requesting a document you did not have, but you don't really know what he was looking for. You don't know if he made up the URL, followed a link from somewhere, or read it in the newspaper. And you don't have any way to contact him for this information.

The way around this is to provide a more useful error document. This is done with the ErrorDocument directive:

ErrorDocument 404 /errors/notfound.html

In the previous example, any “not found” error will receive the document located at the URL /errors/notfound.html, rather than the auto-generated “not found” error page.
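
The target of ErrorDocument can take other forms besides a local URL. It can also be a full URL on another server or a literal message in double quotes. A sketch of all three forms, where the errors.apacheadmin.com host and the file names are hypothetical:

ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 http://errors.apacheadmin.com/server-error.html
ErrorDocument 403 "Sorry, you do not have access to this resource."

The double-quoted message form shown is the Apache 2.0 and later syntax. When the target is a full URL, Apache sends the client a redirect to that URL, so the client never receives the original error status code.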

This enables you to have an error document that does not make the user feel like he broke something, but instead gives him useful information about how to find things on your site. It could also provide a handy way to contact you about how he got to that page and what he was expecting to see there.

If you make the error document a CGI program, or another dynamically generated page, you can capture the referrer, that is, the page the user came from. This makes it very easy to figure out who has bad links to your site.
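
A minimal sketch of that setup, assuming a hypothetical program named notfound.cgi in the ScriptAlias directory:

ErrorDocument 404 /cgi-bin/notfound.cgi

When Apache hands an error to a local CGI program this way, it sets several variables in the program's environment:

REDIRECT_URL holds the URL that was originally requested.
REDIRECT_STATUS holds the original status code.
HTTP_REFERER holds the browser's Referer header, if one was sent.

The program should also print a Status: header carrying the original error code, so that the client still receives a 404 rather than a successful response.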

Error documents can be more specific than this. For 404 (resource not found) errors, for example, you might want different error documents per directory. In one directory, you might want to display a default document for all invalid requests. In another directory, you might want to send the request to a CGI program that outputs a custom error message. That message could help the user find what he wants (perhaps by guessing an alternative URL based on the one he entered), or help him notify someone about what he was looking for.
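
ErrorDocument is permitted inside <Directory> sections. It is also permitted in .htaccess files where AllowOverride FileInfo allows it. That makes per-directory error documents straightforward. A sketch, with a hypothetical directory and script name:

<Directory /usr/local/apache/htdocs/docs>
    ErrorDocument 404 /cgi-bin/suggest.cgi
</Directory>

Requests for missing files under the docs directory go to the suggestion program. The rest of the site uses whatever ErrorDocument applies elsewhere.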

You can provide custom error documents for any error condition. For a 401 (Authorization Required) error, you could display a page where the user can apply for a user account. For a 500 (Internal Server Error), you could display a page that masks the fact that your CGI programs are not working. That page could also send the bug report to you in a useful form, rather than to the user in a useless one.
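
For example, with hypothetical file and script names:

ErrorDocument 401 /errors/request-account.html
ErrorDocument 500 /cgi-bin/report-error.cgi

An ErrorDocument for 401 must point to a local document rather than a full URL. If it pointed to a full URL, the browser would be redirected and would never receive the 401 status it needs in order to prompt for a username and password.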

URL Rewriting

Occasionally, you'll want to take an incoming request, and, based on certain criteria, send it somewhere else. Perhaps you want to send people to different URLs based on what browser they are using, what time of day it is, or what IP address they are coming from. Fortunately, there is a way to do this.

mod_rewrite is a delightful module that enables you to take a request as it comes in, and modify the requested URL before it is passed on to the URL mapping process described previously.
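
As a small taste, the following sketch serves a different front page to clients on the 192.168 private network. It is written for the main server configuration, and the /internal/ path is hypothetical:

RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^192\.168\.
RewriteRule ^/index\.html$ /internal/index.html [PT]

RewriteCond tests the client address against a regular expression. Only when it matches does RewriteRule change /index.html to /internal/index.html. The PT (pass-through) flag hands the rewritten URL back to the normal URL mapping phase, so Alias and similar directives still apply to it. In .htaccess files the leading slash is not part of the path that RewriteRule sees, so the pattern would need adjusting there.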

For the full scoop on mod_rewrite, you need to read the URL rewriting guide, which you can find at http://httpd.apache.org/docs-2.0/misc/rewriteguide.html.

Summary

When the user requests a URL, Apache goes through a rather lengthy process to figure out exactly what it is that gets sent to the client. This is called the URL mapping phase. At the end of the process, if the document was still not located, or if there was some other error encountered, the user receives either an auto-generated error page, or a document specified by an ErrorDocument directive.
