Chapter 14. Handlers and Filters

 

“His voice,” thought Will, “I never noticed. It's the same color as his hair.”

 
 --Ray Bradbury, Something Wicked This Way Comes

Although much of the content on a typical Web site is static (HTML files, image files, and other media files), some requests must be handled differently so that part or all of the document is generated on the fly. This is usually done with a handler and, in Apache 2.0, it can also be done with a filter. In this chapter, we'll talk about the eight standard handlers that are part of modules that come with Apache, and then we'll look at creating a custom handler of your own. Finally, we'll talk about filters, what they are, and how to use one.

Handlers

A handler is, simply stated, any process that is called when a particular resource is requested. Usually this process will do something to the resource before it is sent out to the client, although occasionally a <Location> will be pointed directly at a handler, and there will be no actual file resource involved at all. Examples of both kinds will be described in this chapter.

Configuration

Handlers are configured primarily with four directives: Action, AddHandler, RemoveHandler, and SetHandler. Action is used to create a handler, and the other three directives are for using an existing handler. The Action directive will be discussed in the section “Custom Handlers,” and the other three will be described here.

AddHandler

The AddHandler directive is used to map a particular handler to files with a particular extension. The syntax of the directive is

AddHandler cgi-script .cgi

The first argument is the name of the handler, which is any of the handlers that will be described in the remainder of this chapter, or a handler that you have created using the Action directive, as described in the section “Custom Handlers.”

The second argument is a file extension. Within the scope of the directive, all files with that extension will be processed by the specified handler, rather than being served verbatim to the client.

Note that the file extension is not case sensitive, and can be specified with or without the leading dot. See Chapter 8, “MIME and File Types,” for comments about putting multiple extensions on a single file.

SetHandler

Although AddHandler associates a handler with a particular file extension, the SetHandler directive causes all files in a given scope (a <Directory> section, or a <Files> section, for example) to be served via the specified handler.

<FilesMatch "\.(pl|cgi|exe)$">
SetHandler cgi-script
</FilesMatch>

SetHandler can also be applied to a <Location>, so that the entire <Location> is served via that handler, and there need be no relation at all between the resources served and the file system.

The ScriptAlias directive is roughly equivalent to an Alias combined with SetHandler cgi-script on the entire specified directory. Thus, the following block has the same effect as one ScriptAlias directive. Note that, without ScriptAlias, the ExecCGI option must be enabled explicitly.

Alias /cgi-bin/ /usr/local/apache/cgi-bin/

<Directory /usr/local/apache/cgi-bin>
SetHandler cgi-script
Options +ExecCGI
</Directory>
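For comparison, the single-directive form would look something like the following. This is a minimal sketch that simply reuses the URL prefix and directory from the block above.

```apache
# One directive: maps the URL prefix to the directory and marks
# everything in it as a CGI program
ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/
```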

RemoveHandler

As the name implies, the RemoveHandler directive removes handler associations for the given file extensions, negating the effect of an AddHandler directive. This is particularly useful if you want to use an AddHandler directive in one directory, but not in subdirectories thereof.

In this example, we're setting the send-as-is handler to deal with all .html files in the directory /usr/local/apache/htdocs/news, but removing that association for the stories subdirectory.

<Directory /usr/local/apache/htdocs/news>
AddHandler send-as-is .html
</Directory>

<Directory /usr/local/apache/htdocs/news/stories>
RemoveHandler .html
</Directory>
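To turn off a handler that was applied to a whole scope with SetHandler, rather than mapped to an extension, the Apache documentation describes the special value None. The following is a minimal sketch, assuming a server version that supports SetHandler None; the directories are simply the ones from the example above.

```apache
# Serve everything under news/ with the send-as-is handler...
<Directory /usr/local/apache/htdocs/news>
SetHandler send-as-is
</Directory>

# ...but go back to normal handling in the stories subdirectory
<Directory /usr/local/apache/htdocs/news/stories>
SetHandler None
</Directory>
```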

Action

The Action directive creates a handler. It tells Apache to map a particular handler name to a given program that provides the implementation for the handler of that name. We will talk more about custom handlers in the section titled “Custom Handlers.”

default-handler

default-handler is, as you would expect, the default handler that is used by Apache to serve static content, such as HTML documents, image files, and other files which do not require preprocessing of any kind.

AddHandler default-handler html

Although you could use this syntax to add the default handler to HTML documents, it is not actually necessary because this is the handler that will be used by default.

send-as-is

The send-as-is handler tells Apache to serve the file as is, without adding the usual batch of HTTP headers, such as Content-type, and to use the contents of the file itself to provide these headers instead. This means that you need to make sure the file contains valid headers in its first few lines. A send-as-is file might look like:

Status: 200
Content-type: text/html

<html>
<head><title>send-as-is</title></head>
<body>
<h2>send-as-is</h2>

The <code>send-as-is</code> handler tells Apache to serve the file as
is, without adding the usual batch of HTTP headers on to it.

</body></html>

For this file to be served as is, you would use a directive like the following:

AddHandler send-as-is asis

This will cause every file with an extension of .asis to be served to the client in this manner.

Note that the file must contain a blank line after the HTTP headers to indicate that the headers have ended. Content after the first blank line is assumed to be the body of the document.

This would be used for files for which you want to set very specific HTTP headers, as well as for files to which you don't want the server to add its usual headers. (Apache still adds the Date: and Server: headers on its own, so leave those out of the file.) Note in the example that even the response status is given in the file, using a Status: header. This technique can be used, for example, to turn off caching (with a Cache-Control: no-cache header), or to set a particular language (with a Content-Language header), without having to turn on content negotiation.

cgi-script

The cgi-script handler tells Apache to treat the file as a program, to execute it, and to send the output of the program to the client.

The following example tells Apache to treat all files with the .pl extension as CGI programs.

AddHandler cgi-script .pl

Note that files that are already contained in a ScriptAlias'ed directory are automatically handled with the cgi-script handler. See Chapter 9, “URL Mapping,” for more information.

CGI programming is covered in detail in Chapter 15, “CGI Programs.”

imap-file

Almost all image maps are now implemented as client-side HTML image maps, but not very long ago all image maps had to be handled on the server.

Client-Side Image Maps

An image map, in case you don't know, is an image embedded in an HTML page, which is divided up into zones, so that clicking different parts of the image takes you to different URLs. This is usually handled by creating a <map> in your HTML file containing one or more <area> tags, which define the various zones in the image. This might look something like the following:

<map name="linkmap">
<area shape="rect" alt="A TAB" coords="12,14,12,14" href="#A" title="A TAB">
<area shape="rect" alt="B TAB" coords="43,5,47,7" href="#B" title="B TAB">
<area shape="rect" alt="C TAB" coords="84,5,118,29" href="#C" title="C TAB">
<area shape="default" nohref>
</map>

You would then use this map by linking an image to it:

<img src="/images/tab.jpg" USEMAP="#linkmap" border="0">
<br clear=all>

Because the image mapping is handled entirely within the browser, there is no need to contact the server to figure out the mapping from the image to the desired URLs.

This is now considered the preferred way to handle image maps because all GUI browsers released in the last two or three years contain the capability to understand these HTML client-side image maps.

Server-Side Image Maps

Not so very long ago, however, there was a very real need to provide an alternative for those clients that did not know how to deal with client-side image maps. Widespread support for client-side image maps did not happen until about 1997.

There were a variety of different ways to deal with image maps, including several CGI programs that compared the coordinates of the click with a map file. I first implemented image maps using CGI code by Vivek Khera.

You can read more about the not-so-good old days of server-side image mapping at http://hoohoo.ncsa.uiuc.edu/docs/tutorials/imagemapping.html.

In Apache 1.1, however, a handler was introduced that would handle the mapping of image maps without involving CGI programs.

By using the imap-file handler, and creating a map file on the server to which an image is hyperlinked, you could implement image maps very easily.

A map file contains one or more lines that follow the general format

shape URL coordinates

For example,

rect http://www.serverop.com/ 10,10 75,112

Coordinates are measured from the top-left corner of the image, not relative to the Web page as a whole. When you have created a map file, you would use it by linking the image to it as follows:

<A HREF="/maps/imagemap.map">
    <IMG ISMAP SRC="/images/imagemap.gif">
</A>

With an appropriate AddHandler directive, Apache would know that the .map file should be handled as an image map file:

AddHandler imap-file .map

Because you are unlikely to use server-side image maps, and because this is not a book on HTML, more information is not provided here. If you want to learn more about image maps, client-side or server-side, read a book on HTML or the Apache documentation on mod_imap.

server-info

The server-info handler should be configured in a <Location> section. It displays detailed information about your server configuration. This handler is provided by the mod_info module. If you have this module built into your server, you can activate it as follows:

<Location /server-info>
SetHandler server-info
</Location>

Note that this is turned on with the SetHandler directive, which tells Apache to use the specified handler for all requests in the section, not just those that match a particular file extension.

Accessing the URL /server-info on your server will give you

  • A complete listing of the modules you have compiled into your server.

  • The basic server settings, as well as the version number of the server build, and when it was built.

  • The directives that each module offers followed by a list of the values each module has been given.

Please note that this information is read from the server configuration files on disk, not from the in-memory copy of the configuration that was loaded when the server was started or restarted. If the files have been modified since then, this information might be out of sync with the configuration under which the server is actually operating.

It is recommended, for security reasons, that you limit access to this handler to trusted hosts. This can be done by adding the following lines into the above <Location> section:

Order deny,allow
Deny from all
Allow from your.host.name
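Put together, the restricted section might look like the following sketch. The host name your.host.name is the placeholder used above; substitute the host or address that you actually trust.

```apache
<Location /server-info>
SetHandler server-info
# Refuse everyone by default, then let the trusted host back in
Order deny,allow
Deny from all
Allow from your.host.name
</Location>
```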

server-status

The server-status handler gives you a convenient way to display what the server is doing right now. It is, like the server-info handler, a handler that will be set up in a <Location> section, rather than being associated with a file type:

<Location /server-status>
SetHandler server-status
</Location>

Note that mod_status must be built into the server for this configuration to have any effect.

Visiting the URL you have created with this <Location> section will provide you with a Web page that describes the current status of the server, including how long it has been running, how many children are currently active, and what state each one is in, among other things. If you set the directive ExtendedStatus On, you get more detailed information in the report.

With ExtendedStatus turned off, you will receive the following pieces of information. First, you'll see general information about your Apache server build:

Apache Server Status for buglet.rcbowen.com

Server Version: Apache/1.3.20 (Unix) mod_perl/1.26
Server Built: Jul 16 2001 21:34:30

This will be followed by general “uptime” information about the server. That is, when it was restarted and how long it has been up.

Current Time: Friday, 07-Sep-2001 23:02:08 EDT
Restart Time: Friday, 07-Sep-2001 22:56:32 EDT
Parent Server Generation: 1
Server uptime: 5 minutes 36 seconds

Following this, you will see a bird's-eye view of how many child processes are running, and what they are doing. This display is called the “scoreboard,” and it is accompanied by a key that explains what each of the characters means.

6 requests currently being processed, 7 idle servers

_WK_WKK_K___._..................................................
................................................................
................................................................
................................................................

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"L" Logging, "G" Gracefully finishing, "." Open slot with no current process

Finally, you'll get an explicit listing of each child process by PID (process ID) and what state each is in.

PID Key:

   5742 in state: _ ,   5743 in state: W ,   5744 in state: K
   5745 in state: K ,   5746 in state: _ ,   5747 in state: R
   5748 in state: K ,   5749 in state: K ,   5756 in state: _
   5759 in state: _ ,

With ExtendedStatus turned on, you get just about the same information, with one exception. Rather than just getting a listing of PIDs and the state of that child, you get much more detailed information per connection. This information is arranged in an HTML table, divided into the columns shown in Table 14.1.

Table 14.1. Fields in the Server-Status Report

Field name  Explanation
Srv         Child server number and generation
PID         OS process ID
Acc         Number of accesses this connection / this child / this slot
M           Mode of operation
CPU         CPU usage, number of seconds
SS          Seconds since beginning of most recent request
Req         Milliseconds required to process most recent request
Conn        Kilobytes transferred this connection
Child       Megabytes transferred this child
Slot        Total megabytes transferred this slot
Client      The address of the client
VHost       The virtual host from which the content was requested
Request     The actual request

For example, the following table shows typical values that might appear in one row of the table that server-status generates. Next to each value, I've indicated what that value means to you. Note that this data is just one row of a larger table, with one such row for each currently active connection to the server.

Table 14.2. Fields Provided by Server-Status

Field name  Example value             Explanation
Srv         3-2                       The third child process, in the second generation
PID         2520                      This particular process is process ID 2520
Acc         10/41/41                  10 accesses this connection, 41 total for this child, and 41 total for this slot
M           _                         Waiting for connection
CPU         0.18                      0.18 CPU seconds were used in processing this request
SS          9                         9 seconds since the beginning of the most recent request
Req         4                         4 milliseconds required to process the most recent request
Conn        0.0                       0.0 kilobytes transferred this connection
Child       0.11                      This child process has transferred a total of 0.11 megabytes
Slot        0.11                      Child processes in this slot have collectively transferred a total of 0.11 megabytes
Client      209.152.205.5             Client IP address
VHost       www.apacheadmin.com       Virtual host name
Request     GET /index.html HTTP/1.0  Client requested /index.html with HTTP version 1.0

It is recommended, particularly if you have ExtendedStatus turned on, that you restrict access to the URL of this report. Note that it shows exactly which hosts are accessing your site, and what URLs they are looking at. Allowing random Web users to view this sort of information could be considered an invasion of privacy. To restrict access to this handler to just yourself, you would add the following lines to the <Location> section in your configuration file:

Order deny,allow
Deny from all
Allow from my.host.name.com
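A complete status configuration, combining the pieces described in this section, might look like the following sketch. The host name my.host.name.com is the placeholder used above, and mod_status is assumed to be built into the server.

```apache
# ExtendedStatus is set at the server level, outside the <Location> section
ExtendedStatus On

<Location /server-status>
SetHandler server-status
# Only the named host may view the report
Order deny,allow
Deny from all
Allow from my.host.name.com
</Location>
```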

server-parsed

server-parsed documents are documents that might contain SSI (Server-Side Include) directives. Associating them with the server-parsed handler causes Apache to parse these documents as they are sent out, looking for these directives and replacing any that it finds with the content that they generate.

For more information on SSI, see Chapter 16, “Server-Side Includes.”

Please note that although the server-parsed handler has been superseded by the INCLUDES filter with Apache 2.0, it will in fact still work. Apache just silently replaces server-parsed with INCLUDES behind the scenes.

type-map

The type-map handler and type-map files were discussed in Chapter 10, “Content Negotiation.” A type-map describes the various representations of a particular resource that are available for content negotiation.

Custom Handlers

In addition to the handlers that are included in a standard installation of Apache, you can implement your own handler quite simply, which gives you a way to process files as they are served out to the client.

This can be done, for example, by writing a CGI program to process your file, and creating a handler using the Action directive. In the following example, we'll implement an idea mentioned in the Apache documentation, which suggests that someone might create a handler to add a footer to HTML pages.

You would add these directives to your server configuration file:

Action add-footer /cgi-bin/footer.pl
AddHandler add-footer .html

The CGI program located at /cgi-bin/footer.pl would look like this:

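#!/usr/bin/perl
use strict;
use warnings;

# When a script is run via Action, Apache passes the file system
# path of the requested document in PATH_TRANSLATED.
my $file = $ENV{PATH_TRANSLATED};

# Open the document first, so a failure is reported before any output.
open my $fh, '<', $file or die "Cannot open $file: $!";

print "Content-type: text/html\n\n";

# Send the original document, then append the footer.
print while <$fh>;
close $fh;

print qq~

FOOTER GOES HERE
~;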

This simple CGI program, and the previous directives, will add footer text to every .html file that is served out of your server. Unfortunately, it also means that every .html file that your server sends out requires that a Perl CGI program be run, which might cause a substantial performance hit on your server.

Filters

The concept of filters was added in Apache 2.0. A filter is a process that is applied to content as it is sent to the client or to data as it is received from the client.

The nice thing about filters is that you can chain them. That is, you can apply several filters to content as it is being sent out to the client, and specify the order in which they are applied, as opposed to only applying one handler to a particular resource. This solves a problem that is frequently asked about on the various newsgroups pertaining to the Apache server. The question is “how do I evaluate Server-Side Includes embedded in the output of a CGI program?” The answer has always been “you can't.” But with Apache 2.0 filters, now you can, by passing CGI output through the INCLUDES filter.
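As a rough illustration of that answer, the following sketch sends the output of every CGI program in a directory through the INCLUDES filter. The directory path is borrowed from the earlier ScriptAlias example and is only an assumption here, as is the presence of mod_cgi and mod_include in your build; the CGI programs are assumed to emit HTML.

```apache
<Directory /usr/local/apache/cgi-bin>
# ExecCGI lets the programs run; Includes lets SSI directives be processed
Options +ExecCGI +Includes
# The handler generates the content...
SetHandler cgi-script
# ...and the output filter then scans that content for SSI directives
SetOutputFilter INCLUDES
</Directory>
```

The handler still produces the content. The filter sees that content only afterward, which is why the two can be combined on the same resource.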

There are two types of filters that we'll be talking about, but there is very little difference between them, either in concept or in operation. Input filters intercept data coming in from the client (the HTTP request) and modify it in some way before it reaches the request processing mechanism. Output filters intercept data as it leaves the server and modify it in some way before it gets sent out to the client.

At the time of this writing, the INCLUDES filter is the only one distributed with Apache 2.0.

Configuration for Filters

Four directives are provided for specifying filters; they are AddInputFilter, SetInputFilter, AddOutputFilter, and SetOutputFilter. Additionally, there are two directives provided by the experimental module mod_ext_filter, which are used to specify external commands or programs to be used as output filters.
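The two mod_ext_filter directives are ExtFilterDefine and ExtFilterOptions. The following sketch, modeled on the style of example found in the module's documentation, defines an external output filter and then applies it. The filter name fixtext, the sed command, and the choice of <Location /> are illustrative assumptions, and mod_ext_filter must be built into your server for any of this to work.

```apache
# Define an output filter named "fixtext" that pipes HTML responses
# through an external command (here, sed doing a simple substitution)
ExtFilterDefine fixtext mode=output intype=text/html cmd="/bin/sed s/verdana/arial/g"

<Location />
# Apply the externally defined filter to everything under this location
SetOutputFilter fixtext
</Location>
```

Because the filter starts an external program for each response it processes, treat this as a convenience for prototyping rather than something to leave on a busy server.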

AddInputFilter

Like handlers, filters are mostly mapped to resources by file extension. The AddInputFilter directive maps incoming requests to an input filter by the file extension of the requested file.

More than one filter can be specified in the order in which they are to be applied. The syntax of the directive is as follows:

AddInputFilter filter1;filter2;filter3 ext1 ext2

filter1, filter2, and filter3 are the names of input filters to be applied to the incoming request, and ext1 and ext2 are file extensions to which the filter is to be applied.

No real example is given here because there are no input filters that ship with Apache 2.0 at this time.

SetInputFilter

Like AddInputFilter, SetInputFilter specifies an input filter that will be applied to requests. Rather than specifying a file extension, SetInputFilter works throughout a scope, such as a <Directory> or <Location> section.

The syntax looks similar to the AddInputFilter syntax, but without the file extensions.

SetInputFilter filter1;filter2;filter3

This directive should be set within a section, such as <Location>, <Directory>, or <Files>.

No real example is given here because there are no input filters that ship with Apache 2.0 at this time.

AddOutputFilter

The AddOutputFilter directive associates one or more filters with resources with a particular file extension. This causes the filter to be run on the content as it is sent out to the client. Note that you can chain several filters in a row if you want:

AddOutputFilter filter1;filter2;filter3 ext1 ext2

At this time, there is only one filter that is shipping with Apache 2.0, and that is the INCLUDES filter, which provides the same functionality as the server-parsed handler. This is enabled using the following directive:

AddType text/html .shtml
AddOutputFilter INCLUDES .shtml

This enables SSI parsing for files with a .shtml extension. You can read more about SSI in Chapter 16.

SetOutputFilter

The SetOutputFilter directive causes all resources in the scope to be passed through the specified output filter. This directive should be used within a carefully defined scope, as Apache will attempt to run the filter on all files within the scope, even if it might not particularly make sense to do so. For example, if you run the INCLUDES filter on JPEG files, you risk corrupting those images, and passing binary files through a text filter will also cause a substantial performance degradation.

AddType text/html .shtml
<FilesMatch "\.shtml(\..+)?$">
 SetOutputFilter INCLUDES
</FilesMatch>

Note that more than one output filter can be specified, by providing a list of the filters, separated by semicolons, in the order that you want to have them applied.
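A chained version might look like the following sketch. INCLUDES is the filter described in this chapter; DEFLATE is provided by mod_deflate, which appeared in later 2.0 releases and is not covered here, so treat its presence in your build as an assumption. Options +Includes is also assumed to be in effect for the files in question.

```apache
AddType text/html .shtml
<FilesMatch "\.shtml$">
# Filters run left to right: expand SSI directives first,
# then compress the finished page
SetOutputFilter INCLUDES;DEFLATE
</FilesMatch>
```

The order matters. If the compression filter ran first, the INCLUDES filter would be handed compressed data and would find no directives in it.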

INCLUDES Filter

The INCLUDES filter, which ships with Apache 2.0, and is implemented in the module mod_include, provides for server-parsed HTML, and the capability to fill in values for various expressions embedded in the HTML. You can read more about SSI (Server-Side Includes) in Chapter 16, which is devoted to this topic.

Summary

Handlers offer a way to provide dynamic content, either by processing a file as it is sent out to the client, or by mapping a <Location> to a process that produces dynamic content completely independently of any documents in the document directories. Filters are the Apache 2.0 way to provide this functionality. They have the added benefit that you can chain multiple filters together and have them run in turn on the content as it is sent to the client, or as it is being received from the client.
