Chapter 17. mod_perl

 

“These then, were the mountains, which the Lutherans believe they can remove with their faith. I greatly doubt it.”

 
 --The Monk and the Hangman's Daughter—Ambrose Bierce

As we discussed in Chapter 15, “CGI Programs,” CGI has a lot of problems. Primarily, it boils down to one thing: it's slow. mod_perl addresses most of the problems that cause slowness in CGI programs. There are some tradeoffs, but most of them are acceptable, given the benefits.

What Is mod_perl?

mod_perl is an Apache module that embeds a Perl interpreter into the Apache process. This enables you to do a number of rather cool things. Perhaps most importantly, it gives you access to the entire Apache API from within Perl.

First, it enables you to run CGI programs written in Perl without having to launch the Perl interpreter. Because launching the interpreter is the primary cause of slowness in most CGI programs, this results in a speed improvement of as much as 3000% for unmodified CGI programs.

Secondly, because the Perl interpreter is inside the Apache process, you can have persistence of state within the Perl interpreter. This means that your code can compile once, and never have to be compiled again. Perl is, contrary to common belief, a compiled language. It just compiles immediately before runtime, rather than compiling once and being stored as a binary file for later execution.

When Perl programs are executed under mod_perl, the program is cached in its compiled state so the next time the program is invoked, the compiled form of the program is already in memory, and just has to be executed. Because the compilation of your CGI program is the other major place where time is spent in CGI execution, you can see that this is another source of speedup.

Thirdly, and most importantly, mod_perl enables you to write Apache modules in Perl. Writing Apache modules might seem like a rather daunting task to the Apache beginner, and mod_perl makes this functionality available to anyone who knows a little Perl. (Of course, it has to be good Perl. More details later).

How Does It Work?

This will be something of an oversimplification because it is not necessary that the full details of mod_perl be explained for the purposes of an Apache administrator. But some points made in this chapter will make more sense if you understand something about the way mod_perl works.

mod_perl embeds a Perl interpreter in the main parent Apache process. Each Apache child then contains its own memory space that is used in conjunction with the Perl interpreter to execute Perl code. This means that data (and code) that is in the main Perl interpreter can be used by all the child processes. However, data that is only in one child process is not accessible to the others.
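The parent and child relationship described above can be illustrated with plain Perl and fork, with no Apache involved. This is only a sketch of the general idea: %preloaded and run_in_child are names made up for the illustration, and the real sharing in Apache is handled by the operating system when the server forks its children.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Data set up before fork() plays the role of code preloaded in the Apache parent.
our %preloaded = ( greeting => 'hello from the parent' );

# Run $code in a forked child and return whatever the child wrote back.
sub run_in_child {
    my ($code) = @_;
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {                  # child process
        close $reader;
        print {$writer} $code->();
        close $writer;
        exit 0;
    }
    close $writer;                      # parent process
    my $output = do { local $/; <$reader> };
    close $reader;
    waitpid( $pid, 0 );
    return $output;
}

# The child sees the parent's data, then changes only its own copy.
my $seen = run_in_child( sub {
    my $value = $preloaded{greeting};
    $preloaded{greeting} = 'changed in the child';
    return $value;
} );

print "child saw:    $seen\n";
print "parent still: $preloaded{greeting}\n";
```

The child reads the value that existed before the fork, but its change never reaches the parent (or any sibling), which is the behavior the rest of this chapter relies on.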

Installation

Installing mod_perl is simple in most cases, if you are not installing Apache with an unusual assortment of modules. If you are only going to be using modules that come with the default distribution of Apache, you should be able to follow this procedure.

The “Simple” Form

First, download the latest Apache distribution, and the latest mod_perl distribution. You will find these at http://httpd.apache.org/ and http://perl.apache.org/, respectively.

Unpack these distributions somewhere convenient. /usr/src is a recommended location, so that you'll know where all your installed packages live.

Change into the mod_perl directory, and type something like the following:

perl Makefile.PL \
APACHE_PREFIX=/usr/local/apache \
APACHE_SRC=../apache-1.3.20/src \
DO_HTTPD=1 \
USE_APACI=1 \
EVERYTHING=1 \
APACI_ARGS='--enable-module=rewrite,--enable-module=speling'

Note that those backslashes at the end of each line are continuation characters. That means that you can either type them, if you like, or just type the entire command as one single line.

After this has finished doing things, you should then type:

make && make install

The Gory Details

To back up a little, here's a blow-by-blow of what that long command line did.

The first line, perl Makefile.PL, is the command that starts the process that builds and installs mod_perl and Apache on your server. Perl comes with a set of utilities that assist in the generation of makefiles, which are scripts that automate a build process. Makefile.PL is a Perl program that generates a makefile.

When running Makefile.PL, a number of arguments can be passed in, which affect the behavior of the build.

The first such argument is APACHE_PREFIX, which is the location where Apache is to be installed. This is the equivalent of the --prefix argument that we used when building Apache earlier in the book.

APACHE_SRC gives the location of the Apache source code, which you have presumably unpacked in the same directory where you unpacked the mod_perl code. In the given example, we are building with Apache version 1.3.20, and so the source code for Apache is in the directory ../apache-1.3.20/src relative to the current location.

The DO_HTTPD argument tells mod_perl to build and install Apache when it builds itself. By default, it will just build the mod_perl portion, but not rebuild Apache.

USE_APACI causes the build process to use APACI, the Apache autoconf-style configuration tool discussed in Chapter 2, “Acquiring and Installing Your Apache Server,” to configure and build Apache.

EVERYTHING=1 tells mod_perl to build and install everything that came with it, including all the Perl modules and support programs.

Finally, perhaps the most important part of the command, the APACI_ARGS argument lists the arguments that we want passed on to APACI to build Apache. This is where you put all the arguments that you would have passed to ./configure if you were building Apache without mod_perl. In my example, I have added the modules mod_rewrite and mod_speling by using the --enable-module argument, which is passed directly to the Apache configuration.

Typing make && make install compiles the mod_perl and Apache source code into binary form, and installs it in all the correct locations, as we discussed in Chapter 2.

Start It Up!

After you have typed the previous commands, you can restart your Apache server with the apachectl utility. Using this procedure, you can have a new Apache installation in place in just a few minutes. If you already have Apache installed and configured, the build and installation process will preserve your existing configuration files, and just install the various binaries over the ones that are already there. Consequently, unless you are removing modules that you had installed before, you should be able to install a new Apache and continue using your existing configuration files with no ill effects.

Configuration

You won't be configuring mod_perl in the same manner that you have configured other modules that we have talked about. mod_perl provides some additional functionality, as well as a plethora of configuration directives, but we will be using them in slightly different ways than you are used to. Additionally, we'll be putting actual Perl code into configuration, which might strike you as rather odd the first time through. You should bear in mind that mod_perl is not a traditional module, but an interface between Apache and Perl.

Consequently, most of the configuration information that I'll be giving is in the specific context of particular examples.

PerlRequire

The PerlRequire directive specifies the location of a Perl script that is to be run when the server starts up. Because the Perl interpreter is in the main Apache parent process, and any Perl code that is cached in that parent process is available to all child processes, Perl code executed at server start time is always available to the child processes.

So, any code that is run by the server at startup will be cached in the parent process, and available to the children.

PerlRequire should be used to load modules or constants that will be used by all the child processes. This will save memory because it is loaded into memory only once for the parent process, rather than having one copy in memory for each child process.

For example, in your main server configuration, you might have the following:

PerlRequire /usr/local/apache/vhosts/clueful/conf/preload.pl

Then, in the file preload.pl you would have the following:

use Apache::DBI;
use DBI;
use CGI qw(:standard);
use MyCompany::Utils;
1;

Any child process that tries to load DBI will get it from the parent Apache process, rather than having to load it from disk each time.

Note that the file loaded by PerlRequire needs to return a true value to tell Perl that it was successfully loaded. This is accomplished by putting a 1; as the last statement of the file. This is a no-op, but returns a true value.

Actually, DBI is a special case. mod_perl comes with a special-purpose module called Apache::DBI, which is a wrapper around DBI and provides persistent database connections. The first time you make a connection to a database, Apache::DBI intercepts calls to DBI and caches that connection so that all future database connections in that Apache child are made through the connection that is already open. This results in yet another enormous performance improvement, as you never have to wait for a database connection except for the first time. You will still make your DBI calls exactly as before, and Apache::DBI will automatically take over as needed.

Note

DBI is the Perl DataBase Interface, which provides a uniform API (Application Programmer Interface) to just about any database you might ever encounter.

Make sure that your PerlRequire script loads Apache::DBI before DBI. You might also want to add a call to connect_on_init, which causes each Apache child to open its database connection when the child starts, so that even the first request handled by that child does not have to wait for a connection. It will look like this:

Apache::DBI->connect_on_init( $database, $username, $password );

The arguments $database, $username, and $password are the same as in the DBI connect method. See the DBI documentation for additional details.

CGI Under mod_perl

The most common usage of mod_perl, although not the most useful, is to run CGI programs under mod_perl, to get the speed benefits of mod_perl without actually rewriting any code.

There are two ways to run CGI programs under mod_perl: Apache::Registry and Apache::PerlRun. Although the former is the better of the two, the latter is good if you have existing CGI code that works fine as CGI, but is not written well enough to survive under the stricter requirements of Apache::Registry.

Apache::Registry

Apache::Registry is the preferred of the two methods, because it gives you all the benefits of using mod_perl, and removes all the things that make CGI unpleasant.

Apache::Registry works by compiling your CGI program once, the first time that it is run, and then storing that compiled form for future reference. The next time that the resource is requested, mod_perl loads it out of the cache, and executes the version that it has already compiled, saving the two most expensive parts of CGI execution—launching Perl in the first place, and compiling your CGI program. The first person to request the resource after a server restart (or after a new Apache child has been spawned) will get CGI-speed performance, but everyone after that will experience the speedup.
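The compile-once idea can be sketched in a few lines of ordinary Perl. This is a simplified illustration of the caching technique, not the actual Apache::Registry code, which does considerably more; run_cached, %compiled, and $compile_count are names invented for the sketch.

```perl
use strict;
use warnings;

my %compiled;             # cache: script name => compiled code reference
my $compile_count = 0;    # how many times we actually had to compile

# Compile $source into a subroutine the first time $name is seen,
# then reuse the cached code reference on every later call.
sub run_cached {
    my ( $name, $source ) = @_;
    unless ( $compiled{$name} ) {
        $compile_count++;
        $compiled{$name} = eval "sub { $source }"
            or die "compile of $name failed: $@";
    }
    return $compiled{$name}->();
}

# Stands in for the text of a CGI program read from disk.
my $script = q{ return "Hello at " . scalar(localtime) . "\n"; };

# Three "requests" for the same program: compiled once, executed three times.
print run_cached( 'test.cgi', $script ) for 1 .. 3;
print "compiled $compile_count time(s)\n";
```

Only the first call pays the cost of eval; the later calls go straight to the stored code reference, which mirrors why only the first request to each child is slow.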

Configuration

All you have to do to run CGI programs as Apache::Registry is set up a PerlHandler for a particular directory and put CGI programs in it. Most of the time, what you want to do is point this at the place where you already have your CGI programs. This enables you to run the programs either in CGI mode, or in Apache::Registry mode, by just changing the URL.

This configuration will look like the following:

Alias /perl/  /usr/local/apache/cgi-bin/

<Location /perl>
  SetHandler  perl-script
  PerlHandler Apache::Registry
  Options +ExecCGI
</Location>

All files located in the directory /usr/local/apache/cgi-bin/ will be executed through Apache::Registry if accessed via the URL /perl/. So, for example, to access a CGI program called test.cgi located in that directory, you would access the URL http://your.server.name/perl/test.cgi

Yes, that's really all there is to it. Almost.

Caveats

One of the problems with CGI, which was not mentioned in the CGI chapter, is that CGI enables you to get away with really bad code. This might be viewed as an advantage for the beginner who wants to write a working CGI program in a few minutes, but in the long run it's a real problem because you end up with a lot of CGI programs that are difficult to maintain. Additionally, you end up with enormous Web sites containing thousands of terrible CGI programs, which not only teach very bad programming style to beginners, but also give Perl a bad name.

For our purpose the important thing to know is that bad Perl code will run just fine as CGI, but when run under mod_perl it causes problems. The reason for this is in the very thing that makes mod_perl useful—its persistence. It is common (but bad) practice for CGI programmers to use global variables, for example. Using them in a mod_perl environment causes that variable to be visible not only throughout the current execution of the CGI program, but throughout all executions of the program under the same child process. Consequently, if you modify that variable during one execution that change will be seen the next time the program is invoked. This can be particularly problematic if the variable contains information that you distinctly don't want shared across multiple instances of the program, such as a username, or a count, for example.

Careless Perl CGI programs running under mod_perl can have two unpleasant side effects. As alluded to previously, you can end up with unexpected values of variables because of that variable getting modified in another instance of the program. An even more dangerous side effect is that you can end up leaking memory if you are using variables in an unsafe manner. Because the lifetime of a CGI program (under mod_cgi) is very brief, the program runs and goes away immediately, and the leaked memory is immediately reclaimed. However, under mod_perl, the leaked memory is permanently lost—at least until Apache is restarted. The symptom that you will see is that the various Apache child processes will grow in memory usage until all your available resources are consumed and your server grinds to a halt or swaps excessively.
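The stale-variable problem is easy to reproduce outside Apache. In the sketch below, each subroutine call stands in for one request served by the same child process; the subroutine and variable names are made up for the illustration.

```perl
use strict;
use warnings;

our $counter;    # package global: its value survives between calls

# Stands in for a cached CGI program that relies on a global
# it never initializes at the start of each run.
sub careless_request {
    $counter++;
    return $counter;
}

# The same logic with a lexical variable that is fresh on every call.
sub careful_request {
    my $count = 0;
    $count++;
    return $count;
}

print "careless: ", join( ' ', map { careless_request() } 1 .. 3 ), "\n";
print "careful:  ", join( ' ', map { careful_request() } 1 .. 3 ), "\n";
```

The careless version prints 1 2 3 because the global keeps its value from the previous "request," while the careful version prints 1 1 1. Run as an ordinary CGI program, both would print 1 every time, which is why the bug goes unnoticed until the code is moved to mod_perl.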

The solution is simple—don't write bad code. Or, stated more pragmatically, make sure that all your Perl code runs using strict and -w (or, in Perl 5.6, warnings). That is, make sure that every one of your Perl CGI programs starts with the lines

use strict;
use warnings;

Or, under versions of Perl prior to 5.6, append -w to the end of the #! line at the start of your program. This should look something like

#!/usr/bin/perl -w

Any code that runs under these conditions without printing warning messages should be fine under Apache::Registry.

Additionally, you should read the document at http://perl.apache.org/dist/cgi_to_mod_perl.html for more tips on migrating your Perl CGI programs to mod_perl.

Apache::PerlRun

If you have existing Perl CGI programs that work under mod_cgi, and you really don't want to spend the time to make them work under mod_perl, there is a halfway solution. Apache::PerlRun gives you some of the benefits of mod_perl, but without having to make sure that your code is strict safe.

This is only recommended as an interim solution while you whip your CGI programs into shape, or for programs that you have acquired from some other source and don't understand well enough to know where you can safely make modifications.

Configuration

Configuring Apache to use Apache::PerlRun looks very much like the configuration for Apache::Registry. You can also run CGI programs with all three methods (mod_cgi, Apache::Registry, and Apache::PerlRun) out of the same directory at the same time, even running the same program all three ways, if you like.

The configuration for Apache::PerlRun looks like the following:

Alias /cgi-perl/  /usr/local/apache/cgi-bin/

<Location /cgi-perl>
  SetHandler  perl-script
  PerlHandler Apache::PerlRun
  Options +ExecCGI
  PerlSendHeader on
</Location>

Now, a file called test.cgi located in the directory /usr/local/apache/cgi-bin/ can be accessed via the URL http://your.server.name/cgi-perl/test.cgi and will be run through Apache::PerlRun.

The PerlSendHeader directive is added here because, by default, mod_perl does not send the HTTP headers mod_cgi provides for you.

What It Does

When you use Apache::PerlRun, you do not get the full benefit of mod_perl; all you get is the benefit of having the Perl interpreter resident in memory. Your CGI programs are still loaded from disk and compiled each time they are requested, so anything nasty that is being done with global variables, or other unpleasantness, will have only a short-term effect, and then the program will go away, releasing any memory that it might have leaked.

Note that any modules you are using will be cached by mod_perl, so they will not have to load and compile each time.

Comparing Performance

With the configurations previously shown, and with a ScriptAlias directive also pointing at the same directory, you can do a direct head-to-head comparison of performance on the same CGI program. The URLs http://your.server.name/cgi-bin/test.cgi, http://your.server.name/cgi-perl/test.cgi, and http://your.server.name/perl/test.cgi should all be pointing to exactly the same CGI program, but executed in three different ways, and executed in progressively better time, in the order listed here. The exact performance improvement will depend on the complexity of the code, but you should see a noticeable improvement.

It is important to note, as discussed in more detail in the “Common Problems” section that follows, each Apache child will have to cache the CGI program before you start seeing an improvement in performance. So, if you have 10 Apache child processes running, you will have to hit reload 10 times (on average) before you start seeing a child that has the code cached and gives you better performance.

Apache Handlers with mod_perl

Although by far the most common use of mod_perl is for CGI programs, it is actually much more powerful when used to write Apache handlers in Perl. Doing this enables Apache to do a lot of the things that CGI programs have to do for themselves. You don't have the startup cost associated with CGI programs, and your programs are called directly by Apache, rather than second-hand through mod_cgi. And, because most of your programs will call methods directly out of the Apache API, for things such as printing headers, doing redirects, and writing to the log files, you gain an additional speed improvement over using either Apache::Registry or Apache::PerlRun.

mod_perl gives you access to the full Apache API, so that you can do useful things that are difficult or impossible with CGI.

Writing a mod_perl Handler

A mod_perl handler is a Perl module that contains a handler method. This method gets called by Apache, and is passed an Apache request object. The method is expected to generate content that is sent to the client.

Note

“Method” is just a fancy name for a function. It is the common terminology used in object-oriented programming. However, you don't necessarily need to know anything about OO programming to write a mod_perl handler.

The Apache request object, which is passed to your method, is useful for things such as cookies, environment variables, authentication information, and so on: the sorts of things that you got from the environment when using CGI. It can also be used to make calls directly to the Apache API. This can be used to get form contents, to redirect to another location, or for a wide variety of other tasks.

This chapter does not attempt to be comprehensive with regard to mod_perl, so you should check out some of the resources listed in section “Where to Get More Information” to get more details.

Example mod_perl Handlers

As in Chapter 15, we'll start with a very simple example to show you what a mod_perl handler looks like. The following code example displays the text mod_perl in your browser window.

Note again that the use warnings; should be removed if you are running a version of Perl earlier than 5.6.

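package ApacheAdmin::ExampleOne;
use strict;
use warnings;
use Apache::Constants qw(OK);

sub handler {
    my $r = shift;    # the Apache request object
    $r->content_type('text/html');
    $r->status(200);
    $r->send_http_header;

    print "mod_perl";
    return OK;        # tell Apache the request was handled
}

1;    # a module must end with a true value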

When Apache calls your handler it loads the module and calls the handler method, sending any output to the browser.

The argument passed in, which is referred to in this example as $r, is the Apache request object. It is traditionally named $r because it represents the request, and because the Apache C code itself refers to the request record with the variable r. You will see it represented this way in many examples as well as in much of the documentation.

Also note that the line use warnings; will not work if you are using a version of Perl earlier than 5.6. You are encouraged to upgrade to at least that version of Perl.

Installing the Example mod_perl Handler

mod_perl handlers are Perl modules, and are found by mod_perl in the same way that Perl finds modules. That is, it looks through directories listed in the special variable @INC, which is a list of directories where Perl can find modules.

So, to install your new mod_perl handler, you need to either put it in a directory listed in @INC, or put its location into @INC. For the moment, we'll do the latter.

The name of our module is ApacheAdmin::ExampleOne. When Perl looks for this module, it will look for a file called ExampleOne.pm, in a directory called ApacheAdmin, a subdirectory of one of the directories listed in @INC.
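The lookup just described can be imitated in plain Perl, which is handy when a handler refuses to load and you want to see where Perl is looking. This is a sketch of the search, not Perl's internal implementation; module_to_path and find_module are hypothetical helper names.

```perl
use strict;
use warnings;
use File::Spec;

# Turn a module name such as ApacheAdmin::ExampleOne into the
# relative path Perl looks for: ApacheAdmin/ExampleOne.pm
sub module_to_path {
    my ($module) = @_;
    return File::Spec->catfile( split /::/, $module ) . '.pm';
}

# Return the first matching file under the given directories,
# or nothing if the module is not found in any of them.
sub find_module {
    my ( $module, @dirs ) = @_;
    my $relative = module_to_path($module);
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code references; skip them
        my $candidate = File::Spec->catfile( $dir, $relative );
        return $candidate if -f $candidate;
    }
    return;
}

my $found = find_module( 'ApacheAdmin::ExampleOne', @INC );
print defined $found ? "found at $found\n" : "not found in \@INC\n";
```

Until the directory holding ApacheAdmin/ExampleOne.pm has been added to @INC, this reports that the module was not found, which is the same failure mod_perl would hit.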

Perl provides an easy way to add directories to @INC with the use lib pragma. The syntax of this function is as follows:

use lib '/path/to/directory';

Save the example Perl code in a file called ExampleOne.pm, and put it in a directory called ApacheAdmin, in some convenient location. For example, you might place this in your home directory, so that the full path to the file is /home/rbowen/ApacheAdmin/ExampleOne.pm.

Then, in the file that you have named in your PerlRequire directive, add the following line:

use lib '/home/rbowen/';

This line adds the specified directory to @INC so that Perl knows to look there for Perl modules, and therefore can find your module. Now you're ready to configure mod_perl to run your handler.
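If you want to confirm that the directory really ended up in the search path, a short standalone script along these lines will show it. The path is the example path used in this section; substitute your own.

```perl
use strict;
use warnings;
use lib '/home/rbowen/';    # example path from this chapter

# use lib puts its directories at the front of @INC,
# so they are searched before the standard locations.
my @matches = grep { !ref $_ && m{^/home/rbowen/?$} } @INC;

print "search path now starts with: $INC[0]\n";
print @matches ? "directory is in \@INC\n" : "directory is missing from \@INC\n";
```

Note that use lib adds the directory whether or not it exists, so a typo in the path will not produce an error here; it will only show up later as a module that cannot be found.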

Configuring the mod_perl Handler

Now that you have the handler installed so that Perl can find it, you need to tell Apache how to use it. As with other handlers, mod_perl handlers are configured via a section directive in the main server configuration file. Because these handlers are not related to file-system resources, they will be configured with a Location section. You just need to tell Apache what module to call when asked for the particular resource.

This configuration will look like the following:

<Location /apacheadmintest>
    SetHandler perl-script
    PerlHandler ApacheAdmin::ExampleOne
</Location>

If you want to call a function in your module that is called something other than handler, you will need to specify this explicitly in the configuration, as shown here:

<Location /apacheadmintest>
    SetHandler perl-script
    PerlModule ApacheAdmin::ExampleOne
    PerlHandler ApacheAdmin::ExampleOne::other_method
</Location>

This enables you to have several different locations fielded by several different functions within a single module.

An Example That Is a Little More Useful

The previous example did not actually do anything useful. The example that follows, although it does not do much more, contains a useful element that will be part of most handlers that you write—the capability to acquire the contents of a form that was sent to you. Rather than having to rely on a CGI library to get this content, you can request it directly from mod_perl, as shown here.

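package ApacheAdmin::ExampleTwo;
use strict;
use warnings;
use Apache::Constants qw(OK);

sub handler {
    my $r = shift;    # the Apache request object

    # POSTed form data comes from the request body; otherwise use the query string
    my %form = $r->method eq 'POST' ? $r->content : $r->args;

    $r->content_type('text/html');
    $r->status(200);
    $r->send_http_header;

    foreach my $key ( keys %form ) {
        print $key . ' => ' . $form{$key} . '<br>';
    }
    return OK;        # tell Apache the request was handled
}

1;    # a module must end with a true value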

This rather inelegant example prints to the browser the contents of any form posted to it, or any arguments passed in the URL query string via the ?key=value&key=value syntax. See Chapter 15 for more information about the format of this syntax.
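To make the key=value&key=value format concrete, here is a small standalone parser that produces the same kind of hash the handler iterates over. It is an illustrative sketch, not the code mod_perl uses internally; url_decode and parse_query are hypothetical helper names.

```perl
use strict;
use warnings;

# Decode '+' (space) and %XX escapes in one query-string component.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

# Split "key=value&key=value" into a hash. If a key repeats,
# the last value wins, just as when a list is assigned to a hash.
sub parse_query {
    my ($query) = @_;
    my %form;
    for my $pair ( split /[&;]/, $query ) {
        next unless length $pair;
        my ( $key, $value ) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $form{ url_decode($key) } = url_decode($value);
    }
    return %form;
}

my %form = parse_query('name=Rich+Bowen&topic=mod%5Fperl');
print "$_ => $form{$_}\n" for sort keys %form;
```

This prints name => Rich Bowen followed by topic => mod_perl. The last-value-wins behavior for repeated keys is worth remembering if your forms use multi-valued fields such as checkboxes.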

For further discussion of the code shown here, please see Appendix D, “mod_perl Example Code.”

Common Problems

There are a number of mistakes that every beginner mod_perl programmer makes at least once, and several of them are listed here. More of them can be found in Stas Bekman's tutorials, which you can find at http://perl.apache.org/.

Don't Exit!

Perl programs have a tendency to call exit when they are done. This indicates where the program ends, and nothing more is to be done. The exit function causes the Perl interpreter to stop immediately.

Remember, however, that the entire point of mod_perl is that the interpreter does not stop, ever. Consequently, calling exit is a very bad thing. The Perl interpreter exits, causing the particular mod_perl child to lose its mind. This has a number of results. The client does not get a complete document most of the time, because the Apache child never gets a message that the generated content is complete. And typically, that particular child becomes useless for future mod_perl-generated resources, often serving the same content (whatever it saw last) repeatedly, until the child is killed, or the Apache server is restarted.

So don't use exit in your handlers, for any reason.
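The usual replacement for exit is simply to return from the handler. The sketch below runs without Apache: the OK constant is a local stand-in for the status a real handler would import, and the array of hashes stands in for a series of requests served by one long-lived process.

```perl
use strict;
use warnings;

use constant OK => 0;    # stand-in for the Apache OK status (assumption for this sketch)

my @output;

# A handler that needs to stop early returns a status instead of calling exit.
sub handler {
    my ($request) = @_;
    unless ( defined $request->{user} ) {
        push @output, 'no user given';
        return OK;    # done with this request; the interpreter keeps running
    }
    push @output, "hello, $request->{user}";
    return OK;
}

# Stand-in for the server: one persistent process serves many requests.
my @requests = ( { user => 'alice' }, { user => undef }, { user => 'bob' } );
handler($_) for @requests;

print "$_\n" for @output;
```

All three requests are served, including the one after the early return. Had the second request called exit instead, the process would have stopped there and the third request would never have been handled.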

Restart the Server

When you change a handler that you are working on, your changes will not be seen immediately, as they are with CGI programs. Because mod_perl loads the modules into the Perl interpreter's memory, and caches them there forever, your new version of the module will not be seen until the server is restarted.

Note that you usually will have to stop the Apache process, and start it again, rather than just doing a restart, to actually get all the cached modules reloaded.

You can overcome this with the directive PerlFreshRestart on in your main server configuration file. With PerlFreshRestart on, all modules will get reloaded on a server restart. This will generate a lot of “function was redefined” warnings in your error log on server restart. This is normal, and should not unduly concern you.

Where Did You Get That Value?

If you find that some values are unexpected sometimes, you might be using global values carelessly. Remember that all global values are shared across all accesses to a given Apache child process.

This problem can be rather elusive if you are not very careful in testing. Because a normal Apache server will have many child processes at a time, you need to make sure to test a condition sufficiently after a server restart. You should go through all the server children at least once, so that you are actually using a child that you had used before. It is a good idea to set MinSpareServers, MaxSpareServers, and StartServers very low for testing purposes, so that you are dealing with a very small server pool and problems related to reusing child processes will show up more rapidly.

mod_perl on Windows

Running Apache with mod_perl on Windows is not recommended, but can be done. The documentation shipping with mod_perl states that “mod_perl is considered alpha under NT and Windows9x.” Development on mod_perl for Windows has been slow because of a lack of Windows expertise on the mod_perl team.

However, if you want to run mod_perl on Windows, it can be done in two ways: build it yourself from source, or install a binary distribution.

To build it yourself you will need to follow the detailed instructions in the file INSTALL.win32, which you will find in the top-level directory where you have unpacked your mod_perl distribution. Microsoft Visual C++ 5.0 or later is required for this.

To find a binary distribution, see http://perl.apache.org/distributions.html, which lists binary distributions of mod_perl, including those for Microsoft Windows.

Note that it is also possible to build mod_perl under Cygwin.

Where To Get More Information

If you need more information about mod_perl there are a number of sources that you should look at. I don't attempt to be comprehensive on the topic of mod_perl in this chapter, and if you are going to be doing a lot with it, you need to make sure that you familiarize yourself with some of these other resources.

First and foremost, http://perl.apache.org/ is the definitive source for information about mod_perl. It contains an enormous amount of documentation, including numerous examples, and is the first place that you should go in your quest for mod_perl understanding. In particular, you should read the mod_perl guide, by Stas Bekman, which you will find at http://perl.apache.org/guide/. Anything that you want to do is likely to be covered in there.

Bekman also has an excellent series of articles on the Apache Today Web site, starting at http://apachetoday.com/news_story.php3?ltsn=2000-12-07-002-06-NW-HW-PL, about mod_perl performance tuning.

Secondly, there are a few (very few) excellent books on the issue. The most useful book is Writing Apache Modules With Perl and C (“The eagle book”) by Doug MacEachern and Lincoln Stein. This book covers, as the title suggests, Apache module development, with a particular emphasis on mod_perl. And Andrew Ford has written the very excellent mod_perl Pocket Reference, which covers the basics of mod_perl usage, and is a great reference to have on your desk when you are actively working with mod_perl applications.

And, of course, there is the mailing list. You can get on the mailing list by sending an e-mail to . Make sure you peruse the archives before you start firing off questions because most of what you will encounter in your first few months with mod_perl are things that have been discussed before. You will find the archives at http://forum.swarthmore.edu/epigone/modperl/, as well as at a number of other places listed on the Web site.

Summary

mod_perl embeds a Perl interpreter in the Apache process, and solves most of the problems associated with using CGI for dynamic content generation. It demands higher-quality Perl code than plain CGI does. mod_perl also enables you to write Apache handlers using only Perl.
