Chapter 16. Server-Side Includes

 

Don't clarify things I already understand. It only confuses me.

 
 --Mary Cook

Server-side includes (SSI) are directives written directly into HTML pages that the server parses when the page is served to the browser. Rather than pass the page directly to the requesting client, the server opens and reads through the document, looking for SSI directives. If it encounters one, it replaces it with whatever content is produced by that directive.

SSI is the right choice if, for example, you have an existing HTML page that needs a small amount of dynamically generated text inserted. SSI can fill in things such as today's date or the time the document was last updated, so you don't have to edit the document every day or remember to update a date every time you change it.

In this chapter, you will learn how to enable SSI on your server and how to use the various directives available to you. You'll learn how to use server-side includes to add a small amount of dynamic information to an otherwise static HTML page.

You can accomplish various things with SSI directives: external text files can be included, CGI programs can be called, and environment variables can be accessed. A simple flow-control (if/else) structure is also available as of Apache version 1.2, so you can display content based on simple conditions.

The SSI directives are defined in the mod_include module, which is part of the standard batch of modules installed with Apache. Much of this functionality was already in the NCSA code when the Apache project began. Some of it, such as the flow-control portions, was added later.

The choice of when to use SSI and when to use CGI programs should be considered carefully, particularly for heavily loaded Web sites, because there are performance considerations either way. You might want to do some actual benchmark testing to see what your best approach is.

The decision whether to use SSI or CGI to accomplish a particular task isn't always clear-cut. Generally, you use CGI when most of the page is dynamic, and SSI when most of the page is static with only a little dynamic content.

Configuration for SSI

The default Apache configuration files don't enable SSI for any files. The reasons for this range from security to performance, and they are discussed later in this chapter. Make sure that you enable SSI only for those portions of your site for which it is actually necessary.

The following sections show ways to enable a particular document to be parsed for SSI directives. Whichever option you choose, you must also enable the Includes option with the Options directive:

Options Includes

This might be set in the main server configuration file or an .htaccess file and can be configured for your whole server, a directory, or for a virtual host.

Enabling SSI by File Extension

The most common way to enable SSI processing is to indicate that all files with a certain filename extension (typically .shtml) are to be parsed by the server at the time they are served. This is done with the AddHandler directive, as we discussed in Chapter 14, “Handlers and Filters.”

In the configuration file httpd.conf you will find the following lines, if you are running Apache 1.3:

# To use server-parsed HTML files
#
#AddType text/html .shtml
#AddHandler server-parsed .shtml

Or, if you are running Apache 2.0, you'll find lines that look more like this:

# To use server-parsed HTML files
#
#<FilesMatch "\.shtml(\..+)?$">
#    SetOutputFilter INCLUDES
#</FilesMatch>

To enable all .shtml files for server-side parsing, simply uncomment those lines. For Apache 1.3, they should look like this:

# To use server-parsed HTML files
AddType text/html .shtml
AddHandler server-parsed .shtml

The AddType directive tells the server that all files with the file extension .shtml are to be served with a MIME type of text/html. The AddHandler line tells the server to enable the server-parsed handler for those same files. The server-parsed handler is also provided by the mod_include module, and tells the server to parse these files for SSI directives.

For Apache 2.0, you should have

# To use server-parsed HTML files
#
<FilesMatch "\.shtml(\..+)?$">
    SetOutputFilter INCLUDES
</FilesMatch>

Note that the configuration that comes with Apache 2.0 makes an allowance for files with multiple extensions. Because it uses the FilesMatch directive, it applies not only to files with a .shtml extension, but also to those with, for example, a .shtml.en extension.

There are two reasons not to use this approach of enabling SSI. First, you might need to change the name of all your files. Secondly, it's generally considered a bad idea to expose the mechanism that is working behind the scenes.

Changing Filenames

If you want to add SSI capability to an existing site, you would have to change the names of all files to which you wanted to add SSI directives and, consequently, change all links in other pages that referred to these pages. This is clearly a huge hassle. Additionally, you don't necessarily control all the pages that have links to your Web site, because they might be on other sites.

Some folks have addressed this hassle by simply SSI-enabling all files with the extension .html, in addition to .shtml files. This isn't recommended, but would be accomplished with the additional directive:

AddHandler server-parsed .html

This is not a recommended solution because every HTML file served by a server in this configuration has to be parsed for SSI directives. This slows down the process of serving content considerably: rather than just sending the file to the client, Apache now has to examine every line of that file on its way out.

Don't Expose the Mechanism

A second reason not to enable SSI parsing on files by extension is one of philosophy rather than one of technology. In building a Web site, you should think of your user. One aspect of this is making URLs “guessable.” If users are looking for some specific information on your site, they should be able to guess at a URL and get to the information they're looking for. If you have .shtml filenames (or something equally nonintuitive, such as .asp), it makes it less likely that users will correctly guess a URL containing the information they came for.

More importantly, exposing the mechanism by way of the filename (using .asp or .jsp filenames, for example) locks you into that technology. If, at some later date, you want to change from using ASP to using PHP, you would need to once again change the names of all your files and break any links and bookmarks to your site. By using names with no particular mechanism associated with them, there is no need to make this kind of change later.

Using the XBitHack Directive

Fortunately, the XBitHack directive offers a way around these problems.

Although the name XBitHack seems to imply that this is a hack, and thus somewhat less desirable than other techniques, this is a widespread method for enabling SSI in files. The XBitHack directive enables server-side parsing for all documents on which the user-execute bit is set.

This feature is not available for Windows, because Windows NT doesn't have the concept of marking a file executable.

The XBitHack directive can appear in the server configuration file (httpd.conf) or an .htaccess file, and can be configured for the entire server, a directory, or a virtual host. The directive can be given one of three possible values:

  • on: All files with the user-execute bit set are parsed for server-side includes, regardless of file extension.

  • off (default): Executable files aren't treated specially. Use this to turn off the directive for a subdirectory where it's undesirable. Remember that directives specified for a directory also apply to all subdirectories.

  • full: The same as on, except that the group-execute bit is also checked. If it's set, the Last-modified header is set to the last-modified time stamp of the file itself, which allows the page to be cached by the client or by a proxy server. If the group-execute bit isn't set, no Last-modified header is sent to the client.

For example:

XBitHack on
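To make the permission rules above concrete, here is a minimal Python sketch that inspects a file's mode bits the way the three XBitHack settings are described. It is an illustration only, not Apache's code: the function name xbithack_status and the temporary file are made up for this example, and it assumes a Unix-like system where execute bits exist.

```python
import os
import stat
import tempfile


def xbithack_status(path, mode="on"):
    """Return (parsed, own_last_modified) for a file under a given XBitHack mode.

    parsed            -- the file would be scanned for SSI directives
    own_last_modified -- under "full", the file's own time stamp would be
                         sent as the Last-modified header of the parsed page
    """
    if mode == "off":
        return (False, False)
    bits = os.stat(path).st_mode
    parsed = bool(bits & stat.S_IXUSR)                 # user-execute bit
    group_exec = bool(bits & stat.S_IXGRP)             # group-execute bit
    own_last_modified = mode == "full" and parsed and group_exec
    return (parsed, own_last_modified)


with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "index.html")             # hypothetical page
    with open(page, "w") as fh:
        fh.write("<p>hello</p>")

    os.chmod(page, 0o644)                              # rw-r--r--
    print("0644, on:  ", xbithack_status(page, "on"))

    os.chmod(page, 0o744)                              # rwxr--r--
    print("0744, full:", xbithack_status(page, "full"))

    os.chmod(page, 0o754)                              # rwxr-xr--
    print("0754, full:", xbithack_status(page, "full"))
```

In practice you would set these bits with chmod on the page itself; the sketch only shows which combination of bits corresponds to which behavior.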

Using XBitHack has two main advantages:

  • You don't need to rename a file and change all links to it simply because you want to add a little dynamic content.

  • Users looking at your Web content can't tell by looking at the filename that you are generating a page dynamically, so your wizardry is just that tiny bit more impressive. More importantly, the filename is easy to guess at, so a user can jump directly to the portion of your site that they are interested in.

Using SSI Directives

SSI directives look rather like HTML comment tags. This is convenient if a page contains SSI directives but SSI parsing is turned off, because the unprocessed directives then don't display in the browser.

The syntax of SSI directives is the following:

<!--#element attribute=value attribute=value ... -->

The element can be any one of config, echo, exec, fsize, flastmod, include, printenv, set, if, elif, else, or endif.
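The syntax above is regular enough to pick apart with a couple of regular expressions. The following Python sketch is only an illustration of the directive format, not how mod_include is implemented; the names DIRECTIVE_RE, ATTR_RE, and find_directives are invented for this example, and it assumes attribute values are enclosed in double quotes.

```python
import re

# One SSI directive: <!--#element attribute="value" ... -->
DIRECTIVE_RE = re.compile(r'<!--#(\w+)\s*(.*?)\s*-->', re.DOTALL)
# One attribute="value" pair inside a directive
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def find_directives(html):
    """Return a list of (element, attributes) pairs found in html."""
    found = []
    for match in DIRECTIVE_RE.finditer(html):
        element = match.group(1)
        attrs = dict(ATTR_RE.findall(match.group(2)))
        found.append((element, attrs))
    return found


page = '''<p>Today is <!--#echo var="DATE_LOCAL" -->.</p>
<!--#include virtual="/themes/footer.html" -->
<!--#printenv -->'''

for element, attrs in find_directives(page):
    print(element, attrs)
```

Each directive yields its element name and a dictionary of attributes; a directive with no attributes, such as printenv, yields an empty dictionary.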

config

The config element enables you to set various configuration options regarding how the document parsing is handled. Because the page is parsed from top to bottom, config directives should appear at the top of the HTML document, or at least before the directives that depend on them. You can change a configuration option several times in a page; each setting applies to the rest of the page until the next time it is changed.

There are three configuration variables that can be modified with the config element.

errmsg

config errmsg sets the error message that is returned to the client if something goes wrong while parsing the document. This is usually [an error occurred while processing this directive], but can be set to anything with this directive. For example, you can place the following in your HTML document:

<!--#config errmsg="[It's broken]" -->
<!--#directive ssi="Invalid command" -->

Because the second directive is not valid, the error message set by the config directive is displayed at the point where the directive's output would otherwise have appeared.

sizefmt

config sizefmt sets the format used to display file sizes. You can set the value to bytes to display the exact file size in bytes, or abbrev to display the size in kilobytes or megabytes. In the first of the two following examples, file sizes will be displayed as the exact number of bytes in the file, whereas in the second example, it will be rounded off to the nearest kilobyte or megabyte.

<!--#config sizefmt="bytes" -->
<!--#config sizefmt="abbrev" -->

See the fsize element for further examples of what this does.

timefmt

config timefmt sets the format used to display times and dates. The format of the value is the same as that used by the strftime function in C (and Perl), as detailed in Tables 16.1 through 16.5:

Table 16.1. Date Formats

Template  Meaning                           Range
%A        Weekday name                      'Sunday'–'Saturday'
%a        Abbreviated weekday name          'Sun'–'Sat'
%d        Day of the month (leading zero)   01–31
%e        Day of the month (leading space)  ' 1'–'31'
%B        Month name                        'January'–'December'
%b        Abbreviated month name            'Jan'–'Dec'
%m        Month as a decimal number         01–12
%Y        Year with century                 1970–2038
%C        Century number                    00–99
%y        Year without century              00–99

Table 16.2. Time Formats

Template  Meaning               Range
%H        Hour (24-hour clock)  00–23
%I        Hour (12-hour clock)  01–12
%M        Minute                00–59
%S        Second                00–61
%Z        Time zone name        “EST”, “EDT”, “GMT”, and so on
%p        Locale's equivalent of either 'AM' or 'PM'

Table 16.3. Shortcut Date and Time Formats

Template  Meaning                                    Equivalent to
%r        The time in AM/PM notation                 %I:%M:%S %p
%R        The time in 24-hour notation               %H:%M
%T        The time with seconds in 24-hour notation  %H:%M:%S
%D        The date                                   %m/%d/%y

Table 16.4. Locale-Dependent Representations

Template  Meaning
%x        Locale's appropriate date representation
%X        Locale's appropriate time representation
%c        Locale's appropriate date and time representation

Note

The locale is the combination of settings, such as language, country, and time zone, that relate to the location of the server and affect, among other things, how date and time information is displayed. Character set and currency are other examples of locale-dependent settings.

Table 16.5. Other

Template  Meaning                      Range
%j        Day of the year              001–366
%w        Weekday as a decimal number  0–6, where 0=Sun, 6=Sat
%u        Weekday as a decimal number  1–7, where 1=Mon, 7=Sun
%U        Week number, counting the first Sunday as the first day of the first week
%W        Week number, counting the first Monday as the first day of the first week
%t        The tab character
%n        The newline character
%%        The percent (%) character

For example, you can place the following text directly into your HTML document:

<!--#config timefmt="%B %e, %Y" -->

See the following flastmod element for an example of this in action.
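Because timefmt uses the strftime templates, you can try a format string outside the server before putting it in a page. The short Python sketch below does this with an arbitrary fixed timestamp chosen for the example. Python passes the templates to the platform's C library, so a few of them (such as %e) are not available everywhere; the sketch sticks to widely supported ones.

```python
from datetime import datetime

# A fixed, made-up timestamp so that the output is reproducible.
stamp = datetime(2001, 9, 29, 21, 37, 17)

# Templates taken from the tables above.
for fmt in ("%B %d, %Y", "%A", "%H:%M:%S", "%m/%d/%y", "%j"):
    print(f"{fmt:12} -> {stamp.strftime(fmt)}")
```

Month and weekday names come from the current locale, which is the default C locale unless the program changes it.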

echo

The echo element will display the value of any variable. The variable can be any one of the variables displayed in Table 16.6, any environment variable, or variables that you define yourself with the set element, which we will see shortly. Times are displayed in the time format specified by timefmt, and file sizes are displayed in the format specified by sizefmt. The variable to be displayed is indicated with the var attribute.

Table 16.6. Built-In Variables

Variable       Definition
DATE_GMT       The current date in Greenwich Mean Time.
DATE_LOCAL     The current date in the local time zone.
DOCUMENT_NAME  The filename (excluding directories) of the document.
DOCUMENT_URI   The (%-decoded) URL path of the document.
LAST_MODIFIED  The date and time on which this file was last modified.

For example:

<!--#config timefmt="%B %e, %Y" -->

Today's date is <!--#echo var="DATE_LOCAL" -->.

exec

The exec element executes a shell command or a CGI program depending on the parameters provided. Valid attributes are cgi and cmd.

cgi specifies the URL of a CGI program to be executed:

<!--#exec cgi="/cgi-bin/unread_articles.pl" -->

The URL needs to be a local CGI, not one located on another machine. The CGI program is passed the QUERY_STRING and PATH_INFO that were originally passed to the requested document (see Chapter 15, “CGI Programs,” for an explanation of these terms) so the URL specified can't contain this information. It is recommended that you use the include virtual syntax, rather than using exec cgi.

cmd specifies a shell command to be executed. The results will be displayed on the HTML page. Example:

<!--#exec cmd="/usr/bin/ls -la /tmp" -->

In your configuration files (or in .htaccess) you can specify Options IncludesNOEXEC to disallow the exec directive because this is the most insecure of the SSI directives. Be especially cautious when Web users can create content (such as in a guest book or discussion forum) and these options are enabled! Users could potentially include SSI directives containing arbitrary commands that would be executed the next time the page was loaded.
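One common defense against the guest-book problem just described is to escape HTML metacharacters in anything a visitor submits before writing it into a page. The following Python sketch shows the idea using the standard library's html.escape; the function name neutralize_ssi and the sample comment are hypothetical, and this is a sketch of one precaution, not a complete security measure.

```python
import html


def neutralize_ssi(user_text):
    """Escape HTML metacharacters so user text cannot form an SSI directive."""
    # html.escape turns <, >, & and quotes into entities, so the
    # "<!--#" opening sequence never reaches the page intact.
    return html.escape(user_text)


comment = 'Nice site! <!--#exec cmd="cat /etc/passwd" -->'
safe = neutralize_ssi(comment)
print(safe)
```

The escaped text still displays to readers as the visitor typed it, but the server no longer sees a directive to run.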

fsize

The fsize element displays the size of a file, which is specified by either the file or virtual attribute. Size is displayed as specified with config sizefmt.

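The file attribute specifies the file system path to a file, either relative to the root if the value starts with /, or relative to the current directory if not. Be aware that mod_include may refuse absolute paths and paths containing ../ in the file attribute; if such a directive produces the error message instead of the expected output, use the virtual attribute.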

Using the virtual attribute specifies the relative URL path to a file. That is, it specifies the file path relative to the document root, if the value starts with /, or relative to the current directory if not.

For example:

<!--#config sizefmt="bytes" -->
/etc/passwd is <!--#fsize file="/etc/passwd" --> bytes.
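The way a virtual value is interpreted, as described above, resembles ordinary relative-URL resolution. The Python sketch below illustrates that rule under simplifying assumptions: the DOCUMENT_ROOT value and the /docs/page.html document are examples only, and a real server also applies aliases, rewriting, and access checks that this ignores.

```python
import posixpath
from urllib.parse import urljoin

DOCUMENT_ROOT = "/usr/local/apache/htdocs"   # assumed document root


def resolve_virtual(document_uri, virtual):
    """Map a virtual attribute to a URL path and a file under DOCUMENT_ROOT."""
    # A value starting with / is taken from the document root; anything
    # else is taken relative to the directory of the current document.
    url_path = urljoin(document_uri, virtual)
    file_path = posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/"))
    return url_path, file_path


print(resolve_virtual("/docs/page.html", "footer.html"))
print(resolve_virtual("/docs/page.html", "/themes/header.html"))
```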

flastmod

The flastmod element displays the last modified date of a file. The desired file is specified in the same manner as with the fsize directive. That is, you can specify the location of the file with either the file or virtual attribute. See the explanations of these attributes in the details of the fsize element.

In the following example, the directive shown will display the time and date when I last received e-mail:

<!--#config timefmt="%r" -->
You last received email at
<!--#flastmod file="/var/spool/mail/rbowen" -->.

Note

On Unix systems, the /var/spool/mail directory contains the mail files for each user.

Although this can be used for any file on the system, it is most frequently used to display the date the particular document you are looking at was last modified. When used this way, it is equivalent to using the following:

File was last modified <!--#echo var="LAST_MODIFIED" -->

include

The include element includes the contents of the specified file or URL into the HTML document. The file is specified with the file and virtual attributes, as described with fsize and flastmod. If the URI specified by the virtual attribute is a CGI program, and IncludesNOEXEC isn't set, the program will be executed and the results displayed. This is the preferred method of including the results of a CGI program, rather than using exec cgi, because you can pass a QUERY_STRING argument to the CGI program, for example.

<!--#include file="/etc/aliases" -->

<!--#include virtual="/cgi-bin/login.cgi?user=bob" -->
<!--#include virtual="/themes/header.html" -->

printenv

The printenv element is primarily useful for testing. It displays all defined environment variables.

<pre>
<!--#printenv -->
</pre>

The directive should be enclosed in HTML <pre> tags because the output is plain text, not HTML.

Example 16.1 shows the output produced when the previous directive was placed in an HTML page on my server.

Example 16.1. Output from the printenv Directive

DOCUMENT_ROOT=/usr/local/apache/htdocs
HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
HTTP_ACCEPT_CHARSET=iso-8859-1,*,utf-8
HTTP_ACCEPT_ENCODING=gzip
HTTP_ACCEPT_LANGUAGE=en
HTTP_CONNECTION=Keep-Alive
HTTP_HOST=rhiannon.rcbowen.com
HTTP_IF_MODIFIED_SINCE=Sun, 30 Sep 2001 01:23:49 GMT; length=1190
HTTP_PRAGMA=no-cache
HTTP_USER_AGENT=Mozilla/4.72 [en] (X11; U; Linux 2.4.4 i686)
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin:/bin:/usr/bin
REMOTE_ADDR=127.0.0.1
REMOTE_PORT=39082
SCRIPT_FILENAME=/usr/local/apache/htdocs/testing.html
SERVER_ADDR=127.0.0.1
SERVER_ADMIN=[email protected]
SERVER_NAME=localhost
SERVER_PORT=80
SERVER_SIGNATURE=<ADDRESS>Apache/1.3.20 Server at localhost Port 80</ADDRESS>

SERVER_SOFTWARE=Apache/1.3.20 (Unix) mod_perl/1.26
GATEWAY_INTERFACE=CGI/1.1
SERVER_PROTOCOL=HTTP/1.0
REQUEST_METHOD=GET
QUERY_STRING=
REQUEST_URI=/testing.html
SCRIPT_NAME=/testing.html
DATE_LOCAL=Saturday, 29-Sep-2001 21:38:44 EDT
DATE_GMT=Sunday, 30-Sep-2001 01:38:44 GMT
LAST_MODIFIED=Saturday, 29-Sep-2001 21:37:17 EDT
DOCUMENT_URI=/testing.html
DOCUMENT_PATH_INFO=
USER_NAME=nobody
DOCUMENT_NAME=testing.html

Variables and Flow Control with SSI

The directives described so far enable you to display existing values. Although this is very useful, sometimes you want to define your own variables and do some limited scripting on an HTML page. Various other products offer server-side scripting embedded in HTML pages, and this shouldn't be thought of as rivaling those because it's very limited. However, it does enable you to do some simple functions without resorting to a third-party product.

The two aspects to this programming are variables and conditional statements. Variables are provided with the set directive and conditionals with an if/else flow control statement.

The set directive sets the value of a variable. Attributes are var and value. For example:

<!--#set var="animal" value="cow" -->

This example defines a variable called animal, and gives it the value of "cow".

When referenced in other SSI directives, the variable will be distinguished from plain text with the $ character. In this case, $animal can be used in place of any text in any SSI directive.

Within an echo directive, the var value is understood to be a variable, and the $ isn't required.

In a larger string, where the variable might run up against other text, curly brackets ({ } ) are used to delimit the variable from the rest of the string:

<!--#set var="basepath" value="/home/rbowen/public_html" -->

Basepath = <!--#echo var="basepath" --><br>

index.html was last modified
<!--#flastmod file="${basepath}/index.html" --><br>
<!--#config sizefmt="bytes" -->
test.html is <!--#fsize file="${basepath}/test.html" --> bytes<p>

Variables can be used, as in the preceding example, to define a string that will be used later in several other directives. This is useful for one-location configuration changes; it also saves you a lot of unnecessary typing.
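To see why the curly brackets matter, consider this small Python sketch of the $name and ${name} substitution rules described above. It is an approximation for illustration, not mod_include's parser: the names VAR_RE and expand are made up, and replacing an undefined variable with an empty string is an assumption of this sketch.

```python
import re

# Matches ${name} (group 1) or $name (group 2)
VAR_RE = re.compile(r'\$(?:\{(\w+)\}|(\w+))')


def expand(text, variables):
    """Replace $name and ${name} with their values from variables."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, "")       # assumption: undefined -> ""
    return VAR_RE.sub(lookup, text)


variables = {"basepath": "/home/rbowen/public_html"}

print(expand("${basepath}/index.html", variables))
print(expand("${basepath}_old", variables))   # brackets end the name early
print(expand("$basepath_old", variables))     # read as the variable basepath_old
```

Without the brackets, the last example is read as a reference to a different variable named basepath_old, which is not defined here.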

By using the variables set with the set directive along with the various environment and include variables, you can use a limited flow-control syntax to generate a certain amount of dynamic content on server-parsed pages.

Conditional flow-control is implemented with the directives if, elif, else and endif.

The syntax of these conditional functions is as follows:

<!--#if expr="test_condition" -->

<!--#elif expr="test_condition" -->

<!--#else -->

<!--#endif -->

The test condition can be a single string, which is considered true if it is non-empty, or a comparison of two strings. The available comparison operators are =, !=, <, <=, >, and >=. If the second string has the form /string/, it is treated as a regular expression and the first string is matched against it. Multiple comparisons can be combined with && (AND) and || (OR). Only the text that follows the first condition that evaluates to true (or that follows else, if none does) is displayed on the resulting page. An example of such a flow structure follows:

<!--#set var="agent" value="$HTTP_USER_AGENT" -->

<!--#if expr="$agent = /Mozilla/" -->

Mozilla!

<!--#elif expr="$agent = /MSIE/" -->

Internet Explorer

<!--#else -->
Something else!
<!--#endif -->

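This code displays Mozilla! if you are using a browser that passes Mozilla as part of its USER_AGENT string, Internet Explorer if the string contains MSIE but not Mozilla, and Something else! otherwise. Note that Internet Explorer's own USER_AGENT string also begins with Mozilla, so in practice you would test for MSIE first.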
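The branch order in the example matters, because only the first true condition is used. The Python sketch below mirrors the same three tests in the same order so that you can try different USER_AGENT strings; the function name classify is invented for this illustration, and the sample strings are typical of browsers of the period.

```python
import re


def classify(user_agent):
    """Mirror the if/elif/else example: the first matching branch wins."""
    if re.search(r"Mozilla", user_agent):
        return "Mozilla!"
    elif re.search(r"MSIE", user_agent):
        return "Internet Explorer"
    else:
        return "Something else!"


print(classify("Mozilla/4.72 [en] (X11; U; Linux 2.4.4 i686)"))
print(classify("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"))
print(classify("Lynx/2.8.4rel.1"))
```

The second string is from Internet Explorer, yet it matches the Mozilla test first. Swapping the first two tests, so that MSIE is checked before Mozilla, separates the two browsers.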

Security Considerations

The security considerations involved in using server-side includes have been mentioned throughout the chapter, and are just summarized here.

Whenever possible, use the IncludesNOEXEC argument to Options, rather than Includes, so that arbitrary commands cannot be executed from within Web pages.

Make sure that Includes is not turned on if there is any chance that Web users might be able to create content that is part of an HTML document, such as with a guest-book application, or a discussion forum of some variety. This could potentially enable them to execute arbitrary commands on the server.

If you have AllowOverride Options turned on, you should be aware that users can then put Options +Includes in their .htaccess files. Before you turn on any of the AllowOverride options, you should consider all the various ways in which that freedom might be used.

Summary

Server-side includes were extremely popular in the early days of the World Wide Web for things such as hit counters and cute little messages that told you what time it was and where you were visiting from. Fortunately, the appeal has worn off, although you still see them on some beginner sites. However, SSI can still be used for some genuinely useful things, particularly now that the if/elif/else flow-control directives are available. They provide for dynamic content that can be calculated at runtime without having to fork off an entirely new CGI process.

This chapter covered configuring your server to permit SSI and went through the available SSI directives and their uses.

There's a good article about SSI on the Apache Week Web site at http://www.apacheweek.com/features/ssi, which covers most of the same material but offers different examples.
