Don't clarify things I already understand. It only confuses me.
--Mary Cook
Server-side includes (SSI) are directives written directly into HTML pages that the server parses when the page is served to the browser. Rather than pass the page directly to the requesting client, the server opens and reads through the document, looking for SSI directives. If it encounters one, it replaces it with whatever content is produced by that directive.
SSI is the right choice if, for example, you have an existing HTML page that needs a small amount of dynamically generated text inserted. It can fill in things such as the current date or the time the document was last updated, so you don't have to edit the document every day or remember to update a date every time you change it.
In this chapter, you will learn how to enable SSI on your server and how to use the various directives available to you. You'll learn how to use server-side includes to add a small amount of dynamic information to an otherwise static HTML page.
You can accomplish various things with SSI directives: external text files can be included, CGI programs can be called, and environment variables can be accessed. A simple flow-control (if/else) structure has also been available since Apache 1.2, so you can display content based on simple conditions.
The SSI directives are defined in the mod_include module, which is part of the standard batch of modules installed with Apache. Much of this functionality was already in the NCSA code when the Apache project began. Some of it, such as the flow-control portions, was added later.
The choice of when to use SSI and when to use CGI programs should be considered carefully, particularly for heavily loaded Web sites, because there are performance considerations either way. You might want to do some actual benchmark testing to see what your best approach is.
The decision whether to use SSI or CGI to accomplish a particular task isn't always clear-cut. Generally, use CGI when most of the page is dynamic, and SSI when most of it is static.
The default Apache configuration files don't enable SSI for any files, for reasons ranging from security to performance. These reasons are discussed later in this chapter. Make sure that you enable SSI only for those portions of your site for which it is actually necessary.
The following sections show three ways to enable a particular document to be parsed for SSI directives. Whichever option you choose, you must also enable the Includes option with the Options directive:
    Options Includes
This can be set in the main server configuration file or in an .htaccess file, and can be configured for your whole server, a directory, or a virtual host.
The most common way to enable SSI processing is to indicate that all files with a certain filename extension (typically .shtml) are to be parsed by the server at the time they are served. This is done with the AddHandler directive, as we discussed in Chapter 14, “Handlers and Filters.”
In the configuration file httpd.conf, you will find the following lines if you are running Apache 1.3:
    # To use server-parsed HTML files
    #
    #AddType text/html .shtml
    #AddHandler server-parsed .shtml
Or, if you are running Apache 2.0, you'll find lines that look more like this:
    # To use server-parsed HTML files
    #
    #<FilesMatch "\.shtml(\..+)?$">
    #    SetOutputFilter INCLUDES
    #</FilesMatch>
To enable all .shtml files for server-side parsing, simply uncomment those lines. For Apache 1.3, they should look like this:
    # To use server-parsed HTML files
    AddType text/html .shtml
    AddHandler server-parsed .shtml
The AddType directive tells the server that all files with the file extension .shtml are to be served with a MIME type of text/html. The AddHandler line tells the server to enable the server-parsed handler for those same files. This handler is provided by the mod_include module, and tells the server to parse these files for SSI directives.
For Apache 2.0, you should have the following:
    # To use server-parsed HTML files
    #
    <FilesMatch "\.shtml(\..+)?$">
        SetOutputFilter INCLUDES
    </FilesMatch>
Note that the directive that comes with Apache 2.0 makes an allowance for files with multiple extensions. Because it uses FilesMatch, it applies not only to files with a .shtml extension, but also to those with, for example, a .shtml.en extension.
There are two reasons not to use this approach of enabling SSI. First, you might need to change the name of all your files. Secondly, it's generally considered a bad idea to expose the mechanism that is working behind the scenes.
If you want to add SSI capability to an existing site, you would have to change the names of all files to which you wanted to add SSI directives and, consequently, change all links in other pages that referred to these pages. This is clearly a huge hassle. Additionally, you don't necessarily control all the pages that have links to your Web site, because they might be on other sites.
Some folks have addressed this hassle by simply SSI-enabling all files with the extension .html, in addition to .shtml files. This isn't recommended, but would be accomplished with the additional directive:
    AddHandler server-parsed .html
This is not recommended because every HTML file served by a server in this configuration has to be parsed for SSI directives. That slows down the serving of content considerably: rather than just sending the file to the client, Apache now has to examine every line of that file on its way out.
A second reason not to enable SSI parsing on files by extension is one of philosophy rather than one of technology. In building a Web site, you should think of your user. One aspect of this is making URLs “guessable.” If users are looking for some specific information on your site, they should be able to guess at a URL and get to the information they're looking for. If you have .shtml filenames (or something equally nonintuitive, such as .asp), it is less likely that users will correctly guess a URL containing the information they came for.
More importantly, exposing the mechanism by way of the filename (using .asp or .jsp filenames, for example) locks you into that technology. If, at some later date, you want to change from ASP to PHP, you would once again need to change the names of all your files and break any links and bookmarks to your site. By using names with no particular mechanism associated with them, you avoid having to make this kind of change later.
Fortunately, the XBitHack directive offers a way around these problems.
Although the name XBitHack seems to imply that this is a hack, and thus somewhat less desirable than other techniques, this is a widespread method for enabling SSI in files. The XBitHack directive enables server-side parsing for all documents on which the user-execute bit is set.
This feature is not available for Windows, because Windows NT doesn't have the concept of marking a file executable.
The XBitHack directive can appear in the server configuration file (httpd.conf) or an .htaccess file, and can be configured for the entire server, a directory, or a virtual host. The directive can be given one of three possible values:
on: All files with the user-execute bit set are parsed for server-side includes, regardless of file extension.
off (default): Executable files aren't treated specially. Use this to turn off the directive for a subdirectory where it's undesirable. Remember that directives specified for a directory also apply to all subdirectories.
full: The same as on, except that the group-execute bit is also checked. If it's set, the Last-modified date sent to the client is the last-modified time stamp of the file itself, which allows the page to be cached by the client or by a proxy server. If the group-execute bit isn't set, no Last-modified date is sent.
For example:
    XBitHack on
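To make the user-execute and group-execute checks concrete, here is a minimal Python sketch. It is an illustration only, not how Apache implements the directive: the function names set_user_execute, is_parsed, and sends_last_modified are hypothetical, and the logic simply restates the on, off, and full rules described above.

```python
import os
import stat

def set_user_execute(path):
    """Turn on the user-execute bit (like chmod u+x), keeping other bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IXUSR)

def is_parsed(mode, xbithack="on"):
    """Would a file with these permission bits be parsed for SSI?"""
    if xbithack not in ("on", "full"):
        return False
    return bool(mode & stat.S_IXUSR)

def sends_last_modified(mode, xbithack="on"):
    """For a parsed file, is a Last-modified date sent?

    Per the rules above, only under 'full' with the group-execute bit set.
    """
    return (xbithack == "full"
            and is_parsed(mode, xbithack)
            and bool(mode & stat.S_IXGRP))

# Permission bits are written in octal, as with chmod.
print(is_parsed(0o744, "on"))              # user-execute set
print(is_parsed(0o644, "on"))              # user-execute not set
print(sends_last_modified(0o754, "full"))  # user- and group-execute set
```

On a Unix system, calling set_user_execute on a page has the same effect as running chmod u+x on it from the shell.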
Using XBitHack has two main advantages:
You don't need to rename a file and change all links to it simply because you want to add a little dynamic content.
Users looking at your Web content can't tell by looking at the filename that you are generating a page dynamically, so your wizardry is just that tiny bit more impressive. More importantly, the filename is easy to guess at, so a user can jump directly to the portion of your site that they are interested in.
SSI directives look rather like HTML comment tags. This is convenient: if a page contains SSI directives but SSI parsing is turned off, the browser treats the directives as comments and doesn't display them.
The syntax of SSI directives is the following:
    <!--#element attribute=value attribute=value ... -->
The element can be any one of config, echo, exec, fsize, flastmod, include, printenv, set, if, elif, else, or endif.
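As a rough illustration of this syntax, the following Python sketch picks directives out of a page and splits each one into its element name and attributes. It is not mod_include's parser: the names DIRECTIVE_RE, ATTRIBUTE_RE, and parse_directives are hypothetical, and the sketch assumes double-quoted attribute values, as used in this chapter's examples.

```python
import re

# <!--#element attr="value" ... --> : the element name follows "#" directly.
DIRECTIVE_RE = re.compile(r'<!--#(\w+)\s*(.*?)\s*-->', re.DOTALL)
# attribute="value" pairs inside a directive (double-quoted values only).
ATTRIBUTE_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_directives(html):
    """Return a list of (element, {attribute: value}) pairs found in html."""
    found = []
    for match in DIRECTIVE_RE.finditer(html):
        element, rest = match.group(1), match.group(2)
        found.append((element, dict(ATTRIBUTE_RE.findall(rest))))
    return found

page = '<p>Today is <!--#echo var="DATE_LOCAL" -->.</p>\n<!--#printenv -->'
for element, attributes in parse_directives(page):
    print(element, attributes)
```

Anything the pattern does not match, including ordinary HTML comments, is left alone, which mirrors the way non-SSI content passes through the server unchanged.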
The config element enables you to set various configuration options regarding how the document parsing is handled. Because the page is parsed from top to bottom, config directives should appear at the top of the HTML document, or at least before the directives they affect. You can change a configuration option several times in a page; each setting applies to the rest of the page until it is changed again.
There are three configuration variables that can be modified with the config element.
config errmsg sets the error message that is returned to the client if something goes wrong while parsing the document. This is usually [an error occurred while processing this directive], but can be set to anything with this directive. For example, you can place the following in your HTML document:
    <!--#config errmsg="[It's broken]" -->
    <!--#directive ssi="Invalid command" -->
Because the second directive is not valid, the error message configured in the config directive is displayed at the point where the directive's output would otherwise have appeared.
config sizefmt sets the format used to display file sizes. You can set the value to bytes to display the exact file size in bytes, or abbrev to display the size in kilobytes or megabytes. In the first of the two following examples, file sizes are displayed as the exact number of bytes in the file, whereas in the second, they are rounded to the nearest kilobyte or megabyte.
    <!--#config sizefmt="bytes" -->
    <!--#config sizefmt="abbrev" -->
See the fsize element for further examples of what this does.
config timefmt sets the format used to display times and dates. The format of the value is the same as that used by the strftime function in C (and Perl), as detailed in Tables 16.1 through 16.5:
Table 16.1. Date Formats
Template | Meaning | Range |
---|---|---|
%A | Weekday name | 'Sunday'–'Saturday' |
%a | Abbreviated weekday name | 'Sun'–'Sat' |
%d | day of the month (leading zero) | 01–31 |
%e | day of the month (leading space) | ' 1'–'31' |
%B | month name | 'January'–'December' |
%b | Abbreviated month name | 'Jan'–'Dec' |
%m | month as a decimal number | 01–12 |
%Y | year with century | 1970–2038 |
%C | Century number | 00–99 |
%y | year without century | 00–99 |
Table 16.2. Time Formats
Template | Meaning | Range |
---|---|---|
%H | Hour (24-hour clock) | 00–23 |
%I | Hour (12-hour clock) | 01–12 |
%M | Minute | 00–59 |
%S | Second | 00–61 |
%Z | Time zone name | “EST”, “EDT”, “GMT”, and so on. |
%p | locale's equivalent of AM or PM | 'AM' or 'PM' |
Table 16.3. Shortcut Date and Time Formats
Template | Meaning | Range |
---|---|---|
%r | The time in AM/PM notation | %I:%M:%S %p |
%R | The time in 24-hour notation | %H:%M |
%T | The time with seconds in 24-hour notation | %H:%M:%S |
%D | the date | %m/%d/%y |
Table 16.4. Locale-Dependent Representations
Template | Meaning |
---|---|
%x | locale's appropriate date representation |
%X | locale's appropriate time representation |
%c | locale's appropriate date and time representation |
The locale is the combination of settings tied to the server's location, such as language, country, and time zone, that affect, among other things, how date and time information is displayed. Character set and currency formatting are also locale-dependent.
Table 16.5. Other
Template | Meaning | Range |
---|---|---|
%j | day of the year | (001–366) |
%w | weekday as a decimal number | 0–6, where 0=Sun,6=Sat |
%u | weekday as a decimal number | 1–7, where 1=Mon,7=Sun |
%U | Week number | counting with the first Sunday as the first day of the first week |
%V | ISO 8601 week number | 01–53, with Monday as the first day of the week |
%t | the tab character | |
%n | the newline character | |
%% | the percent (%) character | |
For example, you can place the following text directly into your HTML document:
    <!--#config timefmt="%B %e, %Y" -->
See the flastmod element, later in this chapter, for an example of this in action.
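Because timefmt uses strftime templates, you can preview a format string outside the server. The following Python sketch is one way to do that. The function name preview_timefmt and the sample timestamp are illustrative, and the sketch uses %d rather than %e because Python's support for %e depends on the platform's C library.

```python
from datetime import datetime

def preview_timefmt(timefmt, when):
    """Format a datetime with a strftime template, much as timefmt would."""
    return when.strftime(timefmt)

# An arbitrary sample timestamp: 29 September 2001, 21:38:44.
sample = datetime(2001, 9, 29, 21, 38, 44)

# Output shown is for the default C locale.
print(preview_timefmt("%B %d, %Y", sample))   # September 29, 2001
print(preview_timefmt("%H:%M:%S", sample))    # 21:38:44
```

Month and weekday names come from the locale, so the same template can produce different text on a server configured for another language.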
The echo element displays the value of a variable. The variable can be any of the variables listed in Table 16.6, any environment variable, or a variable that you define yourself with the set element, which we will see shortly. Times are displayed in the time format specified by timefmt, and file sizes are displayed in the format specified by sizefmt. The variable to be displayed is indicated with the var attribute.
Table 16.6. Built-In Variables
Variable | Definition |
---|---|
DATE_GMT | The current date in Greenwich Mean Time. |
DATE_LOCAL | The current date in the local time zone. |
DOCUMENT_NAME | The filename (excluding directories) of the document. |
DOCUMENT_URI | The (%-decoded) URL path of the document. |
LAST_MODIFIED | The date and time on which this file was last modified. |
For example:
    <!--#config timefmt="%B %e, %Y" -->
    Today's date is <!--#echo var="DATE_LOCAL" -->.
The exec element executes a shell command or a CGI program, depending on the parameters provided. Valid attributes are cgi and cmd.
cgi specifies the URL of a CGI program to be executed:
    <!--#exec cgi="/cgi-bin/unread_articles.pl" -->
The URL needs to be a local CGI, not one located on another machine. The CGI program is passed the QUERY_STRING and PATH_INFO that were originally passed to the requested document (see Chapter 15, “CGI Programs,” for an explanation of these terms), so the URL specified can't contain this information. It is recommended that you use the include virtual syntax rather than exec cgi.
cmd specifies a shell command to be executed. The results are displayed on the HTML page. For example:
    <!--#exec cmd="/usr/bin/ls -la /tmp" -->
In your configuration files (or in .htaccess), you can specify Options IncludesNOEXEC to disallow the exec directive, which is the most dangerous of the SSI directives. Be especially cautious when Web users can create content (such as in a guest book or discussion forum) and exec is enabled. Users could potentially include SSI directives containing arbitrary commands that would be executed the next time the page was loaded.
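One common precaution, sketched here in Python as an assumption rather than something this chapter prescribes, is to HTML-escape user-submitted text before writing it into a server-parsed page. Once the angle brackets are escaped, a submitted directive no longer has the comment-like form that the server looks for. The function name sanitize_comment is hypothetical.

```python
import html

def sanitize_comment(text):
    """Escape <, >, &, and quotes so the text cannot form an SSI directive."""
    return html.escape(text)

# A guest-book entry that tries to smuggle in an exec directive.
malicious = '<!--#exec cmd="/usr/bin/ls -la /tmp" -->'
print(sanitize_comment(malicious))
```

Escaping complements IncludesNOEXEC rather than replacing it: the option limits what any directive can do, while escaping keeps visitor-supplied directives from being recognized at all.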
The fsize element displays the size of a file, which is specified by either the file or virtual attribute. Size is displayed as specified with config sizefmt.
The file attribute specifies the file system path to a file, either relative to the root if the value starts with /, or relative to the current directory if not.
The virtual attribute specifies the relative URL path to a file. That is, it specifies the path relative to the document root if the value starts with /, or relative to the current directory if not.
For example:
    <!--#config sizefmt="bytes" -->
    /etc/passwd is <!--#fsize file="/etc/passwd" --> bytes.
The flastmod element displays the last-modified date of a file. The desired file is specified in the same manner as with the fsize directive: you give the location of the file with either the file or virtual attribute. See the explanations of these attributes in the details of the fsize element.
In the following example, the directive displays the time at which I last received e-mail:
    <!--#config timefmt="%r" -->
    You last received email at <!--#flastmod file="/var/spool/mail/rbowen" -->.
On Unix systems, the /var/spool/mail directory contains the mail files for each user.
Although this can be used for any file on the system, it is most frequently used to display the date the document you are looking at was last modified. When used this way, it is equivalent to using the following:
    File was last modified <!--#echo var="LAST_MODIFIED" -->
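The values that fsize and flastmod report correspond to ordinary file metadata. As a rough Python sketch of the same lookups (the helper names file_size_bytes and last_modified are hypothetical, and the code reads the file system directly rather than going through Apache):

```python
import os
import time

def file_size_bytes(path):
    """Exact size in bytes, comparable to fsize with sizefmt="bytes"."""
    return os.stat(path).st_size

def last_modified(path, timefmt="%A, %d-%b-%Y %H:%M:%S"):
    """Last-modified time in local time, formatted with a strftime template."""
    mtime = os.stat(path).st_mtime
    return time.strftime(timefmt, time.localtime(mtime))

# Report on this script itself when it is run as a file.
if __name__ == "__main__" and "__file__" in globals():
    print(file_size_bytes(__file__), "bytes")
    print("last modified", last_modified(__file__))
```

The timefmt argument plays the role of config timefmt: changing the template changes only how the same time stamp is displayed.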
The include element includes the contents of the specified file or URL in the HTML document. The file is specified with the file or virtual attribute, as described with fsize and flastmod. If the URI specified by the virtual attribute is a CGI program, and IncludesNOEXEC isn't set, the program is executed and the results displayed. This is the preferred method of including the results of a CGI program, rather than using exec cgi, because you can, for example, pass a QUERY_STRING argument to the CGI program.
    <!--#include file="/etc/aliases" -->
    <!--#include virtual="/cgi-bin/login.cgi?user=bob" -->
    <!--#include virtual="/themes/header.html" -->
The printenv element is primarily useful for testing. It displays all defined environment variables.
    <pre>
    <!--#printenv -->
    </pre>
The directive should be enclosed in HTML <pre> tags because the output is plain text, not HTML.
Example 16.1 shows the output produced when the previous directive was put in an HTML page on my server.
Example 16.1. Output from the printenv Directive
    DOCUMENT_ROOT=/usr/local/apache/htdocs
    HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    HTTP_ACCEPT_CHARSET=iso-8859-1,*,utf-8
    HTTP_ACCEPT_ENCODING=gzip
    HTTP_ACCEPT_LANGUAGE=en
    HTTP_CONNECTION=Keep-Alive
    HTTP_HOST=rhiannon.rcbowen.com
    HTTP_IF_MODIFIED_SINCE=Sun, 30 Sep 2001 01:23:49 GMT; length=1190
    HTTP_PRAGMA=no-cache
    HTTP_USER_AGENT=Mozilla/4.72 [en] (X11; U; Linux 2.4.4 i686)
    PATH=/usr/local/sbin:/usr/local/bin:/sbin:/usr/sbin:/bin:/usr/bin
    REMOTE_ADDR=127.0.0.1
    REMOTE_PORT=39082
    SCRIPT_FILENAME=/usr/local/apache/htdocs/testing.html
    SERVER_ADDR=127.0.0.1
    [email protected]
    SERVER_NAME=localhost
    SERVER_PORT=80
    SERVER_SIGNATURE=<ADDRESS>Apache/1.3.20 Server at localhost Port 80</ADDRESS>
    SERVER_SOFTWARE=Apache/1.3.20 (Unix) mod_perl/1.26
    GATEWAY_INTERFACE=CGI/1.1
    SERVER_PROTOCOL=HTTP/1.0
    REQUEST_METHOD=GET
    QUERY_STRING=
    REQUEST_URI=/testing.html
    SCRIPT_NAME=/testing.html
    DATE_LOCAL=Saturday, 29-Sep-2001 21:38:44 EDT
    DATE_GMT=Sunday, 30-Sep-2001 01:38:44 GMT
    LAST_MODIFIED=Saturday, 29-Sep-2001 21:37:17 EDT
    DOCUMENT_URI=/testing.html
    DOCUMENT_PATH_INFO=
    USER_NAME=nobody
    DOCUMENT_NAME=testing.html
The directives described so far enable you to display existing values. Although this is very useful, sometimes you want to define your own variables and do some limited scripting on an HTML page. Various other products offer server-side scripting embedded in HTML pages, and this shouldn't be thought of as rivaling those because it's very limited. However, it does enable you to do some simple functions without resorting to a third-party product.
The two aspects of this programming are variables and conditional statements. Variables are provided by the set directive, and conditionals by an if/else flow-control statement.
The set directive sets the value of a variable. Its attributes are var and value. For example:
    <!--#set var="animal" value="cow" -->
This example defines a variable called animal, and gives it the value "cow".
When referenced in other SSI directives, the variable is distinguished from plain text with the $ character. In this case, $animal can be used in place of any text in any SSI directive.
Within an echo directive, the var value is understood to be a variable, and the $ isn't required.
In a larger string, where the variable might run up against other text, curly brackets ({ }) are used to delimit the variable from the rest of the string:
    <!--#set var="basepath" value="/home/rbowen/public_html" -->
    Basepath = <!--#echo var="basepath" --><br>
    index.html was last modified <!--#flastmod file="${basepath}/index.html" --><br>
    <!--#config sizefmt="bytes" -->
    test.html is <!--#fsize file="${basepath}/test.html" --> bytes<p>
Variables can be used, as in the preceding example, to define a string that will be used later in several other directives. This is useful for one-location configuration changes; it also saves you a lot of unnecessary typing.
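The role of the curly brackets can be illustrated with Python's string.Template, which happens to use a similar $name and ${name} notation. This is only an analogy, not mod_include's own substitution code, and the expand helper is hypothetical.

```python
from string import Template

def expand(text, variables):
    """Substitute $name and ${name} references using the given mapping."""
    return Template(text).substitute(variables)

variables = {"basepath": "/home/rbowen/public_html"}

# Braces mark exactly where the variable name ends.
print(expand("${basepath}/index.html", variables))

# Without braces, the name runs on into the following text ("basepath_old"),
# which is not defined, so the lookup fails in this Python analogy.
try:
    expand("$basepath_old/index.html", variables)
except KeyError as missing:
    print("undefined variable:", missing)
```

Changing the one value in the variables mapping changes every expanded path, which is the same one-location configuration benefit described above.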
By using the variables set with the set directive along with the various environment and include variables, you can use a limited flow-control syntax to generate a certain amount of dynamic content on server-parsed pages.
Conditional flow control is implemented with the directives if, elif, else, and endif.
The syntax of these conditional directives is as follows:
    <!--#if expr="test_condition" -->
    <!--#elif expr="test_condition" -->
    <!--#else -->
    <!--#endif -->
The test condition can be a string, which is considered true if non-empty, or a comparison of two strings. Available comparison operators are =, !=, <, <=, >, and >=. If the second string has the format /string/, the first string is matched against it as a regular expression. Multiple comparisons can be strung together with && (AND) and || (OR). Text that follows an if, elif, or else directive is displayed on the resulting page only when that branch is selected. An example of such a flow structure follows:
    <!--#set var="agent" value="$HTTP_USER_AGENT" -->
    <!--#if expr="$agent = /Mozilla/" -->
    Mozilla!
    <!--#elif expr="$agent = /MSIE/" -->
    Internet Explorer
    <!--#else -->
    Something else!
    <!--#endif -->
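This code displays Mozilla! if you are using a browser that passes Mozilla as part of its USER_AGENT string, Internet Explorer if the string contains MSIE but not Mozilla, and Something else! otherwise. Note that Internet Explorer itself sends Mozilla in its USER_AGENT string, so it matches the first test; test for MSIE first if you need to tell the two apart.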
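The same decision logic can be tried out in Python to see which branch a given user-agent string would select. This is a sketch of the logic only, not of how mod_include evaluates expressions; the function name classify_agent is hypothetical, and the second and third user-agent strings are made up for illustration.

```python
import re

def classify_agent(agent):
    """Mirror the if/elif/else example: the first matching test wins."""
    if re.search(r"Mozilla", agent):
        return "Mozilla!"
    elif re.search(r"MSIE", agent):
        return "Internet Explorer"
    else:
        return "Something else!"

# The user agent from the printenv output earlier in the chapter.
print(classify_agent("Mozilla/4.72 [en] (X11; U; Linux 2.4.4 i686)"))
# An Internet Explorer style string also contains "Mozilla",
# so it is caught by the first test.
print(classify_agent("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"))
print(classify_agent("Lynx/2.8.4"))
```

Branch order matters: swapping the first two tests would let the MSIE check run before the more general Mozilla check.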
The security considerations involved in using server-side includes have been mentioned throughout the chapter, and are just summarized here.
Whenever possible, use the IncludesNOEXEC argument to Options, rather than Includes, so that arbitrary commands cannot be executed from within Web pages.
Make sure that Includes is not turned on if there is any chance that Web users might be able to create content that becomes part of an HTML document, such as with a guest-book application or a discussion forum. This could potentially enable them to execute arbitrary commands on the server.
If you have AllowOverride Options turned on, be aware that users can then put Options +Includes in their .htaccess files. Before you turn on any of the AllowOverride options, consider all the ways in which that freedom might be used.
Server-side includes were extremely popular in the early days of the World Wide Web for things such as hit counters and cute little messages that told you what time it was and where you were visiting from. Fortunately, the appeal has worn off, although you still see them on some beginner sites. However, SSI can still be used for some genuinely useful things, particularly now that the if/elif/else flow-control directives are available. They provide for dynamic content that can be calculated at runtime without having to fork off an entirely new CGI process.
This chapter covered configuring your server to permit SSI and went through the available SSI directives and their uses.
There's a good article about SSI on the Apache Week Web site at http://www.apacheweek.com/features/ssi, which covers most of the same material but offers different examples.