“Whoever we are, we will become what is said about us.”
--David Williamson
MIME, which stands for Multipurpose Internet Mail Extensions, is defined in RFCs 2045–2049, and was initially developed to allow e-mail messages to contain non-ASCII information.
In this chapter, we'll talk about the use of MIME in HTTP, and the available configuration directives that Apache provides to use MIME.
HTTP is heavily dependent on MIME. Each file that is sent to a client browser is prefaced by a MIME header, which tells the browser what sort of document it is receiving. In the absence of a MIME header, the browser would know only that it was receiving a series of bits, and would not know what to make of them. Introduced by a MIME header, that stream of bits becomes a useful piece of data, and the browser can display it appropriately.
HTTP includes a great deal of information in headers, which take the form
Variable: value
and come before the body, or main content portion, of the HTTP transaction. The header that specifies the content type of the data being sent is the Content-Type header, which has a very specific format:
Content-Type: major/minor
major indicates the major category of content to which this content stream belongs. This is a general term that describes the content type, such as text, image, video, audio, or application.
minor indicates the specific type of file that this content should be treated as. Typically, this is a specific file format, such as html, gif, quicktime, mp3, or msword.
Armed with these two pieces of information, the browser has a very specific idea of how to deal with the file. It will know how to render the file into a readable (or viewable, or audible) format, or what external application to launch to deal with the data. For a MIME type of application/unknown, or anything else that the browser does not recognize, the typical behavior is to ask you to download the file and save it somewhere, or to specify some particular application with which to open the file.
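As a rough illustration of the major/minor split described above, the following Python sketch breaks a Content-Type value into its two parts. The helper name split_content_type is made up for this example, and the parsing is deliberately minimal: anything after a semicolon is discarded rather than interpreted.

```python
def split_content_type(value):
    """Split a Content-Type value such as 'text/html' into (major, minor).

    Hypothetical helper for illustration only; parameters such as
    '; charset=...' are dropped, not parsed.
    """
    media_type = value.split(";", 1)[0].strip().lower()
    major, sep, minor = media_type.partition("/")
    if not sep or not major or not minor:
        raise ValueError(f"not a major/minor type: {value!r}")
    return major, minor


for header_value in ("text/html", "image/gif", "application/msword"):
    major, minor = split_content_type(header_value)
    print(f"{header_value}: major={major}, minor={minor}")
```

A browser does something far more elaborate with this information, but the first step is always the same: decide on the general category, then on the specific format.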
mod_mime provides several directives for manipulating MIME information, both for setting particular MIME types, and for telling Apache how to react in the presence of particular MIME types.
The following directives affect the usage of MIME types on files. All these directives, except where specified, can appear anywhere: in the main server configuration file, VirtualHost sections, Directory sections, or .htaccess files.
The TypesConfig directive specifies the location of the MIME types configuration file. This is the primary location where MIME types are mapped to filename extensions. The filename specified is assumed to be relative to the ServerRoot directory, unless it starts with a slash, in which case it is assumed to be an absolute path.
The default location of this file is conf/mime.types.
The format of this file is very simple, listing a MIME type, followed by one or more file extensions that are to be mapped to that type. The following is an excerpt from the mime.types file that comes with Apache:
application/mac-binhex40 hqx
application/octet-stream bin dms lha lzh exe class so dll
application/x-tex tex
audio/x-realaudio ra
image/gif gif
image/ief ief
image/jpeg jpeg jpg jpe
text/html htm html
text/sgml sgml sgm
text/tab-separated-values tsv
video/mpeg mpeg mpg mpe
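To make the file format concrete, here is a small Python sketch that reads text in the mime.types layout and builds a lookup table from extension to MIME type. The function name parse_mime_types and the SAMPLE text are illustrative assumptions; this is not how Apache itself loads the file, only a model of the format.

```python
def parse_mime_types(text):
    """Return a dict mapping lowercase extensions to MIME types.

    Each non-comment line is a MIME type followed by zero or more
    extensions. Illustrative sketch, not Apache's own parser.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


SAMPLE = """\
application/octet-stream bin dms lha lzh exe class so dll
image/gif gif
image/jpeg jpeg jpg jpe
text/html htm html
"""

type_map = parse_mime_types(SAMPLE)
print(type_map["jpg"])
print(type_map["html"])
```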
If you want to add additional MIME types to the server mapping, it is recommended that you use the AddType directive, rather than adding them to the TypesConfig file. The reason for this is very simple: it gives you a more reliable way to keep track of which MIME types ship in the default configuration and which were added later. Knowing what you changed is an important part of figuring out what went wrong when something is not working as expected.
Note that the file extension is case insensitive, and can be specified with or without the period.
The AddType directive has the same syntax as a line in the TypesConfig file, and serves the same purpose. It can be placed in the main server configuration file, in any restricting section (such as a <Directory> section or a <Files> section), or in a .htaccess file.
AddType image/png .png
Note that the extension is case insensitive, and can be expressed with or without the leading dot.
To go with the AddType directive, there is a RemoveType directive. This is particularly useful if you use AddType for a particular directory, but don't want its subdirectories to inherit the configuration.
<Directory /www/docs/products>
    Options +Includes +ExecCGI
    AddType application/x-httpd-cgi cgi
</Directory>
<Directory /www/docs/products>
    Options -ExecCGI
    RemoveType cgi
</Directory>
You only need to mention the name of the extension, and any and all MIME types that are associated with that extension will be removed. Files with that extension will then revert to the default type, specified by the DefaultType directive.
The DefaultType directive is not part of mod_mime, but is instead part of the Apache core. The value of this directive determines how a file is sent to the client if Apache is unable to determine from the file extension what MIME type should be associated with it.
The default value of this directive is text/html, so files of unknown type are served to the client as HTML. This is more important than it might initially appear. What this means is that if you start serving files of a new variety from your site, such as gzipped tar files with a tgz file extension, Apache will quite cheerfully tell the browser that they are HTML files, and you will get garbage in your browser window. This can be especially confusing if you do your testing with Microsoft Internet Explorer, which usually tries to be helpful and figure out the file type, even if there is not a valid MIME type associated with the HTTP transaction. Thus, if you were to test such a file with Internet Explorer, you would mistakenly think that things were correctly configured. This emphasizes the importance of setting up MIME types correctly, and also of testing with more than one browser.
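The fallback behavior can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the function name content_type_for and the tiny TYPE_MAP table are invented for the example, only the final extension is consulted, and the text/html default is the value given in the text above.

```python
import os


def content_type_for(filename, type_map, default_type="text/html"):
    """Pick a type from the final extension, else fall back to default_type.

    Simplified model: real extension handling is more involved.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return type_map.get(ext, default_type)


# Hypothetical mapping with no entry for tgz.
TYPE_MAP = {"html": "text/html", "gif": "image/gif"}

print(content_type_for("logo.gif", TYPE_MAP))
print(content_type_for("release.tgz", TYPE_MAP))
print(content_type_for("release.tgz", TYPE_MAP,
                       default_type="application/octet-stream"))
```

The second call shows the problem described above: with no mapping for tgz, the archive is labeled as HTML. Changing the default, as in the third call, or adding a mapping for the extension avoids it.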
The DefaultType directive, in addition to being set for the main server, can be set in <Directory> sections as well, to specify that files of unknown types in that directory should have a particular type.
Similar to the DefaultType directive, ForceType indicates a MIME type for a set of files. However, rather than setting the type on files of unknown type, it forces all matching files (all files in the <Directory>, or files matched by a <Files> directive, for example) to a particular MIME type, regardless of file extension. Note that it can be used only in one of these sections, or in a .htaccess file.
A file of a particular MIME type can additionally be encoded a particular way to simplify transmission over the Internet. Although this usually will refer to compression, it can also refer to encryption, or to an encoding such as UUencoding, which is designed for transmitting a binary file in an ASCII (text) format.
By using more than one file extension (see the section in this chapter titled “Files with Multiple Extensions”) you can indicate that a file is of a particular type, and also has a particular encoding.
For example, you might have a Microsoft Word document file, which is pkzipped to reduce its size. If the .doc extension is associated with the Microsoft Word file type, and the .zip extension is associated with the pkzip file encoding, then the file Resume.doc.zip would be known to be a pkzipped Word document.
The Encoding directives (AddEncoding and RemoveEncoding) are provided by mod_mime to specify these encodings.
The default, if no Encoding directives are specified, is that there is no encoding; that is, the file is simply sent as is. For this reason there is no DefaultEncoding directive.
The AddEncoding directive associates a particular content encoding with a particular file extension. This directive can be used in any context, for example:
AddEncoding pkzip .zip
This directive will ensure that any file with a file extension of .zip will be delivered with a Content-Encoding: pkzip header.
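From the receiving side, the encoding header tells the client which transformation to undo before interpreting the body according to its content type. The Python sketch below illustrates the idea with gzip rather than pkzip, simply because the standard library can round-trip gzip in memory. The function name decode_body and the plain dict of headers are assumptions made for the example; real HTTP header names are case insensitive, which a plain dict does not model.

```python
import gzip


def decode_body(headers, body):
    """Undo the content encoding named in the headers, if it is one we know.

    headers is a plain dict here, which is a simplification.
    """
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding in ("", "identity"):
        return body  # no encoding: the bytes are the document itself
    raise ValueError(f"unsupported content encoding: {encoding}")


original = b"plain text payload\n"
response_headers = {"Content-Type": "text/plain", "Content-Encoding": "gzip"}
response_body = gzip.compress(original)

print(decode_body(response_headers, response_body))
```

Note that the content type still describes the decoded document: the type says what the file is, and the encoding says how it was wrapped for transmission.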
It is often also desirable to not send the Content-Encoding header in certain situations. For example, you might have a directory of .gz files, perhaps gzipped distributions of software packages. These .gz files are sent with a Content-Encoding of gzip. In a subdirectory, you have files containing descriptions of each of the gzip files. For convenience, you give these files the same name as the file they describe. These description files are to be downloaded as plain text. The following configuration, placed in the main server configuration file (<Directory> sections are not permitted in .htaccess files), would accomplish this:
<Directory /path/to/downloads>
    AddEncoding gzip .gz
</Directory>
<Directory /path/to/downloads/descriptions>
    RemoveEncoding gz
    ForceType text/plain
</Directory>
The ForceType directive is used here to ensure that the files are displayed as plain text in the browser window.
Finally, in addition to the file type and the file encoding, another important piece of information is what language a particular document is in, and what character set the file should be displayed in. For example, the document might be written in the Vietnamese alphabet, or Cyrillic, and should be displayed as such. This information is also transmitted in MIME headers. Although the character set is useful for the browser to determine how to display the document, the language and the character set are also used in the process of content negotiation (see Chapter 10, “Content Negotiation”), which determines which document to give to the client when there are alternative documents in more than one language or more than one character set.
To convey this further information, Apache optionally sends a Content-Language header, to specify the language that the document is in, and can append additional information onto the Content-Type header to indicate the particular character set that should be used to render the information correctly.
Content-Language: en, fr
Content-Type: text/plain; charset=ISO-8859-2
The language specification is the two-letter abbreviation for the language. The charset parameter is the name of the particular character set that should be used. For a full listing of the two-letter abbreviations that may be used, see the documents ISO 639 and ISO 639-2.
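As a hedged illustration of how a client might pull these pieces apart, the following Python sketch uses the standard library's email.message.Message class, which understands MIME-style headers. The describe function and the dictionary it returns are inventions for this example, and the header values are simply the ones shown above.

```python
from email.message import Message


def describe(raw_headers):
    """Extract type, charset, and languages from (name, value) header pairs.

    Illustrative helper; not part of Apache or of any HTTP client library.
    """
    msg = Message()
    for name, value in raw_headers:
        msg[name] = value
    languages = [
        tag.strip()
        for tag in msg.get("Content-Language", "").split(",")
        if tag.strip()
    ]
    return {
        "type": msg.get_content_type(),        # major/minor, lowercased
        "charset": msg.get_content_charset(),  # lowercased, or None
        "languages": languages,
    }


info = describe([
    ("Content-Language", "en, fr"),
    ("Content-Type", "text/plain; charset=ISO-8859-2"),
])
print(info)
```

The charset travels as a parameter on the Content-Type header, while the language has a header of its own, which is why the two are read in different ways here.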
Mirroring the directives for MIME types, mod_mime provides directives for adding and removing associations between languages or character sets and particular file extensions.
These directives are AddCharset, RemoveCharset, AddLanguage, RemoveLanguage, and DefaultLanguage.
There is no DefaultCharset directive, because the default character set is defined by the HTTP 1.1 specification as ISO-8859-1, and so it is unnecessary to have a directive to set this.
The AddCharset directive associates a character set with a file extension, and causes Apache to send the character set information with the Content-Type header when files with that extension are served. This directive can be set in any context, for example:
AddCharset ISO-2022-JP .jis
The RemoveCharset directive removes any association attached to the given file extension, for example:
RemoveCharset .jis
The AddLanguage directive creates an association between a file extension and a particular language. Apache will send a Content-Language HTTP header, indicating the language of the document, when files with this extension are served. This directive can be set in any context, for example:
AddLanguage en .en
AddLanguage fr .fr
Note that there can be only one language associated with any given file extension.
The RemoveLanguage directive removes any language associations currently in effect for the specified file extension, for example:
RemoveLanguage .fr
With the capability to set attributes on a file by virtue of the file's extensions, the obvious question is: What if I want to set the language and the character set? Or the encoding and the language? Or all three?
This is accomplished very simply by giving the file several extensions. You can stack as many file extensions as you like onto a file, and the file attributes are accumulated, with the following rules:
If you give two (or more) file extensions that map the same attribute (such as two extensions that specify the file language, for example) then the one seen last (reading from left to right) is the one that is used.
index.html.en.fr will be served with a Content-Language header specifying that it is French, not English, and will be served with a Content-Type of text/html.
If you use an extension that is not recognized at all, it will cause Apache to forget the extensions it had figured out so far, and start all over again.
example.fr.pop.gif will be served with a Content-Type of image/gif, but with no Content-Language, because the pop extension does not map to anything, and thus causes the mappings up to that point to be forgotten.
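The two rules above can be modeled with a short Python sketch. It is an approximation for illustration, not Apache's actual code: the resolve function and the TYPES and LANGUAGES tables are assumptions, and only the type and language attributes are tracked.

```python
def resolve(filename, types, languages):
    """Apply the two stacking rules to a filename's extensions.

    Returns (content_type, language); either may be None.
    """
    content_type = None
    language = None
    # The first dot-separated piece is the base name, not an extension.
    for ext in filename.lower().split(".")[1:]:
        recognized = False
        if ext in types:
            content_type = types[ext]  # a later extension overrides an earlier one
            recognized = True
        if ext in languages:
            language = languages[ext]
            recognized = True
        if not recognized:
            # Unknown extension: forget everything gathered so far.
            content_type = None
            language = None
    return content_type, language


# Hypothetical mappings, as AddType and AddLanguage might establish them.
TYPES = {"html": "text/html", "gif": "image/gif"}
LANGUAGES = {"en": "en", "fr": "fr"}

print(resolve("index.html.en.fr", TYPES, LANGUAGES))
print(resolve("example.fr.pop.gif", TYPES, LANGUAGES))
```

Running it on the two filenames from the text gives French HTML for the first, and a GIF with no language for the second, because pop is unrecognized and wipes out the fr seen before it.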
mod_mime also defines directives for specifying handlers. A handler is a process that is defined for dealing with files of a particular type. For example, if we associate the handler cgi-script with files with an extension of .cgi, then those files will be executed, and the output sent to the client, rather than sending the file itself.
The directives AddHandler, RemoveHandler, and SetHandler are all provided by mod_mime.
Handlers have a chapter of their own, Chapter 14, “Handlers and Filters,” where these directives are discussed in detail.
HTTP relies heavily on MIME headers for the delivery of content, both to tell the browser what to expect and to have it display the content correctly. mod_mime provides most of the directives that deal with setting the MIME types on particular files.