Chapter 8. MIME and File Types

 

“Whoever we are, we will become what is said about us.”

 
 --David Williamson

MIME, which stands for Multipurpose Internet Mail Extensions, is defined in RFCs 2045–2049, and was initially developed to allow e-mail messages to contain non-ASCII information.

In this chapter, we'll talk about the use of MIME in HTTP, and the configuration directives that Apache provides for working with MIME types.

MIME and HTTP

HTTP is heavily dependent on MIME. Each file that is sent to a client browser is prefaced by a MIME header, which tells the browser what sort of document it is receiving. In the absence of a MIME header, the browser would know only that it was receiving a series of bits, and would not know what to make of them. Introduced by a MIME header, that stream of bits becomes a useful piece of data, and the browser can display it appropriately.

HTTP carries a great deal of information in headers, which take the form Variable: value and come before the body, or main content portion, of the HTTP transaction. The header that specifies the content type of the data being sent is the Content-Type header, which has a very specific format:

Content-Type: major/minor

major indicates the major category of content to which this content stream belongs. This is a general term that describes the content type, such as text, image, video, audio, and application.

minor indicates the specific type of file that this content should be treated as. Typically, this is a specific file format, such as html, gif, quicktime, mp3, or msword.

Armed with these two pieces of information, we have a very specific idea of how to deal with the file. The browser will know how to render the file into a readable (or viewable, or audible, or whatever) format, or what external application to launch to deal with the data. Or, for a MIME type of application/unknown, or anything else that the browser does not recognize, the typical behavior is to ask you to download the file and save it somewhere, or specify some particular application with which to open the file.

MIME Types Configuration Directives

mod_mime provides several directives for manipulating MIME information, both for setting particular MIME types, and for telling Apache how to react in the presence of particular MIME types.

MIME Types Configuration

The following directives control how MIME types are assigned to files. Except where noted, these directives can appear anywhere: in the main server configuration file, in VirtualHost sections, in Directory sections, or in .htaccess files.

TypesConfig

The TypesConfig directive specifies the location of the MIME types configuration file. This is the primary location where MIME types are mapped to filename extensions. The filename is taken to be relative to the ServerRoot directory unless it starts with a slash, in which case it is treated as an absolute path. Unlike the other directives in this section, TypesConfig can appear only in the main server configuration.

The default location of this file is conf/mime.types.
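As a minimal sketch of the two path forms, the fragment below assumes a ServerRoot of /usr/local/apache and an alternative types file at /etc/mime.types. Both paths are illustrative only and will differ from one installation to the next.

```apache
# Relative path: resolved against ServerRoot
# (the ServerRoot value here is only an illustration).
ServerRoot /usr/local/apache
TypesConfig conf/mime.types

# Absolute path: used exactly as written (also illustrative).
# TypesConfig /etc/mime.types
```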

The format of this file is very simple, listing a MIME type, followed by one or more file extensions that are to be mapped to that type. The following is an excerpt from the mime.types file that comes with Apache:

application/mac-binhex40    hqx
application/octet-stream    bin dms lha lzh exe class so dll
application/x-tex       tex
audio/x-realaudio       ra
image/gif           gif
image/ief           ief
image/jpeg          jpeg jpg jpe
text/html           htm html
text/sgml           sgml sgm
text/tab-separated-values   tsv
video/mpeg          mpeg mpg mpe

If you want to add MIME types to the server's mapping, it is recommended that you use the AddType directive rather than editing the TypesConfig file. The reason is simple: it gives you a reliable way to keep track of which MIME types ship in the default configuration and which were added later. Knowing what you changed is an important part of figuring out what went wrong when something is not working as expected.

Note that the file extension is case insensitive, and can be specified with or without the period.

AddType

The AddType directive has the same syntax as a line in the TypesConfig file, and serves the same purpose. It can be placed in the main server configuration file, in any restricting section (such as a <Directory> section, or a <Files> section), or in a .htaccess file.

AddType image/png .png

Note that the extension is case insensitive[1], and can be expressed with or without the leading dot.
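Because AddType follows the mime.types line format, one directive can carry several extensions. The following sketch shows that form, along with the equivalence of dotted and undotted, upper and lowercase extensions. The particular types and extensions are chosen only for illustration.

```apache
# One type, several extensions on a single line,
# just as in a mime.types entry.
AddType image/jpeg jpeg jpg jpe

# Leading dot and letter case do not matter:
# both lines below map the same extension.
AddType application/x-tar .tgz
AddType application/x-tar TGZ
```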

RemoveType

To go with the AddType directive, there is a RemoveType directive. This is particularly useful if you use AddType for a particular directory, but don't want its subdirectories to inherit the configuration.

<Directory /www/docs/products>
Options +Includes +ExecCGI
AddType application/x-httpd-cgi cgi
</Directory>

# A subdirectory of the directory above (the name is only an illustration)
<Directory /www/docs/products/static>
Options -ExecCGI
RemoveType cgi
</Directory>

You need only name the extension, and any MIME type associated with that extension is removed. Files with that extension then revert to the default type, specified by the DefaultType directive.

DefaultType

The DefaultType directive is not part of mod_mime, but is instead part of the Apache core. The value of this directive determines how files are sent to the client if Apache is unable to determine from the file extension what MIME type should be associated with them.

The default value of this directive[2] is text/html, so files of unknown type are served to the client as HTML. This is more important than it might initially appear. If you start serving a new kind of file from your site, such as gzipped tar files with a tgz file extension, Apache will quite cheerfully tell the browser that they are HTML files, and you will get garbage in your browser window. This can be especially confusing if you do your testing with Microsoft Internet Explorer, which usually tries to be helpful and work out the file type itself, even when no valid MIME type is associated with the HTTP transaction. If you were to test such a file with Internet Explorer, you might mistakenly conclude that things were correctly configured. This underlines the importance of setting up MIME types correctly, and of testing with more than one browser.

The DefaultType directive, in addition to being set for the main server, can be set in <Directory> sections as well, to specify that files of unknown types in that directory should have a particular type.
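A short sketch of both scopes follows. It assumes the Apache releases this chapter describes (later releases are understood to have disabled DefaultType), and the directory path is hypothetical.

```apache
# Server-wide fallback for files whose extension maps to no type.
DefaultType application/octet-stream

# A directory that holds only GIF images, some of them
# saved without a .gif extension (the path is illustrative).
<Directory /www/docs/images/legacy>
    DefaultType image/gif
</Directory>
```

Choosing application/octet-stream as the server-wide fallback means that an unrecognized file is offered for download rather than rendered as garbled text.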

ForceType

Similar to the DefaultType directive, ForceType indicates a MIME type for a set of files. However, rather than setting the type on files of unknown type, it forces all matching files (all files in the <Directory>, or files matched by a <Files> directive, for example) to a particular MIME type, regardless of file extension. Note that it can be used only in one of these sections, or in a .htaccess file.
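The two scopes mentioned above might look like the following sketch. The directory path and the filename pattern are assumptions made for the example, not part of any default configuration.

```apache
# Every file under this directory is sent as image/gif,
# whatever its extension (the path is illustrative).
<Directory /www/docs/images/gifs>
    ForceType image/gif
</Directory>

# Only files matching this name pattern are forced to text/plain.
<Files "README*">
    ForceType text/plain
</Files>
```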

Encoding

A file of a particular MIME type can additionally be encoded a particular way to simplify transmission over the Internet. Although this usually will refer to compression, it can also refer to encryption, or to an encoding such as UUencoding, which is designed for transmitting a binary file in an ASCII (text) format.

By using more than one file extension (see the section in this chapter titled “Files with Multiple Extensions”) you can indicate that a file is of a particular type, and also has a particular encoding.

For example, you might have a Microsoft Word document file, which is pkzipped to reduce its size. If the .doc extension is associated with the Microsoft Word file type, and the .zip extension is associated with the pkzip file encoding, then the file Resume.doc.zip would be known to be a pkzipped Word document.

The Encoding directives (AddEncoding and RemoveEncoding) are provided by mod_mime to specify these encodings.

The default, if no Encoding directives are specified, is that there is no encoding—that is, the file is simply sent as is. For this reason there is no DefaultEncoding directive.

AddEncoding

The AddEncoding directive associates a particular content encoding with a particular file extension. This directive can be used in any context, for example:

AddEncoding pkzip .zip

This directive will ensure that any file with a file extension of .zip will be delivered with a Content-encoding: pkzip header.
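Compression encodings are the more common case in practice. A minimal sketch, assuming the conventional x-gzip and x-compress encoding names and the usual extensions for them:

```apache
# Compressed files: the client is told which encoding to undo
# before handling the underlying content type.
AddEncoding x-gzip .gz
AddEncoding x-compress .Z
```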

RemoveEncoding

It is often desirable not to send the Content-encoding header in certain situations. For example, you might have a directory of .gz files (gzipped distributions of software packages, perhaps). These .gz files are sent with a Content-encoding of gzip. In a subdirectory, you have files containing descriptions of each of the gzip files. For convenience, you give these files the same name as the file they describe. These description files are to be downloaded as plain text. The following configuration, placed in the main server configuration file, would accomplish this. (Because <Directory> sections are not permitted in .htaccess files, the same directives would appear there without the surrounding sections.)

<Directory /path/to/downloads>
  AddEncoding gzip .gz
</Directory>

<Directory /path/to/downloads/descriptions>
  RemoveEncoding gz
  ForceType text/plain
</Directory>

The ForceType directive is used here to ensure that the files are displayed as plain text in the browser window.
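If we prefer per-directory files to <Directory> sections, the same effect can be sketched with a .htaccess file placed in the descriptions subdirectory. This assumes the server configuration permits these directives there, typically through AllowOverride FileInfo.

```apache
# Contents of /path/to/downloads/descriptions/.htaccess
# (no <Directory> wrapper: the file's location sets the scope).
RemoveEncoding gz
ForceType text/plain
```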

Character Sets and Languages

Finally, in addition to the file type and the file encoding, another important piece of information is what language a particular document is in, and what character set the file should be displayed in. For example, the document might be written in the Vietnamese alphabet, or Cyrillic, and should be displayed as such. This information is also transmitted in MIME headers. Although the character set is useful for the browser to determine how to display the document, the language and the character set are also used in the process of content negotiation (see Chapter 10, “Content Negotiation”). Content negotiation determines which document to give to the client when there are alternative documents in more than one language or more than one character set.

To convey this further information, Apache optionally sends a Content-Language header, to specify the language that the document is in, and can append additional information onto the Content-Type header to indicate the particular character set that should be used to render the information correctly.

Content-Language: en, fr
Content-Type: text/plain; charset=ISO-8859-2

The language specification is the two-letter abbreviation for the language. The charset is the name of the particular character set that should be used. For a full listing of the two-letter abbreviations that may be used, see the documents ISO 639 and ISO 639-2.

Mirroring the directives for MIME types, mod_mime provides directives for adding and removing associations between languages or character sets and particular file extensions.

These directives are AddCharset, RemoveCharset, AddLanguage, RemoveLanguage, and DefaultLanguage.

There is no DefaultCharset directive in mod_mime, because the HTTP/1.1 specification defines the default character set as ISO-8859-1, so a directive to set it is unnecessary.

AddCharset

The AddCharset directive associates a character set with a file extension, and causes Apache to send the character set information with the Content-type header when files with that extension are served. This directive can be set in any context, for example:

AddCharset ISO-2022-JP .jis
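A few more mappings in the same style are sketched below. The extensions are conventions chosen for the example rather than anything Apache requires, and the comment describes the header we would expect if .html is already mapped to text/html.

```apache
AddCharset EUC-JP     .euc
AddCharset ISO-8859-2 .latin2
AddCharset UTF-8      .utf8

# With these mappings, a file named page.html.utf8 would be expected
# to go out with: Content-Type: text/html; charset=UTF-8
```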

RemoveCharset

The RemoveCharset directive removes any association attached to the given file extension, for example:

RemoveCharset .jis

AddLanguage

The AddLanguage directive creates an association between a file extension and a particular language. Apache will send a Content-language HTTP header, indicating the language of the document, when files with this extension are served. This directive can be set in any context, for example:

AddLanguage en .en
AddLanguage fr .fr

Note that there can be only one language associated with any given file extension.

RemoveLanguage

The RemoveLanguage directive removes any language associations currently in effect for the specified file extension, for example:

RemoveLanguage .fr

DefaultLanguage

The DefaultLanguage directive determines the Content-language header that should be sent with files for which no explicit language association has been set. If your site is primarily an English-language site, for example, you should set this to en as shown in this example:

DefaultLanguage en
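The default can also be narrowed to part of a site. A minimal sketch, assuming a hypothetical /www/docs/fr directory that holds French documents without language extensions:

```apache
# Site-wide default for files with no language extension.
DefaultLanguage en

# A section of the site written in French (illustrative path).
<Directory /www/docs/fr>
    DefaultLanguage fr
</Directory>
```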

Files with Multiple Extensions

With the capability to set attributes on a file by virtue of the file's extensions, the obvious question is: What if I want to set the language and the character set? Or the encoding and the language? Or all three?

This is accomplished very simply by giving the file several extensions. You can stack as many file extensions as you like onto a file, and the file attributes are accumulated, with the following rules:

  • If you give two (or more) file extensions that map the same attribute (such as two extensions that specify the file language), then the one seen last (reading from left to right) is the one that is used. For example, index.html.en.fr will be served with a Content-language header specifying that it is French, not English, and will be served with a Content-type of text/html.

  • If you use an extension that is not recognized at all, it will cause Apache to forget the extensions it had figured out so far, and start all over again. For example, example.fr.pop.gif will be served with a Content-type of image/gif, but with no Content-language, because the pop extension does not map to anything, and thus causes the mappings up to that point to be forgotten.
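The accumulation of attributes described above can be sketched with one mapping of each kind. The extensions used here (.latin1 in particular) are assumptions made for the example, and the headers in the comments are what we would expect from these mappings rather than captured server output.

```apache
AddType     text/html   .html
AddLanguage en          .en
AddLanguage fr          .fr
AddCharset  ISO-8859-1  .latin1
AddEncoding gzip        .gz

# Under these mappings we would expect:
#   index.html.fr             Content-Type: text/html
#                             Content-Language: fr
#   index.html.en.latin1.gz   Content-Type: text/html; charset=ISO-8859-1
#                             Content-Language: en
#                             Content-Encoding: gzip
```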

Handlers

mod_mime also defines directives for specifying handlers. A handler is a process that is defined for dealing with files of a particular type. For example, if we associate the handler cgi-script with files with an extension of .cgi, then those files will be executed, and the output sent to the client, rather than sending the file itself.

The directives AddHandler, RemoveHandler, and SetHandler are all provided by mod_mime.
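As a brief sketch of the cgi-script case mentioned above, the fragment below maps the .cgi extension to that handler. The directory path is hypothetical, and CGI execution must also be allowed wherever the scripts live.

```apache
# Treat .cgi files as programs to run rather than documents to send.
AddHandler cgi-script .cgi

# Execution must also be permitted where the scripts live
# (the path is illustrative).
<Directory /www/docs/scripts>
    Options +ExecCGI
</Directory>
```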

Handlers actually have a chapter of their own, Chapter 14, “Handlers and Filters,” where these will be discussed in detail.

Summary

HTTP relies heavily on MIME headers for the delivery of content, both to tell the browser what to expect and to have it display the content correctly. mod_mime provides most of the directives that deal with setting the MIME types on particular files.



[1] That is, it can be upper or lowercase.

[2] The default default type, if you will.
