CHAPTER 7

Handling HTTP

The Hypertext Transfer Protocol (HTTP) is the fundamental language for communication over the Web. It’s spoken by both Web servers and Web browsers, along with a variety of specialty tools for dealing with the Web.

The Python community has done a tremendous amount of work to standardize the behavior of applications that interact with HTTP, culminating in PEP-333,1 the Web Server Gateway Interface (WSGI). Since Django follows the WSGI specification, many of the details listed in this chapter are a direct result of compliance with PEP-333.

Requests and Responses

Because HTTP is a stateless protocol, at its heart is the notion of a request and a response. Clients issue a request to the server, which returns a response containing the information requested by the client or an error indicating why the request couldn’t be fulfilled.

While requests and responses follow a detailed specification, Django provides a pair of Python objects that are designed to make the protocol much easier to deal with in your own code. A basic working knowledge of the protocol is useful, but most of the details are handled behind the scenes. These objects are described in this section, along with notes indicating the relevant portions of the specification that should be referenced.

HttpRequest

As described in Chapter 4, every Django view receives, as its first argument, an object representing the incoming HTTP request. This object is an instance of the HttpRequest class, which encapsulates a variety of details concerning the request, as well as some utility methods for performing useful functions.

The base HttpRequest class lives at django.http, but individual server connectors will define a subclass with additional attributes or overridden methods that are specific to the Web server being utilized. Any overridden methods or attributes should behave as documented here, and any additional information will be best documented in the code for the server interface itself.

HttpRequest.method

The HTTP specification outlines a variety of verbs that can be used to describe the type of request being performed. This is typically referred to as its method, with different request methods having specific expectations of how they should be handled. In Django, the method being used for the request is represented as the method attribute of the HttpRequest object. It will be included as a standard string, with the method name in all uppercase letters.

Each method describes what the server should do with the resource identified by the URL. Most Web applications will only implement GET and POST, but a few others are worth explaining here as well. Further details on these—and others not listed here—can be found in the HTTP specification,2 as well as many other resources on the Web.

  • DELETE—Requests that the resource be deleted. Web browsers don’t implement this method, so its use is limited to Web service applications. In typical Web browser applications, such operations are done with a POST request, since GET requests aren’t allowed to have side effects, such as removal of the resource.
  • GET—Retrieves the resource specified by the URL. This is, by far, the most common type of request made on the Web, as every standard retrieval of a Web page is done with a GET request. As noted in the “Safe Methods” section, GET requests are assumed to have no side effects on the server; they should retrieve the specified resource and do nothing else.
  • HEAD—Retrieves some information about the resource without getting the entire contents. Specifically, the response to a HEAD request should return exactly the same headers as a GET request, only without anything in the body of the response. Web browsers don’t implement this method, but since the server-side operation is essentially just a GET request without a response body, it is rarely missed. In Web service applications, a HEAD request can be a low-bandwidth way to retrieve information about a resource, such as whether it exists, when it was last updated or the size of its content.
  • POST—Requests that the attached data be stored in some way related to the resource specified by the URL. This could mean comments on a blog post or news article, answers to a question, replies to a Web-based email or any number of other related situations.

    This definition is only valid in Web service environments, where a differentiation can be made between PUT and POST. In standard Web browsers, only GET and POST are reliably available, so POST is used for any situation that modifies information on the server. Using POST to submit data from a form is little more than a footnote in the official HTTP specification, but is the most popular use of the method.

  • PUT—Requests that the attached data be stored at the resource specified by the URL. This could be seen as a “create” or “replace” operation, depending on whether the resource already exists. This method isn’t traditionally available in Web browsers, though, so its use is limited to Web service applications. In a standard Web browser, the operation specified by PUT is done with a POST request instead.
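A view often branches on the method attribute to decide what to do. The following is a minimal sketch in Python 3 syntax; StubRequest and describe() are hypothetical names used only for illustration, standing in for Django's HttpRequest and a real view.

```python
class StubRequest:
    """Hypothetical stand-in for HttpRequest; only the method attribute is modeled."""

    def __init__(self, method):
        self.method = method  # an uppercase string, as described above


def describe(request):
    """Return a short description of what the request asks the server to do."""
    if request.method in ('GET', 'HEAD'):
        return 'retrieve'
    if request.method == 'POST':
        return 'store related data'
    if request.method == 'PUT':
        return 'create or replace'
    if request.method == 'DELETE':
        return 'delete'
    return 'unsupported'
```

Because the method name is always uppercase, a plain string comparison is enough; no case normalization is needed.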

“Safe” Methods

As alluded to in the previous section, there is an important distinction to be made among various types of HTTP requests. The specification refers to GET and HEAD as “safe” methods, which only retrieve the resource specified by the URL, without making any changes on the server at all. To be explicit, a view that processes a GET or HEAD request shouldn’t make any changes except those that are incidental to retrieving the page. An example of an allowed change is updating a count that indicates how many times the page was viewed.

The goal of safe methods is to allow the same request to be made more than once and at various times, without any adverse effects. This assumption allows GET requests to be used by bookmarks and browser histories without a warning to the user when the request is made more than once.

“Idempotent” Methods

In addition to safe methods, the HTTP specification describes PUT and DELETE as “idempotent,” meaning that, even though they are intended to make changes on the server, those changes are reliable enough that calling the same request with the same body multiple times will always make the same changes.

In the case of PUT, the resource would be created the first time the request is performed, and each subsequent request would simply replace the resource with the same data that was originally submitted, thus leaving it the same. For DELETE, each subsequent request after the resource was originally deleted would result in an error, indicating that the resource isn’t present, thus leaving the state of the resource the same each time. On the other hand, POST is expected to make changes or additions on each request. To represent this situation, Web browsers present a message when a POST request is performed more than once, warning the user that subsequent requests could cause problems.
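The difference can be illustrated without any HTTP machinery at all. This sketch, in Python 3 syntax, uses a hypothetical in-memory store and comment list; the function names are assumptions chosen to mirror the methods they imitate.

```python
store = {}      # hypothetical resource store, keyed by URL path
comments = []   # hypothetical collection that POST adds to


def put(path, body):
    """Create or replace the resource; the end state is the same every time."""
    store[path] = body


def delete(path):
    """Remove the resource; return False if it was already gone."""
    return store.pop(path, None) is not None


def post(body):
    """Add a new entry; every call changes the state again."""
    comments.append(body)


# Repeating PUT leaves a single resource; repeating POST leaves two comments.
put('/page/', 'hello')
put('/page/', 'hello')
post('nice')
post('nice')
```

After these calls, the store holds one entry while the comment list holds two, which is why browsers warn before repeating a POST.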

HttpRequest.path

This attribute contains the complete path that was requested, without any query-string parameters attached. This can be used to identify the resource being requested, without relying on which view will be called or how it will behave.

Accessing Submitted Data

Any time a request comes in, it can potentially be accompanied by a variety of data provided by the Web browser. Processing this information is key to making a Web site dynamic and interactive, so Django makes it easy and flexible. Just as there are many ways to submit data to a Web server, there are as many ways to access that data once it arrives.

Data that comes in using the standard query-string format3 sent by most browsers is automatically parsed into a special type of dictionary class called QueryDict. This is an immutable subclass of MultiValueDict, which means that it functions mostly like a dictionary, but with a few added options for handling multiple values for each key in the dictionary.

The most significant detail of QueryDict is that it’s instantiated with a query-string from an incoming request. For more information on the details of how to access values in a QueryDict, see the details for MultiValueDict in Chapter 9.
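The idea of several values per key can be seen with the standard library alone. This sketch uses Python 3's urllib.parse.parse_qs rather than QueryDict itself, so it shows the shape of the data and not Django's API; the query-string is made up for illustration.

```python
from urllib.parse import parse_qs

# A query-string in which the "tag" key appears twice.
values = parse_qs('tag=python&tag=django&page=2')

# Each key maps to a list holding every value submitted for it.
all_tags = values['tag']
last_tag = values['tag'][-1]
```

A plain dictionary could keep only one of the two tag values, which is the limitation a multi-value dictionary is meant to remove.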

HttpRequest.GET

If the request came in with the GET method, its GET attribute will be a QueryDict containing all the values that were included in the query-string portion of the URL. Of course, while there’s no technical restriction on when GET can be used to get parameters out of a URL, the goal of clean URLs limits the situations where it’s most advantageous.

In particular, it’s important to separate parameters that identify a resource from those that customize how the resource is retrieved. This is a subtle, but important, distinction. Consider the following examples:

  • /book/pro-django/chapter07/
  • /news/2008/jun/15/website-launched/
  • /report/2008/expenses/?ordering=category

As you can see, most of the data sent to the view for GET requests should be placed in the URL itself, rather than the query-string. This will help search engines index them more efficiently, while also making it easier for users to remember them and communicate them with others. As with many other principles, this isn’t an absolute rule, so keep query-strings and the GET attribute in your toolbox, but use them with care.

HttpRequest.POST

If the request comes in with a PUT or POST method using a standard HTML form, this will be a QueryDict containing all the values submitted with the form. The POST attribute will be populated for all standard forms, regardless of the encoding type, with or without files.

However, the HTTP specification allows these requests to supply data in any format, so if the incoming data doesn’t fit the format of a query-string, HttpRequest.POST will be empty, and the data will have to be read in directly through HttpRequest.raw_post_data.

HttpRequest.FILES

If an incoming PUT or POST request includes any uploaded files, those files will be stored away in the FILES attribute, which is a MultiValueDict, with each value being a django.core.files.uploadedfile.UploadedFile object. This is a subclass of the File object described later, in Chapter 9, providing a few extra attributes specific to uploaded files.

  • content_type—The Content-Type associated with the file, if any was provided. Web browsers typically assign this based on the last part of the filename, though a Web service call could specify this more accurately based on the actual type of content.
  • charset—The character set that was specified for the uploaded file’s content.

HttpRequest.raw_post_data

Any time a request comes in with data in the body of the request, as is done for PUT and POST, the raw_post_data attribute provides access to this content, without any parsing. This isn’t typically necessary for most Web sites, as the GET and POST attributes are more appropriate for the most common types of requests. Web services may accept data in any format, and many use XML as a primary means of data transfer.
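As a rough illustration of handling an unparsed body, the sketch below parses a small XML payload with the standard library. It is written in Python 3 syntax; parse_body() is a hypothetical helper, and in a real view the two arguments would come from the request's Content-Type header and its raw body.

```python
import xml.etree.ElementTree as ET


def parse_body(content_type, raw_body):
    """Parse a raw request body (bytes) according to its declared content type."""
    # Ignore parameters such as "; charset=utf-8" when comparing the type.
    if content_type.split(';')[0].strip() == 'application/xml':
        root = ET.fromstring(raw_body)
        return {child.tag: child.text for child in root}
    raise ValueError('unsupported content type: %s' % content_type)


data = parse_body(
    'application/xml; charset=utf-8',
    b'<book><title>Pro Django</title><chapter>7</chapter></book>',
)
```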

HttpRequest.META

When a request comes in, there is a significant amount of information related to the request that doesn’t come through in a query-string and isn’t available in the GET or POST attributes on the request. Instead, data regarding where the request came from and how it got to the server is stored in the request’s META attribute. Details of which values are available in META can be found in PEP-333.

In addition, each request is accompanied by a number of headers, which describe various options the client would like to make known. Exactly what these types of headers can contain is specified in the HTTP specification,4 but they typically control things like a preferred language, allowable content-types and information about the Web browser.

These headers are also stored in META, but in a form slightly altered from how they came in originally. All HTTP header names become uppercase, are prefixed with HTTP_ and have all of their dashes replaced with underscores.

  • Host becomes HTTP_HOST.
  • Referer becomes HTTP_REFERER.
  • X-Forwarded-For becomes HTTP_X_FORWARDED_FOR.
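The renaming rule is simple enough to express as a short function. This is a sketch in Python 3 syntax, and meta_key() is a hypothetical helper rather than part of Django. It also accounts for one exception defined by PEP-333: Content-Type and Content-Length are stored without the HTTP_ prefix.

```python
def meta_key(header_name):
    """Convert an HTTP header name to the key under which it appears in META."""
    key = header_name.upper().replace('-', '_')
    # Under PEP-333, these two headers are stored without the HTTP_ prefix.
    if key in ('CONTENT_TYPE', 'CONTENT_LENGTH'):
        return key
    return 'HTTP_' + key
```

A view could then look up a header with something like request.META.get(meta_key('Referer')), which tolerates the header being absent.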

HttpRequest.COOKIES

Since each HTTP request is a fresh connection between the client and the server, cookies are used as a way to identify clients that make multiple requests. In a nutshell, cookies are little more than a way to send a name and associated value to a Web browser, which that browser will then send back each time it makes a new request to the Web site.

While cookies are set during the response phase of the process, as documented under HttpResponse, the task of reading cookies from an incoming request is quite simple. The COOKIES attribute of the request is a standard Python dictionary mapping names of cookies to the values that were previously sent.

Keep in mind that this dictionary will contain entries for all cookies sent by the browser, even if they were set by another application on the same server. The HttpResponse section later in this chapter covers the specific rules of how a browser decides which cookies to send with a particular request and how to control that behavior.

HttpRequest.get_signed_cookie(key[, …])

If you store information in a cookie that could be used against you if it were tampered with, you can opt to sign your cookies and validate those signatures when reading the cookies back in. The signatures themselves are provided using the HttpResponse.set_signed_cookie() method described later in this chapter, but when reading them in a request, you’ll need to use this method.

You can control the behavior of your cookie retrieval using a few additional arguments:

  • default=RAISE_ERROR—This argument allows you to specify a default value that should be returned if the requested key was not found or was invalid. This is equivalent to passing a default value into a standard dictionary’s get() method. If you don’t supply a value, this method will raise a standard KeyError if the key is missing or django.core.signing.BadSignature if the signature is invalid.
  • salt=''—This is a complement to the same argument in the set_signed_cookie() method. It allows you to use the same key in different aspects of your application, perhaps on multiple domains, without risk of the signature being reused from one use to another. This must match the value you provide when setting the cookie in order for the signature check to match.
  • max_age=None—Cookie signatures carry a timestamp, so that they can’t be reused longer than intended. If the age of a given cookie exceeds the max_age you supply, you’ll get a django.core.signing.SignatureExpired exception. By default, the age isn’t checked when validating the signature.
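To make the roles of salt and max_age concrete, here is a simplified sketch of timestamped signing built on Python 3's hmac module. It is not Django's actual algorithm or cookie format; SECRET, sign(), unsign() and the two exception classes are stand-ins defined here for illustration, and the now argument exists only so the example can be run with fixed times.

```python
import hashlib
import hmac
import time

SECRET = b'not-a-real-secret'  # hypothetical; Django would use settings.SECRET_KEY


class BadSignature(Exception):
    """Stand-in for django.core.signing.BadSignature."""


class SignatureExpired(BadSignature):
    """Stand-in for django.core.signing.SignatureExpired."""


def _signature(key, payload, salt):
    message = ('%s:%s:%s' % (salt, key, payload)).encode('utf-8')
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign(key, value, salt='', now=None):
    """Return 'value:timestamp:signature', suitable for storing in a cookie."""
    timestamp = int(time.time() if now is None else now)
    payload = '%s:%d' % (value, timestamp)
    return '%s:%s' % (payload, _signature(key, payload, salt))


def unsign(key, signed, salt='', max_age=None, now=None):
    """Return the original value, or raise if it was altered or is too old."""
    payload, _, signature = signed.rpartition(':')
    expected = _signature(key, payload, salt)
    if not hmac.compare_digest(signature.encode('utf-8'), expected.encode('utf-8')):
        raise BadSignature('signature does not match')
    value, _, timestamp = payload.rpartition(':')
    age = (time.time() if now is None else now) - int(timestamp)
    # The age is only checked when the caller asks for it.
    if max_age is not None and age > max_age:
        raise SignatureExpired('cookie is %d seconds old' % age)
    return value
```

With this sketch, a value signed at time 1000 and read at time 1500 is accepted when max_age is 600 or omitted, and rejected when max_age is 60.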

HttpRequest.get_host()

Many server configurations allow a single Web application to respond to requests sent to multiple different domain names. To help with these situations, the get_host() method of the incoming request allows a view to identify the name that the Web browser used to reach the Web site.

In addition to the host name used to make the request, the value returned from this method will include a port number if the server was configured to respond on a nonstandard port.

HttpRequest.get_full_path()

In addition to the host information, the get_full_path() method returns the entire path portion of the URL: everything after the protocol and domain information. This includes the full path that was used to determine which view to use, as well as any query-string that was provided.

HttpRequest.build_absolute_uri(location=None)

This method generates an absolute URL for the provided location, if any. If no location is supplied explicitly, the request’s current URL is returned, including the query-string. The exact behavior of the method if the location is provided depends on what value is passed in.

  • If the value contains a fully-qualified URL—including the protocol—that URL is already absolute and is returned as provided.
  • If the value begins with a forward slash (/), it is appended to the protocol and domain information of the current URL, then returned. This will generate an absolute URL for the provided path, without having to hard-code the server information.
  • Otherwise, the value is assumed to be a path relative to the request’s current URL, and the two will be joined together using Python’s urlparse.urljoin() utility function.
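All three cases can be tried directly with the standard library's join function. This sketch uses Python 3, where the same function lives at urllib.parse.urljoin; the example.com and example.org URLs are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical URL of the current request.
current = 'http://example.com/report/2008/expenses/?ordering=category'

# A fully-qualified URL is returned unchanged.
absolute = urljoin(current, 'https://example.org/other/')

# A leading slash replaces the whole path, keeping protocol and domain.
rooted = urljoin(current, '/news/')

# Anything else is resolved relative to the current path.
relative = urljoin(current, 'summary/')
```

Here absolute is 'https://example.org/other/', rooted is 'http://example.com/news/' and relative is 'http://example.com/report/2008/expenses/summary/'.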

HttpRequest.is_secure()

This simple method returns True if the request came in using the Secure Sockets Layer (SSL) protocol or False if the request was unsecured.

HttpRequest.is_ajax()

Useful for “Web 2.0” sites, this method returns True if the request has an X-Requested-With header with a value of “XMLHttpRequest”. Most JavaScript libraries designed to make calls to the server will provide this header, providing a convenient way to identify them.

HttpRequest.encoding

This is a simple attribute representing the encoding to be used when accessing the GET and POST attributes described earlier. Values in those dictionaries are forced to unicode objects using this encoding, if one is set. By default, its value is None, which will use the default encoding of utf-8 when accessing values.

In most cases, this attribute can be left as is, with most input being converted properly using the default encoding. Specific applications may have different needs, so if the application expects input with a different encoding, simply set this attribute to a value that will decode those values properly.

HttpResponse

After a request is received and processed, every view is responsible for returning a response—an instance of HttpResponse. This object maps cleanly to an actual HTTP response, including headers, and is the only way of controlling what is sent back to the Web browser. Like its cousin for requests, HttpResponse lives at django.http, but several shortcuts are available to create responses more easily.

Creating a Response

Unlike the request, the author of a view has full control over how its response is created, allowing a variety of options. The standard HttpResponse class is instantiated rather simply, but accepts three arguments to customize its behavior. None of these are required; options described later in this section can set these values in other ways.

  • content—This accepts text—or other content—to be used as the body of the response.
  • status—This sets the HTTP status code5 to be sent with the response.
  • content_type—This controls the Content-Type header to be sent with the response. If this is supplied, make sure it also contains the charset value when appropriate.
    >>> from django.http import HttpResponse
    >>> print HttpResponse()
    Content-Type: text/html; charset=utf-8
     
     
    >>> print HttpResponse(content_type='application/xml; charset=utf-8')
    Content-Type: application/xml; charset=utf-8
     
     
    >>> print HttpResponse('content')
    Content-Type: text/html; charset=utf-8
     
    content

There is also a mimetype argument, provided for backwards-compatibility with older Django applications, but content_type should be used instead. It’s still important to keep mimetype in mind, though, as it means that status and content_type should be specified as keyword arguments, if supplied at all.

Dictionary Access to Headers

Once a response has been created, it’s simple to customize the headers that will be sent out along with its content, using standard dictionary syntax. This is quite straightforward and works just as you’d expect. The only notable variation from a standard dictionary is that all key comparisons are case-insensitive.

>>> from django.http import HttpResponse
>>> response = HttpResponse('test content')
>>> response['Content-Type']
'text/html; charset=utf-8'
>>> response['Content-Length']
Traceback (most recent call last):
  ...
KeyError: 'content-length'
>>> response['Content-Length'] = 12
>>> for name, value in response.items():
...     print '%s is set to %r' % (name, value)
...
Content-Length is set to '12'
Content-Type is set to 'text/html; charset=utf-8'

File-Like Access to Content

In addition to the ability to specify body content as a string when creating the response object, content can be created by many third-party libraries that know how to write to open files. Django’s HttpResponse implements a few file protocol methods—most notably write()—that enable it to be treated as a write-only file for many of these libraries. This technique can be especially useful when using Django to generate binary content, such as PDF files, dynamically within views.

One important thing to note regarding file-like access to the response body is that not all file protocol methods are implemented. This means that certain libraries, such as Python’s own zipfile.ZipFile class, which require those extra methods, will fail with an AttributeError, indicating which method was missing. This is by design, as HTTP responses aren’t true files, so there is no predictable way to implement those methods.
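The csv module is a convenient example of a library that needs nothing more than write(). In this Python 3 sketch, StubResponse is a hypothetical stand-in for HttpResponse that models only that one method, to show how little of the file protocol such a library relies on.

```python
import csv


class StubResponse:
    """Hypothetical stand-in for HttpResponse; only write() is modeled."""

    def __init__(self):
        self._chunks = []

    def write(self, text):
        self._chunks.append(text)

    @property
    def content(self):
        return ''.join(self._chunks)


response = StubResponse()
writer = csv.writer(response)  # csv.writer only needs an object with write()
writer.writerow(['title', 'chapter'])
writer.writerow(['Pro Django', 7])
```

A library that also calls methods such as seek() or tell() would fail against this class with an AttributeError, much as described above.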

HttpResponse.status_code

This attribute contains the numerical status code representing the type of response being sent to the client. As described earlier, this can be set immediately when instantiating the response object, but as a standard object attribute, it can also be set any time after the response has been created.

This should only be set to known HTTP response status codes. See the HTTP specification for details on valid status codes. This status can be set while instantiating the response, but it can also be set as a class attribute on a subclass, which is how Django configures many of its specialized responses.
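The class-attribute approach can be sketched in a few lines. The classes below are hypothetical stand-ins written in Python 3 syntax, not Django's own definitions; they show only how a subclass can supply a different default while still allowing a per-instance override.

```python
class StubResponse:
    """Hypothetical stand-in for HttpResponse."""
    status_code = 200  # class-level default

    def __init__(self, content='', status=None):
        self.content = content
        if status is not None:
            self.status_code = status  # instance value overrides the default


class StubNotFound(StubResponse):
    status_code = 404  # the subclass only needs to change the default


class StubGone(StubResponse):
    status_code = 410
```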

HttpResponse.set_cookie(key, value=''[, …])

When looking to store values across multiple requests, cookies are the tool of choice, passing values to the Web browser through special headers, which are then sent back to the server on subsequent requests. By calling set_cookie() with a key and a value, the HTTP response sent to the client will contain a separate header, telling the browser what to store and when to send it back to the server.

In addition to just the key and value, set_cookie() can take a few extra arguments that configure when the browser should send the cookie back to the server. While a quest for readability suggests that these arguments be specified using keywords, this list uses their positional order. More details on what values are allowed for each of these options can be found in the official specification for HTTP state management.6

  • max_age=None—Corresponding to the max-age option from the specification, this specifies the number of seconds the cookie should remain active.
  • expires=None—Not all browsers accept and respect max-age as required by the official specification but instead follow an early pattern set out by Netscape. The expires attribute takes an exact date when the cookie should expire, rather than an offset in seconds. The specified date is in the following format: Sun, 15-Jun-2008 12:34:56 GMT.
  • path='/'—This specifies a base path under which the browser should send this cookie back to the server. That is, if the path of the URL being requested begins with the value specified here, the browser will send the cookie’s value along with the request.
  • domain=None—Similar to path, this specifies the domain under which the cookie will be sent. If left as None, the cookie will be restricted to the same domain that issued it, while providing a value will allow greater flexibility.
  • secure=False—If set to True, this will indicate that the cookie contains sensitive information and should only be sent to the server through a secure connection, such as SSL.
    >>> response = HttpResponse()
    >>> response.set_cookie('a', '1')
    >>> response.set_cookie('b', '2', max_age=3600)
    >>> response.set_cookie('c', '3', path='/test/', secure=True)
    >>> print response.cookies
    Set-Cookie: a=1; Path=/
    Set-Cookie: b=2; Max-Age=3600; Path=/
    Set-Cookie: c=3; Path=/test/; secure

Keep in mind that this will set the cookie in the browser only after the response has made its way across the wire. That means that the cookie’s value won’t be available on the request object until the browser’s next request.

COOKIES AND SECURITY

Although cookies can be a tremendously useful way to maintain state across multiple HTTP requests, they’re stored on a user’s computer, where knowledgeable users will have access to view them and alter their contents. Cookies on their own are not secure, and should not be used to store sensitive data or data that controls how the user can access the site.

The typical way around this problem is to only store a reference in the cookie, which can be used to retrieve the “real” data from somewhere on the server, such as a database or a file, where users don’t have access. The “Applied Techniques” section near the end of this chapter provides an alternative method of storing data securely in cookies so that their data can in fact be trusted.

HttpResponse.delete_cookie(key, path='/', domain=None)

If a cookie has already been delivered to the Web browser and is no longer needed or has become invalid, the delete_cookie() method can be used to instruct the browser to remove it. The path and domain provided here must match those of the existing cookie in order to have it deleted properly.

It does this by setting a new cookie with max-age set to 0 and expires set to Thu, 01-Jan-1970 00:00:00 GMT. This causes the browser to overwrite any existing cookie matching the same key, path and domain, then expire it immediately.
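The same technique can be reproduced with the standard library's cookie classes. This sketch uses Python 3, where the module is named http.cookies; expire_cookie() is a hypothetical helper that imitates the behavior described above and is not Django's implementation.

```python
from http.cookies import SimpleCookie


def expire_cookie(cookies, key, path='/', domain=None):
    """Overwrite a cookie with an empty, already-expired replacement."""
    cookies[key] = ''
    cookies[key]['path'] = path
    if domain is not None:
        cookies[key]['domain'] = domain
    cookies[key]['max-age'] = 0
    cookies[key]['expires'] = 'Thu, 01-Jan-1970 00:00:00 GMT'


cookies = SimpleCookie()
expire_cookie(cookies, 'session')
header = cookies.output()  # the Set-Cookie header line that would be sent
```

The browser treats the resulting header as an ordinary cookie assignment, which is why the key, path and domain have to line up with the cookie being removed.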

HttpResponse.cookies

In addition to being able to explicitly set and delete cookies during the response phase, you can view the cookies that will be sent to the Web browser. The cookies attribute uses Python’s standard Cookie module,7 with the attribute itself being a SimpleCookie object, which behaves much like a dictionary, with each value being a Morsel object.

Using a cookie’s name as the key, you can retrieve a Morsel representing a specific cookie value, along with its associated options. This object may be used as a dictionary to reference these additional options, while its value attribute contains the value that was set for the cookie. Even deleted cookies are accessible using this dictionary, since the process involves setting a new cookie that will simply expire immediately.

>>> len(response.cookies)
3
>>> for name, cookie in response.cookies.items():
...     print '%s: %s (path: %s)' % (name, cookie.value, cookie['path'])
...
a: 1 (path: /)
b: 2 (path: /)
c: 3 (path: /test/)

HttpResponse.set_signed_cookie(key, value, salt=''[, …])

This works just like set_cookie(), except that it also cryptographically signs the value before sending it out to the browser. Because cookies are stored in the browser, this ensures that any changes the user makes to those values before visiting your site again can be detected and rejected. You still don’t want to store sensitive information in your cookies, but this allows you to confidently store things like a logged-in username in a cookie, without the user being able to use it as an attack vector.

This takes all the same arguments as set_cookie(), with one addition: salt. By default, Django uses your settings.SECRET_KEY to generate a signature, which is fine in most cases, where a cookie with a particular key is only likely to be used for one purpose. In other cases, the salt argument allows you to tailor the signature to a particular use.

For example, if you’re serving up multiple domains with a single Django installation, you could use the domain name as the salt for your signatures, so that a user can’t reuse the signature from one domain on a different domain. The different salts ensure the signatures will be different, so that a copied signature would fail the signature test when retrieving the cookie in your view.
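The effect of a salt can be seen with a few lines of standard-library code. This Python 3 sketch is only an illustration of the principle, not Django's signing scheme; SECRET and signature() are hypothetical, and the domain names are examples.

```python
import hashlib
import hmac

SECRET = b'not-a-real-secret'  # hypothetical stand-in for settings.SECRET_KEY


def signature(value, salt=''):
    """Sign value with a key derived from both the secret and the salt."""
    key = hashlib.sha256(salt.encode('utf-8') + SECRET).digest()
    return hmac.new(key, value.encode('utf-8'), hashlib.sha256).hexdigest()


# The same value signed for two different domains.
first = signature('alice', salt='example.com')
second = signature('alice', salt='example.org')
```

The two signatures differ even though the value and the secret are identical, so a signature copied from one domain would not verify on the other.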

HttpResponse.content

This attribute provides access to the string content of the response body. This can be read or written, and is particularly useful during the response phase of middleware processing.

Specialty Response Objects

Since there are several common HTTP status codes, Django provides a set of customized HttpResponse subclasses with their status_code attribute already set accordingly. Like HttpResponse itself, these all live at django.http. Some of them take a different set of arguments than the standard HttpResponse, and those differences are also listed here.

  • HttpResponseRedirect—Takes a single argument, a URL that the browser will redirect to. It also sets the status_code to 302, indicating a “Found” status, where the resource is temporarily located at a different URL.
  • HttpResponsePermanentRedirect—Takes a single argument, a URL that the browser will redirect to. It sets the status_code to 301, indicating the resource was permanently moved to the URL specified.
  • HttpResponseNotModified—Sets the status_code to 304, indicating a “Not Modified” status, to be used in response to a conditional GET, when the response hasn’t changed from the conditions associated with the request.
  • HttpResponseBadRequest—Sets the status_code to 400, indicating a “Bad Request” where the syntax used in the request couldn’t be understood by the view.
  • HttpResponseForbidden—Sets the status_code to 403, “Forbidden,” where the requested resource does exist, but the requesting user doesn’t have permission to access it.
  • HttpResponseNotFound—Perhaps most common of all custom classes, this sets the status_code to 404, “Not Found,” where the URL in the request didn’t map to a known resource.
  • HttpResponseNotAllowed—Takes a list of the methods that are permitted for the resource, and sets the status_code to 405, “Method Not Allowed,” indicating that the method used in the request isn’t valid for the resource specified by the URL.
  • HttpResponseGone—Sets the status_code to 410, “Gone,” to indicate that the resource specified by the URL is no longer available and can’t be located at any other URL.
  • HttpResponseServerError—Sets the status_code to 500, “Internal Server Error,” used whenever the view encountered an unrecoverable error.

Some of these specialized responses aren’t supported by Web browsers, but they’re all quite useful for Web service applications, where a wider range of options are available. It often makes more sense to set these statuses on a site-wide basis, so individual views don’t have to worry about managing them directly. For this, Django provides HTTP middleware.

Writing HTTP Middleware

While Django itself creates an HttpRequest and each view is responsible for creating an HttpResponse, applications commonly need certain tasks to be performed on every incoming request or outgoing response. This portion of the process, called middleware, can be a useful way to inject advanced processing into the flow.

Common examples of middleware processing are compressing response content, denying access to certain types of requests or those from certain hosts and logging requests and their associated responses. Although these tasks could be done in individual views, doing so would not only require a great deal of boilerplate but would also require each view to know about every piece of middleware that would be applied.

This would also mean that adding or removing HTTP processing would require touching every single view in an entire project. That’s not only a maintenance issue in its own right, but it also causes additional maintenance problems if your project uses any third-party applications. After all, changing third-party code restricts your ability to upgrade it in the future without unnecessary hassle. Django solves these problems by performing middleware operations in a separate part of the request/response cycle.

Each piece of middleware is simply a Python class that defines at least one of the following methods. There are no other requirements for this class; that is, it doesn’t have to subclass any provided base class, contain any particular attributes or be instantiated in any specific way. Just provide the class at an importable location and a site will be able to activate it.

There are four distinct points where middleware can hook into Django’s HTTP handling, performing whatever tasks it needs along the way. Each part of the process is controlled simply by specifying a method on the middleware class. Remember, it’s all just Python, so anything that’s valid Python is valid in middleware as well.

MiddlewareClass.process_request(self, request)

As soon as the incoming HTTP request is made into an HttpRequest object, middleware has its first chance to change how things get handled. This hook occurs even before Django analyzes the URL to determine which view to use.

Being standard Python, the process_request() method can perform any task, but common tasks include prohibiting access to certain clients or request types, adding attributes to the request for use by context processors or returning a previously-cached response based on details of the request.

This method can change any attribute on the request, but keep in mind that any changes will affect how Django handles the request throughout the rest of the process. For example, because this method is called prior to the URL resolution, it can modify request.path to redirect the request to an entirely different view than would’ve otherwise been used. While something like this is often the desired behavior, it can possibly be an unintended side effect, so take care when modifying the request.
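As a minimal sketch of the attribute-adding case, the following middleware stamps each request with its arrival time. The class name, the start_time attribute and the FakeRequest stand-in are all invented for this illustration; none of them is part of Django. Returning None from process_request() lets processing continue as usual.

```python
import time


class RequestTimestampMiddleware(object):
    """Hypothetical middleware: records when each request arrived."""

    def process_request(self, request):
        # Later middleware, views and context processors can read this
        # attribute; the name start_time is an assumption for this sketch.
        request.start_time = time.time()
        # Returning None lets Django continue on to URL resolution.
        return None


class FakeRequest(object):
    """Stand-in for HttpRequest so the sketch runs without Django."""
    path = '/articles/'


request = FakeRequest()
result = RequestTimestampMiddleware().process_request(request)
```

Because the attribute is set before URL resolution, everything that handles the request afterward can rely on it being present.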

MiddlewareClass.process_view(self, request, view, args, kwargs)

This method is called after the URL has been mapped to a view and arguments have been extracted from it, but before the view is actually called. In addition to the request, the arguments passed to this method are as follows:

  • view—The view function that will be called. This is the actual function object, not the name, regardless of whether the view was configured using a string or a callable.
  • args—A tuple containing the positional arguments that will be passed to the view.
  • kwargs—A dictionary containing the keyword arguments that will be passed to the view.

Now that the view’s arguments have been extracted from the URL, it is possible to verify these against what the configuration was supposed to obtain. This can be quite useful during development as a way to verify that everything is configured properly. Simply set up a middleware class to print out the args and kwargs variables along with request.path. Then, if anything goes wrong with a view, the development server’s console will have a handy way to identify or rule out a potential problem.

This may seem like a perfect opportunity to do some detailed logging of the view that’s about to be executed as well, since the view function object is available too. While this is true, the common use of decorators on views complicates matters. Specifically, the view function passed to this method will often be a wrapper function created by the decorator, rather than the view itself.

This means that the introspection features detailed in Chapter 2 can’t reliably be used to line up positional arguments with the names they were given in the function definition. There is still some good, though, as you should still be able to access the module and name of the view, as long as the decorators use the special wraps decorator described in Chapter 9.

class ArgumentLogMiddleware(object):
    def process_view(self, request, view, args, kwargs):
        print('Calling %s.%s' % (view.__module__, view.__name__))
        print('Arguments: %s' % (kwargs or (args,)))

MiddlewareClass.process_response(self, request, response)

After the view has been executed, the new response object is made available for middleware to view it and make any necessary changes. This is where middleware could cache the response for future use, compress the response body for faster transmission over the wire or modify the headers and content that will be sent with the response.

It receives the original request object as well as the response object returned by the view. At this point, the request has already exhausted its usefulness to the HTTP cycle, but it can be useful if some of its attributes are used to determine what to do with the response. The response object can be—and often is—modified at this stage, before being returned by the method.

The process_response() method should always return an HttpResponse object, regardless of what’s done with it beforehand. Most often, this will be the response it was given in the first place, just with some minor modifications. Sometimes, it may make more sense to return an entirely different response, such as when redirecting to a different URL.
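The following sketch shows one way the request can guide what happens to the response. The middleware name and the choice of header are assumptions made for illustration, and the FakeRequest and FakeResponse classes are stand-ins that imitate only the small part of Django's objects used here.

```python
class NoStoreOnPostMiddleware(object):
    """Hypothetical middleware: asks clients not to cache POST results."""

    def process_response(self, request, response):
        # The request is consulted only to decide how to treat the response.
        if request.method == 'POST':
            # Django responses accept headers through item assignment.
            response['Cache-Control'] = 'no-store'
        # Always hand a response back, modified or not.
        return response


class FakeRequest(object):
    """Stand-in for HttpRequest so the sketch runs without Django."""
    def __init__(self, method):
        self.method = method


class FakeResponse(dict):
    """Stand-in for HttpResponse; only header assignment is imitated."""


middleware = NoStoreOnPostMiddleware()
post_response = middleware.process_response(FakeRequest('POST'), FakeResponse())
get_response = middleware.process_response(FakeRequest('GET'), FakeResponse())
```

Note that the method returns the response on both paths, whether or not it changed anything.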

MiddlewareClass.process_exception(self, request, exception)

If something goes wrong while a view is executing, an exception will usually be thrown. These exceptions will be sent to process_exception() to be logged or handled in a special way. Exceptions raised elsewhere in the request-handling process, such as during URL resolution or in other middleware methods, don’t reach this method. The exception argument passed to this method is the exception object that was thrown, and it can be used to retrieve specific details about what went wrong.

A common task for this stage of the process is to log exceptions in a way that’s specific to the site currently in use. The exception’s string representation is usually sufficient for this, along with its type, though the exact usefulness of this will depend on the exception that was raised. By combining details of the original request with details of the exception, you can generate useful and readable logs.
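A logging hook along those lines might look like the sketch below, which uses the standard library's logging module. The middleware name, the logger name and the message format are assumptions for illustration, and FakeRequest and ListHandler exist only so the example runs without Django.

```python
import logging

logger = logging.getLogger('site.exceptions')  # logger name is an assumption


class ExceptionLogMiddleware(object):
    """Hypothetical middleware: logs request details with each exception."""

    def process_exception(self, request, exception):
        # Combine details of the request with the exception's type and text.
        logger.error('%s %s raised %s: %s',
                     request.method, request.path,
                     type(exception).__name__, exception)
        # Returning None leaves Django's normal error handling in place.
        return None


class FakeRequest(object):
    """Stand-in for HttpRequest so the sketch runs without Django."""
    method = 'GET'
    path = '/articles/42/'


class ListHandler(logging.Handler):
    """Collects formatted messages in memory for demonstration."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
ExceptionLogMiddleware().process_exception(FakeRequest(), ValueError('bad id'))
```

In a real project the handler would be configured through the site's logging settings rather than attached by hand.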

Deciding Between Middleware and View Decorators

Chapter 4 showed how views can use decorators to perform extra work before or after the view is executed, and keen readers will notice that middleware can perform a similar function. View decorators have access to the incoming request as well as the response generated by the view. They can even access the view function and the arguments that will be passed to it, and they can wrap the view in a try block to handle any exceptions that are raised.

So what makes them different, and when should you use one over the other? That’s a rather subjective topic, and there’s no one answer to satisfy all cases. Each approach has advantages and disadvantages, which should help you decide which route to take for a particular application.

Differences in Scope

One of the most notable differences between middleware and view decorators is how much of the site is covered. Middleware is activated in a site’s settings.py, so it covers all requests that come in on any URL. This simple fact provides a few advantages:

  • Many operations—such as caching or compression—should naturally happen for every request on the site; middleware makes these tasks easy to implement.
  • Future additions to the site are automatically covered by existing middleware, without having to make any special allowances for the behavior they provide.
  • Third-party applications don’t need any modifications in order to take advantage of middleware behavior.

Decorators, on the other hand, are applied to individual functions, which means that every view must have decorators added manually. This makes decorators a bit more time-consuming to manage, but some operations—such as access restriction or specialized cache requirements—are more appropriate for limited parts of the site, where decorators can be used to great effect.

Configuration Options

Middleware classes are referenced as strings containing the import path to the class, which doesn’t allow any direct way to configure any of their features. Most middleware that accept options do so by way of custom settings that are specific to that middleware. This does provide a way to customize how the middleware works, but like middleware themselves, these settings are sitewide, by definition. There isn’t any room for customizing them for individual views.

As shown in Chapter 2, decorators can be written to accept configuration options when they’re applied to a function, and view decorators are no different. Each view could have a separate set of options or curry could be used to create a brand-new decorator with a set of preconfigured arguments.
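As a rough sketch of per-view configuration, the decorator factory below lets each view choose its own cache lifetime. The names cache_for, front_page and archive are invented for this example, and FakeResponse imitates only the header assignment of a real response.

```python
import functools


def cache_for(seconds):
    """Hypothetical decorator factory: each view picks its own lifetime."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            response['Cache-Control'] = 'max-age=%d' % seconds
            return response
        return wrapper
    return decorator


class FakeResponse(dict):
    """Stand-in for HttpResponse; only header assignment is imitated."""


@cache_for(60)
def front_page(request):
    return FakeResponse()


# Calling the factory once yields a preconfigured decorator for reuse.
cache_for_a_day = cache_for(60 * 60 * 24)


@cache_for_a_day
def archive(request):
    return FakeResponse()
```

Each view carries its own option, which is exactly what sitewide middleware settings can't offer.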

Using Middleware As Decorators

Given the similarities between middleware and decorators, Django provides a utility to transform an existing middleware class into a decorator. This allows code to be reused across an entire site, using the best tool for the job in any situation.

Living at django.utils.decorators, the special decorator_from_middleware() function takes, as its only argument, a middleware class that should be applied to a single view. The return value is a perfectly functional decorator, which can be applied to any number of views.
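To give a feel for what such a conversion involves, here is a deliberately simplified stand-in. It is not Django's implementation: the function name simple_decorator_from_middleware and the TagMiddleware class are hypothetical, only process_request() and process_response() are handled, and a plain dict and string stand in for the request and response.

```python
import functools


def simple_decorator_from_middleware(middleware_class):
    """Simplified stand-in for illustration; NOT Django's implementation."""
    def decorator(view):
        middleware = middleware_class()

        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            # Give process_request() a chance to short-circuit the view.
            if hasattr(middleware, 'process_request'):
                result = middleware.process_request(request)
                if result is not None:
                    return result
            response = view(request, *args, **kwargs)
            # Let process_response() adjust whatever the view produced.
            if hasattr(middleware, 'process_response'):
                response = middleware.process_response(request, response)
            return response
        return wrapper
    return decorator


class TagMiddleware(object):
    """Hypothetical middleware used only to exercise the sketch."""
    def process_request(self, request):
        request['seen_by_middleware'] = True

    def process_response(self, request, response):
        return response + ' (processed)'


tagged = simple_decorator_from_middleware(TagMiddleware)


@tagged
def hello(request):
    return 'hello'


request = {}  # a plain dict stands in for HttpRequest
result = hello(request)
```

The same middleware class is reused unchanged; only the place where it gets applied is different.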

Allowing Configuration Options

Since decorators can accept options to configure their behavior, we need a way for middleware classes to utilize this same flexibility. Providing an __init__() method on the middleware class that accepts additional arguments will allow a class to be written from the beginning to be used either as middleware or as a view decorator.

One thing to keep in mind is that middleware will be most commonly called without any arguments, so any additional arguments you define must use defaults. Failing to do so will result in a TypeError whenever it is used as a standard middleware and also with decorator_from_middleware() on its own, which doesn’t accept any arguments.

class MinimumResponseMiddleware(object):
    """
    Makes sure a response is at least a certain size
    """

    def __init__(self, min_length=1024):
        self.min_length = min_length

    def process_response(self, request, response):
        """
        Pads the response content to be at least as
        long as the length specified in __init__()
        """
        response.content = response.content.ljust(self.min_length)
        return response

When used as middleware, this class will pad all responses to be at least 1,024 characters in length. In order for individual views to receive specific values for this minimum length, we can instead turn to decorator_from_middleware_with_args(). That will accept arguments when decorating the view and pass those arguments into the __init__() method of the middleware class.

Also, be aware that if a middleware class is already activated as sitewide middleware and is also applied to a view as a decorator, that view will actually run the middleware twice for every request. For some, such as those that set attributes on the request object, this won’t be an issue. For others, especially those that modify the outgoing response, this can cause a world of trouble.

HTTP-Related Signals

Since requests are spawned outside the control of any application code, signals are used to inform application code of the beginning and completion of all request/response cycles. Like all signals, these are simply Signal objects, and they live at django.core.signals. For more information on signals, how they work and how to use them, refer to Chapter 9.

django.core.signals.request_started

Whenever a request is received from the outside, this signal is fired without any additional parameters. It fires early in the process, even before the HttpRequest object has been created. Without any arguments, its uses are limited, but it does provide a way to notify applications when a request is received, before any middleware has a chance to get access to the request object.

One potential use for this would be as a way to register new listeners for other signals, which should only operate during requests coming in over HTTP. This is in contrast to situations where those other signals might get fired due to some non-HTTP event, such as a scheduled job or a command-line application.

django.core.signals.request_finished

Once the response has been generated by the view and middleware has been processed, this signal fires just prior to sending the response back to the client that sent the original request. Like request_started, it doesn’t provide any parameters to the listener, so its use is fairly limited, but it could be used as a way to disconnect any listeners that were attached when request_started fired.

django.core.signals.got_request_exception

If an exception occurs any time while processing a request but it isn’t handled explicitly somewhere else, Django fires the got_request_exception signal with just one parameter: the request object that was being processed.

This is different from the process_exception() method of middleware, which is only fired for errors that occur during execution of the view. Many other exceptions will fire this signal, such as problems during URL resolution or any of the other middleware methods.

Applied Techniques

By providing so many hooks into the protocol handling, Django makes possible a great variety of options for modifying HTTP traffic for an application. This is an area where each application will have its own needs, based on what type of traffic it receives and what type of interface it expects to provide. Therefore, take the following examples as more of an explanation of how to hook into Django’s HTTP handling, rather than an exhaustive list of what can be done to customize this behavior.

Signing Cookies Automatically

Django’s support for signed cookies is convenient, but it requires that you call separate methods for setting and retrieving cookies, in order to make sure the signature is applied and validated correctly. You can’t simply access the COOKIES attribute on the request without losing the security benefits of signatures. By using a custom middleware, however, it’s possible to do exactly that: add and verify signatures automatically, using the simple access methods normally reserved for unsigned cookies.

At a high level, there are a few tasks this middleware will be responsible for:

  • Signing cookies in outgoing responses
  • Verifying signatures and removing them from cookies on incoming requests
  • Managing the salt and expiration options for those signatures

The first two tasks can be achieved fairly simply, by inspecting the request and response to look for cookies and calling the signed variations of the cookie methods to manage the signatures. Let’s start with setting response cookies.

Signing Outgoing Response Cookies

The middleware can start with the process_response() method, which will need to find any cookies that were set by the view and add signatures to their values.

class SignedCookiesMiddleware(object):

    def process_response(self, request, response):
        for (key, morsel) in response.cookies.items():
            response.set_signed_cookie(key, morsel.value,
                max_age=morsel['max-age'],
                expires=morsel['expires'],
                path=morsel['path'],
                domain=morsel['domain'],
                secure=morsel['secure']
            )
        return response

This approach uses all the attributes of the original cookie when setting the new one, so that it’s identical except for the method that’s used to set it. Using set_signed_cookie() will do all the appropriate things behind the scenes.

Deleted cookies show up in response.cookies as well, though, even though they don’t have a value and don’t need to be signed. These can be identified by their max-age of 0, which can be used to ignore them and only sign actual values that matter to the application.

class SignedCookiesMiddleware(object):
    def process_response(self, request, response):
        for (key, morsel) in response.cookies.items():
            if morsel['max-age'] == 0:
                # Deleted cookies don't need to be signed
                continue
            response.set_signed_cookie(key, morsel.value,
                max_age=morsel['max-age'],
                expires=morsel['expires'],
                path=morsel['path'],
                domain=morsel['domain'],
                secure=morsel['secure']
            )
        return response
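The check for deleted cookies relies on the morsel interface of the standard library's cookie module, which Django uses to hold outgoing cookies. The sketch below builds a SimpleCookie by hand to show how that filter behaves; the cookie names are invented, and the deleted cookie is only imitated here by an empty value with a max-age of 0.

```python
from http.cookies import SimpleCookie

# This container is built by hand so the sketch runs without Django.
cookies = SimpleCookie()
cookies['theme'] = 'dark'
cookies['theme']['path'] = '/'

# A deleted cookie is imitated by an empty value and a max-age of 0.
cookies['old_session'] = ''
cookies['old_session']['max-age'] = 0

to_sign = []
for key, morsel in cookies.items():
    if morsel['max-age'] == 0:
        continue  # deleted cookie, nothing to sign
    # Attributes that were never set come back as empty strings.
    to_sign.append((key, morsel.value, morsel['path']))
```

Only the theme cookie survives the filter, since a max-age that was never set is an empty string rather than 0.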

Validating Incoming Request Cookies

Working with incoming requests is also fairly simple. The process_request() method is our entry point for this part of the process, and it merely has to find all the incoming cookies and use get_signed_cookie() to check the signatures and remove those signatures from the values.

class SignedCookiesMiddleware(object):
    def process_request(self, request):
        for key in request.COOKIES:
            request.COOKIES[key] = request.get_signed_cookie(key)

Reading cookies is simpler than writing them, because we don’t have to deal with all the individual parameters; they’re already part of the cookies themselves. This code still has one problem, though. If any signature is missing, invalid or expired, get_signed_cookie() will raise an exception, and we’ll need to handle that in some way.

One option is to simply let the errors go through, hoping they’ll get caught in some other code, but because your views and other middleware won’t even know that this middleware is signing cookies, they aren’t likely to deal with signature exceptions. Worse, if you don’t have code that handles these exceptions, they’ll work their way all the way out to your users, typically in the form of an HTTP 500 error, which doesn’t explain the situation at all.

Instead, this middleware can handle the exceptions directly. Since only values with valid signatures can be passed to your views, an obvious approach is to simply remove any invalid cookies from the request altogether. The exceptions go away, along with the cookies that generated those exceptions. Your views will just see the valid cookies, just like they’re expecting, and any invalid cookies won’t exist in the request anymore. Users can clear their cookies at any time, so views that rely on cookies should always handle requests with missing cookies anyway, so this approach fits well with what views will already be doing.

Supporting this behavior requires nothing more than catching the relevant exceptions and removing those cookies responsible for raising them.

from django.core.signing import BadSignature, SignatureExpired
 
class SignedCookiesMiddleware(object):
    def process_request(self, request):
        # Copy the keys first, because invalid cookies are deleted in the loop
        for key in list(request.COOKIES):
            try:
                request.COOKIES[key] = request.get_signed_cookie(key)
            except (BadSignature, SignatureExpired):
                # Invalid cookies should behave as if they were never sent
                del request.COOKIES[key]

Signing Cookies As a Decorator

So far, SignedCookiesMiddleware hasn’t used any of the signature-specific options when setting and retrieving signed cookies. The defaults are often good enough for a middleware that’s meant for use on the whole site. Since middleware can also be used as decorators, though, we also need to account for customizations to individual views. That’s where the salt and expiration settings become useful.

As shown earlier in the chapter, decorator_from_middleware_with_args() can supply arguments to the middleware’s __init__() method, so that will provide a path for customizing the salt and max_age arguments. Once __init__() accepts those arguments, the individual hook methods can incorporate them as appropriate.

from django.core.signing import BadSignature, SignatureExpired
 
class SignedCookiesMiddleware(object):
    def __init__(self, salt='', max_age=None):
        self.salt = salt
        self.max_age = max_age
 
    def process_request(self, request):
        # Copy the keys first, because invalid cookies are deleted in the loop
        for key in list(request.COOKIES):
            try:
                request.COOKIES[key] = request.get_signed_cookie(key,
                    salt=self.salt,
                    max_age=self.max_age)
            except (BadSignature, SignatureExpired):
                # Invalid cookies should behave as if they were never sent
                del request.COOKIES[key]
 
    def process_response(self, request, response):
        for (key, morsel) in response.cookies.items():
            if morsel['max-age'] == 0:
                # Deleted cookies don't need to be signed
                continue
            response.set_signed_cookie(key, morsel.value,
                salt=self.salt,
                max_age=self.max_age or morsel['max-age'],
                expires=morsel['expires'],
                path=morsel['path'],
                domain=morsel['domain'],
                secure=morsel['secure']
            )
        return response

Now you can create a decorator using decorator_from_middleware_with_args() and supply salt and max_age arguments to customize that decorator’s behavior for each individual view.

from django.utils.decorators import decorator_from_middleware_with_args
signed_cookies = decorator_from_middleware_with_args(SignedCookiesMiddleware)
 
@signed_cookies(salt='foo')
def foo(request, ...):
    ...

Now What?

The request and response cycle is the primary interface Django applications use to communicate with the outside world. Just as important is the collection of utilities available behind the scenes that allow applications to perform their most fundamental tasks.

1 http://prodjango.com/pep-333/

2 http://prodjango.com/http-methods/

3 http://prodjango.com/query-string/

4 http://prodjango.com/http-headers/

5 http://prodjango.com/http-status-codes/

6 http://prodjango.com/cookie-spec/

7 http://prodjango.com/r/cookie-module/
