CHAPTER 9


Common Tools

While Django aims to provide a foundation for you to build your own Web application, the framework has its own underpinnings that tie it all together. These common tools and features help everything remain consistent and easier to maintain, and those same benefits can be used by your own applications. After all, what’s available in Django is available for anything that uses it.

Core Exceptions (django.core.exceptions)

While Python comes with its own set of exceptions that can be raised in various situations, Django introduces enough complexity on top of that to merit some more. Since Django serves a specialty audience, these exceptions are considerably more specialized, but they’re still usable by more than just core code. Some of these exceptions have been mentioned previously because they deal more specifically with a particular Django feature, but they’re also useful in other situations, as the following sections will explain.

ImproperlyConfigured

This is one of the first exceptions most new users run into because it’s the one raised when an application’s models aren’t set up correctly, a view can’t be found, or a number of other common configuration mistakes occur. It’s typically raised during execution of manage.py validate and helps users identify and correct whatever mistakes were discovered.

Not all applications require any particular configuration, but those that do can make good use of this exception, since most users have seen it before. Common situations where this can be useful include missing or incorrect settings, a URL configuration used without an accompanying INSTALLED_APPS entry, invalid arguments given to custom model fields, and missing a required third-party library.

The most important thing to remember is to indicate not only that something went wrong, but also how the user should go about fixing it. Typically, exceptions indicate that some bit of code ran awry, and there’s little to no way of informing a user how to fix it. With an application’s configuration, however, there are a finite number of acceptable ways to set it up, and this error should be used as a way to steer users in the right direction.

For example, if an application is designed to work with audio files, it might require the presence of Mutagen,1 a well-established Python library for extracting information from such files. A simple import of this library at the top of the models.py, where it’s likely to be used, could identify if the library is installed correctly, and instruct the user how to proceed if not.

from django.core.exceptions import ImproperlyConfigured
 
try:
    import mutagen
except ImportError:
    raise ImproperlyConfigured("This application requires the Mutagen library.")

MiddlewareNotUsed

Chapter 7 described how middleware can be used to adjust how HTTP is handled, but an interesting side effect is that not all middleware is always useful. While each project has the option of setting just those middleware that are necessary by way of the MIDDLEWARE_CLASSES setting, there are still differences between development and production or among the various developers’ computers.

Each middleware has the ability to decide whether its environment is suitable to be used and indicate if there’s a problem. Middleware classes are instantiated automatically when first needed, at the beginning of the first request, which is where this check would take place. By overriding the class’s __init__() method, middleware can check right away whether everything’s set up to work properly and react accordingly.

Specifically, this reaction is to either return without doing anything if everything looks fine or raise MiddlewareNotUsed. If raised, Django will always catch this exception and take it to mean that the class should be removed from the list of middleware that gets applied on every request.

This is an important distinction to make because without being able to tell Django to not use the middleware at all, it would be up to each individual method to decide whether it should execute. While that would work, it would take up valuable time and memory on every request, checking for something that could be determined just once. By taking the middleware out of the list entirely, it never consumes any additional cycles or memory at all.
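The loading behavior described here can be sketched without Django at all. The following is a simplified stand-in, not Django’s actual handler code: the MiddlewareNotUsed class, the load_middleware() helper and both middleware classes are hypothetical names defined purely for illustration.

```python
class MiddlewareNotUsed(Exception):
    """Stand-in for django.core.exceptions.MiddlewareNotUsed."""


class ProfilingMiddleware(object):
    # Hypothetical middleware that only makes sense while debugging.
    def __init__(self, debug=False):
        if not debug:
            raise MiddlewareNotUsed("Profiling is only useful in development.")


class HeaderMiddleware(object):
    # Hypothetical middleware with no special requirements.
    def __init__(self, debug=False):
        pass


def load_middleware(classes, debug=False):
    """Instantiate each class once, skipping any that opt out."""
    active = []
    for cls in classes:
        try:
            active.append(cls(debug=debug))
        except MiddlewareNotUsed:
            # The class declined to run, so it never joins the list.
            continue
    return active


production = load_middleware([ProfilingMiddleware, HeaderMiddleware])
development = load_middleware([ProfilingMiddleware, HeaderMiddleware], debug=True)
```

Because the check happens once, at instantiation, none of the per-request methods on the skipped class need to repeat it.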

MultipleObjectsReturned

When retrieving objects from the database, it’s often expected that exactly one row will be returned. This is always the case whenever the query is a primary key, but slugs—and perhaps even dates—can be made unique in certain applications. Django supports this situation with a QuerySet’s get() method, and if it matches more than one result, it can throw off the whole execution of the application.

Note  Django’s SlugField is almost always set as unique=True because it’s used to identify objects in a URL.

Since get() is expected to return exactly one record from the database, a query matching multiple records is marked by an exception, MultipleObjectsReturned. It’s not raised for other types of queries, since multiple records are to be expected in most situations. Catching this exception can be useful in a number of ways, from displaying more useful error messages to removing unexpected duplicates.

ObjectDoesNotExist

The other side of the get() expectation is that one row will always be returned; that is, there must always be a row in order to succeed. If a query that expects a row to exist finds instead that no such rows are present, Django responds accordingly with ObjectDoesNotExist. It works in much the same way as MultipleObjectsReturned, differing only in the situation where it’s raised.

Each model also gets its own subclass of this exception, attached to the model class as an attribute. Simply called DoesNotExist, this subclass avoids an extra import because the class it’s used on is typically already imported when the get() method is called. In addition, by being called DoesNotExist and being an attribute of a model class, it looks like perfectly readable English: Article.DoesNotExist.

PermissionDenied

Most applications will have some form of permissions in place to prevent access to restricted resources; this follows the pattern of a rule with exceptions. The rule is that the user attempting to access the resource will indeed have the correct permissions, so any user that doesn’t will result in an exception—this time PermissionDenied. This serves as a convenient way to indicate the problem and stop processing the rest of the view, since the view itself could make changes that aren’t valid if the user doesn’t have the correct permissions.

Django also catches this exception automatically inside its request handler, using it as an instruction to return an HTTP 403 Forbidden response instead of the usual 200 OK. This will indicate to the client that the credentials provided didn’t have sufficient permission to request the resource and that the user shouldn’t try again without rectifying the situation. This behavior is provided by default in Django’s own admin application but can also be used in any other.

Like other exceptions, PermissionDenied can be either raised or caught, though the default behavior of returning a special HTTP response code is appropriate most of the time. If some other behavior is desired, it’s easy enough to create a middleware that catches this exception in the process_view() phase, possibly redirecting users to a form where they can contact the site administrators to request permission to access the page.

from django.core.exceptions import PermissionDenied
from django.http import HttpResponseRedirect
from django.core.urlresolvers import reverse

class PermissionRedirectMiddleware(object):
    def __init__(self, view='request_permission', args=None, kwargs=None):
        self.view = view
        self.args = args or ()
        self.kwargs = kwargs or {}

    def process_view(self, request, view, args, kwargs):
        try:
            # Returning the response tells Django not to call the view again.
            return view(request, *args, **kwargs)
        except PermissionDenied:
            # Send the user to the configured view instead of a 403 page.
            url = reverse(self.view, args=self.args, kwargs=self.kwargs)
            return HttpResponseRedirect(url)

Adding a reference to this in MIDDLEWARE_CLASSES or creating a decorator out of it using decorator_from_middleware() as described in Chapter 7 is all that’s necessary to redirect users to another page when their permissions weren’t valid for the original request. Even without a custom handler for this exception, though, it’s quite useful to raise it in any of your own views where a user doesn’t satisfy the appropriate permissions. That response will then result in whatever handling is used for all other similar situations, helping make your site as cohesive and consistent as possible.

SuspiciousOperation

While users typically obey the rules and use your site the way it’s expected to be used, any reasonable developer prepares for those who don’t. Django takes a number of precautions to protect against unauthorized access to things like the administration interface and provides decorators to restrict access to application views, but there are still more subtle things to take into account.

For instance, the sessions framework needs to worry about users altering the session ID in an attempt to hijack another user’s session. These types of things don’t fall under authentication or permissions themselves, but rather a user is attempting to circumvent these usual protections. It’s important to identify when this occurs, so it can be dealt with appropriately.

To identify these across the board, Django provides a SuspiciousOperation exception that can be used any time something like this happens. In many situations, this is thrown and caught in the same application but is provided so that it’s possible to reach into the application and use just the portion that raises the exception. In other cases, it’s left exposed to other applications to handle in whatever way makes the most sense.

The signed cookies application from Chapter 7 is a good example of where suspicious activity can be easily identified and handled. If a cookie comes in without a valid signature, it’s clear that something fishy is going on and the signature validation code raises a SuspiciousOperation to signify it. Since it’s designed to work as a hands-free middleware, it also provides code to catch this exception and perform a more useful function by removing the offending cookie from the request before it reaches the view. But since it’s possible for other applications to sign and validate values outside the middleware, it’s useful to raise an exception that accurately identifies what’s going on.

ValidationError

Models and forms can both validate data before processing them further, and when that data is invalid, Django raises a ValidationError. This can be useful any time you have data to be validated, though, even outside those contexts. If you have an app that processes JSON data, for example, you may want to provide validation different from how models and forms work, and you may also want to validate the entire object, in addition to individual fields. You can maintain consistency with Django by reusing the same ValidationError that’s used in other areas.

When instantiating a ValidationError, you can pass in a few different kinds of objects, each referring to the data that’s invalid. Typically, you would pass in a printable object, such as a string or something that can be coerced to a string using the __str__() method. You can also pass in a list of such objects, or a dictionary where both the keys and values are printable, which allows you to combine several errors into a single exception. When printing the ValidationError in these cases, its internal code will automatically perform the necessary coercion to ensure you get strings.

Note  The special handling for lists and dictionaries is limited to only lists and dictionaries. Other types of sequences and mappings will be treated as standard printable objects, without looking into them for individual values. Also, it will only look into the first level of data, so if you nest data within a list, for example, Django will only pull values out of the outer list; any inner lists will simply be coerced to strings.

ViewDoesNotExist

When resolving URLs, it’s quite possible for an incoming URL to match a pattern in the URL configuration, but not match any known views. This could be for a variety of reasons, including a truly missing view, but it’s also often due to an error that causes the view not to be loaded properly. After all, Django can only identify a proper view if Python can parse it and load it as a function. When any of these situations occur, Django raises ViewDoesNotExist to indicate, as best it can, what went wrong.

There’s typically no need to manually catch this error or do anything special with it, since Django handles it as best as can be reasonably expected. In development, with DEBUG=True, it displays a useful error page with details on which view was attempted and a Python error message indicating why it couldn’t be loaded. In production, that level of detail is unsafe, so it falls back to a standard HTTP 500 error, notifying the administrators behind the scenes.

Text Modification (django.utils.text)

At its core, the Web is a written medium, using text to convey the vast majority of ideas. Typically, this text is supplied as a combination of templates and database content, but it often needs a bit of massaging before it can be sent to users. It might have to be capitalized for use in a title, line-wrapped for use in an email or otherwise altered.

compress_string(s)

This simple utility compresses the input string using the gzip format. This allows you to transfer content in a format that browsers are able to decompress on the other end.

>>> from django.utils.text import compress_string
>>> compress_string('foo')
'\x1f\x8b\x08\x00s={Q\x02\xffK\xcb\xcf\x07\x00!es\x8c\x03\x00\x00\x00'

Clearly this example doesn’t look very compressed, but that’s merely an artifact of how compression works with small strings. The headers and bookkeeping necessary for the compression algorithm are enough to make the string longer in these cases. When you supply a longer string, such as a file or a rendered template, you’ll see a much smaller string in the output of this function.

Note  If you’re using this to send content to a browser, you’ll also need to send a header to tell the browser how to handle it.

Content-Encoding: gzip
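The effect of input size on compression is easy to observe with Python’s standard gzip module. This sketch doesn’t use Django’s helper at all; gzip_bytes() is a hypothetical name, and the sample text is invented for the comparison.

```python
import gzip
import io


def gzip_bytes(data):
    """Compress a byte string into gzip format using only the standard library."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as f:
        f.write(data)
    return buf.getvalue()


short = b'foo'
long_text = b'Django makes it easier to build Web apps. ' * 100

short_compressed = gzip_bytes(short)
long_compressed = gzip_bytes(long_text)

# The short input grows because of the gzip header and footer,
# while the long, repetitive input shrinks considerably.
grew = len(short_compressed) > len(short)
shrank = len(long_compressed) < len(long_text)
```

Both grew and shrank end up True, which matches the behavior described for very small and much larger inputs.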

compress_sequence(sequence)

This works much like compress_string(), but it will compress individual items in the provided sequence. Rather than simply returning a string of all the compressed content, compress_sequence() is actually a generator, yielding content piece by piece. The first item in the output will be the gzip header, followed by compressed versions of each of the input strings and finally the gzip footer.

>>> from django.utils import text
>>> for x in text.compress_sequence(['foo', 'bar', 'baz']):
...     print repr(x)
...
'\x1f\x8b\x08\x00\x16={Q\x02\xff'
'J\xcb\xcf\x07\x00\x00\x00\xff\xff'
'JJ,\x02\x00\x00\x00\xff\xff'
'JJ\xac\x02\x00\x00\x00\xff\xff'
"\x03\x00\xaa'x\x1a\t\x00\x00\x00"

get_text_list(items, last_word='or')

There are a number of ways to present a list of items to users, each appropriate for different situations. Rather than listing each item on its own line, it’s often useful to display the list in plain English as a comma-separated list, such as, “red, blue and green.” This may seem like a daunting task, but get_text_list() simplifies it considerably. Simply pass in a list of items as the first argument and an optional conjunction to be used as the second argument, and it returns a string containing the items separated by a comma and the conjunction at the end.

>>> from django.utils.text import get_text_list
>>> 'You can use Python %s' % get_text_list([1, 2, 3])
u'You can use Python 1, 2 or 3'
>>> get_text_list(['me', 'myself', 'I'], 'and')
u'me, myself and I'
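The joining logic itself is small. Here’s a minimal sketch of the idea, assuming plain strings and ignoring the translation and Unicode handling a real project would want; text_list() is a hypothetical name, not Django’s function.

```python
def text_list(items, last_word='or'):
    """Join items with commas, placing last_word before the final item."""
    items = [str(item) for item in items]
    if not items:
        return ''
    if len(items) == 1:
        # A single item needs neither commas nor a conjunction.
        return items[0]
    return '%s %s %s' % (', '.join(items[:-1]), last_word, items[-1])


colors = text_list(['red', 'blue', 'green'], 'and')
```

The empty and single-item cases are handled separately so the conjunction never appears without something on both sides of it.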

javascript_quote(s, quote_double_quotes=False)

When writing strings out to JavaScript, whether in source code or in a response code in JavaScript Object Notation (JSON),2 there are certain considerations that have to be taken into account for special characters. This function properly escapes these special characters, including Unicode characters, in a way that JavaScript can understand.

>>> from django.utils.text import javascript_quote
>>> javascript_quote('test\ning\0')
'test\\ning\x00'

normalize_newlines(text)

When an application needs to work with text content coming from unknown sources, it’s quite possible that input will be generated on a combination of Windows, Apple and Unix-style systems. These different platforms have different standards for what characters they use to encode line-endings, which can cause problems when the application needs to do any text processing on them. Given input like this, normalize_newlines() looks for the common line-ending alternatives and converts them all to the Unix-style that Python expects.

>>> from django.utils.text import normalize_newlines
>>> normalize_newlines(u'Line one\nLine two\rLine three\r\nLine four')
u'Line one\nLine two\nLine three\nLine four'
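A conversion like this can be approximated with a single regular expression. The following is a sketch of the technique under that assumption rather than a copy of Django’s code; normalize() is a hypothetical name.

```python
import re


def normalize(text):
    """Convert Windows and old Mac line endings to Unix-style newlines."""
    # The two-character Windows ending is listed first so that it's
    # replaced as a unit rather than as two separate line endings.
    return re.sub(r'\r\n|\r|\n', '\n', text)


mixed = 'Line one\nLine two\rLine three\r\nLine four'
cleaned = normalize(mixed)
```

After the call, cleaned contains only \n line endings, regardless of which platform produced each line.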

phone2numeric(phone)

Businesses often offer phone numbers as words to make them easier to remember. If phone numbers like that are offered as input to an application, they’re typically only useful as-is if they’re only ever displayed directly to users. If the application ever has to use those numbers as part of an automated system or show them to employees who make calls on a regular basis, it’s more useful to work with them as raw numbers instead of marketing text. By passing phone numbers through phone2numeric(), you can be sure that you’ll always get a real phone number to work with.

>>> from django.utils.text import phone2numeric
>>> phone2numeric(u'555-CODE')
u'555-2633'
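The translation follows the letters printed on a standard telephone keypad. As a rough sketch of that mapping, with KEYPAD, LETTER_TO_DIGIT and to_numeric() all being hypothetical names chosen for this example:

```python
# Letters as they appear on a standard telephone keypad.
KEYPAD = {
    'abc': '2', 'def': '3', 'ghi': '4', 'jkl': '5',
    'mno': '6', 'pqrs': '7', 'tuv': '8', 'wxyz': '9',
}

# Flatten the groups into a single letter-to-digit lookup table.
LETTER_TO_DIGIT = dict(
    (letter, digit) for letters, digit in KEYPAD.items() for letter in letters
)


def to_numeric(phone):
    """Replace letters with keypad digits, leaving everything else alone."""
    return ''.join(LETTER_TO_DIGIT.get(ch, ch) for ch in phone.lower())


number = to_numeric('555-CODE')
```

Digits, hyphens and other punctuation pass through unchanged, so the result keeps whatever formatting the input had.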

recapitalize(text)

Given a string that may have already been converted to lowercase, perhaps for search or other comparison, it’s usually necessary to convert it back to regular mixed case before displaying it to users. The recapitalize() function does this, capitalizing letters that follow sentence-ending punctuation, such as periods and question marks.

>>> from django.utils.text import recapitalize
>>> recapitalize(u'does this really work? of course it does.')
u'Does this really work? Of course it does.'

Caution  Although Django provides many features for international audiences, the recapitalize() function only works for basic English text. Punctuation used in other languages may not be properly identified, causing the capitalized output to be incorrect.
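To see why the behavior is tied to English punctuation, consider a sketch of the technique using a regular expression. This is an illustration under stated assumptions, not Django’s implementation; SENTENCE_START and recap() are hypothetical names.

```python
import re

# Match a lowercase ASCII letter at the very start of the text, or one
# that directly follows a period, question mark or exclamation point
# and a single space.
SENTENCE_START = re.compile(r'(?:^|(?<=[.?!] ))([a-z])')


def recap(text):
    """Lowercase the text, then capitalize the first letter of each sentence."""
    return SENTENCE_START.sub(lambda m: m.group(1).upper(), text.lower())


result = recap('does this really work? of course it does.')
```

Only three punctuation marks and the letters a through z are recognized here, so text that ends sentences differently or begins them with accented letters would be left in lowercase.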

slugify(value)

Slugs are a certain kind of string that’s suitable for use in a URL, and often are a somewhat stripped down version of the article’s title. Slugs consist of lower-case letters, hyphens instead of spaces, and a lack of punctuation and other non-word characters. The slugify() function takes a text value and performs the necessary transformations to make it suitable for use as a URL slug.

>>> from django.utils.text import slugify
>>> slugify(u'How does it work?')
u'how-does-it-work'
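The transformations involved can be sketched with the standard library alone. This version is an approximation written for illustration, and make_slug() is a hypothetical name; it expects a Unicode string as input.

```python
import re
import unicodedata


def make_slug(value):
    """Reduce a Unicode string to lowercase ASCII words joined by hyphens."""
    # Split accented characters into a base letter plus a combining mark,
    # then discard anything that can't be represented in ASCII.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove anything that isn't a word character, whitespace or a hyphen.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into single hyphens.
    return re.sub(r'[-\s]+', '-', value)


slug = make_slug(u'How does it work?')
```

Normalizing before stripping means that accented letters degrade to their plain equivalents instead of vanishing from the slug entirely.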

smart_split(text)

Originally developed as a way to parse template tag arguments, smart_split() takes a string and breaks it apart at spaces, while still leaving quoted passages intact. This is a good way to parse arguments for any other application, as it allows a great deal of flexibility. It recognizes both single and double quotes, safely handles escaped quotes and also leaves the quotes intact at the beginning and end of any quoted passages it comes across.

>>> from django.utils.text import smart_split
>>> for arg in smart_split('arg1 arg2 arg3'):
...     print arg
arg1
arg2
arg3
>>> for arg in smart_split('arg1 "arg2\'s longer" arg3'):
...     print arg
arg1
"arg2's longer"
arg3

unescape_entities(text)

HTML can contain entities that make it easier to represent certain international characters and other special glyphs that are difficult to type on most English keyboards or transfer using native English character encodings. These are useful when editing HTML by hand, but if you’re using a broad text encoding like UTF-8, you can send the raw characters over the wire instead of relying on the browsers to convert them after the fact. By passing your string into this function, any HTML entities will be converted to the appropriate Unicode codepoints.

>>> from django.utils.text import unescape_entities
>>> unescape_entities('&ldquo;Curly quotes!&rdquo;')
u'\u201cCurly quotes!\u201d'

unescape_string_literal(s)

When writing a string that contains apostrophes or quotation marks, you often need to escape those characters by placing a backslash before them to avoid having them accidentally used to terminate the string. Because of the use of a backslash for this purpose, you also need to escape any literal backslashes you want to include in the string.

Ordinarily, Python will interpret these directly and provide you with a string that has the raw characters in it, without the extra backslashes. In some cases, such as in templates, the strings aren’t processed directly by Python, but are instead passed into your code as strings with the backslashes included. You can use unescape_string_literal() to get an equivalent string to what Python would normally provide.

>>> from django.utils.text import unescape_string_literal
>>> unescape_string_literal("'string'")
'string'
>>> unescape_string_literal('\'string\'')
'string'

wrap(text, width)

This takes the specified text and inserts newline characters as necessary to make sure that no line exceeds the width provided. It makes sure not to break up words, and also leaves existing newline characters intact. It expects all newline characters to be Unix-style, though, so if you don’t control the source of the text, it’s best to run it through normalize_newlines() first to be sure it works properly.

>>> from django.utils.text import wrap
>>> text = """
... This is a long section of text, destined to be broken apart.
... It is only a test.
... """
>>> print wrap(text, 35)
This is a long section of text,
destined to be broken apart.
It is only a test.

Truncating Text

Another common need is to truncate text to fit into a smaller space. Whether you’re limiting the number of words or characters, and whether you need to take HTML tags into account when truncating, Django has a Truncator class that can do the job. You can instantiate it by simply passing in the text you’d like to truncate.

>>> from django.utils.text import Truncator

For the sake of this example, we have to first configure Django not to use its internationalization system. If you’re using manage.py shell, this will already be done for you, but if you’re just using Python outside of a project, you’ll need to configure this. In a real application, you won’t need to perform this step.

>>> from django.conf import settings
>>> settings.configure(USE_I18N=False)

Now we have an environment capable of working with text transformations like this.

>>> truncate = Truncator('This is short, but you get the idea.')

From there, the actual operations are provided by any of the available methods.

Truncator.chars(num, truncate='...')

This method limits the text to contain no more than the number provided, without regard to any words or sentences in the original text.

>>> truncate.chars(20)
u'This is short, bu...'

Note  The resulting string is 20 characters long, including the ellipsis. As you’ll see, the truncate argument can change how many characters are used for the end of the string, and chars() will take that into account when deciding how much of the string to leave intact. Different settings for the truncate value will change how much of the original string remains. This behavior is unique to the chars() method.

The truncate argument specifies how the resulting string should be formatted. By default, it appends three periods after it, which will function as an ellipsis. You can supply any other string and it will be appended to the truncated string instead of the periods.

>>> truncate.chars(20, truncate='--')
u'This is short, but--'

You can also control the text output with more flexibility by specifying a format string, using a placeholder named truncated_text. This allows you to place the truncated text anywhere within the string.

>>> truncate.chars(20, truncate='> %(truncated_text)s...')
u'> This is short, ...'
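The way the truncate argument eats into the character budget can be sketched in a few lines. This is a simplified model of the behavior shown in these examples, not Django’s code; truncate_chars() is a hypothetical name, and the sketch ignores details such as combining characters.

```python
def truncate_chars(text, num, truncate='...'):
    """Shorten text so that the result, decoration included, is num characters."""
    placeholder = '%(truncated_text)s'
    if len(text) <= num:
        return text
    # Measure the decoration on its own to see how much room the text gets.
    if placeholder in truncate:
        overhead = len(truncate % {'truncated_text': ''})
    else:
        overhead = len(truncate)
    kept = text[:max(num - overhead, 0)]
    if placeholder in truncate:
        return truncate % {'truncated_text': kept}
    return kept + truncate


sample = 'This is short, but you get the idea.'
default = truncate_chars(sample, 20)
dashes = truncate_chars(sample, 20, truncate='--')
framed = truncate_chars(sample, 20, truncate='> %(truncated_text)s...')
```

All three results are exactly 20 characters long; only the share given to the original text changes.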

Truncator.words(num, truncate='...', html=False)

This method limits the length of the string to a specified number of words, rather than individual characters. This is usually preferable, as it avoids breaking in the middle of a word. Because words can be of varying lengths, the resulting string is less predictable than when using chars(), though.

>>> truncate.words(5)
u'This is short, but you...'
>>> truncate.words(4, truncate='--')
u'This is short, but--'

Also notice that the truncate argument no longer alters how the string gets truncated. Your text will be reduced to the specified number of words, and the truncate argument will be applied after that.

The html argument controls whether the method should avoid counting HTML attributes as separate words, because they’re separated by spaces. For normal text, the default of False is preferable, as it has less work to do, but if you’re outputting a string that may have HTML tags in it, you’ll want to use True instead.

>>> truncate = Truncator('This is <em class="word">short</em>, but you get the idea.')
>>> truncate.words(4)
u'This is <em class="word">short</em>,...'
>>> truncate.words(4, html=True)
u'This is <em class="word">short</em>, but...'
>>> truncate.words(3)
u'This is <em...'
>>> truncate.words(3, html=True)
u'This is <em class="word">short</em>,...'

Another advantage of using html=True is that it takes care to close tags that would otherwise be left open when the string is truncated.

>>> truncate = Truncator('This is short, <em>but you get the idea</em>.')
>>> truncate.words(5)
u'This is short, <em>but you...'
>>> truncate.words(5, html=True)
u'This is short, <em>but you...</em>'

Data Structures (django.utils.datastructures)

When working with any complex system, it’s often necessary to work with data in a very specific structure. This might be a sequential list of items, a mapping of keys to values, a hierarchical tree of categories, any combination of those or something else entirely. While Django doesn’t pretend to provide objects for every arrangement of data an application might need, there are a few specific things that the framework itself requires, and these are made available to all applications based on it as well.

DictWrapper

This is a good example of a data structure designed for a fairly specific purpose that might have other uses in the real world. The goal of this particular type of dictionary is to provide a way to transform values on retrieval, if the requested key matches a basic criterion.

When instantiating the dictionary, you can supply a function and a prefix string. Any time you request a key that begins with that prefix, the DictWrapper will strip off the prefix and call the supplied function on the associated value before returning it. Other than that, it works just like a standard dictionary.

>>> from django.utils.datastructures import DictWrapper
>>> def modify(value):
...     return 'Transformed: %s' % value
>>> d = DictWrapper({'foo': 'bar'}, modify, 'transformed_')
>>> d['foo']
'bar'
>>> d['transformed_foo']
'Transformed: bar'
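The same behavior can be modeled with a small dict subclass that overrides key lookup. The following is a sketch of the technique with a hypothetical class name, PrefixWrapper, rather than a reproduction of Django’s class.

```python
class PrefixWrapper(dict):
    """Dictionary that transforms values requested with a special key prefix."""

    def __init__(self, data, func, prefix):
        super(PrefixWrapper, self).__init__(data)
        self.func = func
        self.prefix = prefix

    def __getitem__(self, key):
        if key.startswith(self.prefix):
            # Strip the prefix, look up the real key, then transform the value.
            real_key = key[len(self.prefix):]
            value = super(PrefixWrapper, self).__getitem__(real_key)
            return self.func(value)
        return super(PrefixWrapper, self).__getitem__(key)


d = PrefixWrapper({'foo': 'bar'}, lambda value: value.upper(), 'upper_')
plain = d['foo']
transformed = d['upper_foo']
```

Only the stored key 'foo' actually exists in the dictionary; 'upper_foo' is resolved at lookup time, so a missing key still raises KeyError as usual.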

ImmutableList

The difference between a list and a tuple is typically described in terms of their contents. Lists can contain any number of objects, all of which should be of the same type, so that you can iterate over them and process each item just like all the others. Essentially, a list is a collection of values.

A tuple, on the other hand, is a whole value on its own, where each item within it has a specific meaning, indicated by its position. Any particular type of tuple would have the same number of values within it. For example, a three-dimensional point in space might be represented by a 3-item tuple, containing x, y and z coordinates. Every such point would have those same three values, and always in the same positions.

A key technical distinction between the two is that tuples are immutable. In order to change a tuple, you actually need to create a new tuple with the changed values. That immutability can be a useful safety net to ensure that the sequence doesn’t change out from under you, and it’s also a slight performance boost because tuples are simpler data structures. They’re not intended to be used as an immutable list, though.

For those situations where you have the semantics of a list, but also want the benefits of immutability, Django provides an alternative: the ImmutableList. It’s a subclass of tuple, but it also contains all the mutable methods available on lists. The only difference is that those methods each raise an AttributeError, rather than alter the value. It’s a subtle distinction, but it does give you the opportunity to take advantage of tuples, while still using the semantics of a list.
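A minimal sketch of that idea is a tuple subclass whose list-style methods all point at a single function that refuses to do anything. FrozenList and its _complain() method are hypothetical names used here for illustration only.

```python
class FrozenList(tuple):
    """Tuple that exposes list-style mutators, all of which refuse to work."""

    def _complain(self, *args, **kwargs):
        raise AttributeError("This list cannot be modified.")

    # Every mutating method of a list maps to the same complaint.
    append = extend = insert = pop = remove = sort = reverse = _complain
    __setitem__ = __delitem__ = _complain


colors = FrozenList(['red', 'green', 'blue'])

try:
    colors.append('purple')
except AttributeError as e:
    message = str(e)
```

Reading, iterating and indexing all work exactly as they do for any tuple; only attempts to change the contents fail.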

MergeDict

When multiple dictionaries need to be accessed together, the typical approach is to create a new dictionary that contains all the keys and values of those dictionaries together. This works well for simple applications, but it may well be necessary to maintain the mutability of the underlying dictionaries so that changes to them are reflected in the combined dictionary. The following shows how that breaks down with standard dictionaries.

>>> dict_one = {'a': 1, 'b': 2, 'c': 3}
>>> dict_two = {'c': 4, 'd': 5, 'e': 6}
>>> combined = dict(dict_one, **dict_two)
>>> combined['a'], combined['c'], combined['e']
(1, 4, 6)
>>> dict_one['a'] = 42
>>> combined['a']
1

This illustrates a simple approach at combining dictionaries, using the fact that dict() can accept both a dictionary and keyword arguments, combining them into a new dictionary. Thanks to the ** syntax described in detail in Chapter 2, this makes it a convenient way to achieve the desired result, but the example also shows where it starts to fail.

First, it only accepts two dictionaries; adding more would require calling dict() more than once, adding a new dictionary each time. Perhaps more importantly, updates to the source dictionaries don’t get reflected in the combined structure. To be clear, this is ordinarily a good thing, but in cases like request.REQUEST, which combines request.GET and request.POST, changes made to the underlying dictionaries should also be revealed in the combined output.

To facilitate all of this, Django uses its own class that acts like a dictionary in many respects, but transparently accesses multiple dictionaries behind the scenes. There’s no limit to the number of dictionaries that can be accessed this way. Simply supply as many dictionaries as needed when instantiating the object, and they’ll be accessed in the order they’re provided. Since it stores references to the real dictionaries and accesses them instead of creating a new one, modifications to the underlying dictionaries are reflected in the composite.

>>> from django.utils.datastructures import MergeDict
>>> dict_one = {'a': 1, 'b': 2, 'c': 3}
>>> dict_two = {'c': 4, 'd': 5, 'e': 6}
>>> combined = MergeDict(dict_one, dict_two)
>>> combined['a'], combined['c'], combined['e']
(1, 3, 6)
>>> dict_one['a'] = 42
>>> combined['a']
42

Since keys are checked in the internal dictionaries in the same order they were passed in to MergeDict, combined['c'] is 3 in the second example, while it was 4 in the first one.
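The core of this approach, holding references and searching them in order, fits in a few lines. The class below is a stripped-down sketch with a hypothetical name, LayeredDict; it implements only lookup and membership, far less than a complete dictionary interface.

```python
class LayeredDict(object):
    """Read-only view that looks keys up across several dictionaries in order."""

    def __init__(self, *dicts):
        # Keep references to the originals rather than copying them.
        self.dicts = dicts

    def __getitem__(self, key):
        # The first dictionary containing the key wins.
        for d in self.dicts:
            if key in d:
                return d[key]
        raise KeyError(key)

    def __contains__(self, key):
        return any(key in d for d in self.dicts)


dict_one = {'a': 1, 'b': 2, 'c': 3}
dict_two = {'c': 4, 'd': 5, 'e': 6}
combined = LayeredDict(dict_one, dict_two)
before = combined['a']
dict_one['a'] = 42
after = combined['a']
```

Because nothing is copied, the change to dict_one is visible through combined immediately, and combined['c'] comes from dict_one since it was supplied first.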

MultiValueDict

On another extreme, it’s sometimes useful to have each key in a dictionary potentially reference more than one value. Since Web browsers send data to the server as a series of name/value pairs, without any more formal structure, it’s possible for a single name to be sent multiple times, probably with a different value each time. Dictionaries are designed to map one name to only one value, so this presents a challenge.

On the surface, it seems like the solution is simple: just store a list of values under each key. Digging a bit deeper, one problem is that the vast majority of applications only use one value for each key, so always using a list would make more work for everybody. Instead, the majority case should be able to use a single key to access a single value, while still allowing all the values to be accessed for those applications that need them.

Django uses MultiValueDict to handle this case, basing its default behavior on what most other frameworks do in this situation. By default, accessing a key in a MultiValueDict returns the last value that was submitted with that name. If all the values are required, a separate getlist() method is available to return the full list, even if it only contains one item.

>>> from django.utils.datastructures import MultiValueDict
>>> d = MultiValueDict({'a': ['1', '2', '3'], 'b': ['4'], 'c': ['5', '6']})
>>> d['a'], d['b'], d['c']
('3', '4', '6')
>>> d.getlist('a')
['1', '2', '3']
>>> d.getlist('b')
['4']
>>> d.getlist('c')
['5', '6']

Caution  This doesn’t automatically coerce each value to a list. If you pass in a single item for any of the values, that value will be returned as expected, but getlist() will return the original value as it was passed in. That means getlist() will return the single item only, not a list containing a single item.

>>> d = MultiValueDict({'e': '7'})
>>> d['e']
'7'
>>> d.getlist('e')
'7'
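
To make the "last value wins" rule concrete, here is a minimal sketch built on a plain dict subclass. The class name SimpleMultiValueDict is hypothetical and it covers far less than Django’s MultiValueDict; the standard library’s urllib.parse.parse_qs() is used only to produce lists of values from a query string.

```python
from urllib.parse import parse_qs


class SimpleMultiValueDict(dict):
    """Hypothetical sketch: each key maps to a list, lookups return the last item."""

    def __getitem__(self, key):
        values = super().__getitem__(key)
        return values[-1]

    def getlist(self, key):
        # dict.get() bypasses the overridden __getitem__, so the full list comes back.
        return super().get(key, [])


# parse_qs() groups repeated names into lists, much as a browser might send them.
data = SimpleMultiValueDict(parse_qs('a=1&a=2&a=3&b=4'))
last_a = data['a']
all_a = data.getlist('a')
all_b = data.getlist('b')
```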

SortedDict

One of the more obscure features of Python dictionaries is that they’re technically unsorted. Inspecting a variety of dictionaries may seem to yield some patterns, but they can’t be relied on, as they will differ between Python implementations. This can be quite a stumbling block at times because it’s easy to accidentally rely on the implicit ordering of dictionaries, only to find it change out from under you when you least expect.

It’s quite common to need a reliably ordered dictionary, so that both Python code and templates can know what to expect when they encounter a dictionary. In Django, this feature is provided by the SortedDict, which keeps track of the order its keys were added to the dictionary. The first step in utilizing this functionality is to pass in an ordered sequence of key/value pairs. This order is then preserved, as well as the order that any subsequent keys are given new values.

>>> from django.utils.datastructures import SortedDict
>>> d = SortedDict([('c', '1'), ('d', '3'), ('a', '2')])
>>> d.keys()
['c', 'd', 'a']
>>> d.values()
['1', '3', '2']
>>> d['b'] = '4'
>>> d.items()
[('c', '1'), ('d', '3'), ('a', '2'), ('b', '4')]
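
For comparison, Python’s standard library provides collections.OrderedDict, which also remembers insertion order. The sketch below uses that class rather than SortedDict, so it illustrates the idea of ordered keys and says nothing about how SortedDict works internally.

```python
from collections import OrderedDict

d = OrderedDict([('c', '1'), ('d', '3'), ('a', '2')])
d['b'] = '4'      # a new key goes to the end
d['c'] = '5'      # reassigning an existing key keeps its position
keys = list(d.keys())
values = list(d.values())
```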

Functional Utilities (django.utils.functional)

Python treats functions as first-class objects. They have certain attributes and methods associated with them that are obviously different from other objects, but the core language treats them just like any other object. This handling allows for some very interesting uses of functions, such as setting attributes at run-time and assembling functions in a list, to be executed in order.

cached_property(func)

A property is one of the simplest kinds of descriptors because the common case simply calls a method when the attribute is accessed. This can be useful for ensuring that its value is always up to date if it relies on other attributes or external factors. Each time you access the attribute, the method is called and a new value is produced.

>>> class Foo(object):
...     @property
...     def bar(self):
...         print('Called the method!')
...         return 'baz'
...
>>> f = Foo()
>>> f.bar
Called the method!
'baz'
>>> f.bar
Called the method!
'baz'

Sometimes, though, you have a value that doesn’t change but can be expensive to produce. You don’t want to generate the value if you don’t need to, but you also don’t want to produce it more than once. To address this situation, you can use the @cached_property decorator. Applying this to a method will cause the method to be called the first time the attribute is accessed, but it will store the result on the object, so that every subsequent access will just get the stored value, rather than calling the method again.

>>> from django.utils.functional import cached_property
>>> class Foo(object):
...     @cached_property
...     def bar(self):
...         print('Called the method!')
...         return 'baz'
...
>>> f = Foo()
>>> f.bar
Called the method!
'baz'
>>> f.bar
'baz'
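
One way such a decorator could work is as a descriptor that writes its result into the instance dictionary. The following is a sketch under that assumption, with the hypothetical name simple_cached_property; it is not presented as Django’s own code.

```python
class simple_cached_property:
    """Hypothetical sketch of a caching descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # Storing the value under the same name in the instance dictionary
        # means later lookups find it there and never reach this descriptor.
        instance.__dict__[self.name] = value
        return value


class Foo:
    calls = 0

    @simple_cached_property
    def bar(self):
        Foo.calls += 1
        return 'baz'


f = Foo()
first, second = f.bar, f.bar
calls_after_two_reads = Foo.calls
```

Because the cached value lives in the instance dictionary, deleting the attribute with del f.bar clears it in this sketch, and the next access calls the method again.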

curry(func)

It’s often necessary to take a function with a complex set of arguments and simplify it so that code that calls it doesn’t always need to supply all the arguments. The most obvious way to do this is by providing default values wherever possible, as described in Chapter 2. In many situations, though, there isn’t a sensible default at the time the function is written or the default value might not be suitable to the needs of the situation. Normally, you can just call the function with whatever argument values you need, which works just fine for most needs.

Sometimes, though, the function’s arguments are determined at a different time than when it actually needs to be called. For instance, it’s quite common to pass a function around so it can be used later, whether as an instance method or a callback, or even a module-level function. When using a function that accepts more arguments than will be provided later, the remaining arguments must be specified in advance.

Since Python 2.5, this functionality is provided in the standard library, by way of the functools.partial function. While being bundled with Python is convenient, it’s only useful for subsequent installations, while Django supports versions of Python that have been around far longer. Instead, Django provides its own implementation at django.utils.functional.curry.

The first argument to curry is always a callable, which won’t be called right away, but will be tucked away to be used later. Beyond that, all positional and keyword arguments are saved as well, and will be applied to the supplied callable when the time comes. The return value is then a new function that, when called, will execute the original callable with both the original arguments and any arguments that were provided in the call that came later.

>>> from django.utils.functional import curry
>>> def normalize_value(value, max_value, factor=1, comment='Original'):
...     """
...     Normalizes the given value according to the provided maximum,
...     scaling it according to factor.
...     """
...     return '%s (%s)' % (float(value) / max_value * factor, comment)
>>> normalize_value(3, 4)
'0.75 (Original)'
>>> normalize_value(3, 4, factor=2, comment='Double')
'1.5 (Double)'
>>> percent = curry(normalize_value, max_value=100, comment='Percent')
>>> percent(50)
'0.5 (Percent)'
>>> percent(50, factor=2, comment='Double')
'1.0 (Double)'
>>> tripled = curry(normalize_value, factor=3, comment='Triple')
>>> tripled(3, 4)
'2.25 (Triple)'
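
The behavior described above fits in a few lines. This is a minimal sketch of the idea using the hypothetical name simple_curry, reusing normalize_value() from the session above without its docstring.

```python
def simple_curry(func, *args, **kwargs):
    """Hypothetical sketch of the currying behavior described above."""
    def curried(*more_args, **more_kwargs):
        # Stored positional arguments come first; keyword arguments given at
        # call time override the stored ones.
        merged_kwargs = dict(kwargs, **more_kwargs)
        return func(*(args + more_args), **merged_kwargs)
    return curried


def normalize_value(value, max_value, factor=1, comment='Original'):
    return '%s (%s)' % (float(value) / max_value * factor, comment)


percent = simple_curry(normalize_value, max_value=100, comment='Percent')
tripled = simple_curry(normalize_value, factor=3, comment='Triple')
```

For these calls, functools.partial(normalize_value, max_value=100, comment='Percent') should behave the same way as percent.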

lazy(func, *resultclasses)

Some values can be represented differently depending on their environment. A common example is translatable text, where the internal value is typically in English, but it can be represented using a different language selected by a user. Objects with behavior like this are considered lazy, because they’re not populated right away, but later, when necessary.

You can create a lazy object using this lazy() function. The primary argument it accepts is a function that can produce the eventual value. That function won’t be called right away, but will simply be stored away inside a Promise object. The promise can then be passed around throughout framework code like Django, which doesn’t care what the object is, until it finally reaches code that does care about the object. When attempting to access the promise, the function will be called and the value returned. In fact, the function will be called every time the object is accessed, each time with the chance to use the environment to alter the returned value.

The interesting part of this process is how the promise determines whether it’s being accessed. When simply passing around an object, the object itself has no access to what code keeps a reference to it. It can, however, react when its attributes are accessed. So when your code tries to access the attributes of a promise, that becomes a cue to generate a new representation of the promised value.

The remaining arguments to the lazy() function help with this part of the process. The resultclasses you specify should contain all the different types of objects that your function can return. Each of these classes has a set of attributes and methods on it, which the promise can then listen for. When any one of them is accessed, the promise will call its stored function to return a new value, then return the attribute on that value that was originally requested.

This can be particularly difficult to understand without an example. Translations are a common example, but another useful case is when working with dates and times. Specifically, social networks will often display the date and time of a particular event in terms of how long ago the event occurred, rather than as an absolute date. Django has a utility available for calculating this immediately, but you could also use it to create a lazy object. Then, every time it’s displayed, your code can calculate the time difference on demand.

As we saw earlier, this example first requires configuring Django not to use its internationalization system, unless you’re running it through manage.py shell.

>>> from django.conf import settings
>>> settings.configure(USE_I18N=False)

Now the system is configured to use the timesince() function, located in django.utils.timesince. Simply pass in a date or datetime object and it will return a string containing a human-readable representation of the duration between now and the date you passed in.

>>> import datetime
>>> from django.utils.timesince import timesince
>>> then = datetime.datetime.now() - datetime.timedelta(minutes=1)
>>> since = timesince(then)
>>> since
u'1 minute'
>>> print(since)
1 minute

That’s how it normally works, returning the duration immediately. Then you’re left with a string that was only valid when the function was called. A lazy object will work like a string when it needs to, but will evaluate the function whenever it needs to yield a value.

>>> from django.utils.functional import lazy
>>> lazy_since = lazy(timesince, str)(then)
>>> lazy_since
<django.utils.functional.__proxy__ at 0x...>
>>> print(lazy_since)
1 minute
 
# Wait a few minutes...
 
>>> print(lazy_since)
5 minutes
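
The re-evaluation on access can be imitated without Django. The sketch below is a deliberately simplified stand-in: the class LazyText and the helper describe() are hypothetical, a mutable dictionary plays the role of the passing time, and unlike lazy() it forwards every missing attribute to a fresh string instead of inspecting a list of result classes.

```python
class LazyText:
    """Hypothetical sketch of a promise-like object, far simpler than Django's."""

    def __init__(self, func, *args, **kwargs):
        self._func = func
        self._args = args
        self._kwargs = kwargs

    def __str__(self):
        # Every conversion to text calls the stored function again.
        return str(self._func(*self._args, **self._kwargs))

    def __getattr__(self, name):
        # Only reached for attributes LazyText lacks, such as str methods.
        if name.startswith('_'):
            raise AttributeError(name)
        # A fresh value is produced and the attribute is looked up on it.
        return getattr(str(self), name)


elapsed = {'minutes': 1}


def describe(state):
    return '%d minute(s)' % state['minutes']


lazy_text = LazyText(describe, elapsed)
early = str(lazy_text)
elapsed['minutes'] = 5      # stand-in for time passing
late = str(lazy_text)
shouted = lazy_text.upper()
```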

allow_lazy(func, *resultclasses)

This decorator provides another way to work with lazy objects like those described in the preceding section. Most functions operate on actual objects, without knowing anything about the deferred loading behavior of lazy objects, and will just access the object’s attributes directly. If you provide a lazy object to such a function, it will immediately trigger the value, which may not be very useful if the function simply transforms the value.

>>> def bold(value):
...     return u'<b>%s</b>' % value
...
>>> bold(lazy_since)
u'<b>10 minutes</b>'

It’d be better if the new function call could be lazy as well, and even better still if you could do that without changing the function’s code. That’s where allow_lazy() comes into play. You can apply this to any function, so that when you call the function, it will check to see if any of the incoming arguments are lazy. If any of them are in fact lazy objects, the wrapper will step in and return a new lazy object backed by the original function. Otherwise, the original function will immediately run on the non-lazy arguments provided.

>>> from django.utils.functional import allow_lazy
>>> lazy_bold = allow_lazy(bold, str)
>>> lazy_bold(lazy_since)
<django.utils.functional.__proxy__ at 0x...>
>>> lazy_bold(since)
u'<b>1 minute</b>'
>>> print(lazy_bold(lazy_since))
<b>2 minutes</b>
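
The check-then-defer logic can be sketched independently of Django. Everything here is hypothetical: Lazy is a minimal promise that re-runs its function on every str() call, and simple_allow_lazy() only inspects positional arguments.

```python
import functools


class Lazy:
    """Hypothetical minimal promise: calls func again on every str()."""

    def __init__(self, func, *args):
        self.func, self.args = func, args

    def __str__(self):
        return str(self.func(*self.args))


def simple_allow_lazy(func):
    """Hypothetical sketch: defer func whenever an argument is lazy."""
    @functools.wraps(func)
    def wrapper(*args):
        if any(isinstance(arg, Lazy) for arg in args):
            # Return a new promise backed by the original function.
            return Lazy(func, *args)
        return func(*args)
    return wrapper


@simple_allow_lazy
def bold(value):
    return '<b>%s</b>' % value


clock = {'minutes': 1}
lazy_since = Lazy(lambda: '%d min' % clock['minutes'])
eager = bold('1 min')           # plain string in, plain string out
deferred = bold(lazy_since)     # lazy in, lazy out
clock['minutes'] = 2            # stand-in for time passing
```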

lazy_property(fget=None, fset=None, fdel=None)

Properties are a very useful way to wrap custom behavior around simple attribute access. For example, you could use a property to generate attribute values on demand or update related information when an attribute’s value is changed. One potential problem with them, though, is that they wrap specific functions when they’re first added to a class. Subclasses can inherit each property’s behavior, but it will always use the functions that were provided to the original decorator. The only way the subclass can override a property’s behavior is to create a whole new property, completely replacing every aspect of the property.

>>> class Foo(object):
...     def _get_prop(self):
...         return 'foo'
...     prop = property(_get_prop)
...
>>> class Bar(Foo):
...     def _get_prop(self):
...         return 'bar'
...
>>> Foo().prop
'foo'
>>> Bar().prop
'foo'

In order to allow a subclass to more easily override specific property behavior, you can create your property using the lazy_property() function. This will automatically look at whichever subclass is accessing the property and use any overridden functions you’ve added, falling back to the original functions otherwise.

>>> from django.utils.functional import lazy_property
>>> class Foo(object):
...     def _get_prop(self):
...         return 'foo'
...     prop = lazy_property(_get_prop)
...
>>> class Bar(Foo):
...     def _get_prop(self):
...         return 'bar'
...
>>> Foo().prop
'foo'
>>> Bar().prop
'bar'
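
A property can pick up subclass overrides if it looks the method up by name at access time instead of holding on to the original function. The sketch below shows that idea for the getter only; the name late_bound_property is hypothetical.

```python
def late_bound_property(fget):
    """Hypothetical sketch: look the getter up by name at access time."""
    name = fget.__name__

    def getter(self):
        # getattr() finds whichever method the instance's class defines,
        # so a subclass override is picked up automatically.
        return getattr(self, name)()

    return property(getter)


class Foo:
    def _get_prop(self):
        return 'foo'
    prop = late_bound_property(_get_prop)


class Bar(Foo):
    def _get_prop(self):
        return 'bar'
```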

memoize(func, cache, num_args)

When working with a lot of information, it’s often necessary for functions to make certain basic calculations where the only true variables—that is, values that change from one call to the next—are the arguments that are passed in. To reuse a term mentioned in Chapter 7, this behavior makes the function idempotent; given the same arguments, the result will be the same, regardless of how many times the function is called. This is, in fact, the original mathematical meaning of the term, which was borrowed for use with HTTP methods.

Idempotence provides an interesting disconnect between humans and computers. While humans can easily identify when a function is idempotent and learn to memorize the result rather than continue carrying out the function each time (remember learning your multiplication tables?), computers aren’t so lucky. They’ll happily churn away at the function time and time again, never realizing how much unnecessary time it takes. This can be a big problem in data-intensive applications, where a function might take a very long time to execute or be executed with the same arguments hundreds or thousands of times.

It’s possible for a program to take the same shortcut that we humans learn as children, but not without a little help. Django provides this assistance by way of the memoize() function, also located at django.utils.functional. It simply takes any standard function and returns a wrapper around it that records the arguments being used and maps them to the value the function returns for those arguments. Then, when those same arguments are passed in again, it simply finds and returns the value that was previously calculated, without running the original function again.

In addition to the function to be called, memoize() takes two other arguments, used to determine how its cache of return values should be managed.

  • cache—A dictionary where the values will be stored, with the key being the arguments passed in to the function. Any dictionary-like object will work here, so it’s possible, for instance, to write a dictionary wrapper around Django’s low-level cache—described in Chapter 8—and have multiple threads, processes or even entire machines all share the same memoization cache.
  • num_args—The number of arguments that are combined to form the key in the dictionary cache. This is typically the total number of arguments the function accepts, but can be lower if there are optional arguments that don’t affect the return value.

>>> from django.utils.functional import memoize
>>> def median(value_list):
...     """
...     Finds the median value of a list of numbers
...     """
...     print('Executing the function!')
...     value_list = sorted(value_list)
...     half = int(len(value_list) / 2)
...     if len(value_list) % 2:
...         # Odd number of values
...         return value_list[half]
...     else:
...         # Even number of values
...         a, b = value_list[half - 1:half + 1]
...         return float(a + b) / 2
...
>>> primes = (2, 3, 5, 7, 11, 13, 17)
>>> fibonacci = (0, 1, 1, 2, 3, 5, 8, 13)
>>> median(primes)
Executing the function!
7
>>> median(primes)
Executing the function!
7
>>> median = memoize(median, {}, 1)
>>> median(primes)
Executing the function!
7
>>> median(primes)
7
>>> median(fibonacci)
Executing the function!
2.5
>>> median(fibonacci)
2.5

NOTE ABOUT MEMOIZING ARGUMENTS

Because the function’s arguments will be used in a dictionary to map to return values, they must be hashable values. Typically, this means anything immutable, but certain other types of objects may be hashable as well. For example, the median() function described in this section would throw an error if passed a list instead of a tuple. Because the contents of a list can change, they can’t be used as a dictionary key.
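
The wrapper described in this section, including the hashability requirement, can be sketched as follows. The name simple_memoize is hypothetical and only positional arguments are handled; this is an illustration of the technique rather than Django’s code.

```python
from functools import wraps


def simple_memoize(func, cache, num_args):
    """Hypothetical sketch of the wrapper described in this section."""
    @wraps(func)
    def wrapper(*args):
        # Only the first num_args positional arguments form the cache key.
        key = args[:num_args]
        if key in cache:
            return cache[key]
        result = func(*args)
        cache[key] = result
        return result
    return wrapper


calls = []


def total(values):
    calls.append(values)
    return sum(values)


cache = {}
fast_total = simple_memoize(total, cache, 1)
first = fast_total((1, 2, 3))
second = fast_total((1, 2, 3))   # answered from the cache

try:
    fast_total([1, 2, 3])        # a list cannot be part of a dictionary key
except TypeError:
    list_rejected = True
else:
    list_rejected = False
```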

partition(predicate, values)

This is a simple utility function that will split a sequence of values into two lists, depending on the result of passing each value to the predicate function. The return value is a 2-tuple, with the first item in that tuple being the False responses, while the second item contains the True responses.

>>> from django.utils.functional import partition
>>> partition(lambda x: x > 4, range(10))
([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])

The predicate is expected to return True or False, but internally partition() actually takes advantage of the fact that True and False are equivalent to 1 and 0, respectively, when they’re used as indexes to a sequence. That means that if you have a predicate that already returns 1 and 0, you don’t need to convert it to use True and False instead.

>>> even, odd = partition(lambda x: x % 2, range(10))
>>> even
[0, 2, 4, 6, 8]
>>> odd
[1, 3, 5, 7, 9]
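
The indexing trick is easy to see in a small stand-in. This is a sketch of the approach described above under the hypothetical name simple_partition.

```python
def simple_partition(predicate, values):
    """Hypothetical sketch of the indexing trick described above."""
    results = ([], [])
    for item in values:
        # False/0 selects the first list, True/1 selects the second.
        results[predicate(item)].append(item)
    return results


small, large = simple_partition(lambda x: x > 4, range(10))
even, odd = simple_partition(lambda x: x % 2, range(10))
```

In this sketch, a predicate that returns anything other than a Boolean, 0 or 1 would index outside the 2-tuple or fail outright, so it is worth keeping predicates strict.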

wraps(func)

Chapter 2 described decorators in detail, but there’s one aspect of them that can cause problems in some situations because decorators often return a wrapper around the original function. This wrapper is, in fact, an entirely different function than what was written in the source file, so it has different attributes as well. When introspecting functions, this can cause confusion if several functions are passed through the same decorator because they would all share similar properties, including their names.

>>> def decorator(func):
...     def wrapper(*args, **kwargs):
...         return func(*args, **kwargs)
...     return wrapper
>>> def test():
...     print('Testing!')
>>> decorated = decorator(test)
>>> decorated.__name__
'wrapper'

To help ease this situation, Django includes a copy of Python’s own wraps() function, which was first introduced in Python 2.5. wraps() is actually another decorator, which copies details of the original function onto the wrapper function, so it looks more like the original when everything’s done. Just pass in the original function to wraps() and use it as you would any other decorator on your wrapper, and it’ll do the rest.

>>> from django.utils.functional import wraps
>>> def decorator(func):
...     @wraps(func)
...     def wrapper(*args, **kwargs):
...         return func(*args, **kwargs)
...     return wrapper
>>> def test():
...     print('Testing!')
>>> decorated = decorator(test)
>>> decorated.__name__
'test'

Caution  Unfortunately, wraps() can’t make the wrapper completely identical to the original function. In particular, its function signature will always reflect that of the wrapper function, so attempting to introspect the arguments of decorated functions will likely result in some confusion. Still, for automated documentation and debugging purposes, having wraps() update the name and other information is quite useful.

Signals

An important aspect of a large application is knowing when certain things happen in other parts of the application. Even better is the ability to do something the instant that event happens. For this purpose, Django includes a signal dispatcher that allows code to broadcast the occurrence of an event, while providing a method for other code to listen for those broadcasts and react accordingly, the instant the event occurs. It identifies the type of event being broadcast by allowing code to define unique signals to dispatch.

This concept of dispatching and the code that enables it isn’t unique to Django, but its implementation is customized for the needs of a Web application. This implementation is located at django.dispatch.dispatcher, though it’s designed to be used through the simple Signal object, available at django.dispatch. Django uses signals in a variety of places, many of which have been documented elsewhere in this book, in the areas where they’re used. The following sections discuss in more generality how signals and dispatching work, and how to register listeners for particular events.

How It Works

The basic process is fairly simple. Each step will be explained in more detail in individual sections, but the following should serve as a good overview.

First, some Python code defines a signal. As described in the next section, this is a Signal object that is placed in a reliable location. This object represents an event that is expected to occur at some point in time—possibly multiple times. The dispatcher doesn’t use any central registration of signals; it’s up to your own code to know which signal to use at any given time.

When your code triggers an event that you’d like other code to know about, your code sends some information to the signal, including a “sender” object representing where the event is coming from and any arguments that describe other details of the event. The signal itself identifies just the type of event; these additional arguments describe what’s happening at a particular time.

The signal then looks at its list of registered listeners to see if any of them match the provided signal and sender, and calls each function it finds in turn, passing along whatever arguments the signal was given when the event was triggered. Registration of listeners can happen at any time, and the signal will update its registry when a new listener is added, so that future events will include the new listener.
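
The three steps above can be sketched with a toy dispatcher. MiniSignal, Article and on_saved are hypothetical names, and the class leaves out everything that makes Django’s Signal robust (weak references, dispatch_uid, thread safety); it exists only to show the flow of define, register and send.

```python
class MiniSignal:
    """Hypothetical, heavily simplified dispatcher; not Django's Signal."""

    def __init__(self):
        self.receivers = []    # list of (receiver, sender) pairs

    def connect(self, receiver, sender=None):
        self.receivers.append((receiver, sender))

    def send(self, sender, **named):
        responses = []
        for receiver, wanted_sender in self.receivers:
            # A listener registered without a sender hears every sender.
            if wanted_sender is None or wanted_sender is sender:
                responses.append((receiver, receiver(sender=sender, **named)))
        return responses


class Article:
    pass


def on_saved(sender, **kwargs):
    return 'saved %s' % kwargs['title']


article_saved = MiniSignal()                       # 1. define the signal
article_saved.connect(on_saved, sender=Article)    # 2. register a listener
responses = article_saved.send(sender=Article, title='Hello')   # 3. send
ignored = article_saved.send(sender=object(), title='Other')
```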

Defining a Signal

Signals don’t need to implement any kind of protocol, or even supply any attributes. They’re really just vehicles to use for advertising when an event occurs; they’re simply instances of Signal. The real key to defining a successful signal is just in making sure that it doesn’t get replaced. A signal object must always be available from the same import location, and it must always be the same object. The dispatcher requires this because it uses the object as an identifier, to match the event being dispatched with the appropriate listeners that have been registered.

>>> from django.dispatch import Signal
>>> signal = Signal()
>>> signal
<django.dispatch.dispatcher.Signal object at 0x...>

Sending a Signal

Whenever you’d like to notify other code of an event occurrence, signals provide a send() method to send that signal to any registered listeners. This method requires a sender, which represents the object that was responsible for dispatching the signal, which allows listeners to respond to events coming from a particular object. Typically, Django uses a class—such as a model—as the sender, so that listeners can be registered prior to any instances being created, while also allowing for listeners to respond to events on all instances of that class.

In addition to a sender, send() also accepts any number of additional keyword arguments, which will be passed through directly to listeners. As shown in the next section, listeners must always accept all keyword arguments, regardless of what they actually use. This allows the sending code to add new information to a signal later on, without causing any problems with listeners that haven’t yet been updated to use that new information. It’s quite likely that the code that sends a signal will have features added to it later on, and this keyword argument support makes it easy to incorporate those features into existing signals.

Once all the listeners have been called, send() returns a list of the responses returned by the registered listeners. This list contains a sequence of 2-tuples, of the format (listener, response). Django’s own signals don’t typically use any return values, but they can be quite useful to support plugins that send information back to the application itself.

>>> from django.dispatch import Signal
>>> signal = Signal()
>>> sender = object()
>>> signal.send(sender=sender, spam='eggs')
[]

Capturing Return Values

Functions are often expected to return a value, and signals can take full advantage of that. When each listener is called with the signal’s arguments, Django captures its return value and collects all of them together in a list. Once all the listeners have been called, the full list of return values is then returned from Signal.send(), allowing the calling code to access any information provided by the listeners. This allows signals to be used for more than just extra actions; they can also be used for data processing and related tasks.

Defining a Listener

When sent, the signal passes the sender and all appropriate arguments to each listener function that is registered with that signal. A listener is simply a Python function like any other; the only difference is the fact of having been registered as a listener for a particular signal. Since the signal simply calls the listener as a function, it can actually be any valid Python callable, many of which are described in Chapter 2. In practice, standard functions are the most common.

While listeners are allowed a great deal of flexibility, signals do make one important assumption about how they’re defined: all listeners must accept any keyword arguments that are passed in. Which arguments are actually used depends entirely on how a particular listener intends to use the signal, but it must accept unused arguments without error. As shown previously, signals may be sent with any number of keyword arguments, and these will all be passed along to all listeners.

The value in this approach is that listeners don’t need to know about everything the signal is responsible for. A listener can be attached for one purpose, expecting a specific set of arguments. Then, additional arguments can be added to the signal dispatch, and all previously defined listeners will continue to function properly. As with any other function call, if a listener expects an argument that isn’t provided with the signal, Python will raise a TypeError.

def listener(sender, a, **kwargs):
    return a * 3
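
Calling the listener directly, as a signal would, shows both halves of the rule: extra keyword arguments are harmless, while a missing named argument is an error. The argument values here are arbitrary.

```python
def listener(sender, a, **kwargs):
    return a * 3


# Extra keyword arguments are absorbed by **kwargs and ignored.
result = listener(sender=None, a=2, b='unused')

# Leaving out an argument the listener names explicitly is an error.
try:
    listener(sender=None)
except TypeError:
    missing_argument_raised = True
else:
    missing_argument_raised = False
```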

Registering Listeners

Once you have a signal to work with and a listener intended to work with it, connecting them is a simple call to the signal’s connect() method. In addition to one required argument, there are a few options that can be specified when registering a signal, customizing how that listener should be handled when the signal is dispatched later on.

  • receiver—The callable that will receive the signal and its associated arguments. This is obviously required for all registrations.
  • sender—A specific object to watch for signals. Since every signal must include a sender, this allows a listener to respond to just that one sender. If omitted, the listener will be called for all senders that issue the given signal.
  • weak—A Boolean indicating whether weak references should be used, a topic described in more detail in the next section. This defaults to True, using weak references by default.
  • dispatch_uid—A unique string used to identify the listener on the given signal. Since modules can sometimes get imported more than once, it’s possible for listeners to get registered twice, which will often cause problems. Supplying a unique string here will ensure that the listener only gets registered once, no matter how many times a module gets imported. If omitted, an ID will be generated based on the listener itself.
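
The effect of dispatch_uid can be illustrated with a small registry. This is a sketch under stated assumptions, not Django’s code: DedupRegistry and make_listener() are hypothetical, and falling back to the receiver’s identity when no ID is supplied is an assumption based on the description above.

```python
class DedupRegistry:
    """Hypothetical sketch of listener registration with dispatch_uid."""

    def __init__(self):
        self.receivers = []    # list of (lookup_key, receiver) pairs

    def connect(self, receiver, sender=None, dispatch_uid=None):
        # Assumption: fall back to the receiver's identity when no uid is given.
        ident = dispatch_uid if dispatch_uid is not None else id(receiver)
        lookup_key = (ident, id(sender))
        if not any(key == lookup_key for key, _ in self.receivers):
            self.receivers.append((lookup_key, receiver))


def make_listener():
    # Simulates a module being imported twice: each call creates a new,
    # distinct function object with identical behavior.
    def listener(sender, **kwargs):
        return 'handled'
    return listener


without_uid = DedupRegistry()
without_uid.connect(make_listener())
without_uid.connect(make_listener())

with_uid = DedupRegistry()
with_uid.connect(make_listener(), dispatch_uid='myapp.listener')
with_uid.connect(make_listener(), dispatch_uid='myapp.listener')
```

Without an explicit ID, the two copies look like different listeners and both are registered; with a shared ID, the second registration is skipped.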

Forcing Strong References

While weak references are a fairly complex topic, well beyond the scope of this book (see footnote 3), signals’ use of them can cause confusion in certain situations, so it’s worth giving a basic overview of the problem and its solution. When an object is referenced using a weak reference, as done by Django’s dispatcher, this reference alone will not keep the object from being garbage collected. It must still have a strong reference somewhere else, or Python will automatically destroy it and free the memory it occupies.

While standard references in Python are strong, the dispatcher, by default, uses weak references to maintain its list of registered listeners. This is generally preferable with signals, because it means that listener functions that belong to code no longer in use won’t use up valuable time and energy by being called.

However, some situations in Python would ordinarily cause an object to be destroyed, and these situations require special attention when using signals. In particular, if a listener function is defined inside another function—perhaps to customize a function for a particular object—the listener will be destroyed when its container function finishes executing and its scope is removed.

>>> from django.dispatch import Signal
>>> signal = Signal()
>>> def weak_customizer():
...     def weak_handler(sender, **kwargs):
...         pass
...     signal.connect(weak_handler)
...
>>> def strong_customizer():
...     def strong_handler(sender, **kwargs):
...         pass
...     signal.connect(strong_handler, weak=False)
...
>>> weak_customizer()
>>> strong_customizer()
>>> signal.send(sender="sender")
[(<function strong_handler at 0x...>, None)]

As you can see, the default form of registering the listener allows the function to be destroyed once its customization function finishes executing. By specifying weak=False explicitly, it survives to be called when the signal is sent at a later point in time.
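
The underlying mechanism can be observed with Python’s weakref module alone, without any signals involved. In this sketch the helper names are hypothetical, and the exact moment an object is freed depends on the Python implementation, which is why gc.collect() is called explicitly.

```python
import gc
import weakref


def make_weak():
    def handler(sender, **kwargs):
        pass
    # Only a weak reference leaves this function.
    return weakref.ref(handler)


kept = []


def make_strong():
    def handler(sender, **kwargs):
        pass
    kept.append(handler)    # a strong reference elsewhere keeps it alive
    return weakref.ref(handler)


weak_ref = make_weak()
strong_ref = make_strong()
gc.collect()    # make collection explicit; CPython usually frees it sooner

weak_target = weak_ref()       # None once the function has been destroyed
strong_target = strong_ref()   # still the original function
```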

Now What?

The tools laid out in this chapter won’t provide major new features for your applications, but they can help with many of the simpler tasks many applications need. These little things can really help tie it all together. How the application actually gets used is another issue, with some of the more interesting options described in the next chapter.

1 http://prodjango.com/mutagen/

2 http://prodjango.com/json/

3 http://prodjango.com/weak-references/
