CHAPTER 3

Models

Data is at the center of most modern Web applications, and Django aims to provide support for a variety of data structures and persistence options. Models are the one piece of the traditional MVC pattern that Django uses largely as you'd expect, and they're an essential part of any application that needs to persist data across multiple requests, sessions or even server instances.

Django models are defined as standard Python classes, with a wealth of additional features added in automatically. Behind the scenes, an object-relational mapper (ORM) allows these classes and their instances access to databases. Without this ORM, developers would be required to deal with the database directly, using Structured Query Language (SQL), the standard way to access content in databases.

The primary goal of SQL is to describe and access the relationships that are stored in a relational database. SQL does not generally provide high-level relationships for applications, so most applications include handwritten SQL for data activities. This is definitely possible, but it tends to lead toward lots of repetition, which in and of itself violates the DRY principle outlined in Chapter 1.

These bits of SQL littered throughout an application’s code quickly become unmanageable, especially since the programmers who have to maintain the code aren’t typically experts in relational databases. That also means these hand-written queries are quite prone to bugs, which are often troublesome to track down and fix.

That still doesn’t factor in the biggest issue of all: security. SQL injection1 attacks are a common way for malicious attackers to access or even modify data they shouldn’t have access to. This occurs when hand-written SQL doesn’t take appropriate precautions with regard to the values that are passed into the database. The more SQL statements that are written by hand, the more likely they are to be susceptible to this type of attack.

All of these problems are extremely common in Web development, regardless of language, and ORMs are a common way for frameworks to mitigate them. There are other ways to avoid some of these problems, such as SQL injection, but Django’s ORM was written with these concerns in mind and handles much of it behind the scenes. By accessing data using standard Python objects, the amount of SQL is minimized, reducing the opportunity for problems to crop up.

How Django Processes Model Classes

As described in Chapter 2, one of Django’s most recognizable features is its declarative syntax for model definitions. With this, model definitions can be simple and concise, while still providing a vast array of functionality. The basic process of using metaclasses for declarative syntax is described in detail in Chapter 2, but there are more specific steps taken when handling models, which deserve some extra attention.

The metaclass responsible for processing model definitions is ModelBase, living at django.db.models.base. This provides a few key features, listed here in the order in which the actions are performed.

  1. A new class is generated to be used for the actual model, preserving the module location where the original model was defined.
  2. If a custom app_label wasn’t provided for the model, it’s determined based on the module where it was declared.
  3. Meta options are pulled out of the model and placed in a special Options object, which is described in more detail later in this chapter.
  4. Two special exception classes, DoesNotExist and MultipleObjectsReturned, are created and customized for the new model.
  5. A default manager is assigned to the model if one wasn’t already provided.
  6. If the model was already defined—which can happen because of differences in how the module was imported at different stages—the existing model is retrieved from the application cache and returned, making sure that the same class object is always used.
  7. Attributes and methods defined on the original model are added to the newly-created model class.
  8. Settings from inherited parent models are set on the new model.
  9. The new model is registered with the application cache for future reference.
  10. The newly-created model is returned to be used in place of the class that was defined in the source file.

Abstract models and inherited models are special cases, where not all of these actions occur. Specific differences for these cases are covered later in this chapter.

Setting Attributes on Models

Python provides useful tools for getting and setting attributes on objects without knowing the name in advance, but while getattr() and setattr() represent the standard way of accessing attributes on objects, one of Django’s hooks for model fields requires some additional handling. Django provides a class method, add_to_class(), on all of its models, which should be used as a substitute for setattr().

The syntax and semantics of add_to_class() differ slightly from the traditional functions. It’s a class method, rather than a built-in or module-level function, which means the class is provided implicitly, rather than as an explicit first argument. This method checks the provided value for the presence of a contribute_to_class() method, and calls it if it exists; otherwise, the standard setattr() function is used to add the value to the model. These behaviors are mutually exclusive; only one will happen in a given add_to_class() call.

It’s important to realize that this isn’t just for Django’s own internal code. If an application needs to add arbitrary objects as attributes to models, it must call add_to_class(). This way, developers working with the application can pass any object in, assured that it will be handled the same as if it had been assigned directly in the model’s class definition.
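
As a quick sketch of how this plays out, assume the following lives in an application’s models.py; the Article model and Tracker object are hypothetical, used purely for illustration:

class Article(models.Model):
    title = models.CharField(max_length=255)

class Tracker(object):
    # Because this object defines contribute_to_class(), add_to_class()
    # will call it instead of using setattr() directly.
    def contribute_to_class(self, cls, name):
        self.model = cls
        self.name = name
        # The field-like object is responsible for any assignment it needs.
        setattr(cls, name, self)

# Handled as if Tracker() had been assigned in the class definition.
Article.add_to_class('tracker', Tracker())

# A plain string has no contribute_to_class(), so setattr() is used.
Article.add_to_class('source', 'third-party')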

This whole process changes what the classes look like when using the introspection techniques described in Chapter 2. In order to determine the declared fields, the database table being used or the display name for the model, some additional knowledge is required.

Getting Information About Models

Once a model has been processed by Python, along with Django’s ModelBase metaclass, its original structure can still be determined by using an attribute that exists on every Django model and its instances called _meta.

There are a number of attributes available on _meta, which combine to describe the model, how it was defined, and what values were provided to customize its behavior. These can also be classified into two separate groups: attributes that are determined by looking at the actual structure of the original class and those that are specified directly as part of a Meta class defined inside the model.

REGARDING THE STABILITY OF _META

Names beginning with underscores typically refer to private attributes that shouldn’t be used directly. They’re often used internally by functions and methods that are more public in nature, and are generally accompanied by warnings about likely changes and undocumented behavior. In most cases, these warnings are valid; programmers usually write such tools for their own use, and find little need to document their behavior or guarantee their longevity.

However, _meta is a bit of an exception to the rule. While it is indeed part of a private API, one that isn’t necessary in the vast majority of situations, it shares something with many tools described in this book: it can prove extremely useful if understood and used properly. In fact, _meta goes one better, by being quite stable and highly unlikely to change without considerable effort to keep it backwards-compatible. It’s the foundation of much of Django’s own internal code, and is already being accessed directly by many third-party applications as well.

So, while names beginning with underscores do generally spell danger, potential incompatibilities and lack of support, you can rely on _meta quite safely. Just make sure to keep up with Django’s list of backwards-incompatible changes. Anything new that would break _meta will be listed there.

Class Information

While most of the basic introspection techniques covered in Chapter 2 apply to Django models, there are a number of details that are also made available on the _meta attribute. Most of this is information Django itself needs in order to properly deal with models, but as with many other features, it can be quite useful for other applications as well.

One important distinction to make with models is whether they’re “installed” or not. This means checking whether the application that contains them is listed in the site’s INSTALLED_APPS setting. Many Django features, such as syncdb and the built-in admin interface, require an application to be listed in INSTALLED_APPS in order to be located and used.

If an application is designed to accept any Django model directly, rather than iterating through INSTALLED_APPS, it will often need some way to determine whether the model is properly installed; for instance, to decide whether database operations should be performed on the model’s table. For this purpose, Django provides the installed attribute on _meta, which will be True only if the model belongs to an application listed in INSTALLED_APPS, and False otherwise.

There are two other attributes of model-level information that are commonly useful to application developers. As described in Chapter 2, all Python classes provide an easy way to get the name of the class and the module where it was defined, using the __name__ and __module__ attributes, respectively. However, there are some situations where these can be misleading.

Consider a situation where a model may be subclassed without inheriting all the Django-specific model inheritance processing. This requires a bit of tweaking with metaclasses, but can prove useful for solving certain types of problems. When doing this, the __name__ and __module__ attributes will refer to the child class, rather than the actual model that sits underneath.

Often, this is the desired behavior, as it’s just how standard Python works, but when attempting to interact with the Django model, or other areas of Django that may need to work with it, it may be necessary to know the details of the model itself, rather than the child class. One way to go about this would be to use class introspection to get the various parent classes that are in use, checking each to see if it’s a Django model.

This is a fairly unsightly process: it takes time to code, takes time to execute, makes maintenance and readability more difficult, and adds boilerplate if it needs to be done often. Thankfully, Django provides two additional attributes on _meta to greatly simplify this. The object_name attribute contains the __name__ of the underlying model class, even when accessed through a subclass, while module_name contains a lowercased version of that same name, which Django uses internally. The module where the model was defined is instead reflected in the app_label attribute, described later in this chapter.
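
For example, given a hypothetical model named AddressBook, these attributes describe the underlying model even if accessed through a non-model subclass:

>>> AddressBook._meta.object_name
'AddressBook'
>>> AddressBook._meta.module_name
'addressbook'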

Field Definitions

A major challenge involved in using and manipulating Django models is the process of locating and using fields that are defined for them. Django uses the creation_counter technique described in Chapter 2 to keep track of the order of fields, so they can be placed inside a list for future reference. This list is stored in the fields attribute of the model’s _meta attribute.

As a list, this can be iterated to retrieve all the field objects in order, which is extremely useful when looking to deal with models generically. As described later in this chapter, field objects have attributes containing all the options that were specified for them, so each item in the list can provide a wealth of information.

With this, we can create a custom form or template output, or any other feature that needs to work with fields on an arbitrary model. Consider the following example, which prints out the display names and current values for each field in a given object, without having to know in advance what model is being used.

from django.utils.text import capfirst
 
def get_values(instance):
    for field in instance._meta.fields:
        name = capfirst(field.verbose_name)
        value = getattr(instance, field.name)
        print('%s: %s' % (name, value))

Going about it this way allows the function to ignore the details of the model behind the object. As long as it’s an instance of a proper Django model, the _meta attribute will be available and all the fields will be accessible in this way. Since Django automatically adds an AutoField to any model that doesn’t declare a primary key, the created AutoField will also be included in the fields list.

While being able to iterate through a list is great for those situations where all the fields will be taken into account, sometimes only a single field is needed, and the name of that field is known in advance. Since fields is a list instead of a dictionary, the only way to get a field by its name would be to loop over the fields, checking each to see if its name matches.

To cater to this need, Django provides a utility method, _meta.get_field(). By passing the field name to _meta.get_field(), it’s easy to retrieve just the specified field. If no field with that name exists, it raises a FieldDoesNotExist exception, which lives at django.db.models.fields.

To get a better understanding of how these methods work together to identify the fields that were declared on a model, consider the following model declaration.

class Product(models.Model):
    sku = models.CharField(max_length=8, verbose_name='SKU')
    name = models.CharField(max_length=255)
    price = models.DecimalField(max_digits=5, decimal_places=2)
 
    def __unicode__(self):
        return self.name

Then, the model could be inspected to get more information about this declaration, without having to know what it looked like in advance.

>>> from django.utils.text import capfirst
>>> for field in Product._meta.fields:
...     print('%s: %s' % (capfirst(field.verbose_name), field.__class__))
...
ID: <class 'django.db.models.fields.AutoField'>
SKU: <class 'django.db.models.fields.CharField'>
Name: <class 'django.db.models.fields.CharField'>
Price: <class 'django.db.models.fields.DecimalField'>
>>> Product._meta.get_field('name').__class__
<class 'django.db.models.fields.CharField'>

Primary Key Fields

Any field can be specified as a primary key, by setting primary_key=True in the field’s definition. This means that if code is to handle a model or a model instance without prior knowledge of its definition, it’s often necessary to identify which field was defined as a primary key.

Much like getting a field by name, it would be possible to just iterate over all the fields, looking for one with its primary_key attribute set to True. After all, Django only allows one field to be specified as a primary key. Unfortunately, this again introduces a fair amount of boilerplate that slows things down and makes it more difficult to maintain.

To simplify this task, Django provides another _meta attribute, pk, which contains the field object that will be used as the primary key for the model. This is also faster than iterating over all the fields, since pk is populated just once, when the model is first processed; Django needs that information anyway, to determine whether it must provide an implicit primary key. The _meta.pk attribute also powers the pk shortcut property on model instances, which returns the primary key value for an instance, regardless of which field is the primary key.

Typically, models don’t need to declare an explicit primary key, and can instead let Django create one automatically. This can be a useful way to avoid repeating such a common declaration, while still allowing it to be overridden if necessary. One potential problem with this, however, is the task of determining whether a model was given an automatic field, and what that field looks like.

It’s possible to make certain assumptions about a model, based on how Django provides this automatic field, and what it would typically look like. However, it’s easy to create a custom field that looks a lot like the implicit field, and it’d be very difficult to tell the difference if your code only looks at its structure and options.

Instead, Django provides two attributes on the _meta attribute that help with this situation. The first, _meta.has_auto_field, is True if the model let Django provide an id field implicitly. If it’s False, the model has an explicit primary key, so Django didn’t have to intervene.

The second attribute related to the automatic primary key field is _meta.auto_field, which will be the actual field object Django provided for use as the primary key. If _meta.has_auto_field is True, this will be an AutoField, and will always be configured the same way for all models that use it. It’s important to look at this attribute instead of making assumptions about the field’s structure, in case Django makes any changes in the future; doing so is an easy way to help make sure your application keeps working properly. If a model provides its own primary key field, and thus _meta.has_auto_field is False, _meta.auto_field will be set to None.
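
Using the Product model defined earlier, which declares no explicit primary key, a quick session shows these attributes working together:

>>> Product._meta.has_auto_field
True
>>> Product._meta.pk.name
'id'
>>> Product._meta.pk is Product._meta.auto_field
True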

Configuration Options

In addition to providing access to the fields declared on the model, _meta also acts as a container for all the various options that can be set on a model using the Meta inner class. These options allow a model to control a variety of things, such as what the model is named, what database table it should use, how records should be ordered, and a number of others.

These options all have defaults, so that even those attributes that aren’t specified on the model are still available through the _meta attribute. The following is a list of the many options that are available in this way, along with their default values and a brief description of what each option is intended for; a short example follows the list.

  • abstract—A Boolean that indicates whether the model was defined as abstract, a process that is described in more detail in Django’s model inheritance documentation.2 The default value is False.
  • app_label—A string containing the name Django uses to recognize the application where the model was defined. It’s easiest to understand what this means by looking at the default value, which is the name of the module containing the models.py the model is specified in. For a model located at corporate.accounts.models.Account, the app_label would be "accounts".
  • db_table—The name of the database table that Django will use to store and retrieve data for the model. If not defined explicitly, it’s determined as a function of the model’s name and location. That is, the db_table for a model called Account with an app_label of accounts would be "accounts_account".
  • db_tablespace—In the case of Oracle, and perhaps other database backends in the future, tables can be placed in different parts of the disk, or different disks entirely. By default, this is simply an empty string, which tells the database to store the table in its default location. This option is ignored for backends that don’t support it.
  • get_latest_by—The name of a date-based field, such as a DateField or a DateTimeField, which should be used to determine the most recent instance of a model. If not provided, this will be an empty string.
  • order_with_respect_to—An instance of a field relating to another model, which is used when ordering instances of this model. This defaults to None, which implies that the model’s ordering is determined solely by fields within the model itself, rather than any related models.
  • ordering—A tuple containing the names of fields to be used when ordering instances of the model. By default, this is an empty tuple, which relies on the database to determine the ordering of model instances.
  • permissions—A sequence of tuples of additional permissions to be added to the model. Each tuple in the sequence contains two values, the first being the name of the permission to be used in code and in the database, and the second being the text to be displayed in the admin interface when selecting permissions for a user or group.
  • unique_together—A sequence of tuples indicating any groups of fields that must, when combined, be used in only one record in the database. Each tuple in the sequence contains the names of the fields that must be unique together for a particular index. Multiple tuples don’t have any relation to each other; they each represent a separate index at the database level.
  • verbose_name—The display name for a single instance of the model. By default, this is determined by the name of the class itself, by splitting up each capitalized portion into a separate uncapitalized word; Article would become "article", while AddressBook would become "address book".
  • verbose_name_plural—The display name for multiple instances of the model. By default, this will be simply the verbose_name with an “s” at the end. Article would be "articles" and AddressBook would be "address books".
  • verbose_name_raw—The raw, untranslated version of verbose_name. Occasionally, it’s necessary to use the same display name for everyone, without Django applying a translation. This is particularly useful when storing it away in the cache or database for later access, especially if it’ll be translated at a later point in time.
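
As a brief illustration of how these options surface, consider a minimal sketch; the Article model is hypothetical and assumed to live in an application labeled news:

class Article(models.Model):
    title = models.CharField(max_length=255)
    pub_date = models.DateTimeField()

    class Meta:
        get_latest_by = 'pub_date'
        ordering = ('-pub_date',)

Both the specified options and the generated defaults are then available on _meta:

>>> Article._meta.get_latest_by
'pub_date'
>>> Article._meta.ordering
('-pub_date',)
>>> Article._meta.db_table
'news_article'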

Accessing the Model Cache

Once models have been processed by the ModelBase metaclass, they’re placed in a global registry called AppCache, located at django.db.models.loading. This is instantiated automatically as soon as the module is imported, and is accessed using the name cache. This special cache provides access to the various models known to Django, and installs new ones if necessary.

Because ModelBase handles registration of new models whenever the class is processed by Python, the models it contains aren’t guaranteed to be part of applications present in the INSTALLED_APPS setting. This fact makes it even more important to remember that the _meta attribute on the model contains an installed attribute indicating whether the model belongs to an installed application.

Whenever code accesses one of the features in this section, AppCache will automatically load applications that are listed in INSTALLED_APPS, making sure that whenever some of the features are accessed, the cache includes all applications and models that should be made available. Without this, the results of these methods would be wildly unpredictable, based solely on which applications were loaded in which order.

As might seem obvious, the application cache can only be fully populated once all the applications have been loaded. Therefore, if an application’s models.py makes any calls to AppCache as part of this loading process, it’s possible that the cache might not be fully populated yet.

To protect against this problem, AppCache provides a method to determine whether the cache itself has been populated and is ready to be accessed. Calling cache.app_cache_ready() will return True or False depending on whether all of the installed applications have been processed. Using this, an application that could benefit from having its own cache of known models can check whether the application cache is available for that purpose. If so, it can use the cache directly; if not, it can manually determine what it needs to know.
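
A short session illustrates this, assuming a fresh interpreter where nothing has yet triggered application loading; note that simply asking for the applications is enough to populate the cache:

>>> from django.db.models.loading import cache
>>> cache.app_cache_ready()
False
>>> apps = cache.get_apps()
>>> cache.app_cache_ready()
True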

Retrieving All Applications

When looking to introspect a site’s contents, it’s also very useful to look at the structure of applications themselves. After all, looking at models is only useful if there are models to look at, and sometimes it’s necessary to just collect all the models currently in use. It’s also useful to have them arranged by the application that declares them. Django already needs to have this information handy, so AppCache is designed to specifically manage this information.

HOW DOES DJANGO SEE APPLICATIONS?

One important thing to keep in mind is that Django needs an object to use as a reference for the application. A Django application is essentially a standard Python package, which is just a collection of modules contained in a single folder. While Python provides an object to use as a reference for individual modules, it doesn’t offer anything to refer to a package.

Because of this, the closest notion Django can have to an application object is the __init__.py module that Python uses to recognize it as a package. In that case, Django would be using a module object as an application reference.

Unfortunately, few projects store anything useful in __init__.py, so Django isn’t likely to find anything of interest in it. In order to get at anything really useful, it would have to perform some extra work to traverse the package structure to get a module that contained some pertinent information.

Instead, since Django has to use a module object anyway, it makes more sense to use a module that contains useful information right off the bat. For the majority of applications, the most useful module in a package is models.py, where all the Django models are defined. Therefore, Django uses this module to recognize an application. Some of the following methods return an application, and in each case, what’s returned is the models module within the application’s package.

The first step in a site-wide introspection is to determine what applications are installed. Calling cache.get_apps() will return such a list, containing the application module for each application in the INSTALLED_APPS setting that contains a models module. That’s not to say that it only returns applications that have models. It actually checks for the presence of a models module, so even an empty models.py will cause an application to be included in this list.

Take, for example, the following INSTALLED_APPS setting, showing several of Django’s own contributed applications, as well as some in-house applications and the signedcookies application described in Chapter 7.

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'news',
    'customers',
    'callcenter',
    'signedcookies',
)

Most of these applications will, by necessity, contain various models. Chapter 7’s signedcookies, however, only interacts with the site’s HTTP traffic, so it has no use for the database. Therefore, when looking through the results of cache.get_apps(), the signedcookies application won’t show up.

>>> from django.conf import settings
>>> from django.db.models.loading import cache
>>> len(settings.INSTALLED_APPS)
9
>>> len(cache.get_apps())
8
>>> for app in cache.get_apps():
...     print(app.__name__)
...
django.contrib.admin.models
django.contrib.auth.models
django.contrib.contenttypes.models
django.contrib.sessions.models
django.contrib.sites.models
news.models
customers.models
callcenter.models

Retrieving a Single Application

With a list of applications, it’s straightforward to get models from each, so they can be handled appropriately; the next section describes that process in more detail. However, looking at all models isn’t always the best approach. Sometimes an application is given the label of a specific application, so that it can deal with just the models defined there.

While it would certainly be possible to just loop through the results from cache.get_apps(), checking the module names against the application module’s __name__ attribute, that technique quickly runs into a few problems. First, the application’s label isn’t the same as its __name__ attribute, so trying to compare the two results in a good bit of extra code, most of which is already being done by Django. Also, that code must be tested and maintained, which increases the risk of introducing bugs into the application.

Instead, Django provides a utility for handling this situation. By passing the known label to cache.get_app(), an application can retrieve the application module for just the application matching that particular label. The label referred to here is determined as a specific part of the application’s import path.

Typically referenced as app_label, an application’s label is usually formed from the last part of the application module’s import path before the models portion. To illustrate a few examples, consider the following application labels, corresponding to the entries in the INSTALLED_APPS setting.

admin
auth
contenttypes
sessions
sites
news
customers
callcenter
signedcookies

There’s one important note to mention here. As part of the Meta options described in the official documentation, and briefly touched on earlier in this chapter, any model may override its own app_label setting to behave as though it was declared inside a different application. This option does not affect the behavior of cache.get_app() in any way. The get_app() method simply maps the app_label to an application module, without regard to what options the modules inside it may have declared.

As demonstrated earlier with cache.get_apps(), Django treats applications without models slightly differently than others. By default, cache.get_app() will raise an ImproperlyConfigured exception if the application doesn’t contain a models.py file. Sometimes it may still be useful to process applications without models, so cache.get_app() accepts an optional second argument to control how such applications are handled.

This second argument, called emptyOK, takes a Boolean indicating whether the application is allowed to not contain any models. This defaults to False, which will raise the ImproperlyConfigured exception, but if True is given instead, cache.get_app() will simply return None, allowing the calling code to continue managing the application.

>>> from django.db.models.loading import cache
>>> print(cache.get_app('admin'))
<module 'django.contrib.admin.models' from ...>
>>> print(cache.get_app('signedcookies'))
Traceback (most recent call last):
  ...
django.core.exceptions.ImproperlyConfigured: App with label signedcookies could not be found
>>> print(cache.get_app('signedcookies', emptyOK=True))
None

Dealing with Individual Models

Once an application is known, the next step is to deal with individual models within that application. Once again, AppCache comes through with a few methods for handling this situation. Retrieving models from the cache typically takes one of two forms, depending on how much is known about the model in advance.

In the first case, consider pure introspection. Remember from the previous section that AppCache provides access to all known applications with a single call to the get_apps() method, which returns application modules. Since these modules are actually the models modules within each application, it may seem easy to just use dir(app_module) or iterate over app_module.__dict__ to get the models that were defined.

Unfortunately, like many uses of simple iteration, that would require the loop to check each individual object in the module to see if it is in fact a model or if it’s something else entirely. After all, Python modules can contain anything, and many models make use of tuples and module-level constants to help do their work, so there’s no guarantee that each item in the module’s namespace is in fact a Django model.

Instead, cache.get_models() retrieves a list of proper Django models that are specific to the given application module. It’s no coincidence that both cache.get_apps() and cache.get_app() return application modules; cache.get_models() is suitable for use with both of these methods. That means that a list of models can be retrieved even without an application, but knowing the application in advance reduces the number of models retrieved.

The following code demonstrates how these techniques can be used in combination to retrieve a list of models for each of the known applications in use on the site.

>>> from django.db.models.loading import cache
>>> for app in cache.get_apps():
...     app_label = app.__name__.split('.')[-2]
...     for model in cache.get_models(app):
...         print('%s.%s' % (app_label, model.__name__))
...
admin.LogEntry
auth.Message
auth.Group
auth.User
auth.Permission
contenttypes.ContentType
sessions.Session
sites.Site
news.News
customers.Customer
callcenter.Agent
callcenter.Call
callcenter.Case

As an additional option, get_models() can also be called with no argument, which will cause it to return all the models that are known to AppCache. This is a useful shortcut to avoid some of the overhead associated with the extra loop in this example, as a quick way to grab all the models.

There’s a catch, however.

When using get_models() directly, with no argument, all registered models are returned. This may sound like a great idea, and sometimes it is, but remember that AppCache registers all models as they’re encountered, regardless of where they were found. The full list may include models that aren’t part of an installed application. Contrast that with the get_apps()/get_models() combination, which only retrieves models if their applications are found in the INSTALLED_APPS setting.

In practice, get_models() may return different results if called without an argument than if it were called with each of the applications returned from get_apps(). Typically, this could mean that an application may get access to extra models that it might not want to know about. Sometimes this is indeed the desired behavior, but it’s always important to understand the difference.

One way a model could be in AppCache without being installed is if its application is imported by a separate, installed application, which causes its model classes to be processed by Django and registered, regardless of whether the application appears in INSTALLED_APPS. Also, if any model specifies an app_label on its Meta class that doesn’t match up with any installed application, the same situation occurs. If an application does wish to access all the models, regardless of whether they’re installed or not, remember that it can use the _meta.installed attribute to identify which models were installed properly.
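
When that distinction matters, a simple filter along the following lines, sketched here, limits the results to installed models only:

>>> from django.db.models.loading import cache
>>> installed = [model for model in cache.get_models()
...              if model._meta.installed]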

Sometimes, the name of both the application and the model are provided, perhaps as part of a URL or other configuration. In these cases, it doesn’t make much sense to iterate over all the models for the given application. For this case, AppCache provides another method, get_model(), which retrieves a model class based on an application label and model name. The application name is case-sensitive, but the model name isn’t.

>>> from django.db.models.loading import cache
>>> cache.get_model('auth', 'user')
<class 'django.contrib.auth.models.User'>

Using Model Fields

One of the most important aspects of models is the set of fields that are available to hold data. Without fields, a model would just be an empty container with no way to do anything useful. Fields provide a way to organize a model’s values and validate against specific data types, providing a bridge between the database and native Python data types.

Normally, when accessing a field as an attribute of a model instance, the value will be a standard Python object representing the value found in the database. Previous sections in this chapter have described a variety of ways to get access to the actual field objects themselves, rather than this converted value. There are a variety of useful things that can be done with field objects.

Common Field Attributes

Different field types will have different attributes according to their needs, but there are several attributes that are common across most built-in Django fields. These can be used to generically access various details of fields, and by association, the values and behaviors they’re meant to interface with. Note that there are more attributes used internally than those listed here, but these are the most useful and stable, and will provide the greatest value to applications looking to work with fields.

The descriptions listed here are how Django itself uses these attributes, and how developers will expect them to behave. Other applications will likely find use for them as well, to control certain types of behaviors, so the following descriptions will help illustrate their intended usage.

Some applications may find uses that are slightly different from what Django itself expects to use them for, but the general semantics of the values should remain intact. Remember that developers will build their expectations for these values based on how Django itself behaves, and third-party applications should avoid violating these expectations.

  • attname—The name of the attribute on model instances where the database-related value is stored. This is typically the same as the name attribute, for simple cases where the value from the database is stored directly on the model. In other cases, it’s more appropriate to expose a more complex object, such as another model instance, to other code when the actual field name is accessed. For those cases, attname and name will be different, with the attribute referenced by name being the complex object, while the attribute referenced by attname contains the raw data required to create it. An example of this distinction follows the list.
  • blank—A Boolean value indicating whether the field must have a value supplied when using a form generated automatically based on the model. This is purely validation-related behavior; the null attribute controls whether a model can actually be saved in the database without a value for the given field.
  • choices—A sequence of 2-tuples indicating the valid choices for the field. The first item in each tuple is the actual value that would be stored in the database if selected, while the second item is the text that will be displayed to the user for that value.
  • column—The name of the database column that will be used to hold the field’s value. This will either match db_column, if the field declared its database column explicitly, or will have been generated automatically, based on the field’s name. Normally, this can be ignored, since Django manages the database interaction directly, but some applications may have need to communicate directly with the database or interface with some other database adapter that will need this information.
  • db_column—The name explicitly supplied as the database column name for the field’s values. This is different from column in that db_column refers to what the model itself declares, rather than what will actually be used. This will only have a value if the model field specified its db_column argument explicitly; it will be None otherwise.
  • db_index—A Boolean indicating whether the field was declared to have an index created for it in the database. This only indicates whether the field was configured to instruct Django to create the index. Other indexes may have been added directly in the database itself, which won’t necessarily be reflected in the value of this attribute.
  • db_tablespace—The tablespace directive indicating where the field’s data will be stored. Currently only supported for the Oracle backend, the format of its contents will depend on which database backend is in place. It will always have a string value, defaulting to the value of the DEFAULT_INDEX_TABLESPACE setting if not set explicitly.
  • default—The default value for the field, to be used if no value has yet been supplied to the field itself. In addition to being inserted into the database in such a case, this value will be used as the field’s initial value for any forms generated based on the model. The type of value stored in this attribute will be whatever native Python data type the field is intended to interact with, such as a string or an integer.
  • description—A simple text description of the field or its purpose. A docstring is generally useful as well, but this description can be used when displaying information about the field inside an application, such as admindocs.
  • editable—A Boolean indicating whether the field should be presented to users for editing when generating forms based on the model. This doesn’t make the field itself read-only from within Python, so it’s far from a guarantee that the field won’t be edited. It’s simply a directive to control the default behavior of forms, though other applications can—and should—use it to control other behaviors as well, if they provide editing capabilities.
  • empty_strings_allowed—A Boolean indicating whether the field allows an empty string as a possible value. This isn’t an option specified as the configuration of a specific field instance, but is rather defined in the field’s class itself. Many fields, such as CharField and EmailField, treat empty strings separately from None, so this attribute allows backends to decide how to handle empty strings for databases, such as Oracle, that might otherwise lose that distinction.
  • help_text—The informative text provided in the field definition, to be displayed to users when the field is presented for editing. This will be passed in for forms that are generated based on the model, such as the provided admin interface.
  • max_length—The maximum length the field’s value can contain. Most string-based fields, such as CharField and EmailField, use this to limit the length of string content, both in form fields and the underlying database column. Other field types, such as IntegerField and DateField, simply ignore it, as it has no meaning in those cases.
  • name—The name of the field, as defined when assigning the field to the model. This is set as part of the contribute_to_class() process, to maintain DRY by avoiding having to type the name twice. This will be the name of the attribute where the field’s native Python value will be assigned and retrieved. Contrast this with attname, which stores the raw data necessary to populate name. Often, the two values will be the same, but the distinction is important to understand, for cases where they’re different.
  • null—A Boolean indicating whether the field can be committed to the database without a value assigned. This primarily controls how the underlying database column is created, but some applications may find other uses, as long as the semantics remain the same.
  • primary_key—A Boolean indicating whether the field should be used as the primary key for the database table. In addition to instructing the database to generate the primary key index, Django uses this indicator to determine which field’s value to use when looking up specific instances, such as related objects through foreign key relationships. See the section on “Primary Keys” earlier in this chapter for details on the _meta.pk shortcut for determining which field has this value set to True.
  • rel—In the case of fields that relate one model to another, this will be a special object describing the various aspects of that relationship. For all non-relationship field types, this will be set to None.
  • serialize—A Boolean indicating whether the field should be included when model instances are serialized using the serialization framework.3
  • unique—A Boolean indicating the field must be unique among all instances of the model. This is primarily used to create the proper constraints in the database to enforce this condition, but it can also be used by applications. For instance, a content editing application that provides detailed feedback about whether the user-entered values are valid for the model can also take this into account when making that determination.
  • unique_for_date—The name of a date-related field, such as a DateField or DateTimeField, for which this value should be unique. This is essentially like unique, except that the constraint is limited to records that occur on the same date, according to the field referenced by this attribute. This can’t be enforced at the database level, so Django manages the constraint manually, as should any other applications that need to provide detailed information about whether a given object can be committed to the database.
  • unique_for_month—Like unique_for_date, except that the uniqueness is only required for objects that occur within the same month, according to the date-related field referenced by the name contained by this attribute.
  • unique_for_year—Like unique_for_date, except that the uniqueness is only required for objects that occur within the same year, according to the date-related field referenced by the name contained by this attribute.
  • verbose_name—The full name of the field, in plain English, to be displayed to users. Django’s documentation recommends that this begin with a lower-case letter, so that applications can capitalize it as necessary. If an application needs this value capitalized, be sure to use the capfirst() utility method, described in Chapter 9.
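
To make the name and attname distinction concrete, consider the auth application’s Message model shown earlier, which contains a ForeignKey to User:

>>> from django.contrib.auth.models import Message
>>> field = Message._meta.get_field('user')
>>> field.name, field.attname
('user', 'user_id')
>>> field.rel is None
False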

Common Field Methods

Like the attributes described in the previous section, these methods are common to most field types, and provide a wealth of functionality that might otherwise be difficult to come by. Not all field types will implement all of these methods, and their exact behavior may change depending on the field type involved, but the general semantics described here will remain the same.

Additional methods are used even more internally and aren’t listed here, because they’re primarily responsible for simply populating the attributes described in the previous section. It’s therefore generally best to reference the generated attributes, rather than attempting to recreate them manually after the fact. A short session after the list shows a few of these methods in action.

  • clean(value, instance)—Validates that the given value is appropriate for the field and the instance it’s assigned to. Internally, this defers to to_python() and validate(), and also runs any validators that were defined when the field was instantiated. It returns the corrected value if everything was valid, and raises django.core.exceptions.ValidationError otherwise.
  • contribute_to_class(cls, name)—Configures the field for the class it’s attached to. One of the most important methods on fields, this is called when ModelBase is processing the attributes that were assigned to the model’s class definition. The cls argument is the model class it was assigned to, and name is the name it was given when it was assigned there. This allows fields the opportunity to perform any additional setup or configuration, based on this information. It usually doesn’t need to be called directly, but can be a useful way of applying a field to a previously-processed model.
  • db_type(connection)—Returns the database-specific column definition necessary for this field to store its data. Typically, this is only used internally, but as with some of the other attributes listed, if an application needs to access the database directly using some other tool, this can be a useful way to determine what the underlying column looks like.
  • formfield()—Returns a form field based on the field’s data type and verbose name, suitable for inclusion on any standard form. It optionally takes one explicit argument, form_class, a form field class to be instantiated; this defaults to whatever form field is most appropriate, as defined by the model field itself. It also accepts any number of additional keyword arguments, which are simply passed through to the form field’s constructor before the instantiated form field is returned. This is normally called automatically by Django when constructing a form based on a model, but it may be used manually in other situations as well. More information can be found in Chapter 5.
  • get_attname()—Returns the name that should be used for the attname attribute. This is only called once, while the field is being configured for the class.
  • get_attname_column()—Returns a two-item tuple containing the values to be used for the attname attribute as well as the column attribute.
  • get_cache_name()—Returns a name suitable for use as a cache for the field, if caching is necessary. This is typically only required for fields that generate complex Python data types, which would suffer significant performance penalties if such a complex object had to be generated on every access, or in cases where it won’t be used. See the applied techniques at the end of this chapter for details on how to use this method in such cases.
  • get_choices()—Returns a sequence of 2-tuples that should be used for displaying choices to users looking to enter data into this field. Unlike the choices attribute, this may also include an empty option that would indicate no choice has been made. This behavior is controlled by two optional arguments: include_blank, a Boolean indicating whether it should be included, and blank_choice, a list of tuples containing the values and display text that should be used for the empty options. By default, these arguments are configured so that a single choice of ("", "---------") is included.
  • get_db_prep_lookup(value, lookup_type, connection, prepared=False)—Returns a representation of the supplied value that’s suitable for comparing against existing values in the database.
  • get_db_prep_save(value, connection)—Returns a representation of the supplied value that’s suitable to be stored in the database.
  • get_db_prep_value(value, connection, prepared=False)—Returns a representation of the supplied value that’s ready for general use with the database. This is called internally by both get_db_prep_lookup() and get_db_prep_save().
  • get_default()—Returns the default value that would be used for the field. This takes care of all the necessary logic, checking if a default value was provided, executing it if a callable was provided as the default, and differentiating between empty strings and None, for database backends needing that behavior.
  • get_internal_type()—Returns a string representing a high-level idea of what type of data the field contains. This is primarily used, along with a mapping provided by each database backend, to determine the actual database column to be used.
  • get_prep_lookup(lookup_type, value)—Like get_db_prep_lookup(), except that this method is used for simpler conversions that don’t require knowing which type of database is used.
  • get_prep_value(value)—Like get_db_prep_value(), except that this method is used for simpler conversions that don’t require knowing which type of database is used.
  • has_default()—Returns True if the field has a default value associated with it, or False if the default behavior will be left to the database backend.
  • pre_save(model_instance, add)—Returns a value for the field just prior to being saved in the database. By default, this simply returns the value that is already set on the supplied model_instance, but it could return a value derived from some other field or perhaps completely unrelated to the instance, such as the current time. The add argument is a Boolean indicating whether the provided instance is being added for the first time.
  • save_form_data(instance, data)—Stores the supplied data to the appropriate attribute on the supplied instance. This is a shortcut for forms to be able to adequately populate a model instance based on form data.
  • set_attributes_from_name(name)—Uses the supplied name argument to set the field’s name, attname, column and verbose_name attributes as necessary. This method defers to get_attname_column() for the attname and column values, while verbose_name is only set here if it wasn’t explicitly defined when instantiating the field.
  • to_python(value)—Coerces the supplied value to a native Python data type that can be used when accessing the field’s value on a model instance. See its description later in this chapter for further details.
  • validate(value, instance)—Returns without error if the field’s value is appropriate for the field’s configuration and other data on a model instance, or raises django.core.exceptions.ValidationError otherwise. This is called internally by clean().
  • value_from_object(obj)—Returns the field’s value as it appears on the supplied object.
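
A few of these methods can be seen in action even on a freestanding field instance; a quick sketch:

>>> from django.db import models
>>> field = models.IntegerField(default=0)
>>> field.has_default()
True
>>> field.get_default()
0
>>> field.to_python('42')
42
>>> field.get_internal_type()
'IntegerField'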

Subclassing Fields

One of the more useful things that can be done with Django models, particularly with regard to reusable applications, is to tie into a model’s ability to process individual types of fields in a generic fashion. This allows fields themselves to have considerable control over how they interact with the database, what native Python data type is used to access their contents and how they’re applied to the model classes that use them.

The majority of this section assumes that the custom field will need to retain much of the same functionality of existing fields, such as interacting with the database and generated forms. There are many other applications, such as the historical records application described in Chapter 11, which use the hooks described in this section to provide much more functionality than just a simple field.

The term “field” here is used loosely to describe any object that uses some of these techniques to present itself to a Django developer as something resembling a standard Django model field. In reality, such an object could encapsulate complex relationships, such as a tagging application, or even control the creation of entire new Django models on the fly, based on the model to which they’re assigned. The possibilities are nearly limitless.

The key to remember is that Django uses duck typing principles with regard to fields. It simply accesses whatever attributes and methods it expects in each situation, without regard to what those actually do behind the scenes. In fact, there’s not even any requirement that objects be a subclass of django.db.models.fields.Field to make use of these hooks. Inheriting from Field simply provides an easy way to reuse much of the existing functionality, if that behavior is required.

Deciding Whether to Invent or Extend

One of the first things to consider when writing a new field is whether to try to invent an entire new type of field, starting perhaps from scratch without the aid of Field at all, or to extend some existing field type and inherit much of its behavior. There are advantages and disadvantages to each approach, and which is most appropriate depends very much on the demands of the new field being created.

By inheriting from Field or one of its subclasses, most of the behaviors in the following sections will be inherited, potentially reducing the amount of new code the custom field must include. If its behavior is similar to an existing field type, this can be a very useful way not only to cut down on new code, which helps reduce bugs, but also to automatically receive any new or updated functionality provided by Django itself in future releases. After all, by relying on Django itself for much of this behavior, updates to that code will automatically be reflected in the behavior of the custom field.

On the other hand, if the new field varies considerably from any existing field type, the standard behaviors will need to be rewritten for its own use anyway, negating any value of inheriting from a parent class. If most—or all—of these behaviors have to be written from scratch, inheriting from an existing field will simply create an extra step in the process Python uses to manage the class, even though that extra step offers little or no benefit. In these cases, it’s best, therefore, to simply start from scratch, implementing just those behaviors that make sense for the custom field, and Django will still process it properly, due to its use of duck typing.

Of course, there is some middle ground between the two approaches. For instance, a custom field may interact with a completely unique data type, bearing little resemblance to any existing field types, but it may still store its data in the database like a standard field, and could benefit from reusing many of Django’s more basic field methods, such as assigning names and storing itself in _meta.fields. In these cases, it’s quite reasonable to inherit from Field itself, rather than a specific subclass, and inherit just this most basic functionality.

Performing Actions During Model Registration

The first step any field goes through is being processed by the ModelBase metaclass, whenever Python encounters a model class that utilizes the field in question. For standard Python objects, this means simply getting assigned to the model class as normal, with no additional processing. Fields take a different path, however, and each field gets the chance to customize how it’s applied to a model class.

contribute_to_class(self, cls, name)

This is perhaps the most important method a field can contain, as it provides an essential feature: the ability for a field to know what class it was assigned to, and what name it was given. This may seem like a simple requirement, but Python itself doesn’t normally have a way to facilitate this.

You may recall that descriptors, described in Chapter 2, have a way to identify what class—and even what instance of that class—was used to access the object, but this is only available at the time the attribute is accessed; there’s still no way to know this information at the time the assignment took place. More importantly, even descriptors don’t provide any way to identify what name was used to access them, which can be a considerable problem when trying to cache information or interact with other features that require the use of a name, such as that of a database column.

Instead, by using a metaclass, Django can intercede at the point where Python is processing the class, and use the presence of a contribute_to_class() method to identify objects that need to be handled differently. If this method exists, it’s called instead of the standard setattr(), allowing the field to register itself in whatever way is most appropriate for its purpose. When doing so, Django also provides the class itself as an argument, as well as the name it was given, which was discovered while looking through the attributes assigned to the class. Therefore, in addition to the usual self, this method receives two arguments.

  • cls—The actual class object of the model the field was assigned to. This can be used to customize the field based on the name or other attributes of the model itself.
  • name—The name, as a string, of the attribute as it was assigned to the model’s class. Fields will typically store this away as an attribute of the field itself, for future reference.

Once these two arguments have been processed in whatever way is appropriate for the field, the method shouldn’t return anything, as its return value is ignored by Django.
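A minimal sketch of how this hook can be used might look like the following. The LoggedField name and the print call are purely illustrative; the point is that the field defers to Field for the standard registration work, then acts on the class and name it received:

from django.db import models
 
class LoggedField(models.CharField):
    def contribute_to_class(self, cls, name):
        # Field's implementation sets self.name and self.attname and
        # registers the field with cls._meta, so call it first.
        super().contribute_to_class(cls, name)
        # The field now knows both its model and its attribute name.
        print('Attached %s to %s.%s' % (self.__class__.__name__,
                                        cls.__name__, name))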

CONTRIBUTE_TO_CLASS() VS SETATTR()

There is one very important thing to keep in mind when dealing with contribute_to_class(). It’s been mentioned a few times already in various places, but it’s so important that it merits driving home very explicitly. If Django identifies an object as having a contribute_to_class() method, only that method will be called.

Normally, setattr() is used to set attributes on an object such as a class, but since model fields don’t get set in the standard namespace, that step is skipped intentionally. Therefore, if a custom field does in fact need to be set as an attribute on the model class itself, doing so is the sole responsibility of the field itself, during the execution of its contribute_to_class() method.

Sometimes, fields will instead need to set some other object, such as a descriptor, as the attribute on the class, to provide additional customizations for other types of access. This, too, is the responsibility of the field class, and the only time to do so in a way that will maintain the appearance of a standard field is during the execution of its contribute_to_class() method.

In the case of standard Django fields, and perhaps for many types of custom fields and other objects that behave as fields, this avoidance of setattr() is quite intentional. If a field should stay out of the class namespace entirely, its contribute_to_class() should simply avoid setting anything on the model class, and Django’s own behavior will ensure that nothing is assigned to the class itself.

contribute_to_related_class(self, cls, related)

For fields that relate themselves to other models, this is called once the related model is available, so that attributes can be added to that model as well. For example, this is how Django provides a reverse attribute on a related class when a ForeignKey is applied.

The two arguments it receives are cls, the model class the relationship was actually applied to, and related, the model the relationship points to, where other attributes may yet need to be applied. Like contribute_to_class(), this shouldn’t return anything, as it would simply be ignored anyway.

Altering Data Behavior

Given that most field types exist to interact with specific data types, one of the first things to consider is how to tell Django to handle that data type. This includes how to store it in the database, how to ensure validity of its value and how to represent that value in Python. These are some of the most fundamental aspects of field behavior, and properly altering them can open up a world of possibilities.

get_internal_type(self)

This method returns a string, which helps determine how the database should store values for the field. The string itself isn’t an actual database column type, but instead it’s applied to a mapping provided by the database backend to determine what type of column to use. This way, fields can be written without being tied to a specific database backend.

Because the return value for this function gets applied to a known dictionary of types to retrieve the database column name, that value must be a valid entry in that dictionary. Therefore, there’s a finite set of possible return values, which are listed here.

  • AutoField
  • BigIntegerField
  • BooleanField
  • CharField
  • CommaSeparatedIntegerField
  • DateField
  • DateTimeField
  • DecimalField
  • FileField
  • FilePathField
  • FloatField
  • ImageField
  • IntegerField
  • IPAddressField
  • NullBooleanField
  • OneToOneField
  • PositiveIntegerField
  • PositiveSmallIntegerField
  • SlugField
  • SmallIntegerField
  • TextField
  • TimeField
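
For example, a custom field can borrow an existing column type simply by returning the matching name from this list. In this sketch, the HexColorField name is invented for illustration; it stores a color string such as '#RRGGBB' using whatever column the backend maps to CharField:

from django.db import models
 
class HexColorField(models.Field):
    def __init__(self, *args, **kwargs):
        # CharField's column type is sized by max_length, so supply one.
        kwargs['max_length'] = 7
        super().__init__(*args, **kwargs)
 
    def get_internal_type(self):
        # Map to the backend's CharField column type, so the field
        # isn't tied to any one database.
        return 'CharField'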

validate(self, value, instance)

When a model is being checked for the accuracy of its values, this method is used to determine whether the field’s contents are correct. The arguments it receives are the value of the field itself, and also the model with all of its fields. This allows it the option of validating not only the field’s own value, but also that it makes sense in the context of the greater model.

It should be obvious why this would be of use when validating an individual field’s value, but it’s less clear what value lies in using the rest of the model’s values. After all, when writing a field, there’s typically no way to know what other fields will be used alongside it.

Sometimes, however, a field may be written specifically for a particular model, and can therefore know in advance what the entire model will look like. In these cases, the field can, for example, check to see what type of account a person has, because the maximum value for the field depends on that other field.
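A sketch of that account example might look like the following, using the signature documented above. The CreditLimitField name, the account_type field on the model and the specific limit are all assumptions made for the illustration:

from django.core.exceptions import ValidationError
from django.db import models
 
class CreditLimitField(models.DecimalField):
    def validate(self, value, instance):
        # Run the standard DecimalField checks first.
        super().validate(value, instance)
        # account_type is assumed to be another field on the model.
        if instance.account_type == 'basic' and value > 1000:
            raise ValidationError('Basic accounts are limited to 1,000.')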

to_python(self, value)

The value of a field can be stored in a number of different ways, depending on where it’s being stored. In a database, it can be one of a few basic types, such as strings, integers and dates, while when serializing a model, all values will be coerced to strings. That means that often, when instantiating a model, its value has to be forced back into its proper Python representation. This behavior is handled by the to_python() method, though it’s not quite as straightforward as it may seem on the surface.

The first thing to consider is that the value passed to to_python() could be one of a number of representations of the data. For instance, it could be whatever format is returned from the database adapter, such as a string, integer or native Python date, but it could also be a string retrieved from a serializer, or if the field manages a more complex custom data type that needs to be initialized, the value could actually be a fully-initialized instance of that type.

To illustrate this, consider the situation of BooleanField. Values that get passed into it could come in a variety of forms, so its to_python() method needs to anticipate this and make sure that it always returns a Boolean value or raises an exception indicating that the value wasn’t suitable for the field.

from django.core import exceptions
from django.utils.translation import ugettext_lazy as _
 
def to_python(self, value):
    if value in (True, False): return value  # 1 and 0 compare equal to True and False
    if value in ('t', 'True', '1'): return True
    if value in ('f', 'False', '0'): return False
    raise exceptions.ValidationError(_("This value must be either True or False."))

As you can see, it has to check for a few different types of values that could all be coerced into Boolean values reliably. In addition to the native True and False, it checks for the string representations of the same, as well as a couple of single-character representations that might turn up in various situations. If it finds something suitable, it simply returns the appropriate native Boolean value, raising the ValidationError described in the previous section if a suitable value couldn’t be found.

Unfortunately, to_python() is an extra method call that’s not always necessary, so it’s not always called when it seems like it would be. In particular, it’s provided mainly for validating data prior to committing to the database and when retrieving content from serialized data, so when retrieving from the database, it’s assumed that the data has already been validated, and the database backends generally suffice for returning the proper type.

Because of this, Django doesn’t call to_python() when retrieving data from the database. For the built-in types, and many potential add-on fields, this is sufficient, but for other data types or complex objects, some more work will be done to convert the database value to something appropriate to work with. To support these types of fields, Django provides a special way to force to_python() to be called when populating the field’s value.

Supporting Complex Types with SubfieldBase

Sometimes databases just don’t have the necessary data types to support certain types of applications. For example, most databases don’t have a way to store a length of time and present it to Python as a datetime.timedelta 4 object. PostgreSQL has a column type called interval 5 for this purpose, which does map directly to a Python timedelta as it should, but other databases don’t, which makes this impractical in terms of reusability. It would work suitably for PostgreSQL, but in order to make an application portable, it needs to be usable with more than one database.

Thankfully, timedelta stores its values in days, seconds and microseconds, and can construct the entire value based on just a number of seconds passed in as a float. Therefore, it’s possible for a new DurationField to use a DecimalField to store a value in the database, convert to a float in Python, then pass it into timedelta for use on the model instance.

import datetime
import re
 
from django.core.exceptions import ValidationError
 
def to_python(value):
    if isinstance(value, datetime.timedelta):
        return value
    match = re.match(r'(?:(\d+) days?, )?(\d+):(\d+):(\d+)(?:\.(\d+))?', str(value))
    if match:
        # Convert the groups to a list so individual items can be replaced
        parts = list(match.groups())
        # The parts in this list are as follows:
        # [days, hours, minutes, seconds, microseconds]
        # But microseconds need to be padded with zeros to work properly.
        parts[4] = (parts[4] or '').ljust(6, '0')
        # And they all need to be converted to integers, defaulting to 0
        parts = [part and int(part) or 0 for part in parts]
        return datetime.timedelta(parts[0], parts[3], parts[4],
                                  hours=parts[1], minutes=parts[2])
    try:
        return datetime.timedelta(seconds=float(value))
    except (TypeError, ValueError):
        raise ValidationError('This value must be a real number.')
    except OverflowError:
        raise ValidationError('The maximum allowed value is %s' %
                              datetime.timedelta.max)

This is the type of process that simply can’t be handled without using to_python(), and it must take place every time the model is instantiated, even when coming from the database. However, making an extra method call on every value loaded from the database can get quite expensive, so it’s essential to be able to handle this without penalizing those fields that don’t use it.

As will be shown at the end of this chapter, a descriptor can be used to customize what happens when a field’s value is accessed, which can be an excellent way to control this type of behavior. Of course, descriptors can be tricky if they’re just a means to an end, and the to_python() behavior described here is a fairly common need for these complex data types, so Django provides a shortcut to ease the creation of this descriptor.

Located at django.db.models.fields.subclassing, the SubfieldBase metaclass is Django’s way of easing the creation of model fields whose to_python() method will always be called. By simply applying it to a field class, it takes care of the rest, setting up a descriptor that calls to_python() the first time the field is loaded. Therefore, the DurationField example would use this in the field definition as follows:

from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
 
class DurationField(models.DecimalField, metaclass=SubfieldBase):
    # Field logic then continues here
    pass

Controlling Database Behavior

Another important aspect of fields is how they interact with the database. This can include how the data itself is stored, how it’s prepared before being sent to the database and how it’s prepared for comparison with values already in the database. This process is already handled by Django itself, with every existing field type providing a few methods to define this behavior.

For custom fields, it’s often necessary to override this behavior, interacting with the database in ways other than how Django itself would expect to do so. The following methods define nearly every aspect of how a field works with the database, so fields have a great deal of control over how the database interaction is handled.

db_type(self, connection)

Rarely overridden by individual fields, this method returns a database-specific string that controls how the column is created for use with the given field. Django internally uses the result of the get_internal_type() method in conjunction with a mapping provided by each individual backend to provide a return value from this method. That functionality is enough for the vast majority of field applications.

The most important thing to remember when considering the use of this method is that its return value is specific to a particular database backend. In order to use this field in projects with different backends, the connection argument is provided to help you decide what to use. In a simple case, you can use connection.settings_dict['ENGINE'] to determine what type of database the field is being used on, and behave accordingly. For example, if DurationField could in fact use interval in PostgreSQL, while still supporting other databases:

class DurationField(models.Field):
    def db_type(self, connection):
        engine = connection.settings_dict['ENGINE']
        if engine == 'django.db.backends.postgresql_psycopg2':
            return 'interval'
        else:
            return connection.creation.data_types['DecimalField']

One other feature of this method is that if you return None instead of a string, Django will skip the creation of this particular field. This can be necessary if the field must be created in a more complicated fashion than a single string can represent. Django will still attempt to reference the column when executing queries, though, so you’ll need to make sure you do in fact create the column before attempting to use this field.

Most of the time, you’ll want to leave this method to Django, but it does provide a way to override the default behavior when you really need to. Just be careful doing this in a distributed application, because you’ll end up having to support multiple types of databases, not just the one you’re most familiar with.

get_prep_value(self, value)

There are a few methods that deal with preparing a value for different kinds of use within the database, but they typically share the same code for preparing a value for use in the database at all. The get_prep_value() method is used by both of the following methods to perform this basic conversion.

In most cases, converting a Python object to some more basic type will suffice to allow a custom field to pass values to the database. By overriding get_prep_value(), the other database preparation methods can typically use their default implementations without issue. For example, DurationField requires this type of conversion, since timedelta objects can’t be passed directly to most databases, which led to using a DecimalField to control the column’s behavior. A custom get_prep_value() method can convert timedelta objects to Decimal values, which can then be passed to the database normally.

from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
from django.utils import _decimal
 
class DurationField(models.DecimalField, metaclass=SubfieldBase):
    def get_prep_value(self, value):
        # Pad microseconds to six digits so the decimal portion is accurate
        return _decimal.Decimal('%d.%06d' % (value.days * 86400 + value.seconds,
                                             value.microseconds))
 
    # Field logic then continues here

get_db_prep_value(self, value, connection, prepared=False)

In cases when you need to prepare the value differently for different database connections, this method will allow you the flexibility to do so. The connection argument again represents the database connection being used, and can be used to make the necessary decisions about how to proceed. The prepared argument indicates whether the value has already been passed through get_prep_value(). If False, you should call that method before proceeding further. Here’s what DurationField could look like if it continued to split up its behavior between PostgreSQL and other databases:

from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
from django.utils import _decimal
 
class DurationField(models.DecimalField, metaclass=SubfieldBase):
    def get_prep_value(self, value):
        # Nothing to do here, because get_db_prep_value() will do the dirty work
        return value
 
    def get_db_prep_value(self, value, connection, prepared=False):
        if not prepared:
            value = self.get_prep_value(value)
        engine = connection.settings_dict['ENGINE']
        if engine == 'django.db.backends.postgresql_psycopg2':
            # PostgreSQL can handle timedeltas directly
            return value
        else:
            # Pad microseconds to six digits so the decimal portion is accurate
            return _decimal.Decimal('%d.%06d' % (value.days * 86400 + value.seconds,
                                                 value.microseconds))
 
    # Field logic then continues here

get_db_prep_save(self, value, connection)

This works much the same as get_db_prep_value(), but provides a way to supply separate behavior when actually saving values into the database, as opposed to other operations. In fact, if you don’t provide an implementation for this method, the default behavior will simply defer to get_db_prep_value(), which will usually suffice.

get_prep_lookup(self, lookup_type, value)

Another area where fields have to interact with the database is when making comparisons between Python objects and values already stored in the database. This takes place every time a QuerySet’s filter() method is used, for instance, in order to generate the necessary database query. Since comparisons might require different handling than saving, Django uses the get_prep_lookup() method to manage this task.

When called, this method receives two explicit arguments, detailing how the lookup is expected to take place. The first, lookup_type, is the type of comparison that was requested in the filter() method. The second, value, is the Python object that was provided for comparison against database values.

While value is fairly straightforward, lookup_type is a little different, because it’s a string containing the requested comparison type. There are several of these available as part of Django’s database API,6 each having its own expectations. This is the full list, including the purpose of each:

  • exact and iexact—The supplied value must match exactly with what’s present in the database, with iexact being case-insensitive. Django assumes a filter without a lookup type to mean exact, which will be passed in to get_prep_lookup().
  • contains and icontains—The supplied value must be present in at least part of the value present in the database, with icontains being case-insensitive.
  • gt and gte—The database value must compare as greater than the value supplied to the lookup, while gte also allows for the values to be equal.
  • lt and lte—The database value must compare as less than the value supplied to the lookup, while lte also allows for the values to be equal.
  • in—The database value must exactly match at least one of the values present in a list supplied as the lookup value.
  • startswith and istartswith—The database value must begin with the string supplied as the lookup value, with istartswith being case-insensitive.
  • endswith and iendswith—The database value must end with the string supplied as the lookup value, with iendswith being case-insensitive.
  • range—The database value must fall within the range specified by a 2-tuple of beginning and ending limits supplied as the lookup value.
  • year, month and day—The database value must contain the specified lookup value as its year, month or day portion, depending on which lookup type was used. This is valid for dates only.
  • isnull—The database value must be equivalent to NULL in order to be matched.
  • search—The database value must pass a full-text index search. This is valid only for MySQL, and only if the database has been modified to enable the necessary indexing.
  • regex and iregex—The database value must match the format specified by the regular expression supplied as the lookup value, with iregex being case-insensitive.

Fields that inherit from some existing field can usually avoid overriding this method, as the parent class usually does the right thing. Other times, unfortunately, the child class needs specific handling for certain lookup types, where this can be quite useful. Still other times, it’s necessary to restrict certain types of lookups entirely.

One useful side effect of having Python code executed as part of the lookup process is that it allows exceptions to be thrown for lookups that aren’t valid for that field. This works just like anywhere else, where if you raise an exception, it will bail out of the query early, displaying a message indicating what happened.

WHERE’D MY ERROR GO?

Unfortunately, even though it’s possible—and often quite useful—to raise exceptions within get_prep_lookup(), sometimes you may find that they get suppressed. If this happens, the query will appear to execute, but you’ll likely receive just an empty list as its result, rather than seeing your error.

Due to some of the hoops QuerySets have to jump through internally, certain types of errors—including TypeError, which seems like an obvious choice to use—get caught and suppressed, causing Django to move on with the process in spite of not getting a valid value for that field.

In order to make sure that the error gets raised to its fullest and works as expected, be sure to use ValueError instead of TypeError, as it doesn’t get caught in the same trap.
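Putting that advice into practice, here’s a minimal sketch continuing the DurationField example. The particular set of rejected lookup types is chosen purely for illustration; everything else defers to the parent class:

from django.db import models
 
class DurationField(models.DecimalField):
    def get_prep_lookup(self, lookup_type, value):
        # Text-oriented comparisons make no sense for durations, so
        # reject them outright; ValueError won't get suppressed the
        # way TypeError would.
        if lookup_type in ('contains', 'icontains',
                           'startswith', 'istartswith',
                           'endswith', 'iendswith',
                           'search', 'regex', 'iregex'):
            raise ValueError('%s lookups are not valid for DurationField.' % lookup_type)
        return super().get_prep_lookup(lookup_type, value)
 
    # Field logic then continues here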

get_db_prep_lookup(self, lookup_type, value, connection, prepared=False)

This performs essentially the same task as get_prep_lookup(), except that its output will be fed directly into the database query. It receives the same arguments, with the addition of connection and prepared, which work just like the arguments passed into get_db_prep_value(). The default implementation defers to get_prep_lookup(), which will be sufficient for most needs.

Dealing with Files

Many applications have need to manage content that goes beyond what’s traditionally stored in a database. Beyond the usual numbers and strings, there’s a world of other data formats, from audio and video to print-ready Portable Document Format (PDF) files and plenty more. Content like this isn’t well suited to being stored directly in the database—though in some cases it’s at least possible—but it’s still useful to tie it to other content that is in the database.

To handle this, Django provides a special FileField, with extra methods designed to facilitate access to files. It also uses many of the hooks described in this chapter to store a reference to the file in the database, as well as provide a special object that can access files in a portable manner. Django also provides an ImageField, which inherits much of its functionality from FileField, while adding some of its own, specifically tailored for dealing with the special needs of images.

Subclasses of FileField shouldn’t generally need to override many of its methods, since they’re mostly related to those features of a file that are common to all file types. This includes things like the filename and relative path, which don’t have anything to do with the specifics of a particular type of file. Some, however, such as save_file(), can be overridden to provide special handling of attributes related to a specific type of file.

get_directory_name(self)

This method simply returns a relative path that will be stored in the database along with the filename. By default, this looks at the upload_to attribute of the field to determine what the directory should be, and even subclasses should respect this behavior. Exactly how that attribute is used, however, is where subclasses can customize this method to great effect.

Normally, Django creates a directory name using two pieces of information: the upload_to string itself and the current date. The date the file was uploaded is applied to the directory name, replacing certain characters with portions of the date. This allows individual fields to more accurately control where their files are stored, which helps keep directories smaller, and can possibly even make better use of disk capacity.

In a subclass, however, it may be more useful to generate the directory name based on some other type of information, such as the current site’s domain name in multisite setups, or the Internet Protocol (IP) address of the machine where the upload was received, in larger production environments where there are multiple Web servers sharing common storage.

Essentially, anything’s fair game here, as long as it requires only information that can be determined from the FileField instance itself. The current site or IP address can be obtained without regard to the current model at all, as can the current time. Other information, however, such as the user who submitted the file, the IP address of his or her remote computer, or the object the file will be attached to, is not accessible from this method, and thus can’t be used.
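As a sketch of the multisite idea mentioned above, a subclass could prefix the directory with the current site’s domain instead of the upload date. The SiteUploadFileField name is hypothetical, and this assumes the django.contrib.sites framework is installed:

import os
 
from django.contrib.sites.models import Site
from django.db import models
 
class SiteUploadFileField(models.FileField):
    def get_directory_name(self):
        # Group uploads by the current site's domain rather than by date.
        domain = Site.objects.get_current().domain
        return os.path.normpath(os.path.join(domain, self.upload_to))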

Of course, there is another option to specify some of this additional information, but doing so bypasses this method entirely. By specifying a callable for upload_to, as described in Django’s file documentation,7 the directory can be generated based on the object it will be attached to, which may include the User who owns the object.

Note that when using a callable as upload_to, that callable is expected to return the entire path, including the directory and filename, so get_directory_name() won’t be called at all in such cases, unless that callable explicitly calls it. Also, the incoming request still isn’t available, even to that callable, so making directory naming decisions based on that information will require a custom view.

get_filename(self, filename)

This works in much the same way as get_directory_name(), except that it’s responsible for specifying the filename portion of the path instead of the directory. It receives the original filename that was specified with the incoming file, and returns a new filename that will be used in the database, as well as the underlying storage system.

If a FileField subclass has need to customize the filename that will be used for a particular file, such as stripping out certain characters or altering the file’s extension, this would be the place to do it. That’s also why it receives the original filename as well, so that it has a way to create a filename that’s at least partially related to the one provided by the user.

By default, its output is combined with that of get_directory_name() to form the full path to be stored in the database and passed to the storage system. Like its counterpart, however, this is only true if the upload_to argument to the field was not a callable. If a callable was specified, it’s responsible for specifying the entire path, including the filename. Therefore, in such cases, this method will only be called if the upload_to callable specifically requests it.
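For instance, a subclass could normalize incoming filenames, here by slugifying the base name while keeping the extension intact. This is only a sketch, and the SlugFileField name is made up for the example:

import os
 
from django.db import models
from django.template.defaultfilters import slugify
 
class SlugFileField(models.FileField):
    def get_filename(self, filename):
        # Let FileField produce a valid name first, then clean it up.
        name = super().get_filename(filename)
        base, ext = os.path.splitext(name)
        return '%s%s' % (slugify(base), ext.lower())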

generate_filename(self, instance, filename)

This is the default method used to generate the entire path. It uses the same function signature as a callable upload_to argument, because it plays the exact same role. In fact, internally to FileField, all references for generating the filename to be used for the file reference this method; if a callable was supplied to upload_to, it’s simply assigned to this same name, replacing the default behavior.

The default behavior is to use os.path.join() to combine the output of both the get_directory_name() and get_filename() methods, ignoring the model instance provided as an argument. If a FileField subclass needs the ability to specify the file’s entire path all at once, this method would be the place to do it.

Of course, remember that if a callable was supplied as the upload_to argument, this method will get replaced. This is true regardless of what behavior is supplied by a FileField subclass; the needs of a specific instance always win over the behavior of its class. So, while overriding this behavior can provide a more useful default, it doesn’t remove an individual developer’s ability to replace it entirely.

save_form_data(self, instance, data)

This is a utility method for forms to use as a shortcut for saving a file associated with a model instance. It accepts an instance of the model the field was attached to, as well as the uploaded file data provided by the form. By default, it just extracts the necessary information from the uploaded file object, and passes it through to the standard file saving methods.

The instance argument is an instance of the model where the FileField was defined, and the data argument is an UploadedFile object, as described in Chapter 8. The uploaded file has a name attribute, which contains the filename, and a read() method, which is used to access the file’s contents so that it can be saved properly.

As this is the primary way files are handled by most areas of Django itself, overriding this field provides an excellent opportunity to tie into extended functionality based on specific field types. For example, Django’s own ImageField uses this as an opportunity to store the width and height of an image in separate fields, so they can be indexed and searched in the database directly. Other file types could take this same approach, storing certain attributes of the file in other fields for easier access later on.

Since this method gets access to the entire file’s contents, it’s possible to pass those contents into most libraries that deal with files. Anything that can read an open file object can process uploaded content by simply wrapping it in a StringIO 8 object. That way, the contents can be accessed without having to write them to the storage system first, only to have to read them back again.
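Following the ImageField pattern described above, a sketch of that approach might look like this. The WordCountFileField name and the word_count model field it writes to are both assumptions made for the example:

from django.db import models
 
class WordCountFileField(models.FileField):
    def save_form_data(self, instance, data):
        if data is not None:
            # Inspect the uploaded content before handing it off to
            # the standard saving behavior, then rewind the file.
            instance.word_count = len(data.read().split())
            data.seek(0)
        super().save_form_data(instance, data)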

delete_file(self, instance, sender)

While this may look like simply a way to delete a file, it actually serves a very particular purpose, which is alluded to by the presence of a sender argument. The contribute_to_class() method of FileField sets up this method as a listener for the post_delete signal. It’s not intended to be called individually, but instead it gets called every time a model instance with a FileField is deleted. As described for post_delete, the instance argument is the object that was just deleted, and the sender argument is the model class for that instance.

When triggered, it checks to see whether the file referenced by this field on the specified instance should be deleted. After all, if no other instances are referencing the same file, and it’s not the default value for new instances, it’s quite likely that no references to the file remain. In those cases, the file is permanently removed from the storage system.

The uses for overriding this are clear, because the logic for when to delete the file is included directly within this method. If a FileField subclass needs different rules for this, simply overriding this method is enough to make it happen.

The obvious example is if files should always remain, for historical reasons, even after the model instances associated with them have been deleted. Providing that behavior is a simple matter of just defining an empty implementation of this method.

from django.db import models
 
class PermanentFileField(models.FileField):
    def delete_file(self, instance, sender, **kwargs):
        pass

Of course, there are other possible use cases for this as well, but the specifics of what those would look like will depend very much on the needs of an individual application.

attr_class

As a simple attribute, rather than a method, attr_class might not seem like it would provide much power or flexibility. Thankfully, looks are often deceiving, as it’s actually the gateway to some very useful features. The attr_class attribute is set to a class that will be used to represent the field’s value when referenced in Python. That means that the value of this simple attribute is actually the primary way of specifying what features are available on the public API for data entered into a particular FileField instance.

The following section describes the behavior of the class specified by default for this attribute, and how its methods can be overridden to provide additional functionality.
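As a sketch of what this enables, a field for plain-text files could expose a convenience method on every file value. The TextFieldFile and TextFileField names and the linecount() helper are all invented for the example, and it assumes the file class is importable as FieldFile from django.db.models.fields.files, which is how newer Django versions name the class described in the next section:

from django.db.models.fields.files import FieldFile, FileField
 
class TextFieldFile(FieldFile):
    def linecount(self):
        # Open, count and close, so callers don't manage file state.
        self.open('rb')
        try:
            return len(self.read().splitlines())
        finally:
            self.close()
 
class TextFileField(FileField):
    # Use the extended file class for values of this field.
    attr_class = TextFieldFile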

Customizing the File Class

When a model defines a FileField, the value made available as the attribute on actual model instances is a special object designed specifically for managing files. Located at django.db.models.fields.files, the File class provides a number of platform-independent and storage-independent methods for accessing a file’s content and properties of that content, as well as for saving new files and deleting existing ones.

Because it’s the public-facing API for accessing files, it’s often quite useful to provide additional functionality for file types that have common qualities that will need to be referenced often. This provides a nice, clean, object-oriented way to encapsulate that common code in one place, rather than requiring the rest of the application to write it over and over again.

For example, Django’s own ImageField provides its own subclass, ImageFile, which contains additional methods for accessing the width and height of an image, as well as caching it to speed up subsequent accesses. It’s an excellent example of how easy it is to provide this extra functionality.

In addition to providing new methods, though, there are a number of existing methods that could benefit from being overridden. These are a bit less likely to be of use directly, but as ImageFile shows, they can be used to perform some important tasks, such as updating or invalidating cached values.

For the most part, the methods described next map directly to file storage methods described in Chapter 8. The main difference is that these are specific to a particular file type, and can be customized for aspects that are unique to that file type, while storage systems are just designed to work with files, without regard to what type of content gets handled.

path(self)

This returns the path of the file, if it’s stored on the local filesystem. For files stored on other backends, which can’t be accessed with Python’s built-in open() function, this will raise an AttributeError, because the corresponding method isn’t available on the related storage system object.

This is provided mostly as a compatibility layer with older versions of Django, for those projects that were written before the introduction of this new file handling system. In the real world, projects written for newer versions of Django should avoid the use of this method, and instead use the open() method listed in this section to access files in a more portable fashion. Overriding it will also be of little use, so it’s listed here for completeness with the rest of the API.

url(self)

This method returns the URL where the file can be retrieved on the Web. It might be served up from the Django project itself, a media server operated by the site’s owners, or even a storage service operated by a third party. The exact details of where this URL comes from are specified by the storage system, so this method is a portable way to access the URL for the file.

Overriding this provides little benefit in most situations, but there are a few reasons to do so, depending on the circumstances. One example might be a FileField subclass that manages HTML files with a specific structure, where the URL might include a fragment identifier to direct browsers to a specific point in the file.

size(self)

This retrieves the size of the underlying file, caching it for future reference. While this can be a very useful feature, there’s little value in overriding it in a subclass. The nature of file size is such that it doesn’t vary depending on file type, and there’s not really anything that can be done to customize how the size is obtained, so it’s just included here for completeness.

open(self, mode='rb')

This retrieves the file’s content and returns an open file or file-like object, which allows access to the file. This is the preferred method of accessing a file’s contents in a portable fashion, since it passes through to the storage system for the majority of its functionality.

The mode argument takes all the same options as Python’s own open() function,9 and can be used to open the file for read or write access. One use of overriding this method could be to change the default access mode, but only for changing whether it should be opened in binary mode by default or not. The default should always at least be to open the file for reading, rather than writing.

Another potential reason to subclass this would be to provide custom behaviors to the returned file-like object. By default, this method will return whatever object is returned by the storage system, but particular file types might have use for customizing methods on that object, such as write() or close() to alter how and when the file is written. Because this method is responsible for returning an open file-like object, it can wrap the true file-like object in another, passing through to the real object after doing whatever extra work needs doing.

save(self, name, content, save=True)

As the name implies, this saves a new file to the storage system, replacing the file currently in place on the model instance. The arguments should be mostly self-explanatory, with name being the name the new file should be saved as, and content being the actual contents of the file to be written using that name.

Of course, invalid characters in the filename or existing files with the same name could result in the filename being changed by the storage system. Such changes will be reflected in the filename that’s stored on the model instance.

The save argument, however, merits further explanation. Because this saves a file that’s related to a model instance, the new filename will be stored on that instance for future reference. However, it’s not always beneficial to commit that change to the database immediately.

By default, it does save the instance right away, but if save is set to False, this will be bypassed, allowing additional changes to take place before committing to the database. Take care when doing this, however. The file will already have been committed to the storage system, so failing to eventually save the instance with the new filename will result in a file with no references to it.

Overriding this can provide a way to customize or record the filename that will be used, to change the default database commitment behavior, or perhaps most commonly, to retrieve information about the file’s contents and update any cached information accordingly. The default File object does this for the file’s size, and ImageFile also updates its dimensions cache.

delete(self, save=True)

Also fairly self-explanatory, this deletes the file directly from the storage system, regardless of which storage system is being used. It also removes the filename from the model instance, so that it no longer references the file.

The save argument works just like the one from the save() method, determining whether the model instance is saved or not. Also like save(), if False is provided, it’s important to make sure the instance is in fact saved eventually. Otherwise, it will contain a reference to a file that has already been deleted. Perhaps worse yet, if another instance saves a file with the same name, the reference from the first instance will no longer be orphaned, but will in fact point to the wrong file entirely.

Overriding this provides most of the same benefits as overriding save(), by being able to remove any cached information so it doesn’t cause confusion if accessed later.

Signals

Chapter 2 described the signal dispatching system bundled with Django, and how signals work in general. As explained, signals can be created and made available from any Python module, and can be used for any purpose. For dealing with models, several signals are provided out of the box, and they can be used in a number of situations.

The following signals are all available at django.db.models.signals, and each sends the model class as the standard sender argument to the listener. In addition, many signals include a model instance as an additional argument. These and other additional arguments are detailed in the descriptions of each individual signal listed here.

class_prepared

This signal fires when Django’s ModelBase metaclass has finished processing a model class, indicating that the class is completely configured and ready to be used. Since the metaclass operates as soon as Python encounters the class declaration, class_prepared is fired before Python even continues processing the module that contains that declaration.

One important note to consider, however, is that this fires just prior to the model being registered with AppCache. Therefore, if a listener for class_prepared looks through AppCache to inspect the models that have been processed up to that point, the model that fired the signal won’t yet be present. There may be some uses for inspecting the application cache at this point in the process, but without a full application cache, its value is quite limited.

Unlike most of the other signals listed in this section, class_prepared only sends the standard sender argument. Since there isn’t any instance available at the point in time when the signal is fired and the _meta attribute on the new model class contains all the information about how it was declared, the model itself is enough to obtain all the information that’s available at that point in time.

>>> from django.db import models
>>> def listener(sender, **kwargs):
...     print('%s.%s' % (sender._meta.app_label, sender._meta.object_name))
...
>>> models.signals.class_prepared.connect(listener)
>>> class Article(models.Model):
...     title = models.CharField(max_length=255)
...     class Meta:
...         app_label = 'news'
...
news.Article

Like all signals, listeners for class_prepared can be registered with or without a specific model to listen for, though it may not seem like this would be possible. After all, if the listener must be registered prior to the signal being fired, and the signal is fired before Python even continues with the rest of the module, how can it possibly be registered with a class to listen for? Even if it could, what possible purpose could it serve?

The answer to both of these questions is contribute_to_class(). Remember that attributes on a model are given the opportunity to customize how they’re applied to the model. When an object with a contribute_to_class() method is encountered, that’s called instead of the usual setattr(), where it’s passed the model class and the attribute name, allowing the object to perform whatever functionality it wants to.

The key here is that contribute_to_class() receives the model class as an argument. It makes for an excellent opportunity to register a listener for class_prepared specifically for the class being processed. In fact, depending on the need at hand, this is not only possible, but could be downright essential.

Consider a situation where a field-like object needs to know everything about the model it’s attached to in order to properly configure itself. Since there’s no guarantee that all the other fields have been processed by the time contribute_to_class() is called on the object in question, it’s necessary to defer the rest of the configuration until the class has finished processing.
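A minimal sketch of that deferral technique might look like the following. The InspectingObject name and its finalize() behavior are invented for the example:

from django.db.models.signals import class_prepared
 
class InspectingObject:
    def contribute_to_class(self, cls, name):
        self.name = name
        # Wait for the whole model to be processed before configuring,
        # since other fields may not have been contributed yet.
        class_prepared.connect(self.finalize, sender=cls)
 
    def finalize(self, sender, **kwargs):
        # All fields are in place now, so the full list is available.
        self.field_names = [f.name for f in sender._meta.fields]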

pre_init and post_init

When a model is instantiated, pre_init fires before any other work is performed. It gets dispatched even before any of the arguments passed into the model are assigned to their appropriate attributes. This is a good opportunity to inspect the arguments that will be assigned to the instance prior to that actually happening, especially since this allows a listener to fire before encountering any errors that might come as a result of the arguments specified.

Because this takes place prior to any of the field values being populated on the object itself, it doesn’t send the new object along when the signal is fired. Instead, it passes along two additional arguments besides sender that correspond to the positional and keyword arguments that were passed in to the model.

  • args—A tuple containing the positional arguments that were passed to the model constructor
  • kwargs—A dictionary containing the keyword arguments that were passed to the model constructor

Note that even though these are the same names as those usually given to the excess argument technique described in Chapter 2, these are passed to the listener as explicit keyword arguments, rather than using * and **. Listeners must define these arguments explicitly in order for them to work properly.

>>> from django.db.models.signals import pre_init
>>> from news.models import Article
>>> def print_args(sender, args, kwargs, **signal_kwargs):
...     print('%s(*%s, **%s)' % (sender._meta.object_name, args, kwargs))
...
>>> pre_init.connect(print_args, sender=Article)
>>> article = Article(title=u'Testing')
Article(*(), **{'title': u'Testing'})

Similarly, post_init gets fired as part of the model instantiation process, but at the end instead of the beginning, once all the arguments have been mapped to the appropriate attributes based on the fields that were defined on the model. Therefore, as the name implies, the object is completely initialized at this point.

It would make sense, then, that when post_init fires, it gets passed the fully configured model instance as well as the standard sender, which is the model class. The new object is passed in as the instance argument to the listener, which can then do with it whatever is necessary, according to the application.

>>> from django.db.models.signals import post_init
>>> from news.models import Article
>>> def print_instance(sender, instance, **kwargs):
...     print('Instantiated %r' % instance)
...
>>> post_init.connect(print_instance, sender=Article)
>>> article = Article(title=u'Testing')
Instantiated <Article: Testing>

pre_save and post_save

When a model instance is being committed to the database, Django provides two ways to hook into that process, both at the beginning and at the end. The primary difference between the two is that pre_save is called before the object is committed to the database, while post_save is called afterward. This simple distinction can be very important, depending on the needs of the application.

When triggered by pre_save, a listener receives the model class as sender, and also the instance of the model as instance. This allows the listener to get access to—and even modify—the instance that’s about to be saved, before it hits the database. This can be a useful way to provide or override default arguments for models provided by third-party applications.

On the other hand, post_save is called after the save has been performed, and the instance has been committed to the database. This is a useful step in two ways, because it not only ensures that the data is in fact present in the database, which is necessary when dealing with related models, but it also occurs after Django has made the decision about whether to insert a new record into the database or update an existing record.

In addition to the sender and instance arguments that work the same way as in pre_save, listeners for post_save can receive another argument. The created argument is a Boolean indicating whether or not the instance had to be created from scratch. A value of True means it was newly inserted into the database, while False means an existing record was updated. When using the post_save signal to track database changes, this is an important distinction, and can be used to determine the behavior of other applications. To see this in action, see the history example in Chapter 11 of this book.

Because a model manager’s create() method does in fact commit a new instance to the database, it fires both of these signals. It’s also safe to assume that any time create() is used, the created argument will be True, but just remember that there may well be other times when that argument is also True.

>>> from django.db.models import signals
>>> from news.models import Article
>>> def before(instance, **kwargs):
...     print('About to save %s' % instance)
...
>>> signals.pre_save.connect(before, sender=Article)
>>> def after(instance, created, **kwargs):
...     print('%s was just %s' % (instance, created and 'created' or 'updated'))
...
>>> signals.post_save.connect(after, sender=Article)
>>> Article.objects.create(title='New article!')
About to save New article!
New article! was just created
<Article: New article!>

A NOTE ABOUT COMBINING PRE_SAVE() AND POST_SAVE()

There’s another very important difference between pre_save and post_save, because they’re not always called as a pair. Because pre_save is triggered at the beginning of the process, you can reliably assume that it will always be called every time a save() is initiated. However, post_save only happens at the end, so if anything goes wrong during the save itself, post_save won’t get triggered.

This is an important distinction, because it may seem convenient to register a pair of listeners for the model saving signals, expecting that both will always be called every time. While that may be true for the majority of cases, and certainly when nothing goes wrong, things do go wrong sometimes. Examples include an entry with a duplicate primary key or other unique column, data being of the wrong type or a timeout connecting to the database.

In situations where this type of behavior is required, the only reasonably sane way to go about it is to override the save() method on the model. This allows custom code to be run before and after the actual database interaction, but it also provides a way to identify problems that occurred in the process. In addition, it allows the code a better opportunity to pair the two pieces of functionality more fully, since if something does go wrong, it’s easier to identify, and thus any pending actions can be canceled as a result.
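A sketch of that approach might look like the following, where the helper functions are stand-ins for whatever paired behavior an application actually needs:

from django.db import models
 
class Article(models.Model):
    title = models.CharField(max_length=255)
 
    def save(self, *args, **kwargs):
        prepare_related_data(self)    # hypothetical pre-save work
        try:
            super().save(*args, **kwargs)
        except Exception:
            # Unlike a post_save listener, this code can react to a
            # failed save and cancel any pending work.
            cancel_related_data(self) # hypothetical cleanup
            raise
        # Reached only after a successful save, like post_save.
        update_related_data(self)     # hypothetical post-save work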

pre_delete and post_delete

Similar to the previous section in spirit, pre_delete and post_delete are the pair of signals relating to the deletion of model instances. They function almost identically to their saving counterparts, except that they both provide just the sender and instance arguments.

When using post_delete, keep in mind that the instance passed in to the listener will have already been removed from the database, so many of its methods will raise exceptions if used. This is especially true if it had previously related to instances of other models. Those relationships will have been lost by the time post_delete is triggered, so any handling of those situations should be done in pre_delete or by overriding the delete() method on the model. If you do override the model’s delete() method, you’ll need to make sure to access the model and its relationships prior to calling the delete() method on the parent class. Once you delete it through the parent class, you’ll be in the same situation as when using the post_delete signal.

Also, because the instance will have been deleted, its primary key value will no longer match up with anything in the database. However, in order to more accurately keep track of which object was deleted, the primary key value is left intact on the instance, and can be read using the pk shortcut described earlier in this chapter.
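To illustrate the ordering requirement when overriding delete(), here’s a minimal sketch. The comment_set reverse relation and the log_deleted_comments() helper are both assumptions made for the example:

from django.db import models
 
class Article(models.Model):
    title = models.CharField(max_length=255)
 
    def delete(self, *args, **kwargs):
        # Gather related data while the relationships still exist;
        # comment_set is assumed to come from a ForeignKey elsewhere.
        comment_ids = list(self.comment_set.values_list('pk', flat=True))
        super().delete(*args, **kwargs)
        # The instance and its relations are now gone from the database;
        # only the data captured above remains usable.
        log_deleted_comments(comment_ids)  # hypothetical helper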

post_syncdb

Unrelated to a specific model, post_syncdb is instead triggered as part of the syncdb management command’s normal process. It provides a way for applications to identify when an application’s models have been installed into the database, in order to perform other tasks based on their definitions.

While there are likely other uses for this as well, the primary use for post_syncdb is to either configure the application itself the first time its models are installed in the database, or to identify other applications that are being installed, taking action appropriately. Within Django itself, there are examples of both types of functionality.

  • The django.contrib.auth application uses it to install permissions for new models into the database, as soon as the models are installed, as well as to create a new superuser if the auth application itself was just installed.
  • The django.contrib.contenttypes application uses it to maintain its own record of what models are in use, so it can provide relationships to any installed model.
  • The django.contrib.sites application uses it to install a default site for all new projects that use the application.

What makes post_syncdb especially effective is that it uses a different type of value for the sender argument that accompanies all signals. Instead of using a specific model, this signal sends the application’s models module, which is the object Django uses to identify an application. This allows a listener to be configured either for all applications or just the one that registered it.

All applications listed in the INSTALLED_APPS setting emit a post_syncdb signal every time the command is executed, even if nothing has changed. Therefore, in addition to sender, listeners of post_syncdb receive three additional arguments to indicate with more detail the circumstances under which syncdb was called, and help control their behavior in response.

  • app—The application object (its models module) representing the application that was just synchronized with the database. This is exactly the same as the sender argument, but is named app here to make listener functions a bit more readable.
  • created_models—A Python set containing all the models for the application that were actually installed into the database during the execution of syncdb. This is how a listener can identify just those models that are new, which is usually the most important thing a post_syncdb handler needs to know. This will always be provided, but in the case of an application where nothing is new, it will simply be an empty set.
  • verbosity—An integer identifying the verbosity level requested by the user who executed syncdb. Valid values are 0, 1 and 2, with 0 being minimal output (nothing in most cases), 1 being normal output and 2 being all output (including messages indicating actions being performed, even if they don’t require user input). Listeners for post_syncdb should always be prepared to output what activities they’re performing, and should use this argument to identify when different messages should be displayed. The following listener demonstrates all three of these arguments in action:

from django.db.models import signals
 
def app_report(app, created_models, verbosity, **kwargs):
    app_label = app.__name__.split('.')[-2]
 
    if verbosity == 0:
        # Don't do anything, because the
        # user doesn't want to see this.
        return
 
    # Get a list of models created for just the current application
    app_models = [m for m in created_models if m._meta.app_label == app_label]
 
    if app_models:
        # Print a simple status message
        print('Created %s model%s for %s.' % (len(app_models),
                                              len(app_models) > 1 and 's' or '',
                                              app_label))
        if verbosity == 2:
            # Print more detail about the
            # models that were installed
            for model in app_models:
                print('  %s.%s -> %s' % (app_label,
                                          model._meta.object_name,
                                          model._meta.db_table))
 
    elif verbosity == 2:
        print('%s had no models created.' % app_label)
 
signals.post_syncdb.connect(app_report)

Code for post_syncdb listeners is generally placed in an application’s management package, which is automatically loaded whenever manage.py is used for a project containing that application. This ensures that it doesn’t get unnecessarily loaded for situations where it’s not needed, while also making sure that it does get loaded whenever it might be necessary. Also, since it’s Python, code in your management package can do other things as well, such as inspect the INSTALLED_APPS setting and decide whether the listener should even be registered at all.
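As a sketch of how that registration might look in practice, the following could live in an application's management/__init__.py; the application name and message are hypothetical.

# myapp/management/__init__.py (hypothetical application)
from django.conf import settings
from django.db.models import signals
 
def announce(app, created_models, verbosity, **kwargs):
    if verbosity and created_models:
        print('%d new models were just installed.' % len(created_models))
 
# Since this is ordinary Python, registration can be made conditional
# on the presence of other applications.
if 'django.contrib.sites' in settings.INSTALLED_APPS:
    signals.post_syncdb.connect(announce)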

Applied Techniques

Given the wide array of tools available for individual models to customize their behavior, their interaction with the database and the behavior of the fields attached to them, the options are nearly limitless. The techniques that follow represent just a small portion of what's possible.

Loading Attributes on Demand

When working with certain types of data, it’s sometimes quite expensive to construct a complex Python object to represent a given value. Worse yet, some parts of the application might not even use that object, even though the rest of the model might be necessary. Some examples of this in the real world are complex geographic representations or large trees of nested objects.

In these cases, we must be able to get access to the full object when necessary, but it’s very important for performance to not have that object constructed if it won’t be used. Ideally, the data would be loaded from the database when the model is instantiated, but the raw value would just sit on the instance without being loaded into the full object. When the attribute is accessed, it would be constructed at that point, then cached so that subsequent accesses don’t have to keep reconstructing the object.

Looking back again to Chapter 2, descriptors are the perfect tool for this task, since they allow code to be run at the exact moment an attribute is accessed. Some care must be taken to make sure that the fully constructed object is cached properly for future use, but by using a separate name and attname, this is also fairly straightforward.

To illustrate how this would work in practice, consider a field designed to store and retrieve a pickled copy of any arbitrary Python object. There’s no way to know in advance how complicated the Python representation will be, so this is a situation where it’s ideal to delay the construction of that object until it’s actually necessary.

Storing Raw Data

The first step is to tell Django how to manage the raw data in the database, using a standard field. Since pickled objects are just strings, some form of text field would clearly be prudent, and since there’s no way to know in advance how large the pickled representation will be, the nearly limitless TextField seems like an obvious choice.

Of course, given that there will be some extra work going on for this new field, TextField alone won’t suffice. Instead, we’ll create a subclass that inherits the database functionality of TextField, while allowing extra customizations where necessary. Since fields are just Python classes like any other, this works just like you’d expect, but with one addition. In order to interact with the database using a different value than is used to interact with other Python code, the attname attribute needs to be different than the name attribute. This is controlled by a custom get_attname() method.

from django.db import models
 
class PickleField(models.TextField):
 
    def get_attname(self):
        return '%s_pickled' % self.name

This much alone will suffice for getting the field set up properly for the database. At this point, it’s even possible to assign a PickleField instance to a model and sync it with the database, and the column created will be perfectly usable for the duration of this example. Of course, it only manages the raw data so far; it won’t be able to handle real Python objects at all, much less deal with pickling and unpickling as necessary.
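To see the effect of the custom attname, consider attaching the field to a model; this is just a sketch, with the model name purely hypothetical.

class Note(models.Model):
    data = PickleField()
 
field = Note._meta.get_field('data')
print(field.name)     # 'data' -- how Python code addresses the field
print(field.attname)  # 'data_pickled' -- where the raw data lives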

Pickling and Unpickling Data

To make the translation between a full Python object and a string representation that can be stored in the database, Python’s pickling modules10 will be the tool of choice. There are actually two separate modules provided by Python for this purpose: cPickle, written in C for improved performance, and pickle, written in pure Python for flexibility and portability. There are some minor differences between the two,11 but they can be used interchangeably.

Having two modules available makes importing a bit trickier than usual. For obvious reasons, it’s very valuable to have the greater performance when it’s available, but a key aspect of Python and Django is the ability to be used across multiple platforms and environments. Therefore, when looking to import a pickling module, it’s best to try the more efficient module first, falling back to the more portable module when necessary.

try:
    import cPickle as pickle
except ImportError:
    import pickle

With a pickle module available, we can give PickleField the ability to actually pickle and unpickle data. By providing a couple basic methods, it’s possible to interface with the underlying module in a more object-oriented manner. In addition, it’s safe to assume that when preparing to commit to the database, the field’s value will be the full Python object, which obviously must be pickled.

On the other hand, when using a QuerySet’s filter() method to make comparisons against values in the database, pickled data will be quite useless. It would technically be possible to pickle the query’s value to compare against that found in the database, but it would be comparing the pickled values, not the original Python objects, which could lead to incorrect results.

More importantly, even though a pickled value is guaranteed to be unpickled properly when necessary, it’s quite possible that the same value, pickled on different occasions or possibly on different machines, will have different strings representing the original object. This is a documented side effect of the way pickling works, and must be taken into account.
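As a quick illustration, the same value can easily yield different pickled strings; here the protocol is varied explicitly, but the same effect can occur across machines or Python versions.

import pickle
 
# Both strings unpickle to the same value, but they aren't equal,
# so comparing pickled data directly would give wrong answers.
a = pickle.dumps(1.5)     # protocol 0, text-based
b = pickle.dumps(1.5, 2)  # protocol 2, binary
assert pickle.loads(a) == pickle.loads(b)
assert a != b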

With all of this in mind, it's unreasonable to allow any kind of comparison against pickled data, so an exception should be thrown if such a comparison is attempted. As described previously in this chapter, that behavior is controlled by get_db_prep_lookup(), which can be overridden to throw such an exception. The full field thus far follows:

class PickleField(models.TextField):
    def pickle(self, obj):
        return pickle.dumps(obj)
 
    def unpickle(self, data):
        return pickle.loads(str(data))
 
    def get_attname(self):
        return '%s_pickled' % self.name
 
    def get_db_prep_lookup(self, lookup_type, value):
        raise ValueError("Can't make comparisons against pickled data.")

Note that pickle and cPickle only support pickled data strings as plain byte strings, not as full Unicode strings. Since everything in Django gets coerced to Unicode wherever possible, including retrieving from the database, unpickle() needs to take the extra step of forcing it back to a byte string in order to be unpickled properly.

WHY THE EXTRA METHODS?

It may seem odd to define separate pickle() and unpickle() methods, when the pickling module is already available in the module’s namespace. After all, it’s not only extra lines of code for you, the developer, to write, but it’s also an extra function call that Python has to go through to get the job done, which slows things down slightly, and seemingly unnecessarily.

The biggest advantage of doing it this way is that if any other application needs to subclass PickleField and wishes to override exactly how the data gets pickled and unpickled, having explicit methods makes that process considerably easier. They can simply be overridden like any other methods, and as long as the rest of PickleField only references the methods, the subclass will work quite well.
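For instance, a subclass could swap in a different serialization scheme entirely, just by overriding those two methods. This is a hypothetical sketch, assuming Python's json module (available in Python 2.6 and later).

import json
 
class JSONField(PickleField):
    # Only the translation methods change; storage, attname handling
    # and lookup restrictions are all inherited from PickleField.
    def pickle(self, obj):
        return json.dumps(obj)
 
    def unpickle(self, data):
        return json.loads(data)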

This gets us one step closer, now that PickleField can store values in the database properly. However, it still doesn’t solve the main issue of loading data into a Python object, and doing so only when it’s really necessary.

Unpickling on Demand

If we weren’t concerned with performance, it’d be easy to perform the unpickling step in the to_python() method and just use SubfieldBase to make sure it happens every time an object is instantiated, regardless of where it came from. Unfortunately, that would incur a good deal of unnecessary overhead for those cases where this field wouldn’t be accessed, so it’s still well worth loading it up on demand, only when it’s requested.

As mentioned earlier, Python descriptors are particularly well suited for this scenario. They get called when an attribute is accessed, and can execute custom code at that time, replacing standard Python behavior with something designed for the task at hand.

The first step is determining how to instantiate the descriptor, which also means identifying what data it will need in order to get the job done. In order to retrieve the raw data from the model instance properly, it’ll need access to the field object, from which it can gather the name of the field itself.

class PickleDescriptor(property):
    def __init__(self, field):
        self.field = field

That stores a reference to the field itself, which provides everything the descriptor will need later on, including its name and attname. With that in place, it's possible to write the __get__() and __set__() methods that will do the real work. __set__() is the easier of the two to implement: it stores the value in the instance's namespace under the field's name, while also keeping the pickled form on the attname attribute in sync.

def __set__(self, instance, value):
    instance.__dict__[self.field.name] = value
    setattr(instance, self.field.attname, self.field.pickle(value))

With that in place, the trickiest bit of this whole process is the descriptor’s __get__() method, which must be able to perform the following tasks in order to work properly.

  • Identify whether or not the full Python object needs to be created.
  • Generate a full Python object, by way of unpickling the raw data, only when necessary.
  • Cache the generated Python object for future use.
  • Return the cached copy of the object if it’s available, or the new one otherwise.

That last one’s actually a bit of a red herring, since it’s easy to make sure that a Python object is available at the end of the method, and just return that, without regard to where it came from. The rest, though, may look like quite a laundry list, but it’s really not that difficult to perform all those tasks in a small, readable method.

def __get__(self, instance, owner):
    if instance is None:
        return self
 
    if self.field.name not in instance.__dict__:
        # The object hasn't been created yet, so unpickle the data
        raw_data = getattr(instance, self.field.attname)
        instance.__dict__[self.field.name] = self.field.unpickle(raw_data)
 
    return instance.__dict__[self.field.name]

It should be fairly clear how this method performs each of the requirements. The first block checks for access from the model class itself, rather than an instance, and returns the descriptor so that introspection still works. The second block does three more tasks, by first checking for the presence of a cached copy, and continuing otherwise. Then, it does two more in one line, unpickling the raw data and storing it in the cache if the cache wasn't already populated. At the end, it simply returns whatever's in the cache, regardless of whether it was there when the method began.

Putting It All Together

The only thing left to make the whole thing work is to get the descriptor onto the model at the right time, so it's in place to be called when the attribute is accessed. This is precisely the intent of contribute_to_class(), where Django already provides a way for third-party code, such as this, to tie into the model creation process. Just be sure to always call contribute_to_class() on the parent class as well, so that all the standard Django functionality is applied in addition to the application's more specialized behavior.

def contribute_to_class(self, cls, name):
    super(PickleField, self).contribute_to_class(cls, name)
    setattr(cls, name, PickleDescriptor(self))

With all of that now in place, we have a total of three import statements, two new classes and one new field that performs a very useful task. This is just one example of how this technique can be put to use, and there are as many more as there are applications using complicated Python data structures. The important thing to take away from this example is how to use descriptors to populate those complex objects only when necessary, which can be a big win in situations where they might not always be used.

try:
    import cPickle as pickle
except ImportError:
    import pickle
 
from django.db import models
 
class PickleDescriptor(property):
    def __init__(self, field):
        self.field = field
 
    def __get__(self, instance, owner):
        if instance is None:
            return self
 
        if self.field.name not in instance.__dict__:
            # The object hasn't been created yet, so unpickle the data
            raw_data = getattr(instance, self.field.attname)
            instance.__dict__[self.field.name] = self.field.unpickle(raw_data)
 
        return instance.__dict__[self.field.name]
 
    def __set__(self, instance, value):
        instance.__dict__[self.field.name] = value
        setattr(instance, self.field.attname, self.field.pickle(value))
 
class PickleField(models.TextField):
    def pickle(self, obj):
        return pickle.dumps(obj)
 
    def unpickle(self, data):
        return pickle.loads(str(data))
 
    def get_attname(self):
        return '%s_pickled' % self.name
 
    def get_db_prep_lookup(self, lookup_type, value):
        raise ValueError("Can't make comparisons against pickled data.")
 
    def contribute_to_class(self, cls, name):
        super(PickleField, self).contribute_to_class(cls, name)
        setattr(cls, name, PickleDescriptor(self))
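As a quick demonstration of the finished field, consider the following; the model and its fields are hypothetical.

class CachedValue(models.Model):
    name = models.CharField(max_length=255)
    value = PickleField()
 
obj = CachedValue(name='example')
obj.value = {'numbers': [1, 2, 3]}  # pickled immediately by __set__()
obj.save()
 
obj = CachedValue.objects.get(name='example')
data = obj.value  # unpickled here, on first access, then cached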

Creating Models Dynamically at Runtime

Chapter 2 demonstrated how Python classes are really just objects like any other, and can be created at runtime by using the built-in type() constructor and passing in some details about how the class should be defined. Since Django models are really just Python classes declared in a specific way, it's reasonable to expect that they could also be created at runtime using this same feature. Some care must be taken, but this can be an extremely useful technique in a variety of situations.

The trick is to remember how Python processes classes, and how Django processes its models. Chapter 2 already illustrated the basic tools necessary to make this work, so it’s now just a matter of applying that to the specific details of Django models. There are a few things that set models apart from other Python classes:

  • All models subclass django.db.models.Model.
  • Fields are specified as class attributes in the model’s declaration.
  • Additional options are specified in a Meta class inside the model’s declaration.

With these requirements outlined, it’s fairly easy to map a model declaration onto the arguments for type(). In particular, remember that there are three arguments required to construct a class: name, bases and attrs. The model’s name is clearly mapped to name, while the single subclass of models.Model can be wrapped in a tuple and passed into bases. The remainder of the class declaration would go into attrs, including a Meta class for any additional model-level configuration options.

A First Pass

To make a first pass at what this function might look like, let’s start with just the most basic aspect of class creation and work our way out from there. To begin with, consider a function that generates a class with the correct name and base class, to illustrate the basic technique for creating a class dynamically and returning it for use elsewhere.

from django.db import models
 
def create_model(name):
    return type(name, (models.Model,), {})

Unfortunately, that’s actually a little too simplistic. Trying this out in Python will result in a KeyError, because Django expects the attribute dictionary to include a __module__ key, with its value being the import path of the module where the model was defined. This is normally populated by Python automatically for all classes defined in source files, but since we’re generating a model at runtime, it’s not available.

This is just one of the minor details that dynamic models have to face, and there’s really no way of avoiding it entirely. Instead, create_model() needs to be updated to provide a __module__ attribute directly. This is also another example of why it’s a good idea to put this code in one place; imagine having to deal with this every time a dynamic model is required. Here’s what it looks like to include a module path for the class:

def create_model(name, module_path):
    return type(name, (models.Model,), {'__module__': module_path})

Now it can accept a module path and keep Django happy. Well, it can keep Django happy as long as the module path has already been imported, which means it has to actually exist. Under normal circumstances, the model's __module__ attribute is set to the path of the module where it was defined. Since the model will only be processed while executing that module, it's always guaranteed that the module will exist and have been imported successfully. After all, if it hadn't been, the model would never have been encountered in the first place.

For now, since the only requirement of the module path is that it be valid and already imported, Django’s own django.db.models will make a reasonable candidate. It should be overridden where appropriate, of course, but it’s a decent default until things get rolling.

def create_model(name, module_path='django.db.models'):
    return type(name, (models.Model,), {'__module__': module_path})

Clearly, these dynamic models shake things up quite a bit, bypassing much of how Python normally works with a process like this. The __module__ issue is just the first issue encountered, and one of the easiest to work around. Thankfully, even though there are a few others to be handled, it can be well worth it if used properly.

The next step in this basic example is to include a dictionary of attributes to be set as if they were declared directly on a class definition. This will allow fields to be included on the model, as well as custom managers and common methods like __unicode__(). Since we’re already passing a dictionary to be used as attributes, assigning additional items to that dictionary is a simple process.

def create_model(name, attrs={}, module_path='django.db.models'):
    attrs = dict(attrs, __module__=module_path)
    return type(name, (models.Model,), attrs)

Ordinarily, it’s not advisable to supply a mutable object, such as a dictionary, as a default argument, since modifications to it would affect all future executions of the function. In this example, however, it’s used only to populate a new dictionary, and is immediately replaced by that new dictionary. Because of this, it’s safe to use as the default argument, in an effort to keep the method reasonably succinct.
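To see it in action, here's a hypothetical model built with the function as it stands; in real code, module_path should point at an application's own models module.

Article = create_model('Article', {
    'title': models.CharField(max_length=255),
    'body': models.TextField(),
})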

So far, we’ve set up a 3-line function to create basic models with any number of attributes, which can then be used in other areas of Django. Technically, this function alone could be used to generate any model imaginable, but it already provides a shortcut for setting up __module__, so it would make sense to provide another shortcut for setting up the model configuration by way of a Meta inner class. That way, code to create a model won’t have to set up that class directly.

Adding Model Configuration Options

Django models accept configuration through an inner class called Meta, which contains attributes for all the options that are specified. That should sound familiar, since that’s basically what models themselves do as well. Unfortunately, because of how Django processes the Meta class, we have to take a different approach.

The attributes defined within Meta are passed along into a special Options object, which lives at django.db.models.options. As part of this process, Options makes sure that no attributes were supplied that it doesn't know how to handle. Unfortunately, because Meta is a class only as a way to separate its namespace from that of the main model, Options only knows how to handle old-style Python classes—that is, classes that don't inherit from the built-in object type.

This is an important distinction, because calling type() directly creates a new-style class, even if it doesn’t inherit from object, or any subclasses for that matter. This ends up creating two additional attributes on the class that Options doesn’t know how to deal with, so it raises a TypeError to indicate the problem. That leaves two options for creating a Meta class: removing the additional attributes or creating an old-style class using some other means.

While it would be possible to just remove the attributes that offend Options, an even better idea would be to provide it exactly what it expects: an old-style class. Clearly, using type() is out of the question, which leaves us with just declaring a class using standard syntax. Since this is possible even within functions, and its namespace dictionary can be updated with new attributes, it’s a decent way to go about solving this problem.

from django.db import models
 
def create_model(name, attrs={}, meta_attrs={}, module_path='django.db.models'):
    class Meta:
        pass
    Meta.__dict__.update(meta_attrs, __module__=module_path)
    attrs = dict(attrs, Meta=Meta, __module__=module_path)
    return type(name, (models.Model,), attrs)

This will now accept two attribute dictionaries, one for the model itself, and another for the Meta inner class. This allows full customization of Django models that can be created at any time. While this may seem like a rather abstract concept at the moment, see Chapter 11 for a full example of how this can be used in practice to automatically record all changes to a model.
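As a closing sketch, here's how the finished function might be called; all the names involved are hypothetical.

LogEntry = create_model('LogEntry',
    attrs={'message': models.TextField()},
    meta_attrs={'ordering': ['-id']},
)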

Now What?

With a solid foundation of Django’s models under your belt, the next step is to write some code that will allow users to interact with those models. The next chapter will show how views can provide your users with access to these models.

1 http://prodjango.com/sql-injection/

2 http://prodjango.com/model-inheritance/

3 http://prodjango.com/serialization/

4 http://prodjango.com/timedelta/

5 http://prodjango.com/postgresql-interval/

6 http://prodjango.com/db-api/

7 http://prodjango.com/file-api/

8 http://prodjango.com/stringio/

9 http://prodjango.com/open/

10 http://prodjango.com/pickle/

11 http://prodjango.com/cpickle/
