Data is at the center of most modern Web applications, and Django aims to provide support for a variety of data structures and persistence options. Models are the portion of the traditional MVC architecture that Django uses largely as expected, and they are an essential part of any application that needs to persist data across multiple requests, sessions or even server instances.
Django models are defined as standard Python classes, with a wealth of additional features added in automatically. Behind the scenes, an object-relational mapper (ORM) allows these classes and their instances access to databases. Without this ORM, developers would be required to deal with the database directly, using Structured Query Language (SQL), the standard way to access content in databases.
The primary goal of SQL is to describe and access the relationships that are stored in a relational database. SQL does not generally provide high-level relationships for applications, so most applications include handwritten SQL for data activities. This is definitely possible, but it tends to lead toward lots of repetition, which in and of itself violates the DRY principle outlined in Chapter 1.
These bits of SQL littered throughout an application’s code quickly become unmanageable, especially since the programmers who have to manage the code aren’t typically experts in relational databases. That also means that these databases are quite prone to bugs, which are often troublesome to track down and fix.
That still doesn’t factor in the biggest issue of all: security. SQL injection attacks are a common way for malicious attackers to access or even modify data they shouldn’t have access to. This occurs when hand-written SQL doesn’t take appropriate precautions with regard to the values that are passed into the database. The more SQL statements that are written by hand, the more likely they are to be susceptible to this type of attack.
All of these problems are extremely common in Web development, regardless of language, and ORMs are a common way for frameworks to mitigate them. There are other ways to avoid some of these problems, such as SQL injection, but Django’s ORM was written with these concerns in mind and handles much of it behind the scenes. By accessing data using standard Python objects, the amount of SQL is minimized, reducing the opportunity for problems to crop up.
How Django Processes Model Classes
As described in Chapter 2, one of Django’s most recognizable features is its declarative syntax for model definitions. With this, model definitions can be simple and concise, while still providing a vast array of functionality. The basic process of using metaclasses for declarative syntax is described in detail in Chapter 2, but there are more specific steps taken when handling models, which deserve some extra attention.
The metaclass responsible for processing model definitions is ModelBase, which lives at django.db.models.base. It provides a few key features, listed here in the order in which the actions are performed.
Abstract models and inherited models are special cases, where not all of these actions occur. Specific differences for these cases are covered later in this chapter.
Setting Attributes on Models
Python provides useful tools for getting and setting attributes on objects without knowing the name in advance, but while getattr() and setattr() represent the standard way of accessing attributes on objects, one of Django’s hooks for model fields requires some additional handling. Django provides a class method, add_to_class(), on all of its models, which should be used as a substitute for setattr().
The syntax and semantics of add_to_class() are slightly different from those of the standard functions. It’s a class method, rather than a built-in or module-level function, which means the class is provided implicitly rather than as an explicit first argument. The method checks the provided value for the presence of a contribute_to_class() method and calls it if it exists; otherwise, the standard setattr() function is used to add the value to the model. These behaviors are mutually exclusive; only one will happen in a given add_to_class() call.

It’s important to realize that this isn’t just for Django’s own internal code. If an application needs to add arbitrary objects as attributes to models, it should call add_to_class(). This way, developers working with the application can pass any object in, and be assured that it will be handled the same as if it had been applied directly on the model’s class definition.
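The dispatch behavior just described can be sketched without Django at all. The ModelBase, Model, and Descriptor classes below are minimal hypothetical stand-ins for illustration, not Django’s actual implementation:

```python
class ModelBase(type):
    """A minimal stand-in for Django's metaclass, showing add_to_class()."""
    def add_to_class(cls, name, value):
        # If the value knows how to install itself, defer to its hook;
        # otherwise, fall back to a plain setattr().
        if hasattr(value, 'contribute_to_class'):
            value.contribute_to_class(cls, name)
        else:
            setattr(cls, name, value)

class Model(metaclass=ModelBase):
    pass

class Descriptor:
    def contribute_to_class(self, cls, name):
        # Install under the given name, marking that the hook ran.
        setattr(cls, name, 'installed by contribute_to_class')

Model.add_to_class('plain', 42)          # no hook: plain setattr()
Model.add_to_class('fancy', Descriptor())  # hook present: hook wins
print(Model.plain)  # 42
print(Model.fancy)  # installed by contribute_to_class
```

Only one of the two branches runs for any given value, which is exactly the mutually exclusive behavior described above.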
This whole process changes what the classes look like when using the introspection techniques described in Chapter 2. In order to determine the declared fields, the database table being used or the display name for the model, some additional knowledge is required.
Getting Information About Models
Once a model has been processed by Python, along with Django’s ModelBase metaclass, its original structure can still be determined by using an attribute that exists on every Django model and its instances called _meta.
There are a number of attributes available on _meta, which combine to describe the model, how it was defined, and what values were provided to customize its behavior. These can also be classified into two separate groups: attributes that are determined by looking at the actual structure of the original class and those that are specified directly as part of a Meta class defined inside the model.
REGARDING THE STABILITY OF _META
Names beginning with underscores typically indicate private attributes that shouldn’t be used directly. They’re often used internally by functions and methods that are more public in nature, and are generally accompanied by warnings about likely changes and undocumented behavior. In most cases, these warnings are valid; programmers usually write such tools for their own use, and find little need to document their behavior or guarantee their longevity.
However, _meta is a bit of an exception to the rule. While it is indeed part of a private API that isn’t necessary in the vast majority of situations, it shares something with many of the tools described in this book: it can prove extremely useful if understood and used properly. In fact, _meta goes one step further, by being quite stable and highly unlikely to change without considerable effort to keep it backwards-compatible. It’s the foundation of much of Django’s own internal code, and is already being accessed directly by many third-party applications as well.
So, while names beginning with underscores do generally spell danger, potential incompatibilities and lack of support, you can rely on _meta quite safely. Just make sure to keep up with Django’s list of backwards-incompatible changes. Anything new that would break _meta will be listed there.
Class Information
While most of the basic introspection techniques covered in Chapter 2 apply to Django models, there are a number of details that are also made available on the _meta attribute. Most of this is information Django itself needs in order to properly deal with models, but as with many other features, it can be quite useful for other applications as well.
One important distinction to make with models is whether they’re “installed” or not. This means checking whether the application that contains them is listed in the site’s INSTALLED_APPS setting. Many Django features, such as syncdb and the built-in admin interface, require an application to be listed in INSTALLED_APPS in order to be located and used.
If an application is designed to accept any Django model directly, rather than iterating through INSTALLED_APPS, it will often need some way to determine whether the model is properly installed. This matters when models must be handled differently, for instance depending on whether database operations should be performed on the model’s table. For this purpose, Django provides the installed attribute on _meta, which will be True only if the model belongs to an application listed in INSTALLED_APPS, and False otherwise.
There are two other attributes of model-level information that are commonly useful to application developers. As described in Chapter 2, all Python classes provide an easy way to get the name of the class and the module where it was defined, using the __name__ and __module__ attributes, respectively. However, there are some situations where these can be misleading.
Consider a situation where a model may be subclassed without inheriting all the Django-specific model inheritance processing. This requires a bit of tweaking with metaclasses, but can prove useful for solving certain types of problems. When doing this, the __name__ and __module__ attributes will refer to the child class, rather than the actual model that sits underneath.
Often, this is the desired behavior, as it’s just how standard Python works, but when attempting to interact with the Django model, or other areas of Django that may need to work with it, it may be necessary to know the details of the model itself, rather than the child class. One way to go about this would be to use class introspection to get the various parent classes that are in use, checking each to see if it’s a Django model.
This is a fairly unsightly process: it takes time to code and to execute, makes maintenance and readability more difficult, and adds boilerplate if it needs to be done often. Thankfully, Django provides two additional attributes on _meta to greatly simplify this. The module_name attribute contains the __module__ attribute from the underlying model, while object_name pertains to the __name__ attribute of the model.
Field Definitions
A major challenge involved in using and manipulating Django models is the process of locating and using fields that are defined for them. Django uses the creation_counter technique described in Chapter 2 to keep track of the order of fields, so they can be placed inside a list for future reference. This list is stored in the fields attribute of the model’s _meta attribute.
As a list, this can be iterated to retrieve all the field objects in order, which is extremely useful when looking to deal with models generically. As described later in this chapter, field objects have attributes containing all the options that were specified for them, so each item in the list can provide a wealth of information.
With this, we can create a custom form or template output, or any other feature that needs to work with fields on an arbitrary model. Consider the following example, which prints out the display names and current values for each field in a given object, without having to know in advance what model is being used.
from django.utils.text import capfirst

def get_values(instance):
    for field in instance._meta.fields:
        name = capfirst(field.verbose_name)
        value = getattr(instance, field.name)
        print('%s: %s' % (name, value))
Going about it this way allows the function to ignore the details of the model behind the object. As long as it’s an instance of a proper Django model, the _meta attribute will be available and all the fields will be accessible in this way. Since Django automatically adds an AutoField to any model that doesn’t declare a primary key, the created AutoField will also be included in the fields list.
While being able to iterate through a list is great for those situations where all the fields will be taken into account, sometimes only a single field is needed, and the name of that field is known in advance. Since fields is a list instead of a dictionary, the only way to get a field by its name would be to loop over the fields, checking each to see if its name matches.
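That manual loop might look like the following sketch, which uses hypothetical namedtuple stand-ins in place of real field objects:

```python
from collections import namedtuple

# Hypothetical stand-in; real Django fields carry many more attributes.
Field = namedtuple('Field', ['name', 'verbose_name'])

def find_field(fields, name):
    # Scan the ordered field list, checking each field's name in turn.
    for field in fields:
        if field.name == name:
            return field
    raise KeyError('no field named %r' % name)

fields = [Field('id', 'ID'), Field('sku', 'SKU'), Field('name', 'name')]
print(find_field(fields, 'sku').verbose_name)  # SKU
```

This works, but every application that needs it would have to repeat the same loop, which is exactly the boilerplate the next utility avoids.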
To cater to this need, Django provides a utility method, _meta.get_field(). By passing a field name to _meta.get_field(), it’s easy to retrieve just the specified field. If no field with that name exists, it raises a FieldDoesNotExist exception, which lives at django.db.models.fields.
To get a better understanding of how these methods work together to identify the fields that were declared on a model, consider the following model declaration.
class Product(models.Model):
    sku = models.CharField(max_length=8, verbose_name='SKU')
    name = models.CharField(max_length=255)
    price = models.DecimalField(max_digits=5, decimal_places=2)

    def __unicode__(self):
        return self.name
Then, the model could be inspected to get more information about this declaration, without having to know what it looked like in advance.
>>> from django.utils.text import capfirst
>>> for field in Product._meta.fields:
...     print('%s: %s' % (capfirst(field.verbose_name), field.__class__))
...
ID: <class 'django.db.models.fields.AutoField'>
SKU: <class 'django.db.models.fields.CharField'>
Name: <class 'django.db.models.fields.CharField'>
Price: <class 'django.db.models.fields.DecimalField'>
>>> Product._meta.get_field('name').__class__
<class 'django.db.models.fields.CharField'>
Primary Key Fields
Any field can be specified as a primary key, by setting primary_key=True in the field’s definition. This means that if code is to handle a model or a model instance without prior knowledge of its definition, it’s often necessary to identify which field was defined as a primary key.
Much like getting a field by name, it would be possible to just iterate over all the fields, looking for one with its primary_key attribute set to True. After all, Django only allows one field to be specified as a primary key. Unfortunately, this again introduces a fair amount of boilerplate that slows things down and makes it more difficult to maintain.
To simplify this task, Django provides another _meta attribute, pk, which contains the field object to be used as the model’s primary key. This is also faster than iterating over all the fields, since pk is populated just once, when the model is first processed and Django determines whether it needs to provide an implicit primary key. The _meta.pk attribute is also used to enable the pk shortcut property on model instances, which returns the primary key value for an instance, regardless of which field is the primary key.
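How such a shortcut property can delegate to _meta.pk may be sketched as follows, again using hypothetical stand-ins (Field, Options, Instance) rather than Django’s real classes:

```python
from collections import namedtuple

# Hypothetical stand-in for a field object.
Field = namedtuple('Field', ['name', 'primary_key'])

class Options:
    """Mimics the relevant corner of _meta: pk is resolved only once."""
    def __init__(self, fields):
        self.fields = fields
        self.pk = next(f for f in fields if f.primary_key)

class Instance:
    """Sketch of how a pk shortcut property can delegate to _meta.pk."""
    _meta = Options([Field('id', True), Field('name', False)])

    def __init__(self, **values):
        self.__dict__.update(values)

    @property
    def pk(self):
        # Whichever field is the primary key, return its current value.
        return getattr(self, self._meta.pk.name)

obj = Instance(id=42, name='example')
print(obj.pk)  # 42
```

Because the primary key field is identified once at class-processing time, each pk access is a single attribute lookup rather than a scan of all fields.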
Typically, models don’t need to declare an explicit primary key, and can instead let Django create one automatically. This can be a useful way to avoid repeating such a common declaration, while still allowing it to be overridden if necessary. One potential problem with this, however, is the task of determining whether a model was given an automatic field, and what that field looks like.
It’s possible to make certain assumptions about a model, based on how Django provides this automatic field, and what it would typically look like. However, it’s easy to create a custom field that looks a lot like the implicit field, and it’d be very difficult to tell the difference if your code only looks at its structure and options.
Instead, Django provides two attributes on the _meta attribute that help with this situation. The first, _meta.has_auto_field, is True if the model let Django provide an id field implicitly. If it’s False, the model has an explicit primary key, so Django didn’t have to intervene.
The second attribute related to the automatic primary key field is _meta.auto_field, which will be the actual field object Django provided for use as the primary key. If _meta.has_auto_field is True, this will be an AutoField, configured the same way for all models that use it. It’s important to look at this attribute rather than making assumptions about the field’s structure, so that your application keeps working properly if Django changes those details in the future. If a model provides its own primary key field, and thus _meta.has_auto_field is False, _meta.auto_field will be set to None.
Configuration Options
In addition to providing access to the fields declared on the model, _meta also acts as a container for all the various options that can be set on a model using the Meta inner class. These options allow a model to control a variety of things, such as what the model is named, what database table it should use, how records should be ordered, and a number of others.
These options all have defaults, so even attributes that aren’t specified on the model are still available through _meta. The following is a list of the many options that are available in this way, along with their default values and a brief description of what each option is intended for.
Accessing the Model Cache

Once models have been processed by the ModelBase metaclass, they’re placed in a global registry called AppCache, located at django.db.models.loading. This registry is instantiated automatically, as soon as the module is imported, and is accessed using the name cache. It provides access to the various models that are known to Django, as well as installing new ones if necessary.
Because ModelBase handles registration of new models whenever the class is processed by Python, the models it contains aren’t guaranteed to be part of applications present in the INSTALLED_APPS setting. This fact makes it even more important to remember that the _meta attribute on the model contains an installed attribute indicating whether the model belongs to an installed application.
Whenever code accesses one of the features in this section, AppCache will automatically load applications that are listed in INSTALLED_APPS, making sure that whenever some of the features are accessed, the cache includes all applications and models that should be made available. Without this, the results of these methods would be wildly unpredictable, based solely on which applications were loaded in which order.
As might seem obvious, the application cache can only be fully populated once all the applications have been loaded. Therefore, if an application’s models.py makes any calls to AppCache as part of this loading process, it’s possible that the cache might not be fully populated yet.
To protect against this problem, AppCache provides a method to determine whether the cache has been fully populated and is ready to be accessed. Calling cache.app_cache_ready() will return True or False, depending on whether all of the installed applications have been processed. Using this, applications that could benefit from maintaining their own cache of known models can check whether Django’s cache is available for that purpose; if so, they can use it directly, and if not, they can determine what they need to know manually.
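The populate-on-first-access pattern described here can be sketched with a simplified registry. The Registry class and its loaders argument below are purely illustrative assumptions, not AppCache’s actual interface:

```python
class Registry:
    """A minimal sketch of a cache that populates itself on first access."""
    def __init__(self, loaders):
        self._loaders = list(loaders)  # callables that load apps, registering models
        self._models = []
        self._ready = False

    def register(self, model):
        # Called as a side effect of processing each model class.
        self._models.append(model)

    def app_cache_ready(self):
        # Mirrors cache.app_cache_ready(): True only after full population.
        return self._ready

    def get_models(self):
        if not self._ready:
            for load in self._loaders:
                load(self)  # may call register() any number of times
            self._ready = True
        return list(self._models)

registry = Registry([lambda r: r.register('News'),
                     lambda r: r.register('Customer')])
print(registry.app_cache_ready())  # False
print(registry.get_models())       # ['News', 'Customer']
print(registry.app_cache_ready())  # True
```

Code that runs before population finishes sees the ready flag as False, which is why the guard method exists in the first place.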
Retrieving All Applications
When looking to introspect a site’s contents, it’s also very useful to look at the structure of applications themselves. After all, looking at models is only useful if there are models to look at, and sometimes it’s necessary to just collect all the models currently in use. It’s also useful to have them arranged by the application that declares them. Django already needs to have this information handy, so AppCache is designed to specifically manage this information.
HOW DOES DJANGO SEE APPLICATIONS?
One important thing to keep in mind is that Django needs an object to use as a reference for the application. A Django application is essentially a standard Python package, which is just a collection of modules contained in a single folder. While Python provides an object to use as a reference for individual modules, it doesn’t offer anything to refer to a package.
Because of this, the closest notion Django can have to an application object is the __init__.py module that Python uses to recognize it as a package. In that case, Django would be using a module object as an application reference.
Unfortunately, few projects store anything useful in __init__.py, so Django isn’t likely to find anything of interest in it. In order to get at anything really useful, it would have to perform some extra work to traverse the package structure to get a module that contained some pertinent information.
Instead, since Django has to use a module object anyway, it makes more sense to use a module that contains useful information right off the bat. For the majority of applications, the most useful module in a package is models.py, where all the Django models are defined. Therefore, Django uses this module to recognize an application. Some of the following methods return an application, and in each case, it returns the models module within the application’s package.
The first step in a site-wide introspection is to determine what applications are installed. Calling cache.get_apps() will return such a list, containing the application module for each application in the INSTALLED_APPS setting that contains a models module. That’s not to say that it only returns applications that have models. It actually checks for the presence of a models module, so even an empty models.py will cause an application to be included in this list.
Take, for example, the following INSTALLED_APPS setting, showing several of Django’s own contributed applications, as well as some in-house applications and the signedcookies application described in Chapter 7.
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'news',
    'customers',
    'callcenter',
    'signedcookies',
)
Most of these applications will, by necessity, contain various models. Chapter 7’s signedcookies, however, only interacts with the site’s HTTP traffic, so it has no use for the database. Therefore, when looking through the results of cache.get_apps(), the signedcookies application won’t show up.
>>> from django.conf import settings
>>> from django.db.models.loading import cache
>>> len(settings.INSTALLED_APPS)
9
>>> len(cache.get_apps())
8
>>> for app in cache.get_apps():
...     print(app.__name__)
...
django.contrib.admin.models
django.contrib.auth.models
django.contrib.contenttypes.models
django.contrib.sessions.models
django.contrib.sites.models
news.models
customers.models
callcenter.models
Retrieving a Single Application
With a list of applications, it’s straightforward to get the models from each, so they can be handled appropriately; the next section describes that process in more detail. However, looking at all models isn’t always the best approach. Sometimes code is given the label of a specific application, so it only needs to deal with the models in that application.
While it would certainly be possible to just loop through the results from cache.get_apps(), checking the module names against the application module’s __name__ attribute, that technique quickly runs into a few problems. First, the application’s label isn’t the same as its __name__ attribute, so trying to compare the two results in a good bit of extra code, most of which is already being done by Django. Also, that code must be tested and maintained, which increases the risk of introducing bugs into the application.
Instead, Django provides a utility for handling this situation. By passing the known label to cache.get_app(), an application can retrieve the application module for just the application matching that particular label. The label referred to here is determined as a specific part of the application’s import path.
Typically referenced as app_label, an application’s label is usually formed from the last part of the application module’s import path before the models portion. To illustrate a few examples, consider the following application labels, corresponding to the entries in the INSTALLED_APPS setting.
admin
auth
contenttypes
sessions
sites
news
customers
callcenter
signedcookies
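Deriving such a label from a models module’s import path can be sketched with a small hypothetical helper; app_label() below is not part of Django:

```python
def app_label(models_module_name):
    """Return the application label described above: the path
    component just before the trailing 'models' portion."""
    return models_module_name.split('.')[-2]

print(app_label('django.contrib.auth.models'))  # auth
print(app_label('callcenter.models'))           # callcenter
```

This is the same split used in the introspection session later in this chapter, where each application module’s __name__ is cut down to its label.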
There’s one important note to mention here. As part of the Meta options described in the official documentation, and briefly touched on earlier in this chapter, any model may override its own app_label setting to behave as though it was declared inside a different application. This option does not affect the behavior of cache.get_app() in any way. The get_app() method simply maps the app_label to an application module, without regard to what options the modules inside it may have declared.
As demonstrated earlier with cache.get_apps(), applications without models are viewed slightly differently within Django itself than others. By default, cache.get_app() will raise an ImproperlyConfigured exception if the application doesn’t contain a models.py file. Sometimes it may still be useful to process applications without models, so cache.get_app() accepts an optional second argument to control how such applications are handled.
This second argument, called emptyOK, takes a Boolean indicating whether it’s acceptable for the application to have no models. It defaults to False, which raises the ImproperlyConfigured exception, but if True is given instead, cache.get_app() will simply return None, allowing the calling code to continue handling the application however it sees fit.
>>> from django.db.models.loading import cache
>>> print(cache.get_app('admin'))
<module 'django.contrib.admin.models' from ...>
>>> print(cache.get_app('signedcookies'))
Traceback (most recent call last):
...
django.core.exceptions.ImproperlyConfigured: App with label signedcookies could not be found
>>> print(cache.get_app('signedcookies', emptyOK=True))
None
Dealing with Individual Models
Once an application is known, the next step is to deal with individual models within that application. Once again, AppCache provides a few methods for handling this situation. Retrieving models from the cache typically takes one of two forms, depending on how much is known about the model in advance.
In the first case, consider pure introspection. Remember from the previous section that AppCache provides access to all known applications with a single call to the get_apps() method, which returns application modules. Since these modules are actually the models modules within each application, it may seem easy to just use dir(app_module) or iterate over app_module.__dict__ to get the models that were defined.
Unfortunately, like many uses of simple iteration, that would require the loop to check each individual object in the module to see if it is in fact a model or if it’s something else entirely. After all, Python modules can contain anything, and many models make use of tuples and module-level constants to help do their work, so there’s no guarantee that each item in the module’s namespace is in fact a Django model.
Instead, cache.get_models() retrieves a list of proper Django models that are specific to the given application module. It’s no coincidence that both cache.get_apps() and cache.get_app() return application modules; cache.get_models() is suitable for use with both of these methods. That means that a list of models can be retrieved even without an application, but knowing the application in advance reduces the number of models retrieved.
The following code demonstrates how these techniques can be used in combination to retrieve a list of models for each of the known applications in use on the site.
>>> from django.db.models.loading import cache
>>> for app in cache.get_apps():
...     app_label = app.__name__.split('.')[-2]
...     for model in cache.get_models(app):
...         print('%s.%s' % (app_label, model.__name__))
...
admin.LogEntry
auth.Message
auth.Group
auth.User
auth.Permission
contenttypes.ContentType
sessions.Session
sites.Site
news.News
customers.Customer
callcenter.Agent
callcenter.Call
callcenter.Case
As an additional option, get_models() can also be called with no argument, which causes it to return all the models that are known to AppCache. This is a useful shortcut that avoids the overhead of the extra loop in this example, providing a quick way to grab all the models.
There’s a catch, however.
When using get_models() directly, with no argument, all registered models are returned. This may sound like a great idea, and sometimes it is, but remember that AppCache registers all models as they’re encountered, regardless of where they were found. The full list may include models that aren’t part of an installed application. Contrast that with the get_apps()/get_models() combination, which only retrieves models if their applications are found in the INSTALLED_APPS setting.
In practice, get_models() may return different results if called without an argument than if it were called with each of the applications returned from get_apps(). Typically, this could mean that an application may get access to extra models that it might not want to know about. Sometimes this is indeed the desired behavior, but it’s always important to understand the difference.
One way a model could be in AppCache, but not be installed, is if the application is imported from a separate, installed application, which would cause its model classes to be processed by Django and registered, regardless of whether or not it was in INSTALLED_APPS. Also, if any model specifies an app_label on its Meta class and that application label doesn’t match up with any installed application, the same situation would occur. If an application does wish to access all the models, regardless of whether they’re installed or not, remember that it can use the _meta.installed attribute to identify which models were installed properly.
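Filtering a full model list down to just the installed models might look like the following sketch, using fake stand-ins for models and their _meta:

```python
class FakeMeta:
    """Stand-in for _meta, carrying only the installed flag."""
    def __init__(self, installed):
        self.installed = installed

class FakeModel:
    """Stand-in for a registered model class."""
    def __init__(self, name, installed):
        self.name = name
        self._meta = FakeMeta(installed)

def installed_only(models):
    # Keep only models whose application is listed in INSTALLED_APPS.
    return [m for m in models if m._meta.installed]

models = [FakeModel('News', True), FakeModel('Stray', False)]
print([m.name for m in installed_only(models)])  # ['News']
```

With real models, the same comprehension over get_models() would separate properly installed models from those that were merely registered along the way.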
Sometimes, the name of both the application and the model are provided, perhaps as part of a URL or other configuration. In these cases, it doesn’t make much sense to iterate over all the models for the given application. For this case, AppCache provides another method, get_model(), which retrieves a model class based on an application label and model name. The application name is case-sensitive, but the model name isn’t.
>>> from django.db.models.loading import cache
>>> cache.get_model('auth', 'user')
<class 'django.contrib.auth.models.User'>
Using Model Fields
One of the most important aspects of models is the set of fields that are available to hold data. Without fields, a model would just be an empty container with no way to do anything useful. Fields provide a way to organize a model’s values and validate against specific data types, providing a bridge between the database and native Python data types.
Normally, when accessing a field as an attribute of a model instance, the value will be a standard Python object representing the value found in the database. Previous sections in this chapter have described a variety of ways to get access to the actual field objects themselves, rather than this converted value. There are a variety of useful things that can be done with field objects.
Common Field Attributes
Different field types will have different attributes according to their needs, but there are several attributes that are common across most built-in Django fields. These can be used to generically access various details of fields, and by association, the values and behaviors they’re meant to interface with. Note that there are more attributes used internally than those listed here, but these are the most useful and stable, and will provide the greatest value to applications looking to work with fields.
The descriptions listed here are how Django itself uses these attributes, and how developers will expect them to behave. Other applications will likely find use for them as well, to control certain types of behaviors, so the following descriptions will help illustrate their intended usage.
Some applications may find uses that are slightly different from what Django itself expects to use them for, but the general semantics of the values should remain intact. Remember that developers will build their expectations for these values based on how Django itself behaves, and third-party applications should avoid violating these expectations.
Common Field Methods
Like the attributes described in the previous section, these methods are common to most field types, and provide a wealth of functionality that might otherwise be difficult to come by. Not all field types will implement all of these methods, and their exact behavior may change depending on the field type involved, but the general semantics described here will remain the same.
There are more methods that get used even more internally, which aren’t listed here, because they’re primarily responsible for simply populating the attributes described in the previous section. Therefore, it’s generally best to simply reference the generated attributes, rather than attempting to recreate them manually after the fact.
One of the more useful things that can be done with Django models, particularly with regard to reusable applications, is to tie into a model’s ability to process individual types of fields in a generic fashion. This allows fields themselves to have considerable control over how they interact with the database, what native Python data type is used to access their contents and how they’re applied to the model classes that use them.
The majority of this section assumes that the custom field will need to retain much of the same functionality of existing fields, such as interacting with the database and generated forms. There are many other applications, such as the historical records application described in Chapter 11, which use the hooks described in this section to provide much more functionality than just a simple field.
The term “field” here is used loosely to describe any object that uses some of these techniques to present itself to a Django developer as something resembling a standard Django model field. In reality, such an object could encapsulate complex relationships, such as a tagging application, or even control the creation of entire new Django models on the fly, based on the model to which they’re assigned. The possibilities are nearly limitless.
The key to remember is that Django uses duck typing principles with regard to fields. It simply accesses whatever attributes and methods it expects in each situation, without regard to what those actually do behind the scenes. In fact, there’s not even any requirement that objects be a subclass of django.db.models.fields.Field to make use of these hooks. Inheriting from Field simply provides an easy way to reuse much of the existing functionality, if that behavior is required.
Deciding Whether to Invent or Extend
One of the first things to consider when writing a new field is whether to try to invent an entire new type of field, starting perhaps from scratch without the aid of Field at all, or to extend some existing field type and inherit much of its behavior. There are advantages and disadvantages to each approach, and which is most appropriate depends very much on the demands of the new field being created.
By inheriting from Field or one of its subclasses, most of the behaviors in the following sections will be inherited, potentially reducing the amount of new code the custom field must include. If its behavior is similar to an existing field type, this can be a very useful way not only to cut down on new code, which helps reduce bugs, but also to automatically receive any new or updated functionality provided by Django itself in future releases. After all, by relying on Django itself for much of this behavior, updates to that code will automatically be reflected in the behavior of the custom field.
On the other hand, if the new field varies considerably from any existing field type, the standard behaviors will need to be rewritten for its own use anyway, negating any value of inheriting from a parent class. If most—or all—of these behaviors have to be written from scratch, inheriting from an existing field will simply create an extra step in the process Python uses to manage the class, even though that extra step offers little or no benefit. In these cases, it’s best, therefore, to simply start from scratch, implementing just those behaviors that make sense for the custom field, and Django will still process it properly, due to its use of duck typing.
Of course, there is some middle ground between the two approaches. For instance, a custom field may interact with a completely unique data type, bearing little resemblance to any existing field types, but it may still store its data in the database like a standard field, and could benefit from reusing many of Django’s more basic field methods, such as assigning names and storing itself in _meta.fields. In these cases, it’s quite reasonable to inherit from Field itself, rather than a specific subclass, and inherit just this most basic functionality.
Performing Actions During Model Registration
The first step any field goes through is being processed by the ModelBase metaclass, whenever Python encounters a model class that utilizes the field in question. For standard Python objects, this means simply getting assigned to the model class as normal, with no additional processing. Fields take a different path, however, and each field gets the chance to customize how it’s applied to a model class.
contribute_to_class(self, cls, name)
This is perhaps the most important method a field can contain, as it provides an essential feature: the ability for a field to know what class it was assigned to, and what name it was given. This may seem like a simple requirement, but Python itself doesn’t normally have a way to facilitate this.
You may recall that descriptors, described in Chapter 2, have a way to identify what class—and even what instance of that class—was used to access the object, but this is only available at the time the attribute is accessed; there’s still no way to know this information at the time the assignment took place. More importantly, even descriptors don’t provide any way to identify what name was used to access them, which can be a considerable problem when trying to cache information or interact with other features that require the use of a name, such as that of a database column.
Instead, by using a metaclass, Django can intercede at the point where Python is processing the class, and use the presence of a contribute_to_class() method to identify objects that need to be handled differently. If this method exists, it’s called instead of the standard setattr(), allowing the field to register itself in whatever way is most appropriate for its purpose. When doing so, Django also provides the class itself as an argument, as well as the name it was given, which was discovered while looking through the attributes assigned to the class. Therefore, in addition to the usual self, this method receives two arguments.
Once these two arguments have been processed in whatever way is appropriate for the field, the method shouldn’t return anything, as its return value is ignored by Django.
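Since contribute_to_class() is easier to understand in action, here's a minimal standalone sketch of the protocol, using a toy metaclass rather than Django's own ModelBase; all class and attribute names here are illustrative, not part of Django itself.

```python
class SimpleBase(type):
    # Toy metaclass: objects with a contribute_to_class() method are pulled
    # out of the namespace and given a chance to register themselves, instead
    # of being assigned with the standard setattr() behavior.
    def __new__(cls, name, bases, attrs):
        fields = {key: attrs.pop(key) for key in list(attrs)
                  if hasattr(attrs[key], 'contribute_to_class')}
        new_class = super().__new__(cls, name, bases, attrs)
        new_class._meta_fields = []
        for key, field in fields.items():
            field.contribute_to_class(new_class, key)
        return new_class

class SimpleField:
    def contribute_to_class(self, cls, name):
        # The field now knows which class it was assigned to, and what name
        # it was given—information Python doesn't normally provide.
        self.model, self.name = cls, name
        cls._meta_fields.append(self)

class Article(metaclass=SimpleBase):
    title = SimpleField()

field = Article._meta_fields[0]
print(field.name, field.model is Article)  # title True
```

Note that after class creation, Article has no title attribute at all; just as with Django fields, nothing is set on the class unless contribute_to_class() does so itself.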
CONTRIBUTE_TO_CLASS() VS SETATTR()
There is one very important thing to keep in mind when dealing with contribute_to_class(). It’s been mentioned a few times already in various places, but it’s so important that it merits driving home very explicitly. If Django identifies an object as having a contribute_to_class() method, only that method will be called.
Normally, setattr() is used to set attributes on an object such as a class, but since model fields don’t get set in the standard namespace, that step is skipped intentionally. Therefore, if a custom field does in fact need to be set as an attribute on the model class itself, doing so is the sole responsibility of the field itself, during the execution of its contribute_to_class() method.
Sometimes, fields will instead need to set some other object, such as a descriptor, as the attribute on the class, to provide additional customizations for other types of access. This, too, is the responsibility of the field class, and the only time to do so in a way that will maintain the appearance of a standard field is during the execution of its contribute_to_class() method.
In the case of standard Django fields, and perhaps for many types of custom fields and other objects that behave as fields, this avoidance of setattr() is quite intentional. If that behavior is desired, contribute_to_class() should simply avoid setting anything on the model class, and Django’s own behavior will make sure that nothing is assigned to the class itself.
contribute_to_related_class(self, cls, related)
For fields that relate themselves to other models, this is called once the related model is available, so that attributes can be added to that model as well. For example, this is how Django provides a reverse attribute on a related class when a ForeignKey is applied.
The two arguments it receives are cls, the model class the relationship was actually applied to, and related, the model the relationship points to, where other attributes may yet need to be applied. Like contribute_to_class(), this shouldn’t return anything, as it would simply be ignored anyway.
Altering Data Behavior
Given that most field types exist to interact with specific data types, one of the first things to consider is how to tell Django to handle that data type. This includes how to store it in the database, how to ensure validity of its value and how to represent that value in Python. These are some of the most fundamental aspects of field behavior, and properly altering them can open up a world of possibilities.
get_internal_type(self)
This method returns a string, which helps determine how the database should store values for the field. The string itself isn’t an actual database column type, but instead it’s applied to a mapping provided by the database backend to determine what type of column to use. This way, fields can be written without being tied to a specific database backend.
Because the return value for this function gets applied to a known dictionary of types to retrieve the database column name, that value must be a valid entry in that dictionary. Therefore, there’s a finite set of possible return values, which are listed here.
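As an illustration of the mapping idea, here's a standalone sketch; the column strings are made up for demonstration and aren't taken from any real backend.

```python
# Illustrative backend mapping from internal type names to column types
DATA_TYPES = {
    'IntegerField': 'integer',
    'TextField': 'text',
    'BooleanField': 'bool',
}

class UpvoteField:
    # A hypothetical custom field that adds behavior in Python, but is
    # stored in the database like a plain integer
    def get_internal_type(self):
        return 'IntegerField'

def column_type(field):
    # The backend looks the return value up in its own mapping
    return DATA_TYPES[field.get_internal_type()]

print(column_type(UpvoteField()))  # integer
```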
validate(self, value, instance)
When a model is being checked for the accuracy of its values, this method is used to determine whether the field’s contents are correct. The arguments it receives are the value of the field itself, and also the model with all of its fields. This allows it the option of validating not only the field’s own value, but also that it makes sense in the context of the greater model.
It should be obvious why this would be of use when validating an individual field’s value, but it’s less clear what value lies in using the rest of the model’s values. After all, when writing a field, there’s typically no way to know what other fields will be used alongside it.
Sometimes, however, a field may be written specifically for a particular model, and can therefore know in advance what the entire model will look like. In these cases, the field can, for example, check to see what type of account a person has, because the maximum value for the field depends on that other field.
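As a sketch of that idea, with hypothetical account_type and balance fields, and ValueError standing in for the ValidationError a real Django field would raise:

```python
class FakeInstance:
    # Stand-in for a model instance; both fields here are hypothetical
    def __init__(self, account_type, balance):
        self.account_type = account_type
        self.balance = balance

LIMITS = {'basic': 1000, 'premium': 1000000}

def validate_balance(value, instance):
    # The maximum valid value depends on another field of the same model;
    # a real field would raise django.core.exceptions.ValidationError
    limit = LIMITS[instance.account_type]
    if value > limit:
        raise ValueError('Balance exceeds the %s limit.' % instance.account_type)

validate_balance(500, FakeInstance('basic', 500))  # passes silently
```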
to_python(self, value)
The value of a field can be stored in a number of different ways, depending on where it’s being stored. In a database, it can be one of a few basic types, such as strings, integers and dates, while when serializing a model, all values will be coerced to strings. That means that often, when instantiating a model, its value has to be forced back into its proper Python representation. This behavior is handled by the to_python() method, though it’s not quite as straightforward as it may seem on the surface.
The first thing to consider is that the value passed to to_python() could be one of a number of representations of the data. For instance, it could be whatever format is returned from the database adapter, such as a string, integer or native Python date, but it could also be a string retrieved from a serializer, or if the field manages a more complex custom data type that needs to be initialized, the value could actually be a fully-initialized instance of that type.
To illustrate this, consider the situation of BooleanField. Values that get passed into it could come in a variety of forms, so its to_python() method needs to anticipate this and make sure that it always returns a Boolean value, or raises an exception indicating that the value wasn't suitable for the field.
def to_python(self, value):
    if value in (True, False):
        return value
    if value in ('t', 'True', '1'):
        return True
    if value in ('f', 'False', '0'):
        return False
    raise exceptions.ValidationError(_("This value must be either True or False."))
As you can see, it has to check for a few different types of values that can all be coerced into Boolean values reliably. In addition to the native True and False, it checks for the string representations of the same, as well as a couple of single-character representations that might turn up in various situations. If it finds something suitable, it simply returns the appropriate native Boolean value, raising the ValidationError described in the previous section if a suitable value can't be found.
Unfortunately, to_python() is an extra method call that’s not always necessary, so it’s not always called when it seems like it would be. In particular, it’s provided mainly for validating data prior to committing to the database and when retrieving content from serialized data, so when retrieving from the database, it’s assumed that the data has already been validated, and the database backends generally suffice for returning the proper type.
Because of this, Django doesn’t call to_python() when retrieving data from the database. For the built-in types, and many potential add-on fields, this is sufficient, but for other data types or complex objects, some more work will be done to convert the database value to something appropriate to work with. To support these types of fields, Django provides a special way to force to_python() to be called when populating the field’s value.
Supporting Complex Types with SubfieldBase
Sometimes databases just don’t have the necessary data types to support certain types of applications. For example, most databases don’t have a way to store a length of time and present it to Python as a datetime.timedelta 4 object. PostgreSQL has a column type called interval 5 for this purpose, which does map directly to a Python timedelta as it should, but other databases don’t, which makes this impractical in terms of reusability. It would work suitably for PostgreSQL, but in order to make an application portable, it needs to be usable with more than one database.
Thankfully, timedelta stores its values in days, seconds and microseconds, and can write the entire value based on just a number of seconds passed in as a float. Therefore, it’s possible for a new DurationField to use a DecimalField to store a value in the database, convert to a float in Python, then pass it into timedelta for use on the model instance.
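The normalization timedelta performs can be verified directly before looking at the field itself:

```python
import datetime

# One day (86,400 seconds) plus 7,384.5 seconds, given as a single float
value = datetime.timedelta(seconds=93784.5)
print(value.days, value.seconds, value.microseconds)  # 1 7384 500000

# Reversing the split recovers the original number of seconds
total = value.days * 86400 + value.seconds + value.microseconds / 1e6
print(total)  # 93784.5
```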
import datetime
import re

from django.core.exceptions import ValidationError

def to_python(value):
    if isinstance(value, datetime.timedelta):
        return value
    match = re.match(r'(?:(\d+) days?, )?(\d+):(\d+):(\d+)(?:\.(\d+))?', str(value))
    if match:
        parts = list(match.groups())
        # The parts in this list are as follows:
        # [days, hours, minutes, seconds, microseconds]
        # But microseconds need to be padded with zeros to work properly.
        parts[4] = (parts[4] or '').ljust(6, '0')
        # And they all need to be converted to integers, defaulting to 0
        parts = [int(part) if part else 0 for part in parts]
        return datetime.timedelta(parts[0], parts[3], parts[4],
                                  hours=parts[1], minutes=parts[2])
    try:
        return datetime.timedelta(seconds=float(value))
    except (TypeError, ValueError):
        raise ValidationError('This value must be a real number.')
    except OverflowError:
        raise ValidationError('The maximum allowed value is %s' %
                              datetime.timedelta.max)
This is the type of process that simply can't be handled without using to_python(), and it must take place every time the model is instantiated, even when coming from the database. However, an extra method call on every access from the database can get quite expensive, so it's essential to be able to handle this without penalizing those fields that don't use it.
As will be shown at the end of this chapter, a descriptor can be used to customize what happens when a field’s value is accessed, which can be an excellent way to control this type of behavior. Of course, descriptors can be tricky if they’re just a means to an end, and the to_python() behavior described here is a fairly common need for these complex data types, so Django provides a shortcut to ease the creation of this descriptor.
Located at django.db.models.fields.subclassing, the SubfieldBase metaclass is Django’s way of easing the creation of model fields whose to_python() method will always be called. By simply applying this to a model class, it takes care of the rest, setting up a descriptor that calls to_python() the first time the field is loaded. Therefore, the DurationField example would use this in the field definition as follows:
from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
class DurationField(models.DecimalField, metaclass=SubfieldBase):
    pass

# Field logic then continues here
Controlling Database Behavior
Another important aspect of fields is how they interact with the database. This can include how the data itself is stored, how it's prepared before being sent to the database and how it's prepared for comparison with values already in the database. This process is already handled by Django itself, with every existing field type providing a few methods to define this behavior.
For custom fields, it’s often necessary to override this behavior, interacting with the database in ways other than how Django itself would expect to do so. The following methods define nearly every aspect of how a field works with the database, so fields have a great deal of control over how the database interaction is handled.
db_type(self, connection)
Rarely overridden by individual fields, this method returns a database-specific string that controls how the column is created for use with the given field. Django internally uses the result of the get_internal_type() method in conjunction with a mapping provided by each individual backend to provide a return value from this method. That functionality is enough for the vast majority of field applications.
The most important thing to remember when considering the use of this method is that its return value is specific to a particular database backend. In order to use this field in projects with different backends, the connection argument is provided to help you decide what to use. In a simple case, you can use connection.settings_dict['ENGINE'] to determine what type of database the field is being used on, and behave accordingly. For example, if DurationField could in fact use interval in PostgreSQL, while still supporting other databases:
class DurationField(models.Field):
    def db_type(self, connection):
        engine = connection.settings_dict['ENGINE']
        if engine == 'django.db.backends.postgresql_psycopg2':
            return 'interval'
        else:
            return connection.creation.data_types['DecimalField']
One other feature of this method is that if you return None instead of a string, Django will skip the creation of this particular field. This can be necessary if the field must be created in a more complicated fashion than a single string can represent. Django will still attempt to reference the column when executing queries, though, so you’ll need to make sure you do in fact create the column before attempting to use this field.
Most of the time, you'll want to leave this method to Django, but it does provide a way to override the default behavior when you really need to. Just be careful doing this in a distributed application, because you'll end up having to support multiple types of databases, not just the one you're most familiar with.
get_prep_value(self, value)
There are a few methods that deal with preparing a value for different kinds of use within the database, but they typically share the same code for preparing a value for use in the database at all. The get_prep_value() method is used by both of the following methods to perform this basic conversion.
In most cases, converting a Python object to some more basic type will suffice to allow a custom field to pass values to the database. By overriding get_prep_value(), the other database preparation methods can typically use their default implementations without issue. For example, DurationField requires this type of conversion, since timedelta objects can’t be passed directly to most databases, which led to using a DecimalField to control the column’s behavior. A custom get_prep_value() method can convert timedelta objects to Decimal values, which can then be passed to the database normally.
from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
from django.utils import _decimal

class DurationField(models.DecimalField, metaclass=SubfieldBase):
    def get_prep_value(self, value):
        # Microseconds are zero-padded to six digits, so that 500 comes
        # out as .000500 rather than .5
        return _decimal.Decimal('%d.%06d' % (value.days * 86400 + value.seconds,
                                             value.microseconds))

    # Field logic then continues here
get_db_prep_value(self, value, connection, prepared=False)
In cases when you need to prepare the value differently for different database connections, this method gives you the flexibility to do so. The connection argument again represents the database connection being used, and can be used to make the necessary decisions about how to proceed. The prepared argument indicates whether the value has already been passed through get_prep_value(). If False, you should call that method before proceeding further. Here's what DurationField could look like if it continued to split up its behavior between PostgreSQL and other databases:
from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
from django.utils import _decimal

class DurationField(models.DecimalField, metaclass=SubfieldBase):
    def get_prep_value(self, value):
        # Nothing to do here, because get_db_prep_value() will do the dirty work
        return value

    def get_db_prep_value(self, value, connection, prepared=False):
        if not prepared:
            value = self.get_prep_value(value)
        engine = connection.settings_dict['ENGINE']
        if engine == 'django.db.backends.postgresql_psycopg2':
            # PostgreSQL can handle timedeltas directly
            return value
        else:
            # Microseconds are zero-padded to six digits, as before
            return _decimal.Decimal('%d.%06d' % (value.days * 86400 + value.seconds,
                                                 value.microseconds))

    # Field logic then continues here
get_db_prep_save(self, value, connection)
This works much the same as get_db_prep_value(), but offers a way to offer separate behavior when actually saving values into the database, as opposed to other operations. In fact, if you don’t provide an implementation for this method, the default behavior will simply defer to get_db_prep_value(), which will usually suffice.
get_prep_lookup(self, lookup_type, value)
Another area where fields have to interact with the database is when making comparisons between Python objects and values already stored in the database. This takes place every time a QuerySet’s filter() method is used, for instance, in order to generate the necessary database query. Since comparisons might require different handling than saving, Django uses the get_prep_lookup() method to manage this task.
When called, this method receives two explicit arguments, detailing how the lookup is expected to take place. The first, lookup_type, is the type of comparison that was requested in the filter() method. The second, value, is the Python object that was provided for comparison against database values.
While value is fairly straightforward, lookup_type is a little different, because it’s a string containing the requested comparison type. There are several of these available as part of Django’s database API,6 each having its own expectations. This is the full list, including the purpose of each:
Fields that inherit from some existing field can usually avoid overriding this method, as the parent class usually does the right thing. Other times, unfortunately, the child class needs specific handling for certain lookup types, where this can be quite useful. Still other times, it’s necessary to restrict certain types of lookups entirely.
One useful side effect of having Python code executed as part of the lookup process is that it allows exceptions to be thrown for lookups that aren’t valid for that field. This works just like anywhere else, where if you raise an exception, it will bail out of the query early, displaying a message indicating what happened.
WHERE’D MY ERROR GO?
Unfortunately, even though it’s possible—and often quite useful—to raise exceptions within get_prep_lookup(), sometimes you may find that they get suppressed. If this happens, the query will appear to execute, but you’ll likely receive just an empty list as its result, rather than seeing your error.
Due to some of the hoops QuerySets have to jump through internally, certain types of errors—including TypeError, which seems like an obvious choice to use—get caught and suppressed, causing Django to move on with the process in spite of not getting a valid value for that field.
In order to make sure that the error gets raised to its fullest and works as expected, be sure to use ValueError instead of TypeError, as it doesn’t get caught in the same trap.
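A standalone sketch of that advice follows; the set of supported lookup types here is illustrative, not a canonical list.

```python
# Lookup types this hypothetical field chooses to support
SUPPORTED_LOOKUPS = {'exact', 'in', 'isnull', 'gt', 'gte', 'lt', 'lte'}

def prep_lookup(lookup_type, value):
    # What a custom get_prep_lookup() might do: reject unsupported lookups
    # with ValueError, which survives QuerySet's internal error handling,
    # rather than TypeError, which can get silently swallowed
    if lookup_type not in SUPPORTED_LOOKUPS:
        raise ValueError('Lookup type %r not supported.' % lookup_type)
    return value

print(prep_lookup('exact', 5))  # 5
```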
get_db_prep_lookup(self, lookup_type, value, connection, prepared=False)
This performs essentially the same task as get_prep_lookup(), except that its output will be fed directly into the database query. It receives the same arguments, with the addition of connection and prepared, which work just like the arguments passed into get_db_prep_value(). The default implementation defers to get_prep_lookup(), which will be sufficient for most needs.
Dealing with Files
Many applications need to manage content that goes beyond what's traditionally stored in a database. Beyond the usual numbers and strings, there's a world of other data formats, from audio and video to print-ready Portable Document Format (PDF) files and plenty more. Content like this isn't well suited to being stored directly in the database—though in some cases it's at least possible—but it's still useful to tie it to other content that is in the database.
To handle this, Django provides a special FileField, with extra methods designed to facilitate access to files. It also uses many of the hooks described in this chapter to store a reference to the file in the database, as well as provide a special object that can access files in a portable manner. Django also provides an ImageField, which inherits much of its functionality from FileField, while adding some of its own, specifically tailored for dealing with the special needs of images.
Subclasses of FileField shouldn’t generally need to override many of its methods, since they’re mostly related to those features of a file that are common to all file types. This includes things like the filename and relative path, which don’t have anything to do with the specifics of a particular type of file. Some, however, such as save_file(), can be overridden to provide special handling of attributes related to a specific type of file.
get_directory_name(self)
This method simply returns a relative path that will be stored in the database along with the filename. By default, this looks at the upload_to attribute of the field to determine what the directory should be, and even subclasses should respect this behavior. Exactly how that attribute is used, however, is where subclasses can customize this method to great effect.
Normally, Django creates a directory name using two pieces of information: the upload_to string itself and the current date. The date the file was uploaded is applied to the directory name, replacing certain characters with portions of the date. This allows individual fields to more accurately control where their files are stored, which helps keep directories smaller, and can possibly even make better use of disk capacity.
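Concretely, the date-based substitution works through strftime() codes embedded in the upload_to string; here's a minimal sketch of that default naming, under the assumption that only the date and the upload_to value are involved.

```python
import datetime
import os

def directory_name(upload_to, now=None):
    # Replace strftime() codes in upload_to with pieces of the current date,
    # roughly as FileField's default directory naming does
    now = now or datetime.datetime.now()
    return os.path.normpath(now.strftime(upload_to))

print(directory_name('photos/%Y/%m/%d', datetime.datetime(2009, 3, 14)))
# photos/2009/03/14
```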
In a subclass, however, it may be more useful to generate the directory name based on some other type of information, such as the current site’s domain name in multisite setups, or the Internet Protocol (IP) address of the machine where the upload was received, in larger production environments where there are multiple Web servers sharing common storage.
Essentially, anything’s fair game here, as long as it only requires information that can be determined by only having access to the FileField instance. The current site or IP address can be obtained without regard to the current model at all, as can the current time. Other information, however, such as the user who submitted the file, the IP address of his or her remote computer, or the object the file will be attached to, is not accessible from this function, and thus can’t be used.
Of course, there is another option to specify some of this additional information, but doing so bypasses this method entirely. By specifying a callable for upload_to, as described in Django’s file documentation,7 the directory can be generated based on the object it will be attached to, which may include the User who owns the object.
Note that when using a callable as upload_to, that callable is expected to return the entire path, including the directory and filename, so get_directory_name() won’t be called at all in such cases, unless that callable explicitly calls it. Also, the incoming request still isn’t available, even to that callable, so making directory naming decisions based on that information will require a custom view.
get_filename(self, filename)
This works in much the same way as get_directory_name(), except that it’s responsible for specifying the filename portion of the path instead of the directory. It receives the original filename that was specified with the incoming file, and returns a new filename that will be used in the database, as well as the underlying storage system.
If a FileField subclass has need to customize the filename that will be used for a particular file, such as stripping out certain characters or altering the file’s extension, this would be the place to do it. That’s also why it receives the original filename as well, so that it has a way to create a filename that’s at least partially related to the one provided by the user.
By default, its output is combined with that of get_directory_name() to form the full path to be stored in the database and passed to the storage system. Like its counterpart, however, this is only true if the upload_to argument to the field was not a callable. If a callable was specified, it’s responsible for specifying the entire path, including the filename. Therefore, in such cases, this method will only be called if the upload_to callable specifically requests it.
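For example, a subclass's get_filename() override might sanitize the incoming name while keeping its extension; this sketch is a hypothetical cleanup policy, not Django's own behavior.

```python
import os
import re

def clean_filename(filename):
    # Replace runs of anything but word characters and hyphens with a single
    # underscore, and normalize the extension to lowercase
    base, ext = os.path.splitext(filename)
    base = re.sub(r'[^\w-]+', '_', base)
    return base + ext.lower()

print(clean_filename('My Report (final).PDF'))  # My_Report_final_.pdf
```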
generate_filename(self, instance, filename)
This is the default method used to generate the entire path. It uses the same function signature as a callable upload_to argument, because it plays the exact same role. In fact, internally to FileField, all references for generating the filename to be used for the file reference this method; if a callable was supplied to upload_to, it’s simply assigned to this same name, replacing the default behavior.
The default behavior is to use os.path.join() to combine the output of both the get_directory_name() and get_filename() methods, ignoring the model instance provided as an argument. If a FileField subclass needs the ability to specify the file’s entire path all at once, this method would be the place to do it.
Of course, remember that if a callable was supplied as the upload_to argument, this method will get replaced. This is true regardless of what behavior is supplied by a FileField subclass; the needs of a specific instance always win over the behavior of its class. So, while overriding this behavior can provide a more useful default, it doesn’t remove an individual developer’s ability to replace it entirely.
save_form_data(self, instance, data)
This is a utility method for forms to use as a shortcut for saving a file associated with a model instance. It accepts an instance of the model the field was attached to, as well as the uploaded file data provided by the form. By default, it just extracts the necessary information from the uploaded file object, and passes it through to the standard file saving methods.
The instance argument is an instance of the model where the FileField was defined, and the data argument is an UploadedFile object, as described in Chapter 8. The uploaded file provides a name attribute, which contains the filename, and a read() method, which is used to access the file's contents so they can be saved properly.
As this is the primary way files are handled by most areas of Django itself, overriding this method provides an excellent opportunity to tie into extended functionality based on specific field types. For example, Django's own ImageField uses this as an opportunity to store the width and height of an image in separate fields, so they can be indexed and searched in the database directly. Other file types could take this same approach, storing certain attributes of the file in other fields for easier access later on.
Since this method gets access to the entire file’s contents, it’s possible to pass those contents into most libraries that deal with files. Anything that can read an open file object can process uploaded content by simply wrapping it in a StringIO 8 object. That way, the contents can be accessed without having to write them to the storage system first, only to have to read them back again.
delete_file(self, instance, sender)
While this may look like simply a way to delete a file, it actually serves a very particular purpose, which is alluded to by the presence of a sender argument. The contribute_to_class() method of FileField sets up this method as a listener for the post_delete signal. It’s not intended to be called individually, but instead it gets called every time a model instance with a FileField is deleted. As described for post_delete, the instance argument is the object that was just deleted, and the sender argument is the model class for that instance.
When triggered, it checks to see whether the file referenced by this field on the specified instance should be deleted. After all, if no other instances are referencing the same file, and it's not the default value for new instances, it's quite likely that no references to the file remain. In those cases, the file is permanently removed from the storage system.
The uses for overriding this are clear, because the logic for when to delete the file is included directly within this method. If a FileField subclass needs different rules for this, simply overriding this method is enough to make it happen.
The obvious example is if files should always remain, for historical reasons, even after the model instances associated with them have been deleted. Providing that behavior is a simple matter of just defining an empty implementation of this method.
from django.db import models

class PermanentFileField(models.FileField):
    def delete_file(self, instance, sender, **kwargs):
        pass
Of course, there are other possible use cases for this as well, but the specifics of what those would look like will depend very much on the needs of an individual application.
attr_class
As a simple attribute, rather than a method, attr_class might not seem like it would provide much power or flexibility. Thankfully, looks are often deceiving, as it’s actually the gateway to some very useful features. The attr_class attribute is set to a class that will be used to represent the field’s value when referenced in Python. That means that the value of this simple attribute is actually the primary way of specifying what features are available on the public API for data entered into a particular FileField instance.
The following section describes the behavior of the class specified by default for this attribute, and how its methods can be overridden to provide additional functionality.
When a model defines a FileField, the value made available as the attribute on actual model instances is a special object designed specifically for managing files. Located at django.db.models.fields.files, the File class provides a number of platform-independent and storage-independent methods for accessing a file’s content and properties of that content, as well as for saving new files and deleting existing ones.
Because it’s the public-facing API for accessing files, it’s often quite useful to provide additional functionality for file types that have common qualities that will need to be referenced often. This provides a nice, clean, object-oriented way to encapsulate that common code in one place, rather than requiring the rest of the application to write it over and over again.
For example, Django’s own ImageField provides its own subclass, ImageFile, which contains additional methods for accessing the width and height of an image, as well as caching it to speed up subsequent accesses. It’s an excellent example of how easy it is to provide this extra functionality.
In addition to providing new methods, though, there are a number of existing methods that could benefit from being overridden. These are a bit less likely to be of use directly, but as ImageFile shows, they can be used to perform some important tasks, such as updating or invalidating cached values.
For the most part, the methods described next map directly to file storage methods described in Chapter 8. The main difference is that these are specific to a particular file type, and can be customized for aspects that are unique to that file type, while storage systems are just designed to work with files, without regard to what type of content gets handled.
path(self)
This returns the path of the file, if it's stored on the local filesystem. For files stored on other backends, which can't be accessed with Python's built-in open() function, this will raise an AttributeError, because the corresponding method isn't available on the related storage system object.
This is provided mostly as a compatibility layer with older versions of Django, for those projects that were written before the introduction of this new file handling system. In the real world, projects written for newer versions of Django should avoid the use of this method, and instead use the open() method listed in this section to access files in a more portable fashion. Overriding it will also be of little use, so it’s listed here for completeness with the rest of the API.
url(self)
This method returns the URL where the file can be retrieved on the Web. It might be served up from the Django project itself, a media server operated by the site's owners, or even a storage service operated by a third party. The exact details of where this URL comes from are specified by the storage system, so this method is a portable way to access the URL for the file.
Overriding this provides little benefit in most situations, though there are occasional reasons to do so. One example might be a FileField subclass that manages HTML files with a specific structure, where the URL could contain a name reference to direct browsers to a specific point in the file.
size(self)
This retrieves the size of the underlying file, caching it for future reference. While this can be a very useful feature, there's little value in overriding it in a subclass. The nature of file size is such that it doesn't vary depending on file type, and there's not really anything that can be done to customize how the size is obtained, so it's just included here for completeness.
This retrieves the file’s content and returns an open file or file-like object, which allows access to the file. This is the preferred method of accessing a file’s contents in a portable fashion, since it passes through to the storage system for the majority of its functionality.
The mode attribute takes all the same options as Python’s own open() function,9 and can be used to open the file for read or write access. One use of overriding this method could be to change the default access mode, but only for changing whether it should be opened in binary mode by default or not. The default should always at least be to open the file for reading, rather than writing.
Another potential reason to subclass this would be to provide custom behaviors to the returned file-like object. By default, this method will return whatever object is returned by the storage system, but particular file types might have use for customizing methods on that object, such as write() or close() to alter how and when the file is written. Because this method is responsible for returning an open file-like object, it can wrap the true file-like object in another, passing through to the real object after doing whatever extra work needs doing.
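That wrapping technique can be sketched without any Django machinery at all. Here io.StringIO stands in for the real file object the storage system would return, and the LoggingFile name and its bookkeeping are purely illustrative:

```python
import io

class LoggingFile(object):
    """Pass-through wrapper of the kind a custom open() could return:
    it proxies to the real file-like object while adding behavior
    around write()."""
    def __init__(self, real_file):
        self._file = real_file
        self.chars_written = 0

    def write(self, data):
        self.chars_written += len(data)  # extra bookkeeping on each write
        return self._file.write(data)

    def __getattr__(self, name):
        # Everything not defined here passes through to the real file
        return getattr(self._file, name)

wrapped = LoggingFile(io.StringIO())
wrapped.write('Hello')
wrapped.write(', world')
```

Because only write() is intercepted, all the other file methods (read(), close(), and so on) behave exactly as they would on the underlying object.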
save(self, name, content, save=True)
As the name implies, this saves a new file to the storage system, replacing the file currently in place on the model instance. The arguments should be mostly self-explanatory, with name being the name the new file should be saved as, and content being the actual contents of the file to be written under that name.
delete(self, save=True)
Also fairly self-explanatory, this deletes the file directly from the storage system, regardless of which storage system is being used. It also removes the filename from the model instance, so that it no longer references the file.
The save argument works just like the one from the save() method, determining whether the model instance is saved or not. Also like save(), if False is provided, it’s important to make sure the instance is in fact saved eventually. Otherwise, it will contain a reference to a file that has already been deleted. Perhaps worse yet, if another instance saves a file with the same name, the reference from the first instance will no longer be orphaned, but will in fact point to the wrong file entirely.
Overriding this provides most of the same benefits as overriding save(), by being able to remove any cached information so it doesn’t cause confusion if accessed later.
Chapter 2 described the signal dispatching system bundled with Django, and how signals work in general. As explained, signals can be created and made available from any Python module, and can be used for any purpose. For dealing with models, several signals are provided out of the box, and they can be used in a number of situations.
The following signals are all available at django.db.models.signals, and each sends the model class as the standard sender argument to the listener. In addition, many signals include a model instance as an additional argument. These and other additional arguments are detailed in the descriptions of each individual signal listed here.
class_prepared
This signal fires when Django’s ModelBase metaclass has finished processing a model class, indicating that the class is completely configured and ready to be used. Since the metaclass operates as soon as Python encounters the class declaration, class_prepared is fired before Python even continues processing the module that contains that declaration.
One important note to consider, however, is that this fires just prior to the model being registered with AppCache. Therefore, if a listener for class_prepared looks through AppCache to inspect the models that have been processed up to that point, the model that fired the signal won’t yet be present. There may be some uses for inspecting the application cache at this point in the process, but without a full application cache, its value is quite limited.
Unlike most of the other signals listed in this section, class_prepared sends only the standard sender argument. There isn't any instance available at the point when the signal is fired, and the _meta attribute on the new model class contains all the information about how it was declared, so the model itself is enough to obtain everything that's available at that point in time.
>>> from django.db import models
>>> def listener(sender, **kwargs):
...     print('%s.%s' % (sender._meta.app_label, sender._meta.object_name))
...
>>> models.signals.class_prepared.connect(listener)
>>> class Article(models.Model):
...     title = models.CharField(max_length=255)
...     class Meta:
...         app_label = 'news'
...
news.Article
Like all signals, listeners for class_prepared can be registered with or without a specific model to listen for, though it may not seem like this would be possible. After all, if the listener must be registered prior to the signal being fired, and the signal is fired before Python even continues with the rest of the module, how can it possibly be registered with a class to listen for? Even if it could, what possible purpose could it serve?
The answer to both of these questions is contribute_to_class(). Remember that attributes on a model are given the opportunity to customize how they’re applied to the model. When an object with a contribute_to_class() method is encountered, that’s called instead of the usual setattr(), where it’s passed the model class and the attribute name, allowing the object to perform whatever functionality it wants to.
The key here is that contribute_to_class() receives the model class as an argument. It makes for an excellent opportunity to register a listener for class_prepared specifically for the class being processed. In fact, depending on the need at hand, this is not only possible, but could be downright essential.
Consider a situation where a field-like object needs to know everything about the model it’s attached to in order to properly configure itself. Since there’s no guarantee that all the other fields have been processed by the time contribute_to_class() is called on the object in question, it’s necessary to defer the rest of the configuration until the class has finished processing.
pre_init and post_init
When a model is instantiated, pre_init fires before any other work is performed. It gets dispatched even before any of the arguments passed into the model are assigned to their appropriate attributes. This is a good opportunity to inspect the arguments that will be assigned to the instance prior to that actually happening, especially since this allows a listener to fire before encountering any errors that might come as a result of the arguments specified.
Because this takes place prior to any of the field values being populated on the object itself, it doesn’t send the new object along when the signal is fired. Instead, it passes along two additional arguments besides sender that correspond to the positional and keyword arguments that were passed in to the model.
Note that even though these are the same names as those usually given to the excess argument technique described in Chapter 2, these are passed to the listener as explicit keyword arguments, rather than using * and **. Listeners must define these arguments explicitly in order for them to work properly.
>>> from django.db.models.signals import pre_init
>>> from news.models import Article
>>> def print_args(sender, args, kwargs, **signal_kwargs):
...     print('%s(*%s, **%s)' % (sender._meta.object_name, args, kwargs))
...
>>> pre_init.connect(print_args, sender=Article)
>>> article = Article(title=u'Testing')
Article(*(), **{'title': u'Testing'})
Similarly, post_init gets fired as part of the model instantiation process, but at the end instead of the beginning, once all the arguments have been mapped to the appropriate attributes based on the fields that were defined on the model. Therefore, as the name implies, the object is completely initialized at this point.
It would make sense, then, that when post_init fires, it gets passed the fully configured model instance as well as the standard sender, which is the model class. The new object is passed in as the instance argument to the listener, which can then do with it whatever is necessary, according to the application.
>>> from django.db.models.signals import post_init
>>> from news.models import Article
>>> def print_instance(sender, instance, **kwargs):
...     print('Instantiated %r' % instance)
...
>>> post_init.connect(print_instance, sender=Article)
>>> article = Article(title=u'Testing')
Instantiated <Article: Testing>
pre_save and post_save
When a model instance is being committed to the database, Django provides two ways to hook into that process, both at the beginning and at the end. The primary difference between the two is that pre_save is called before the object is committed to the database, while post_save is called afterward. This simple distinction can be very important, depending on the needs of the application.
When triggered by pre_save, a listener receives the model class as sender, and also the instance of the model as instance. This allows the listener to get access to—and even modify—the instance that’s about to be saved, before it hits the database. This can be a useful way to provide or override default arguments for models provided by third-party applications.
On the other hand, post_save is called after the save has been performed, and the instance has been committed to the database. This is a useful step in two ways, because it not only ensures that the data is in fact present in the database, which is necessary when dealing with related models, but it also occurs after Django has made the decision about whether to insert a new record into the database or update an existing record.
In addition to the sender and instance arguments that work the same way as in pre_save, listeners for post_save can receive another argument. The created argument is a Boolean indicating whether or not the instance had to be created from scratch. A value of True means it was newly inserted into the database, while False means an existing record was updated. When using the post_save signal to track database changes, this is an important distinction, and can be used to determine the behavior of other applications. To see this in action, see the history example in Chapter 11 of this book.
Because a model manager’s create() method does in fact commit a new instance to the database, it fires both of these signals. It’s also safe to assume that any time create() is used, the created argument will be True, but just remember that there may well be other times when that argument is also True.
>>> from django.db.models import signals
>>> from news.models import Article
>>> def before(instance, **kwargs):
...     print('About to save %s' % instance)
...
>>> signals.pre_save.connect(before, sender=Article)
>>> def after(instance, created, **kwargs):
...     print('%s was just %s' % (instance, created and 'created' or 'updated'))
...
>>> signals.post_save.connect(after, sender=Article)
>>> Article.objects.create(title='New article!')
About to save New article!
New article! was just created
<Article: New article!>
A NOTE ABOUT COMBINING PRE_SAVE AND POST_SAVE
There’s another very important difference between pre_save and post_save, because they’re not always called as a pair. Because pre_save is triggered at the beginning of the process, you can reliably assume that it will always be called every time a save() is initiated. However, post_save only happens at the end, so if anything goes wrong during the save itself, post_save won’t get triggered.
This is an important distinction, because it may seem convenient to register a pair of listeners for the model saving signals, expecting that both will always be called every time. While that may be true for the majority of cases, and certainly when nothing goes wrong, things do go wrong sometimes. Examples include an entry with a duplicate primary key or other unique column, data being of the wrong type, or a timeout connecting to the database.
In situations where this type of behavior is required, the only reasonably sane way to go about it is to override the save() method on the model. This allows custom code to be run before and after the actual database interaction, but it also provides a way to identify problems that occurred in the process. In addition, it allows the code a better opportunity to pair the two pieces of functionality more fully, since if something does go wrong, it’s easier to identify, and thus any pending actions can be canceled as a result.
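The shape of that pairing can be sketched without Django at all. In this illustrative stand-in, _commit() plays the role of the parent class's database work and the log list simply records which steps ran; in a real model you would subclass models.Model and call the parent's save() instead:

```python
class AuditedSave(object):
    """Django-free sketch of pairing 'before' and 'after' work by
    overriding save(): the 'after' step runs only if the commit
    itself succeeds, unlike a pre_save/post_save listener pair."""
    def __init__(self, fail=False):
        self.fail = fail
        self.log = []

    def _commit(self):
        # Stand-in for the parent class's save() hitting the database
        if self.fail:
            raise RuntimeError('database error')

    def save(self):
        self.log.append('before')   # always runs, like pre_save
        self._commit()
        self.log.append('after')    # skipped entirely on failure

ok = AuditedSave()
ok.save()

broken = AuditedSave(fail=True)
try:
    broken.save()
except RuntimeError:
    pass  # the 'after' step was never reached
```

Because the failure surfaces as an exception inside save(), the override can also catch it and cancel any pending work before re-raising, which is exactly the pairing guarantee the signals alone can't provide.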
pre_delete and post_delete
Similar to the previous section in spirit, pre_delete and post_delete are the pair of signals relating to the deletion of model instances. They function almost identically to their saving counterparts, except that they both provide just the sender and instance arguments.
When using post_delete, keep in mind that the instance passed in to the listener will have already been removed from the database, so many of its methods will raise exceptions if used. This is especially true if it had previously related to instances of other models. Those relationships will have been lost by the time post_delete is triggered, so any handling of those situations should be done in pre_delete or by overriding the delete() method on the model. If you do override the model’s delete() method, you’ll need to make sure to access the model and its relationships prior to calling the delete() method on the parent class. Once you delete it through the parent class, you’ll be in the same situation as when using the post_delete signal.
Also, because the instance will have been deleted, its primary key value will no longer match up with anything in the database. However, in order to more accurately keep track of which object was deleted, the primary key value is left intact on the instance, and can be read using the pk shortcut described earlier in this chapter.
post_syncdb
Unrelated to a specific model, post_syncdb is instead triggered as part of the syncdb management command’s normal process. It provides a way for applications to identify when an application’s models have been installed into the database, in order to perform other tasks based on their definitions.
While there are likely other uses for this as well, the primary use for post_syncdb is to either configure the application itself the first time its models are installed in the database, or to identify other applications that are being installed, taking action appropriately. Within Django itself, there are examples of both types of functionality.
The key to post_syncdb's effectiveness is that it uses a different type of value for the sender argument that accompanies all signals. Instead of a specific model, this signal sends the application's models module, which is the object Django uses to identify an application. This allows a listener to be configured either for all applications or just the one that registered it.
All applications listed in the INSTALLED_APPS setting emit a post_syncdb signal every time the command is executed, even if nothing has changed. Therefore, in addition to sender, listeners of post_syncdb receive three additional arguments to indicate with more detail the circumstances under which syncdb was called, and help control their behavior in response.
from django.db.models import signals

def app_report(app, created_models, verbosity, **kwargs):
    app_label = app.__name__.split('.')[-2]
    if verbosity == 0:
        # Don't do anything, because the
        # user doesn't want to see this.
        return
    # Get a list of models created for just the current application
    app_models = [m for m in created_models if m._meta.app_label == app_label]
    if app_models:
        # Print a simple status message
        print('Created %s model%s for %s.' % (len(app_models),
                                              len(app_models) > 1 and 's' or '',
                                              app_label))
        if verbosity == 2:
            # Print more detail about the
            # models that were installed
            for model in app_models:
                print('    %s.%s -> %s' % (app_label,
                                           model._meta.object_name,
                                           model._meta.db_table))
    elif verbosity == 2:
        print('%s had no models created.' % app_label)

signals.post_syncdb.connect(app_report)
Code for post_syncdb listeners is generally placed in an application’s management package, which is automatically loaded whenever manage.py is used for a project containing that application. This ensures that it doesn’t get unnecessarily loaded for situations where it’s not needed, while also making sure that it does get loaded whenever it might be necessary. Also, since it’s Python, code in your management package can do other things as well, such as inspect the INSTALLED_APPS setting and decide whether the listener should even be registered at all.
Given the wide array of tools available for individual models to customize their behavior, their interaction with the database, and the fields associated with them, the options are nearly limitless. The techniques that follow represent just a small portion of what's possible.
Loading Attributes on Demand
When working with certain types of data, it’s sometimes quite expensive to construct a complex Python object to represent a given value. Worse yet, some parts of the application might not even use that object, even though the rest of the model might be necessary. Some examples of this in the real world are complex geographic representations or large trees of nested objects.
In these cases, we must be able to get access to the full object when necessary, but it’s very important for performance to not have that object constructed if it won’t be used. Ideally, the data would be loaded from the database when the model is instantiated, but the raw value would just sit on the instance without being loaded into the full object. When the attribute is accessed, it would be constructed at that point, then cached so that subsequent accesses don’t have to keep reconstructing the object.
Looking back again to Chapter 2, descriptors are the perfect tool for this task, since they allow code to be run at the exact moment an attribute is accessed. Some care must be taken to make sure that the fully constructed object is cached properly for future use, but by using a separate name and attname, this is also fairly straightforward.
To illustrate how this would work in practice, consider a field designed to store and retrieve a pickled copy of any arbitrary Python object. There’s no way to know in advance how complicated the Python representation will be, so this is a situation where it’s ideal to delay the construction of that object until it’s actually necessary.
Storing Raw Data
The first step is to tell Django how to manage the raw data in the database, using a standard field. Since pickled objects are just strings, some form of text field would clearly be prudent, and since there’s no way to know in advance how large the pickled representation will be, the nearly limitless TextField seems like an obvious choice.
Of course, given that there will be some extra work going on for this new field, TextField alone won't suffice. Instead, we'll create a subclass that inherits the database functionality of TextField, while allowing extra customizations where necessary. Since fields are just Python classes like any other, this works just like you'd expect, but with one addition. In order to interact with the database using a different value than is used to interact with other Python code, the attname attribute needs to be different from the name attribute. This is controlled by a custom get_attname() method.
from django.db import models

class PickleField(models.TextField):
    def get_attname(self):
        return '%s_pickled' % self.name
This much alone will suffice for getting the field set up properly for the database. At this point, it’s even possible to assign a PickleField instance to a model and sync it with the database, and the column created will be perfectly usable for the duration of this example. Of course, it only manages the raw data so far; it won’t be able to handle real Python objects at all, much less deal with pickling and unpickling as necessary.
Pickling and Unpickling Data
To make the translation between a full Python object and a string representation that can be stored in the database, Python’s pickling modules10 will be the tool of choice. There are actually two separate modules provided by Python for this purpose: cPickle, written in C for improved performance, and pickle, written in pure Python for flexibility and portability. There are some minor differences between the two,11 but they can be used interchangeably.
Having two modules available makes importing a bit trickier than usual. For obvious reasons, it’s very valuable to have the greater performance when it’s available, but a key aspect of Python and Django is the ability to be used across multiple platforms and environments. Therefore, when looking to import a pickling module, it’s best to try the more efficient module first, falling back to the more portable module when necessary.
try:
    import cPickle as pickle
except ImportError:
    import pickle
With a pickle module available, we can give PickleField the ability to actually pickle and unpickle data. By providing a couple of basic methods, it's possible to interface with the underlying module in a more object-oriented manner. In addition, it's safe to assume that when preparing to commit to the database, the field's value will be the full Python object, which obviously must be pickled.
On the other hand, when using a QuerySet’s filter() method to make comparisons against values in the database, pickled data will be quite useless. It would technically be possible to pickle the query’s value to compare against that found in the database, but it would be comparing the pickled values, not the original Python objects, which could lead to incorrect results.
More importantly, even though a pickled value is guaranteed to be unpickled properly when necessary, it’s quite possible that the same value, pickled on different occasions or possibly on different machines, will have different strings representing the original object. This is a documented side effect of the way pickling works, and must be taken into account.
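A quick illustration of why comparisons should go through objects rather than pickled strings: the round trip is reliable, but the serialized form itself is an implementation detail. Here the two pickle protocol versions stand in for "pickled on different occasions or machines":

```python
import pickle

data = {'title': 'Testing', 'tags': ['a', 'b']}

# The round trip is guaranteed to reproduce an equal object...
restored = pickle.loads(pickle.dumps(data))

# ...but the raw pickled string is not a stable representation:
# the same object serializes differently under different protocols,
# so comparing pickled strings is not a reliable equality test
raw_v0 = pickle.dumps(data, protocol=0)
raw_v2 = pickle.dumps(data, protocol=2)
```

Both strings unpickle back to equal objects even though the strings themselves differ, which is precisely why a database-level comparison against a pickled column can produce incorrect results.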
With all of this in mind, it’s unreasonable to allow any kind of comparison against pickled data, so an exception should be thrown if such a comparison is attempted. As described previously in this chapter, that behavior is controlled by get_db_pre_lookup(), which can be overridden to throw such an exception. The full field thus far follows:
class PickleField(models.TextField):
    def pickle(self, obj):
        return pickle.dumps(obj)

    def unpickle(self, data):
        return pickle.loads(str(data))

    def get_attname(self):
        return '%s_pickled' % self.name

    def get_db_prep_lookup(self, lookup_type, value):
        raise ValueError("Can't make comparisons against pickled data.")
Note that pickle and cPickle only support pickled data strings as plain byte strings, not as full Unicode strings. Since everything in Django gets coerced to Unicode wherever possible, including retrieving from the database, unpickle() needs to take the extra step of forcing it back to a byte string in order to be unpickled properly.
It may seem odd to define separate pickle() and unpickle() methods, when the pickling module is already available in the module’s namespace. After all, it’s not only extra lines of code for you, the developer, to write, but it’s also an extra function call that Python has to go through to get the job done, which slows things down slightly, and seemingly unnecessarily.
The biggest advantage of doing it this way is that if any other application has need to subclass PickleField and wishes to override exactly how the data gets pickled and unpickled, having explicit methods for it makes that process considerably easier. They can just be overridden like normal, and as long as the rest of PickleField just references the methods, the subclass will work quite well.
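For instance, a subclass could transparently compress its pickles just by overriding those two methods. This Django-free sketch uses a plain class in place of models.TextField so the overriding pattern is visible on its own; the class names are illustrative, and zlib is just one possible choice:

```python
import pickle
import zlib

class PickleFieldSketch(object):
    """Stand-in for PickleField: just the two explicit methods."""
    def pickle(self, obj):
        return pickle.dumps(obj)

    def unpickle(self, data):
        return pickle.loads(data)

class CompressedPickleFieldSketch(PickleFieldSketch):
    # Overriding the explicit methods is all a subclass needs to do;
    # any other code that calls self.pickle()/self.unpickle() keeps
    # working unchanged.
    def pickle(self, obj):
        return zlib.compress(PickleFieldSketch.pickle(self, obj))

    def unpickle(self, data):
        return PickleFieldSketch.unpickle(self, zlib.decompress(data))

field = CompressedPickleFieldSketch()
value = {'numbers': list(range(5))}
stored = field.pickle(value)
```

The rest of the field never needs to know the stored string is compressed, because it only ever goes through the pickle() and unpickle() methods.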
This gets us one step closer, now that PickleField can store values in the database properly. However, it still doesn’t solve the main issue of loading data into a Python object, and doing so only when it’s really necessary.
Unpickling on Demand
If we weren’t concerned with performance, it’d be easy to perform the unpickling step in the to_python() method and just use SubfieldBase to make sure it happens every time an object is instantiated, regardless of where it came from. Unfortunately, that would incur a good deal of unnecessary overhead for those cases where this field wouldn’t be accessed, so it’s still well worth loading it up on demand, only when it’s requested.
As mentioned earlier, Python descriptors are particularly well suited for this scenario. They get called when an attribute is accessed, and can execute custom code at that time, replacing standard Python behavior with something designed for the task at hand.
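As a minimal, Django-free illustration of that behavior, a descriptor's __get__() runs on every read of the attribute, so it can defer work until the moment of access. The LazyUpper and Document names here are hypothetical:

```python
class LazyUpper:
    """Descriptor that computes an uppercase copy only when accessed."""
    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself, not an instance
            return self
        # Compute on demand from raw data stored on the instance
        return instance.raw.upper()

class Document:
    title = LazyUpper()

    def __init__(self, raw):
        self.raw = raw

doc = Document('hello')
assert doc.title == 'HELLO'  # __get__() runs at this access
```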
The first step is determining how to instantiate the descriptor, which also means identifying what data it will need in order to get the job done. In order to retrieve the raw data from the model instance properly, it’ll need access to the field object, from which it can gather the name of the field itself.
class PickleDescriptor(property):
    def __init__(self, field):
        self.field = field
That will store a reference to the field object, which is all the descriptor will need later on. With that in place, it's possible to write the __get__() and __set__() methods that will actually do the hard work in the long run. Actually, __set__() is the easier of the two to implement; it assigns the value to the instance's namespace directly, and also stores a pickled copy under the field's attname so it's ready for the database.
    def __set__(self, instance, value):
        instance.__dict__[self.field.name] = value
        setattr(instance, self.field.attname, self.field.pickle(value))
With that in place, the trickiest bit of this whole process is the descriptor's __get__() method, which must be able to perform the following tasks in order to work properly.

- Distinguish between access on the model class and access on an instance
- Check whether an unpickled object is already cached on the instance
- Unpickle the raw data if no cached copy is available
- Store the result so that future accesses can skip the unpickling step
- Return a full Python object
That last one’s actually a bit of a red herring, since it’s easy to make sure that a Python object is available at the end of the method, and just return that, without regard to where it came from. The rest, though, may look like quite a laundry list, but it’s really not that difficult to perform all those tasks in a small, readable method.
    def __get__(self, instance, owner):
        if instance is None:
            return self

        if self.field.name not in instance.__dict__:
            # The object hasn't been created yet, so unpickle the data
            raw_data = getattr(instance, self.field.attname)
            instance.__dict__[self.field.name] = self.field.unpickle(raw_data)

        return instance.__dict__[self.field.name]
It should be fairly clear how this method performs each of the requirements. The first block checks for access from the model class itself, returning the descriptor so that introspection still works. The second block handles three more tasks: it first checks for the presence of a cached copy and continues only if there isn't one, then unpickles the raw data and stores it in the cache in a single line. At the end, it simply returns whatever's in the cache, regardless of whether it was there when the method began.
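The caching behavior can be verified with a stripped-down stand-in for PickleDescriptor that needs no Django and simply counts how many times the expensive step runs (doubling a number stands in for unpickling here; all names are illustrative):

```python
class CachingDescriptor:
    """Mimics PickleDescriptor's cache-in-__dict__ strategy."""
    def __init__(self, name):
        self.name = name
        self.loads = 0  # how many times the "unpickling" ran

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self.name not in instance.__dict__:
            # The expensive step runs only on the first access
            self.loads += 1
            instance.__dict__[self.name] = instance.raw * 2
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

class Thing:
    value = CachingDescriptor('value')

    def __init__(self, raw):
        self.raw = raw

t = Thing(21)
assert t.value == 42 and t.value == 42
assert Thing.value.loads == 1  # the second access hit the cache
```

Note that because the descriptor defines __set__(), it's a data descriptor, so __get__() is consulted even after a value lands in the instance's __dict__; that's exactly why the method checks the dictionary itself.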
Putting It All Together
The only thing left to make the whole thing work is to get the descriptor onto the model at the right time, so it's in place to get called when the attribute is accessed. This is precisely the intent of contribute_to_class(), where Django already provides a way for third-party code, such as this, to tie into the model creation process. Just make sure to always call the contribute_to_class() method on the parent class as well, so that all the standard Django functionality is applied in addition to the application's more specialized requirements.
    def contribute_to_class(self, cls, name):
        super(PickleField, self).contribute_to_class(cls, name)
        setattr(cls, name, PickleDescriptor(self))
With all of that now in place, we have a total of three import statements, two new classes and one new field that performs a very useful task. This is just one example of how this technique can be put to use, and there are as many more as there are applications using complicated Python data structures. The important thing to take away from this example is how to use descriptors to populate those complex objects only when necessary, which can be a big win in situations where they might not always be used.
try:
    import cPickle as pickle
except ImportError:
    import pickle

from django.db import models

class PickleDescriptor(property):
    def __init__(self, field):
        self.field = field

    def __get__(self, instance, owner):
        if instance is None:
            return self

        if self.field.name not in instance.__dict__:
            # The object hasn't been created yet, so unpickle the data
            raw_data = getattr(instance, self.field.attname)
            instance.__dict__[self.field.name] = self.field.unpickle(raw_data)

        return instance.__dict__[self.field.name]

    def __set__(self, instance, value):
        instance.__dict__[self.field.name] = value
        setattr(instance, self.field.attname, self.field.pickle(value))

class PickleField(models.TextField):
    def pickle(self, obj):
        return pickle.dumps(obj)

    def unpickle(self, data):
        return pickle.loads(str(data))

    def get_attname(self):
        return '%s_pickled' % self.name

    def get_db_prep_lookup(self, lookup_type, value):
        raise ValueError("Can't make comparisons against pickled data.")

    def contribute_to_class(self, cls, name):
        super(PickleField, self).contribute_to_class(cls, name)
        setattr(cls, name, PickleDescriptor(self))
Creating Models Dynamically at Runtime
Chapter 2 demonstrated how Python classes are really just objects like any other, and can be created at runtime by using the built-in type() constructor and passing in some details about how it should be defined. Since Django models are really just Python classes declared in a specific way, it's reasonable to expect that they could also be created at runtime using this same feature. Some care must be taken, but this can be an extremely useful technique in a variety of situations.
The trick is to remember how Python processes classes, and how Django processes its models. Chapter 2 already illustrated the basic tools necessary to make this work, so it's now just a matter of applying that to the specific details of Django models. There are a few things that set models apart from other Python classes:

- They always inherit from django.db.models.Model, which triggers Django's model-processing machinery when the class is created
- Fields, managers and methods are all declared as attributes in the class body
- Model-level configuration options are supplied through an inner class named Meta
With these requirements outlined, it’s fairly easy to map a model declaration onto the arguments for type(). In particular, remember that there are three arguments required to construct a class: name, bases and attrs. The model’s name is clearly mapped to name, while the single subclass of models.Model can be wrapped in a tuple and passed into bases. The remainder of the class declaration would go into attrs, including a Meta class for any additional model-level configuration options.
A First Pass
To make a first pass at what this function might look like, let’s start with just the most basic aspect of class creation and work our way out from there. To begin with, consider a function that generates a class with the correct name and base class, to illustrate the basic technique for creating a class dynamically and returning it for use elsewhere.
from django.db import models

def create_model(name):
    return type(name, (models.Model,), {})
Unfortunately, that’s actually a little too simplistic. Trying this out in Python will result in a KeyError, because Django expects the attribute dictionary to include a __module__ key, with its value being the import path of the module where the model was defined. This is normally populated by Python automatically for all classes defined in source files, but since we’re generating a model at runtime, it’s not available.
This is just one of the minor details that dynamic models have to face, and there’s really no way of avoiding it entirely. Instead, create_model() needs to be updated to provide a __module__ attribute directly. This is also another example of why it’s a good idea to put this code in one place; imagine having to deal with this every time a dynamic model is required. Here’s what it looks like to include a module path for the class:
def create_model(name, module_path):
    return type(name, (models.Model,), {'__module__': module_path})
Now it can accept a module path and keep Django happy. Well, it can keep Django happy as long as the module path has already been imported, which means it has to actually exist. Under normal circumstances, the model's __module__ attribute is set to the path of the module where it was defined. Since the model is only processed while executing that module, it's always guaranteed that the module exists and has been imported successfully. After all, if it hadn't, the model would never have been encountered in the first place.
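The same type() mechanics can be exercised without Django at all; in this sketch a plain object base stands in for models.Model, and the module path is purely illustrative:

```python
# Build a class at runtime: type(name, bases, attrs)
Article = type('Article', (object,), {'__module__': 'example.app'})

# The result behaves like any class defined with the class statement
assert Article.__name__ == 'Article'
assert Article.__module__ == 'example.app'
assert isinstance(Article(), Article)
```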
For now, since the only requirement of the module path is that it be valid and already imported, Django’s own django.db.models will make a reasonable candidate. It should be overridden where appropriate, of course, but it’s a decent default until things get rolling.
def create_model(name, module_path='django.db.models'):
    return type(name, (models.Model,), {'__module__': module_path})
Clearly, these dynamic models shake things up quite a bit, bypassing much of how Python normally works with a process like this. The __module__ issue is just the first issue encountered, and one of the easiest to work around. Thankfully, even though there are a few others to be handled, it can be well worth it if used properly.
The next step in this basic example is to include a dictionary of attributes to be set as if they were declared directly on a class definition. This will allow fields to be included on the model, as well as custom managers and common methods like __unicode__(). Since we’re already passing a dictionary to be used as attributes, assigning additional items to that dictionary is a simple process.
def create_model(name, attrs={}, module_path='django.db.models'):
    attrs = dict(attrs, __module__=module_path)
    return type(name, (models.Model,), attrs)
Ordinarily, it's not advisable to supply a mutable object, such as a dictionary, as a default argument, since modifications to it would affect all future executions of the function. In this example, however, the default is used only to populate a new dictionary, and is immediately replaced by that new dictionary. Because of this, it's safe to use as the default argument, in an effort to keep the function reasonably succinct.
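That pitfall is easy to demonstrate directly; a function that mutates its default argument pollutes every later call, while the dict(attrs, ...) copy leaves the default untouched (the bad/good names below are purely illustrative):

```python
def bad(name, attrs={}):
    attrs['name'] = name  # mutates the shared default dict!
    return attrs

def good(name, attrs={}):
    attrs = dict(attrs, name=name)  # new dict; default untouched
    return attrs

bad('first')
assert bad('second') == {'name': 'second'}  # looks fine...
assert bad.__defaults__[0] == {'name': 'second'}  # ...but the default is polluted

good('first')
assert good.__defaults__[0] == {}  # the default stays clean
```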
So far, we’ve set up a 3-line function to create basic models with any number of attributes, which can then be used in other areas of Django. Technically, this function alone could be used to generate any model imaginable, but it already provides a shortcut for setting up __module__, so it would make sense to provide another shortcut for setting up the model configuration by way of a Meta inner class. That way, code to create a model won’t have to set up that class directly.
Adding Model Configuration Options
Django models accept configuration through an inner class called Meta, which contains attributes for all the options that are specified. That should sound familiar, since that’s basically what models themselves do as well. Unfortunately, because of how Django processes the Meta class, we have to take a different approach.
The attributes defined within Meta are passed along into a special Options object, which lives at django.db.models.options. As part of this process, Options makes sure that no attributes were supplied that it doesn't know how to handle. The fact that Meta is a class is really just a way to separate its namespace from that of the main model, and Options only knows how to handle old-style Python classes—that is, classes that don't inherit from the built-in object type.
This is an important distinction, because calling type() directly creates a new-style class, even if object doesn't appear anywhere among its base classes. This ends up creating two additional attributes on the class that Options doesn't know how to deal with, so it raises a TypeError to indicate the problem. That leaves two options for creating a Meta class: removing the additional attributes or creating an old-style class using some other means.
While it would be possible to just remove the attributes that offend Options, an even better idea would be to provide it exactly what it expects: an old-style class. Clearly, using type() is out of the question, which leaves us with just declaring a class using standard syntax. Since this is possible even within functions, and its namespace dictionary can be updated with new attributes, it’s a decent way to go about solving this problem.
from django.db import models

def create_model(name, attrs={}, meta_attrs={}, module_path='django.db.models'):
    attrs = dict(attrs, __module__=module_path)

    class Meta: pass
    Meta.__dict__.update(meta_attrs, __module__=module_path)

    attrs['Meta'] = Meta
    return type(name, (models.Model,), attrs)
This will now accept two attribute dictionaries, one for the model itself, and another for the Meta inner class. This allows full customization of Django models that can be created at any time. While this may seem like a rather abstract concept at the moment, see Chapter 11 for a full example of how this can be used in practice to automatically record all changes to a model.
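To see the whole pattern in action without a configured Django project, the same function shape can be exercised with a plain object base standing in for models.Model (an illustrative assumption only). Note that in modern Python a class's __dict__ is a read-only mapping, so setattr() stands in for the update() call shown above:

```python
def create_class(name, attrs={}, meta_attrs={}, module_path='example.app'):
    """Django-free analog of create_model(), for illustration only."""
    attrs = dict(attrs, __module__=module_path)

    class Meta:
        pass
    # Copy the configuration options onto the inner class
    for key, value in dict(meta_attrs, __module__=module_path).items():
        setattr(Meta, key, value)
    attrs['Meta'] = Meta

    return type(name, (object,), attrs)

Product = create_class('Product',
                       attrs={'unit': 'each'},
                       meta_attrs={'verbose_name': 'product'})
assert Product.unit == 'each'
assert Product.Meta.verbose_name == 'product'
assert Product.__module__ == 'example.app'
```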
Now What?
With a solid foundation of Django’s models under your belt, the next step is to write some code that will allow users to interact with those models. The next chapter will show how views can provide your users with access to these models.
1 http://prodjango.com/sql-injection/
2 http://prodjango.com/model-inheritance/
3 http://prodjango.com/serialization/
4 http://prodjango.com/timedelta/
5 http://prodjango.com/postgresql-interval/
6 http://prodjango.com/db-api/
7 http://prodjango.com/file-api/
8 http://prodjango.com/stringio/