This section contains several design patterns that can help you design and structure your models.
Problem: By design, model instances can contain duplicated data, which causes data inconsistencies.
Solution: Break down your models into smaller models through normalization. Connect these models with logical relationships between them.
Imagine if someone designed our Post table (omitting certain columns) in the following way:
| Superhero Name | Message | Posted on |
|---|---|---|
| Captain Temper | Has this posted yet? | 2012/07/07 07:15 |
| Professor English | It should be 'Is' not 'Has'. | 2012/07/07 07:17 |
| Captain Temper | Has this posted yet? | 2012/07/07 07:18 |
| Capt. Temper | Has this posted yet? | 2012/07/07 07:19 |
I hope you noticed the inconsistent superhero naming in the last row (and captain's consistent lack of patience).
Looking at the first column, we cannot be sure which spelling is correct—Captain Temper or Capt. Temper. This is the kind of data redundancy we would like to eliminate through normalization.
Before we take a look at the fully normalized solution, let's have a brief primer on database normalization in the context of Django models.
Normalization helps you efficiently store data. Once your models are fully normalized, they will not have redundant data, and each model should contain data that is only logically related to it.
To give a quick example, if we were to normalize the Post table so that we can unambiguously refer to the superhero who posted that message, then we need to isolate the user details in a separate table. Django already creates the user table by default. So, you only need to refer to the ID of the user who posted the message in the first column, as shown in the following table:
| User ID | Message | Posted on |
|---|---|---|
| 12 | Has this posted yet? | 2012/07/07 07:15 |
| 8 | It should be 'Is' not 'Has'. | 2012/07/07 07:17 |
| 12 | Has this posted yet? | 2012/07/07 07:18 |
| 12 | Has this posted yet? | 2012/07/07 07:19 |
Now, it is not only clear that there were three messages posted by the same user (with an arbitrary user ID), but we can also find that user's correct name by looking up the user table.
Generally, you will design your models to be in their fully normalized form and then selectively denormalize them for performance reasons. In databases, Normal Forms are a set of guidelines that can be applied to a table to ensure that it is normalized. Commonly found normal forms are first, second, and third normal forms, although they could go up to the fifth normal form.
In the next example, we will normalize a table and create the corresponding Django models. Imagine a spreadsheet called 'Sightings' that lists the first time someone spots a superhero using a power or superhuman ability. Each entry mentions the superhero's known origin, superpowers, and the location of the first sighting, including latitude and longitude.
| Name | Origin | Power | First Used At (Lat, Lon, Country, Time) |
|---|---|---|---|
| Blitz | Alien | Freeze, Flight | +40.75, -73.99; USA; 2014/07/03 23:12<br>+34.05, -118.24; USA; 2013/03/12 11:30 |
| Hexa | Scientist | Telekinesis, Flight | +35.68, +139.73; Japan; 2010/02/17 20:15<br>+31.23, +121.45; China; 2010/02/19 20:30 |
| Traveller | Billionaire | Time travel | +43.62, +1.45; France; 2010/11/10 08:20 |
The preceding geographic data has been extracted from http://www.golombek.com/locations.html.
To conform to the first normal form, a table must have:

- No attribute (cell) with multiple values; every cell must hold a single, atomic value
- A primary key, defined as a single column or a set of columns (a composite key)
Let's try to convert our spreadsheet into a database table. Evidently, our 'Power' column breaks the first rule.
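To make the fix concrete, here is a tiny plain-Python sketch (the row structures are illustrative, not part of any Django API) showing what satisfying the first rule means: every multi-valued 'Power' cell is expanded into one row per value:

```python
# A raw spreadsheet row with a multi-valued 'Powers' cell
raw_row = {"Name": "Blitz", "Origin": "Alien", "Powers": ["Freeze", "Flight"]}

# First normal form: one row per atomic Power value
atomic_rows = [
    {"Name": raw_row["Name"], "Origin": raw_row["Origin"], "Power": power}
    for power in raw_row["Powers"]
]
```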
The updated table here satisfies the first normal form. The primary key (marked with a *) is a combination of 'Name' and 'Power', which should be unique for each row.
| Name* | Origin | Power* | Latitude | Longitude | Country | Time |
|---|---|---|---|---|---|---|
| Blitz | Alien | Freeze | +40.75170 | -73.99420 | USA | 2014/07/03 23:12 |
| Blitz | Alien | Flight | +40.75170 | -73.99420 | USA | 2013/03/12 11:30 |
| Hexa | Scientist | Telekinesis | +35.68330 | +139.73330 | Japan | 2010/02/17 20:15 |
| Hexa | Scientist | Flight | +35.68330 | +139.73330 | Japan | 2010/02/19 20:30 |
| Traveller | Billionaire | Time travel | +43.61670 | +1.45000 | France | 2010/11/10 08:20 |
The second normal form must satisfy all the conditions of the first normal form. In addition, it must satisfy the condition that all non-primary key columns must be dependent on the entire primary key.
In the previous table, notice that 'Origin' depends only on the superhero, that is, 'Name'. It doesn't matter which Power we are talking about. So, Origin is not entirely dependent on the composite primary key—Name and Power.
Let's extract just the origin information into a separate table called 'Origins' as shown here:
| Name* | Origin |
|---|---|
| Blitz | Alien |
| Hexa | Scientist |
| Traveller | Billionaire |
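The split can be sketched in plain Python (with illustrative data structures): since Origin depends only on Name, one mapping per superhero suffices, while the sightings keep only what depends on the full Name-Power key:

```python
# Rows from the first-normal-form table: (Name, Origin, Power)
rows = [
    ("Blitz", "Alien", "Freeze"),
    ("Blitz", "Alien", "Flight"),
    ("Hexa", "Scientist", "Telekinesis"),
    ("Hexa", "Scientist", "Flight"),
]

# 'Origins' table: Origin depends only on Name, so store it once per name
origins = {name: origin for name, origin, _power in rows}

# 'Sightings' rows keep only the columns dependent on the whole (Name, Power) key
sightings = [(name, power) for name, _origin, power in rows]
```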
Now our Sightings table, updated to comply with the second normal form, looks like this:
| Name* | Power* | Latitude | Longitude | Country | Time |
|---|---|---|---|---|---|
| Blitz | Freeze | +40.75170 | -73.99420 | USA | 2014/07/03 23:12 |
| Blitz | Flight | +40.75170 | -73.99420 | USA | 2013/03/12 11:30 |
| Hexa | Telekinesis | +35.68330 | +139.73330 | Japan | 2010/02/17 20:15 |
| Hexa | Flight | +35.68330 | +139.73330 | Japan | 2010/02/19 20:30 |
| Traveller | Time travel | +43.61670 | +1.45000 | France | 2010/11/10 08:20 |
The third normal form must satisfy all the conditions of the second normal form. In addition, all non-primary key columns must be directly dependent on the entire primary key and must be independent of each other.
Think about the Country column for a moment. Given the Latitude and Longitude, you can easily derive the Country column. Even though the country where a superpower was sighted depends on the Name-Power composite primary key, it is only indirectly dependent on it.
So, let's separate the location details into a Locations table as follows:
| Location ID | Latitude* | Longitude* | Country |
|---|---|---|---|
| 1 | +40.75170 | -73.99420 | USA |
| 2 | +35.68330 | +139.73330 | Japan |
| 3 | +43.61670 | +1.45000 | France |
Now our Sightings table in its third normal form looks like this:
| User ID* | Power* | Location ID | Time |
|---|---|---|---|
| 2 | Freeze | 1 | 2014/07/03 23:12 |
| 2 | Flight | 1 | 2013/03/12 11:30 |
| 4 | Telekinesis | 2 | 2010/02/17 20:15 |
| 4 | Flight | 2 | 2010/02/19 20:30 |
| 7 | Time travel | 3 | 2010/11/10 08:20 |
As before, we have replaced the superhero's name with the corresponding User ID that can be used to reference the user table.
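The third-normal-form split can also be sketched in plain Python (illustrative data): each unique coordinate pair gets a surrogate Location ID, and the sightings then reference only that ID:

```python
# (Latitude, Longitude, Country) triples from the second-normal-form table
coords = [
    ("+40.75170", "-73.99420", "USA"),
    ("+40.75170", "-73.99420", "USA"),
    ("+35.68330", "+139.73330", "Japan"),
    ("+43.61670", "+1.45000", "France"),
]

# 'Locations' table: one surrogate ID per unique coordinate pair
locations = {}
for lat, lon, country in coords:
    locations.setdefault((lat, lon), (len(locations) + 1, country))

# Sightings now store only the Location ID; Country can be looked up from it
location_ids = [locations[(lat, lon)][0] for lat, lon, _ in coords]
```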
We can now take a look at how these normalized tables can be represented as Django models. Composite keys are not directly supported in Django. The solution used here is to use surrogate keys and specify the `unique_together` property in the `Meta` class:
```python
class Origin(models.Model):
    superhero = models.ForeignKey(settings.AUTH_USER_MODEL)
    origin = models.CharField(max_length=100)


class Location(models.Model):
    latitude = models.FloatField()
    longitude = models.FloatField()
    country = models.CharField(max_length=100)

    class Meta:
        unique_together = ("latitude", "longitude")


class Sighting(models.Model):
    superhero = models.ForeignKey(settings.AUTH_USER_MODEL)
    power = models.CharField(max_length=100)
    location = models.ForeignKey(Location)
    sighted_on = models.DateTimeField()

    class Meta:
        unique_together = ("superhero", "power")
```
Normalization can adversely affect performance. As the number of models increases, the number of joins needed to answer a query also increases. For instance, to find the number of superheroes with the Freeze capability in the USA, you will need to join four tables. Before normalization, any information could be found by querying a single table.
You should design your models to keep the data normalized. This will maintain data integrity. However, if your site faces scalability issues, then you can selectively derive data from those models to create denormalized data.
For instance, if counting the sightings in a certain country is a very common query, then add the count as an additional field to the `Location` model. Now, this field can be included in queries through the Django object-relational mapping (ORM) like any other field, unlike a cached value.
However, you will need to update this count each time you add or remove a sighting. You can add this computation to the `save` method of `Sighting`, add a signal handler, or even compute it in an asynchronous job.
If you have a complex query spanning several tables, such as a count of superpowers by country, then you need to create a separate denormalized table. As before, we need to update this denormalized table every time the data in your normalized models changes.
Denormalization is surprisingly common in large websites because it is a tradeoff between speed and space. Today, space is cheap, but speed is crucial to user experience. So, if your queries are taking too long to respond, you might want to consider it.
Too much normalization is not necessarily a good thing. Sometimes, it can introduce an unnecessary table that can complicate updates and lookups.
For example, your User model might have several fields for their home address. Strictly speaking, you can normalize these fields into an Address model. However, in many cases, it would be unnecessary to introduce an additional table to the database.
Rather than aiming for the most normalized design, carefully weigh each opportunity to normalize and consider the tradeoffs before refactoring.
Problem: Distinct models have the same fields and/or methods duplicated, violating the DRY principle.
Solution: Extract common fields and methods into various reusable model mixins.
While designing models, you might find certain common attributes or behaviors shared across model classes. For example, the `Post` and `Comment` models need to keep track of their `created` and `modified` dates. Manually copy-pasting the fields and their associated methods is not a very DRY approach.
Since Django models are classes, object-oriented approaches such as composition and inheritance are possible solutions. However, composition (having a property that contains an instance of the shared class) needs an additional level of indirection to access fields.
Inheritance can get tricky. We can use a common base class for `Post` and `Comment`. However, there are three kinds of inheritance in Django: concrete, abstract, and proxy.
Concrete inheritance works by deriving from the base class just like you normally would in Python classes. However, in Django, this base class will be mapped into a separate table. Each time you access base fields, an implicit join is needed. This leads to horrible performance.
Proxy inheritance can only add new behavior to the parent class. You cannot add new fields. Hence, it is not very useful for this situation.
Finally, we are left with abstract inheritance.
Abstract base classes are elegant solutions used to share data and behavior among models. When you define an abstract class, it does not create any corresponding table in the database. Instead, these fields are created in the derived non-abstract classes.
Accessing abstract base class fields doesn't need a `JOIN` statement. The resulting tables are also self-contained, with managed fields. Due to these advantages, most Django projects use abstract base classes to implement common fields or methods.
Limitations of abstract models are as follows:

- They cannot be instantiated or saved, since they do not have a database table of their own
- They cannot be used directly in a query, as they have no manager
- They cannot be the target of a foreign key or many-to-many relation
Here is how the post and comment classes can be initially designed with an abstract base class:
```python
class Postable(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)
    message = models.TextField(max_length=500)

    class Meta:
        abstract = True


class Post(Postable):
    ...


class Comment(Postable):
    ...
```
To turn a model into an abstract base class, mention `abstract = True` in its inner `Meta` class. Here, `Postable` is an abstract base class. However, it is not very reusable.
In fact, if there was a class with just the `created` and `modified` fields, then we could reuse that timestamp functionality in nearly any model needing a timestamp. In such cases, we usually define a model mixin.
Model mixins are abstract classes that can be added as parent classes of a model. Unlike some other languages, such as Java, Python supports multiple inheritance. Hence, you can list any number of parent classes for a model.
Mixins ought to be orthogonal and easily composable. Drop a mixin into the list of base classes and it should just work. In this regard, their behavior is closer to composition than to inheritance.
Smaller mixins are better. Whenever a mixin becomes large and violates the Single Responsibility Principle, consider refactoring it into smaller classes. Let a mixin do one thing and do it well.
In our previous example, the model mixin that updates the `created` and `modified` times can be easily factored out, as shown in the following code:
```python
class TimeStampedModel(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True


class Postable(TimeStampedModel):
    message = models.TextField(max_length=500)
    ...

    class Meta:
        abstract = True


class Post(Postable):
    ...


class Comment(Postable):
    ...
```
We have two base classes now. However, the functionality is clearly separated. The mixin can be separated into its own module and reused in other contexts.
Problem: Every website stores a different set of user profile details. However, Django's built-in `User` model is meant for authentication details.
Solution: Create a user profile class with a one-to-one relation with the user model.
Out of the box, Django provides a pretty decent `User` model. You can use it when you create a superuser or log in to the admin interface. It has a few basic fields, such as full name, username, and e-mail.
However, most real-world projects keep a lot more information about users, such as their address, favorite movies, or their superpower abilities. From Django 1.5 onwards, the default `User` model can be extended or replaced. However, the official docs strongly recommend storing only authentication data, even in a custom user model (it belongs to the `auth` app, after all).
Certain projects need multiple types of users. For example, SuperBook can be used by superheroes and non-superheroes. There might be common fields and some distinctive fields based on the type of user.
The officially recommended solution is to create a user profile model. It should have a one-to-one relation with your user model. All the additional user information is stored in this model:
```python
class Profile(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL,
                                primary_key=True)
```
It is recommended that you set `primary_key` explicitly to `True` to prevent concurrency issues in some database backends, such as PostgreSQL. The rest of the model can contain any other user details, such as birth date, favorite color, and so on.
While designing the profile model, all the profile detail fields should be nullable or contain default values. Intuitively, a user cannot fill out all their profile details while signing up. Additionally, we will ensure that the signal handler doesn't pass any initial parameters while creating the profile instance.
Ideally, every time a user model instance is created, a corresponding user profile instance must be created as well. This is usually done using signals.
For example, we can listen for the `post_save` signal from the user model using the following signal handler:
```python
# signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.conf import settings

from . import models


@receiver(post_save, sender=settings.AUTH_USER_MODEL)
def create_profile_handler(sender, instance, created, **kwargs):
    if not created:
        return
    # Create the profile object, only if it is newly created
    profile = models.Profile(user=instance)
    profile.save()
```
Note that no initial parameters are passed to the profile model except the user instance.
Previously, there was no specific place for initializing the signal code. Typically, signals were imported or implemented in `models.py` (which was unreliable). However, with the app-loading refactor in Django 1.7, the location of the application initialization code is well defined.
First, in your application's `__init__.py` package, mention your app's `ProfileConfig`:

```python
# __init__.py
default_app_config = "profiles.apps.ProfileConfig"
```
Next, define `ProfileConfig`, a subclass of `AppConfig`, in `apps.py` and set up the signal in its `ready` method:
```python
# apps.py
from django.apps import AppConfig


class ProfileConfig(AppConfig):
    name = "profiles"
    verbose_name = 'User Profiles'

    def ready(self):
        from . import signals  # registers the signal handlers
```
With your signals set up, accessing `user.profile` should return a `Profile` object for all users, even newly created ones.
Now, a user's details will be in two different places within the admin: the authentication details in the usual user admin page and the same user's additional profile details in a separate profile admin page. This gets very cumbersome.
For convenience, the profile admin can be made inline to the default user admin by defining a custom `UserAdmin` as follows:
```python
# admin.py
from django.contrib import admin
from django.contrib.auth.admin import UserAdmin
from django.contrib.auth.models import User

from .models import Profile


class UserProfileInline(admin.StackedInline):
    model = Profile


class UserAdminWithProfile(UserAdmin):
    inlines = [UserProfileInline]


admin.site.unregister(User)
admin.site.register(User, UserAdminWithProfile)
```
Assume that you need several kinds of user profiles in your application. There needs to be a field to track which type of profile the user has, and the profile data itself can be stored either in separate models or in a unified model.
An aggregate profile approach is recommended since it gives the flexibility to change the profile types without loss of profile details and minimizes complexity. In this approach, the profile model contains a superset of all profile fields from all profile types.
For example, SuperBook will need a `SuperHero` type profile and an `Ordinary` (non-superhero) profile. It can be implemented using a single unified profile model as follows:
```python
class BaseProfile(models.Model):
    USER_TYPES = (
        (0, 'Ordinary'),
        (1, 'SuperHero'),
    )
    user = models.OneToOneField(settings.AUTH_USER_MODEL,
                                primary_key=True)
    user_type = models.IntegerField(null=True, choices=USER_TYPES)
    bio = models.CharField(max_length=200, blank=True, null=True)

    def __str__(self):
        return "{}: {:.20}".format(self.user, self.bio or "")

    class Meta:
        abstract = True


class SuperHeroProfile(models.Model):
    origin = models.CharField(max_length=100, blank=True, null=True)

    class Meta:
        abstract = True


class OrdinaryProfile(models.Model):
    address = models.CharField(max_length=200, blank=True, null=True)

    class Meta:
        abstract = True


class Profile(SuperHeroProfile, OrdinaryProfile, BaseProfile):
    pass
```
We grouped the profile details into several abstract base classes to separate concerns. The `BaseProfile` class contains all the common profile details, irrespective of the user type. It also has a `user_type` field that keeps track of the user's active profile.
The `SuperHeroProfile` and `OrdinaryProfile` classes contain the profile details specific to superhero and non-hero users, respectively. Finally, the `Profile` class derives from all these base classes to create a superset of profile details.
Some details to take care of while using this approach are as follows:

- All the profile fields belonging to a specific type must be nullable or have defaults, since they will stay empty for users of the other types
- Forms and views should present only the fields relevant to the user's `user_type`
Problem: Models can get large and unmanageable. Testing and maintenance get harder as a model does more than one thing.
Solution: Refactor out a set of related methods into a specialized `Service` object.
Fat models, thin views is an adage commonly told to Django beginners. Ideally, your views should not contain anything other than presentation logic.
However, over time, pieces of code that cannot be placed anywhere else tend to go into models. Soon, models become a dumping ground for code.
Some of the tell-tale signs that your model can use a `Service` object are as follows:

- Interactions with external services, for example, checking whether the user is eligible for a `SuperHero` profile with a web service
- Helper tasks that do not deal with the database

Models in Django follow the Active Record pattern. Ideally, they encapsulate both application logic and database access. However, keep the application logic minimal.
While testing, if we find ourselves unnecessarily mocking the database even when we are not using it, then we need to consider breaking up the model class. A `Service` object is recommended in such situations.
Service objects are plain old Python objects (POPOs) that encapsulate a 'service' or interactions with a system. They are usually kept in a separate file, named `services.py` or `utils.py`.
For example, code that checks a web service is sometimes dumped into a model method, as follows:
```python
class Profile(models.Model):
    ...

    def is_superhero(self):
        url = "http://api.herocheck.com/?q={0}".format(
            self.user.username)
        return webclient.get(url)
```
This method can be refactored to use a service object as follows:
```python
from .services import SuperHeroWebAPI


def is_superhero(self):
    return SuperHeroWebAPI.is_hero(self.user.username)
```
The service object can now be defined in `services.py` as follows:
```python
API_URL = "http://api.herocheck.com/?q={0}"


class SuperHeroWebAPI:
    ...

    @staticmethod
    def is_hero(username):
        url = API_URL.format(username)
        return webclient.get(url)
```
In most cases, the methods of a `Service` object are stateless, that is, they perform the action based solely on the function arguments, without using any class properties. Hence, it is better to explicitly mark them as static methods (as we have done for `is_hero`).
Consider refactoring your business logic or domain logic out of models into service objects. This way, you can use them outside your Django application as well.
Imagine there is a business reason to blacklist certain users from becoming superhero types based on their username. Our service object can be easily modified to support this:
```python
class SuperHeroWebAPI:
    ...

    @staticmethod
    def is_hero(username):
        blacklist = set(["syndrome", "kcka$$", "superfake"])
        url = API_URL.format(username)
        return username not in blacklist and webclient.get(url)
```
Ideally, service objects are self-contained. This makes them easy to test without mocking, say, the database. They can also be easily reused.
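To illustrate that testability, here is a sketch using the hypothetical names from above, with the web client made injectable so a test double can stand in for the network:

```python
API_URL = "http://api.herocheck.com/?q={0}"


class SuperHeroWebAPI:
    # Injectable dependency so tests can avoid real network calls
    webclient = None

    @staticmethod
    def is_hero(username):
        blacklist = set(["syndrome", "kcka$$", "superfake"])
        url = API_URL.format(username)
        return username not in blacklist and SuperHeroWebAPI.webclient.get(url)


class StubWebClient:
    """Test double: records requested URLs and always answers True."""

    def __init__(self):
        self.urls = []

    def get(self, url):
        self.urls.append(url)
        return True


SuperHeroWebAPI.webclient = StubWebClient()

assert SuperHeroWebAPI.is_hero("blitz") is True
assert SuperHeroWebAPI.is_hero("syndrome") is False  # blacklisted, no call made
assert SuperHeroWebAPI.webclient.urls == ["http://api.herocheck.com/?q=blitz"]
```

Note how the blacklisted name short-circuits before any web request is recorded; no database or network is involved anywhere in the test.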
In Django, time-consuming services are executed asynchronously using task queues, such as Celery. Typically, `Service` object actions are run as Celery tasks. Such tasks can be run periodically or after a delay.