HTTP requests to your web application usually entail database access, data processing, and template rendering. This is much more expensive in terms of processing than serving a static website.
The overhead in some requests can be significant when your site starts getting more and more traffic. This is where caching becomes valuable. By caching queries, calculation results, or rendered content in an HTTP request, you avoid repeating expensive operations in subsequent requests. This translates into shorter response times and less processing on the server side.
Django includes a robust cache system that allows you to cache data with different levels of granularity. You can cache a single query, the output of a specific view, parts of rendered template content, or your entire site. Items are stored in the cache for a default time, and you can specify this default timeout for cached data.
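This is how you will usually use the cache framework when your application gets an HTTP request: try to find the requested data in the cache; if it is found, return the cached data; if it is not found, perform the query or processing required to obtain the data, save the result in the cache, and return it.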
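The usual flow (check the cache, fall back to the expensive work on a miss, store the result) can be sketched as follows. This is only an illustration: the dictionary stands in for a real cache backend, and `expensive_query` is a hypothetical placeholder for a database query.

```python
# Illustrative stand-in for a cache backend; a real project would use Django's cache.
fake_cache = {}
calls = []  # records how many times the expensive work actually runs


def expensive_query():
    """Hypothetical placeholder for a slow database query or calculation."""
    calls.append(1)
    return ['Mathematics', 'Music', 'Physics']


def get_subjects():
    # 1. Try to find the requested data in the cache.
    data = fake_cache.get('all_subjects')
    if data is None:
        # 2. Cache miss: do the expensive work and store the result.
        data = expensive_query()
        fake_cache['all_subjects'] = data
    # 3. Return the data, whether it came from the cache or not.
    return data


first = get_subjects()   # miss: runs expensive_query()
second = get_subjects()  # hit: served from fake_cache
```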
You can read detailed information about Django's cache system at https://docs.djangoproject.com/en/1.8/topics/cache/.
Django comes with several cache backends. These are:
backends.memcached.MemcachedCache
or backends.memcached.PyLibMCCache
: A memcached backend. memcached is a fast and efficient memory-based cache server. The backend to use depends on the memcached Python bindings you choose.
backends.db.DatabaseCache
: Uses the database as the cache system.
backends.filebased.FileBasedCache
: Uses the file storage system. Serializes and stores each cache value as a separate file.
backends.locmem.LocMemCache
: Local memory cache backend. This is the default cache backend. This cache is per-process and thread-safe.
backends.dummy.DummyCache
: A dummy cache backend intended only for development. It implements the cache interface without actually caching anything.
We are going to use the memcached backend. Memcached runs in memory and is allotted a specified amount of RAM. When the allotted RAM is full, Memcached starts removing the oldest (least recently used) data to store new data.
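To make the eviction behaviour concrete, here is a toy sketch of a least-recently-used cache. It limits the number of items rather than bytes of RAM, and it is not how Memcached is implemented internally; the `TinyLRUCache` name is made up for this illustration.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy cache that evicts the least recently used key when full."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        # Reading a key marks it as recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_items:
            # Drop the entry that has gone unused the longest.
            self._data.popitem(last=False)


lru = TinyLRUCache(max_items=2)
lru.set('a', 1)
lru.set('b', 2)
lru.get('a')     # 'a' is now the most recently used key
lru.set('c', 3)  # the cache is full, so 'b' is evicted
```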
Download memcached from http://memcached.org/downloads. If you are using Linux, extract the downloaded archive and, from the extracted source directory, build and install memcached using the following command:
./configure && make && make test && sudo make install
If you are using Mac OS X, you can install Memcached with the Homebrew package manager using the command brew install memcached. You can download Homebrew from http://brew.sh.
If you are using Windows, you can find a Windows binary version of memcached at http://code.jellycan.com/memcached/.
After installing Memcached, open a shell and start it using the following command:
memcached -l 127.0.0.1:11211
Memcached will run on port 11211 by default. However, you can specify a custom host and port by using the -l option. You can find more information about Memcached at http://memcached.org.
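If you want to confirm that Memcached is accepting connections before moving on, a small check like the following can help. This is a sketch using only the Python standard library; the `is_listening` helper is a hypothetical name, and it only tests that something accepts TCP connections on the port, not that it is actually Memcached.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

With Memcached started as shown above, calling `is_listening('127.0.0.1', 11211)` should return True.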
After installing Memcached, you have to install its Python bindings. You can do it with the following command:
pip install python3-memcached==1.51
Django provides the following cache settings:
CACHES
: A dictionary containing all available caches for the project.
CACHE_MIDDLEWARE_ALIAS
: The cache alias to use for storage.
CACHE_MIDDLEWARE_KEY_PREFIX
: The prefix to use for cache keys. Set a prefix to avoid key collisions if you share the same cache between several sites.
CACHE_MIDDLEWARE_SECONDS
: The default number of seconds to cache pages.
The caching system for the project can be configured using the CACHES setting. This setting is a dictionary that allows you to specify the configuration for multiple caches. Each cache included in the CACHES dictionary can specify the following data:
BACKEND
: The cache backend to use.
KEY_FUNCTION
: A string containing a dotted path to a callable that takes a prefix, version, and key as arguments and returns a final cache key.
KEY_PREFIX
: A string prefix for all cache keys, to avoid collisions.
LOCATION
: The location of the cache. Depending on the cache backend, this might be a directory, a host and port, or a name for the in-memory backend.
OPTIONS
: Any additional parameters to be passed to the cache backend.
TIMEOUT
: The default timeout, in seconds, for storing the cache keys. 300 seconds by default, which is five minutes. If set to None, cache keys will not expire.
VERSION
: The default version number for the cache keys. Useful for cache versioning.
Let's configure the cache for our project. Edit the settings.py file of the educa project and add the following code to it:
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}
We are using the MemcachedCache backend. We specify its location using address:port notation. If you have multiple memcached instances, you can use a list for LOCATION.
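For reference, here is a sketch of a fuller configuration that uses more of the keys described earlier. Everything beyond the first memcached address is an assumption made for illustration: the second instance on port 11212, the timeout value, the `local` alias, and the `make_key` helper with its dotted path are all hypothetical.

```python
def make_key(key, key_prefix, version):
    """Hypothetical custom key function: joins prefix, version, and key."""
    return '{}:{}:{}'.format(key_prefix, version, key)


CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        # Two memcached instances; the second address is made up.
        'LOCATION': ['127.0.0.1:11211', '127.0.0.1:11212'],
        'TIMEOUT': 60 * 10,     # default expiry of ten minutes
        'KEY_PREFIX': 'educa',  # avoids collisions with other sites
        'VERSION': 1,
        # To use the function above, point KEY_FUNCTION at its dotted path:
        # 'KEY_FUNCTION': 'educa.cache_keys.make_key',  # hypothetical module
    },
    'local': {
        # A second cache, reachable in code as caches['local'].
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'educa-local',  # arbitrary name for this in-memory cache
    },
}
```

With this configuration, a key such as all_subjects would be stored under a final name that also carries the prefix and version, so two sites sharing the same memcached instance would not overwrite each other's data.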
There is a third-party package called django-memcache-status that displays statistics for your memcached instances in the administration site. For compatibility with Python 3, install it from the following fork using this command:
pip install git+git://github.com/zenx/django-memcache-status.git
Edit the settings.py
file and add 'memcache_status'
to the INSTALLED_APPS
setting. Make sure memcached is running, start the development server in another shell window, and open http://127.0.0.1:8000/admin/
in your browser. Log in to the administration site as a superuser. You should see the following block:
This graph shows the cache usage. The green color represents free cache while red indicates used space. If you click the title of the box, it shows detailed statistics about your memcached
instance.
We have set up memcached for our project and are able to monitor it. Let's start caching data!
Django provides the following levels of caching, listed from the finest to the coarsest granularity: the low-level cache API, template fragment caching, per-view caching, and the per-site cache.
The low-level cache API allows you to store objects in the cache with any granularity. It is located at django.core.cache
. You can import it like this:
from django.core.cache import cache
This uses the default cache. It's equivalent to caches['default']
. Accessing a specific cache is also possible via its alias:
from django.core.cache import caches
my_cache = caches['alias']
Let's take a look at how the cache API works. Open the shell with the command python manage.py shell and execute the following code:
>>> from django.core.cache import cache
>>> cache.set('musician', 'Django Reinhardt', 20)
We access the default cache backend and use set(key, value, timeout)
to store a key named 'musician'
with a value that is the string 'Django Reinhardt'
for 20 seconds. If we don't specify a timeout, Django uses the default timeout specified for the cache backend in the CACHES
setting. Now execute the following code:
>>> cache.get('musician')
'Django Reinhardt'
We retrieve the key from the cache. Wait for 20 seconds and execute the same code:
>>> cache.get('musician')
None
The 'musician'
cache key expired and the get()
method returns None because the key is not in the cache anymore.
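The expiry behaviour we just observed can be reproduced with a small toy cache. This is only a sketch of the semantics, not Django's or memcached's implementation; the `ExpiringCache` class and its injectable clock are hypothetical names introduced so the example can "wait" without sleeping.

```python
import time

_USE_DEFAULT = object()  # marker meaning "no timeout argument was given"


class ExpiringCache:
    """Toy in-memory cache with per-key timeouts (illustration only)."""

    def __init__(self, default_timeout=300, clock=time.monotonic):
        self.default_timeout = default_timeout
        self._clock = clock
        self._store = {}  # key -> (value, expiry time or None)

    def set(self, key, value, timeout=_USE_DEFAULT):
        if timeout is _USE_DEFAULT:
            timeout = self.default_timeout
        # A timeout of None means the key never expires.
        expires_at = None if timeout is None else self._clock() + timeout
        self._store[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # The entry has timed out: drop it and report a miss.
            del self._store[key]
            return default
        return value


# A fake clock lets us move time forward by hand.
now = [0.0]
demo_cache = ExpiringCache(clock=lambda: now[0])
demo_cache.set('musician', 'Django Reinhardt', 20)
before = demo_cache.get('musician')  # still within the 20 seconds
now[0] += 21
after = demo_cache.get('musician')   # expired, so this is a miss
```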
>>> from courses.models import Subject
>>> subjects = Subject.objects.all()
>>> cache.set('all_subjects', subjects)
We perform a queryset on the Subject
model and store the returned objects in the 'all_subjects'
key. Let's retrieve the cached data:
>>> cache.get('all_subjects')
[<Subject: Mathematics>, <Subject: Music>, <Subject: Physics>, <Subject: Programming>]
We are going to cache some queries in our views. Edit the views.py
file of the courses application and add the following import:
from django.core.cache import cache
In the get()
method of the CourseListView
, replace the following line:
subjects = Subject.objects.annotate(
    total_courses=Count('courses'))
With the following ones:
subjects = cache.get('all_subjects')
if not subjects:
    subjects = Subject.objects.annotate(
        total_courses=Count('courses'))
    cache.set('all_subjects', subjects)
In this code, first we try to get the all_subjects
key from the cache using cache.get()
. This returns None
if the given key is not found. If no key is found (not cached yet, or cached but timed out) we perform the query to retrieve all Subject
objects and their number of courses, and we cache the result using cache.set()
.
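One subtlety of the `if not subjects` check is that an empty result is also falsy, so it would be queried again on every request. Django's `cache.get()` accepts a default value as its second argument, which makes it possible to tell a miss apart from a cached empty value. The sketch below uses a small stand-in class (`DictCache`, a hypothetical name) with the same `get(key, default)` and `set(key, value)` shape so that it runs without a Django project; `load_subjects` is likewise a placeholder for the real query.

```python
class DictCache:
    """Stand-in exposing the get/set shape of Django's cache object."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


MISSING = object()  # unique marker that can never be a cached value
queries_run = []


def load_subjects():
    """Hypothetical query that legitimately returns no rows."""
    queries_run.append(1)
    return []


def get_subjects(cache):
    subjects = cache.get('all_subjects', MISSING)
    if subjects is MISSING:
        # Only a real miss triggers the query.
        subjects = load_subjects()
        cache.set('all_subjects', subjects)
    return subjects


cache = DictCache()
get_subjects(cache)
get_subjects(cache)  # the cached empty list counts as a hit
```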
Run the development server and open http://127.0.0.1:8000/
in your browser. When the view is executed, the cache key is not found and the QuerySet is executed. Open http://127.0.0.1:8000/admin/
in your browser and expand the memcached statistics. You should see usage data for the cache similar to the following:
Take a look at Curr Items, which should be 1. This shows that there is one item currently stored in the cache. Get Hits shows how many get commands were successful and Get Misses shows the get requests for keys that are missing. The Miss Ratio is calculated using both of them.
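As a rough guide to reading these numbers, the miss ratio is commonly defined as the share of get commands that did not find their key. The function below is a sketch under that assumption; the statistics panel may compute or round its figure differently.

```python
def miss_ratio(get_hits, get_misses):
    """Share of get commands that did not find their key, from 0.0 to 1.0."""
    total = get_hits + get_misses
    if total == 0:
        return 0.0  # no reads yet, so nothing has missed
    return get_misses / total
```

For example, one miss followed by three hits gives a ratio of 0.25, and the ratio keeps falling as more requests are served from the cache.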
Now navigate back to http://127.0.0.1:8000/
using your browser and reload the page several times. If you take a look at the cache statistics now, you will see several more reads (Get Hits and Cmd Get have increased).
Many times you will want to cache something that is based on dynamic data. In these cases, you have to build dynamic keys that contain all information required to uniquely identify the cached data. Edit the views.py
file of the courses application and modify the CourseListView
view to make it look like this:
class CourseListView(TemplateResponseMixin, View):
    model = Course
    template_name = 'courses/course/list.html'

    def get(self, request, subject=None):
        subjects = cache.get('all_subjects')
        if not subjects:
            subjects = Subject.objects.annotate(
                total_courses=Count('courses'))
            cache.set('all_subjects', subjects)
        all_courses = Course.objects.annotate(
            total_modules=Count('modules'))
        if subject:
            subject = get_object_or_404(Subject, slug=subject)
            key = 'subject_{}_courses'.format(subject.id)
            courses = cache.get(key)
            if not courses:
                courses = all_courses.filter(subject=subject)
                cache.set(key, courses)
        else:
            courses = cache.get('all_courses')
            if not courses:
                courses = all_courses
                cache.set('all_courses', courses)
        return self.render_to_response({'subjects': subjects,
                                        'subject': subject,
                                        'courses': courses})
In this case, we also cache both the full list of courses and the courses filtered by subject. We use the all_courses
cache key for storing all courses if no subject is given. If there is a subject we build the key dynamically with 'subject_{}_courses'.format(subject.id)
.
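When a key depends on several values, a small helper keeps key construction in one place. The sketch below is one possible approach, not part of Django: `build_cache_key` is a hypothetical function that joins the parts and falls back to a hash when the result would be too long or contain whitespace, since memcached keys are limited to 250 characters and cannot contain spaces or control characters.

```python
import hashlib

MAX_KEY_LENGTH = 250  # memcached rejects longer keys


def build_cache_key(prefix, *parts):
    """Join the parts into a key; hash it if it would be unsafe for memcached."""
    key = '_'.join([prefix] + [str(part) for part in parts])
    has_bad_chars = any(ch.isspace() or ord(ch) < 33 for ch in key)
    if len(key) > MAX_KEY_LENGTH or has_bad_chars:
        # Replace the unsafe key with a fixed-length digest of it.
        digest = hashlib.md5(key.encode('utf-8')).hexdigest()
        key = '{}_{}'.format(prefix, digest)
    return key
```

Calling `build_cache_key('subject', 3, 'courses')` yields the same key as the view above for the subject with ID 3, while free-form input such as a search phrase is replaced by a digest.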
It is important to note that we cannot use a cached queryset to build other querysets, since what we cached are actually the results of the queryset. So we cannot do:
courses = cache.get('all_courses')
courses.filter(subject=subject)
Instead we have to create the base queryset Course.objects.annotate(total_modules=Count('modules'))
, which is not going to be executed until it is forced, and use it to further restrict the queryset with all_courses.filter(subject=subject)
in case the data was not found in the cache.
Caching template fragments is a higher level approach. You need to load the cache template tags in your template using {% load cache %}
. Then you will be able to use the {% cache %}
template tag to cache specific template fragments. You will usually use the template tag as follows:
{% cache 300 fragment_name %}
...
{% endcache %}
The {% cache %}
tag has two required arguments: the timeout, in seconds, and a name for the fragment. If you need to cache content depending on dynamic data, you can do so by passing additional arguments to the {% cache %}
template tag to uniquely identify the fragment.
Edit the students/course/detail.html
template of the students application. Add the following code at the top of it, just after the {% extends %}
tag:
{% load cache %}
Then, replace the following lines:
{% for content in module.contents.all %}
  {% with item=content.item %}
    <h2>{{ item.title }}</h2>
    {{ item.render }}
  {% endwith %}
{% endfor %}
With the following ones:
{% cache 600 module_contents module %}
  {% for content in module.contents.all %}
    {% with item=content.item %}
      <h2>{{ item.title }}</h2>
      {{ item.render }}
    {% endwith %}
  {% endfor %}
{% endcache %}
We cache this template fragment using the name module_contents
and passing the current Module
object to it. Thus, we uniquely identify the fragment. This is important to avoid caching a module's contents and serving the wrong content when a different module is requested.
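The idea behind the extra arguments can be sketched in a few lines: the fragment name and the text form of each argument are combined into a single cache key, so each distinct combination gets its own cached copy. The function below is an illustration only; `fragment_key` is a hypothetical name and Django's own key format differs in detail.

```python
import hashlib


def fragment_key(fragment_name, *vary_on):
    """Build one cache key per combination of fragment name and arguments."""
    joined = ':'.join(str(value) for value in vary_on)
    digest = hashlib.md5(joined.encode('utf-8')).hexdigest()
    return 'fragment.{}.{}'.format(fragment_name, digest)


# Two different modules produce two different keys for the same fragment.
key_module_1 = fragment_key('module_contents', 'module 1')
key_module_2 = fragment_key('module_contents', 'module 2')
```

Because the arguments are compared as text in this sketch, two objects with identical string representations would share a key. Passing an unambiguous value such as the object's ID avoids that risk.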
If the USE_I18N
setting is set to True
, the per-site middleware cache will respect the active language. If you use the {% cache %}
template tag, you have to use one of the translation-specific variables available in templates to achieve the same result, such as {% cache 600 name request.LANGUAGE_CODE %}
.
You can cache the output of individual views using the cache_page
decorator located at django.views.decorators.cache
. The decorator requires a timeout argument (in seconds).
Let's use it in our views. Edit the urls.py
file of the students application and add the following import:
from django.views.decorators.cache import cache_page
Then apply the cache_page
decorator to the student_course_detail
and student_course_detail_module
URL patterns, as follows:
url(r'^course/(?P<pk>\d+)/$',
    cache_page(60 * 15)(views.StudentCourseDetailView.as_view()),
    name='student_course_detail'),
url(r'^course/(?P<pk>\d+)/(?P<module_id>\d+)/$',
    cache_page(60 * 15)(views.StudentCourseDetailView.as_view()),
    name='student_course_detail_module'),
Now the result for the StudentCourseDetailView
is cached for 15 minutes.
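Conceptually, a per-view cache remembers the response produced for each URL and reuses it until the timeout passes. The following sketch shows that idea with a plain decorator; it is not how `cache_page` is implemented, and `cache_response` and `course_detail` are hypothetical names used only here.

```python
import functools
import time


def cache_response(timeout, clock=time.monotonic):
    """Decorator sketch: reuse a view's result for `timeout` seconds per path."""
    def decorator(view):
        stored = {}  # path -> (response, expiry time)

        @functools.wraps(view)
        def wrapper(path):
            entry = stored.get(path)
            if entry is not None and clock() < entry[1]:
                return entry[0]  # still fresh: skip the view entirely
            response = view(path)
            stored[path] = (response, clock() + timeout)
            return response
        return wrapper
    return decorator


renders = []


@cache_response(60 * 15)
def course_detail(path):
    """Hypothetical view that records each real render."""
    renders.append(path)
    return 'content for {}'.format(path)


course_detail('/course/1/')
course_detail('/course/1/')  # served from the stored copy
course_detail('/course/2/')  # different URL, so the view runs again
```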
The per-site cache is the highest-level cache. It allows you to cache your entire site.
To enable the per-site cache, edit the settings.py
file of your project and add the UpdateCacheMiddleware
and FetchFromCacheMiddleware
classes to the MIDDLEWARE_CLASSES
setting as follows:
MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.cache.UpdateCacheMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    # ...
)
Remember that middlewares are executed in the given order during the request phase, and in reverse order during the response phase. UpdateCacheMiddleware
is placed before CommonMiddleware
because it runs during response time, when middlewares are executed in reverse order. FetchFromCacheMiddleware
is placed after CommonMiddleware
intentionally, because it needs to access request data set by the latter.
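The ordering argument is easier to follow when both phases are written out. The sketch below simply lists the middleware names from our setting in request order and in response order; it is a simplified model (the `trace` helper is hypothetical) and does not run any real middleware.

```python
MIDDLEWARE_ORDER = [
    'SessionMiddleware',
    'UpdateCacheMiddleware',
    'CommonMiddleware',
    'FetchFromCacheMiddleware',
    'CsrfViewMiddleware',
]


def trace(order):
    """Return the request-phase and response-phase orders for a middleware list."""
    request_phase = list(order)             # top to bottom
    response_phase = list(reversed(order))  # bottom to top
    return request_phase, response_phase


request_phase, response_phase = trace(MIDDLEWARE_ORDER)
```

In `request_phase`, FetchFromCacheMiddleware comes after CommonMiddleware, and in `response_phase`, UpdateCacheMiddleware comes after CommonMiddleware, which matches the reasoning above.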
Then, add the following settings to the settings.py
file:
CACHE_MIDDLEWARE_ALIAS = 'default'
CACHE_MIDDLEWARE_SECONDS = 60 * 15  # 15 minutes
CACHE_MIDDLEWARE_KEY_PREFIX = 'educa'
In these settings, we use the default cache for our cache middleware and we set the global cache timeout to 15 minutes. We also specify a prefix for all cache keys to avoid collisions in case we use the same memcached backend for multiple projects. Our site will now cache and return cached content for all GET requests.
We have done this to test the per-site cache functionality. However, the per-site cache is not suitable for us, since the course management views need to show updated data to instantly reflect any changes. The best approach to follow in our project is to cache the templates or views that are used to display course contents to students.
We have seen an overview of the methods provided by Django to cache data. You should define your cache strategy wisely and prioritize the most expensive querysets or calculations.