Using the cache framework

HTTP requests to your web application usually entail database access, data processing, and template rendering. This is much more expensive in terms of processing than serving a static website.

The overhead of some requests can be significant once your site starts getting more and more traffic, and this is where caching becomes valuable. By caching queries, calculation results, or rendered content during an HTTP request, you avoid repeating costly operations in the following requests. This translates into shorter response times and less processing on the server side.

Django includes a robust cache system that allows you to cache data with different levels of granularity. You can cache a single query, the output of a specific view, parts of rendered template content, or your entire site. Items are stored in the cache system for a default time, but you can specify a custom timeout for cached data.

This is how you will usually use the cache framework when your application gets an HTTP request:

  1. Try to find the requested data in the cache.
  2. If found, return the cached data.
  3. If not found, perform the following steps:
    1. Perform the query or processing required to obtain the data.
    2. Save the generated data in the cache.
    3. Return the data.
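
In code, this pattern looks roughly like the following sketch, where the key name and the run_expensive_query() helper are purely illustrative:

data = cache.get('some_key')
if data is None:
    # Not found in the cache: perform the query or calculation.
    data = run_expensive_query()  # hypothetical expensive operation
    # Save the generated data in the cache for subsequent requests.
    cache.set('some_key', data)
# Use the data, either freshly generated or cached.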

You can read detailed information about Django's cache system at https://docs.djangoproject.com/en/1.8/topics/cache/.

Available cache backends

Django comes with several cache backends. These are:

  • backends.memcached.MemcachedCache or backends.memcached.PyLibMCCache: A memcached backend. memcached is a fast and efficient memory-based cache server. The backend to use depends on the memcached Python bindings you choose.
  • backends.db.DatabaseCache: Use the database as cache system.
  • backends.filebased.FileBasedCache: Use the file storage system. Serializes and stores each cache value as a separate file.
  • backends.locmem.LocMemCache: A local memory cache backend. This cache is per-process and thread-safe. This is the default cache backend.
  • backends.dummy.DummyCache: A dummy cache backend intended only for development. It implements the cache interface without actually caching anything.

    Note

    For optimal performance, use a memory-based cache backend such as the Memcached backend.

Installing memcached

We are going to use the memcached backend. Memcached runs in memory and is allotted a specified amount of RAM. When the allotted RAM is full, Memcached starts removing the oldest data to make room for new data.

Download memcached from http://memcached.org/downloads. If you are using Linux, you can install memcached using the following command:

./configure && make && make test && sudo make install

If you are using Mac OS X, you can install Memcached with the Homebrew package manager using the command brew install memcached. You can download Homebrew from http://brew.sh.

If you are using Windows, you can find a Windows binary version of memcached at http://code.jellycan.com/memcached/.

After installing Memcached, open a shell and start it using the following command:

memcached -l 127.0.0.1:11211

Memcached will run on port 11211 by default. However, you can specify a custom host and port by using the -l option. You can find more information about Memcached at http://memcached.org.

After installing Memcached, you have to install its Python bindings. You can do it with the following command:

pip install python3-memcached==1.51

Cache settings

Django provides the following cache settings:

  • CACHES: A dictionary containing all available caches for the project.
  • CACHE_MIDDLEWARE_ALIAS: The cache alias to use for storage.
  • CACHE_MIDDLEWARE_KEY_PREFIX: The prefix to use for cache keys. Set a prefix to avoid key collisions if you share the same cache between several sites.
  • CACHE_MIDDLEWARE_SECONDS: The default number of seconds to cache pages.

The caching system for the project can be configured using the CACHES setting. This setting is a dictionary that allows you to specify the configuration for multiple caches. Each cache included in the CACHES dictionary can specify the following data:

  • BACKEND: The cache backend to use.
  • KEY_FUNCTION: A string containing a dotted path to a callable that takes a prefix, version, and key as arguments and returns a final cache key.
  • KEY_PREFIX: A string prefix for all cache keys, to avoid collisions.
  • LOCATION: The location of the cache. Depending on the cache backend, this might be a directory, a host and port, or a name for the in-memory backend.
  • OPTIONS: Any additional parameters to be passed to the cache backend.
  • TIMEOUT: The default timeout, in seconds, for storing the cache keys. 300 seconds by default, which is five minutes. If set to None cache keys will not expire.
  • VERSION: The default version number for the cache keys. Useful for cache versioning.
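
As an illustration of these options (the backend, location, and values here are examples, not the configuration we will use for this project), a cache could be defined as follows:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/var/tmp/django_cache',  # a directory for this backend
        'TIMEOUT': 60 * 5,       # entries expire after five minutes
        'KEY_PREFIX': 'mysite',  # avoid key collisions between sites
        'VERSION': 1,
        'OPTIONS': {
            'MAX_ENTRIES': 1000,  # option specific to this backend
        },
    }
}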

Adding memcached to your project

Let's configure the cache for our project. Edit the settings.py file of the educa project and add the following code to it:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

We are using the MemcachedCache backend. We specify its location using address:port notation. If you have multiple memcached instances you can use a list for LOCATION.
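
For example, a configuration that distributes the cache over two memcached instances could look like this (the second address is hypothetical):

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            '127.0.0.1:11211',
            '127.0.0.1:11212',  # a second, hypothetical instance
        ],
    }
}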

Monitoring memcached

There is a third-party package called django-memcache-status that displays statistics for your memcached instances in the administration site. For compatibility with Python 3, install it from the following fork using this command:

pip install git+git://github.com/zenx/django-memcache-status.git

Edit the settings.py file and add 'memcache_status' to the INSTALLED_APPS setting. Make sure memcached is running, start the development server in another shell window, and open http://127.0.0.1:8000/admin/ in your browser. Log in to the administration site with a superuser account. You should see the following block:

[Image: the memcached status block in the administration site]

This graph shows the cache usage. The green color represents free cache while red indicates used space. If you click the title of the box, it shows detailed statistics about your memcached instance.

We have set up memcached for our project and we are able to monitor it. Let's start caching data!

Cache levels

Django provides the following levels of caching, listed below from the highest to the lowest granularity:

  • Low-level cache API: Provides the highest granularity. Allows you to cache specific queries or calculations.
  • Per-view cache: Provides caching for individual views.
  • Template cache: Allows you to cache template fragments.
  • Per-site cache: The highest-level cache. It caches your entire site.

    Note

    Think about your cache strategy before implementing caching. Focus first on expensive queries or calculations that are not calculated on a per-user basis.

Using the low-level cache API

The low-level cache API allows you to store objects in the cache with any granularity. It is located at django.core.cache. You can import it like this:

from django.core.cache import cache

This uses the default cache. It's equivalent to caches['default']. Accessing a specific cache is also possible via its alias:

from django.core.cache import caches
my_cache = caches['alias']

Let's take a look at how the cache API works. Open the shell with the command python manage.py shell and execute the following code:

>>> from django.core.cache import cache
>>> cache.set('musician', 'Django Reinhardt', 20)

We access the default cache backend and use set(key, value, timeout) to store a key named 'musician' with a value that is the string 'Django Reinhardt' for 20 seconds. If we don't specify a timeout, Django uses the default timeout specified for the cache backend in the CACHES setting. Now execute the following code:

>>> cache.get('musician')
'Django Reinhardt'

We retrieve the key from the cache. Wait for 20 seconds and execute the same code:

>>> cache.get('musician')
None

The 'musician' cache key expired and the get() method returns None because the key is not in the cache anymore.

Note

Always avoid storing a None value in a cache key because you won't be able to distinguish between the actual value and a cache miss.
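
If you ever need to cache a value that may legitimately be None, a common workaround is to pass a sentinel object as the default value of cache.get(). The following sketch is illustrative; the key name and the compute_value() helper are hypothetical:

missing = object()
value = cache.get('maybe_none_key', missing)
if value is missing:
    # Real cache miss: compute the value and store it, even if it is None.
    value = compute_value()  # hypothetical calculation
    cache.set('maybe_none_key', value)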

Let's cache a QuerySet:

>>> from courses.models import Subject
>>> subjects = Subject.objects.all()
>>> cache.set('all_subjects', subjects)

We build a QuerySet for the Subject model and store the returned objects in the 'all_subjects' key. Let's retrieve the cached data:

>>> cache.get('all_subjects')
[<Subject: Mathematics>, <Subject: Music>, <Subject: Physics>, <Subject: Programming>]

We are going to cache some queries in our views. Edit the views.py file of the courses application and add the following import:

from django.core.cache import cache

In the get() method of the CourseListView, replace the following line:

subjects = Subject.objects.annotate(
               total_courses=Count('courses'))

With the following ones:

subjects = cache.get('all_subjects')
if not subjects:
    subjects = Subject.objects.annotate(
                   total_courses=Count('courses'))
    cache.set('all_subjects', subjects)

In this code, we first try to get the all_subjects key from the cache using cache.get(). This returns None if the given key is not found. If no key is found (not cached yet, or cached but timed out), we perform the query to retrieve all Subject objects and their number of courses, and we cache the result using cache.set().

Run the development server and open http://127.0.0.1:8000/ in your browser. When the view is executed, the cache key is not found and the QuerySet is executed. Open http://127.0.0.1:8000/admin/ in your browser and expand the memcached statistics. You should see usage data for the cache similar to the following:

[Image: memcached usage statistics in the administration site]

Take a look at Curr Items, which should be 1. This shows that there is one item currently stored in the cache. Get Hits shows how many get commands were successful and Get Misses shows the get requests for keys that are missing. The Miss Ratio is calculated using both of them: the number of misses divided by the total number of get commands.

Now navigate back to http://127.0.0.1:8000/ using your browser and reload the page several times. If you take a look at the cache statistics now, you will see several more reads (Get Hits and Cmd Get will have increased).

Caching based on dynamic data

Many times you will want to cache something that is based on dynamic data. In these cases, you have to build dynamic keys that contain all information required to uniquely identify the cached data. Edit the views.py file of the courses application and modify the CourseListView view to make it look like this:

class CourseListView(TemplateResponseMixin, View):
    model = Course
    template_name = 'courses/course/list.html'

    def get(self, request, subject=None):
        subjects = cache.get('all_subjects')
        if not subjects:
            subjects = Subject.objects.annotate(
                           total_courses=Count('courses'))
            cache.set('all_subjects', subjects)
        all_courses = Course.objects.annotate(
                           total_modules=Count('modules'))
        if subject:
            subject = get_object_or_404(Subject, slug=subject)
            key = 'subject_{}_courses'.format(subject.id)
            courses = cache.get(key)
            if not courses:
                courses = all_courses.filter(subject=subject)
                cache.set(key, courses)
        else:
            courses = cache.get('all_courses')
            if not courses:
                courses = all_courses
                cache.set('all_courses', courses)
        return self.render_to_response({'subjects': subjects,
                                        'subject': subject,
                                        'courses': courses})

In this case, we also cache both all courses and courses filtered by subject. We use the all_courses cache key for storing all courses if no subject is given. If there is a subject we build the key dynamically with 'subject_{}_courses'.format(subject.id).

It is important to note that we cannot use a cached QuerySet to build other QuerySets, since what we cached is actually the evaluated results of the QuerySet. So we cannot do the following:

courses = cache.get('all_courses')
courses.filter(subject=subject)

Instead we have to create the base queryset Course.objects.annotate(total_modules=Count('modules')), which is not going to be executed until it is forced, and use it to further restrict the queryset with all_courses.filter(subject=subject) in case the data was not found in the cache.

Caching template fragments

Caching template fragments is a higher level approach. You need to load the cache template tags in your template using {% load cache %}. Then you will be able to use the {% cache %} template tag to cache specific template fragments. You will usually use the template tag as follows:

{% cache 300 fragment_name %}
    ...
{% endcache %}

The {% cache %} tag has two required arguments: The timeout, in seconds, and a name for the fragment. If you need to cache content depending on dynamic data, you can do so by passing additional arguments to the {% cache %} template tag to uniquely identify the fragment.

Edit the /students/course/detail.html template of the students application. Add the following code at the top of it, just after the {% extends %} tag:

{% load cache %}

Then, replace the following lines:

{% for content in module.contents.all %}
    {% with item=content.item %}
        <h2>{{ item.title }}</h2>
        {{ item.render }}
    {% endwith %}
{% endfor %}

With the following ones:

{% cache 600 module_contents module %}
    {% for content in module.contents.all %}
        {% with item=content.item %}
            <h2>{{ item.title }}</h2>
            {{ item.render }}
        {% endwith %}
    {% endfor %}
{% endcache %}

We cache this template fragment using the name module_contents and passing the current Module object to it. Thus, we uniquely identify the fragment. This is important to avoid caching a module's contents and serving the wrong content when a different module is requested.

Note

If the USE_I18N setting is set to True, the per-site middleware cache will respect the active language. If you use the {% cache %} template tag, you have to use one of the translation-specific variables available in templates to achieve the same result, such as {% cache 600 name request.LANGUAGE_CODE %}.

Caching views

You can cache the output of individual views using the cache_page decorator located at django.views.decorators.cache. The decorator requires a timeout argument (in seconds).
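
If you had a function-based view, you could also apply cache_page directly as a decorator; the following is a minimal sketch with a hypothetical view:

from django.http import HttpResponse
from django.views.decorators.cache import cache_page

@cache_page(60 * 15)  # cache this view's response for 15 minutes
def course_overview(request):
    # Hypothetical function-based view; expensive work would go here.
    return HttpResponse('Course overview')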

Let's use it in our views. Edit the urls.py file of the students application and add the following import:

from django.views.decorators.cache import cache_page

Then, apply the cache_page decorator to the student_course_detail and student_course_detail_module URL patterns, as follows:

url(r'^course/(?P<pk>\d+)/$',
    cache_page(60 * 15)(views.StudentCourseDetailView.as_view()),
    name='student_course_detail'),

url(r'^course/(?P<pk>\d+)/(?P<module_id>\d+)/$',
    cache_page(60 * 15)(views.StudentCourseDetailView.as_view()),
    name='student_course_detail_module'),

Now the result for the StudentCourseDetailView is cached for 15 minutes.

Note

The per-view cache uses the URL to build the cache key. Multiple URLs pointing to the same view will be cached separately.

Using the per-site cache

This is the highest-level cache. It allows you to cache your entire site.

To enable the per-site cache, edit the settings.py file of your project and add the UpdateCacheMiddleware and FetchFromCacheMiddleware classes to the MIDDLEWARE_CLASSES setting as follows:

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.cache.UpdateCacheMiddleware',
    'django.middleware.common.CommonMiddleware',

    'django.middleware.cache.FetchFromCacheMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    # ...
)

Remember that middleware is executed in the given order during the request phase, and in reverse order during the response phase. UpdateCacheMiddleware is placed before CommonMiddleware because it runs during the response phase, when middleware is executed in reverse order. FetchFromCacheMiddleware is placed after CommonMiddleware intentionally, because it needs to access request data set by the latter.

Then, add the following settings to the settings.py file:

CACHE_MIDDLEWARE_ALIAS = 'default'
CACHE_MIDDLEWARE_SECONDS = 60 * 15  # 15 minutes
CACHE_MIDDLEWARE_KEY_PREFIX = 'educa'

In these settings we use the default cache for our cache middleware and we set the global cache timeout to 15 minutes. We also specify a prefix for all cache keys to avoid collisions in case we use the same memcached backend for multiple projects. Our site will now cache and return cached content for all GET requests.

We have done this to test the per-site cache functionality. However, the per-site cache is not suitable for us, since the course management views need to show updated data to instantly reflect any changes. The best approach to follow in our project is to cache the templates or views that are used to display course contents to students.

We have seen an overview of the methods provided by Django to cache data. You should define your cache strategy wisely and prioritize the most expensive querysets or calculations.
