CHAPTER 8


Backend Protocols

As a framework, Django’s purpose is to provide a cohesive set of interfaces to make the most common tasks easier. Some of these tools are contained entirely within Django itself, where it’s easy to maintain consistency. Many other features are—or at least, could be—provided by external software packages.

Although Django itself supports some of the most common software packages for these various features, there are many more out there, especially in corporate environments. In addition to a developer’s preferences for one type of database over another, many other servers are already in use by existing applications that can’t be easily converted to use something different.

Because these types of problems do come up in real life, Django provides easy ways to reference these features without worrying about what implementation actually makes it happen in the background. This same mechanism also allows you to swap out many of these lower-level features with third-party code, to support connecting to other systems or just to customize some facet of behavior.

The sections listed throughout this chapter serve something of a dual purpose. In addition to documenting Django’s generic API for each of these features, each section will also describe how a new backend should be written to implement these features. This includes not only what classes and methods to declare, but also what the package structure might look like, as well as how each piece of the puzzle is expected to behave.

Database Access

Connecting to databases is one of the most fundamental requirements of a modern Web application, and there are a variety of options available. Currently, Django ships with support for some of the more popular open-source database engines, including MySQL, PostgreSQL and SQLite, and even some commercial offerings such as Oracle.

Given the unique features and SQL inconsistencies of different database systems, Django requires an extra layer between its models and the database itself, which must be written specifically for each database engine used. The supported options each ship within Django as a separate Python package containing this intermediary layer, but other databases can also be supported by providing this layer externally.

While Python provides a standardized API for accessing databases, PEP-249, each database system interprets the base SQL syntax in a slightly different way and supports a different set of features on top of it, so this section will focus on the areas Django provides for hooking into the way models access the database. The nitty-gritty details of formulating the right queries in each situation are left to the reader.

django.db.backends

This references the backend package's base module, through which all of the database's functionality can be accessed. Accessing the database backend in this manner ensures a unified, consistent interface, regardless of which database package is being used behind the scenes.

Django does a lot of work to make this level of access unnecessary, but there’s only so far it can go without overcomplicating things. When the ORM fails to offer some necessary bit of functionality—for example, updating one column based on the value of another column in pure SQL—it’s always possible to go straight to the source and peek at what’s really going on and adjust the standard behavior or replace it altogether.

Because this is really just an alias for a backend-specific module, the full import paths listed throughout this chapter are only valid when trying to access the database in this manner. When implementing a new backend, the package path will be specific to that backend. For instance, if a backend for connecting with IBM's DB2 were placed in a package named db2, this module would actually be located at db2/base.py.
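
To make this concrete, the layout of such a db2 package might look like the following sketch, with each module matching one of the pieces described later in this section:

db2/
    __init__.py
    base.py           # DatabaseWrapper, DatabaseError, IntegrityError
    creation.py       # DatabaseCreation
    introspection.py  # DatabaseIntrospection
    client.py         # DatabaseClient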

DatabaseWrapper

One of the main features of a database backend is the DatabaseWrapper, the class that acts as a bridge between Django and the features of the database library itself. All database features and operations go through this class, in particular an instance of it that’s made available at django.db.connection.

An instance of DatabaseWrapper is created automatically, using the DATABASE_OPTIONS setting as a dictionary of keyword arguments. There isn’t any mandated set of arguments for this class, so it’s essential to document what arguments the backend accepts so developers can customize it accordingly.

There are a few attributes and methods on the DatabaseWrapper class that define some of the more general aspects of the backend’s behavior. Most of these are suitably defined in a base class provided to make this easier. By subclassing django.db.backends.BaseDatabaseWrapper, some sensible default behaviors can be inherited.

Though individual backends are free to override them with whatever custom behavior is appropriate, some must always be explicitly defined by a backend’s DatabaseWrapper. Where that’s the case, the following sections will state this requirement directly.
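
As a rough sketch under those rules, a new backend's DatabaseWrapper might start out like this, using the base classes Django supplies and leaving out the connection handling that varies from one database library to the next:

from django.db.backends import (BaseDatabaseFeatures,
    BaseDatabaseOperations, BaseDatabaseWrapper)

class DatabaseWrapper(BaseDatabaseWrapper):
    # A minimal sketch only. The keyword arguments come straight from
    # the DATABASE_OPTIONS setting, so any options the backend accepts
    # here should be documented for developers.
    def __init__(self, **kwargs):
        super(DatabaseWrapper, self).__init__(**kwargs)
        # A real backend replaces these with its own subclasses,
        # described in the sections that follow.
        self.features = BaseDatabaseFeatures()
        self.ops = BaseDatabaseOperations()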

DatabaseWrapper.features

This object, typically an instance of a class named DatabaseFeatures defined by the backend, contains attributes to indicate whether the backend supports each of a variety of database-related features Django can take advantage of. While the class could technically be named anything, because it's only ever accessed as an attribute of DatabaseWrapper, it's always best to remain consistent with Django's own naming conventions to avoid confusion.

As with DatabaseWrapper itself, Django provides a base class specifying defaults for all the available attributes on this object. Located at django.db.backends.BaseDatabaseFeatures, this can be used to greatly simplify the definition of features in a particular backend. Simply override whatever feature definitions are different for the backend in question, as the following sketch shows.
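
For example, a backend for a database that, like Oracle, supports savepoints and treats empty strings as NULL would only need to declare those two differences:

from django.db.backends import BaseDatabaseFeatures

class DatabaseFeatures(BaseDatabaseFeatures):
    # Only values that differ from the inherited defaults are listed.
    # Setting uses_savepoints also obligates the backend to implement
    # the savepoint methods described in the next section.
    uses_savepoints = True
    interprets_empty_strings_as_nulls = True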

This is a list of supported features and their default support status:

  • allows_group_by_pk—Indicates whether GROUP BY clauses can use the primary key column. If so, Django can use this to optimize queries in these situations; defaults to False.
  • can_combine_inserts_with_and_without_auto_increment_pk—When inserting multiple records in one pass, this attribute indicates whether the backend can support some inserting records that have values for auto-incrementing primary keys alongside other records that don’t have values. This defaults to False, where Django will simply remove those primary key values from the data before inserting the records into the database.
  • can_defer_constraint_checks—Indicates whether the database allows a record to be deleted without first nullifying any relationships that point to that record; defaults to False.
  • can_distinct_on_fields—Indicates whether the database supports using the DISTINCT ON clause to only check uniqueness of certain fields; defaults to False. If set to True, be sure to also override the distinct_sql() method, described in the next section, to supply the appropriate clause when fields are requested.
  • can_introspect_foreign_keys—Indicates whether the database provides a way for Django to determine what foreign keys are in use; defaults to True.
  • can_return_id_from_insert—Indicates whether the backend can provide the new auto-incremented primary key ID immediately after the record is inserted. Defaults to False; if set to True, you’ll need to also supply the return_insert_id() function described in the next section.
  • can_use_chunked_reads—Indicates whether the database can iterate over portions of the result set without reading it all into memory at once. Defaults to True; if False, Django will load all results into memory before passing them back to an application.
  • empty_fetchmany_value—Specifies what value the database library returns to indicate that no more data is available, when fetching multiple rows; defaults to an empty list.
  • has_bulk_insert—Indicates whether the backend supports inserting multiple records in a single SQL statement; defaults to False.
  • has_select_for_update—Indicates whether the database supports SELECT FOR UPDATE queries, which locks the row while working with it; defaults to False.
  • has_select_for_update_nowait—If you use SELECT FOR UPDATE, and another query already has a lock, some backends allow you to specify a NOWAIT option to fail immediately, rather than wait for the lock to be released. This attribute indicates whether the database supports this feature; defaults to False.
  • interprets_empty_strings_as_nulls—Indicates whether the database treats an empty string as the same value as NULL; defaults to False.
  • needs_datetime_string_cast—Indicates whether dates need to be converted from a string to a datetime object after being retrieved from the database; defaults to True.
  • related_fields_match_type—Indicates whether the database requires relationship fields to be of the same type as the fields they relate to. This is used specifically for the PositiveIntegerField and PositiveSmallIntegerField types; if True, the actual type of the related field will be used to describe the relationship; if False—the default—Django will use an IntegerField instead.
  • supports_mixed_date_datetime_comparisons—Indicates whether the database supports comparing a date to a datetime using a timedelta when finding records; defaults to True. If set to True, make sure to also supply the date_interval_sql() method described in the next section.
  • supports_select_related—Indicates whether the backend allows a QuerySet to pull in related information in advance, to reduce the number of queries in many cases. It defaults to True, but can be set to False when working with non-relational databases, where the notion of “related” doesn’t really apply in the same way.
  • supports_tablespaces—Indicates whether the database supports tablespaces. They're not part of the SQL standard, so this defaults to False. If this is set to True, be sure to implement the tablespace_sql() method described in the next section.
  • update_can_self_select—Indicates whether the database is capable of performing a SELECT subquery on a table that’s currently being modified with an UPDATE query; defaults to True.
  • uses_autocommit—Indicates whether the backend allows the database to manage auto-commit behavior directly; defaults to False.
  • uses_custom_query_class—Indicates whether the backend supplies its own Query class, which would be used to customize how queries are performed; defaults to False.
  • uses_savepoints—Indicates whether the database supports savepoints in addition to full transactions. Savepoints allow database queries to be rolled back on a more granular basis, without requiring the entire transaction to be undone if something goes wrong. This attribute defaults to False; setting it to True will also require implementations for the savepoint_create_sql(), savepoint_commit_sql(), and savepoint_rollback_sql(sid) methods described in the next section.

There are some additional attributes on this class that aren’t used directly by Django, except in tests. If you try to use any of these features, Django will simply pass through the raw database errors. These attributes are only used in tests to confirm that the database should in fact raise an error for the related operation.

  • allow_sliced_subqueries—Indicates whether the backend can perform slice operations on subqueries; defaults to True.
  • allows_primary_key_0—Indicates whether the backend allows 0 to be used as the value of a primary key column; defaults to True.
  • has_real_datatype—Indicates whether the database has a native datatype to represent real numbers; defaults to False.
  • ignores_nulls_in_unique_constraints—When checking for duplicates on a table with a unique constraint spanning multiple columns, some databases will take NULL values into account and prevent duplicates, while others will ignore them. This attribute defaults to True, which indicates that the database will allow duplicate entries if the only duplicated columns contain NULL values.
  • requires_explicit_null_ordering_when_grouping—Indicates whether the database needs an extra ORDER BY NULL clause when using a GROUP BY clause to prevent the database from trying to order the records unnecessarily; defaults to False.
  • requires_rollback_on_dirty_transaction—If a transaction can’t be completed for any reason, this attribute indicates whether the transaction needs to be rolled back before a new transaction can be started; defaults to False.
  • supports_1000_query_parameters—Indicates whether the backend supports up to 1,000 parameters passed into the query, particularly when using the IN operator; defaults to True.
  • supports_bitwise_or—As its name suggests, this one indicates whether the database supports the bitwise OR operation; defaults to True.
  • supports_date_lookup_using_string—Indicates whether you can use strings instead of numbers when querying against date and datetime fields; defaults to True.
  • supports_forward_references—If the database checks foreign key constraints at the end of the transaction, one record will be able to reference another that has yet to be added to the transaction. This is True by default, but you’ll need to set it to False if the database instead checks these constraints for each record inside a transaction.
  • supports_long_model_names—This one is more self-explanatory, indicating whether the database allows table names to be longer than you might normally expect. This defaults to True and is mostly used to test MySQL, which supports only 64 characters in a table name.
  • supports_microsecond_precision—Indicates whether datetime and time fields support microseconds at the database level; defaults to True.
  • supports_regex_backreferencing—Indicates whether the database’s regular expression engine supports the use of grouping and backreferences of those groups; defaults to True.
  • supports_sequence_reset—Indicates whether the database supports resetting sequences; defaults to True.
  • supports_subqueries_in_group_by—Indicates whether the database supports selecting from a subquery while also performing aggregation using a GROUP BY clause; defaults to True.
  • supports_timezones—Indicates whether you can supply datetime objects that have time zones when interacting with datetime fields in the database; defaults to True.
  • supports_unspecified_pk—If a model uses a primary key other than the default auto-incrementing option, each instance will typically need to specify a primary key. If the database saves the instance even without a primary key, you’ll need to set this to True so that Django can skip the test for that behavior.
  • test_db_allows_multiple_connections—Indicates whether a test-only database supports multiple connections. This defaults to True because most databases do support it, but others might use things like in-memory databases for testing, which might not support multiple connections.

DatabaseWrapper.ops

This is the gateway to most of the database-specific features, primarily to handle the various differences in how each database handles certain types of SQL clauses. Each database vendor has its own set of special syntaxes that need to be supported, and defining those in the backend allows Django to operate without needing to worry about those details.

Like the situations described previously, backends only need to implement those operations that deviate from the standard behavior. BaseDatabaseOperations, which also lives at django.db.backends, provides default behaviors for many of these operations, while others must be implemented by the backend itself. The following list explains their purposes and default behaviors; a brief sketch of a minimal operations class follows the list.

  • autoinc_sql(table, column)—Returns the SQL necessary to create auto-incrementing primary keys. If the database has a field to support this natively, that field will be chosen using the creation module described in the “Creation of New Structures” section, and this method should return None instead of any SQL statements, which is also the default behavior.
  • bulk_batch_size(fields, objs)—When inserting records in bulk, you will find that some databases have limits that require the records to be split up into multiple batches. Given the fields to insert and the objects containing values for those fields, this method returns the number of records to insert in a single batch. The default implementation simply returns the number of objects, thus always using a single batch to insert any number of records.
  • cache_key_culling_sql()—Returns an SQL template used for selecting a cache key to be culled. The returned template string should contain one %s placeholder, which will be the name of the cache table. It should also include a %%s reference, so that it can be replaced later with the index of the last key before the one that should be culled.
  • compiler(compiler_name)—Returns an SQL compiler based on the given compiler name. By default, this method will import a module according to the compiler_module attribute on the BaseDatabaseOperations object and look up the given compiler_name within that module. The compiler_module is set to "django.db.models.sql.compiler", but you can override it if you’d like to use your own compiler without overriding this method.
  • date_extract_sql(lookup_type, field_name)—Returns an SQL statement that pulls out just a portion of a date so it can be compared to a filter argument. The lookup_type will be one of "year", "month", or "day", while field_name is the name of the table column that contains the date to be checked. This has no default behavior and must be defined by the backend to avoid a NotImplementedError.
  • date_interval_sql(sql, connector, timedelta)—Returns an SQL clause that will perform an operation with a date or datetime column and a timedelta value. The sql argument will contain the necessary SQL for the date or datetime column, and the connector will contain the operator that will be used with the timedelta value. This method is responsible for formatting the expression, as well as for describing the timedelta using the database’s vocabulary.
  • date_trunc_sql(lookup_type, field_name)—Returns an SQL statement that drops off the portion of the date that's beyond the specificity provided by lookup_type. The possible values are the same as those for date_extract_sql(), but this differs in that if lookup_type is "month", for example, this will return a value that specifies both the month and the year, while date_extract_sql() will return the month without the year. Also like date_extract_sql(), this has no default behavior and must be implemented by the backend.
  • datetime_cast_sql()—Returns the SQL required to force a datetime value into whatever format the database library uses to return a true datetime object in Python. The return value will be used as a Python format string, which will receive just the field name, to be referenced as %s in the string. By default, it simply returns "%s", which will work just fine for databases that don’t require any special type casting.
  • deferrable_sql()—Returns the SQL necessary to append to a constraint definition to make that constraint initially deferred, so that it won’t get checked until the end of the transaction. This will be appended immediately after the constraint definition, so if a space is required, the return value must include the space at the beginning. By default, this returns an empty string.
  • distinct_sql(fields)—Returns an SQL clause to select unique records, optionally based on a list of field names. The default implementation returns "DISTINCT" when fields is empty, and raises NotImplementedError when fields is populated, so be sure to override this if the database does support checking for uniqueness based on a limited set of fields.
  • drop_foreignkey_sql()—Returns the SQL fragment that will drop a foreign key reference as part of an ALTER TABLE statement. The name of the reference will be appended automatically afterward, so this needs to specify only the command itself. For example, the default return value is simply "DROP CONSTRAINT".
  • drop_sequence_sql(table)—Returns an SQL statement to drop the auto-incrementing sequence from the specified table. This forms something of a pair with autoinc_sql() because the sequence only needs to be dropped explicitly if it was created explicitly. By default, this returns None to indicate no action is taken.
  • end_transaction_sql(success=True)—Returns the SQL necessary to end an open transaction. The success argument indicates whether the transaction was successful and can be used to determine what action to take. For example, the default implementation returns "COMMIT;" if success is set to True and "ROLLBACK;" otherwise.
  • fetch_returned_insert_id(cursor)—Returns the ID of the last inserted record for backends that support getting that information. The default implementation calls cursor.fetchone()[0].
  • field_cast_sql(db_type)—Returns an SQL fragment for casting the specified database column type to some value that can be more accurately compared to filter arguments in a WHERE clause. The return value must be a Python format string, with the only argument being the name of the field to be cast. The default return value is "%s".
  • force_no_ordering()—Returns a list of names that can be used in an ORDER BY clause to remove all ordering from the query. By default, this returns an empty list.
  • for_update_sql(nowait=False)—Returns an SQL clause that will request a lock when selecting data from the database. The nowait argument indicates whether to include the necessary clause to fail immediately if a lock is already in place, rather than waiting for that lock to be released.
  • fulltext_search_sql(field_name)—Returns an SQL fragment for issuing a full-text search against the specified field, if supported. The string returned should also include a %s placeholder for the user-specified value to be searched against, which will be quoted automatically outside this method. If full-text search isn’t supported by the database, the default behavior will suffice by raising a NotImplementedError with an appropriate message to indicate this.
  • last_executed_query(cursor, sql, params)—Returns the last query that was issued to the database, exactly as it was sent. By default, this method has to reconstruct the query by replacing the placeholders in the sql argument with the parameters supplied by params, which will work correctly for all backends without any extra work. Some backends may have a faster or more convenient shortcut to retrieve the last query, so the database cursor is provided as well, as a means to use that shortcut.
  • last_insert_id(cursor, table_name, pk_name)—Returns the ID of the row inserted by the last INSERT into the database. By default, this simply returns cursor.lastrowid, as specified by PEP-249, but other backends might have other ways of retrieving this value. To help access it accordingly, the method also receives the name of the table where the row was inserted and the name of the primary key column.
  • lookup_cast(lookup_type)—Returns the SQL necessary to cast a value to a format that can be used with the specified lookup_type. The return value must also include a %s placeholder for the actual value to be cast, and by default it simply returns "%s".
  • max_in_list_size()—Returns the number of items that can be used in a single IN clause. The default return value, None, indicates that there’s no limit on the number of those items.
  • max_name_length()—Returns the maximum number of characters the database engine allows to be used for table and column names. This returns None by default, which indicates there’s no limit.
  • no_limit_value()—Returns the value that should be used to indicate a limit of infinity, used when specifying an offset without a limit. Some databases allow an offset to be used without a limit, and in these cases, this method should return None. By default, this raises a NotImplementedError, and must be implemented by a backend to allow offsets to be used without limits.
  • pk_default_value()—Returns the value to be used when issuing an INSERT statement to indicate that the primary key field should use its default value—that is, increment a sequence—rather than some specified ID; defaults to "DEFAULT".
  • prep_for_like_query(x)—Returns a modified form of x, suitable for use with a LIKE comparison in the query's WHERE clause. By default, this escapes any percent signs (%), underscores (_), or backslashes (\) found in x with extra backslashes as appropriate.
  • prep_for_ilike_query(x)—Just like prep_for_like_query(), but for case-insensitive comparisons. By default, this is an exact copy of prep_for_like_query(), but can be overridden if the database treats case-insensitive comparisons differently.
  • process_clob(value)—Returns the value referenced by a CLOB column, in case the database needs some extra processing to yield the actual value. By default, it just returns the provided value.
  • query_class(DefaultQueryClass)—If the backend provides a custom Query class, as indicated by DatabaseWrapper.features.uses_custom_query_class, this method must return a custom Query class based on the supplied DefaultQueryClass. If uses_custom_query_class is False, this method is never called, so the default behavior is to simply return None.
  • quote_name(name)—Returns a rendition of the given name with quotes appropriate for the database engine. The name supplied might have already been quoted once, so this method should also take care to check for that and not add extra quotes in that case. Because there’s no established standard for quoting names in queries, this must be implemented by the backend, and will raise a NotImplementedError otherwise.
  • random_function_sql()—Returns the necessary SQL for generating a random value; defaults to "RANDOM()".
  • regex_lookup(lookup_type)—Returns the SQL for performing a regular expression match against a column. The return value should contain two %s placeholders, the first for the name of the column and the other for the value to be matched. The lookup type would be either regex or iregex, the difference being case-sensitivity. By default, this raises a NotImplementedError, which would indicate that regular expressions aren’t supported by the database backend. However, for simple cases, regex and iregex can be supported using the DatabaseWrapper.operators dictionary described in the next section.
  • return_insert_id()—Returns a clause that can be used at the end of an INSERT query to return the ID of the newly inserted record. By default, this simply returns None, which won’t add anything to the query.
  • savepoint_create_sql(sid)—Returns an SQL statement for creating a new savepoint. The sid argument is the name to give the savepoint so it can be referenced later.
  • savepoint_commit_sql(sid)—Explicitly commits the savepoint referenced by the sid argument.
  • savepoint_rollback_sql(sid)—Rolls back a portion of the transaction according to the savepoint referenced by the sid argument.
  • set_time_zone_sql()—Returns an SQL template that can be used to set the time zone for the database connection. The template should accept one %s value, which will be replaced with the time zone to use. By default, this returns an empty string, indicating that the database doesn’t support time zones.
  • sql_flush(style, tables, sequences)—Returns the SQL necessary to remove all the data from the specified structures, while leaving the structures themselves intact. Because this is so different from one database engine to another, the default behavior raises a NotImplementedError and must be implemented by the backend.
  • sequence_reset_by_name_sql(style, sequences)—Returns a list of SQL statements necessary to reset the auto-incrementing sequences named in the sequences list. Like autoinc_sql() and drop_sequence_sql(), this is useful only for databases that maintain independent sequences for automatic IDs, and can return an empty list if not required, which is the default behavior.
  • sequence_reset_sql(style, model_list)—Like sequence_reset_by_name_sql(), this returns a list of SQL statements necessary to reset auto-incrementing sequences, but the specified list contains Django models instead of sequence names. This also shares the same default behavior of returning an empty list.
  • start_transaction_sql()—Returns the SQL used to enter a new transaction; defaults to "BEGIN;".
  • tablespace_sql(tablespace, inline=False)—Returns the SQL to declare a tablespace, or None if the database doesn't support them, which is the default.
  • validate_autopk_value(value)—Validates that a given value is suitable for use as a serial ID in the database. For example, if the database doesn’t allow zero as a valid ID, that value should raise a ValueError. By default, this simply returns the value, which indicates that it was valid.
  • value_to_db_date(value)—Converts a date object to an object suitable for use with the database for DateField columns.
  • value_to_db_datetime(value)—Converts a datetime object to a value suitable for use with DateTimeField columns.
  • value_to_db_time(value)—Converts a time object to a value that can be used with the database for TimeField columns.
  • value_to_db_decimal(value)—Converts a Decimal object to a value that the database can place in a DecimalField column.
  • year_lookup_bounds(value)—Returns a two-item list representing the lower and upper bounds of a given year. The value argument is an int year and each of the return values is a string representing a full date and time. The first return value is the lowest date and time that is considered part of the supplied year, while the second is the highest date and time that is considered part of that same year.
  • year_lookup_bounds_for_date_field(value)—Also returns a two-item list representing the upper and lower date and time boundaries for the year supplied as value. By default, this defers to year_lookup_bounds() but can be overridden in case the database can't compare a full date/time value against a DateField.
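
To give a sense of scale, most backends override only a handful of these. Here's a brief sketch covering the required quote_name() and date_extract_sql(), using generic SQL that would need adjusting for a real database:

from django.db.backends import BaseDatabaseOperations

class DatabaseOperations(BaseDatabaseOperations):
    def quote_name(self, name):
        # Take care not to quote a name that arrives already quoted.
        if name.startswith('"') and name.endswith('"'):
            return name
        return '"%s"' % name

    def date_extract_sql(self, lookup_type, field_name):
        # lookup_type is one of "year", "month" or "day". EXTRACT is
        # common but not universal, so adjust for the actual database.
        return "EXTRACT(%s FROM %s)" % (lookup_type.upper(), field_name)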

Comparison Operators

Many of the comparisons that can be done in a database follow a simple format, with one value being followed by some kind of operator, then followed by another value to compare it to. Because this is such a common case, and is quite simple to work with, Django uses a much simpler method for defining the operators for these types of comparisons.

Another attribute on the DatabaseWrapper object, operators, contains a dictionary mapping various lookup types to the database operators that implement them. The structure is deliberately simple: the key for this dictionary is the lookup type, while the value is the SQL fragment that should be placed after the name of the field being compared.

For example, consider the common case where the "exact" lookup is handled by the standard = operator, which would be handled by a dictionary like the following:

class DatabaseWrapper(BaseDatabaseWrapper):
    operators = {
        "exact": "= %s",
    }

This dictionary would then be filled out with the other operators supported by Django.
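
For instance, the comparison-style lookups might map out along these lines; the exact SQL varies by database, and the LIKE entries rely on the value having been prepared separately with the appropriate wildcards and escaping, as described for prep_for_like_query():

operators = {
    'exact': '= %s',
    'gt': '> %s',
    'gte': '>= %s',
    'lt': '< %s',
    'lte': '<= %s',
    'startswith': 'LIKE %s',
    'endswith': 'LIKE %s',
}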

Obtaining a Cursor

Combining all of these database-specific features with Django’s object-oriented database API makes available a world of possibilities, but they’re all designed to cover the most common cases. Databases support a wide variety of additional functionality that’s either less commonly used or extremely disparate across different implementations. Rather than try to support all these features in all databases, Django instead provides easy access straight to the database itself.

The cursor() method of DatabaseWrapper returns a database cursor straight from the third-party library used to connect with the database itself. In keeping with standard Python policy, this cursor object is compatible with PEP-249, so it might even be possible to use other database abstraction libraries with it. Because the behavior of the attributes and methods on this object are outside Django’s control—often varying wildly across implementations—it’s best to consult the full PEP and your database library’s documentation for details on what can be done with it.
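
A quick sketch of this in practice, revisiting the earlier example of updating one column based on another; the table and column names are purely illustrative:

from django.db import connection

cursor = connection.cursor()
# Parameters are passed separately so the backend can quote them safely.
cursor.execute("UPDATE products SET price = cost * %s WHERE id = %s",
               [1.25, 42])
cursor.execute("SELECT price FROM products WHERE id = %s", [42])
row = cursor.fetchone()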

Creation of New Structures

One of the more convenient features Django’s database connection provides is the ability to automatically create tables, columns, and indexes based solely on model definitions declared in Python. Along with a powerful database querying API, this is a key feature in avoiding the use of SQL code throughout an application, keeping it clean and portable.

While the SQL syntax itself is reasonably well standardized with regards to creation of data structures, the names and options available for individual field types are quite varied across different implementations. This is where Django’s database backends come in, providing a mapping of Django’s basic field types to the appropriate column types for that particular database.

This mapping is stored in the backend package's creation module, which must contain a DatabaseCreation class that subclasses django.db.backends.creation.BaseDatabaseCreation. This class contains an attribute named data_types, which contains a dictionary with keys that match up with the available return values from the various Field subclasses and string values that will be passed to the database as the column's definition.

The value can also be a Python format string, which will be given a dictionary of field attributes so that customized field settings can be used to determine how the column is created. For example, this is how CharField passes along the max_length attribute. While many field types have common attributes, the ones that are of most use to the column type are likely specific to each individual field. Consult the field’s source code to determine what attributes are available for use in this mapping.
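
A sketch of how such a mapping might begin, showing both plain strings and format strings that pull in field attributes; the column names here loosely follow MySQL's conventions, and each database will have its own equivalents:

from django.db.backends.creation import BaseDatabaseCreation

class DatabaseCreation(BaseDatabaseCreation):
    data_types = {
        'AutoField': 'integer AUTO_INCREMENT',
        'BooleanField': 'bool',
        'CharField': 'varchar(%(max_length)s)',
        'DateField': 'date',
        'DateTimeField': 'datetime',
        'DecimalField': 'numeric(%(max_digits)s, %(decimal_places)s)',
        'IntegerField': 'integer',
        'TextField': 'longtext',
    }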

There are a number of basic field types available as internal column types:

  • AutoField—An auto-incrementing numeric field, used for primary keys when one isn’t defined explicitly in the model.
  • BooleanField—A field representing just two possible values: on and off. If the database doesn’t have a separate column that represents this case, it’s also possible to use a single-character CharField to store "1" and "0" to simulate this behavior.
  • CharField—A field containing a limited amount of free-form text. Typically, this uses a variable-length string type in the database, using the extra max_length attribute to define the maximum length of a stored value.
  • CommaSeparatedIntegerField—A field containing a list of integers, typically representing IDs, which are stored in a single string separated by commas. Because the list is stored as a string, this also uses a variable-length string type on the database side. Although some databases might have a more intelligent and efficient means of storing this type of data, the field’s code still expects a string of numbers, so the backend should always return one.
  • DateField—A standard date, without any time information associated with it. Most databases should have a date column type, so this should be easy to support. Just make sure the column type used returns a Python datetime.date upon retrieval.
  • DateTimeField—A date, but with associated time information attached, excluding time zones. Again, most reasonable databases will support this easily, but make sure the Python library for it returns a datetime.datetime when retrieving from the database.
  • DecimalField—A fixed-precision decimal number. This is another example of using field attributes to define the database column because the max_digits and decimal_places field attributes should control the database column equivalents.
  • FileField—The name and location of a file stored elsewhere. Django doesn’t support storing files as binary data in the database, so its files are referenced by a relative path and name, which is stored in the associated column. Because that’s text, this again uses a standard variable-length text field, which also utilizes the max_length field attribute.
  • FilePathField—The name and path of a file in a storage system. This field is similar to FileField in many respects, but this is intended to allow users to choose from existing files, while FileField exists to allow saving new files. Because the data actually being stored is essentially the same format, it works the same way, using a variable-length string specified using the max_length attribute.
  • FloatField—A field containing a floating point number. It doesn’t matter if the database stores the number with fixed precision internally, as long as the Python library returns a float for values stored in the column.
  • IntegerField—A field containing a signed 32-bit integer.
  • BigIntegerField—A field containing a signed 64-bit integer.
  • IPAddressField—An Internet Protocol (IP) address, using the current IPv4 standard, represented in Python as a string.
  • GenericIPAddressField—An IP address using either the original IPv4 standard or the newer IPv6 standard.
  • NullBooleanField—A Boolean field that also allows NULL values to be stored in the database.
  • PositiveIntegerField—A field containing an unsigned 32-bit integer.
  • PositiveSmallIntegerField—A field containing an unsigned 16-bit integer.
  • SmallIntegerField—A field containing a signed 16-bit integer.
  • TextField—An unlimited-length text field, or at least the largest text field the database makes available. The max_length attribute has no effect on the length of this field.
  • TimeField—A field representing the time of day, without any associated date information. The database library should return a datetime.time object for values in this column.

Introspection of Existing Structures

In addition to being able to create new table structures based on model information, it’s also possible to use an existing table structure to generate new models. This isn’t a perfect process because some model information doesn’t get stored in the table’s own definition, but it’s a great starting point for new projects that have to work with existing databases, usually to run alongside a legacy application that’s being phased out.

The backend should provide a module called introspection.py for this purpose, containing a DatabaseIntrospection class with a number of methods for retrieving details about the table structures. Each method receives an active database cursor. The arguments and return values of these methods are documented in the following list, along with another mapping for picking the right field types based on the underlying column types.

  • get_table_list(cursor)—Returns a list of table names that are present in the database.
  • get_table_description(cursor, table_name)—Given the name of a specific table, found using get_table_list(), this returns a list of tuples, each describing a column in the table. Each tuple follows PEP-249’s standard for the cursor’s description attribute: (name, type_code, display_size, internal_size, precision, scale, null_ok). The type_code here is an internal type used by the database to identify the column type, which will be used by the reverse mapping described at the end of this section.
  • get_relations(cursor, table_name)—Given a table’s name, this returns a dictionary detailing the relationships the table has with other tables. Each key is the column’s index in the list of all columns, while the associated value is a 2-tuple. The first item is the index of the related field according to its table’s columns, and the second item is the name of the associated table. If the database doesn’t provide an easy way to access this information, this function can instead raise NotImplementedError, and relationships will just be excluded from the generated models.
  • get_key_columns(cursor, table_name)—Given a table’s name, this returns a list of columns that relate to other tables and how those references work. Each item in the list is a tuple consisting of the column name, the table it references, and the column within that referenced table.
  • get_indexes(cursor, table_name)—Given the name of a table, this returns a dictionary of all the fields that are indexed in any way. The dictionary’s keys are column names, while the values are additional dictionaries. Each value’s dictionary contains two keys: 'primary_key' and 'unique', each of which is either True or False. If both are False, the column is still indicated as indexed by virtue of being in the outer dictionary at all; it’s just an ordinary index, without primary key or unique constraints. Like get_relations(), this can also raise NotImplementedError if there’s no easy way to obtain this information.

In addition to the preceding methods, the introspection class also provides a dictionary called data_types_reverse, which maps the type_code values returned by get_table_description(). The keys are whatever values are returned as type_code, regardless of whether that's a string, integer, or something else entirely. The values are strings containing the names of the Django fields that will support the associated column type.
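
A sketch of how such a module might start, with invented type codes and a hypothetical system catalog standing in for the real thing:

from django.db.backends import BaseDatabaseIntrospection

class DatabaseIntrospection(BaseDatabaseIntrospection):
    # The type codes are whatever the database library reports;
    # these values are invented for illustration.
    data_types_reverse = {
        16: 'BooleanField',
        23: 'IntegerField',
        1043: 'CharField',
        1114: 'DateTimeField',
    }

    def get_table_list(self, cursor):
        # "system_tables" is a stand-in for the database's real catalog.
        cursor.execute("SELECT table_name FROM system_tables")
        return [row[0] for row in cursor.fetchall()]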

DatabaseClient

Living in the database backend’s client.py module, this class is responsible for calling the command-line interface (shell) for the current database specified by DATABASE_ENGINE. This is called using the manage.py dbshell command, allowing users to manage the underlying tables’ structure and data manually if necessary.

The class consists of just a single method, runshell(), which takes no arguments. This method is then responsible for reading the appropriate database settings for the given backend and configuring a call to the database’s shell program.
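
A minimal sketch for the hypothetical DB2 backend mentioned earlier, assuming a command-line shell program named db2sh:

import subprocess

from django.conf import settings

class DatabaseClient(object):
    def runshell(self):
        # "db2sh" is a hypothetical shell program; a real backend would
        # also pass user, password, host and port settings as needed.
        args = ['db2sh', settings.DATABASE_NAME]
        subprocess.call(args)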

DatabaseError and IntegrityError

Imported from the backend's base module, these classes allow exceptions to be handled easily, while still being able to swap out databases. IntegrityError should be a subclass of DatabaseError, so that applications can just check for DatabaseError if the exact type of error isn't important.

Third-party libraries that conform to PEP-249 will already have these classes available, so they can often just be assigned to the base module’s namespace and work just fine. The only time they would need to be subclassed or defined directly is if the library being used doesn’t behave in a way that’s similar to other databases supported by Django. Remember, it’s all about consistency across the entire framework.
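
In the simplest case, the backend's base module can just re-export the exceptions provided by its PEP-249 library; the driver name here is a placeholder:

# In the backend's base.py, assuming a PEP-249 driver named db2_driver:
from db2_driver import DatabaseError, IntegrityError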

Authentication

While the combination of a username and password is a very common authentication method, it’s far from the only one available. Other methods, such as OpenID, use completely different techniques, which don’t even include a username or password. Also, some systems that do use usernames and passwords may already be storing that information in a different database or structure than Django looks at by default, so some extra handling still needs to be done to verify credentials against the right data.

To address these situations, Django’s authentication mechanism can be replaced with custom code, supporting whatever system needs to be used. In fact, multiple authentication schemes can be used together, with each falling back to the next if it doesn’t produce a valid user account. This is all controlled by a tuple of import paths assigned to the AUTHENTICATION_BACKENDS setting. They will be tried in order from first to last, and only if all backends return None will it be considered a failure to authenticate. Each authentication backend is just a standard Python class that provides two specific methods.
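
In a settings module, that might look like the following, keeping Django's default ModelBackend as a fallback; the custom import path is hypothetical:

AUTHENTICATION_BACKENDS = (
    'myproject.backends.RemoteServiceBackend',  # hypothetical custom backend
    'django.contrib.auth.backends.ModelBackend',
)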

get_user(user_id)

Any time a user’s ID is known in advance, whether from a session variable, a database record, or somewhere else entirely, the authentication backend is responsible for converting that ID into a usable django.contrib.auth.models.User instance. What it means to be an ID could be different for different backends, so the exact type of this argument might also change depending on the backend being used. For django.contrib.auth.backends.ModelBackend, the default that ships with Django, this is the database ID where the user’s information is stored. For others, it might be a username, a domain name, or something else entirely.

authenticate(**credentials)

When the user’s ID isn’t known, it’s necessary to ask for some credentials, with which the appropriate User account can be identified and retrieved. In the default case, these credentials are a username and password, but others may use a URL or a single-use token, for example. In the real world, the backend won’t accept arguments using the ** syntax, but rather it will accept just those arguments that make sense for it. However, because different backends will take different sets of credentials, there’s no single method definition that will suit all cases.
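
Putting the two methods together, a minimal backend might look like this sketch, which accepts an e-mail address in place of a username; the class and argument names are illustrative, not part of Django:

from django.contrib.auth.models import User

class EmailBackend(object):
    def authenticate(self, email=None, password=None):
        # Accept only the credentials that make sense for this backend;
        # returning None lets the next backend in line have a try.
        try:
            user = User.objects.get(email=email)
        except User.DoesNotExist:
            return None
        if user.check_password(password):
            return user
        return None

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None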

PASSING INFORMATION TO CUSTOM BACKENDS

You might have noticed from the previous sections that the data passed in to an authentication backend depends very much on the backend being used. Django, by default, passes in a username and password from its login form, but other forms can supply whatever other credentials are appropriate for the form.

Storing User Information

One aspect of authentication that might not seem obvious is that all users must, for all intents and purposes, still be represented in Django as User objects in the django.contrib.auth application. This isn’t strictly required by Django as a framework, but most applications—including the provided admin interface—expect users to exist in the database and will make relationships with that model.

For backends that call out to external services for authentication, this means duplicating every user in Django’s database to make sure applications work correctly. On the surface, this sounds like a maintenance nightmare; not only does every existing user need to be copied, but new users need to be added and changes to user information should also be reflected in Django. If all this had to be managed by hand for all users, it would certainly be a considerable problem.

Remember, though, that the only real requirement for an authentication backend is that it receives the user’s credentials and returns a User object. In between, it’s all just standard Python, and the whole of Django’s model API is up for grabs. Once a user has been authenticated behind the scenes, the backend can simply create a new User if one doesn’t already exist. If one does exist, it can even update the existing record with any new information that’s updated in the “real” user database. This way, everything can stay in sync without having to do anything special for Django. Just administer your users using whatever system you’re already using, and let your authentication backend handle the rest.
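
A sketch of that approach, where remote_auth() and its return value stand in for whatever external service actually checks the credentials:

from django.contrib.auth.models import User

class RemoteServiceBackend(object):
    def authenticate(self, username=None, password=None):
        remote = remote_auth(username, password)  # hypothetical service call
        if remote is None:
            return None
        # Create the local record on first login, then keep it in sync
        # with the external source on each login afterward.
        user, created = User.objects.get_or_create(username=username)
        user.email = remote.email
        user.set_unusable_password()  # the real password lives elsewhere
        user.save()
        return user

    def get_user(self, user_id):
        try:
            return User.objects.get(pk=user_id)
        except User.DoesNotExist:
            return None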

Files

Web applications typically spend most of their time dealing with information in databases, but there are a number of reasons an application might need to work directly with files as well. Whether it be users uploading avatars or presentations, generating images or other static content on the fly, or even backing up log files on a regular basis, files can become a very important part of an application. As with many other things, Django provides both a single interface for working with files and an API for additional backends to provide additional functionality.

The Base File Class

Regardless of source, destination or purpose, all files in Django are represented as instances of django.core.files.File. This works very much like Python’s own file object, but with a few additions and modifications for use on the Web and with large files. Subclasses of File can alter what goes on behind the scenes, but the following API is standard for all file types. The following attributes are available on all File objects:

  • File.closed—A Boolean indicating whether the file has been closed. When instantiated, all File objects are open, and their contents can be accessed immediately. The close() method sets this to True, and the file must be reopened using open() before its contents can be accessed again.
  • File.DEFAULT_CHUNK_SIZE—Typically an attribute of the file's class rather than of an instance, this determines what size chunks should be used with the chunks() method.
  • File.mode—The access mode the file was opened with; defaults to 'rb'.
  • File.name—The name of the file, including any given path relative to where it was opened.
  • File.size—The size of the file’s contents, in bytes.

The following methods are also available on File objects:

  • File.chunks(chunk_size=None)—Iterates over the file’s contents, yielding it in one or more smaller chunks to avoid filling up the server’s available memory with large files. If no chunk_size is provided, the DEFAULT_CHUNK_SIZE, which defaults to 64 KB, will be used.
  • File.close()—Closes the file so its contents become inaccessible.
  • File.flush()—Writes any new pending contents to the actual filesystem.
  • File.multiple_chunks(chunk_size=None)—Returns True if the file is big enough to require multiple calls to chunks() to retrieve the full contents, or False if it can all be read in one pass. The chunk_size argument works the same as in chunks(). Note that this will not actually read the file at this point; it determines the value based on the file’s size.
  • File.open(mode=None)—Reopens the file if it had been previously closed. The mode argument is optional and will default to whatever mode the file had used when it was last open.
  • File.read(num_bytes=None)—Retrieves a certain number of bytes from the file. If called without a size argument, this will read the remainder of the file.
  • File.readlines()—Retrieves the content of the file as a list of lines, as indicated by the presence of newline characters (\r and \n) in the file. These newline characters are left at the end of each line in this list.
  • File.seek(position)—Moves the internal position of the file to the specified location. All read and write operations are relative to this position, so this allows different parts of the file to be accessed by the same code.
  • File.tell()—Returns the position of the internal pointer, as the number of bytes from the beginning of the file.
  • File.write(content)—Writes the specified contents to the file. This is only available if the file was opened in write mode (a mode beginning with 'w').
  • File.xreadlines()—A generator version of readlines() yielding one line, including newline characters, at a time. In keeping with Python’s own transition away from xreadlines(), this functionality is also provided by iterating over the File object itself.
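
A short sketch of this API in action, wrapping a standard Python file object to read it safely in chunks; the path and processing function are illustrative:

from django.core.files import File

f = File(open('/tmp/report.csv', 'rb'))
if f.multiple_chunks():
    for chunk in f.chunks():
        process(chunk)  # process() is a hypothetical handler
else:
    data = f.read()
f.close()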

Handling Uploads

When accepting files from users, things get a little bit trickier, because these files shouldn’t necessarily be saved alongside the rest of your files until your code has had a chance to review them. To facilitate this, Django treats uploaded files a bit differently, using upload handlers to decide what subclass of File should be used to represent them. Each upload handler has a chance to step in during the upload and alter how Django proceeds.

Upload handlers are specified with the FILE_UPLOAD_HANDLERS setting, which takes a sequence of import paths. As uploaded files are being processed, Django calls various methods on each of these handlers in turn, so they can inspect the data as it comes in. There's no need to call any of these directly, as that's handled automatically by Django's request processing code, but the API for new upload handlers provides ample opportunity to customize how incoming files are managed.

  • FileUploadHandler.__init__(request)—The handler is initialized every time a request comes in with files attached, and the incoming request is passed in so the handler can decide if it needs to handle the files for the request. For example, if it’s designed to write details of the upload to the console of the development server, it might check if the DEBUG setting is True and if request.META['REMOTE_ADDR'] is in the INTERNAL_IPS setting. If a handler should always process every request, this doesn’t need to be defined manually; the inherited default will suffice for most cases.
  • FileUploadHandler.new_file(field_name, file_name, content_type, content_length, charset=None)—This is called for each file submitted in the request, with various details about the file, but none of its actual content. The field_name is the form field name that was used to upload the file, while the file_name is the name of the file itself as reported by the browser. The content_type, content_length and charset are all properties of the file's contents, but they should be taken with a grain of salt because they can't be verified without accessing the file's contents. While not strictly required, the primary function of this method is to set aside a place for the file's content to be stored when receive_data_chunk() is called. There's no requirement on what type of storage is used, or what attribute is used for it, so nearly anything's fair game. Common examples are temporary files or StringIO objects. Also, this method provides a way to decide whether certain features should be enabled, such as automatically generated thumbnails of images, determined by the content_type.
  • FileUploadHandler.receive_data_chunk(raw_data, start)—This is one of only two required methods and is called repeatedly throughout the processing of the file, each time receiving a portion of the file's contents as raw_data, with start being the offset within the file where that content was found. The amount of data supplied each time is based on the handler's chunk_size attribute, which defaults to 64 KiB. Once this method has completed processing the data chunk, it can also control how other handlers deal with that data. This is determined by whether the method returns any data or not, with any data returned being passed along to the next handler in line. If it returns None, Django will simply repeat the process with the next chunk of data.
  • FileUploadHandler.file_complete(file_size)—As a complement to new_file(), this method is called when Django finds the end of the file in the request. Because this is also the only time the file's total size can be known with certainty, Django gives each handler a chance to determine what to do with that information. This is the only other required method on an upload handler and should return an UploadedFile object if the file was processed by this handler. The UploadedFile returned will be used by the associated form as the content for the field used to upload the file. If the handler didn't do anything with the file, for whatever reason, this can return None. However, be careful with this because at least one upload handler must return an UploadedFile to be used with forms.
  • FileUploadHandler.upload_complete()—While file_complete() is called when each file is finished loading, upload_complete() is called once per request, after all uploaded files have been processed completely. If the handler needs to set up any temporary resources while dealing with all the files, this method is the place to clean up after itself, freeing up resources for the rest of the application.

Notice that many of the features made possible by these methods rely on one method knowing what decisions a previous method has already made, but there’s no obvious way to persist this information. Since handlers are instantiated on every incoming request and process files one at a time, it’s possible to simply set custom attributes on the handler object itself, which future method calls can read back to determine how to proceed.

For example, if __init__() sets self.activated to False, receive_data_chunk() can read that attribute to determine whether it should process the chunks it receives or just pass them through to the next handler in line. It’s also possible for new_file() to set the same or similar attribute, so those types of decisions can be made on a per-file basis as well as per-request.

Because each handler works in isolation from the others, there isn't any standard imposed on which attributes are used or what they're used for. Instead, interaction among the various installed upload handlers is managed by raising a number of exceptions in various situations. Proper operation of an upload handler doesn't require the use of any of these, but they allow multiple handlers to work together more effectively. Like FileUploadHandler, these are all available at django.core.files.uploadhandler; a sketch using one of them follows the list.

  • StopUpload—Tells Django to stop processing all files in the upload, preventing all handlers from handling any more data than they’ve already processed. It also accepts a single optional argument, connection_reset, a Boolean indicating whether Django should stop without reading in the remainder of the input stream. The default value of False for this argument means that Django will read the entire request before passing control back to a form, while True will stop without reading it all in, resulting in a “Connection Reset” message shown in the user’s browser.
  • SkipFile—Tells the upload process to stop processing the current file, but continue on with the next one in the list. This is a much more appropriate behavior if there were a problem with a single file in the request, which wouldn’t affect any other files that might be uploaded at the same time.
  • StopFutureHandlers—Only valid if raised from the new_file() method, this indicates that the current upload handler will handle the current file directly, and no other handlers should receive any data after it. Any handlers that process data before the handler that raises this exception will continue to execute in their original order, as determined by their placement within the FILE_UPLOAD_HANDLERS setting. A brief sketch of this exception follows.
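
As a minimal, hypothetical sketch of that last exception, the handler below claims each file for itself by raising StopFutureHandlers from new_file(). A real handler would also collect the chunks and return an UploadedFile from file_complete(); None is used here only for brevity.

from django.core.files.uploadhandler import FileUploadHandler, StopFutureHandlers

class ExclusiveHandler(FileUploadHandler):
    def new_file(self, *args, **kwargs):
        super(ExclusiveHandler, self).new_file(*args, **kwargs)
        # Claim this file outright; no later handler will see its data.
        raise StopFutureHandlers()

    def receive_data_chunk(self, raw_data, start):
        # Returning None keeps the data from reaching later handlers,
        # which were already disabled for this file by new_file().
        return None

    def file_complete(self, file_size):
        # A real handler would return an UploadedFile built from the
        # chunks it collected; None is shown here only for brevity.
        return None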

Storing Files

All file storage operations are handled by instances of Storage, which lives at django.core.files.storage, with the default storage system specified by an import path in the DEFAULT_FILE_STORAGE setting. A storage system encompasses all the necessary functions for dealing with how and where files are stored and retrieved. By using this extra layer, it’s possible to swap out which storage system is used, without having to make any changes to existing code. This is especially important when moving from development to production because production servers often have specialized needs for storing and serving static files.

To facilitate this level of flexibility, Django provides an API for dealing with files that goes beyond the standard open() function and associated file object provided by Python. Earlier in this chapter, Django’s File object was described, explaining what features are available for dealing with individual files. When looking to store, retrieve or list files, however, storage systems have a different set of tools available; a quick demonstration follows the list below.

  • Storage.delete(name)—Deletes a file from the storage system.
  • Storage.exists(name)—Returns a Boolean indicating whether the specified name references a file that already exists in the storage system.
  • Storage.get_valid_name(name)—Returns a version of the given name that’s suitable for use with the current storage system. If it’s already valid, it will be returned unchanged. One of only two methods with default implementations, this will return filenames suitable for a local filesystem, regardless of operating system.
  • Storage.get_available_name(name)—Given a valid name, this returns a version of it that’s actually available for new files to be written, without overwriting any existing files. Being the other method with a default behavior, this will append underscores to the requested name, just before any file extension, until an available name is found.
  • Storage.open(name, mode='rb', mixin=None)—Returns an open File object, through which the file’s contents can be accessed. The mode accepts all the same arguments as Python’s open() function, allowing for both read and write access. The optional mixin argument accepts a class to be used alongside the File subclass provided by the storage system, to enable additional features on the file returned.
  • Storage.path(name)—Returns the absolute path to the file on the local filesystem, which can be used with Python’s built-in open() function to access the file directly. This is provided as a convenience for the common case where files are stored on the local filesystem. For other storage systems, this will raise a NotImplementedError if there is no valid filesystem path at which the file can be accessed. Unless you’re using a library that only accepts file paths instead of open file objects, you should always open files using Storage.open(), which works across all storage systems.
  • Storage.save(name, content)—Saves the given content to the storage system, preferably under the given name. This name will be passed through get_valid_name() and get_available_name() before being saved, and the return value of this method will be the name that was actually used to store the content. The content argument provided to this method should be a File object, typically as a result of a file upload.
  • Storage.size(name)—Returns the size, in bytes, of the file referenced by name.
  • Storage.url(name)—Returns an absolute URL where the file’s contents can be accessed directly by a Web browser.
  • Storage.listdir(path)—Returns the contents of the directory specified by the path argument. The return value is a tuple containing two lists: the first for directories located at the path and the second for files located at that same path.
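
As a quick demonstration, here’s an interactive session using the default storage system. The file name is arbitrary, and save() may return a modified name if that one is already taken.

>>> from django.core.files.base import ContentFile
>>> from django.core.files.storage import default_storage
>>> name = default_storage.save('example.txt', ContentFile('Hello, world!'))
>>> default_storage.exists(name)
True
>>> default_storage.size(name)
13
>>> default_storage.delete(name)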

By default, Django ships with FileSystemStorage, which, as the name implies, stores files on the local filesystem. Typically this means the server’s hard drive, but there are many ways to map other types of filesystems to local paths, so there are already a number of possibilities. There are even more storage options available, though, and plenty of ways to customize how the existing options behave. By subclassing Storage, it’s possible to make any number of other options available.

A custom storage system must provide most of these methods, though two of them come with usable defaults. The get_available_name() method doesn’t strictly need to be supplied by the new storage class because its default implementation is suitable for many situations; overriding it is a matter of preference, not requirement. Likewise, get_valid_name() has a default behavior that’s suitable for most backends, but backends with different file naming requirements will need to override it.

Two other methods, open() and save(), have further requirements still. By definition, both require special handling for each storage system, but they shouldn’t be overridden directly in most situations. They provide additional logic beyond what’s necessary to store and retrieve files, and that logic should be preserved. Instead, they defer the interaction with the actual storage mechanism to _open() and _save(), respectively, which have a simpler set of expectations; a sketch follows their descriptions.

  • Storage._open(name, mode='rb')—The name and mode arguments are the same as open(), but it no longer has the mixin logic to deal with, so _open() can focus solely on returning a File object suitable for accessing the requested file.
  • Storage._save(name, content)—The arguments here are the same as save(), but the name provided here will have already gone through get_valid_name() and get_available_name(), and the content is guaranteed to be a File instance. This allows the _save() method to focus solely on committing the file’s content to the storage system with the given name.
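
To make this concrete, here’s a minimal sketch of a custom backend built on _open() and _save(), keeping file contents in a module-level dictionary purely for illustration; a real backend would talk to an actual storage service instead. The DictStorage name is hypothetical.

from django.core.files.base import ContentFile
from django.core.files.storage import Storage

_files = {}  # Maps each file name to its raw contents

class DictStorage(Storage):
    def _open(self, name, mode='rb'):
        return ContentFile(_files[name])

    def _save(self, name, content):
        # By this point, name has already passed through get_valid_name()
        # and get_available_name(), and content is guaranteed to be a File.
        _files[name] = content.read()
        return name

    def exists(self, name):
        return name in _files

    def delete(self, name):
        _files.pop(name, None)

    def size(self, name):
        return len(_files[name])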

In addition to providing these methods, most custom storage systems will also need to provide a File subclass with read() and write() methods that are designed to access the underlying data in the most efficient manner. The chunks() method defers to read() internally, so nothing extra needs to be done there to keep large files memory-friendly for applications to work with. Keep in mind that not all filesystems allow reading or writing just part of a file, so the File subclass might also need to take additional steps to minimize both memory usage and network traffic in these situations.

Session Management

When users are casually browsing a Web site, it’s often useful to track some information for them temporarily, even if there are no User accounts associated with them yet. This can range from the time they first visited the site to the contents of a shopping cart. The typical solution in these cases is a session—a server-side data store referenced by a key stored in a browser-side cookie. Django comes with built-in support for sessions, with a bit of room for configuration.

Most of the session process is constant: identifying a user without a session, assigning a new key, storing that key in a cookie, retrieving that key later on and acting like a dictionary the entire time. There are some basic settings for the name of the key and how long to use it, but to actually persist any information across multiple page views, the key is used to reference some data stored somewhere on the server, and that’s where the bulk of the customization comes in.

Django uses the SESSION_ENGINE setting to identify which data store class should handle the actual data itself. Three data stores ship with Django itself, covering common tactics like files, database records, and in-memory cache, but there are other options available in different environments, and even the stock classes might require additional customization. To accommodate this, SESSION_ENGINE accepts full import paths, allowing a session data store to be placed in any Django application. This import path points to a module containing a class named SessionStore, which provides the full data store implementation.

Like most of Django’s swappable backends, there’s a base implementation that provides most of the features, leaving fewer details for the subclass to cover. For sessions, that base class is SessionBase, located at django.contrib.sessions.backends.base. That’s what handles session key generation, cookie management and dictionary access, and it only touches the data store when necessary. This leaves the custom SessionStore class to implement just five methods, which combine to complete the entire process.

  • SessionStore.exists(session_key)—Returns True if the provided session key is already present in the data store, or False if it’s available for use in a new session.
  • SessionStore.load()—Loads session data from whatever storage mechanism the data store uses, returning a dictionary representing this data. If no session data exists, this should return an empty dictionary, and some backends may require the new dictionary to be saved as well, prior to returning.
  • SessionStore.save()—Commits the current session data to the data store, using the current session key as an identifier. This should also use the session’s expiration date or age to identify when the session would become invalid.
  • SessionStore.delete(session_key)—Removes the session data associated with the given key from the data store.
  • SessionStore.create()—Creates a new session and returns it so external code can add new values to it. This method is responsible for creating a new data container, generating a unique session key, storing that key in the session object, and committing that empty container to the backend before returning.

Also, to help session data stores access the information they need to do their work, Django provides a few additional attributes and methods, all managed by SessionBase. A sketch combining all of these pieces follows the list.

  • session_key—The randomly generated session key stored in the client-side cookie.
  • _session—A dictionary containing the session data associated with the current session key.
  • get_expiry_date()—Returns a datetime.datetime object representing when the session should expire.
  • get_expiry_age()—Returns the number of seconds after which the session should expire.

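As a hedged illustration of those five methods, here’s a minimal SessionStore that keeps everything in a module-level dictionary, which makes it per-process only and therefore suitable for demonstration, not production. The _get_new_session_key() helper is a private detail of SessionBase used by Django’s own backends, so treat its use here as an assumption about the version at hand.

from django.contrib.sessions.backends.base import SessionBase

_sessions = {}  # Maps each session key to its data dictionary

class SessionStore(SessionBase):
    def exists(self, session_key):
        return session_key in _sessions

    def load(self):
        if self.exists(self.session_key):
            return dict(_sessions[self.session_key])
        # No data was found, so start fresh with an empty container.
        self.create()
        return {}

    def create(self):
        # Generate keys until an unused one turns up, then commit
        # an empty container under that key.
        self.session_key = self._get_new_session_key()
        while self.exists(self.session_key):
            self.session_key = self._get_new_session_key()
        _sessions[self.session_key] = {}

    def save(self):
        # A production backend would also record get_expiry_date()
        # or get_expiry_age() so stale sessions can be invalidated.
        _sessions[self.session_key] = dict(self._session)

    def delete(self, session_key):
        _sessions.pop(session_key, None)
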
By implementing just five methods on a subclass of SessionBase, it’s possible to store session data nearly anywhere. Even though this data isn’t tied to a User object, it’s still specific to individual people browsing the site. To store temporary information that’s useful for everyone, a little something else is in order.

Caching

When an application has a lot of seldom-changing information to deal with, it’s often useful to cache this information on the server so it doesn’t have to be generated each and every time it’s accessed. This can save memory and processing time on the server, ultimately helping the application serve more requests in the same amount of time.

There are a number of ways to access Django’s caching mechanism, depending on just how much information needs to be cached. The online documentation5 covers the general cases, such as site-wide and per-view caching, but the lower-level details merit a bit more explanation.

Specifying a Backend

Specifying a cache backend in Django works quite a bit differently than other backends discussed in this chapter. Even though there are multiple configuration options to consider, there’s just one setting to control them all. This setting, CACHE_BACKEND, uses the URI syntax6 to accept all the necessary information in a way that can be parsed reliably. It can be split up into three separate parts, each with its own requirements.

CACHE_BACKEND = '{{ scheme }}://{{ host }}/?{{ arguments }}'
  • The scheme portion specifies which backend code should be used to serve out the cache. Django ships with four backends—db, file, locmem, and memcached7—which are well documented online and cover the majority of cases. For custom backends, this portion of the setting can also accept a full import path to a module that implements the protocol described in the next section.
  • The host specifies where the cache should actually be stored, and its format will vary depending on the backend used. For example, db expects the name of a database table, file expects a full directory path, memcached expects a list of server addresses, and locmem doesn’t require anything at all. The host can also be followed by a trailing slash, which can help readability because it makes the whole setting look more like a URI.
  • Arguments are optional and can be provided to customize how caching takes place within the backend. They’re provided using the query-string format, with one argument recognized by all backends: timeout, the number of seconds before an item should be removed from the cache. Two more arguments are also available for most backends (including all those supplied by Django except for memcached): max_entries, the total number of items that should be stored in the cache before culling old items; and cull_frequency, which controls how many items to purge from the cache when it reaches max_entries.
  • One important thing to realize about cull_frequency is that its value isn’t actually how often items should be removed. Instead, the value is used in a simple formula, 1 / cull_frequency, which determines how many items are affected. So, if you’d like to purge 25% of the items at a time, that’s equivalent to 1/4, so you’d pass cull_frequency=4 as an argument to the cache backend, while purging half (1/2) of the entries would require passing cull_frequency=2. Essentially, cull_frequency is the number of times the cache must be culled to guarantee all items are purged. Some example settings follow this list.
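
Putting the three parts together, typical values might look like the following. The table name and server address are placeholders for your own environment.

# Local memory, expiring items after 5 minutes and culling 25% at a time
CACHE_BACKEND = 'locmem:///?timeout=300&max_entries=400&cull_frequency=4'

# A database table, created beforehand with: manage.py createcachetable cache_table
CACHE_BACKEND = 'db://cache_table/?timeout=600'

# A memcached server, which manages its own memory and culling
CACHE_BACKEND = 'memcached://127.0.0.1:11211/?timeout=300'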

Using the Cache Manually

In addition to the standard site-wide and per-view caching options, it’s also quite simple to use the cache directly, storing specific values so they can be retrieved later without having to perform expensive operations for data that doesn’t change often. This low-level API is available in a generic form through the cache object, living at django.core.cache. Most of the usefulness of this object comes from three methods—get(), set(), and delete()—which work mostly how you’d expect.

>>> from django.core.cache import cache
>>> cache.set('result', 2 ** 16 - 64 * 4)
>>> print cache.get('result')
65280
>>> cache.delete('result')
>>> print cache.get('result')
None

There are a few details about these methods that bear a little more explanation, as well as some additional methods that prove useful. Here is a full list of the available methods, along with their functional details; a brief demonstration follows the list.

  • CacheClass.set(key, value, timeout=None)—This sets the specified value in the cache, using the provided key. By default, the timeout for values to expire from the cache is determined by the timeout passed into the CACHE_BACKEND setting, but that can be overridden by specifying a different timeout as an argument to this method.
  • CacheClass.get(key, default=None)—This method returns the value contained in the cache for the specified key. Normally, cache.get() returns None if the key doesn’t exist in the cache, but sometimes None is a valid value to have in the cache. In these cases, just set default to some value that shouldn’t exist in the cache, and that will be returned instead of None.
  • CacheClass.delete(key)—This deletes the value associated with the given key.
  • CacheClass.get_many(keys)—Given a list of keys, this returns a dictionary mapping each key that was found in the cache to its stored value. For some backends, like memcached, this can provide a speed increase over calling cache.get() for each individual key.
  • CacheClass.has_key(key)—This method returns True if the specified key has a value already in the cache or False if the key wasn’t set or has already expired.
  • CacheClass.add(key, value, timeout=None)—This method only attempts to add a new key to the cache, using the specified value and timeout. If the given key already exists in the cache, this method will not update the cache to the new value.
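
For instance, picking up the interactive session from earlier, add() and get_many() behave like this; the 'missing' key is simply absent from the result.

>>> cache.set('result', 65280)
>>> cache.add('result', 0)   # Ignored, since 'result' already exists
>>> print cache.get('result')
65280
>>> cache.get_many(['result', 'missing'])
{'result': 65280}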

A common idiom when working with cache is to first check to see if a value is already present in the cache and, if not, calculate it and store it in the cache. Then, the value can be retrieved from the cache regardless of whether it was there to begin with, making the code nice and simple. To make this a bit more Pythonic, the cache object also functions a bit like a dictionary, supporting the in operator as an alias for the has_key() method.

def get_complex_data():
    if 'complex-data-key' not in cache:
        # Perform the expensive operations to generate the data here;
        # generate_complex_data() is a placeholder for that work.
        complex_data = generate_complex_data()
        cache.set('complex-data-key', complex_data)
    return cache.get('complex-data-key')

Template Loading

Chapter 6 showed that when a view or other code requests a template to render, it simply passes in a name, typically as a relative path. The actual retrieval of templates is done by special loaders, each of which accesses templates in a different way. By supplying the import paths of one or more of these loaders in the TEMPLATE_LOADERS setting, you can tell Django how and where to find your templates, so it doesn’t need to know in advance how or where you’ll store them.

Django ships with three template loaders, representing the most common ways templates are expected to be used, loading files from the filesystem in certain configurations. When these options aren’t enough, it’s fairly straightforward to add your own template loader to locate and retrieve templates in whatever way is best for your environment.

This is actually one of the easiest pluggable interfaces to write because it’s really just a single function. There isn’t even any assumption of what that function should be called, much less what module it should be in or any class it needs to be a part of. The entry in TEMPLATE_LOADERS points directly at the function itself, so no other structure is necessary.

load_template_source(template_name, template_dirs=None)

While the loader can be called anything, the name Django uses for all of its template loaders is load_template_source, so it’s generally best to stick to that convention for ease of understanding. This is also typically placed in its own module, but again, the import path has to be supplied explicitly, so just make sure its location is well-documented.

The first argument is obviously the name of the template to be loaded, which is usually just a standard filename. This doesn’t have to map to an actual file, but views will typically request templates using a filename, so it’s up to the template loader to convert this name to whatever reference is used for templates. That may be database records, URLs pointing to external storage systems, or anything else your site might use to store and load templates.

The second argument to load_template_source() is a list of directories to use when searching for the template. Within Django itself, this is typically not provided, so the default of None is used, indicating that the TEMPLATE_DIRS setting should be used instead. A loader that uses the filesystem should always follow this behavior to maintain consistency with the way other template loaders work. If the loader retrieves templates from somewhere else, this argument can simply be ignored.

What goes on inside the template loader will be quite different from one template loader to the next, varying based on how each loader locates templates. Once a template is found, the loader must return a tuple containing two values: the template’s contents as a string, and a string indicating where the template was found. That second value is used to generate the origin argument to the new Template object, so that it’s easy to find a template if anything goes wrong.

If the given name doesn’t match any templates the loader knows about, it should raise the TemplateDoesNotExist exception, described in Chapter 6. This will instruct Django to move on to the next template loader in the list or to display an error if there are no more loaders to use.

load_template_source.is_usable

If the Python environment doesn’t have the requirements for a template loader to operate, Django also provides a way for the loader to indicate that it shouldn’t be used. This is useful if a template loader relies on a third-party library that hasn’t been installed. Adding an is_usable attribute to the function, set to True or False, will tell Django whether the template loader can be used. The sketch below shows a complete loader, including this attribute.
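
Here’s a hedged sketch of a loader that serves templates from an in-memory dictionary; the TEMPLATES mapping is purely illustrative, standing in for a database table or remote service.

from django.template import TemplateDoesNotExist

# Illustrative template source; a real loader would consult a
# database, cache or remote service instead.
TEMPLATES = {
    'example.html': '<h1>{{ title }}</h1>',
}

def load_template_source(template_name, template_dirs=None):
    try:
        source = TEMPLATES[template_name]
    except KeyError:
        raise TemplateDoesNotExist(template_name)
    # The second value becomes the template's origin, for debugging.
    return source, 'dict:%s' % template_name
load_template_source.is_usable = True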

load_template(template_name, template_dirs=None)

In addition to simply loading the source code for a template, this function is responsible for returning a template capable of being rendered. By default, the source returned from load_template_source() is processed by Django’s own template language, but this gives you a chance to replace that with something else entirely. It should still use load_template_source() internally to fetch the code for the template, so that users can separate the decision of where to find templates from how those templates should be interpreted.

The return value only needs one method to work properly: render(context). This render() method accepts a template context and returns a string generated by the template’s source code. The context passed in works a lot like a standard dictionary, but Django’s contexts are actually a stack of dictionaries, so if you intend to pass this context into another template engine, you’ll probably need to flatten it to a single dictionary first.

flat_dict = {}
for d in context.dicts:
    flat_dict.update(d)

After this, you’ll have a single dictionary with all the values in it, suitable for most template languages.
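
Putting these pieces together, a loader for an alternative engine might look like the sketch below. The render_with_other_engine() call is purely hypothetical, standing in for whatever third-party library actually renders the source.

class ExternalTemplate(object):
    def __init__(self, source):
        self.source = source

    def render(self, context):
        # Flatten the context stack into a single dictionary first.
        flat_dict = {}
        for d in context.dicts:
            flat_dict.update(d)
        # Hand the result to the engine of your choice (hypothetical).
        return render_with_other_engine(self.source, flat_dict)

def load_template(template_name, template_dirs=None):
    source, origin = load_template_source(template_name, template_dirs)
    return ExternalTemplate(source)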

Context Processors

When a template gets rendered, it’s passed a context of variables, which it uses to display information and make basic presentation decisions. If a special type of context, RequestContext, is used (available from django.template right alongside the standard Context), Django runs through a list of context processors, each of which gets the opportunity to add new variables to the template’s context. This is not only a great way to add common variables to every template used on the site, but also an easy way to supply values based on the incoming HttpRequest object.

The interface for a context processor is quite simple; it’s nothing more than a standard Python function that takes a request as its only argument and returns a dictionary of data to be added to the template’s context. It should never raise an exception, and if no new variables need to be added for the given request, it should simply return an empty dictionary. Here’s an example context processor that adds an ip_address variable containing the requesting user’s IP address.

def remote_addr(request):
    return {'ip_address': request.META['REMOTE_ADDR']}

Note  REMOTE_ADDR isn’t reliable behind proxies and load balancers because its value will be that of the proxy, rather than the true remote IP address. If you’re using these kinds of software, be sure to use the values that are appropriate for your environment, as the variation below illustrates.
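
As a hedged variation, the version below prefers the X-Forwarded-For header when it’s present. Whether that header is trustworthy depends entirely on your proxy configuration, since clients can forge it.

def remote_addr(request):
    # Many proxies and load balancers set X-Forwarded-For; only rely
    # on it if a trusted proxy in front of Django actually sets it.
    forwarded = request.META.get('HTTP_X_FORWARDED_FOR')
    if forwarded:
        # The header may contain a comma-separated chain of addresses;
        # the original client is conventionally the first entry.
        return {'ip_address': forwarded.split(',')[0].strip()}
    return {'ip_address': request.META.get('REMOTE_ADDR')}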

Installing a context processor is as easy as adding a string to the TEMPLATE_CONTEXT_PROCESSORS setting, with each entry being a full Python import path, including the name of the function on the end of it. Also, remember that context processors are only called when templates are rendered using RequestContext. Since context processors accept the incoming request as an argument, there’s no way to call them without this information.
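
For example, if remote_addr() lived in a hypothetical module at myapp/context_processors.py, the setting might look like this; the stock entries shown are the defaults in the Django versions this chapter targets.

TEMPLATE_CONTEXT_PROCESSORS = (
    'django.core.context_processors.auth',
    'django.core.context_processors.debug',
    'django.core.context_processors.i18n',
    'django.core.context_processors.media',
    'myapp.context_processors.remote_addr',
)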

Applied Techniques

The available uses of the tools described in this chapter are many and varied, but there are a few simple examples of how they can be put to good use for some common needs. Take these with a pinch of salt and a sprig of parsley, and make them your own. Without prior knowledge of an application’s working environment, any examples that can be given will, by definition, be fairly abstract, but they should serve as a good outline of how these techniques can be put to good use.

Scanning Incoming Files for Viruses

For sites that allow users to upload files to be distributed to other users, a large amount of trust is placed on the quality of those incoming files. As with any form of user input, there must be a certain level of distrust in this information because there’s always someone out there who wants to do harm to your site and its users.

When looking to let users share specific types of files, it’s often easy to validate using third-party libraries designed to understand those files. Sharing arbitrary files, on the other hand, opens up a world of other possibilities, many of which put your site and its users at risk. Protecting against viruses is an important part of the safety of such an application, and Django’s upload handlers make this an extremely simple task.

For this example, we’ll use an excellent open source virus scanning application, ClamAV,8 which is designed for use in servers, along with pyclamd,9 a Python library for interacting with ClamAV. Together, these provide an easy-to-use interface for scanning any incoming file before it’s even passed to the rest of the application. If a virus is found, the offending file can simply be removed from the input stream immediately, before it can do any harm to anyone.

import pyclamd
from django.core.files import uploadhandler
from django.conf import settings

# Set up pyclamd to access a running clamd instance, according to settings
host = getattr(settings, 'CLAMAV_HOST', 'localhost')
port = getattr(settings, 'CLAMAV_PORT', 3310)
pyclamd.init_network_socket(host, port)

class VirusScan(uploadhandler.FileUploadHandler):
    def receive_data_chunk(self, raw_data, start):
        try:
            if pyclamd.scan_stream(raw_data):
                # A virus was found, so the file should
                # be removed from the input stream.
                raise uploadhandler.SkipFile()
        except pyclamd.ScanError:
            # ClamAV couldn't be contacted, so the file wasn't scanned.
            # Since we can't guarantee the safety of any files,
            # no other files should be processed either.
            raise uploadhandler.StopUpload()

        # If everything went fine, pass the data along
        return raw_data

    def file_complete(self, file_size):
        # This handler doesn't store the file anywhere, so it should
        # rely on other handlers to provide a File instance.
        return None

Your application may have more specific requirements, like explaining to users which virus was found and that they should consider cleaning their own system before attempting to share files with others. The key to this example is how easy it is to implement this type of behavior, which might seem very difficult on the surface.

Now What?

As much as there is to learn about accessing the protocols for these various types of backends, putting them to good use requires a good deal of imagination. There’s only so much a book like this can say about how and why to access or replace these lower-level interfaces, so it’s up to you to determine what’s best for your environment and your applications.

While this chapter discussed how to use and overhaul major portions of Django’s infrastructure, sometimes all that’s needed is a simple utility to replace or avoid a lot of redundant code. It’s important to know the difference, and the next chapter will outline the many basic utilities provided in Django’s core distribution.

1 http://prodjango.com/pep-249/

2 http://prodjango.com/db2/

3 http://prodjango.com/ipv4/

4 http://prodjango.com/ipv6/

5 http://prodjango.com/caching/

6 http://prodjango.com/uri/

7 http://prodjango.com/memcached/

8 http://prodjango.com/clamav/

9 http://prodjango.com/pyclamd/
