Django and Amazon S3

By default, Django models with FileField use local filesystem storage for uploaded files. Recent versions, however, have implemented an excellent pluggable storage system. This means we can substitute our own file storage mechanism for the default filesystem storage.

Custom storage can be implemented for almost any backend storage system. Essentially anything Python can connect with and store data on can be used as a storage backend. For many popular storage services, community plugins already exist. One outstanding project is David Larlet's django-storages, available at:

http://code.welldev.org/django-storages/.

The django-storages application supports lots of popular storage services and even allows us to store our files in a database. For the purpose of this section, however, we are interested in its support of Amazon's S3 storage service.

S3 stands for Simple Storage Service. It is a product from Amazon's Web Services group that offers extremely competitive pricing for storage and bandwidth, high availability around the world, and special integration with the Amazon CloudFront content delivery network. It also supports permissions-based authentication mechanisms and is backed by a service-level agreement covering its very rare downtime.

Amazon S3 can be accessed using either a REST or SOAP API. Wrapper libraries are available in almost all web-development languages, including Python. The django-storages app mentioned above bundles everything needed to use S3 as a storage module. For developers interested in more direct access to S3 functionality, there is a community project called boto, available on Google Code at: http://code.google.com/p/boto/. In addition to S3, boto supports operations for the other Amazon Web Services tools.

In part due to the excellent design of the S3 storage backend included with django-storages, integrating S3 support into our Django models is relatively easy. In most cases all that is required is a few additions to our Django settings file. These settings specify the access keys for our Amazon S3 account, which are provided upon signing up with the service.

An example settings configuration looks like this:

DEFAULT_FILE_STORAGE = 'backends.s3.S3Storage'
AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxxxx'
AWS_SECRET_ACCESS_KEY = 'xxxxxxxxxxxxxx'
AWS_STORAGE_BUCKET_NAME = 'content4sale'

These settings and the django-storages app are all that we need to convert our FileField and ImageField fields to S3. The AWS_STORAGE_BUCKET_NAME setting specifies the S3 bucket where our files will live. S3 uses buckets somewhat like file directories on a hard drive. Buckets exist at only one level, however, meaning there are no subdirectories or subfolders.

Files in a bucket are accessed using a key. This key effectively becomes a file name and can simulate subfolder hierarchies by including the usual / character as part of its key. For example, if we wanted to store videos in our bucket under a video subdirectory, we could set the file key to something like this:

videos/cranberry_farming_intro.mp4

Technically this is not a file path, just a key that looks like one. However, web browsers, humans, and other tools will generally not notice the difference.

File-level authorization and security takes place at the S3 service level and can be managed with a variety of desktop and web-based tools.

We should also note that if we need to use a combination of S3 and filesystem storages, we can do so by not setting the DEFAULT_FILE_STORAGE setting and instead passing the storage= keyword argument to any FileField or ImageField whose files we'd like to store in S3.
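As a sketch of this per-field approach (the model and field names here are hypothetical, and the import path assumes the django-storages backend layout used in the settings example above), a model might mix S3 and filesystem storage like so:

```python
from django.db import models
from backends.s3 import S3Storage  # path assumes django-storages' S3 backend

# Instantiate the storage once; it reads the AWS_* settings automatically
s3_storage = S3Storage()

class Product(models.Model):
    # Stored in S3 via the storage= keyword argument...
    video = models.FileField(upload_to='videos/', storage=s3_storage)
    # ...while this field keeps the default filesystem storage
    thumbnail = models.ImageField(upload_to='thumbs/')
```

Because DEFAULT_FILE_STORAGE is left unset here, only the fields that explicitly receive storage= are sent to S3.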

Query string request authentication

Amazon's S3 service allows developers to provide access to private files stored in an S3 bucket using Query String Request Authentication. This results in a URL that can be accessed by third parties without any passwords or other complications.

These special URLs can include an expiration time expressed in UNIX epoch format. Any request made after this time will be denied. This limits the potential for multiple downloads and other abuses. It's not foolproof, but is a quick and easy solution to the private downloads problem and it could be sophisticated enough for many applications.
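For instance, a common way to compute such an expiration time in Python (an illustrative snippet, not tied to any particular library) is to add a validity window to the current UNIX epoch time:

```python
import time

# Current UNIX epoch time plus a one-hour validity window
expires = int(time.time()) + 3600
print(expires)  # e.g. a value like 1264516089 for a request made in January 2010
```

Any request made with this value after the hour has elapsed will be denied by S3.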

Implementing Query String Request Authentication is a matter of constructing a standard S3 URL with a set of query string parameters. These parameters include our AWS access key, the expiration time, and a signature. The signature is the key component as it allows S3 to verify the validity of the requests made to the URL.

To create the signature parameter of the authenticated URL, we need to first create a string that describes the request we'll be allowing access to and specifies the expiration time. We then apply the HMAC-SHA1 hash function to this string, using our AWS secret key as the HMAC key, and base64 and URL encode the result.

AWS documentation includes more details on the process, including the specific format of the string we need to generate. For a simple GET request with no Content-MD5 or Content-Type headers, this string will resemble the following (the two blank lines stand in for those empty headers):

GET


1264516089
/files/movies/cranberry_instructional.mp4

Note that the expiration time is required both in our request string and in the final, authenticated URL.

An example method of generating the URL-encoded signature in Python is as follows:

import base64
import hashlib
import hmac
from urllib.parse import quote

AWS_SECRET_ACCESS_KEY = 'xxxxxxxxxxxxxx'  # our S3 secret key

s = """GET


1264516089
/files/movies/cranberry_instructional.mp4"""

# Sign the request string with our secret key using HMAC-SHA1
digest = hmac.new(AWS_SECRET_ACCESS_KEY.encode(), s.encode(),
                  hashlib.sha1).digest()
# base64 encode the raw digest, then URL encode it for use in a query string
signature = quote(base64.b64encode(digest), safe='')

Now that we have the signature, constructing the authenticated URL is simple. For this example, our authenticated request URL will look like this:

/files/movies/cranberry_instructional.mp4?AWSAccessKeyId=0PN32DSASDX33&Expires=1264516089&Signature=M2IwNWU4MDYzOGVmOTIzZWNhMTNjZDA5OGJmYmU4YWQ2N2Q3OTU1Yg%3D%3D

We can provide this URL to the customer, in an HTML template or e-mail, after they purchase our instructional video, and they will have temporary access to download the content in their browser. Currently there is no easy way to determine when a third party has successfully completed an authenticated download. This could be managed at the application level, however, by giving the user a sufficient amount of time before their URL expires and optionally allowing them to regenerate an expired request for content they've purchased.
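Putting the signing steps together, a small helper along these lines (the function name and bucket layout are our own invention; this sketches the process rather than any official API) can generate an authenticated URL in one call:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def make_expiring_url(bucket, key, access_key_id, secret_key, expires):
    """Build an S3 Query String Request Authentication URL (hypothetical helper)."""
    resource = '/%s/%s' % (bucket, key)
    # Verb, blank Content-MD5 and Content-Type lines, expiration time, resource
    string_to_sign = 'GET\n\n\n%d\n%s' % (expires, resource)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    # base64 the raw digest, then URL encode it (including '/' and '+')
    signature = quote(base64.b64encode(digest), safe='')
    return ('https://s3.amazonaws.com%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s'
            % (resource, access_key_id, expires, signature))
```

Calling make_expiring_url('files', 'movies/cranberry_instructional.mp4', ...) with our account's keys and an expiration timestamp would then return a URL ready to hand to the customer.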

About Amazon AWS services requests

Communication with most Amazon web services can be handled using two different methods: REST/Query or SOAP. The REST/Query approach has been used throughout this book. It relies exclusively on standard HTTP functionality to pass data parameters to AWS functions. This involves constructing and signing a string like that used in the previous section.

Amazon's AWS documentation guides use the following pseudo grammar to explain how to construct this string:

StringToSign = HTTPVerb + "\n" +
               ValueOfHostHeaderInLowercase + "\n" +
               HTTPRequestURI + "\n" +
               CanonicalizedQueryString

HTTPVerb is one of the usual HTTP methods like GET and POST. The host header value is simply the web services hostname we will be submitting requests to (that is, fps.sandbox.amazonaws.com or fps.amazonaws.com), and the URI is the path portion of the HTTP request. Often this is just /.

The CanonicalizedQueryString phrase is the set of HTTP parameters, usually a GET query string, sorted by parameter name and excluding the Signature parameter (which we add before making the request). These parameters are passed as normal to the web services URL when we make the request. A partial query string looks like this:

version=2009-01-09&SignatureMethod=HmacSHA256&callerReferenceSender=jdl_123123&FundingAmount=25.0&pipelineName=SetupPrepaid
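As an illustration of this pseudo-grammar (the helper names below are our own, and the parameter values are made up), the string can be assembled and signed in Python like so:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def build_string_to_sign(verb, host, uri, params):
    # Sort parameters by name and URL-encode them into a canonical query
    # string, excluding the Signature parameter itself
    canonical = '&'.join(
        '%s=%s' % (quote(str(k), safe='-_.~'), quote(str(v), safe='-_.~'))
        for k, v in sorted(params.items()) if k != 'Signature')
    return '\n'.join([verb, host.lower(), uri, canonical])

def sign_request(string_to_sign, secret_key):
    # HMAC-SHA256 (matching a SignatureMethod of HmacSHA256), then base64
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode()
```

The resulting signature is then appended to the query string as the Signature parameter before the request is made.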

SOAP interfaces use an entirely different approach and have not been implemented for this book.
