Content negotiation

Content compression with the Accept-Encoding header and language selection with the Accept-Language header are examples of content negotiation, where the client specifies its preferences regarding the format and the content of the requested resource. The following headers can also be used for this:

Accept: For requesting a preferred file format
Accept-Charset: For requesting the resource in a preferred character set
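
As a minimal sketch, here is how these preferences could be attached to a request with urllib. The URL is a placeholder and nothing is sent over the network. The q values are relative weights between 0 and 1 that rank the alternatives. The server decides whether to honor them.

```python
from urllib.request import Request

# Placeholder URL for illustration only; constructing a Request sends nothing
req = Request(
    'http://www.example.com/report',
    headers={
        # Prefer HTML, accept XHTML slightly less, anything else as a last resort
        'Accept': 'text/html, application/xhtml+xml;q=0.9, */*;q=0.8',
        # Prefer UTF-8, but ISO-8859-1 is acceptable with a lower weight
        'Accept-Charset': 'utf-8, iso-8859-1;q=0.5',
    },
)

# urllib normalizes header names with str.capitalize(), so the stored
# key is 'Accept-charset' rather than 'Accept-Charset'
for name, value in req.header_items():
    print(f'{name}: {value}')
```

Passing req to urlopen() would send these headers. The Content-Type header of the response then tells you what the server actually sent.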

There are additional aspects to the content negotiation mechanism, but because it's inconsistently supported and it can become quite involved, we won't be covering it in this chapter. RFC 7231 contains all the details that you need; take a look at sections 3.4, 5.3, 6.4.1, and 6.5.6 if you find that your application requires this.

Content types

HTTP can be used as a transport for any type of file or data. The server can use the Content-Type header in a response to inform the client about the type of data that it has sent in the body. This is the primary means by which an HTTP client determines how it should handle the body data that the server returns to it.
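
To make this concrete, here is a minimal sketch of a client deciding what to do with a body based on its Content-Type. The function name handle_body and the small set of types it recognizes are hypothetical choices. For simplicity it assumes UTF-8 for text instead of reading the charset parameter.

```python
import json


def handle_body(content_type, body):
    """Choose how to treat a response body based on its Content-Type value.

    Hypothetical helper for illustration; real clients handle many more types.
    """
    # Drop any parameters such as '; charset=utf-8' and compare case-insensitively
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type == 'application/json':
        return json.loads(body.decode('utf-8'))
    if media_type.startswith('text/'):
        # Assumes UTF-8; a real client should honor the charset parameter
        return body.decode('utf-8')
    # Anything else (images, PDFs, octet-stream, ...) is returned as raw bytes
    return body


print(handle_body('application/json', b'{"name": "example"}'))
print(handle_body('text/html; charset=UTF-8', b'<p>Hello</p>'))
print(handle_body('image/jpeg', b'\xff\xd8\xff'))
```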

To view the content type, we inspect the value of the Content-Type response header, as shown here:

>>> from urllib.request import urlopen
>>> response = urlopen('http://www.debian.org')
>>> response.getheader('Content-Type')
'text/html'

The values in this header are taken from a list which is maintained by IANA. These values are variously called content types, Internet media types, or MIME types (MIME stands for Multipurpose Internet Mail Extensions, the specification in which the convention was first established). The full list can be found at http://www.iana.org/assignments/media-types.

There are registered media types for many of the types of data that are transmitted across the Internet. Some common ones are:

Media type	Description
text/html	HTML document
text/plain	Plain text document
image/jpeg	JPG image
application/pdf	PDF document
application/json	JSON data
application/xhtml+xml	XHTML document

Another media type of interest is application/octet-stream, which in practice is used for files that don't have an applicable media type. An example of this would be a pickled Python object. It is also used for files whose format is not known by the server. In order to handle responses with this media type correctly, we need to discover the format in some other way. Possible approaches are as follows:

Examine the filename extension of the downloaded resource, if it has one. The mimetypes module can then be used for determining the media type (go to Chapter 3, APIs in Action to see an example of this).
Download the data and then use a file type analysis tool. The Python standard library imghdr module can be used for images, and the third-party python-magic package or the Unix file command can be used for other types.
Check the website that we're downloading from to see if the file type has been documented anywhere.
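
The first two approaches can be combined in a short sketch. It tries the filename extension with mimetypes, then falls back to checking a few well-known file signatures ("magic numbers") at the start of the data. The signatures are checked by hand rather than with imghdr, which was deprecated in Python 3.11 and removed in 3.13. The helper name identify_octet_stream and its small signature table are hypothetical. They are no substitute for python-magic or file.

```python
import mimetypes

# A small, illustrative subset of file signatures (magic numbers)
MAGIC_PREFIXES = [
    (b'%PDF-', 'application/pdf'),
    (b'\x89PNG\r\n\x1a\n', 'image/png'),
    (b'\xff\xd8\xff', 'image/jpeg'),
    (b'GIF87a', 'image/gif'),
    (b'GIF89a', 'image/gif'),
]


def identify_octet_stream(url, data):
    """Guess a media type for a body served as application/octet-stream.

    Hypothetical helper: extension first, then file signature, then give up.
    """
    # 1. Filename extension, via the standard library mimetypes module
    guessed, _encoding = mimetypes.guess_type(url)
    if guessed is not None:
        return guessed
    # 2. File signature at the start of the downloaded data
    for prefix, media_type in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return media_type
    # 3. Nothing matched, so keep treating it as opaque binary data
    return 'application/octet-stream'


print(identify_octet_stream('http://example.com/files/report.pdf', b''))
print(identify_octet_stream('http://example.com/download', b'%PDF-1.7 ...'))
```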

Content type values can contain optional additional parameters that provide further information about the type. This is usually used to supply the character set that the data uses. For example:

Content-Type: text/html; charset=UTF-8

In this case, we're being told that the character set of the document is UTF-8. The parameter is included after a semicolon, and it always takes the form of a key/value pair.
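
Each parameter follows a semicolon and takes the form key=value, so a rough parser needs only a few lines. This is a simplified, hypothetical sketch. It lowercases the media type and parameter names and strips surrounding quotes from values. It does not handle semicolons inside quoted values.

```python
def parse_content_type(value):
    """Split a Content-Type value into (media_type, params_dict).

    Simplified, hypothetical parser: semicolons inside quoted
    parameter values are not supported.
    """
    media_type, *raw_params = value.split(';')
    params = {}
    for raw in raw_params:
        key, sep, val = raw.partition('=')
        if not sep:
            continue  # Skip malformed parameters that lack an '='
        # Parameter names are case-insensitive; values may be quoted
        params[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


print(parse_content_type('text/html; charset=UTF-8'))
```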

Let's work through an example: downloading the Python home page and using the Content-Type value that it returns. First, we submit our request:

>>> response = urlopen('http://www.python.org')

Then, we check the Content-Type value of our response, and extract the character set:

>>> format, params = response.getheader('Content-Type').split(';')
>>> params
' charset=utf-8'
>>> charset = params.split('=')[1]
>>> charset
'utf-8'

Lastly, we decode our response content by using the supplied character set:

>>> content = response.read().decode(charset)
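
The split(';') approach above raises a ValueError when the header has no parameters or more than one. There is a more robust option. The response.headers object that urlopen() returns is an http.client.HTTPMessage, which subclasses email.message.Message, so it provides get_content_type() and get_content_charset(). The sketch below builds a Message by hand so that it runs without a network connection.

```python
from email.message import Message

# Build a header object by hand; response.headers from urlopen() is an
# HTTPMessage (a Message subclass), so the same methods work on it
headers = Message()
headers['Content-Type'] = 'text/html; charset=UTF-8'

print(headers.get_content_type())     # media type, lowercased: text/html
print(headers.get_content_charset())  # charset parameter, lowercased: utf-8

# A missing charset does not raise an error; the fallback argument
# (None by default) is returned instead
no_charset = Message()
no_charset['Content-Type'] = 'text/plain'
print(no_charset.get_content_charset('utf-8'))
```

On a live response, charset = response.headers.get_content_charset('utf-8') returns the declared charset, or UTF-8 when the server supplies none.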

Note that quite often, the server either doesn't supply a charset in the Content-Type header, or it supplies the wrong charset. So, this value should be taken as a suggestion. This is one of the reasons that we look at the Requests library later in this chapter. It will automatically gather all the hints that it can find about what character set should be used for decoding a response body and make a best guess for us.
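
Treating the declared charset as a hint can be sketched with a simple fallback chain. The helper decode_body and its order of attempts (declared charset, then UTF-8, then ISO-8859-1) are assumptions for illustration. This is not the detection logic that Requests uses internally.

```python
def decode_body(body, declared_charset=None):
    """Decode bytes, using the declared charset only as a first guess.

    Hypothetical helper; returns (text, charset_that_worked).
    """
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    candidates.append('utf-8')
    for charset in candidates:
        try:
            return body.decode(charset), charset
        except (LookupError, UnicodeDecodeError):
            # LookupError: unknown charset name; UnicodeDecodeError: wrong charset
            continue
    # ISO-8859-1 maps every byte to a character, so this cannot fail
    return body.decode('iso-8859-1'), 'iso-8859-1'


print(decode_body('café'.encode('utf-8'), 'no-such-charset'))
print(decode_body(b'caf\xe9', 'utf-8'))
```

Because ISO-8859-1 accepts any byte sequence, the final step always succeeds. A wrong guess anywhere in the chain can still produce garbled text instead of an error.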
