To make use of the functionality that headers provide, we add headers to a request before sending it. To do this, we can't just pass a URL to urlopen(). We need to follow these steps:
1. Create a Request object
2. Add headers to the request object
3. Use urlopen() to send the request object
We're going to learn how to customize a request for retrieving a Swedish version of the Debian home page. We will use the Accept-Language header, which tells the server our preferred language for the resource it returns. Note that not all servers hold versions of resources in multiple languages, so not all servers will respond to Accept-Language.
First, we create a Request object:
>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.debian.org')
Next we add the header:
>>> req.add_header('Accept-Language', 'sv')
The add_header() method takes the name of the header and the contents of the header as arguments. The Accept-Language header takes language tags, most commonly two-letter ISO 639-1 language codes. The code for Swedish is sv.
Lastly, we submit the customized request with urlopen():
>>> response = urlopen(req)
We can check if the response is in Swedish by printing out the first few lines:
>>> response.readlines()[:5]
[b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n',
 b'<html lang="sv">\n',
 b'<head>\n',
 b' <meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n',
 b' <title>Debian -- Det universella operativsystemet </title>\n']
Jättebra! The Accept-Language header has informed the server about our preferred language for the response's content.
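The three steps above can be wrapped in small helpers. This is only a sketch: the function names build_language_request(), declares_language(), and fetch_in_language() are hypothetical, and the check for a lang="..." attribute assumes the page declares its language the way the Debian page does.

```python
from urllib.request import Request, urlopen


def build_language_request(url, language):
    """Create a Request and attach an Accept-Language header to it."""
    req = Request(url)
    req.add_header('Accept-Language', language)
    return req


def declares_language(lines, language):
    """Return True if any of the given byte lines contains lang="<language>"."""
    marker = 'lang="{}"'.format(language).encode('ascii')
    return any(marker in line for line in lines)


def fetch_in_language(url, language, max_lines=5):
    """Send the request and check the first lines for the language marker."""
    req = build_language_request(url, language)
    with urlopen(req) as response:
        first_lines = response.readlines()[:max_lines]
    return declares_language(first_lines, language), first_lines
```

Calling fetch_in_language('http://www.debian.org', 'sv') needs network access, and the result depends on what the server chooses to return. The two helper functions it relies on can be exercised offline.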
To view the headers present in a request, do the following:
>>> req = Request('http://www.debian.org')
>>> req.add_header('Accept-Language', 'sv')
>>> req.header_items()
[('Accept-language', 'sv')]
The urlopen() function adds some of its own headers when we run it on a request:
>>> response = urlopen(req)
>>> req.header_items()
[('Accept-language', 'sv'), ('User-agent', 'Python-urllib/3.4'), ('Host', 'www.debian.org')]
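Note that the header names come back as Accept-language rather than Accept-Language. In CPython's urllib.request, add_header() stores each name after passing it through str.capitalize(), so has_header() and get_header() expect that spelling. The following sketch shows this without sending anything over the network; the variable stored_name is introduced here only for illustration.

```python
from urllib.request import Request

req = Request('http://www.debian.org')
req.add_header('Accept-Language', 'sv')

# Names are stored as name.capitalize(): first letter upper, the rest lower.
stored_name = 'Accept-Language'.capitalize()

print(stored_name)                        # Accept-language
print(req.has_header(stored_name))        # True
print(req.has_header('Accept-Language'))  # False: the lookup is case-sensitive
print(req.get_header(stored_name))        # sv
```

HTTP header names are case-insensitive on the wire, so this spelling only matters when we inspect the Request object ourselves.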
A shortcut for adding headers is to add them at the same time that we create the request object, as shown here:
>>> headers = {'Accept-Language': 'sv'}
>>> req = Request('http://www.debian.org', headers=headers)
>>> req.header_items()
[('Accept-language', 'sv')]
We supply the headers as a dict to the Request object constructor as the headers keyword argument. In this way, we can add multiple headers in one go, by adding more entries to the dict.
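As a sketch of adding several headers in one go, the dict below carries two entries. The User-Agent value 'my-client/0.1' is a made-up placeholder, not something the original example uses.

```python
from urllib.request import Request

headers = {
    'Accept-Language': 'sv',
    'User-Agent': 'my-client/0.1',  # hypothetical client name
}
req = Request('http://www.debian.org', headers=headers)

# Sort the items so that the display order is stable.
for name, value in sorted(req.header_items()):
    print(name, '=', value)
```

Each entry in the dict ends up on the request just as if add_header() had been called once per entry.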
Let's take a look at some more things that we can do with headers.
The Accept-Encoding request header and the Content-Encoding response header can work together to allow us to temporarily encode the body of a response for transmission over the network. This is typically used for compressing the response and reducing the amount of data that needs to be transferred.
This process follows these steps:
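1. The client sends a request that lists the encodings it accepts in the Accept-Encoding header
2. The server picks one of those encodings that it supports and encodes the response body with it
3. The server sends the response and names the encoding it used in the Content-Encoding header
4. The client decodes the response body using the named encoding
Let's discuss how to request a document and get the server to use gzip compression for the response body. First, let's construct the request: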
>>> req = Request('http://www.debian.org')
Next, add the Accept-Encoding header:
>>> req.add_header('Accept-Encoding', 'gzip')
And then, submit it with the help of urlopen():
>>> response = urlopen(req)
We can check if the server is using gzip compression by looking at the response's Content-Encoding header:
>>> response.getheader('Content-Encoding')
'gzip'
We can then decompress the body data by using the gzip module:
>>> import gzip
>>> content = gzip.decompress(response.read())
>>> content.splitlines()[:5]
[b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
 b'<html lang="en">',
 b'<head>',
 b' <meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
 b' <title>Debian -- The Universal Operating System </title>']
Encodings are registered with IANA. The registered encodings include gzip, compress, deflate, and identity. The first three refer to specific compression methods. The last one allows the client to specify that it doesn't want any encoding applied to the content.
Let's see what happens if we ask for no compression by using the identity encoding:
>>> req = Request('http://www.debian.org')
>>> req.add_header('Accept-Encoding', 'identity')
>>> response = urlopen(req)
>>> print(response.getheader('Content-Encoding'))
None
When a server uses the identity encoding type, no Content-Encoding header is included in the response.
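Since the Content-Encoding header may be gzip, another encoding, or missing altogether, a client usually decides how to decode the body based on that value. The following is a minimal sketch, not a complete implementation: decode_body() is a hypothetical helper, it treats a missing header as identity, and it assumes that a deflate body is zlib-wrapped, which is what zlib.decompress() expects.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Decode a response body according to its Content-Encoding value."""
    # A missing header (None) means that no encoding was applied.
    encoding = (content_encoding or 'identity').strip().lower()
    if encoding == 'identity':
        return body
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'deflate':
        # Assumption: the body is zlib-wrapped deflate data.
        return zlib.decompress(body)
    raise ValueError('unsupported Content-Encoding: {!r}'.format(encoding))


original = b'<html lang="en">'
print(decode_body(gzip.compress(original), 'gzip') == original)  # True
print(decode_body(original, None) == original)                   # True
```

With a real response, this could be called as decode_body(response.read(), response.getheader('Content-Encoding')).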
To tell the server that we can accept more than one encoding, add more values to the Accept-Encoding header and separate them by commas. Let's try it. We create our Request object:
>>> req = Request('http://www.debian.org')
Then, we add our header, and this time we include more encodings:
>>> encodings = 'deflate, gzip, identity'
>>> req.add_header('Accept-Encoding', encodings)
Now, we submit the request and then check the response encoding:
>>> response = urlopen(req)
>>> response.getheader('Content-Encoding')
'gzip'
If needed, relative weightings can be given to specific encodings by adding a q value:
>>> encodings = 'gzip, deflate;q=0.8, identity;q=0.0'
The q value follows the encoding name, separated from it by a semicolon. It ranges from 0 to 1.0; 1.0 is the maximum and also the default if no q value is given, and a value of 0 marks an encoding as not acceptable. So, the preceding line should be interpreted as: my first preference for encoding is gzip, my second preference is deflate, and identity is not acceptable.
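To make the weighting concrete, here is a sketch of how a receiver might read such a header. Both function names are hypothetical, the parser handles only the simple name;q=value form, and it assumes the HTTP rule that a weight of 0 marks an encoding as not acceptable. Real servers apply their own selection logic.

```python
def parse_accept_encoding(header_value):
    """Return (encoding, q) pairs sorted from most to least preferred."""
    preferences = []
    for item in header_value.split(','):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(';')
        q = 1.0  # default weight when no q value is given
        for param in params.split(';'):
            key, _, value = param.partition('=')
            if key.strip().lower() == 'q':
                q = float(value.strip())
        preferences.append((name.strip().lower(), q))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_encoding(header_value, supported):
    """Pick the highest-weighted acceptable encoding found in `supported`."""
    for name, q in parse_accept_encoding(header_value):
        if q > 0 and name in supported:
            return name
    return None


encodings = 'gzip, deflate;q=0.8, identity;q=0.0'
print(parse_accept_encoding(encodings))
print(choose_encoding(encodings, {'deflate', 'identity'}))  # deflate
```

In this sketch, a server that supports only identity gets None back for the header above, because the client gave identity a weight of 0.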