Another request header worth knowing about is the User-Agent header. Any client that communicates using HTTP can be referred to as a user agent. RFC 7231 suggests that user agents should use the User-Agent header to identify themselves in every request. What goes in there is up to the software that makes the request, though it usually comprises a string that identifies the program and version, and possibly the operating system and the hardware that it's running on. For example, the user agent for my current version of Firefox is shown here:
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140722 Firefox/24.0 Iceweasel/24.7.0
This is a single long string. As you can probably decipher, I'm running Iceweasel (Debian's version of Firefox) version 24 on a 64-bit Linux system. User agent strings aren't intended for identifying individual users. They only identify the product that was used for making the request.
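To make the structure of such a string concrete, here is a minimal sketch that pulls the name/version tokens and the parenthesised comments out of the example above. The helper name split_user_agent and its regular expressions are my own illustration, not part of any library, and they assume the common product/version layout; real user agent strings vary a lot, so treat this as a rough reading aid rather than a robust parser.

```python
import re

# The example user agent string from the text, split only for readability.
UA = ('Mozilla/5.0 (X11; Linux x86_64; rv:24.0) '
      'Gecko/20140722 Firefox/24.0 Iceweasel/24.7.0')


def split_user_agent(ua):
    """Return (products, comments) found in a user agent string.

    Hypothetical helper for illustration only.
    """
    # Parenthesised comments usually carry platform details.
    comments = re.findall(r'\(([^)]*)\)', ua)
    # Drop the comments, then pick out the name/version tokens.
    without_comments = re.sub(r'\([^)]*\)', ' ', ua)
    products = re.findall(r'([^\s/]+)/(\S+)', without_comments)
    return products, comments


products, comments = split_user_agent(UA)
for name, version in products:
    print(name, version)
print(comments)
```

For the string above, the last product token is Iceweasel 24.7.0, and the single comment holds the platform details (X11, 64-bit Linux).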
We can view the user agent that urllib uses. Perform the following steps:
>>> from urllib.request import Request, urlopen
>>> req = Request('http://www.python.org')
>>> response = urlopen(req)
>>> req.get_header('User-agent')
'Python-urllib/3.4'
Here, we have created a request and submitted it using urlopen, and urlopen added the user agent header to the request. We can examine this header by using the get_header() method. This header and its value are included in every request made by urllib, so every server we make a request to can see that we are using Python 3.4 and the urllib library.
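The default value can also be inspected without sending anything over the network. The following sketch assumes CPython's urllib, where an opener built with build_opener() keeps its default headers in the addheaders list; the variable names are mine.

```python
import sys
from urllib.request import build_opener

# An opener carries the headers that urllib adds to every request.
opener = build_opener()
default_headers = dict(opener.addheaders)

# The key is stored as 'User-agent' (note the lowercase 'a').
default_agent = default_headers['User-agent']

# The version part follows the running interpreter, for example 3.4.
expected = 'Python-urllib/%d.%d' % sys.version_info[:2]

print(default_agent)
```

On Python 3.4 this prints the same Python-urllib/3.4 string seen in the session above; on other versions the number changes accordingly.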
Webmasters can inspect the user agents of requests and then use the information for various things, including the following:
The last two can cause problems for us because they can stop or interfere with us accessing the content that we're after. To work around this, we can try to set our user agent so that it mimics a well-known browser. This is known as spoofing, as shown here:
>>> req = Request('http://www.debian.org')
>>> req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140722 Firefox/24.0 Iceweasel/24.7.0')
>>> response = urlopen(req)
The server will respond as if our application is a regular Firefox client. User agent strings for different browsers are available on the web. I have yet to come across a comprehensive resource for them, but Googling for a browser and version number will usually turn something up. Alternatively, you can use Wireshark to capture an HTTP request made by the browser you want to emulate and look at the captured request's user agent header.
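If you spoof the user agent often, it may be convenient to wrap the steps in a small function. The sketch below is one possible way to do that; make_spoofed_request and FIREFOX_UA are names I have made up for illustration, and the header is passed through the headers argument of Request rather than add_header(). It builds the request without sending it, so the header can be checked offline.

```python
from urllib.request import Request

# The Firefox/Iceweasel string used earlier; any browser string would do.
FIREFOX_UA = ('Mozilla/5.0 (X11; Linux x86_64; rv:24.0) '
              'Gecko/20140722 Firefox/24.0 Iceweasel/24.7.0')


def make_spoofed_request(url, user_agent=FIREFOX_UA):
    """Build a Request whose User-Agent header mimics a browser.

    Hypothetical helper; pass the result to urlopen() to send it.
    """
    return Request(url, headers={'User-Agent': user_agent})


req = make_spoofed_request('http://www.debian.org')

# Request stores header names capitalised, so the key is 'User-agent'.
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
```

Because the header is already present on the request, urlopen leaves it alone instead of adding the default Python-urllib value.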