Consuming web services in Python with urllib

In this section, we will learn how to use urllib and how we can build HTTP clients with this module.

The urllib package allows access to any resource published on the network (web pages, files, directories, images, and so on) through various protocols (HTTP, HTTPS, FTP, and local files). To start consuming a web service, we have to import the following modules:

#! /usr/bin/env python3
import urllib.request
import urllib.parse

The urllib package contains four modules:

  • request: Opens and reads URLs
  • error: Contains the exceptions raised by urllib.request
  • parse: Splits URLs into their components, and builds and quotes them
  • robotparser: Parses robots.txt files
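The parse and robotparser modules work entirely offline, so they are easy to try in the interpreter. In the following sketch, the URL, the robots.txt rules, and the user-agent name are made-up examples, not taken from any real site:

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical URL used only for illustration
url = "https://www.example.com/search?q=python&page=2"

# urllib.parse: split a URL into its components
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path)  # https www.example.com /search

# Turn the query string into a dict of lists, and build a new one
params = parse_qs(parts.query)
print(params)  # {'q': ['python'], 'page': ['2']}
print(urlencode({"q": "network security", "page": 1}))  # q=network+security&page=1

# urllib.robotparser: evaluate made-up robots.txt rules without downloading anything
rules = [
    "User-agent: *",
    "Disallow: /private/",
]
robots = RobotFileParser()
robots.parse(rules)
print(robots.can_fetch("my-client", "https://www.example.com/private/data"))  # False
print(robots.can_fetch("my-client", "https://www.example.com/search"))  # True
```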

The urllib.request module allows access to a resource published on the internet through its address. The documentation of the Python 3 module (https://docs.python.org/3/library/urllib.request.html#module-urllib.request) lists all the functions and classes it provides. The main one is urlopen, which works in the following way.

The urlopen function returns a file-like object from which we can read the content of the URL. This object has methods such as read, readline, readlines, and close, which work just like those of regular file objects, although in reality we are working with wrapper methods that spare us from handling sockets at a low level.
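To see this file-like interface without touching the network, the following sketch opens a data: URL, a scheme that urllib.request also understands (since Python 3.4) and that carries its content inside the URL itself. The three-line payload is made up; with an http:// URL, the same calls would operate on the body of the HTTP response:

```python
import urllib.request

# A data: URL embeds its content directly; %20 is a space and %0A a newline
url = "data:text/plain;charset=utf-8,first%20line%0Asecond%20line%0Athird%20line"

# The with statement calls close() for us, just as with a regular file
with urllib.request.urlopen(url) as response:
    first = response.readline()  # reads up to and including the first newline
    rest = response.readlines()  # reads the remaining lines into a list

print(first)  # b'first line\n'
print(rest)   # [b'second line\n', b'third line']
```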

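The urlopen function has an optional data parameter for sending information to HTTP addresses with the POST method, where the parameters travel in the body of the request itself; for example, to submit a form. In Python 3, this parameter must be a bytes object containing properly encoded data (for instance, the output of urllib.parse.urlencode() encoded to bytes). The full signature is: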

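urllib.request.urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None)

The cafile, capath, and cadefault parameters were deprecated in Python 3.6 and removed in Python 3.13; the context parameter should be used instead.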
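As a sketch of how the data parameter is prepared, the following snippet encodes some form fields and builds a urllib.request.Request object. The URL and the field names are hypothetical, and the request is only built, not sent, so the code runs without network access. Supplying data is what makes urllib use POST instead of GET:

```python
import urllib.parse
import urllib.request

# Hypothetical form fields and endpoint, for illustration only
form = {"user": "admin", "comment": "hello world"}
url = "http://www.example.com/form"

# urlencode() builds the form-encoded string; encode() turns it into bytes,
# which is the type urlopen() expects for the data parameter
payload = urllib.parse.urlencode(form).encode("ascii")

request = urllib.request.Request(url, data=payload)
print(request.get_method())  # POST (it would be GET if data were None)
print(payload)               # b'user=admin&comment=hello+world'

# Actually sending it would be: urllib.request.urlopen(request, timeout=30)
```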

Retrieving the contents of a URL is a straightforward process when done using urllib. You can open the Python interpreter and execute the following instructions:

>>> from urllib.request import urlopen
>>> response = urlopen('http://www.packtpub.com')
>>> response
<http.client.HTTPResponse object at 0x7fa3c53059b0>
>>> response.readline()

We use the urllib.request.urlopen() function to send a request and receive a response for the resource at http://www.packtpub.com, in this case an HTML page. We then read the first line of the HTML we received with the readline() method of the response object; note that it is returned as bytes, not as a string.
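Since readline() returns bytes, we usually want to decode the line before working with it. The first_line() helper below is hypothetical (it is not part of urllib): it decodes with the charset declared in the Content-Type header and, as an assumption, falls back to UTF-8 when no charset is declared. The offline check uses a data: URL carrying a tiny HTML document instead of the real page:

```python
from urllib.request import urlopen


def first_line(url, timeout=30):
    """Return the first line of the resource at url as a str (illustrative helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.readline()
        # Use the charset from the Content-Type header; assume UTF-8 if it is missing
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset).rstrip("\r\n")


# With network access: print(first_line("http://www.packtpub.com"))
print(first_line("data:text/html;charset=utf-8,<!DOCTYPE%20html>%0A<html></html>"))
# <!DOCTYPE html>
```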

This function also supports a timeout argument, the maximum number of seconds to wait for the server; if the server does not respond within that time, an exception is raised:

>>> import urllib.request
>>> print(urllib.request.urlopen("http://packtpub.com", timeout=30))
<http.client.HTTPResponse object at 0x03C4DC90>

We can see from the preceding example that urlopen() returns an http.client.HTTPResponse instance. The response object gives us access to the data of the requested resource, as well as to the properties and metadata of the response, such as the status code (response.status) and the headers (response.headers).
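Because a timeout or an unreachable server raises an exception instead of returning a response, a client usually wraps urlopen() in a try block. The following is a sketch using the exception classes from urllib.error; the fetch() helper and its (status, body) return convention are our own choices, not part of urllib:

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=30):
    """Return (status, body) on success, or (status or None, error text) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered with a 4xx/5xx code; HTTPError must be caught
        # before URLError because it is a subclass of it
        return err.code, str(err.reason)
    except urllib.error.URLError as err:
        # No usable response: DNS failure, refused connection, connect timeout,
        # or a URL scheme that urllib cannot handle
        return None, str(err.reason)
    except socket.timeout:
        # The connection succeeded but the server was too slow to send data
        return None, "timed out"


# Offline demonstration: an unsupported scheme is reported as a URLError
print(fetch("nosuchscheme://example.com"))  # (None, 'unknown url type: nosuchscheme')
```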

If the response is in JSON format, we can parse it with Python's json module:

>>> import json
>>> import urllib.request
>>> response = urllib.request.urlopen(url, timeout=30)
>>> json_response = json.loads(response.read())

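Here, url holds the address of a service that returns JSON. The response variable stores the response object returned by the request, and its read() method returns the body as bytes. The json.loads() function then parses that JSON content into Python objects, such as dictionaries and lists.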
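Putting these steps together, a helper along the following lines downloads a resource and returns the parsed JSON. The fetch_json() name and the Accept header are our own choices; the offline check uses a data: URL with a made-up JSON document in place of a real web service:

```python
import json
import urllib.request


def fetch_json(url, timeout=30):
    """Download url and parse the body as JSON (illustrative helper)."""
    # Ask the server for JSON; handlers for non-HTTP schemes simply ignore headers
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    # json.loads() accepts UTF-8/16/32 encoded bytes directly (Python 3.6+)
    return json.loads(body)


data = fetch_json('data:application/json,{"module":%20"urllib",%20"version":%203}')
print(data["module"], data["version"])  # urllib 3
```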
