Sometimes data is made available through a web service over HTTP. The advantage is that we don't have to care much about the technologies the sending party uses, much as with e-mail. However, we have to request the information explicitly with an HTTP method, most often GET and sometimes POST (method names are uppercase by convention). Whenever we request a web page or download a file, we usually perform a GET request. The web server on the other end has to process each request. Many requests can slow the server down, so organizations often take measures against this, which may mean that your further requests get blocked.
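To see how the two methods differ without contacting any server, we can let requests build (prepare) requests without sending them. This is a minimal sketch; the URL and the q parameter are hypothetical placeholders. It shows that GET puts the parameters in the URL's query string, whereas POST sends them form-encoded in the request body.

```python
import requests

# Hypothetical endpoint and parameter, used only to build requests locally.
url = 'http://example.com/search'

# prepare() builds the final request without sending anything over the network.
get_req = requests.Request('GET', url, params={'q': 'python'}).prepare()
post_req = requests.Request('POST', url, data={'q': 'python'}).prepare()

# GET: parameters end up in the query string, and there is no body.
print(get_req.method, get_req.url, get_req.body)
# POST: the same parameters are form-encoded in the body instead.
print(post_req.method, post_req.url, post_req.body)
print(post_req.headers['Content-Type'])
```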
Avoiding issuing the same request multiple times is also good for efficiency. Web browsers solve this with a cache, and we can do the same with the requests-cache package. By default, the cache is stored in a SQLite database.
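To make the idea concrete, here is a toy SQLite-backed cache built with Python's standard sqlite3 module. It is a simplified illustration of the concept, not how requests-cache is implemented: the class name SQLiteCache, the table layout, and the fetch callback are all hypothetical. The expensive fetch happens only on a cache miss; in real use, fetch could be something like lambda url: requests.get(url).content, while the sketch uses a fake fetcher so that it runs offline.

```python
import sqlite3


class SQLiteCache:
    """Toy URL -> body cache stored in SQLite (illustration only)."""

    def __init__(self, path=':memory:'):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS responses (url TEXT PRIMARY KEY, body BLOB)')

    def get(self, url, fetch):
        """Return the cached body for url, calling fetch(url) only on a miss."""
        row = self.conn.execute(
            'SELECT body FROM responses WHERE url = ?', (url,)).fetchone()
        if row is not None:
            return row[0]          # cache hit: no fetch needed
        body = fetch(url)          # cache miss: perform the real request
        with self.conn:            # commits the insert
            self.conn.execute(
                'INSERT INTO responses (url, body) VALUES (?, ?)', (url, body))
        return body

    def clear(self):
        """Remove all cached entries."""
        with self.conn:
            self.conn.execute('DELETE FROM responses')


# Usage with a fake fetcher that records how often the "network" is hit.
calls = []

def fake_fetch(url):
    calls.append(url)
    return b'hello'

cache = SQLiteCache()
cache.get('http://google.com', fake_fetch)  # miss: fetch is called
cache.get('http://google.com', fake_fetch)  # hit: fetch is not called
cache.clear()
cache.get('http://google.com', fake_fetch)  # miss again after clearing
print(len(calls))  # prints 2
```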
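A common use case that we will not cover is periodically retrieving information over HTTP. Obviously, we don't want to download content again if nothing has changed. HTTP provides efficient mechanisms for this, called conditional requests: a server can send an ETag or Last-Modified header with a response, and a client can later send If-None-Match or If-Modified-Since, to which the server replies with 304 Not Modified if the content is unchanged. A web server, however, is not required to support these mechanisms.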
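Although periodic retrieval is out of scope here, a small helper shows how the conditional headers are derived from an earlier response. This is a sketch that assumes the server sent ETag and/or Last-Modified; the function name conditional_headers is hypothetical. With requests, response.headers is case-insensitive, so the lookups work however the server capitalized the header names; a plain dict, as in a quick test, is case-sensitive.

```python
def conditional_headers(previous_headers):
    """Build conditional-request headers from an earlier response's headers.

    Hypothetical helper: returns an empty dict if the server sent neither
    ETag nor Last-Modified, in which case no conditional request is possible.
    """
    headers = {}
    etag = previous_headers.get('ETag')
    if etag is not None:
        headers['If-None-Match'] = etag
    last_modified = previous_headers.get('Last-Modified')
    if last_modified is not None:
        headers['If-Modified-Since'] = last_modified
    return headers


# Hypothetical usage with requests (not executed here):
#   old = requests.get(url)
#   new = requests.get(url, headers=conditional_headers(old.headers))
#   if new.status_code == 304:
#       pass  # content unchanged, so keep using old.content

print(conditional_headers({'ETag': '"abc123"'}))
```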
Install requests-cache with the following command:
$ pip install --upgrade requests-cache
I tested the code with requests-cache 0.4.10.
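import requests
import requests_cache
# Cache responses transparently (SQLite backend by default)
requests_cache.install_cache()
# The first request goes over the network
%time requests.get('http://google.com')
# The identical second request is answered from the cache
%time requests.get('http://google.com')
# Empty the cache, so the next request goes over the network again
requests_cache.clear()
%time requests.get('http://google.com')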
Refer to the following screenshot for the end result:
The code is in the caching_requests.ipynb file in this book's code bundle.