Downloading random pictures with asyncio

This code is probably the most challenging of the whole chapter, so don't feel bad if it is too much for you at the moment. I have added this example as a mouthwatering device, to encourage you to dig deeper into the heart of Python asynchronous programming. Another thing worth knowing is that there are probably several other ways to write this same logic, so please bear in mind that this is just one possible example.

The asyncio module provides infrastructure for writing single-threaded, concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives. It was added to Python in version 3.4, and some claim it will become the de facto standard for writing Python code in the future. I don't know whether that's true, but I know it is definitely worth seeing an example:

# aio/randompix_corout.py
import os
from secrets import token_hex
import asyncio
import aiohttp

First of all, we cannot use requests any more, as it is not suitable for asyncio. We have to use aiohttp, so please make sure you have installed it (it's in the requirements for the book):

PICS_FOLDER = 'pics'
URL = 'http://lorempixel.com/640/480/'

async def download_image(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.read()

The previous code does not look too friendly, but it's not so bad once you know the concepts behind it. We define the async coroutine download_image, which takes a URL as a parameter.

In case you don't know, a coroutine is a computer program component that generalizes subroutines for non-preemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. A subroutine is a sequence of program instructions that performs a specific task, packaged as a unit.
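To make that definition concrete, here is a minimal sketch of my own (the greet coroutine and its names are not part of the book's example, and I am using asyncio.run, available from Python 3.7, instead of managing a loop by hand). Each coroutine suspends at its await expression, the event loop switches to the other one, and both resume exactly where they left off:

```python
import asyncio

async def greet(name):
    # execution suspends here; the event loop can run other coroutines
    await asyncio.sleep(0.01)
    return 'Hello, {}'.format(name)

async def main():
    # both coroutines make progress during each other's suspension
    return await asyncio.gather(greet('Alice'), greet('Bob'))

print(asyncio.run(main()))  # ['Hello, Alice', 'Hello, Bob']
```

Note that gather preserves the order of its arguments in the results, regardless of which coroutine finishes first.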

Inside download_image, we create a session object using the ClientSession context manager, and then we get the response by using another context manager, this time from session.get. The fact that these managers are defined as asynchronous simply means that they are able to suspend execution in their __aenter__ and __aexit__ methods. We return the content of the response by using the await keyword, which allows suspension. Notice that creating a session for each request is not optimal, but for the purpose of this example I wanted to keep the code as straightforward as possible, so I leave its optimization to you as an exercise.
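If you want to attempt that exercise, one possible shape is to open a single session and share it across all the downloads. This is just a sketch under my own naming (download_all and the session_factory parameter are not part of the book's code); with aiohttp, you would pass aiohttp.ClientSession as the factory:

```python
import asyncio

async def download_image(session, url):
    # reuse the caller's session instead of creating one per request
    async with session.get(url) as resp:
        return await resp.read()

async def download_all(session_factory, urls):
    # a single session, opened once, shared by every download
    async with session_factory() as session:
        return await asyncio.gather(
            *(download_image(session, url) for url in urls))

# with aiohttp installed you would run, for example:
# asyncio.run(download_all(aiohttp.ClientSession, [URL] * 10))
```

Injecting the session factory also makes the code easy to test with a fake session, without touching the network.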

Let's proceed with the next snippet:

async def download(url, semaphore):
    async with semaphore:
        content = await download_image(url)
    filename = save_image(content)
    return filename

def save_image(content):
    filename = '{}.jpg'.format(token_hex(4))
    path = os.path.join(PICS_FOLDER, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename

Another coroutine, download, gets a URL and a semaphore. All it does is fetch the content of the image by calling download_image, save it, and return the filename. The interesting bit here is the use of that semaphore. We use it as an asynchronous context manager, so that we can suspend this coroutine as well and allow a switch to something else, but more than how, it is important to understand why we want to use a semaphore. The reason is simple: this semaphore is roughly the equivalent of a pool of threads. We use it to allow at most N coroutines to be active at the same time. We instantiate it in the next function, and we pass 10 as the initial value. Every time a coroutine acquires the semaphore, its internal counter is decreased by 1, so when 10 coroutines have acquired it, the next one will sit and wait until the semaphore is released by a coroutine that has completed. This is a nice way to limit how aggressively we fetch images from the website API.
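The throttling effect is easy to observe with pure asyncio, no network required. In this sketch of mine (worker, state, and peak are hypothetical names, not from the book's code), ten coroutines compete for a semaphore with three slots, and we record the highest number that ever hold it at the same time:

```python
import asyncio

async def worker(semaphore, state):
    async with semaphore:
        # count how many workers hold the semaphore right now
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
        await asyncio.sleep(0.01)
        state['active'] -= 1

async def main():
    semaphore = asyncio.Semaphore(3)
    state = {'active': 0, 'peak': 0}
    await asyncio.gather(*(worker(semaphore, state) for _ in range(10)))
    return state['peak']

print(asyncio.run(main()))  # never more than 3
```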

The save_image function is not a coroutine, and its logic has already been discussed in the previous examples. Let's now get to the part of the code where execution takes place:

def batch_download(images, url):
    loop = asyncio.get_event_loop()
    semaphore = asyncio.Semaphore(10)
    cors = [download(url, semaphore) for _ in range(images)]
    res, _ = loop.run_until_complete(asyncio.wait(cors))
    loop.close()
    return [r.result() for r in res]

if __name__ == '__main__':
    saved = batch_download(20, URL)
    print(saved)

We define the batch_download function, which takes a number, images, and the URL of where to fetch them. The first thing it does is create an event loop, which is necessary to run any asynchronous code. The event loop is the central execution device provided by asyncio. It provides multiple facilities, including:

  • Registering, executing, and cancelling delayed calls (timeouts)
  • Creating client and server transports for various kinds of communication
  • Launching subprocesses and the associated transports for communication with an external program
  • Delegating costly function calls to a pool of threads
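The last point deserves a tiny illustration. A blocking, CPU-bound call can be handed to the loop's default thread pool with run_in_executor, so the event loop stays responsive while it runs (a sketch; the costly function here is just a stand-in of my own):

```python
import asyncio
import hashlib

def costly(data):
    # a blocking, CPU-bound call we keep off the event loop
    return hashlib.sha256(data).hexdigest()

async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor
    return await loop.run_in_executor(None, costly, b'hello')

print(asyncio.run(main()))
```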

After the event loop is created, we instantiate the semaphore, and then we proceed to create a list of coroutine objects, cors. By calling loop.run_until_complete, we make sure the event loop will run until the whole task has been completed. We feed it the result of a call to asyncio.wait, which wraps each coroutine in a Task (a subclass of Future) and waits for all of them to complete.

When done, we close the event loop and return a list of the results yielded by each future object (the filenames of the saved images). Notice how we capture the results of the call to loop.run_until_complete. asyncio.wait returns a tuple of two sets, the completed futures and the pending ones; since we wait for everything to finish, the second set is empty and we don't care about it, so we assign it to _. This is a common Python idiom used when we want to signal that we're not interested in an object.
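The two sets returned by asyncio.wait, and the .result() calls on the completed futures, can be seen in isolation with a toy job (these are my own names; note also that I wrap the coroutines in tasks with asyncio.ensure_future, which recent Python versions require before passing them to asyncio.wait):

```python
import asyncio

async def job(n):
    await asyncio.sleep(0.01)
    return n * 2

async def main():
    tasks = [asyncio.ensure_future(job(n)) for n in range(3)]
    # wait returns two sets: completed futures and pending ones
    done, pending = await asyncio.wait(tasks)
    assert not pending  # everything has completed
    return sorted(task.result() for task in done)

print(asyncio.run(main()))  # [0, 2, 4]
```

The done set is unordered, which is why the results are sorted before being returned.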

At the end of the module, we call batch_download and we get 20 images saved. They are fetched at most 10 at a time, as that is the limit imposed by the semaphore.

And that's it! To learn more about asyncio, please refer to the documentation page (https://docs.python.org/3.7/library/asyncio.html) for the asyncio module on the standard library. This example was fun to code, and hopefully it will motivate you to study hard and understand the intricacies of this wonderful side of Python.
