Downloading files with aiohttp

First, we must import the modules we need to make our HTTP requests asynchronous. Every asynchronous function must include the async keyword in its definition.
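Before we get to the download code, here is a minimal sketch of how a coroutine is defined and executed (greet is a hypothetical function used only for illustration):

```python
import asyncio

async def greet():
    # `async def` marks this function as a coroutine; calling it returns
    # a coroutine object that must be awaited or driven by an event loop.
    return "hello"

# asyncio.run() creates an event loop, runs the coroutine, and closes the loop.
print(asyncio.run(greet()))
```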

We will start by defining our download_file function, which takes two parameters: the first parameter is the URL of the file to download, and the second parameter is called parts, which is the number of parallel requests we want to make to the server.

To speed up the download, our script is going to work as follows:

  1. We are going to make a head request to the file URL with the aiohttp.ClientSession().head(url) method.
  2. We are going to get the value of the Content-Length header for getting the file size with the size = int(resp.headers["Content-Length"]) instruction.
  3. With the get_partial_content method, we are sending multiple GET requests to the file URL using the range header to specify the range of bytes that we want.
  4. We are going to assemble all the responses in order using the final_result = sorted(task.result() for task in response) instruction.
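The byte ranges used in step 3 can be computed without any network access. The following is a minimal sketch of that computation (byte_ranges is a hypothetical helper name, not part of the script):

```python
import itertools

def byte_ranges(size, parts):
    # Split [0, size) into chunks of size // parts bytes; zip_longest pads
    # the last pair with "" so the final Range header reads "bytes=start-".
    starts = list(range(0, size, size // parts))
    return list(itertools.zip_longest(starts, starts[1:], fillvalue=""))

print(byte_ranges(100, 4))
# [(0, 25), (25, 50), (50, 75), (75, '')]
```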

You can find the following code in the download_file_aiohttp.py file:

#!/usr/bin/python3

import asyncio
import itertools
import aiohttp
import time
import os

async def download_file(url, parts):
    async def get_partial_content(u, i, start, end):
        # Request only the byte range [start, end) of the file. Note that
        # the session created here is never closed, which produces the
        # "Unclosed connector" warnings shown in the output below.
        async with aiohttp.ClientSession().get(u, headers={"Range": "bytes={}-{}".format(start, end - 1 if end else "")}) as _resp:
            return i, await _resp.read()

    # A HEAD request returns the headers without downloading the body.
    async with aiohttp.ClientSession().head(url) as resp:
        size = int(resp.headers["Content-Length"])

    # Split the file into byte ranges and request them in parallel.
    ranges = list(range(0, size, size // parts))

    response, _ = await asyncio.wait([get_partial_content(url, i, start, end) for i, (start, end) in enumerate(itertools.zip_longest(ranges, ranges[1:], fillvalue=""))])

    # Sort the chunks by index and join them into the complete file.
    final_result = sorted(task.result() for task in response)
    return b"".join(data for _, data in final_result)

In the previous code block, we defined our download_file() method, which accepts the url and the number of parts into which the download is divided as parameters. In the following code block, we define our main program, which downloads the file in an asynchronous way. We are going to use the asyncio event loop and the run_until_complete() method:

if __name__ == '__main__':
    file_url = 'https://docs.python.org/3/archives/python-3.7.2-docs-pdf-letter.zip'
    loop = asyncio.get_event_loop()
    t1 = time.time()
    bs = loop.run_until_complete(download_file(file_url, 10))
    filename = os.path.basename(file_url)
    with open(filename, 'wb') as file_handle:
        file_handle.write(bs)
    print('Finished downloading {filename}'.format(filename=filename))
    print(time.time() - t1, 'seconds passed')

This is the output we get when we execute the download_file_aiohttp.py script:

client_session: <aiohttp.client.ClientSession object at 0x000001FABBB42DD8>
Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x000001FABBCEED08>, 715247.328)]']
connector: <aiohttp.connector.TCPConnector object at 0x000001FABBCE6C88>
Finished downloading python-3.7.2-docs-pdf-letter.zip
2.9168717861175537 seconds passed

When you execute the script, you will see warnings about objects that were created internally by aiohttp and never explicitly closed, among which we can highlight aiohttp.client.ClientSession for managing the client session, aiohttp.client_proto.ResponseHandler for managing the response, and aiohttp.connector.TCPConnector for managing the connection.
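These warnings can be avoided by creating a single ClientSession and closing it with async with. The following is a sketch of that approach (fetch_range and download_file_v2 are hypothetical names; the logic mirrors download_file but shares one session and uses asyncio.gather):

```python
import asyncio
import aiohttp

async def fetch_range(session, url, index, start, end):
    # Reuse the shared session and return (index, chunk_bytes).
    headers = {"Range": "bytes={}-{}".format(start, end - 1 if end else "")}
    async with session.get(url, headers=headers) as resp:
        return index, await resp.read()

async def download_file_v2(url, parts):
    # One session for all requests, closed automatically by `async with`,
    # so aiohttp does not emit "Unclosed client session" warnings.
    async with aiohttp.ClientSession() as session:
        async with session.head(url) as resp:
            size = int(resp.headers["Content-Length"])
        starts = list(range(0, size, size // parts))
        tasks = [fetch_range(session, url, i, start, end)
                 for i, (start, end) in enumerate(zip(starts, starts[1:] + [None]))]
        results = await asyncio.gather(*tasks)
    # gather preserves submission order, but sort by index to be explicit.
    return b"".join(data for _, data in sorted(results))
```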
