Getting images from a URL with urllib

This example shows how to extract image locations from a web page using urllib and regular expressions. A simple approach is to download the page with urllib and then use the re module to find the src attribute of each img element in the returned HTML.
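To see how a regular expression can pick out img elements before touching the network, here is a minimal sketch that runs against a made-up HTML fragment. The SAMPLE_HTML string and the IMG_SRC_REGEX name are assumptions introduced only for illustration; the pattern captures the quoted value of the src attribute and ignores case.

```python
import re

# Hypothetical HTML fragment, used only for illustration.
SAMPLE_HTML = """
<p>Logo: <img class="logo" src="/images/logo.png" alt="Logo"></p>
<IMG SRC='photos/cover.jpg'>
<img alt="no source attribute">
"""

# Match an img tag and capture the quoted value of its src attribute.
# re.IGNORECASE lets the pattern match <IMG SRC=...> as well.
IMG_SRC_REGEX = re.compile(r'<img[^>]+src=["\'](.*?)["\']', re.IGNORECASE)

image_sources = IMG_SRC_REGEX.findall(SAMPLE_HTML)
print(image_sources)  # ['/images/logo.png', 'photos/cover.jpg']
```

The third img tag has no src attribute, so it produces no match. A regular expression like this is convenient for quick scripts, but it can miss unquoted attributes or other unusual markup, where an HTML parser would likely be more robust.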

You can find the following code in the extract_images_urllib.py file:

#!/usr/bin/env python3

import re
from urllib.parse import urljoin
from urllib.request import urlopen

def download_page(url):
    # Fetch the page and decode the response body as UTF-8 text
    return urlopen(url).read().decode('utf-8')

def extract_image_locations(page):
    # Capture the quoted value of the src attribute of every img tag
    img_regex = re.compile(r'<img[^>]+src=["\'](.*?)["\']',
                           re.IGNORECASE)
    return img_regex.findall(page)

if __name__ == '__main__':
    target_url = 'http://www.packtpub.com'
    packtpub = download_page(target_url)
    image_locations = extract_image_locations(packtpub)
    for src in image_locations:
        # Resolve relative locations against the page URL
        print(urljoin(target_url, src))
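The src values found in a page are often relative, which is why the script passes each one through urljoin. The following sketch shows how urljoin from urllib.parse resolves different kinds of locations. The base_url path and the sample src values are hypothetical and chosen only to cover the common cases.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
base_url = 'http://www.packtpub.com/books/index.html'

# Hypothetical src values covering the common cases.
examples = [
    'cover.jpg',                     # relative to the page's directory
    '/images/logo.png',              # relative to the site root
    '//cdn.example.com/banner.png',  # scheme-relative: inherits http
    'https://example.com/a.gif',     # already absolute: left unchanged
]

resolved = {src: urljoin(base_url, src) for src in examples}
for src, absolute in resolved.items():
    print(f'{src} -> {absolute}')
```

Resolving against the page URL rather than concatenating strings means absolute locations pass through untouched, so the same loop can handle every src value.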

The following screenshot shows the extract_images_urllib.py script running against the packtpub.com domain:
