The urllib package

Like http, urllib is a package that contains several modules for working with URLs. With urllib, a script can open and read websites, download and parse data, modify request headers, and more.

urllib has a few different modules, which are listed here:

  • urllib.request: This is used for opening and reading URLs.
  • urllib.error: This contains exceptions raised by urllib.request.
  • urllib.parse: This is used for parsing URLs.
  • urllib.robotparser: This is used for parsing robots.txt files.
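To make the division of labor between these modules concrete, here is a minimal sketch that touches each of them. The www.example.com URLs, the robots.txt rules, and the try_open() helper are illustrative assumptions, not part of the original example; the robots.txt rules are passed in as a list of lines so that nothing is fetched over the network.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser

# urllib.parse: split a URL into its components (illustrative URL).
parts = urllib.parse.urlparse("https://www.example.com/search?q=python#results")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# urllib.robotparser: evaluate robots.txt rules supplied as lines
# (hypothetical rules, used here instead of downloading a real file).
robots = urllib.robotparser.RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
private_allowed = robots.can_fetch("*", "https://www.example.com/private/data")
public_allowed = robots.can_fetch("*", "https://www.example.com/index.html")
print(private_allowed, public_allowed)


# urllib.request + urllib.error: failures raised by urlopen() can be caught
# as URLError (HTTPError is a subclass of it).
def try_open(url):
    """Hypothetical helper: return the response body, or None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    except urllib.error.URLError as exc:
        print("Could not open", url, "-", exc.reason)
        return None
```

Calling try_open() with a real address performs a network request, so its result depends on the site and on your connection.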

In this section, we are going to learn how to open a URL using urllib and how to read the HTML returned from that URL. Let's look at a simple example. We import urllib.request, assign the result of opening the URL to a variable, and then call the .read() method on it to read the data from the URL.

Create a url_requests_example.py script and write the following content in it:

import urllib.request

x = urllib.request.urlopen('https://www.imdb.com/')
print(x.read())

Run the script as follows:

student@ubuntu:~/work$ python3 url_requests_example.py

Here is the output:

b'

<!DOCTYPE html>
<html
    xmlns:og="http://ogp.me/ns#"
    xmlns:fb="http://www.facebook.com/2008/fbml">
    <head>
         
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">

    
    
    

    
    
    

    <meta name="apple-itunes-app" content="app-id=342792525, app-argument=imdb:///?src=mdot">



        <script type="text/javascript">var IMDbTimer={starttime: new Date().getTime(),pt:'java'};</script>

<script>
    if (typeof uet == 'function') {
      uet("bb", "LoadTitle", {wb: 1});
    }
</script>
  <script>(function(t){ (t.events = t.events || {})["csm_head_pre_title"] = new Date().getTime(); })(IMDbTimer);</script>
        <title>IMDb - Movies, TV and Celebrities - IMDb</title>
  <script>(function(t){ (t.events = t.events || {})["csm_head_post_title"] = new Date().getTime(); })(IMDbTimer);</script>
<script>
    if (typeof uet == 'function') {
      uet("be", "LoadTitle", {wb: 1});
    }
</script>
<script>
    if (typeof uex == 'function') {
      uex("ld", "LoadTitle", {wb: 1});
    }
</script>

        <link rel="canonical" href="https://www.imdb.com/" />
        <meta property="og:url" content="http://www.imdb.com/" />
        <link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.imdb.com/">

<script>
    if (typeof uet == 'function') {
      uet("bb", "LoadIcons", {wb: 1});
    }
</script>
  <script>(function(t){ (t.events = t.events || {})["csm_head_pre_icon"] = new Date().getTime(); })(IMDbTimer);</script>
        <link href="https://m.media-amazon.com/images/G/01/imdb/images/safari-favicon-517611381._CB483525257_.svg" mask rel="icon" sizes="any">
        <link rel="icon" type="image/ico" href="https://m.media-amazon.com/images/G/01/imdb/images/favicon-2165806970._CB470047330_.ico" />
        <meta name="theme-color" content="#000000" />
        <link rel="shortcut icon" type="image/x-icon" href="https://m.media-amazon.com/images/G/01/imdb/images/desktop-favicon-2165806970._CB484110913_.ico" />
        <link href="https://m.media-amazon.com/images/G/01/imdb/images/mobile/apple-touch-icon-web-4151659188._CB483525313_.png" rel="apple-touch-icon"> 

In the preceding example, we used the read() method, which returns a bytes object. The script therefore prints the raw HTML of the IMDb home page as a byte string, which is hard to read, but we can use an HTML parser to extract useful information from it.
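As a rough sketch of that idea, the bytes can be decoded into text and fed to the standard library's html.parser module. The TitleParser class and the short raw sample below are assumptions made for illustration; the sample stands in for the much longer value returned by x.read(), and UTF-8 is assumed because the page declares charset="utf-8".

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper that collects the text inside <title> tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


# Shortened stand-in for the bytes returned by x.read().
raw = (b"<html><head><title>IMDb - Movies, TV and Celebrities - IMDb</title>"
       b"</head><body></body></html>")

html_text = raw.decode("utf-8")  # bytes -> str
parser = TitleParser()
parser.feed(html_text)
print(parser.title)  # IMDb - Movies, TV and Celebrities - IMDb
```

In a real script, raw would be replaced by the result of x.read(); everything after the decoding step stays the same.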
