Scraping with Beautiful Soup 4

Any publicly accessible HTTP resource can be fetched with the requests library. As you may remember, if the response body is JSON, requests has a built-in method for parsing it. HTML is different: parsing it is no simple task. HTML is much more complex than ordinary JSON; HTML files can be large and are often invalid (browsers will usually still "fix" and render them).
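As a minimal sketch of that difference, assuming the requests package is installed, the code below returns decoded JSON when the server labels the body as JSON and falls back to the raw HTML string otherwise. The helper names fetch and body_of are hypothetical, and dispatching on the Content-Type header is an illustrative heuristic; requests does not make this choice for you.

```python
import requests


def fetch(url: str, timeout: float = 10.0):
    """Fetch a URL (hypothetical helper); return parsed JSON or raw text."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
    return body_of(response)


def body_of(response):
    """Pick a representation based on the Content-Type header (a heuristic)."""
    content_type = response.headers.get("Content-Type", "")
    if "json" in content_type:
        # requests ships a JSON decoder: .json() returns dicts, lists, etc.
        return response.json()
    # There is no built-in HTML equivalent: .text is just a str to be parsed
    return response.text
```

For HTML, the string returned here is only the starting point; it still has to be handed to a real HTML parser.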

To do this, we'll use Beautiful Soup 4 (BS4), which, together with lxml, is one of the two main libraries for parsing HTML. Beautiful Soup knows how to parse HTML and can even repair invalid documents. Once the document has a Pythonic representation, we can drill down and retrieve the specific elements we're interested in by element ID, class, CSS properties, position in the document, and so on, using either CSS selectors or the XPath mini-language.
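The sketch below parses a small, deliberately broken snippet with Beautiful Soup 4, assuming the bs4 package is installed and using Python's built-in "html.parser" backend (passing "lxml" instead requires lxml to be installed). The markup and variable names are made up for illustration. It retrieves elements by ID, by class, and with CSS selectors.

```python
from bs4 import BeautifulSoup

# Deliberately invalid markup: the <li> and <div> tags are never closed
BROKEN_HTML = """
<html><body>
  <div id="main">
    <ul class="links">
      <li class="item"><a href="/a">First</a>
      <li class="item special"><a href="/b">Second</a>
    </ul>
</body></html>
"""

# "html.parser" ships with Python; Beautiful Soup builds a tree despite the errors
soup = BeautifulSoup(BROKEN_HTML, "html.parser")

# Look up a single element by its id attribute
main = soup.find(id="main")

# Find every <li> whose class list contains "item"
items = soup.find_all("li", class_="item")
labels = [li.a.get_text() for li in items]

# CSS selectors: descendant combinators are used instead of ">" because
# the parser may nest the unclosed <li> elements inside one another
special = soup.select_one("ul.links li.special a")
hrefs = [a["href"] for a in soup.select("#main li a")]

print(main.name, labels, special.get_text(), hrefs)
```

Beautiful Soup's select() and select_one() accept CSS selectors, but Beautiful Soup does not evaluate XPath expressions itself; XPath queries are usually run through lxml directly. How much repair happens also depends on the parser backend: html.parser, lxml, and html5lib can build different trees from the same broken input.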
