Overview of Scraping

The following table summarizes the advantages and disadvantages of each approach to scraping:

Scraping approach    Performance  Ease of use  Ease of installation
Regular expressions  Fast         Hard         Easy (built-in module)
Beautiful Soup       Slow         Easy         Easy (pure Python)
lxml                 Fast         Easy         Moderately difficult

If speed is not a concern and you prefer to install libraries only via pip, a slower approach such as Beautiful Soup is perfectly workable. If you only need to scrape a small amount of data and want to avoid additional dependencies, regular expressions might be an appropriate choice. In general, however, lxml is the best choice for scraping because it is fast and robust, whereas regular expressions are harder to modify and Beautiful Soup is slower.
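To make the comparison concrete, the following minimal sketch extracts the same table cell with each of the three approaches. The HTML fragment, the id and class names, and the function names are all hypothetical and chosen only for illustration. The Beautiful Soup and lxml variants assume those libraries are installed; if one is missing, the sketch simply skips it.

```python
import re

# Hypothetical page fragment, invented for this illustration.
HTML = ('<html><body><table><tr id="area_row">'
        '<td class="value">244,820 square kilometres</td>'
        '</tr></table></body></html>')


def scrape_with_regex(page):
    # Fast and dependency-free, but tied to the exact markup layout.
    match = re.search(r'<tr id="area_row">\s*<td class="value">(.*?)</td>', page)
    return match.group(1) if match else None


def scrape_with_bs4(page):
    # Third-party library; imported here so a missing install is easy to skip.
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(page, "html.parser")
    row = soup.find("tr", attrs={"id": "area_row"})
    return row.find("td", attrs={"class": "value"}).text


def scrape_with_lxml(page):
    # Third-party library; imported here so a missing install is easy to skip.
    from lxml import html as lxml_html
    tree = lxml_html.fromstring(page)
    cells = tree.xpath('//tr[@id="area_row"]/td[@class="value"]')
    return cells[0].text_content()


results = {}
for name, scraper in [("regex", scrape_with_regex),
                      ("bs4", scrape_with_bs4),
                      ("lxml", scrape_with_lxml)]:
    try:
        results[name] = scraper(HTML)
    except ImportError:
        # The library is not installed; leave it out of the comparison.
        continue

for name, value in results.items():
    print(name, "->", value)
```

Note how the regular expression encodes the exact tag and attribute order, so a small change to the markup would break it, while the two parser-based versions locate the cell by its structure.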
