What is Scrapy?

Scrapy (https://scrapy.org/) is an open source, collaborative framework for extracting data from web pages. The extracted data can serve a range of applications, such as data mining, information processing, and historical archiving.

This framework can also be extended with new functionality, and it is portable: it is written in Python, which runs on Linux, macOS, and Windows systems.

Although Scrapy's main purpose is extracting data from web pages, it can also be used to extract data through APIs, to explore the structure of a website, or simply as a general-purpose web crawler. Scrapy has the following features:

  • Fast and powerful: You write the rules to extract the data, and Scrapy does the rest of the work
  • Easily extensible: By design, new functionality can be plugged in without modifying Scrapy's core source code
  • Portable: It is written in Python and can run on Linux, Windows, Mac, and BSD

As a framework, Scrapy provides a set of powerful tools for scraping, or extracting, information from websites easily and efficiently. These tools include the following:

  • Support for extracting and selecting data from HTML/XML sources using CSS selectors and XPath expressions, with helper methods for extracting data with regular expressions
  • An interactive shell console, which uses IPython when it is installed, for testing CSS and XPath expressions, which is very useful when writing or debugging your own spiders
  • Support for exporting records in multiple formats such as JSON, CSV, and XML
  • Robust encoding support and auto-detection, for dealing with foreign, non-standard, and broken encoding declarations
  • Strong extensibility, since it allows you to plug in your own functionality using signals, extensions, and pipelines
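Pipelines are one of the extension points mentioned in the last item. Below is a sketch of a hypothetical pipeline, `PriceToFloatPipeline`, that normalizes a `price` field; the field name, the price format, and the module path `myproject.pipelines` are assumptions for illustration. A pipeline is a plain Python class whose `process_item(item, spider)` method Scrapy calls for every scraped item, so this class needs no Scrapy imports and can be exercised directly.

```python
class PriceToFloatPipeline:
    """Hypothetical item pipeline: turns a "$1,012.50"-style price string into a float."""

    def process_item(self, item, spider):
        # Scrapy calls this once for every item a spider yields
        price = item.get("price")
        if isinstance(price, str):
            item["price"] = float(price.replace("$", "").replace(",", "").strip())
        # Returning the item passes it on to the next pipeline (or to the feed export)
        return item


# Enabling it in the project's settings.py (hypothetical module path);
# the number (0-1000) sets the order, and lower values run first:
# ITEM_PIPELINES = {"myproject.pipelines.PriceToFloatPipeline": 300}

pipeline = PriceToFloatPipeline()
item = pipeline.process_item({"title": "Book one", "price": "$1,012.50"}, spider=None)
print(item)  # {'title': 'Book one', 'price': 1012.5}
```

Because the pipeline is registered through a setting rather than by editing Scrapy itself, it illustrates how new behavior is added without touching the framework's source code.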

To get started with Scrapy, we recommend installing it by following the Installation Guide (https://doc.scrapy.org/en/latest/intro/install.html#intro-install).
