What is Scrapy?

Scrapy (https://scrapy.org/) is an open source and collaborative framework for extracting data from web pages. The extracted data can be used in a range of applications, such as data mining, information processing, and historical archiving.

The framework is also extensible, and it is portable because it is written in Python, which runs on Linux, macOS, and Windows systems.

Although Scrapy's main purpose is extracting data from web pages, it can also be used to extract data through APIs, to map the structure of a website, or simply as a general-purpose web crawler. Scrapy has the following features:

Fast and powerful: you write the rules for extracting the data, and Scrapy does the rest of the work for you
Easily extensible: by design, new functionality can be plugged in without modifying Scrapy's core source code
Portable: It is written in Python and can run on Linux, Windows, Mac, and BSD

Since it is a framework, Scrapy has a series of powerful tools to scrape or extract information from websites easily and efficiently. These tools include the following:

Support for selecting and extracting data from HTML/XML sources using CSS selectors and XPath expressions, with helper methods for extracting data using regular expressions
An interactive shell console (IPython-aware) for trying out CSS and XPath expressions, which is very useful when writing or debugging your own spiders
Support for exporting records in multiple formats such as JSON, CSV, and XML
Robust encoding support and auto-detection, for handling foreign, non-standard, and broken encoding declarations
Strong extensibility, since it allows you to plug in your own functionality using signals, extensions, and pipelines

To get started with Scrapy, we recommend installing it by following the Installation Guide: https://doc.scrapy.org/en/latest/intro/install.html#intro-install.
