Starting a project

Now that Scrapy is installed, we can run the startproject command to generate the default structure for our first Scrapy project.

To do this, open the terminal and navigate to the directory where you want to store your Scrapy project, and then run scrapy startproject <project name>. Here, we will use example for the project name:

$ scrapy startproject example
$ cd example

Here are the files generated by the scrapy startproject command:

    scrapy.cfg
    example/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py

The important files for this chapter (and for Scrapy use in general) are as follows:

  • items.py: This file defines a model of the fields that will be scraped (a minimal sketch follows this list)
  • settings.py: This file defines settings, such as the user agent and crawl delay
  • spiders/: The actual scraping and crawling code is stored in this directory
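To make this concrete, here is a minimal sketch of what items.py might look like once a model is added. The ExampleItem class name and the title and url fields are placeholders chosen for illustration; the generated template only contains an empty item class to fill in:

    # example/items.py
    import scrapy

    class ExampleItem(scrapy.Item):
        # placeholder fields -- replace with whatever you intend to scrape
        title = scrapy.Field()
        url = scrapy.Field()

Likewise, the user agent and crawl delay mentioned above are controlled by the standard USER_AGENT and DOWNLOAD_DELAY settings in settings.py; the values shown here are placeholders:

    # example/settings.py (illustrative overrides)
    USER_AGENT = 'example-bot (+http://www.example.com)'
    DOWNLOAD_DELAY = 2   # wait two seconds between requests to the same domain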

Additionally, Scrapy uses scrapy.cfg for project configuration, pipelines.py to process the scraped fields, and middlewares.py to control request and response middleware, but none of these need to be modified for this example.
