Create your first Python scraper with Scrapy

What is Scrapy

Scrapy is an open-source Python application framework designed for creating web scraping programs.

It has become the de facto standard for web scraping in Python because it handles, out of the box, concerns that are specific to scraping, such as obeying the robots.txt file, throttling requests, and changing the User-Agent header.
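As a rough illustration, these behaviours are usually controlled from a project's settings.py file. The sketch below uses Scrapy's documented setting names, but the values and the User-Agent string are placeholders chosen for this example, not recommendations.

```python
# settings.py (fragment): a minimal sketch, values are illustrative only

# Respect the rules published in each site's robots.txt file.
ROBOTSTXT_OBEY = True

# Wait roughly one second between requests to the same site.
DOWNLOAD_DELAY = 1.0

# Let Scrapy adapt the delay to how quickly the server responds.
AUTOTHROTTLE_ENABLED = True

# Hypothetical User-Agent string; replace it with one that identifies your bot.
USER_AGENT = "my-first-scraper (+https://example.com/contact)"
```

Settings like these apply to the whole project; a single spider can override them through its custom_settings attribute.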

Some of the key features are:

  • support for extracting data with XPath or CSS selectors
  • an interactive shell console to test your scraper live
  • exporters in common formats like CSV, JSON, and more
  • a telnet console for debugging purposes

Installation

In one of the next posts, we'll see how to set up a complete environment for web scraping, with all the software needed for both headful and headless browsers. For the purpose of this post, following the official Scrapy installation guide is enough.
