Scraping Reddit with Scrapy.

I’m going to walk through how to develop a basic spider using the Scrapy package. The purpose of this tutorial is to introduce the basic concepts you need to understand and use Scrapy, and to give you the tools to develop custom spiders for your own data collection needs. Scrapy has very good documentation, so I encourage you to look through the docs to dig a little deeper into what it has to offer.

Why use Scrapy?

If you’ve ever developed a web scraping script before, you’re well aware that there are many menial tasks that need to be coded from scratch just to get a scraper up and running, especially if you want to do any large-scale data collection. With Scrapy, a lot of these common scraping problems are handled by default, so you can focus on extracting the information you actually need from a source. Scrapy is also fast and extensible: you can add custom logic based on the needs of your project. To demonstrate the basics of Scrapy, we are going to develop a spider. A spider is just a Scrapy class where you declare how and what you want to scrape from a page or set of pages (you’ll see a short sketch of one below).

To install Scrapy, run the following command:

pip install scrapy
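
To give you a sense of where we’re headed, here is a minimal sketch of a spider. The class name, subreddit URL, and CSS selectors below are just illustrative placeholders based on old Reddit’s markup, so treat this as a rough preview rather than the finished spider.

import scrapy

class RedditSpider(scrapy.Spider):
    # Every spider needs a unique name so Scrapy can identify and run it
    name = "reddit"
    # Scrapy starts crawling from these URLs
    start_urls = ["https://old.reddit.com/r/programming/"]

    def parse(self, response):
        # parse() receives the downloaded page; CSS (or XPath) selectors
        # pull out the pieces of data you care about
        for post in response.css("div.thing"):
            yield {
                "title": post.css("a.title::text").get(),
                "score": post.css("div.score.unvoted::text").get(),
            }

A standalone file like this can be run with scrapy runspider reddit_spider.py -o posts.json, which writes each yielded item to a JSON file. We’ll walk through building a spider properly in the next section.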

Making your Spider.
