Scrapy: An easy way to get data from web pages

ydcat
Apr 24, 2022

Scrapy is a Python library for quickly building web spiders. It offers so many features that newcomers can find it a little confusing. Here is a simple introduction to using it.

Installation

I recommend installing Scrapy with pipenv.

pip install --user pipenv # if you have installed pip
pipenv install scrapy
pipenv shell # activate the env
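
Once the environment is active, a quick way to confirm the install worked is to ask Python which version of the package it can see. This is only a minimal sketch using the standard library; the helper name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string inside the pipenv environment, or None if the
# package is not installed in the interpreter that runs this script.
print(installed_version("scrapy"))
```

If this prints None, the script is probably running outside the pipenv environment.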

Basic Usage

You can use the scrapy command to start a project.

scrapy startproject [Your project Name]

If you have already created the project directory, you can start the project inside it with a small adjustment: run the command from that directory and pass . as the target.

scrapy startproject [Your project Name] .

Then your project directory will look like this:

project root directory/
    scrapy.cfg
    project/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/ # a directory where you'll later put your spiders
            __init__.py

You may feel puzzled at this point, but you can start with the spiders directory and pick up Scrapy's other features as you need them.

Just create a *.py file in the spiders directory like below:

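import scrapy


class SomeSpider(scrapy.Spider):
    name = "xx"  # should be unique within the project
    start_urls = []  # put at least one URL here

    def parse(self, response):
        # use a CSS or XPath selector to select elements
        # (the selector strings below are placeholders; response.xpath(...) works too)
        for item in response.css("ITEM_SELECTOR"):
            # use yield to output each data row
            yield {"key": item.css("VALUE_SELECTOR::text").get()}

        # select the next-page link, if there is one; .get() returns None when nothing matches
        next_page = response.css("NEXT_PAGE_SELECTOR::attr(href)").get()
        if next_page is not None:
            # request the next page and parse it with the same callback
            yield response.follow(next_page, callback=self.parse)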

As we can see, Scrapy almost boils web scraping down to a formula.

It comes with lots of tools that make our lives easier: the powerful CSS and XPath selectors and the amazing Scrapy shell. The combination of the two makes web crawling much more enjoyable. Just type the following command and you can play with the HTTP response.

scrapy shell [url]

Conclusion

It is a library I wish I had discovered sooner. I recommend everyone give it a try.

ydcat

Indie developer. Creator of iOS App — Inspire Board. Coding in Python, Swift and JavaScript. Share thoughts and innovation. Attempting to learn and write more.