Scrapy: An easy way to get data from web pages

ydcat
2 min read · Apr 24, 2022

Scrapy is a Python library for quickly building web spiders. It provides so many features that newcomers can find it a little confusing. Here is a simple introduction to its usage.

Installation

I recommend installing Scrapy with pipenv.

pip install --user pipenv # if you have installed pip
pipenv install scrapy
pipenv shell # activate the env

Basic Usage

You can use the scrapy command to start a project.

scrapy startproject [Your project Name]

If you have already created the project directory, run the command from inside it and add a trailing dot, so the project is generated in the current directory.

scrapy startproject [Your project Name] .

Then your project directory will look like this:

project root directory/
    scrapy.cfg
    project/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/        # a directory where you'll later put your spiders
            __init__.py

You may feel puzzled at this point, but you can start with the spiders directory and pick up Scrapy's other features when you need them.

Just create a *.py file in the spiders directory like the one below:

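import scrapy


class SomeSpider(scrapy.Spider):
    name = "xx"  # should be unique within the project
    start_urls = []  # put at least one URL here

    # Scrapy calls parse() by default for every URL in start_urls
    def parse(self, response):
        # use a CSS or XPath selector to select elements
        # ("some selector" is a placeholder; response.xpath(...) works too)
        for item in response.css("some selector"):
            # use yield to output one dict per data row
            yield {"key": item.css("some selector::text").get()}

        # select the link to the next page, if there is one
        next_page = response.css("some selector::attr(href)").get()
        if next_page is not None:
            # request the next page and handle it with this same method
            yield response.follow(next_page, callback=self.parse)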

As we can see, Scrapy reduces web scraping almost to a formula.

It comes with lots of tools that make our lives easier, such as the powerful CSS and XPath selectors and the amazing Scrapy shell. The combination of the two makes web crawling much more enjoyable. Just type the following command and you can play with the HTTP response.

scrapy shell [url]

Conclusion

It is a library I wish I had discovered sooner. I recommend that everyone give it a try.
