Scrapy: how to select elements

ydcat
May 15, 2022

Recently, I decided to start contributing to open source software as a beginner programmer. As the official site says [1]:

Blog about Scrapy. Tell the world how you’re using Scrapy. This will help newcomers with more examples and will help the Scrapy project to increase its visibility.

Therefore, I decided to write another post about this great Python library.

Selectors

Scrapy provides two ways to locate data: XPath and CSS expressions. XPath is supported out of the box by many libraries and is the de facto standard for extracting data from HTML. CSS expressions in Scrapy can also select pseudo-elements such as ::text and ::attr(name), an extension that is specific to Scrapy and Parsel. [2]

A selector is available as soon as you have a response.

response.selector.xpath("…").get()     # returns the first match as a string, or None
response.selector.xpath("…").getall()  # returns all matches as a list of strings
# or use a CSS selector
response.selector.css("…").get()       # or .getall()

XPath

An XPath expression looks like //div/p. At first glance it may look scary, but it is just a path through the HTML DOM. Given the following HTML:

<html>
<body>
<div>
<ul>
<li class="item">Dog</li>
<li class="item">Cat</li>
</ul>
</div>
</body>
</html>

The XPath of the li tag for Dog is simply /html/body/div/ul/li[1]. It works like a file path in an operating system. You can use a relative path as well: //ul/li[1], where // matches at any depth in the document.

You can get the text of the element with:

response.selector.xpath("//ul/li[1]/text()").get()

CSS expressions

CSS selectors are handy when the HTML source names its CSS classes consistently.

To get the text of an element, first select it by its class. Scrapy implements pseudo-elements that make extracting text and attributes easier.

# select text with ::text
response.selector.css(".item::text").get()
# select an attribute with ::attr(name)
response.selector.css(".item::attr(class)").get()

Copy the selector of an element

An easy way to obtain an XPath or CSS expression is through the browser. Chrome and Firefox both provide a convenient function for this.

Press F12 to open the developer tools, right-click the element, and choose XPath or CSS Path from the Copy menu. This gives you an absolute path. Chrome can even copy a relative path.

Final words

I was amazed by the functionality Scrapy provides. The authors of the package truly know what is essential for a web crawler. The official documentation is also easy to understand, and I highly recommend reading it.

How do you use Scrapy? You are very welcome to share in the comments.


ydcat

Indie developer. Creator of iOS App — Inspire Board. Coding in Python, Swift and JavaScript. Share thoughts and innovation. Attempting to learn and write more.