Embarcadero RAD Studio, Delphi, & C++Builder Blogs

What Is The Best Web Scraping Library For Development On Windows?


Did you know it’s easy to use some truly excellent Python libraries to supercharge your Delphi app development on Windows? Adding Python to your Delphi toolbox can enhance your app development, bringing in new capabilities that help you provide innovative and powerful solutions to your app’s users. These solutions combine the best of Python with the low-code productivity and native Windows power you get with Delphi.

Are you looking for a way to build scalable website scrapers to automate your data collection process? You can build one easily by creating a Delphi GUI for the Scrapy library. This post shows you how to build a Delphi GUI app dedicated to the Scrapy library.

Watch this video by Jim McKeeth for a comprehensive introduction to why you can love both Delphi and Python at the same time:

What is the Scrapy Library?

Scrapy is a fast high-level web crawling and web scraping framework used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Scrapy is maintained by Zyte (formerly Scrapinghub) and many other contributors.

What are the basic concepts I need to know to get started with Scrapy?

Command-line tool

Learn about the command-line tool used to manage your Scrapy project.

Spiders

Write the rules to crawl your websites.

Selectors

Extract the data from web pages using XPath or CSS expressions.

Scrapy shell

Test your extraction code in an interactive environment.

Items

Define the data you want to scrape.

Item Loaders

Populate your items with the extracted data.

Item Pipeline

Post-process and store your scraped data.

Feed exports

Output your scraped data using different formats and storage.

Requests and Responses

Understand the classes used to represent HTTP requests and responses.

Link Extractors

Convenient classes to extract links to follow from pages.

Settings

Learn how to configure Scrapy and see all available settings.

Exceptions

See all available exceptions and their meaning.

How do I get Scrapy Library?

You can easily install Scrapy with pip, Python’s package installer:

pip install scrapy

Alternatively, you can use conda install to avoid complexities, since Anaconda is a widely trusted distribution for data science tasks:

conda install -c conda-forge scrapy

How do I build a Delphi GUI for Scrapy Library?

The following is the user interface structure for our project:

User Interface Structure for our Scrapy4D Project

Here is the list of components used in the Scrapy4D demo app:

  1. TPythonEngine
  2. TPythonModule
  3. TPythonType
  4. TPythonVersions
  5. TPythonGUIInputOutput
  6. TMemo
  7. TOpenDialog
  8. TSaveDialog
  9. TSplitter
  10. TPanel
  11. TLabel
  12. TComboBox
  13. TButton

You don’t need to run “scrapy startproject myProject” as you usually would when using Scrapy with Python; instead, the default example code is already embedded in this GUI app. You just need to click the “Execute” button to get started.

Open UnitScrapy4D.pas and add the following line to the FormCreate event handler to load our basic scrapyApp.py:

[crayon-673faf8fc8fa5583087338/]
You Can Seamlessly Load the Scrapy App Python File Inside the UnitScrapy4D.pas File

Make sure that scrapyApp.py is in the same directory as Scrapy4D.exe or in your Delphi project folder.

You can replace “scrapyApp.py” with any Scrapy script you want, or load your Scrapy scripts at runtime by clicking the “Load script…” button, as we will show in the next demo sections.

How do I scrape web pages using Scrapy in the Delphi app?

Run your first web scraping project

Highly recommended practices:

1. Use Anaconda Python instead of your usual Python installation. Using Scrapy with Anaconda saves time and spares you from strange, hard-to-diagnose errors.

2. This GUI was created by modifying Python4Delphi Demo34, which lets you change the Python version at runtime (this saves you from seemingly complicated DLL issues).

3. Add these paths to your environment variables for Anaconda Python:

[crayon-673faf8fc8fa7209191826/]

4. Creating a GUI for Scrapy means that you will run Scrapy outside the “normal” command-line process, so you need to add these lines to all your Python scripts:

[crayon-673faf8fc8fa9171754191/]

5. Add MaskFPUExceptions(True); to the UnitScrapy4D.pas file. This prevents Delphi from raising an exception when floating-point operations produce +/- infinity (e.g., division by zero), since that Delphi default is incompatible with a number of Python libraries such as NumPy, SciPy, pandas, and Scrapy.
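The exact Anaconda paths from step 3 are not reproduced in this post. As a hedged helper for checking your setup, the sketch below lists which folders of a standard Anaconda layout are still missing from a PATH string. The folder list is an assumption based on the directories `conda activate` usually prepends on Windows; the function name `missing_conda_paths` is hypothetical.

```python
import ntpath

# Assumption: folders that `conda activate` typically prepends to PATH on
# Windows for a standard Anaconda install ("" stands for the root itself).
CONDA_SUBDIRS = ["", r"Library\mingw-w64\bin", r"Library\usr\bin", r"Library\bin", "Scripts"]


def missing_conda_paths(conda_root, path_value):
    """Return the Anaconda folders that are not yet on a PATH string."""
    # Windows paths are case-insensitive, so compare normalized lower-case forms.
    on_path = {ntpath.normcase(ntpath.normpath(p)) for p in path_value.split(";") if p}
    wanted = [ntpath.normpath(ntpath.join(conda_root, sub)) for sub in CONDA_SUBDIRS]
    return [d for d in wanted if ntpath.normcase(d) not in on_path]
```

For example, `missing_conda_paths(r"C:\Users\me\anaconda3", os.environ["PATH"])` (with your own install folder) shows which entries you might still need to add.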

One of the best parts of this Scrapy4D demo GUI is that you can choose, and switch between, the Python versions you prefer.

This GUI Allows You to Choose any Python Version Installed on Your System And Run the Basic Example by Clicking the Execute Button

For the Python version, choose Conda, and most of the strange, seemingly complicated errors should disappear.

Next, click the “Execute” button to run the very basic example, which crawls example.com (the Python code is already embedded in the UnitScrapy4D.dfm file). Here is the output:

Scrapy4D Most Basic Example

Scraping quotes from quotes.toscrape.com

In the next example, we want to scrape quotes from quotes.toscrape.com.

Load the demo02_quotesScraper.py at runtime by clicking the “Load script…” button, and then “Execute”. Here is the output:

Use Scrapy4D to Scrape Quotes

We successfully scraped the quotes!

More advanced example: Scraping Wikipedia

Next, let’s try a more advanced example: we want to collect titles and URLs from multiple Wikipedia pages at once!

Load the demo03_wikiScraper.py at runtime by clicking the “Load script…” button, and then “Execute”. Here is the output:

Use Scrapy4D to Scrape Wikipedia

Interesting, right?

Advanced example: Scraping Google Scholar 

If the above examples still don’t impress you, your boss, or your colleagues, let’s try this advanced example: scraping Google Scholar search results for the “machine learning” query!

The original code for this example is credited to @geekan (https://github.com/geekan/) and all the contributors of Reference [1]. I modified the code to update some obsolete Python lines and to make it suitable to run inside a Python4Delphi GUI instead of as a regular command-line operation.

After several improvements (setting up the right PATH for Windows, converting some Python 2 code to Python 3, updating some obsolete lines of code such as the exception handling, installing all required libraries and dependencies, and sending the output to the Delphi GUI instead of the command prompt), load the ../googlescholar/googlescholar/spiders/spider.py file and you will get this output:
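The full spider is in the repositories cited in References [1] and [3]. To illustrate two of the changes described above in isolation, here is a small hedged sketch: a helper that builds Scholar result-page URLs for a query, and a citation-count parser written with Python 3 exception syntax. The `q` and `start` URL parameters, the 10-results-per-page offset, and the “Cited by N” label format are assumptions about Google Scholar, and both function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Scholar result pages take a "q" query plus a "start" offset
# that advances by 10 results per page.
SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_page_urls(query, pages=3, per_page=10):
    """Build start URLs for the first `pages` Scholar result pages."""
    return [
        SCHOLAR_BASE + "?" + urlencode({"q": query, "start": page * per_page})
        for page in range(pages)
    ]


def parse_citation_count(text):
    """Extract N from a label like 'Cited by N'; return 0 if absent."""
    try:
        return int(text.rsplit(" ", 1)[-1].replace(",", ""))
    # Python 3 syntax. Python 2 code often wrote `except ValueError, e:`,
    # which is a syntax error in Python 3.
    except (ValueError, AttributeError):
        return 0
```

A spider could use `scholar_page_urls("machine learning")` as its `start_urls`.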

Use Scrapy4D to Scrape Google Scholar

Pretty impressive, right?

You can check the final version of this example code here, and the full JSON output here. Visit my repo (Reference [3]) for the complete source code.

Are you ready to try these Scrapy for Delphi demos?

Congratulations! You have now learned a lot about Scrapy, a powerful web crawling and web scraping library, and how you can use Delphi to create a simple yet powerful GUI for it. We scraped the very basic example.com, quotes.toscrape.com, multiple Wikipedia pages at once, and Google Scholar search results for “machine learning” papers! Now you can explore further to boost your productivity in building your own web data collection automation.

If you are looking for other powerful web scraping libraries, please read this article:

Download a free trial of RAD Studio Delphi today and try out these examples for yourself.

References & Further Readings

[1] geekan. (2018). scrapy-examples. GitHub repo. github.com/geekan/scrapy-examples

[2] Hakim, M. A. (2020). Web Scraping Menggunakan Scrapy. hkaLabs. hkalabs.com/blog/web-scraping-4-web-scraping-menggunakan-scrapy

[3] Hakim, M. A. (2022). Article17 – Scrapy4D Demo. embarcaderoBlog-repo GitHub. github.com/MuhammadAzizulHakim/embarcaderoBlog-repo/tree/main/Article17%20-%20Scrapy4D%20Demo

[4] Mitchell, R. (2018). Web scraping with Python: Collecting more data from the modern web. O’Reilly Media, Inc.

[5] pyscripter. (2023). Python4Delphi-Demo34. GitHub repo. github.com/pyscripter/python4delphi/tree/master/Demos/Demo34

[6] Scrapy developers. (2008-2023). Scrapy 2.9 documentation. docs.scrapy.org/en/latest/index.html
