Did you know that you can easily use some excellent Python libraries to supercharge your Delphi app development on Windows? Adding Python to your Delphi toolbox brings new capabilities to your apps, letting you give your users innovative solutions that combine the best of Python with the low-code productivity and native Windows performance of Delphi.
Are you looking for a way to build scalable web scrapers that automate your data collection process? You can build one easily by creating a Delphi GUI for the Scrapy library. This post will show you how to build a Delphi GUI app dedicated to the Scrapy library.
Watch this video by Jim McKeeth for a comprehensive introduction to why you can love both Delphi and Python at the same time:
What is the Scrapy Library?
Scrapy is a fast high-level web crawling and web scraping framework used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Scrapy is maintained by Zyte (formerly Scrapinghub) and many other contributors.
What are the basic concepts I need to know to get started with Scrapy?
Command-line tool: Learn about the command-line tool used to manage your Scrapy project.
Spiders: Write the rules to crawl your websites.
Selectors: Extract the data from web pages using XPath.
Scrapy shell: Test your extraction code in an interactive environment.
Items: Define the data you want to scrape.
Item Loaders: Populate your items with the extracted data.
Item Pipeline: Post-process and store your scraped data.
Feed exports: Output your scraped data using different formats and storage.
Requests and Responses: Understand the classes used to represent HTTP requests and responses.
Link Extractors: Convenient classes to extract links to follow from pages.
Settings: Learn how to configure Scrapy and see all available settings.
Exceptions: See all available exceptions and their meaning.
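To make the Items, Item Pipeline, and Feed exports concepts a little more concrete, here is a minimal plain-Python sketch of the same flow. It deliberately uses only the standard library rather than Scrapy's own classes, and the names QuoteItem, clean_item, and export_json_lines are hypothetical, invented for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QuoteItem:
    """Item: the fields we want to scrape (hypothetical example fields)."""
    text: str
    author: str


def clean_item(item):
    """Pipeline step: collapse whitespace and strip surrounding quote marks."""
    text = " ".join(item.text.split()).strip("“”\"")
    return QuoteItem(text=text, author=item.author.strip())


def export_json_lines(items):
    """Feed export: serialise items as JSON Lines, one object per line."""
    return "\n".join(json.dumps(asdict(i), ensure_ascii=False) for i in items)


# A made-up raw item, as it might look straight after extraction.
raw = [QuoteItem(text="  “Hello,   world.” ", author=" Jane Doe ")]
cleaned = [clean_item(i) for i in raw]
print(export_json_lines(cleaned))
```

In a real Scrapy project these roles are played by the framework's item, pipeline, and feed export machinery; the sketch only shows how the three stages hand data to each other.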
How do I get the Scrapy library?
You can easily install scrapy with pip, Python’s package installer, by running pip install scrapy.
Or you can use conda (conda install -c conda-forge scrapy) to avoid some installation complexities, as the Anaconda distribution is widely used for data science tasks.
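After installing, it can be worth checking that the interpreter you plan to use from Delphi can actually find the package. The following is a small sketch using only the standard library; is_installed is a hypothetical helper name, not part of Scrapy or Python4Delphi.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "scrapy" is the import name of the Scrapy package.
    if is_installed("scrapy"):
        print("Scrapy is available in this interpreter.")
    else:
        print("Scrapy is not installed here; install it with pip or conda first.")
```

Run it with the same Python installation that the GUI will load, since each installation has its own set of packages.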
How do I build a Delphi GUI for the Scrapy library?
The following is the user interface structure for our project:
Here is the list of components used in the Scrapy4D demo app:
TPythonEngine
TPythonModule
TPythonType
TPythonVersions
TPythonGUIInputOutput
TMemo
TOpenDialog
TSaveDialog
TSplitter
TPanel
TLabel
TComboBox
TButton
You don’t need to run “scrapy startproject myProject” as you usually would when using scrapy with Python. Instead, the default example code is already embedded in this GUI app. You just need to click the “Execute” button to get started.
Navigate to UnitScrapy4D.pas and add the following line to FormCreate to load our basic scrapyApp.py:
Also make sure that scrapyApp.py is in the same directory as Scrapy4D.exe, or inside your Delphi project folder.
You can replace “scrapyApp.py” with any scrapy script you want, or you can load your scrapy scripts at runtime by clicking the “Load script…” button, as we will show you in the demo sections below.
How do I scrape web pages using scrapy in the Delphi app?
Run your first web scraping project
Highly recommended practice:
1. Use Anaconda Python instead of your usual Python installation. Using Scrapy with Anaconda saves time and helps you avoid obscure errors that are hard to diagnose.
2. This GUI was created by modifying Python4Delphi Demo34, which lets us change the Python version at runtime (this will save you from seemingly complicated DLL issues).
3. Add these paths to your environment variables for Anaconda Python:
4. Creating a GUI for scrapy means that you will use scrapy outside the “normal” command-line process. You need to add these lines to all your Python code:
5. Call MaskFPUExceptions(True); in the UnitScrapy4D.pas file to stop Delphi from raising an exception when a floating-point operation produces +/- infinity (e.g. division by zero). Delphi’s default behavior here is incompatible with a number of Python libraries, such as NumPy, SciPy, pandas, and scrapy.
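For step 3, a quick way to confirm that the directories you added are really visible is to compare them against the PATH variable. The sketch below uses only the standard library. The directory names in ANACONDA_DIRS are assumptions based on a typical Windows Anaconda layout, not the exact paths from this article, and missing_from_path is a hypothetical helper name.

```python
import os


def missing_from_path(required_dirs, path_value, sep=os.pathsep):
    """Return the entries of `required_dirs` that are absent from `path_value`.

    The comparison ignores case and trailing slashes, as Windows does.
    """
    def norm(p):
        return p.strip().rstrip("\\/").lower()

    present = {norm(p) for p in path_value.split(sep) if p.strip()}
    return [d for d in required_dirs if norm(d) not in present]


# Hypothetical Anaconda locations; replace them with the paths on your machine.
ANACONDA_DIRS = [
    r"C:\Anaconda3",
    r"C:\Anaconda3\Scripts",
    r"C:\Anaconda3\Library\bin",
]

if __name__ == "__main__":
    for d in missing_from_path(ANACONDA_DIRS, os.environ.get("PATH", "")):
        print("Not on PATH:", d)
```

If the script prints nothing, every directory in the list is already on PATH.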
One of the best parts of this Scrapy4D demo GUI is that you can choose the Python version you prefer and switch between versions.
For the Python version, choose Conda, and most of the strange, seemingly complicated errors should disappear.
Next, click the “Execute” button to run the very basic example, which crawls example.com (the Python code is already embedded inside the UnitScrapy4D.dfm file). Here is the output:
Scraping quotes from quotes.toscrape.com:
In the next example, we want to scrape quotes from quotes.toscrape.com.
Load demo02_quotesScraper.py at runtime by clicking the “Load script…” button, and then click “Execute”. Here is the output:
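To give a feel for the kind of extraction such a script performs, here is a rough offline sketch. It is not the contents of demo02_quotesScraper.py and it does not use Scrapy selectors; it swaps in the standard library's HTMLParser. The sample markup and its class names are assumptions modelled on a typical quote block, and QuoteParser is a hypothetical name.

```python
from html.parser import HTMLParser

# Sample markup modelled on a quote block; an assumption, not a saved page.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">“A sample quote.”</span>
  <small class="author">Sample Author</small>
</div>
<div class="quote">
  <span class="text">“Another sample quote.”</span>
  <small class="author">Second Author</small>
</div>
"""


class QuoteParser(HTMLParser):
    """Collect text/author pairs from span.text and small.author elements."""

    def __init__(self):
        super().__init__()
        self.quotes = []      # finished {"text", "author"} dicts
        self._field = None    # name of the field currently being read
        self._current = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "span" and css_class == "text":
            self._field = "text"
        elif tag == "small" and css_class == "author":
            self._field = "author"

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if self._field == "author" and tag == "small":
            # In this sample layout the author element ends a quote block.
            self.quotes.append(self._current)
            self._current = {}
        if tag in ("span", "small"):
            self._field = None


parser = QuoteParser()
parser.feed(SAMPLE_HTML)
for q in parser.quotes:
    print(f'{q["text"]} by {q["author"]}')
```

A Scrapy spider would express the same idea far more compactly with selectors, and would also handle fetching and following pages for you.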
We successfully scraped the quotes!
More advanced example: Scraping Wikipedia
Next, let’s try a more advanced example: we want to collect multiple titles and URLs from multiple Wikipedia pages at once!
Load demo03_wikiScraper.py at runtime by clicking the “Load script…” button, and then click “Execute”. Here is the output:
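Crawling several pages at once starts from a list of URLs. As a small illustration, the sketch below builds such a list from page titles using only the standard library. It is not taken from demo03_wikiScraper.py; the titles are hypothetical and wiki_url and build_start_urls are illustrative names.

```python
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org/wiki/"


def wiki_url(title):
    """Build an English Wikipedia article URL from a page title."""
    # Wikipedia URLs use underscores for spaces; quote() escapes the rest.
    return WIKI_BASE + quote(title.strip().replace(" ", "_"))


def build_start_urls(titles):
    """Return one URL per unique title, preserving the input order."""
    unique_titles = dict.fromkeys(t.strip() for t in titles)
    return [wiki_url(t) for t in unique_titles]


# Hypothetical page titles, chosen only for illustration.
start_urls = build_start_urls(["Web scraping", "Data mining", "Web scraping"])
for url in start_urls:
    print(url)
```

A list like this is the kind of value a spider can use as its starting point, with duplicates removed so that no page is requested twice.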
Interesting, right?
Advanced example: Scraping Google Scholar
If the examples above still do not impress you, your boss, or your colleagues, let’s try this advanced example: scraping Google Scholar search results for the query “machine learning”!
The original code for this example is credited to @geekan (https://github.com/geekan/) and all the contributors of Reference [1]. I modified the code to update some obsolete Python lines and to make it suitable for running inside the Python4Delphi GUI instead of the regular command line.
Several improvements were needed: setting up the right PATH for Windows, porting some Python 2 code to Python 3, updating obsolete lines of code (such as replacing the old exception-handling syntax with the new convention), installing all required libraries and dependencies, and sending the output to the Delphi GUI instead of the command prompt. Once these are in place, load the ../googlescholar/googlescholar/spiders/spider.py file and you will get this output:
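As an illustration of the Python 2 to Python 3 changes mentioned above, the sketch below shows the updated exception-handling and print conventions. The function and its input format are hypothetical and are not taken from the spider's source.

```python
def parse_citation_count(raw, default=0):
    """Parse the trailing number of a string; return `default` on failure."""
    # Python 2 style, which is a syntax error in Python 3:
    #     except ValueError, err:
    #         print "bad value:", err
    try:
        return int(raw.rsplit(" ", 1)[-1])
    except ValueError as err:       # Python 3: "as" replaces the comma
        print("bad value:", err)    # Python 3: print is a function
        return default


print(parse_citation_count("Cited by 123"))
print(parse_citation_count("no count"))
```

Both changes are mechanical, so they are usually the first things to fix when reviving an older Python 2 scraping project.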
Pretty impressive, right?
You can check the final version of this example code here, and the full JSON output here. Visit my repo (Reference [3]) for the complete source code.
Are you ready to try these Scrapy for Delphi demos?
Congratulations, you have now learned a lot about scrapy, a powerful web crawling and web scraping library, and how you can use Delphi to create a simple yet powerful GUI for it! We have learned how to scrape the very basic example.com, quotes.toscrape.com, and multiple Wikipedia pages at once, and how to scrape “machine learning” papers from Google Scholar! Now you can explore it further to boost your productivity when automating your own web data collection.
Download a free trial of RAD Studio Delphi today and try out these examples for yourself.
References & Further Readings
[1] geekan. (2018). scrapy-examples. GitHub repo. github.com/geekan/scrapy-examples
[2] Hakim, M. A. (2020). Web Scraping Menggunakan Scrapy. hkaLabs. hkalabs.com/blog/web-scraping-4-web-scraping-menggunakan-scrapy
[3] Hakim, M. A. (2022). Article17 – Scrapy4D Demo. embarcaderoBlog-repo GitHub. github.com/MuhammadAzizulHakim/embarcaderoBlog-repo/tree/main/Article17%20-%20Scrapy4D%20Demo
[4] Mitchell, R. (2018). Web scraping with Python: Collecting more data from the modern web. O’Reilly Media, Inc.
[5] pyscripter. (2023). Python4Delphi-Demo34. GitHub repo. github.com/pyscripter/python4delphi/tree/master/Demos/Demo34
[6] Scrapy developers. (2008-2023). Scrapy 2.9 documentation. docs.scrapy.org/en/latest/index.html