Natural Language Processing: 5 Ways To Use NLP In Your Windows Apps

How do I start using Natural Language Processing in Windows?

Natural language processing (NLP) is a subfield of Linguistics, Computer Science, and Artificial Intelligence concerned with the interactions between computers and human language: in particular, how to program computers to process and analyze large amounts of natural language data or, put simply, how to teach machines to understand human languages and extract meaning from text.

The common tasks in NLP include Text Mining, Text Classification, Text Analysis, Sentiment Analysis, Word Sequencing, Speech Recognition & Generation, Machine Translation, and Dialog Systems, to name a few.

Since NLP relies on advanced computational techniques and tools, developers need the best available tools to make the most of NLP approaches and algorithms when creating services that handle natural language.

We can build Windows apps with Natural Language Processing capabilities using Embarcadero’s Python4Delphi (P4D). P4D empowers Python users with Delphi’s award-winning VCL functionalities for Windows, which enable us to build native Windows apps 5x faster. This integration lets us create a modern GUI with a Windows 10 look and responsive controls for our Python Natural Language Processing applications.

Python4Delphi makes it very easy to use Python as a scripting language for Delphi applications. It also comes with an extensive range of demos and tutorials. With Python4Delphi, you can integrate any Python features, functionalities, and libraries with Delphi to create a nice GUI for your Natural Language Processing applications in Windows.

In this tutorial, we will discuss the following:

How to use these 5 Python libraries, each with different Natural Language Processing capabilities, in Windows apps: NLTK, FlashText, Gensim, TextBlob, and spaCy.

Each of them will be integrated with Python4Delphi to create Windows apps with Natural Language Processing capabilities.

Prerequisites: Before we begin to work, download and install the latest Python for your platform. Follow the Python4Delphi installation instructions mentioned here. Alternatively, you can check out the easy instructions found in the Getting Started With Python4Delphi video by Jim McKeeth.

Time to get started!

First, open and run our Python GUI using project Demo01 from Python4Delphi with RAD Studio. Then insert the script into the lower Memo, click the Execute script button, and get the result in the upper Memo. You can find the Demo01 source on GitHub. The behind-the-scenes details of how Delphi runs your Python code in this amazing Python GUI can be found at this link.

Using Python4Delphi for Natural Language Processing: open Demo01.dproj

1. How do I enable NLTK for NLP inside Python4Delphi in Windows?

NLTK is a leading platform for building Python programs to work with human language data. Natural Language Processing (NLP), in a wide sense, covers any kind of computer manipulation of natural language. It is a field of Machine Learning concerned with the ability of a computer to understand, analyze, manipulate, and potentially generate human language.

NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.

Do you want to perform Natural Language Processing tasks like predicting text, analyzing and visualizing sentence structure, Sentiment Analysis, or gender classification in a Windows GUI app? You can easily solve these tasks by combining the NLTK library with Python4Delphi (P4D).

This section will show you how to get started using NLTK combined with Python4Delphi!

Getting and installing NLTK

First, here is how you can get NLTK:

pip install nltk

Practical work in Natural Language Processing typically uses large bodies of linguistic data, or corpora. You can add the popular NLTK datasets to your system using this command:

python -m nltk.downloader popular

Also, don’t forget to add the paths where Python and NLTK are installed to your System Environment Variables. For example:

C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38/Lib/site-packages
C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38/Scripts
C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38
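The `Python38` folders above are only an example; the real paths depend on your Python version and where it is installed. A minimal sketch for printing the matching folders of the interpreter that runs the script, using the standard-library `sysconfig` module. The helper name `python_install_paths` is hypothetical, and the mapping assumes a regular (non-`--user`) pip install:

```python
import sys
import sysconfig


def python_install_paths():
    """Return the folders matching the example paths above for the
    interpreter running this script (hypothetical helper)."""
    paths = sysconfig.get_paths()
    return {
        "site-packages": paths["purelib"],  # where pip installs packages such as nltk
        "Scripts": paths["scripts"],        # where pip places command-line tools
        "Python": sys.base_prefix,          # the Python installation folder itself
    }


if __name__ == "__main__":
    for name, path in python_install_paths().items():
        print(f"{name}: {path}")
```

Running it inside the Demo01 GUI shows the folders of the Python that Python4Delphi actually loaded, which helps when several Python versions are installed.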

Using NLTK for Python Natural Language Processing

The following NLTK code example creates a classifier that predicts a person’s gender from their name (run this inside the lower Memo of Python4Delphi Demo01 GUI):

# Importing libraries
import random

import nltk
from nltk.corpus import names  # needs the 'names' corpus (part of the "popular" collection)


def gender_features(word):
    # Use the last letter of the name as the only feature
    return {'last_letter': word[-1]}


# Prepare a list of examples and corresponding class labels
labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
                 [(name, 'female') for name in names.words('female.txt')])

random.shuffle(labeled_names)

# Use the feature extractor to process the names data
featuresets = [(gender_features(n), gender)
               for (n, gender) in labeled_names]

# Divide the resulting list of feature sets into a training set and a test set
train_set, test_set = featuresets[500:], featuresets[:500]

# The training set is used to train a new "naive Bayes" classifier
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify a new name; the output should be 'male'
print(classifier.classify(gender_features('Sherlock')))

# Evaluate accuracy on the held-out test set
print(nltk.classify.accuracy(classifier, test_set))

# Show the most informative features
classifier.show_most_informative_features(10)

Here is the result in the Python GUI

NLTK Demo with Python4Delphi in Windows
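To make the classifier above less of a black box, here is a simplified plain-Python sketch of what a naive Bayes classifier does with the single `last_letter` feature: count labels and letter-per-label occurrences, then pick the label with the highest probability. It uses a tiny hypothetical training list instead of the NLTK `names` corpus, and add-one (Laplace) smoothing, which is our choice and may differ from the estimator NLTK’s `NaiveBayesClassifier` uses by default:

```python
import math
from collections import Counter, defaultdict


def last_letter(name):
    # Same single feature as gender_features(): the final letter (lowercased here)
    return name[-1].lower()


def train(labeled_names):
    """Count how often each label occurs and how often each last letter
    occurs together with each label."""
    label_counts = Counter(label for _, label in labeled_names)
    letter_counts = defaultdict(Counter)
    for name, label in labeled_names:
        letter_counts[label][last_letter(name)] += 1
    return label_counts, letter_counts


def classify(model, name, alphabet_size=26):
    """Return the label maximizing log P(label) + log P(letter | label),
    with add-one smoothing so unseen letters never get probability zero."""
    label_counts, letter_counts = model
    total = sum(label_counts.values())
    letter = last_letter(name)
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        prior = math.log(count / total)
        likelihood = math.log((letter_counts[label][letter] + 1) / (count + alphabet_size))
        score = prior + likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hypothetical training set (the real example uses nltk.corpus.names)
training = [
    ("John", "male"), ("Frank", "male"), ("Mark", "male"), ("Jack", "male"),
    ("Anna", "female"), ("Maria", "female"), ("Julia", "female"), ("Alice", "female"),
]
model = train(training)
print(classify(model, "Sherlock"))  # male
print(classify(model, "Sophia"))    # female
```

Working in log space avoids multiplying many small probabilities, which matters once more features are added.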

2. How do I use FlashText for NLP inside Python4Delphi on Windows?

FlashText is a module that can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm.

The FlashText algorithm finds or replaces keywords in a given text, and it can do so in a single pass over a document.
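The single-pass idea can be sketched as follows: store the keywords in a trie (nested dictionaries keyed by word), then scan the sentence left to right, at each position following the trie as far as the words match, keeping the longest keyword found and jumping past it. This is a simplified, word-level sketch of the approach described in Singh’s paper [2], not the `flashtext` package’s actual implementation; the class name `TinyKeywordTrie` and the `_END_` marker are our own:

```python
class TinyKeywordTrie:
    """Word-level keyword matcher in the spirit of FlashText (simplified sketch)."""

    # Marks "a complete keyword ends here"; it contains uppercase letters,
    # so it can never collide with a lowercased word
    _END = "_END_"

    def __init__(self):
        self.root = {}

    def add_keyword(self, keyword, clean_name=None):
        node = self.root
        for word in keyword.lower().split():
            node = node.setdefault(word, {})
        node[self._END] = clean_name or keyword

    def extract_keywords(self, sentence):
        # Naive tokenization: split on whitespace, strip surrounding punctuation
        words = [w.strip(".,!?;:").lower() for w in sentence.split()]
        found, i = [], 0
        while i < len(words):
            node, j, match = self.root, i, None
            # Follow the trie as far as the words allow, remembering the longest match
            while j < len(words) and words[j] in node:
                node = node[words[j]]
                j += 1
                if self._END in node:
                    match = (node[self._END], j)
            if match:
                found.append(match[0])
                i = match[1]  # continue right after the matched keyword
            else:
                i += 1
        return found


kp = TinyKeywordTrie()
kp.add_keyword("Big Apple", "New York")
kp.add_keyword("Bay Area")
print(kp.extract_keywords("I love Big Apple and Bay Area."))  # ['New York', 'Bay Area']
```

Because the lookup cost depends on the sentence length rather than on the number of keywords, this approach scales better than running one regular expression per keyword.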

Do you want to perform Natural Language Processing tasks like replacing or extracting words in a text, in the Windows GUI app? This section will show you how to get started!

Getting and installing FlashText

First, here is how you can get FlashText:

pip install flashtext

Using FlashText for Python Natural Language Processing

The following introductory FlashText example shows keyword extraction, keyword replacement, and case-sensitive matching (run this inside the lower Memo of Python4Delphi Demo01 GUI):

# Extract keywords
from flashtext import KeywordProcessor

keyword_processor = KeywordProcessor()

## Keyword_processor.add_keyword(<unclean name>, <standardised name>)
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found)

# Replace keywords
keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence)

# Case Sensitive example
from flashtext import KeywordProcessor

keyword_processor = KeywordProcessor(case_sensitive=True)
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
print(keywords_found)

Here is the FlashText code in the Python GUI

FlashText Demo with Python4Delphi in Windows

3. How do I enable Gensim for Natural Language Processing inside Python4Delphi on Windows?

Gensim is an open-source library for Unsupervised Topic Modeling and Natural Language Processing, using Modern Statistical Machine Learning. Gensim has been used and cited in over 1400 commercial and academic applications as of 2018, in a diverse array of disciplines from medicine to insurance claim analysis to patent search.

Design principles of Gensim:

Practicality – As industry experts, the Gensim developers focus on proven, battle-hardened algorithms to solve real industry problems: more focus on engineering, less on academia.
Memory independence – There is no need for the whole training corpus to reside fully in RAM at any one time. Gensim can process large, web-scale corpora using data streaming.
Performance – Highly optimized implementations of popular vector space algorithms using C, BLAS and memory-mapping.
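The memory-independence principle comes down to iterating over documents one at a time instead of holding them all in a list. A minimal plain-Python sketch of the pattern, assuming one document per line: a class whose `__iter__` re-reads the source on every pass and yields one bag-of-words vector per document as sorted `(token_id, count)` pairs, the same shape Gensim’s `Dictionary.doc2bow` produces. `StreamingCorpus` and the hand-made vocabulary are hypothetical; with Gensim you would normally call `dictionary.doc2bow` instead:

```python
import io
from collections import Counter


class StreamingCorpus:
    """Yield one bag-of-words vector per line without loading the whole
    source into RAM (hypothetical helper illustrating Gensim's streaming idea)."""

    def __init__(self, open_source, vocabulary):
        self.open_source = open_source  # callable returning a fresh line iterator
        self.vocabulary = vocabulary    # token -> integer id

    def __iter__(self):
        # Re-open the source on every pass so the corpus can be iterated
        # several times (for example, once per training pass)
        for line in self.open_source():
            counts = Counter(
                self.vocabulary[token]
                for token in line.lower().split()
                if token in self.vocabulary
            )
            yield sorted(counts.items())  # sorted (token_id, count) pairs


# In practice open_source could be lambda: open("corpus.txt", encoding="utf-8")
# (hypothetical file name); here an in-memory text stands in for the file
text = "Human computer interaction\nsurvey of user computer system\n"
vocab = {"computer": 0, "human": 1, "survey": 2, "system": 3, "user": 4}
corpus = StreamingCorpus(lambda: io.StringIO(text), vocab)
for bow in corpus:
    print(bow)
```

Only one line is held in memory at a time, so the same class works unchanged whether the file has ten lines or ten million.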

Its authors describe Gensim as the most robust, efficient, and hassle-free piece of software for unsupervised semantic modeling from plain text.

Getting and installing Gensim

This section shows how to combine Python4Delphi with the Gensim library inside Delphi and C++Builder, from installing Gensim with pip to performing similarity queries.

First, here is how you can get Gensim:

pip install gensim

Using Gensim for Python Natural Language Processing

The following is a code example of Gensim to perform similarity queries (run this inside the lower Memo of Python4Delphi Demo01 GUI):

import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Creating the corpus
from collections import defaultdict
from gensim import corpora

documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]

# Remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [
    [word for word in document.lower().split() if word not in stoplist]
    for document in documents
]

# Remove words that appear only once
frequency = defaultdict(int)
for text in texts:
    for token in text:
        frequency[token] += 1

texts = [
    [token for token in text if frequency[token] > 1]
    for text in texts
]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Similarity interface
from gensim import models
lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)

doc = "Human computer interaction"
vec_bow = dictionary.doc2bow(doc.lower().split())
vec_lsi = lsi[vec_bow]  # Convert the query to LSI space
print(vec_lsi)

# We will use cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity)
# to determine the similarity of two vectors.

# Initializing query structures
from gensim import similarities
index = similarities.MatrixSimilarity(lsi[corpus])  # Transform corpus to LSI space and index it

# Save and reload the index (change this path to a writable folder on your machine)
index.save('C:/Users/ASUS/deerwester.index')
index = similarities.MatrixSimilarity.load('C:/Users/ASUS/deerwester.index')

# Performing queries
sims = index[vec_lsi]  # Perform a similarity query against the corpus
print(list(enumerate(sims)))  # Print (document_number, document_similarity) 2-tuples

# Cosine measure returns similarities in the range <-1, 1> (the greater, the more similar);
# in the Gensim tutorial, the first document has a score of 0.99809301 etc.
sims = sorted(enumerate(sims), key=lambda item: -item[1])
for doc_position, doc_score in sims:
    print(doc_score, documents[doc_position])

Here are the Gensim results in the Python GUI

Gensim Demo with Python4Delphi in Windows
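The similarity query above ranks documents by cosine similarity, cos(a, b) = dot(a, b) / (norm(a) * norm(b)). A minimal plain-Python sketch for sparse vectors in the `(id, weight)` format that `doc2bow` and `lsi[...]` return; it is only illustrative, since `MatrixSimilarity` computes the same measure much more efficiently over a whole indexed corpus. The function name `sparse_cosine` and the sample vectors are hypothetical:

```python
import math


def sparse_cosine(vec_a, vec_b):
    """Cosine similarity of two sparse vectors given as (id, weight) pairs."""
    a, b = dict(vec_a), dict(vec_b)
    dot = sum(weight * b[i] for i, weight in a.items() if i in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat empty vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


query = [(0, 1.0), (1, 1.0)]
docs = {
    "doc_a": [(0, 1.0), (1, 1.0)],  # same direction as the query
    "doc_b": [(0, 1.0), (2, 1.0)],  # shares one dimension with the query
    "doc_c": [(3, 2.0)],            # no overlap with the query
}
# Rank documents from most to least similar, like the sorted() call above
for name, vec in sorted(docs.items(), key=lambda kv: -sparse_cosine(query, kv[1])):
    print(round(sparse_cosine(query, vec), 3), name)
```

Cosine similarity ignores vector length, so a long document and a short one about the same topic can still score close to 1.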

4. How do I use TextBlob for NLP inside Python4Delphi in Windows?

TextBlob is a Python library for processing textual data. It provides a simple and consistent API for diving into common Natural Language Processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

Here are TextBlob’s powerful features at a glance:

Noun phrase extraction
Part-of-speech tagging (POS tagging)
Sentiment analysis
Classification (Naive Bayes, Decision Tree)
Tokenization (splitting text into words and sentences)
Word and phrase frequencies
Parsing
n-grams
Word inflection (pluralization and singularization) and lemmatization
Spelling correction
Add new models or languages through extensions
WordNet integration

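First, here is how you can get TextBlob:

pip install textblob

TextBlob also needs some NLTK corpora for features such as part-of-speech tagging and noun phrase extraction. You can download them with:

python -m textblob.download_corpora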

Using TextBlob for Python Natural Language Processing

The following is a code example of TextBlob to perform part-of-speech (POS) tagging, noun phrase extraction, and sentiment analysis (run this inside the lower Memo of Python4Delphi Demo01 GUI):

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
print(blob.tags)

print(blob.noun_phrases)

for sentence in blob.sentences:
    print(sentence.sentiment.polarity)

Here is the TextBlob result in the Python GUI

TextBlob Demo with Python4Delphi in Windows
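TextBlob’s default sentiment analyzer is lexicon-based: words carry prior polarity scores, and a sentence’s polarity, in the range [-1.0, 1.0], is derived from them. The toy sketch below illustrates the general idea by averaging the scores of known words and flipping the sign of a word directly preceded by a negation. The mini-lexicon and the negation rule are hypothetical simplifications; TextBlob’s actual analyzer (based on the Pattern library’s lexicon) handles more cases, such as intensifiers like "very", so its numbers will differ:

```python
import re

# Hypothetical mini-lexicon of prior polarities in [-1.0, 1.0]
LEXICON = {
    "good": 0.7, "great": 0.8, "love": 0.5, "hungry": -0.2,
    "doomed": -0.6, "devastating": -0.8, "bad": -0.7,
}
NEGATIONS = {"not", "never", "no"}


def toy_polarity(sentence):
    """Average prior polarity of known words; a directly preceding
    negation word flips the sign of that word's score."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = []
    for i, word in enumerate(words):
        if word in LEXICON:
            score = LEXICON[word]
            if i > 0 and words[i - 1] in NEGATIONS:
                score = -score
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("This is a great movie."))  # 0.8
print(toy_polarity("The movie is not good."))  # -0.7
print(toy_polarity("A great cast but a bad script."))
```

Averaging explains why sentences mixing positive and negative words, like the last example, tend to land near zero.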

5. How do I enable spaCy for NLP inside Python4Delphi in Windows?

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It’s designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems.

Here is an overview of spaCy’s powerful features:

Support for 64+ languages
55 trained pipelines for 17 languages
Multi-task learning with pre-trained transformers like BERT
Pretrained word vectors
State-of-the-art speed
Production-ready training system
Linguistically-motivated tokenization
Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking, and more
Easily extensible with custom components and attributes
Support for custom models in PyTorch, TensorFlow, and other frameworks
Built-in visualizers for syntax and NER
Easy model packaging, deployment, and workflow management
Robust, rigorously evaluated accuracy

Installing spaCy for Natural Language Processing

pip install -U spacy

Then download a trained pipeline (the example below uses the small English pipeline, en_core_web_sm):

python -m spacy download en_core_web_sm
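`spacy.load("en_core_web_sm")` raises an error if the pipeline has not been downloaded, which inside the GUI shows up as a traceback in the output Memo. Because `python -m spacy download` installs the pipeline as an ordinary Python package, a minimal sketch can check for it with the standard-library `importlib` before loading. The helper name `spacy_model_installed` is hypothetical:

```python
import importlib.util


def spacy_model_installed(name="en_core_web_sm"):
    """Return True if a package with this name (e.g. a spaCy pipeline)
    can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if spacy_model_installed():
    print("en_core_web_sm is installed; spacy.load('en_core_web_sm') should work.")
else:
    print("Run: python -m spacy download en_core_web_sm")
```

Since the check runs in whichever interpreter Python4Delphi has loaded, it also catches the common case of downloading the pipeline into a different Python installation.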

A spaCy Python code example

The following spaCy code example analyzes syntax and finds named entities, phrases, and concepts in a given document (run this inside the lower Memo of Python4Delphi Demo01 GUI):

import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")

# Process whole documents
text = ("Delphi supports rapid application development (RAD). Prominent features are a visual designer and two application frameworks, VCL for Windows and FireMonkey (FMX) for cross-platform development. Delphi uses the Pascal-based programming language Object Pascal created by Anders Hejlsberg for Borland (now IDERA) as the successor to Turbo Pascal. It supports native cross-compilation to many platforms including Windows, Linux, iOS and Android. To better support development for Microsoft Windows and interoperate with code developed with other software development tools, Delphi supports independent interfaces of Component Object Model (COM) with reference counted class implementations, and support for many third-party components. Interface implementations can be delegated to fields or properties of classes. Message handlers are implemented by tagging a method of a class with the integer constant of the message to handle. Database connectivity is extensively supported through VCL database-aware and database access components. Later versions have included upgraded and enhanced runtime library routines, some provided by the community group FastCode.")
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)

Here is the spaCy result in the Python GUI

spaCy Demo with Python4Delphi in Windows

Want to know more? Check out Python4Delphi, which makes it easy to build Python GUIs for Windows using Delphi.

References & further readings

[1] @kulkarnimahesh321. (2022). Python | Gender Identification by name using NLTK. GeeksforGeeks. geeksforgeeks.org/python-gender-identification-by-name-using-nltk

[2] Singh, V. (2017). Replace or retrieve keywords in documents at scale. arXiv preprint arXiv:1711.00046.

[3] Řehůřek, R. (2009-2024). Similarity Queries. Gensim documentation. radimrehurek.com/gensim/auto_examples/core/run_similarity_queries.html

[4] Loria, S., and TextBlob contributors. (2013-2024). TextBlob: Simplified Text Processing. TextBlob documentation. textblob.readthedocs.io/en/dev

[5] Explosion. (2016-2024). spaCy 101: Everything you need to know. spaCy documentation. spacy.io/usage/spacy-101

[6] Hakim, M. A. (2021). Article02 – 5 Python’s Natural Language Processing Libraries. embarcaderoBlog-repo, GitHub. github.com/MuhammadAzizulHakim/embarcaderoBlog-repo/tree/main/Article02%20-%205%20Python’s%20Natural%20Language%20Processing%20Libraries

2 Comments


Al Pool

December 16, 2021 at 8:15 pm

Hello,

I have a module that takes a blob of text and looks for patterns to pull out contact info, phone, postal code, email, etc. It works to a point, but entity identification is nearly impossible.

I am wondering if NLP might be a solution?
  
  Ian Barker
  
  December 17, 2021 at 7:34 am
  
  It might be the right solution. You could also look at these two articles which show you how to detect and label objects in images too:
  
  * https://blogs.embarcadero.com/detecting-objects-on-images-using-google-cloud-vision-api/
  * https://blogs.embarcadero.com/this-api-adds-machine-learning-computer-vision-to-your-app/