How do I start using Natural Language Processing in Windows?
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language: in particular, how to program computers to process and analyze large amounts of natural language data, understand human languages, and extract meaning from text.
The common tasks in NLP include Text Mining, Text Classification, Text Analysis, Sentiment Analysis, Word Sequencing, Speech Recognition & Generation, Machine Translation, and Dialog Systems, to name a few.
Because NLP is computationally demanding, developers need capable tools to apply NLP approaches and algorithms when building services that handle natural language.
We can build Windows apps with Natural Language Processing capabilities using Embarcadero’s Python4Delphi (P4D). P4D empowers Python users with Delphi’s award-winning VCL functionalities for Windows which enables us to build native Windows apps 5x faster. This integration enables us to create a modern GUI with Windows 10 looks and responsive controls for our Python Natural Language Processing applications.
Python4Delphi makes it very easy to use Python as a scripting language for Delphi applications. It also comes with an extensive range of demos and tutorials. With Python4Delphi, you can integrate any Python features, functionalities, and libraries with Delphi to create a nice GUI for your Natural Language Processing applications in Windows.
In this tutorial, we will discuss the following:
How to use five Python libraries with different Natural Language Processing capabilities in Windows apps: NLTK, FlashText, Gensim, TextBlob, and spaCy.
All of them will be integrated with Python4Delphi to create Windows apps with Natural Language Processing capabilities.
Prerequisites: Before we begin to work, download and install the latest Python for your platform. Follow the Python4Delphi installation instructions mentioned here. Alternatively, you can check out the easy instructions found in the Getting Started With Python4Delphi video by Jim McKeeth.
Time to get started!
First, open and run the Python GUI from the Demo1 project of Python4Delphi in RAD Studio. Then insert the script into the lower Memo, click the Execute button, and read the result in the upper Memo. You can find the Demo1 source on GitHub. The behind-the-scenes details of how Delphi runs your Python code in this Python GUI can be found at this link.
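Before pasting the longer examples, it can help to confirm which Python the GUI is running and whether the five libraries are visible to it. The following is a minimal sketch using only the standard library. The helper name `check_libraries` is made up for illustration, and `importlib.util.find_spec` only reports whether a package can be located, not whether it works correctly.

```python
import importlib.util
import sys

# Import names of the five libraries covered in this tutorial.
LIBRARIES = ["nltk", "flashtext", "gensim", "textblob", "spacy"]


def check_libraries(names=LIBRARIES):
    """Return {name: True/False} depending on whether the package can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", sys.version)
# In an embedded setup this may point at the host application instead of python.exe.
print("Executable:", sys.executable)
for name, found in check_libraries().items():
    status = "found" if found else "missing (try: pip install " + name + ")"
    print(f"{name:10s} {status}")
```

If a library shows as missing here but `pip` reports it as installed, the GUI is probably using a different Python installation than the one `pip` targets.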
1. How do I enable NLTK for NLP inside Python4Delphi in Windows?
NLTK (Natural Language Toolkit) is a leading platform for building Python programs that work with human language data. Natural Language Processing, or NLP for short, covers in a broad sense any kind of computer manipulation of natural language. It overlaps with Machine Learning and is concerned with a computer's ability to understand, analyze, manipulate, and potentially generate human language.
NLTK provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
Do you want to perform Natural Language Processing tasks such as predicting text, analyzing and visualizing sentence structure, sentiment analysis, or gender classification in a Windows GUI app? You can solve these tasks by combining the NLTK library with Python4Delphi (P4D).
This section will show you how to get started using NLTK combined with Python4Delphi!
First, here is how you can get NLTK:
```
pip install nltk
```
Practical work in Natural Language Processing typically uses large bodies of linguistic data or corpora. You can add the popular NLTK datasets to your system using this command:
```
python -m nltk.downloader popular
```
Don't forget to add the directories of the Python installation where NLTK is installed to the PATH system environment variable. For example:
```
C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38/Lib/site-packages
C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38/Scripts
C:/Users/YOUR_USERNAME/AppData/Local/Programs/Python/Python38
```
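To find the equivalent directories on your own machine rather than guessing them, you can ask Python itself. This is a small sketch built on the standard `sysconfig` module. The helper `on_path` is a hypothetical name introduced here, and the script only reports what it finds; it does not modify any environment variable.

```python
import os
import sysconfig

# Directories of the running Python installation, as reported by sysconfig.
paths = sysconfig.get_paths()
candidates = {
    "site-packages": paths["purelib"],
    "Scripts": paths["scripts"],
    "Python home": paths["data"],
}


def on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return wanted in entries


for label, directory in candidates.items():
    state = "already on PATH" if on_path(directory) else "not on PATH"
    print(f"{label}: {directory} ({state})")
```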
The following NLTK example builds a classifier that predicts gender from a person's name (run this inside the lower Memo of the Python4Delphi Demo01 GUI):
```python
# Importing libraries
import random

import nltk
from nltk.corpus import names


def gender_features(word):
    """Use the last letter of the name as the only feature."""
    return {'last_letter': word[-1]}


# Preparing a list of examples and corresponding class labels.
labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
                 [(name, 'female') for name in names.words('female.txt')])
random.shuffle(labeled_names)

# We use the feature extractor to process the names data.
featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]

# Divide the resulting list of feature sets into a training set and a test set.
train_set, test_set = featuresets[500:], featuresets[:500]

# The training set is used to train a new "naive Bayes" classifier.
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(classifier.classify(gender_features('Sherlock')))  # Output should be 'male'

# Measure accuracy on the held-out test set, not on the training data.
print(nltk.classify.accuracy(classifier, test_set))

# Show most informative features
classifier.show_most_informative_features(10)
```
Here is the result in the Python GUI
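To see roughly what a naive Bayes classifier does with a single last-letter feature, here is a pure-Python sketch that needs no NLTK. It is not NLTK's implementation: the toy name list is hypothetical, the function names are made up for illustration, and it uses simple add-one smoothing, which may differ from the smoothing NLTK applies.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; the NLTK example uses the much larger `names` corpus.
TOY_NAMES = [
    ("John", "male"), ("Mark", "male"), ("Peter", "male"),
    ("Frank", "male"), ("Derek", "male"),
    ("Anna", "female"), ("Maria", "female"), ("Julia", "female"),
    ("Emma", "female"), ("Sophie", "female"),
]


def last_letter(name):
    """Feature extractor: the lower-cased final letter of the name."""
    return name[-1].lower()


def train(labeled_names):
    """Count how often each label occurs and how often each letter occurs per label."""
    label_counts = Counter()
    letter_counts = defaultdict(Counter)
    for name, label in labeled_names:
        label_counts[label] += 1
        letter_counts[label][last_letter(name)] += 1
    return label_counts, letter_counts


def classify(model, name, alphabet_size=26):
    """Pick the label maximizing log P(label) + log P(last letter | label)."""
    label_counts, letter_counts = model
    total = sum(label_counts.values())
    letter = last_letter(name)
    scores = {}
    for label, count in label_counts.items():
        prior = count / total
        # Add-one smoothing so unseen letters do not get probability zero.
        likelihood = (letter_counts[label][letter] + 1) / (count + alphabet_size)
        scores[label] = math.log(prior) + math.log(likelihood)
    return max(scores, key=scores.get)


model = train(TOY_NAMES)
print(classify(model, "Sherlock"))
print(classify(model, "Olivia"))
```

With so few training names the result says little about real data; the point is only to show how label counts and per-label letter counts combine into a prediction.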
2. How do I use FlashText for NLP inside Python4Delphi on Windows?
FlashText is a module that can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm.
The FlashText algorithm finds or replaces keywords in a given text, and it can do so in a single pass over a document.
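The single-pass idea can be illustrated with a small trie (prefix tree). The sketch below is a simplified, word-level approximation written for this tutorial, not the FlashText library's actual implementation. The function names and the `END` marker are hypothetical, and matching is always case-insensitive here.

```python
import re

END = object()  # Marker key for "a keyword ends at this trie node".


def build_trie(keywords):
    """Build a word-level trie from {phrase: clean_name}."""
    trie = {}
    for phrase, clean_name in keywords.items():
        node = trie
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[END] = clean_name
    return trie


def extract_keywords(trie, sentence):
    """Scan the sentence left to right and return clean names of the longest matches."""
    words = re.findall(r"\w+", sentence.lower())
    found = []
    i = 0
    while i < len(words):
        node, j = trie, i
        match, match_end = None, i
        # Follow the trie as far as the upcoming words allow.
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if END in node:
                match, match_end = node[END], j
        if match is not None:
            found.append(match)
            i = match_end  # Continue after the matched phrase.
        else:
            i += 1
    return found


trie = build_trie({"Big Apple": "New York", "Bay Area": "Bay Area"})
print(extract_keywords(trie, "I love Big Apple and Bay Area."))
```

Because all keywords live in one trie, the cost of a scan depends mainly on the length of the text rather than on the number of keywords, which is the property that makes this family of approaches attractive for large keyword lists.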
Do you want to perform Natural Language Processing tasks such as replacing or extracting words in a text in a Windows GUI app? This section will show you how to get started!
First, here is how you can get FlashText:
```
pip install flashtext
```
The following introductory FlashText example performs keyword extraction, keyword replacement, and case-sensitive matching (run this inside the lower Memo of the Python4Delphi Demo01 GUI):
```python
# Extract keywords
from flashtext import KeywordProcessor

keyword_processor = KeywordProcessor()
# keyword_processor.add_keyword(<unclean name>, <standardised name>)
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found)

# Replace keywords
keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence)

# Case Sensitive example
from flashtext import KeywordProcessor

keyword_processor = KeywordProcessor(case_sensitive=True)
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
print(keywords_found)
```
Here is the FlashText code in the Python GUI
3. How do I enable Gensim for Natural Language Processing inside Python4Delphi on Windows?
Gensim is an open-source library for Unsupervised Topic Modeling and Natural Language Processing, using Modern Statistical Machine Learning. Gensim has been used and cited in over 1400 commercial and academic applications as of 2018, in a diverse array of disciplines from medicine to insurance claim analysis to patent search.
Design principles of Gensim:
- Practicality – The Gensim developers focus on proven, battle-hardened algorithms to solve real industry problems. More focus on engineering, less on academia.
- Memory independence – There is no need for the whole training corpus to reside fully in RAM at any one time. It can process large, web-scale corpora using data streaming.
- Performance – Highly optimized implementations of popular vector space algorithms using C, BLAS and memory-mapping.
Gensim presents itself as a robust, efficient, and hassle-free piece of software for unsupervised semantic modeling from plain text.
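The memory-independence principle above comes down to processing one document at a time. The following pure-Python sketch illustrates the idea with a generator. The function name `stream_bow`, the tiny vocabulary, and the in-memory "file" are all assumptions made for illustration; in a real project the lines would come from a file on disk and the vocabulary from a Gensim dictionary.

```python
import io
from collections import Counter


def stream_bow(lines, vocabulary):
    """Yield one sparse bag-of-words vector per line, without storing the corpus."""
    for line in lines:
        counts = Counter(word for word in line.lower().split() if word in vocabulary)
        # Each vector is a sorted list of (token_id, count) pairs.
        yield sorted((vocabulary[word], n) for word, n in counts.items())


# Hypothetical vocabulary mapping tokens to integer ids.
vocabulary = {"human": 0, "computer": 1, "graph": 2, "trees": 3}

# Stand-in for a large text file with one document per line.
fake_file = io.StringIO(
    "Human machine interface for computer applications\n"
    "The intersection graph of paths in trees\n"
    "Graph minors and graph trees\n"
)

for vector in stream_bow(fake_file, vocabulary):
    print(vector)
```

Only the current line and its vector are held in memory at any moment, so the same loop would work for a corpus far larger than the available RAM.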
Getting and installing Gensim
This section shows how to combine Python4Delphi with the Gensim library inside Delphi and C++Builder, from installing Gensim with pip to performing similarity queries.
First, here is how you can get Gensim:
```
pip install gensim
```
Using Gensim for Python Natural Language Processing
The following is a code example of Gensim to perform similarity queries (run this inside the lower Memo of Python4Delphi Demo01 GUI):
```python
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Creating the Corpus
from collections import defaultdict
from gensim import corpora

documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]

# Remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [
    [word for word in document.lower().split() if word not in stoplist]
    for document in documents
]

# Remove words that appear only once
frequency = defaultdict(int)
for text in texts:
    for token in text:
        frequency[token] += 1

texts = [
    [token for token in text if frequency[token] > 1]
    for text in texts
]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Similarity interface
from gensim import models
lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)

doc = "Human computer interaction"
vec_bow = dictionary.doc2bow(doc.lower().split())
vec_lsi = lsi[vec_bow]  # Convert the query to LSI space
print(vec_lsi)

# We use cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity)
# to determine the similarity of two vectors.

# Initializing query structures
from gensim import similarities
index = similarities.MatrixSimilarity(lsi[corpus])  # Transform corpus to LSI space and index it

# Change this path to a writable location on your own machine.
index.save('C:/Users/ASUS/deerwester.index')
index = similarities.MatrixSimilarity.load('C:/Users/ASUS/deerwester.index')

# Performing queries
sims = index[vec_lsi]  # Perform a similarity query against the corpus
print(list(enumerate(sims)))  # Print (document_number, document_similarity) 2-tuples

# Cosine measure returns similarities in the range <-1, 1> (the greater, the more similar),
# so that the first document has a score of 0.99809301 etc.
sims = sorted(enumerate(sims), key=lambda item: -item[1])
for doc_position, doc_score in sims:
    print(doc_score, documents[doc_position])
```
Gensim Python4Delphi results
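The similarity scores in the example are cosine similarities. As a quick illustration of what that measure computes, here is a standalone sketch on plain Python lists. The function name is chosen for this tutorial and is not part of Gensim, and treating an all-zero vector as similarity 0.0 is a convention assumed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Assumed convention: a zero vector is similar to nothing.
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2], [2, 4]))   # Same direction
print(cosine_similarity([1, 0], [0, 1]))   # Orthogonal
print(cosine_similarity([1, 0], [-1, 0]))  # Opposite direction
```

Vectors pointing in the same direction score close to 1 regardless of their length, which is why the measure suits documents of different sizes.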
4. How do I use TextBlob for NLP inside Python4Delphi in Windows?
TextBlob is a Python library for processing textual data. It provides a simple and consistent API for diving into common Natural Language Processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Here are TextBlob's features at a glance:
- Noun phrase extraction
- Part-of-speech tagging (POS tagging)
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration
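Two of the features listed above, n-grams and word frequencies, are simple enough to sketch in plain Python, which may help clarify what the library returns. This is an illustration only, not TextBlob's code: the helper names and the regex-based tokenizer are assumptions, and TextBlob's own tokenization is more careful.

```python
import re
from collections import Counter


def words(text):
    """Very rough tokenizer: lower-cased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = words("Now is better than never. Now is the time.")
print(ngrams(tokens, 2)[:3])
print(Counter(tokens).most_common(2))
```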
First, here is how you can get TextBlob:
```
pip install textblob
```
The following is a code example of TextBlob to perform part-of-speech (POS) tagging, noun phrase extraction, and sentiment analysis (run this inside the lower Memo of Python4Delphi Demo01 GUI):
```python
from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)

# Part-of-speech tags as (word, tag) pairs
print(blob.tags)

# Noun phrases found in the text
print(blob.noun_phrases)

# Sentiment polarity of each sentence
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
```
TextBlob Natural Language Processing Result
5. How do I enable spaCy for NLP inside Python4Delphi in Windows?
spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It’s designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems.
Here is an overview of spaCy's features:
- Support for 64+ languages
- 55 trained pipelines for 17 languages
- Multi-task learning with pre-trained transformers like BERT
- Pretrained word vectors
- State-of-the-art speed
- Production-ready training system
- Linguistically-motivated tokenization
- Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking, and more
- Easily extensible with custom components and attributes
- Support for custom models in PyTorch, TensorFlow, and other frameworks
- Built-in visualizers for syntax and NER
- Easy model packaging, deployment, and workflow management
- Robust, rigorously evaluated accuracy
Installing spaCy for Natural Language Processing
```
pip install -U spacy
```
Then download a trained English pipeline:
```
python -m spacy download en_core_web_sm
```
A spaCy Python code example
The following spaCy example analyzes syntax and finds named entities, phrases, and concepts in a given document (run this inside the lower Memo of the Python4Delphi Demo01 GUI):
```python
import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")

# Process whole documents
text = ("Delphi supports rapid application development (RAD). Prominent features are "
        "a visual designer and two application frameworks, VCL for Windows and FireMonkey "
        "(FMX) for cross-platform development. Delphi uses the Pascal-based programming "
        "language Object Pascal created by Anders Hejlsberg for Borland (now IDERA) as the "
        "successor to Turbo Pascal. It supports native cross-compilation to many platforms "
        "including Windows, Linux, iOS and Android. To better support development for "
        "Microsoft Windows and interoperate with code developed with other software "
        "development tools, Delphi supports independent interfaces of Component Object "
        "Model (COM) with reference counted class implementations, and support for many "
        "third-party components. Interface implementations can be delegated to fields or "
        "properties of classes. Message handlers are implemented by tagging a method of a "
        "class with the integer constant of the message to handle. Database connectivity "
        "is extensively supported through VCL database-aware and database access "
        "components. Later versions have included upgraded and enhanced runtime library "
        "routines, some provided by the community group FastCode.")
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
```
Using Python and spaCy Natural Language Processing
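The entity loop in the example prints one line per entity. If you would rather see the entities grouped by type, a small post-processing step can help. The sketch below works on plain `(text, label)` pairs so it runs without spaCy. The helper name `group_by_label` and the sample pairs are hypothetical and are not actual spaCy output; in a real script the pairs could be built from `doc.ents` using the `text` and `label_` attributes shown in the example.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [unique texts in first-seen order]}."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Hypothetical sample pairs for illustration only.
sample = [
    ("Delphi", "ORG"),
    ("Anders Hejlsberg", "PERSON"),
    ("Borland", "ORG"),
    ("Delphi", "ORG"),
    ("Windows", "PRODUCT"),
]

for label, texts in group_by_label(sample).items():
    print(label, texts)
```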
Want to know some more? Then check out Python4Delphi which easily allows you to build Python GUIs for Windows using Delphi.
Hello,
I have a module that takes a blob of text and looks for patterns to pull out contact info, phone, postal code, email, etc. It works to a point, but entity identification is nearly impossible.
I am wondering if NLP might be a solution?
It might be the right solution. You could also look at these two articles which show you how to detect and label objects in images too:
* https://blogs.embarcadero.com/detecting-objects-on-images-using-google-cloud-vision-api/
* https://blogs.embarcadero.com/this-api-adds-machine-learning-computer-vision-to-your-app/