The AI Vision API from Google Cloud provides powerful pre-trained machine learning models that you can easily use in your desktop and mobile applications via REST or RPC API calls. Suppose you want your app to detect objects, locations, activities, animal species, and products, or to detect not only faces but also their emotions. Or perhaps you need the ability to read printed or handwritten text. All of this and much more is possible for free (up to the first 1000 units per month per feature) or at very low prices that scale with your usage, with no upfront commitments.
“Detect Labels” – an ordinary name for some extraordinary AI
The "Detect Labels" option is the part of the AI Vision API that we can use to detect and extract information about entities in an image, across a broad group of categories. With that information, we can identify general objects, locations, activities, animal species, products, and more.
We can use RAD Studio Delphi to easily set up a REST client library that takes advantage of Google Cloud's Vision API to empower our desktop and mobile applications. If the request is successful, the server returns a 200 OK HTTP status code and the response body in JSON format.
Our RAD Studio and Delphi applications will be able to call the API and perform the detection on a local image file by sending the contents of the image file as a base64-encoded string in the body of the request. Alternatively, they can reference an image file located in Google Cloud Storage, or on the web, without sending the contents of the image file in the body of the request. Both forms of the request's image element are sketched below.
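The two variants differ only in how the image element of the JSON request is filled in; both fragments below follow the request format described in the Vision API documentation, and the base64 string and image URL are placeholders:

```json
{ "image": { "content": "<base64-encoded image bytes>" } }
```

```json
{ "image": { "source": { "imageUri": "https://example.com/photo.jpg" } } }
```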
How do I set up Google’s Cloud Vision Label Detection AI API?
Make sure you refer to the Google Cloud Vision API documentation, specifically the "Detect Labels" section (https://cloud.google.com/vision/docs/labels), but, in general, this is what you need to do on the Google server side:
- Visit https://cloud.google.com/vision and log in with your Google account.
- Create or select a Google Cloud Platform (GCP) project.
- Enable the Vision API for that project.
- Enable billing for that project.
- Create an API key credential; this key is passed on the request URL, as shown below.
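Once these steps are complete, every Label Detection request goes to the documented annotate endpoint, with your key appended as a query parameter (YOUR_API_KEY is a placeholder):

```text
POST https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
Content-Type: application/json
```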
AI vision in action!
Let's take the image below as a brief practice. Take a long look at the image and try to think of 5 to 10 things you notice in it. Focus on what calls your attention the most. To make it more fun, write them down or say them out loud, and then see whether Google can guess them correctly.
How do I call the Google Vision AI API Label Detection endpoint?
All we need to do is call the API URL via an HTTP POST, passing a JSON request body with the feature type LABEL_DETECTION and the source set to the link of the image we want to analyze. We can do that using the REST client libraries available in several programming languages. A quick start guide is available in Google's documentation: https://cloud.google.com/vision/docs/quickstart-client-libraries.
In fact, at the bottom of the Google Cloud Vision documentation guide (https://cloud.google.com/vision/docs/labels) there is a "Try this API" panel that allows you to post the JSON request body shown below and get a JSON response back.
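A request body like the following is what we post; it follows the documented request format, and the image URL and maxResults value are placeholders you can change:

```json
{
  "requests": [
    {
      "image": {
        "source": { "imageUri": "https://example.com/photo.jpg" }
      },
      "features": [
        { "type": "LABEL_DETECTION", "maxResults": 5 }
      ]
    }
  ]
}
```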
What does the Google Vision Machine Learning API Label Detection endpoint return?
After the call, the result will be a list of labels, each with a description, a confidence score (which ranges from 0, no confidence at all, to 1, very high confidence), and a topicality value measuring how important or central the label is to the overall context of the image. The list of labels is returned in English, but you can always use the Google Cloud Translation API to translate from English into the language of your preference. An illustrative response is shown below.
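Here is a trimmed, illustrative response in the documented format; the actual label descriptions, mids, and scores depend entirely on the image you submit:

```json
{
  "responses": [
    {
      "labelAnnotations": [
        { "mid": "/m/0bt9lr", "description": "Dog",     "score": 0.97, "topicality": 0.97 },
        { "mid": "/m/01z5f7", "description": "Canidae", "score": 0.93, "topicality": 0.93 }
      ]
    }
  ]
}
```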
The list is limited to 5 results because we set the maxResults parameter to 5 in the JSON request. We can play with this parameter and make the list longer to see how much closer Google gets to the items you listed during our exercise. Later you can keep playing with other images and challenging your friends!
How do I connect my applications to Google Cloud Vision Label Detection API?
Once you have followed the basic steps to set up the Label Detection API on Google's side, go to the Console, open the "Credentials" menu item, click the "Create Credentials" button, and add an API key. Copy this key, as we will need it later.
RAD Studio Delphi and C++Builder make it very easy to connect to APIs because you can use the REST Debugger to automatically create the required REST components and then simply paste them into your app.
What components do I need to use?
In Delphi, the whole job is done with just three components making the API call: TRESTClient, TRESTRequest, and TRESTResponse. Once you connect successfully with the REST Debugger and copy and paste the components, you will notice that the API URL is set in the BaseURL property of TRESTClient. On the TRESTRequest component you will see that the request type is set to rmPOST, the ContentType is set to ctAPPLICATION_JSON, and that it contains one request body for the POST. A rough code equivalent of that setup is sketched below.
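As a sketch of what those three components amount to when configured in code (not exactly what the REST Debugger generates; the API key and image URL are placeholders, and the helper name is hypothetical):

```delphi
uses
  REST.Client, REST.Types;

procedure ConfigureVisionRequest;  // hypothetical helper mirroring the pasted components
var
  RESTClient1: TRESTClient;
  RESTRequest1: TRESTRequest;
  RESTResponse1: TRESTResponse;
begin
  // In the real app these are design-time components pasted from the REST Debugger
  RESTClient1   := TRESTClient.Create(nil);
  RESTRequest1  := TRESTRequest.Create(nil);
  RESTResponse1 := TRESTResponse.Create(nil);

  // Vision API "annotate" endpoint
  RESTClient1.BaseURL := 'https://vision.googleapis.com/v1/images:annotate';

  RESTRequest1.Client   := RESTClient1;
  RESTRequest1.Response := RESTResponse1;
  RESTRequest1.Method   := rmPOST;

  // The API key travels as the "key" query parameter (placeholder value)
  RESTRequest1.AddParameter('key', 'YOUR_API_KEY', pkGETorPOST);

  // JSON body requesting LABEL_DETECTION (image URL is a placeholder)
  RESTRequest1.AddBody(
    '{"requests":[{"image":{"source":{"imageUri":"https://example.com/photo.jpg"}},' +
    '"features":[{"type":"LABEL_DETECTION","maxResults":5}]}]}',
    ctAPPLICATION_JSON);
end;
```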
Next, run your RAD Studio Delphi and, on the main menu, click Tools > REST Debugger. Configure the REST Debugger as follows: set the content type to application/json, then add the POST URL, the JSON request body, and the API key you created. Once you click the "Send Request" button you should see the JSON response, just like the one demonstrated above.
How do I build a Windows desktop or Android/iOS mobile device application using the Google Cloud Vision AI API?
Now that you have successfully configured and tested your API calls in the REST Debugger, just click the "Copy Components" button, go back to Delphi, and create a new application project. Then paste the components onto your application's main form.
We can add some very simple code to a TButton OnClick event to make sure everything is configured correctly, and we're done. In five minutes we have made our very first call to the Google Vision machine learning API and can receive JSON responses for whatever images we want to run Label Detection on; a minimal handler is sketched below. Please note that on the TRESTResponse component the RootElement property is set to 'responses[0].labelAnnotations'. This means that the 'labelAnnotations' element of the JSON can be pulled straight into an in-memory table (TFDMemTable).
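A minimal OnClick handler might look like this, assuming the components pasted from the REST Debugger keep their default names (RESTRequest1, RESTResponse1) and the form has a TMemo named Memo1:

```delphi
procedure TForm1.Button1Click(Sender: TObject);
begin
  RESTRequest1.Execute;  // synchronous POST to the Vision API

  if RESTResponse1.StatusCode = 200 then
    Memo1.Lines.Text := RESTResponse1.Content  // raw JSON response
  else
    ShowMessage('Request failed: ' + RESTResponse1.StatusText);
end;
```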
The sample application features a TEdit where you paste the link to the image you want to analyze, another TEdit for the maxResults parameter, and a TMemo to display the JSON result of the REST API call. A TStringGrid component lets us navigate and display the data in a tabular way, all of which demonstrates how easily the JSON response integrates with a TFDMemTable component. When the button is clicked, the image is analyzed and the application presents the JSON response both as text and as data in the grid. Now you have everything you need to consume the response data and have your application process the information in the way that best suits your needs; one way to walk the resulting dataset is sketched below.
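For example, assuming a TRESTResponseDataSetAdapter links RESTResponse1 to FDMemTable1 (as the RootElement setup implies), you can walk the rows after the request completes. The field names match the labelAnnotations JSON keys; the helper name and the 0.8 threshold are just illustrative choices:

```delphi
// Hypothetical helper: list only the labels the API is fairly confident about
procedure TForm1.ShowConfidentLabels;
var
  LabelText: string;
  Score: Double;
begin
  FDMemTable1.First;
  while not FDMemTable1.Eof do
  begin
    LabelText := FDMemTable1.FieldByName('description').AsString;
    Score     := FDMemTable1.FieldByName('score').AsFloat;
    if Score >= 0.8 then
      Memo1.Lines.Add(Format('%s (%.0f%%)', [LabelText, Score * 100]));
    FDMemTable1.Next;
  end;
end;
```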
A quick recap
In this blog post, we learned how to sign up for the Google Cloud Vision machine learning API in order to perform Label Detection on images. We've seen how to use the RAD Studio REST Debugger to connect to the API's endpoint and copy the resulting components into a real application. Finally, we've seen how simple and quick it is to use RAD Studio Delphi to create a real Windows (and Linux, macOS, Android, or iOS) application that connects to the Google Cloud Vision API, performs Label Detection image analysis, and returns an in-memory dataset ready for us to use.
Ready to create and integrate the machine learning API for label detection on your own? You may use the Multiple Platform App Development tool which will help you develop applications on Windows and mobile platforms. Request a Free Trial here.