Quickly Convert Speech To Text With Powerful Google Artificial Intelligence APIs

What can the Google Audio Transcription API do for our applications?

Google’s Audio Transcription API allows your program to use all the power of Google’s huge computing resources to accurately transcribe speech found in audio files into text.

As Google’s documentation says:

Transcribe your content in real time or from stored files
Deliver a better user experience in products through voice commands
Gain insights from customer interactions to improve your service

How can we use the Audio Transcription API with Delphi?

Google Cloud services have become a mainstream choice for cloud computing. They let us use Google's ready-made AI services directly in our applications. Prices are also reasonable, and you can start without paying anything up front once you add a credit card to your account.

But how do we bring those functions into a Delphi application? Some people think that Delphi is not an ideal language for working with popular cloud computing APIs. That's not true. With the help of the large Delphi community, it's simpler than you might think.

Google Speech to Text and Text to Speech are two cloud computing functions that could be vital for some business applications. With the help of a few repositories by grijjy on GitHub, we can easily add this functionality to our Delphi application. The repositories date from 2017, but they work with newer versions of Delphi without many changes to the code.

How to set up Google Cloud services to work with Delphi?

To use Google Cloud services in our Delphi project, we need credentials that allow the project to access our Google Cloud API account. To get them, visit this link and create an account.

https://cloud.google.com

Then create a new project.

Create new Google Cloud Platform Project for the Google Speech To Text API

Then go to the APIs and services dashboard and create new credentials. Make sure the credential type is a service account. Then open the new service account and create a new key on the Keys tab. Make sure the key type is P12.

Google Speech To Text API - Create private key

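Now you need to convert this P12 key to PEM format so that our Delphi components can use it. To do that, run the following command in a folder that contains the OpenSSL binaries, replacing the file names with your own. OpenSSL will prompt for the import password of the P12 file: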

openssl pkcs12 -in path.p12 -out newfile.key.pem -nocerts -nodes

Keep the generated PEM file; we will use it in our Speech to Text Delphi project.
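Before wiring the key into the project, it can help to confirm that the conversion produced an unencrypted private key. The following is a minimal sketch of such a check, not part of the original demo; the program name and the function LooksLikePrivateKeyPem are hypothetical. It assumes that, depending on the OpenSSL version, the key block starts with either "BEGIN PRIVATE KEY" or "BEGIN RSA PRIVATE KEY".

```delphi
program CheckPemKey;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.IOUtils;

// Hypothetical helper (not part of GrijjyFoundation): returns True when the
// file contains an unencrypted PEM private key block.
function LooksLikePrivateKeyPem(const FileName: string): Boolean;
var
  Content: string;
begin
  Content := TFile.ReadAllText(FileName);
  // Assumption: the header is one of these two, depending on OpenSSL version.
  Result := Content.Contains('-----BEGIN PRIVATE KEY-----') or
            Content.Contains('-----BEGIN RSA PRIVATE KEY-----');
end;

begin
  if ParamCount < 1 then
    Writeln('Usage: CheckPemKey <newfile.key.pem>')
  else if not FileExists(ParamStr(1)) then
    Writeln('File not found: ' + ParamStr(1))
  else if LooksLikePrivateKeyPem(ParamStr(1)) then
    Writeln('Found an unencrypted private key block.')
  else
    Writeln('No unencrypted private key block found; check the openssl options.');
end.
```

If the check fails, a likely cause is that the -nodes option was left out, in which case OpenSSL writes an encrypted key block instead.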

What are the prerequisites for creating a Delphi Audio Transcription project?

Download the GrijjyFoundation repository.

https://github.com/grijjy/GrijjyFoundation

Download the “Nghttp2.pas” from this link and copy it to the GrijjyFoundation folder.

https://github.com/grijjy/DelphiRemotePushSender/blob/master/Nghttp2.pas

Download the “Google.API.pas” from this link and include it in your project folder.

https://github.com/grijjy/DelphiGoogleAPI/blob/master/Google.API.pas

How to set up a new Delphi project to demonstrate Audio Transcription?

Create a new Delphi project and add components like this. Alternatively, you can download the demo project from this link.

Demo Project

Google Speech To Text API - Delphi project for audio transcription

GrijjyFoundation has many units that support many Google Cloud services, but in this project we only need a few of them. Include only these units in your Delphi project.

Google Speech To Text API - Included units for Delphi project

How to code the Delphi Audio Transcription project?

Now let’s move to the coding part. We use an instance of the “TgoGoogle” class to post data and get the response from the Speech to Text API. We need to set some properties of “TgoGoogle”.

TgoGoogle.OAuthScope – The OAuth scope of the API we are going to use. This is the list of scopes for the different APIs.

https://developers.google.com/identity/protocols/googlescopes

We use “https://www.googleapis.com/auth/cloud-platform” in our application to access the Cloud Speech-to-Text API.

TgoGoogle.ServiceAccount – This is the ID of the service account we created on Google Cloud earlier.

TgoGoogle.PrivateKey – This is the private key we created earlier. It’s a file with PEM extension. In our demo application, we let the user browse the file at runtime.

We can get the PEM file path using this simple code.

procedure TFormMain.btnPEMBrowserClick(Sender: TObject);
begin
   OpenDialog1.DefaultExt := 'pem';
   if OpenDialog1.Execute then
      EditPEM.Text := OpenDialog1.FileName;
end;

How to post the request to Google Speech to Text API and get the response?

The last thing we have to do is post the request and get the response. The data is sent encrypted, so make sure the SSL DLLs are in the same folder as your EXE. The URL we are going to post the data to is:

https://speech.googleapis.com/v1beta1/speech:syncrecognize

The response will include the text version of our audio file. The Google Speech API supports several audio encodings, but we use FLAC, which is a lossless format. Here is the list of supported encodings.

LINEAR16 – Uncompressed 16-bit signed.
FLAC – This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression.
MULAW – 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
AMR – Adaptive Multi-Rate Narrowband codec. sampleRate must be 8000 Hz.
AMR_WB – Adaptive Multi-Rate Wideband codec. sampleRate must be 16000 Hz.
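The article does not show the JSON that goes into the request memo, so here is a minimal sketch of how it could be built for a local FLAC file. The program name, the function BuildRecognizeRequest, the 16000 Hz sample rate and the en-US language code are assumptions for illustration. The field names (config, encoding, sampleRate, languageCode, audio, content) follow the v1beta1 request format that matches the URL above; later API versions rename some of them (for example sampleRateHertz in v1).

```delphi
program BuildSpeechRequest;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.IOUtils,
  System.JSON,
  System.NetEncoding;

// Hypothetical helper: builds a syncrecognize request body for a FLAC file.
// SampleRate must match the actual sample rate of the audio file.
function BuildRecognizeRequest(const FlacFileName: string;
  SampleRate: Integer; const LanguageCode: string): string;
var
  Encoder: TBase64Encoding;
  Root, Config, Audio: TJSONObject;
begin
  // CharsPerLine = 0 disables the line breaks the default encoder inserts.
  Encoder := TBase64Encoding.Create(0);
  try
    Root := TJSONObject.Create;
    try
      // Describe the audio: encoding, sample rate and spoken language.
      Config := TJSONObject.Create;
      Root.AddPair('config', Config);
      Config.AddPair('encoding', 'FLAC');
      Config.AddPair('sampleRate', TJSONNumber.Create(SampleRate));
      Config.AddPair('languageCode', LanguageCode);

      // Embed the audio bytes as a Base64 string.
      Audio := TJSONObject.Create;
      Root.AddPair('audio', Audio);
      Audio.AddPair('content',
        Encoder.EncodeBytesToString(TFile.ReadAllBytes(FlacFileName)));

      Result := Root.ToJSON;
    finally
      Root.Free; // also frees Config and Audio, which Root owns
    end;
  finally
    Encoder.Free;
  end;
end;

begin
  if ParamCount < 1 then
    Writeln('Usage: BuildSpeechRequest <file.flac>')
  else
    // Assumed values: 16000 Hz audio, US English speech.
    Writeln(BuildRecognizeRequest(ParamStr(1), 16000, 'en-US'));
end.
```

In the demo form, the returned string could be assigned to MemoRequestContent.Text before posting. Embedding the audio inline like this suits short clips; the whole file is held in memory and sent in one request.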

So finally we post the request and get our audio transcription. Here is the code to do that.

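// Requires System.IOUtils (for TFile) and Google.API (for TgoGoogle) in the uses clause.
procedure TFormMain.ButtonPostClick(Sender: TObject);
var
  Google: TgoGoogle;
  ResponseHeaders, ResponseContent: String;
begin
  Google := TgoGoogle.Create;
  try
    // Scope, service account and private key come from the form
    Google.OAuthScope := EditOAuthScope.Text;
    Google.ServiceAccount := EditServiceAccount.Text;
    Google.PrivateKey := TFile.ReadAllText(EditPEM.Text);

    // Post the JSON request; the final argument (60000) is the timeout passed to Post
    if Google.Post(EditUrl.Text, MemoRequestContent.Text,
      ResponseHeaders, ResponseContent, 60000) = 200 then
    begin
      // HTTP 200: show the headers and the JSON transcription result
      MemoResponseHeaders.Text := ResponseHeaders;
      MemoResponseContent.Text := ResponseContent;
    end
    else
    begin
      // Any other status: show what the server returned, marked as an error
      MemoResponseHeaders.Text := 'ERROR: ' + ResponseHeaders;
      MemoResponseContent.Text := 'ERROR: ' + ResponseContent;
    end;
  finally
    Google.Free;
  end;
end;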
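The response content is JSON, so the transcript still has to be extracted from it. The sketch below shows one way to do that with System.JSON. It assumes the response has the shape results[0].alternatives[0].transcript; the function name ExtractTranscript and the sample JSON are made up for illustration and are not taken from the demo project.

```delphi
program ParseSpeechResponse;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.JSON;

// Hypothetical helper: returns the first transcript found in the response,
// or an empty string when the JSON is invalid or has no results.
function ExtractTranscript(const ResponseContent: string): string;
var
  Parsed: TJSONValue;
  Results, Alternatives: TJSONArray;
  Transcript: string;
begin
  Result := '';
  Parsed := TJSONObject.ParseJSONValue(ResponseContent);
  if Parsed = nil then
    Exit; // not valid JSON
  try
    // Walk results[0].alternatives[0].transcript, checking each step.
    if Parsed.TryGetValue<TJSONArray>('results', Results) and
       (Results.Count > 0) and
       Results.Items[0].TryGetValue<TJSONArray>('alternatives', Alternatives) and
       (Alternatives.Count > 0) and
       Alternatives.Items[0].TryGetValue<string>('transcript', Transcript) then
      Result := Transcript;
  finally
    Parsed.Free;
  end;
end;

const
  // Made-up sample in the assumed response shape.
  SampleResponse =
    '{"results":[{"alternatives":[{"transcript":"hello world","confidence":0.98}]}]}';

begin
  Writeln(ExtractTranscript(SampleResponse));
end.
```

In the demo form, ExtractTranscript(ResponseContent) could be called in the branch that handles status 200. This sketch reads only the first result; a response with several results would need a loop over the results array.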

Well done, now you can have some fun with your little Speech to text application!
