Quickly Convert Speech To Text With Powerful Google Artificial Intelligence APIs

What can the Google Audio Transcription API do for our applications?

Google’s Audio Transcription API allows your program to use all the power of Google’s huge computing resources to accurately transcribe speech found in audio files into text.

As Google’s documentation says:

Transcribe your content in real time or from stored files
Deliver a better user experience in products through voice commands
Gain insights from customer interactions to improve your service

How can we use the Audio Transcription API with Delphi?

Google Cloud services have become a must-have computing platform. They let us plug Google's well-engineered AI services straight into our applications. Prices are also reasonable, and you can get started without paying anything up front once you add a credit card to your account.

But how do we get those functions into a Delphi application? Some people think Delphi is not an ideal language for working with popular cloud computing APIs. That's not true. With the help of the large Delphi community, it's simpler than you might think.

Google Speech-to-Text and Text-to-Speech are two of those cloud functions that could be vital for some business applications. With the help of a few repositories published by grijjy on GitHub, we can easily bring that functionality into a Delphi application. The repositories date from 2017, but they still work with newer versions of Delphi without many changes to the code.

How to set up Google Cloud services to work with Delphi?

To use Google Cloud services in our Delphi project, we need credentials that allow the project to access our Google Cloud account. To get them, visit this link and create an account.

https://cloud.google.com

Then create a new project.

Create new Google Cloud Platform Project for the Google Speech To Text API

Then go to the APIs & Services dashboard and create new credentials. Make sure the credential type is a service account. Next, open the new service account and create a new key in the Keys tab. Make sure the key type is P12.

Google Speech To Text API Create private key

Now you need to convert this P12 key to PEM format so it can be used with our Delphi components. To do that, run this command in a folder containing the OpenSSL binaries, replacing the file names with your own:

openssl pkcs12 -in path.p12 -out newfile.key.pem -nocerts -nodes

Keep the generated PEM file; we will use it in our Speech to Text Delphi project.

What are the prerequisites for creating a Delphi Audio Transcription project?

Download the GrijjyFoundation repository.

https://github.com/grijjy/GrijjyFoundation

Download the “Nghttp2.pas” from this link and copy it to the GrijjyFoundation folder.

https://github.com/grijjy/DelphiRemotePushSender/blob/master/Nghttp2.pas

Download the “Google.API.pas” from this link and include it in your project folder.

https://github.com/grijjy/DelphiGoogleAPI/blob/master/Google.API.pas

How to set up a new Delphi project to demonstrate Audio Transcription?

Create a new Delphi project and add components like this. Alternatively, you can download the demo project from this link.

Demo Project

Google Speech To Text API Delphi project for audio transcription

GrijjyFoundation has many units to support many Google Cloud services, but in this project we only need a few of them. Include only these units in your Delphi project.

Google Speech To Text API Included units for Delphi project

How to code the Delphi Audio Transcription project?

Now let’s move on to the coding part. We use an instance of the “TgoGoogle” class to post data to the Speech to Text API and get the response from it. We need to set some properties of “TgoGoogle”.

TgoGoogle.OAuthScope – The OAuth scope of the API we are going to use. This is the list of scopes for the different APIs.

https://developers.google.com/identity/protocols/googlescopes

We use “https://www.googleapis.com/auth/cloud-platform” in our application to access the Cloud Speech-to-Text API.

TgoGoogle.ServiceAccount – This is the ID of the service account we created on Google Cloud earlier.

TgoGoogle.PrivateKey – This is the private key we created earlier. It’s a file with PEM extension. In our demo application, we let the user browse the file at runtime.

We can get the PEM file path using this simple code.

procedure TFormMain.btnPEMBrowserClick(Sender: TObject);
begin
   OpenDialog1.DefaultExt := 'pem';
   if OpenDialog1.Execute then
      EditPEM.Text := OpenDialog1.FileName;
end;

How to post the request to Google Speech to Text API and get the response?

The last thing we have to do is post the request and get the response. The data is sent over an encrypted connection, so make sure you have the SSL DLLs in your EXE path. The URL we post the data to is:

https://speech.googleapis.com/v1beta1/speech:syncrecognize

The response will include the text version of our audio file. The Google Speech API supports many audio formats, but we use FLAC, which is a lossless format. Here is the list of supported formats.

LINEAR16 – Uncompressed 16-bit signed little-endian samples.
FLAC – This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression.
MULAW – 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
AMR – Adaptive Multi-Rate Narrowband codec. sampleRate must be 8000 Hz.
AMR_WB – Adaptive Multi-Rate Wideband codec. sampleRate must be 16000 Hz.
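
The article does not show the request body that gets posted, so here is a minimal sketch of how it could be built. It assumes the v1beta1 syncrecognize request shape (a "config" object with "encoding", "sampleRate" and "languageCode", plus an "audio" object holding Base64-encoded "content"). The function name BuildRecognizeRequest, the 16000 Hz sample rate and the 'en-US' language code are illustrative choices, not part of the demo project.

```pascal
program BuildSpeechRequest;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils, System.NetEncoding, System.JSON;

// Hypothetical helper: wraps raw FLAC bytes in a JSON request body.
// The field names follow the assumed v1beta1 syncrecognize format.
function BuildRecognizeRequest(const AudioBytes: TBytes;
  SampleRate: Integer; const LanguageCode: string): string;
var
  Encoder: TBase64Encoding;
  Root, Config, Audio: TJSONObject;
begin
  // CharsPerLine = 0 turns off the line breaks that the default
  // TNetEncoding.Base64 encoder inserts into long output.
  Encoder := TBase64Encoding.Create(0);
  try
    Root := TJSONObject.Create;
    try
      // AddPair hands ownership of the child objects to Root,
      // so freeing Root releases the whole tree.
      Config := TJSONObject.Create;
      Root.AddPair('config', Config);
      Config.AddPair('encoding', 'FLAC');
      Config.AddPair('sampleRate', TJSONNumber.Create(SampleRate));
      Config.AddPair('languageCode', LanguageCode);

      Audio := TJSONObject.Create;
      Root.AddPair('audio', Audio);
      Audio.AddPair('content', Encoder.EncodeBytesToString(AudioBytes));

      Result := Root.ToJSON;
    finally
      Root.Free;
    end;
  finally
    Encoder.Free;
  end;
end;

begin
  // Usage: BuildSpeechRequest.exe path\to\audio.flac
  Writeln(BuildRecognizeRequest(TFile.ReadAllBytes(ParamStr(1)), 16000, 'en-US'));
end.
```

The resulting string is what you would paste into (or assign to) the request memo before posting. The sample rate passed here should match the rate the FLAC file was actually recorded at.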

So finally we post the request and get our audio transcription. Here is the code to do that.

procedure TFormMain.ButtonPostClick(Sender: TObject);
var
  Google: TgoGoogle;
  ResponseHeaders, ResponseContent: String;
begin
  Google := TgoGoogle.Create;
  try
    // Scope and service account come from the edit boxes on the form
    Google.OAuthScope := EditOAuthScope.Text;
    Google.ServiceAccount := EditServiceAccount.Text;
    // TFile is declared in System.IOUtils; the PEM file content is the private key
    Google.PrivateKey := TFile.ReadAllText(EditPEM.Text);

    // Post the request; an HTTP status code of 200 means success
    if Google.Post(EditUrl.Text, MemoRequestContent.Text,
      ResponseHeaders, ResponseContent, 60000) = 200 then
    begin
      MemoResponseHeaders.Text := ResponseHeaders;
      MemoResponseContent.Text := ResponseContent;
    end
    else
    begin
      MemoResponseHeaders.Text := 'ERROR: ' + ResponseHeaders;
      MemoResponseContent.Text := 'ERROR: ' + ResponseContent;
    end;
  finally
    Google.Free;
  end;
end;
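
The response content is JSON, so you may want to pull out just the transcribed text instead of showing the raw response. The sketch below assumes the response has the shape {"results":[{"alternatives":[{"transcript":"..."}]}]} and takes the first alternative of each result. The function name ExtractTranscript and the sample response string are made up for illustration.

```pascal
program ParseSpeechResponse;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: joins the first alternative's transcript of every
// result. Returns an empty string if the JSON does not have that shape.
function ExtractTranscript(const ResponseContent: string): string;
var
  Root, Item: TJSONValue;
  Results: TJSONArray;
  Part: string;
begin
  Result := '';
  Root := TJSONObject.ParseJSONValue(ResponseContent);
  if Root = nil then
    Exit; // not valid JSON
  try
    if not Root.TryGetValue<TJSONArray>('results', Results) then
      Exit; // no results, for example an error response
    for Item in Results do
      if Item.TryGetValue<string>('alternatives[0].transcript', Part) then
        Result := Result + Part;
  finally
    Root.Free;
  end;
end;

const
  // Made-up sample in the assumed response format
  SampleResponse =
    '{"results":[{"alternatives":[{"transcript":"hello world","confidence":0.9}]}]}';

begin
  Writeln(ExtractTranscript(SampleResponse));
end.
```

In the demo form, the same helper could be called with ResponseContent in the success branch to fill a memo with plain text only.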

Well done, now you can have some fun with your little Speech to Text application!
