What can the Google Audio Transcription API do for our applications?
Google’s Audio Transcription API allows your program to use all the power of Google’s huge computing resources to accurately transcribe speech found in audio files into text.
As Google’s documentation says:
- Transcribe your content in real time or from stored files
- Deliver a better user experience in products through voice commands
- Gain insights from customer interactions to improve your service
How can we use the Audio Transcription API with Delphi?
Google Cloud services have become a popular choice for cloud computing. They give us ready-made AI services that we can call from our own applications. Prices are reasonable too, and you can start without paying anything up front by adding a credit card to your account.
But what about bringing those functions into our Delphi application? Some people think Delphi is not the ideal language for working with these popular cloud computing APIs. That’s not true. With the help of the large Delphi community, it’s simpler than you think.
Google Speech-to-Text and Text-to-Speech are two of those cloud services that could be vital for some business applications. With the help of a few repositories by grijjy on GitHub, we can easily bring that functionality into our Delphi application. The repository dates from 2017, but it works with newer versions of Delphi without many changes to the code.
How to set up Google Cloud services to work with Delphi?
To use Google Cloud services in our Delphi project, we need credentials that allow the project to access our Google Cloud API account. To do that, please visit this link and create an account.
Then create a new project.
Then go to the APIs & Services dashboard and create new credentials. Make sure the credential type is a service account. Open the new service account and create a new key in the Keys tab. Make sure the key type is P12.
Now you need to convert this P12 key to PEM format so that our Delphi components can use it. To do that, run this command in a folder that contains the OpenSSL binaries. Make sure to edit the file names:
    openssl pkcs12 -in path.p12 -out newfile.key.pem -nocerts -nodes
Keep the generated PEM file to use with our Speech-to-Text Delphi project.
What are the prerequisites for creating a Delphi Audio Transcription project?
Download the GrijjyFoundation repository.
Download “Nghttp2.pas” from this link and copy it to the GrijjyFoundation folder.
Download “Google.API.pas” from this link and include it in your project folder.
How to set up a new Delphi project to demonstrate Audio Transcription?
Create a new Delphi project and add components like this. Alternatively, you can download the demo project from this link.
GrijjyFoundation has many units that support many Google Cloud services, but in this project we only need a few of them. Please include only these units in your Delphi project.
How to code the Delphi Audio Transcription project?
Now let’s move to the coding part. We use an instance of the “TgoGoogle” class to post data and get the response from the Speech-to-Text API. We need to set some properties of “TgoGoogle”.
TgoGoogle.OAuthScope – The OAuth scope of the API we are going to use. This is the list of scopes for different APIs.
We use “https://www.googleapis.com/auth/cloud-platform” in our application to use the Cloud Speech-to-Text API.
TgoGoogle.ServiceAccount – This is the ID of the service account we created on Google Cloud earlier.
TgoGoogle.PrivateKey – This is the private key we created earlier. It’s a file with the PEM extension. In our demo application, we let the user browse for the file at runtime.
We can get the PEM file path using this simple code.
    procedure TFormMain.btnPEMBrowserClick(Sender: TObject);
    begin
      // Let the user pick the PEM private key file at runtime
      OpenDialog1.DefaultExt := 'pem';
      if OpenDialog1.Execute then
        EditPEM.Text := OpenDialog1.FileName;
    end;
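Since the user can pick any file, it may help to check that the selection looks like a PEM private key before handing it to “TgoGoogle”. The following is only a minimal sketch: the function name LooksLikePemPrivateKey is hypothetical and not part of the grijjy units, and it only looks for the PEM header and footer markers, not for a valid key.

```delphi
program PemCheckDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper (not part of GrijjyFoundation).
// Returns True when the text contains the begin and end markers
// of a PEM private key block. It does not validate the key itself.
function LooksLikePemPrivateKey(const Text: string): Boolean;
begin
  Result := (Pos('-----BEGIN ', Text) > 0) and
            (Pos('PRIVATE KEY-----', Text) > 0) and
            (Pos('-----END ', Text) > 0);
end;

begin
  // Made-up sample inputs, used only to exercise the helper
  Writeln(LooksLikePemPrivateKey(
    '-----BEGIN PRIVATE KEY-----'#10'base64-data'#10'-----END PRIVATE KEY-----'));
  Writeln(LooksLikePemPrivateKey('not a key'));
end.
```

In the demo form, the same check could be applied to the result of TFile.ReadAllText(EditPEM.Text) before the request is posted.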
How to post the request to Google Speech to Text API and get the response?
The last thing we have to do is post the request and get the response. The data is sent over an encrypted connection, so make sure the SSL DLLs are in the same folder as your EXE. The URL we are going to post the data to is:
The response will include the text version of our audio file. The Google Speech API supports many audio formats, but we use FLAC, which is a lossless format. Here is the list of supported formats.
- LINEAR16 – Uncompressed 16-bit signed.
- FLAC – This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression.
- MULAW – 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law.
- AMR – Adaptive Multi-Rate Narrowband codec. sampleRate must be 8000 Hz.
- AMR_WB – Adaptive Multi-Rate Wideband codec. sampleRate must be 16000 Hz.
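The demo takes its request body from MemoRequestContent, so the JSON has to be prepared before posting. Here is a minimal sketch of how such a body could be built from raw audio bytes. Several things in it are assumptions: the function name BuildRecognizeRequest is hypothetical, the field names follow the v1 REST request shape (the older API version that speech.syncrecognize belongs to used sampleRate instead of sampleRateHertz), and the sample rate and language code are illustrative values. Check the field names against the API version behind your URL.

```delphi
program RecognizeRequestDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON, System.NetEncoding;

// Hypothetical helper: builds a JSON request body with a "config" object
// and the audio embedded as Base64 text in "audio.content".
function BuildRecognizeRequest(const Audio: TBytes; const AudioEncoding: string;
  SampleRateHertz: Integer; const LanguageCode: string): string;
var
  Root, Config, AudioObj: TJSONObject;
  Base64: TBase64Encoding;
begin
  // CharsPerLine = 0 turns off the line breaks that the default
  // TNetEncoding.Base64 encoder inserts into long output
  Base64 := TBase64Encoding.Create(0);
  try
    Root := TJSONObject.Create;
    try
      Config := TJSONObject.Create;
      Root.AddPair('config', Config); // Root now owns Config
      Config.AddPair('encoding', AudioEncoding);
      // Assumed v1 field name; older versions used 'sampleRate'
      Config.AddPair('sampleRateHertz', TJSONNumber.Create(SampleRateHertz));
      Config.AddPair('languageCode', LanguageCode);

      AudioObj := TJSONObject.Create;
      Root.AddPair('audio', AudioObj); // Root now owns AudioObj
      AudioObj.AddPair('content', Base64.EncodeBytesToString(Audio));

      Result := Root.ToJSON;
    finally
      Root.Free; // frees the child objects as well
    end;
  finally
    Base64.Free;
  end;
end;

begin
  // Three dummy bytes stand in for the contents of a real FLAC file
  Writeln(BuildRecognizeRequest(TBytes.Create(1, 2, 3), 'FLAC', 16000, 'en-US'));
end.
```

In the demo form, the audio bytes could come from TFile.ReadAllBytes and the result could be assigned to MemoRequestContent.Text before the request is posted.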
So finally we post the request and get our audio transcription. Here is the code to do that.
    procedure TFormMain.ButtonPostClick(Sender: TObject);
    var
      Google: TgoGoogle;
      ResponseHeaders, ResponseContent: String;
    begin
      Google := TgoGoogle.Create;
      try
        Google.OAuthScope := EditOAuthScope.Text;
        Google.ServiceAccount := EditServiceAccount.Text;
        Google.PrivateKey := TFile.ReadAllText(EditPEM.Text);
        if Google.Post(EditUrl.Text, MemoRequestContent.Text,
          ResponseHeaders, ResponseContent, 60000) = 200 then
        begin
          MemoResponseHeaders.Text := ResponseHeaders;
          MemoResponseContent.Text := ResponseContent;
        end else begin
          MemoResponseHeaders.Text := 'ERROR: ' + ResponseHeaders;
          MemoResponseContent.Text := 'ERROR: ' + ResponseContent;
        end;
      finally
        Google.Free;
      end;
    end;
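The response content is JSON, so the transcript still has to be pulled out of it. The sketch below shows one way to do that with the standard System.JSON unit. It assumes the response has the shape results[].alternatives[].transcript; the function name ExtractTranscript and the sample response string are made up for illustration, so compare them with what your own request returns.

```delphi
program TranscriptDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.JSON;

// Hypothetical helper: joins the first alternative of every result.
// Returns '' when the text is not JSON or has no transcript.
function ExtractTranscript(const ResponseContent: string): string;
var
  Root, ResultsValue, Item, AltsValue, First, Transcript: TJSONValue;
begin
  Result := '';
  Root := TJSONObject.ParseJSONValue(ResponseContent);
  if Root = nil then
    Exit; // not valid JSON
  try
    if not (Root is TJSONObject) then
      Exit;
    ResultsValue := TJSONObject(Root).GetValue('results');
    if not (ResultsValue is TJSONArray) then
      Exit; // no results, for example silence or an error body
    for Item in TJSONArray(ResultsValue) do
    begin
      if not (Item is TJSONObject) then
        Continue;
      AltsValue := TJSONObject(Item).GetValue('alternatives');
      if (AltsValue is TJSONArray) and (TJSONArray(AltsValue).Count > 0) then
      begin
        // Take the first alternative of this result
        First := TJSONArray(AltsValue).Items[0];
        if First is TJSONObject then
        begin
          Transcript := TJSONObject(First).GetValue('transcript');
          if Transcript <> nil then
            Result := Result + Transcript.Value;
        end;
      end;
    end;
  finally
    Root.Free; // frees the whole parsed tree
  end;
end;

const
  // Made-up response used only to exercise the helper
  Sample = '{"results":[{"alternatives":[{"transcript":"hello world","confidence":0.98}]}]}';
begin
  Writeln(ExtractTranscript(Sample));
end.
```

In the demo form, ExtractTranscript(ResponseContent) could be shown in a separate memo when the post returns 200, while the raw JSON stays visible for debugging.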
Well done, now you can have some fun with your little Speech to text application!