Do you want to use OpenAI to make your cross-platform apps generate realistic speech from text? Read all about how to do this in this article from Tech Partner Softacom.
Overview Of How To Use OpenAI To Add Realistic Text-To-Speech To Your Apps
Modern artificial intelligence-based services enable speech generation and speech-to-text conversion. Moreover, they support a wide range of languages. We can easily input text into the service and receive synthesized speech output. Thanks to the available settings, we can also choose the type of voice for the generated speech.
Additionally, it’s possible to convert speech into text. For example, we can transcribe songs from our favorite artists’ MP3 tracks into text.
In this article, we will look at the capabilities of the OpenAI API for generating speech from a textual description and, conversely, generating text from speech, in our Embarcadero Delphi FMX application.
To use the features of the OpenAI API for speech generation and text transcription, we need to register and obtain a secret key. In the article dedicated to text generation, we demonstrated the process of registration and obtaining a secret key (API key).
To generate speech based on user requests, we will utilize the OpenAI API (the “Create speech” tab).
The OpenAI API offers extensive functionality for speech generation. Here we can configure the voice type, output media file format (such as mp3, wav, etc.), speech speed in the generated media file, textual description for the generated speech, and the machine learning model that will be used (tts-1 or tts-1-hd).
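For reference, the JSON body such a request carries is small. A minimal example is sketched below; the `response_format` and `speed` fields are optional, and the values shown are purely illustrative:

```json
{
  "model": "tts-1",
  "input": "Hello from Delphi!",
  "voice": "onyx",
  "response_format": "mp3",
  "speed": 1.0
}
```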
To extract text from speech, we will also use the OpenAI API (the “Create transcription” tab).
The OpenAI API also has rich capabilities for generating text based on speech.
Here, we can configure the type of input media file (mp3, wav, etc.) and the format of the response from OpenAI (json, text, srt, verbose_json, or vtt).
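The shape of the reply depends on the chosen response format. With `response_format` set to `text`, OpenAI returns the bare transcription string; with `json`, it returns a small object along these lines (the `text` value is illustrative):

```json
{
  "text": "Hello from Delphi!"
}
```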
To make the OpenAI API easier to use for speech generation and transcription, we will extend the TChatGPT class developed in the earlier published article focused on text generation.
Let’s add two methods to our class: GetGeneratedSpeechAsStream and GetGeneratedTextFromSpeech. We will also add an overloaded Create constructor to support extracting text from speech in a media file.
This overloaded version of the TChatGPT constructor accepts the following input parameters: HttpBasicAuthenticator (the THTTPBasicAuthenticator class), RESTClient (the TRESTClient class), RESTRequest (the TRESTRequest class), and a string constant OpenAIApiKey with our secret key.
The GetGeneratedSpeechAsStream method allows us to obtain a TMemoryStream object containing speech generated from a textual description via the OpenAI API.
Its input parameters are the string constants Input and Voice, representing our textual description and the type of generated voice (alloy, echo, fable, onyx, nova, or shimmer).
Details about the machine learning model that is used (in our case, tts-1), as well as the textual description for speech generation (input) and voice type (voice), are contained in JObj, an object of the TJSONObject class.
The string variable Request stores the data from the JObj object as a string. Further, the content of Request in string format will be passed into a StringStream object (the TStringStream class).
Next, the URL of the OpenAI API speech endpoint is passed to the Post method of the FNetHttpClient object, together with StringStream as the request body carrying the model name, the textual description for speech generation, and the voice type.
Similar to text and image generation projects, we also need to include headers (Authorization and Content-Type). Upon executing the FNetHttpClient.Post method, we will obtain the generated speech from OpenAI in the form of a TMemoryStream.
The GetGeneratedTextFromSpeech method will allow us to convert speech into text. The method takes a string constant InputAudioFilePath as input, which contains the path to the media file. The BaseURL property of the FRESTClient object contains the URL of the OpenAI API for generating text based on speech from our media file. The FRESTRequest object contains information about the type of response from OpenAI (text, json, srt, verbose_json, or vtt), the machine learning model used (in our case, whisper-1), and the path to our media file with recorded speech.
Authentication will be performed using the FHTTPBasicAuthenticator object (the THTTPBasicAuthenticator class). We need to assign our secret key to its Password property (FHTTPBasicAuthenticator.Password := FOpenAIApiKey).
The FRESTRequest.Execute method will perform a POST request, passing the media file so that OpenAI can extract the text from it. As a result, we will receive a string with the converted speech text (Result := FRESTRequest.Response.Content).
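Putting the pieces together, what the REST components send is, in effect, a multipart/form-data POST along the following lines. This is only a sketch: the boundary, filename, and exact header layout are illustrative, and the Authorization header is produced by the configured authenticator:

```http
POST /v1/audio/transcriptions HTTP/1.1
Host: api.openai.com
Authorization: <built by THTTPBasicAuthenticator from the secret key>
Content-Type: multipart/form-data; boundary=----Boundary123

------Boundary123
Content-Disposition: form-data; name="model"

whisper-1
------Boundary123
Content-Disposition: form-data; name="response_format"

text
------Boundary123
Content-Disposition: form-data; name="file"; filename="GeneratedVoice.mp3"
Content-Type: application/octet-stream

<binary MP3 data>
------Boundary123--
```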
The full source code of the TChatGPT class is presented below.
```delphi
unit ChatGPTHelper;

interface

uses
  System.SysUtils, System.Types, System.UITypes, System.Classes, System.Variants,
  FMX.Types, FMX.Controls, FMX.Forms, FMX.Graphics, FMX.Dialogs, FMX.Memo.Types,
  FMX.ScrollBox, FMX.Memo, FMX.StdCtrls, FMX.Controls.Presentation,
  System.Net.URLClient, System.Net.HttpClient, System.Net.HttpClientComponent,
  JSON, System.Threading, System.Net.Mime, System.Generics.Collections,
  REST.Client, REST.Types, REST.Authenticator.Basic;

type
  IChatGPTHelper = interface
    function SendTextToChatGPT(const Text: string): string;
    function GetJSONWithImage(const Prompt: string; ResponseFormat: Integer): string;
    function GetImageURLFromJSON(const JsonResponse: string): string;
    function GetImageAsStream(const ImageURL: string): TMemoryStream;
    function GetImageBASE64FromJSON(const JsonResponse: string): string;
    function GetGeneratedSpeechAsStream(const Input: string; const Voice: string): TMemoryStream;
    function GetGeneratedTextFromSpeech(const InputAudioFilePath: string): string;
  end;

  TChatGPT = class(TInterfacedObject, IChatGPTHelper)
  private
    FNetHttpClient: TNetHTTPClient;
    FHttpBasicAuthenticator: THTTPBasicAuthenticator;
    FRestRequest: TRESTRequest;
    FRestClient: TRESTClient;
    FOpenAIApiKey: string;
    FText: string;
    function FormatJSON(const JSON: string): string;
    function SendTextToChatGPT(const Text: string): string;
    function GetJSONWithImage(const Prompt: string; ResponseFormat: Integer): string;
    function GetImageURLFromJSON(const JsonResponse: string): string;
    function GetImageAsStream(const ImageURL: string): TMemoryStream;
    function GetImageBASE64FromJSON(const JsonResponse: string): string;
    function GetGeneratedSpeechAsStream(const Input: string; const Voice: string): TMemoryStream;
    function GetGeneratedTextFromSpeech(const InputAudioFilePath: string): string;
  public
    constructor Create(const NetHttpClient: TNetHTTPClient;
      const OpenAIApiKey: string); overload;
    constructor Create(const HttpBasicAuthentificator: THTTPBasicAuthenticator;
      const RESTClient: TRESTClient; const RESTRequest: TRESTRequest;
      const OpenAIApiKey: string); overload;
    class function MessageContentFromChatGPT(const JsonAnswer: string): string;
  end;

implementation

{ TChatGPT }

constructor TChatGPT.Create(const NetHttpClient: TNetHTTPClient;
  const OpenAIApiKey: string);
begin
  FNetHttpClient := NetHttpClient;
  if OpenAIApiKey <> '' then
    FOpenAIApiKey := OpenAIApiKey
  else
  begin
    ShowMessage('OpenAI API key is empty!');
    Exit;
  end;
end;

constructor TChatGPT.Create(const HttpBasicAuthentificator: THTTPBasicAuthenticator;
  const RESTClient: TRESTClient; const RESTRequest: TRESTRequest;
  const OpenAIApiKey: string);
begin
  FHttpBasicAuthenticator := HttpBasicAuthentificator;
  FRestRequest := RESTRequest;
  FRestClient := RESTClient;
  if OpenAIApiKey <> '' then
    FOpenAIApiKey := OpenAIApiKey
  else
  begin
    ShowMessage('OpenAI API key is empty!');
    Exit;
  end;
end;

function TChatGPT.FormatJSON(const JSON: string): string;
var
  JsonObject: TJsonObject;
begin
  JsonObject := TJsonObject.ParseJSONValue(JSON) as TJsonObject;
  try
    if Assigned(JsonObject) then
      Result := JsonObject.Format()
    else
      Result := JSON;
  finally
    JsonObject.Free;
  end;
end;

function TChatGPT.GetGeneratedSpeechAsStream(const Input, Voice: string): TMemoryStream;
var
  JObj: TJsonObject;
  Request: string;
  MyHeaders: TArray<TNameValuePair>;
  StringStream: TStringStream;
begin
  JObj := nil;
  StringStream := nil;
  try
    Result := TMemoryStream.Create;
    SetLength(MyHeaders, 2);
    // FOpenAIApiKey is expected to arrive here already prefixed with 'Bearer '
    MyHeaders[0] := TNameValuePair.Create('Authorization', FOpenAIApiKey);
    MyHeaders[1] := TNameValuePair.Create('Content-Type', 'application/json');
    JObj := TJSONObject.Create;
    JObj.AddPair('model', 'tts-1');
    JObj.AddPair('input', Input);
    JObj.AddPair('voice', Voice);
    Request := JObj.ToString;
    StringStream := TStringStream.Create(Request, TEncoding.UTF8);
    FNetHttpClient.Post('https://api.openai.com/v1/audio/speech', StringStream,
      Result, MyHeaders);
  finally
    JObj.Free;
    StringStream.Free;
  end;
end;

function TChatGPT.GetGeneratedTextFromSpeech(const InputAudioFilePath: string): string;
begin
  FRestClient.Authenticator := FHttpBasicAuthenticator;
  FRestRequest.Method := TRESTRequestMethod.rmPOST;
  FHttpBasicAuthenticator.Password := FOpenAIApiKey;
  FRestClient.BaseURL := 'https://api.openai.com/v1/audio/transcriptions';
  FRestRequest.AddParameter('response_format', 'text',
    TRESTRequestParameterKind.pkREQUESTBODY);
  FRestRequest.AddParameter('model', 'whisper-1',
    TRESTRequestParameterKind.pkREQUESTBODY);
  FRestRequest.AddFile('file', InputAudioFilePath,
    TRESTContentType.ctAPPLICATION_OCTET_STREAM);
  FRestRequest.Client := FRestClient;
  FRestRequest.Execute;
  Result := FRestRequest.Response.Content;
end;

function TChatGPT.GetImageAsStream(const ImageURL: string): TMemoryStream;
begin
  Result := TMemoryStream.Create;
  FNetHttpClient.Get(ImageURL, Result);
end;

function TChatGPT.GetImageURLFromJSON(const JsonResponse: string): string;
var
  Json: TJsonObject;
  DataArr: TJsonArray;
begin
  Json := TJsonObject.ParseJSONValue(JsonResponse) as TJsonObject;
  try
    if Assigned(Json) then
    begin
      DataArr := TJsonArray(Json.Get('data').JsonValue);
      Result := TJSONPair(TJSONObject(DataArr.Items[0]).Get('url')).JsonValue.Value;
    end
    else
      Result := '';
  finally
    Json.Free;
  end;
end;

function TChatGPT.GetImageBASE64FromJSON(const JsonResponse: string): string;
var
  Json: TJsonObject;
  DataArr: TJsonArray;
begin
  Json := TJsonObject.ParseJSONValue(JsonResponse) as TJsonObject;
  try
    if Assigned(Json) then
    begin
      DataArr := TJsonArray(Json.Get('data').JsonValue);
      Result := TJSONPair(TJSONObject(DataArr.Items[0]).Get('b64_json')).JsonValue.Value;
    end
    else
      Result := '';
  finally
    Json.Free;
  end;
end;

function TChatGPT.GetJSONWithImage(const Prompt: string; ResponseFormat: Integer): string;
var
  JObj: TJsonObject;
  Request: string;
  ResponseContent, StringStream: TStringStream;
  MyHeaders: TArray<TNameValuePair>;
begin
  JObj := nil;
  ResponseContent := nil;
  StringStream := nil;
  try
    SetLength(MyHeaders, 2);
    MyHeaders[0] := TNameValuePair.Create('Authorization', FOpenAIApiKey);
    MyHeaders[1] := TNameValuePair.Create('Content-Type', 'application/json');
    JObj := TJSONObject.Create;
    with JObj do
    begin
      Owned := False;
      AddPair('model', 'dall-e-2');
      if ResponseFormat = 1 then
        AddPair('response_format', 'b64_json')
      else
        AddPair('response_format', 'url');
      AddPair('prompt', Prompt);
      AddPair('n', TJSONNumber.Create(1));
      AddPair('size', '1024x1024');
    end;
    Request := JObj.ToString;
    StringStream := TStringStream.Create(Request, TEncoding.UTF8);
    ResponseContent := TStringStream.Create;
    FNetHttpClient.Post('https://api.openai.com/v1/images/generations',
      StringStream, ResponseContent, MyHeaders);
    Result := ResponseContent.DataString;
  finally
    JObj.Free;
    ResponseContent.Free;
    StringStream.Free;
  end;
end;

class function TChatGPT.MessageContentFromChatGPT(const JsonAnswer: string): string;
var
  Mes: TJsonArray;
  JsonResp: TJsonObject;
begin
  JsonResp := nil;
  try
    JsonResp := TJsonObject.ParseJSONValue(JsonAnswer) as TJsonObject;
    if Assigned(JsonResp) then
    begin
      Mes := TJsonArray(JsonResp.Get('choices').JsonValue);
      Result := TJsonObject(TJsonObject(Mes.Get(0)).Get('message').JsonValue)
        .GetValue('content').Value;
    end
    else
      Result := '';
  finally
    JsonResp.Free;
  end;
end;

function TChatGPT.SendTextToChatGPT(const Text: string): string;
var
  JArr: TJsonArray;
  JObj, JObjOut: TJsonObject;
  Request: string;
  ResponseContent, StringStream: TStringStream;
  Headers: TArray<TNameValuePair>;
  I: Integer;
begin
  JArr := nil;
  JObj := nil;
  JObjOut := nil;
  ResponseContent := nil;
  StringStream := nil;
  try
    SetLength(Headers, 2);
    Headers[0] := TNameValuePair.Create('Authorization', FOpenAIApiKey);
    Headers[1] := TNameValuePair.Create('Content-Type', 'application/json');
    JObj := TJsonObject.Create;
    JObj.Owned := False;
    JObj.AddPair('role', 'user');
    JArr := TJsonArray.Create;
    JArr.AddElement(JObj);
    Self.FText := Text;
    JObj.AddPair('content', FText);
    JObjOut := TJsonObject.Create;
    JObjOut.AddPair('model', 'gpt-3.5-turbo');
    // The messages array is added as a string here and patched up below
    JObjOut.AddPair('messages', Trim(JArr.ToString));
    JObjOut.AddPair('temperature', TJSONNumber.Create(0.7));
    // Strip the escape backslashes so the embedded messages array becomes raw JSON
    Request := JObjOut.ToString.Replace('\', '');
    // Delphi strings are 1-based: blank out the quotes wrapping the messages array
    for I := 1 to Length(Request) - 1 do
    begin
      if ((Request[I] = '"') and (Request[I + 1] = '[')) or
         ((I > 1) and (Request[I] = '"') and (Request[I - 1] = ']')) then
        Request[I] := ' ';
    end;
    ResponseContent := TStringStream.Create;
    StringStream := TStringStream.Create(Request, TEncoding.UTF8);
    FNetHttpClient.Post('https://api.openai.com/v1/chat/completions',
      StringStream, ResponseContent, Headers);
    Result := FormatJSON(ResponseContent.DataString);
  finally
    StringStream.Free;
    ResponseContent.Free;
    JObjOut.Free;
    JArr.Free;
    JObj.Free;
  end;
end;

end.
```
Implementing speech generation from a textual description, and text extraction from speech in a media file, in our Embarcadero Delphi FMX application
In our Delphi FMX application, we will use the TNetHttpClient component to work with the OpenAI API, specifically for sending POST requests to OpenAI.
To play the speech generated by OpenAI and saved in a media file (in MP3 format) in our Embarcadero Delphi FMX application, we will use the TMediaPlayer component.
To make a request to OpenAI with the transfer of a saved media file for extracting text from speech within it, we will use three components: TRESTClient, TRESTRequest, and THTTPBasicAuthenticator.
No additional setup is required for these components. TRESTClient and TRESTRequest are used to make POST requests and retrieve data from OpenAI with the extracted speech text from our media file. THTTPBasicAuthenticator is used for authentication using the secret key.
To input textual descriptions for speech generation, we will use the TMemo component.
We will also use the TMemo component to display the extracted text from the speech in the media file.
In the OnCreate event handler of the main form, we need to assign the path to the media file where the speech generated by OpenAI will be saved to the FAudioFilePath field. We will also assign the value of the secret key to the FOpenAIApiKey field.
The functionality of speech generation from a textual description, with saving to a media file and playback in our Embarcadero Delphi FMX application, is implemented in the OnClick handler of the “Send Request For Speech Generation” button. In this handler, we declare a GPTHelper object (of the IChatGPTHelper type) to pass the textual description to OpenAI for speech generation, and an ImageStream object (of the TMemoryStream class) to store the generated speech as a TMemoryStream.
Next, we call the constructor of the TChatGPT class, passing NetHttpClient1 and our secret key prefixed for the Authorization header ('Bearer ' + FOpenAIApiKey). Then we invoke the GetGeneratedSpeechAsStream method, providing the textual description of the generated speech (Memo2.Text) and the voice type (the string ‘onyx’ in our example). To avoid blocking the application interface while the request executes, we wrap the call in TTask.Run. The result of GetGeneratedSpeechAsStream, namely the generated speech, is saved into ImageStream.
In the main application thread, using TThread.Synchronize, we save the speech to an MP3 media file with the ImageStream.SaveToFile method. Before saving, we check whether a file already exists at the specified path using the FileExists function and, if so, delete it with the DeleteFile function. After saving, we play the media file in our Embarcadero Delphi FMX application using TMediaPlayer (MediaPlayer1.Play), first assigning the path to our media file to MediaPlayer1.FileName.
The code for the “Send Request For Speech Generation” button handler is provided below.
```delphi
procedure TForm1.Button1Click(Sender: TObject);
var
  GPTHelper: IChatGPTHelper;
  ImageStream: TMemoryStream;
begin
  TTask.Run(
    procedure
    begin
      GPTHelper := TChatGPT.Create(NetHTTPClient1, 'Bearer ' + FOpenAIApiKey);
      ImageStream := GPTHelper.GetGeneratedSpeechAsStream(Memo2.Text, 'onyx');
      try
        TThread.Synchronize(nil,
          procedure
          begin
            // Overwrite any previously generated audio file
            if FileExists(FAudioFilePath) then
              DeleteFile(FAudioFilePath);
            ImageStream.SaveToFile(FAudioFilePath);
            MediaPlayer1.FileName := FAudioFilePath;
            MediaPlayer1.Play;
            ShowMessage('All is done!!!');
          end);
      finally
        ImageStream.Free;
      end;
    end);
end;
```
Now let’s extract text from the speech in our saved media file. We will implement this functionality in the OnClick handler of the “Speech From Audio File To Text” button. In the handler, we declare a GPTHelper object (of the IChatGPTHelper type) to pass our media file to OpenAI for text extraction. We also declare a string variable Text, where we will store the text extracted from the media file.
Next, we should call the second variant of the constructor with four input parameters (HTTPBasicAuthenticator1, RESTClient1, RESTRequest1, FOpenAIApiKey). Then, we will invoke the GetGeneratedTextFromSpeech method, passing the path to our media file. This method will return the extracted text from the speech in the media file. Finally, we will display the received text using TMemo (Memo1.Text).
The code for the “Speech From Audio File To Text” button handler is provided below.
```delphi
procedure TForm1.Button4Click(Sender: TObject);
var
  GPTHelper: IChatGPTHelper;
  Text: string;
begin
  TTask.Run(
    procedure
    begin
      GPTHelper := TChatGPT.Create(HTTPBasicAuthenticator1, RESTClient1,
        RESTRequest1, FOpenAIApiKey);
      Text := GPTHelper.GetGeneratedTextFromSpeech(FAudioFilePath);
      TThread.Synchronize(nil,
        procedure
        begin
          Memo1.Text := Text;
          ShowMessage('All is done!!!');
        end);
    end);
end;
```
You also need to add the following code to the FormCreate event handler – replace the placeholder string with your own OpenAI API key:
```delphi
procedure TForm1.FormCreate(Sender: TObject);
begin
  // TPath requires System.IOUtils in the uses clause
  FAudioFilePath := TPath.Combine(TPath.GetDocumentsPath, 'GeneratedVoice.mp3');
  FOpenAIApiKey := '**** YOUR OPEN API KEY GOES HERE ****';
end;
```
Let’s test our Embarcadero Delphi FMX application. First, based on a textual description, we will generate speech. The speech will be played using TMediaPlayer and saved to a media file with an mp3 extension.
In our Embarcadero Delphi FMX application, the media file will be saved in the “Documents” directory.
Now, using our application, let’s convert our speech saved in the media file back into text.
Where can I download the example code?
The code is in this repository: https://github.com/Embarcadero/OpenAI_Audio_Demo
Do you want to try some of these examples for yourself? Why not download a free trial of the latest version of RAD Studio with Delphi?
This article was written by Embarcadero Tech Partner Softacom. Softacom specialize in all sorts of software development focused on Delphi. Read more about their services on the Softacom website.
I’ve got a compile error as below:
[dcc64 Error] ChatGPTHelper.pas(121): E2003 Undeclared identifier: ‘LoadFromStream’
I am using Embarcadero® RAD Studio 12 Version 29.0.51961.7529
Any advice, please?
Hi there – I will ask the blog post author – Softacom – to reply with any advice they have.
OK the post has been updated to correct the problem. Softacom will also email you the code directly.
Thank you Ian and Softacom!
FYI, the updated part (the full source code of the TChatGPT class) is displayed on one line, so it is not easy to look through.
Hi BY – thanks for pointing that out. I’ve updated the post to use our regular syntax highlighted block type so the code will be a lot less problematic now.
Hi!
Where is the update? I copied the code and get the same error. I actually have the same problem with the post about Firebase (https://blogs.embarcadero.com/how-to-use-the-firebase-api-to-add-read-and-delete-data-in-a-realtime-document-oriented-database/) but I can’t find the solution there either.
Thanks in advance
Sorry Jordi, I have fixed it now after Softacom sent me a new version. I’ve updated the article and also published the source code (tested with RAD Studio 12) here: https://github.com/Embarcadero/OpenAI_Audio_Demo
Hi Ian,
Where is the solution? I copied all the code and have the same error, and in the post about Firebase I have the same error.
Any solution, please?
Thanks in advance
Yes, sorry about that, a few gremlins sneaked in. I’ve updated the article and also published the source code (tested with RAD Studio 12) here: https://github.com/Embarcadero/OpenAI_Audio_Demo