With Tiburon, I can use Unicode characters with VCL components like TMemo, TListBox, TComboBox (and others that contain string lists). How can I load the strings from a file and save the strings to a file? How do I need to modify existing Delphi and C++Builder programs to handle Unicode characters for these components? Here is the answer.
There is a new, optional parameter for the LoadFromFile and SaveToFile methods. The parameter is named "Encoding" and its type is the class "TEncoding". TEncoding (defined in the SysUtils unit) contains several class properties that you can use to specify the encoding of the strings you want to load and/or save: ASCII, BigEndianUnicode, Default, Unicode, UTF7, UTF8.
The following are the declarations of the LoadFromFile and SaveToFile methods of TStrings (defined in the Classes unit), which the components with string lists use:
Delphi:
procedure TStrings.LoadFromFile(const FileName: string);
procedure TStrings.LoadFromFile(const FileName: string; Encoding: TEncoding);
procedure TStrings.SaveToFile(const FileName: string);
procedure TStrings.SaveToFile(const FileName: string; Encoding: TEncoding);
C++Builder:
virtual void __fastcall LoadFromFile(const System::UnicodeString FileName)/* overload */;
virtual void __fastcall LoadFromFile(const System::UnicodeString FileName, Sysutils::TEncoding* Encoding)/* overload */;
virtual void __fastcall SaveToFile(const System::UnicodeString FileName)/* overload */;
virtual void __fastcall SaveToFile(const System::UnicodeString FileName, Sysutils::TEncoding* Encoding)/* overload */;
Looking at the Delphi implementation for SaveToFile shows the use of TStream and the encoding I provide:
procedure TStrings.SaveToFile(const FileName: string);
begin
  SaveToFile(FileName, nil);
end;

procedure TStrings.SaveToFile(const FileName: string; Encoding: TEncoding);
var
  Stream: TStream;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    SaveToStream(Stream, Encoding);
  finally
    Stream.Free;
  end;
end;
The following examples show how to load and save the strings with a ListBox VCL component on your form:
Delphi:
ListBox1.Items.LoadFromFile('c:\temp\MyListBoxItems.txt', TEncoding.UTF8);
ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt', TEncoding.UTF8);
C++Builder:
ListBox1->Items->LoadFromFile("c:\\temp\\MyListBoxItems.txt", TEncoding::UTF8);
ListBox1->Items->SaveToFile("c:\\temp\\MyListBoxItems.txt",TEncoding::UTF8);
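If you want to try the encoding parameter without a form, here is a minimal console sketch of the same save and load calls using a plain TStringList. The program name and the file name RoundTripItems.txt are my own placeholders, not part of the sample applications, and the sketch assumes the current directory is writable.

```delphi
program EncodingRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical file name, written to the current directory.
  FileName = 'RoundTripItems.txt';

var
  Saved, Loaded: TStringList;
begin
  Saved := TStringList.Create;
  try
    Loaded := TStringList.Create;
    try
      // Character codes keep this source file itself plain ASCII.
      Saved.Add('Hello');
      Saved.Add(#$041F#$0440#$0438#$0432#$0435#$0442); // Cyrillic greeting
      Saved.Add(#$3053#$3093#$306B#$3061#$306F);       // Japanese greeting

      // Write with an explicit encoding, then read back with the same one.
      Saved.SaveToFile(FileName, TEncoding.UTF8);
      Loaded.LoadFromFile(FileName, TEncoding.UTF8);

      if Loaded.Text = Saved.Text then
        Writeln('Round trip preserved all ', Loaded.Count, ' lines')
      else
        Writeln('Round trip changed the text');
    finally
      Loaded.Free;
    end;
  finally
    Saved.Free;
  end;
end.
```

Swapping TEncoding.UTF8 for TEncoding.ASCII in both calls is a quick way to see why the encoding matters: characters outside the ASCII range cannot survive that round trip.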
Here is a screen shot of my example Delphi application:
Here are links to the Delphi and C++Builder versions of the application: delphihelloworld_660.zip cpphelloworld_661.zip
With Tiburon, now my Delphi and C++ demo applications can handle Unicode characters in list boxes, edit boxes, and labels, and I can also save and load the Unicode strings to/from my hard drive.

{ 16 } Comments
And what exactly does a NIL encoding mean as applied to a Unicode string?
>TEncoding … BigEndianUnicode, …, Unicode
Couldn’t you just call the encodings what they are, i.e. UTF16BE and UTF16LE?
Is it possible to save a Unicode stringlist in a certain ANSI codepage? I.e.:
ListBox1.Items.SaveToFile('MyListBoxItems1251.txt', TEncoding.CP1251);
ListBox1.Items.SaveToFile('MyListBoxItems1250.txt', TEncoding.CP1250);
or maybe
ListBox1.Items.SaveToFile('MyListBoxItems.txt', TEncoding.ANSI, 1251);
// type of encoding + number of codepage
I strongly agree with Aleksander. "Unicode" is absolutely _not_ an encoding - it is an abstract concept representing a character set. Google for the phrase "Unicode is not an encoding" for numerous explanations of why this is fundamentally wrong.
Aleksander’s suggested names (UTF16BE and UTF16LE) would be correct. If the term Unicode were to be applied to any encoding (which it shouldn’t!), the most appropriate would be UTF-32, being the only one that can represent the full character set without variable length encodings.
Oh yes, and there is a bug in the attached sample
ListBox1.Items.SaveToFile('MyListBoxItems.txt',TEncoding.UTF8);
should read
ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt',TEncoding.UTF8);
well. I think no one needs this info, but well.
procedure TForm34.ListBox1Click(Sender: TObject);
begin
  Label1.Caption := ListBox1.Items.Strings[ListBox1.ItemIndex];
end;
should be
procedure TForm34.ListBox1Click(Sender: TObject);
begin
  if ListBox1.ItemIndex > -1 then
    Label1.Caption := ListBox1.Items.Strings[ListBox1.ItemIndex];
end;
Thanks for the post.
I think that Unicode and BigEndianUnicode aren’t good encoding names. The Unicode standard makes a clear distinction between the set of characters and its encoding. I guess what you want is UTF16LE and UTF16BE as encoding-classes or more verbose names if you prefer. But please don’t use Unicode as an encoding name. MS did it in the past when UTF-16 was equal to UCS2 and considered to be the only necessary encoding (Windows NT).
Please, pretty please, change that. In projects where I worked and such naming was done it created confusion amongst developers and led to bugs that could have been avoided if the name was clear and didn’t mix up concepts.
Here is the SaveToStream implementation to answer some of the questions above about what "nil" does, and what the encodings do for streaming out and in Strings:
procedure TStrings.SaveToStream(Stream: TStream; Encoding: TEncoding);
var
  Buffer, Preamble: TBytes;
begin
  if Encoding = nil then
    Encoding := TEncoding.Default;
  Buffer := Encoding.GetBytes(GetTextStr);
  Preamble := Encoding.GetPreamble;
  if Length(Preamble) > 0 then
    Stream.WriteBuffer(Preamble[0], Length(Preamble));
  Stream.WriteBuffer(Buffer[0], Length(Buffer));
end;
Note: *byte-char* based strings can have an affinity to a given codepage. For UTF8String, it is UTF8String = type AnsiString(65001); Assigning a UnicodeString to a UTF8String will perform an automatic conversion. The reverse is also true.
The <code> value is whatever the underlying OS supports.
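The automatic conversion between UnicodeString and UTF8String described in the note above can be tried with a small console sketch like the following. The program and variable names are placeholders of mine, and the explicit casts are just one way to write the conversion; a plain assignment should convert as well, though the compiler may warn about it.

```delphi
program Utf8Affinity;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Wide: string;        // UnicodeString in Tiburon
  Narrow: UTF8String;  // declared in the RTL as AnsiString(65001)
begin
  // U+00E9 (e with acute accent) followed by U+4E2D (a CJK ideograph).
  Wide := #$00E9#$4E2D;

  // Casting to UTF8String converts the UTF-16 text to UTF-8 bytes.
  Narrow := UTF8String(Wide);
  Writeln('UTF-16 code units: ', Length(Wide));
  Writeln('UTF-8 bytes:       ', Length(Narrow));

  // Casting back converts the UTF-8 bytes to UTF-16 again.
  if string(Narrow) = Wide then
    Writeln('Conversion round trip matched')
  else
    Writeln('Conversion round trip differed');
end.
```

Length counts elements of the string's own character type, so the two numbers differ whenever a character needs more than one byte in UTF-8.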
Glad everyone is commenting and catching my typos and the in-between versions of these sample demos. I will fix them.
I should have added that "Default" encoding = user’s active code page.
For more on Tiburon "String Theory" check out Allen Bauer’s recent blog post at
http://blogs.codegear.com/abauer/2008/07/16/38864/
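To see what the preamble step in the SaveToStream implementation above would write for each encoding, a console sketch along these lines may help. ShowPreamble is a helper I made up for the illustration. For orientation, the UTF-8 byte order mark is three bytes and a UTF-16 byte order mark is two bytes; what each TEncoding instance actually reports is best confirmed by running the program on your own system.

```delphi
program PreambleSizes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: prints how many preamble (byte order mark)
// bytes the given encoding reports through GetPreamble.
procedure ShowPreamble(const Name: string; Encoding: TEncoding);
begin
  Writeln(Name, ': ', Length(Encoding.GetPreamble), ' preamble byte(s)');
end;

begin
  // Default is what SaveToStream falls back to when Encoding is nil.
  ShowPreamble('Default', TEncoding.Default);
  ShowPreamble('ASCII', TEncoding.ASCII);
  ShowPreamble('UTF8', TEncoding.UTF8);
  ShowPreamble('Unicode', TEncoding.Unicode);
  ShowPreamble('BigEndianUnicode', TEncoding.BigEndianUnicode);
end.
```

An encoding that reports zero preamble bytes leaves no marker in the file, so whoever loads that file later has to know, or guess, which encoding was used to write it.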
The new TEncoding class is modeled after .NET’s System.Text.Encoding class. That is where the "Unicode" and "BigEndianUnicode" property names come from.
As for loading/saving in a specific codepage, TEncoding has support for that as well, similar to .NET:
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(1251);
  try
    ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt', Enc);
  finally
    Enc.Free;
  end;
end;
"That is where the "Unicode" and "BigEndianUnicode" property names come from."
I would have thought that people that wanted Unicode so badly that they were using .NET already would not be that interested in knowing that badly chosen names from .NET were lovingly preserved in a Win32 implementation.
A bad name is a bad name. A bad excuse for using a bad name doesn’t make it a good name.
This is just yet more hamstringing/pollution of the Win32 implementation in the name of (psst) Delphi.NET compatibility. Y’know, the thing that someone at CodeGear recently said was "over" (reading between the lines - "had been a mistake from the get go").
Same ol’, same ol’.
New name over the door, same old crap coming through it.
Oh, and thanks (DavidI) for clarifying the NIL encoding Q.
Thanks Remy for clarifying the origin of the erroneous "Unicode" and "BigEndianUnicode" property names. Please, though, take note of the feedback in these comments. It would be no credit to CodeGear/Embarcadero to make a mistake simply because Microsoft made it once already.
Great that Unicode is coming!
But what happens with the filename?
The declarations only use a "string" type for it, not "WideString". Is Unicode also supported for the path and filename itself?
The filename is also Unicode, because string is now (in Delphi 2009) UnicodeString. Before, it was AnsiString.