Tiburon’s LoadFromFile and SaveToFile for Unicode characters

With Tiburon, I can use Unicode characters with VCL components like TMemo, TListBox, TComboBox (and others that contain string lists). How can I load the strings from a file and save them to a file? How do I need to modify existing Delphi and C++Builder programs to handle Unicode characters for these components? Here is the answer.

There is a new, optional parameter for the LoadFromFile and SaveToFile methods. The parameter is named "Encoding" and its type is the class TEncoding. TEncoding (defined in the SysUtils unit) has several class properties that you can use to specify the encoding of the strings you want to load or save: ASCII, BigEndianUnicode, Default, Unicode, UTF7, UTF8.

The following are the declarations of the LoadFromFile and SaveToFile methods for components that contain TStrings (defined in the Classes unit):

Delphi:
  procedure TStrings.LoadFromFile(const FileName: string);
  procedure TStrings.LoadFromFile(const FileName: string; Encoding: TEncoding);
  procedure TStrings.SaveToFile(const FileName: string);
  procedure TStrings.SaveToFile(const FileName: string; Encoding: TEncoding);

C++Builder:
  virtual void __fastcall LoadFromFile(const System::UnicodeString FileName)/* overload */;
  virtual void __fastcall LoadFromFile(const System::UnicodeString FileName, Sysutils::TEncoding* Encoding)/* overload */;
  virtual void __fastcall SaveToFile(const System::UnicodeString FileName)/* overload */;
  virtual void __fastcall SaveToFile(const System::UnicodeString FileName, Sysutils::TEncoding* Encoding)/* overload */;

The Delphi implementation of SaveToFile creates a TFileStream and passes the encoding I provide on to SaveToStream:

procedure TStrings.SaveToFile(const FileName: string);
begin
  SaveToFile(FileName, nil);
end;

procedure TStrings.SaveToFile(const FileName: string; Encoding: TEncoding);
var
  Stream: TStream;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    SaveToStream(Stream, Encoding);
  finally
    Stream.Free;
  end;
end;
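
To see what the encoding changes in the bytes that reach the stream, here is a minimal console sketch. It assumes a TStringList standing in for a component's Items, and a TMemoryStream instead of a file; the helper name LeadingBytesHex is made up for this illustration. It prints the first bytes that SaveToStream writes for a few encodings, which shows the byte order mark (preamble) for UTF8 and Unicode and its absence for ASCII.

```delphi
program PreambleDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: saves one line to a memory stream with the given
// encoding and returns the first Count bytes of the stream as hex text.
function LeadingBytesHex(Encoding: TEncoding; Count: Integer): string;
var
  Lines: TStringList;
  Stream: TMemoryStream;
  Bytes: TBytes;
  I: Integer;
begin
  Result := '';
  Lines := TStringList.Create;
  try
    Stream := TMemoryStream.Create;
    try
      Lines.Add('Hello');
      Lines.SaveToStream(Stream, Encoding);
      // Never read more bytes than the stream actually holds.
      if Count > Stream.Size then
        Count := Integer(Stream.Size);
      SetLength(Bytes, Count);
      Stream.Position := 0;
      if Count > 0 then
        Stream.ReadBuffer(Bytes[0], Count);
      for I := 0 to Count - 1 do
        Result := Result + IntToHex(Bytes[I], 2);
    finally
      Stream.Free;
    end;
  finally
    Lines.Free;
  end;
end;

begin
  Writeln('UTF8:    ', LeadingBytesHex(TEncoding.UTF8, 3));    // EFBBBF
  Writeln('Unicode: ', LeadingBytesHex(TEncoding.Unicode, 2)); // FFFE
  Writeln('ASCII:   ', LeadingBytesHex(TEncoding.ASCII, 2));   // 4865 ("He", no preamble)
end.
```

The same stream overloads are what SaveToFile uses internally, so a file written with TEncoding.UTF8 should start with the same three bytes.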

The following examples show how to load and save the strings with a ListBox VCL component on your form:

Delphi:
  ListBox1.Items.LoadFromFile('c:\temp\MyListBoxItems.txt', TEncoding.UTF8);
  ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt', TEncoding.UTF8);

C++Builder:
  ListBox1->Items->LoadFromFile("c:\\temp\\MyListBoxItems.txt", TEncoding::UTF8);
  ListBox1->Items->SaveToFile("c:\\temp\\MyListBoxItems.txt",TEncoding::UTF8);
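
The same calls can be tried without a form. The following console sketch is only an illustration under a few assumptions: a TStringList stands in for ListBox1.Items, the file name RoundTripDemo.txt and the helper RoundTrips are made up, and the non-ASCII sample lines are written as character codes so the source file's own encoding does not matter. It saves a list, reloads it with the same encoding, and compares the text.

```delphi
program RoundTripDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical file name, created in the current directory.
  DemoFileName = 'RoundTripDemo.txt';

// Saves a small list with the given encoding, reloads it with the same
// encoding, and reports whether the text survived unchanged.
function RoundTrips(Encoding: TEncoding): Boolean;
var
  Saved, Loaded: TStringList;
begin
  Saved := TStringList.Create;
  try
    Loaded := TStringList.Create;
    try
      Saved.Add('Hello World');
      Saved.Add(#$041F#$0440#$0438#$0432#$0435#$0442); // Russian "Privet"
      Saved.Add(#$3053#$3093#$306B#$3061#$306F);       // Japanese "Konnichiwa"
      Saved.SaveToFile(DemoFileName, Encoding);
      Loaded.LoadFromFile(DemoFileName, Encoding);
      Result := Loaded.Text = Saved.Text;
    finally
      Loaded.Free;
    end;
  finally
    Saved.Free;
    DeleteFile(DemoFileName); // clean up the demo file
  end;
end;

begin
  Writeln('UTF8:  ', RoundTrips(TEncoding.UTF8));  // TRUE
  Writeln('ASCII: ', RoundTrips(TEncoding.ASCII)); // FALSE
end.
```

UTF8 can represent every character in the list, so the round trip is lossless. ASCII cannot, so the Russian and Japanese lines come back altered and the comparison fails.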

Here is a screenshot of my example Delphi application:

[Screenshot: delphihelloworld_658.jpg]

Here are links to the Delphi and C++Builder versions of the application: delphihelloworld_660.zip, cpphelloworld_661.zip

With Tiburon, my Delphi and C++ demo applications can now handle Unicode characters in list boxes, edit boxes, and labels, and I can also save and load the Unicode strings to and from my hard drive.

{ 16 } Comments

  1. Jolyon Smith | July 15, 2008 at 8:59 pm | Permalink

    And what exactly does a NIL encoding mean as applied to a Unicode string?

  2. Aleksander Oven | July 15, 2008 at 9:22 pm | Permalink

    >TEncoding … BigEndianUnicode, …, Unicode

    Couldn’t you just call the encodings what they are, i.e. UTF16BE and UTF16LE?

  3. Kryvich | July 15, 2008 at 11:17 pm | Permalink

    Is it possible to save a Unicode stringlist in a certain ANSI codepage? I.e.:

    ListBox1.Items.SaveToFile('MyListBoxItems1251.txt', TEncoding.CP1251);
    ListBox1.Items.SaveToFile('MyListBoxItems1250.txt', TEncoding.CP1250);
    or maybe
    ListBox1.Items.SaveToFile('MyListBoxItems.txt', TEncoding.ANSI, 1251);
    // type of encoding + number of codepage

  4. Mike Dillamore | July 16, 2008 at 4:29 am | Permalink

    I strongly agree with Aleksander. "Unicode" is absolutely _not_ an encoding - it is an abstract concept representing a character set. Google for the phrase "Unicode is not an encoding" for numerous explanations of why this is fundamentally wrong.

    Aleksander’s suggested names (UTF16BE and UTF16LE) would be correct. If the term Unicode were to be applied to any encoding (which it shouldn’t!), the most appropriate would be UTF-32, being the only one that can represent the full character set without variable length encodings.

  5. Dennis | July 16, 2008 at 6:58 am | Permalink

    Oh yes, and there is a bug in the attached sample

    ListBox1.Items.SaveToFile('MyListBoxItems.txt',TEncoding.UTF8);

    should read

    ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt',TEncoding.UTF8);

    well. I think no one needs this info, but well.

  6. Bernhard Geyer | July 16, 2008 at 7:37 am | Permalink

    procedure TForm34.ListBox1Click(Sender: TObject);
    begin
      Label1.Caption := ListBox1.Items.Strings[ListBox1.ItemIndex];
    end;

    should be

    procedure TForm34.ListBox1Click(Sender: TObject);
    begin
      if ListBox1.ItemIndex > -1 then
        Label1.Caption := ListBox1.Items.Strings[ListBox1.ItemIndex];
    end;

  7. Maël Hörz | July 16, 2008 at 8:36 am | Permalink

    Thanks for the post.

    I think that Unicode and BigEndianUnicode aren’t good encoding names. The Unicode standard makes a clear distinction between the set of characters and its encoding. I guess what you want is UTF16LE and UTF16BE as encoding-classes or more verbose names if you prefer. But please don’t use Unicode as an encoding name. MS did it in the past when UTF-16 was equal to UCS2 and considered to be the only necessary encoding (Windows NT).

    Please, pretty please, change that. In projects where I worked and such naming was done it created confusion amongst developers and led to bugs that could have been avoided if the name was clear and didn’t mix up concepts.

  8. davidi | July 16, 2008 at 9:46 am | Permalink

    Here is the SaveToStream implementation to answer some of the questions above about what "nil" does, and what the encodings do for streaming out and in Strings:
    procedure TStrings.SaveToStream(Stream: TStream; Encoding: TEncoding);
    var
      Buffer, Preamble: TBytes;
    begin
      if Encoding = nil then
        Encoding := TEncoding.Default;
      Buffer := Encoding.GetBytes(GetTextStr);
      Preamble := Encoding.GetPreamble;
      if Length(Preamble) > 0 then
        Stream.WriteBuffer(Preamble[0], Length(Preamble));
      Stream.WriteBuffer(Buffer[0], Length(Buffer));
    end;
    Note: *byte-char* based strings can have an affinity to a given codepage. For UTF8String, it is UTF8String = type AnsiString(65001); Assigning a UnicodeString to a UTF8String will perform an automatic conversion. The reverse is also true.
    The code page value is whatever the underlying OS supports.
    Glad everyone is commenting and catching my typos and the in-between versions of these sample demos. I will fix them.

  9. davidi | July 16, 2008 at 10:27 am | Permalink

    I should have added that "Default" encoding = user’s active code page.

  10. davidi | July 16, 2008 at 11:31 am | Permalink

    For more on Tiburon "String Theory" check out Allen Bauer’s recent blog post at
    http://blogs.codegear.com/abauer/2008/07/16/38864/

  11. Remy Lebeau (TeamB) | July 16, 2008 at 12:15 pm | Permalink

    The new TEncoding class is modeled after .NET’s System.Text.Encoding class. That is where the "Unicode" and "BigEndianUnicode" property names come from.

    As for loading/saving in a specific codepage, TEncoding has support for that as well, similar to .NET:

    var
      Enc: TEncoding;
    begin
      Enc := TEncoding.GetEncoding(1251);
      try
        ListBox1.Items.SaveToFile('c:\temp\MyListBoxItems.txt', Enc);
      finally
        Enc.Free;
      end;
    end;

  12. Jolyon Smith | July 16, 2008 at 2:43 pm | Permalink

    "That is where the "Unicode" and "BigEndianUnicode" property names come from."

    I would have thought that people that wanted Unicode so badly that they were using .NET already would not be that interested in knowing that badly chosen names from .NET were lovingly preserved in a Win32 implementation.

    A bad name is a bad name. A bad excuse for using a bad name doesn’t make it a good name.

    This is just yet more hamstringing/pollution of the Win32 implementation in the name of (psst) Delphi.NET compatibility. Y’know, the thing that someone at CodeGear recently said was "over" (reading between the lines - "had been a mistake from the get go").

    Same ol’, same ol’.

    New name over the door, same old crap coming through it.

  13. Jolyon Smith | July 16, 2008 at 2:43 pm | Permalink

    Oh, and thanks (DavidI) for clarifying the NIL encoding Q.

  14. Mike Dillamore | July 17, 2008 at 3:08 am | Permalink

    Thanks Remy for clarifying the origin of the erroneous "Unicode" and "BigEndianUnicode" property names. Please, though, take note of the feedback in these comments. It would be no credit to CodeGear/Embarcadero to make a mistake simply because Microsoft made it once already.

  15. Arne Hartmann | July 28, 2008 at 12:34 pm | Permalink

    Great that Unicode is coming!
    But what happens with the filename?
    Here only a "string" type is defined, not "WideString", so how is Unicode supported for the path and filename itself?

  16. Samir | August 25, 2008 at 4:44 am | Permalink

    The filename is also Unicode, because string is now (in Delphi 2009) UnicodeString; before, it was AnsiString.
