Embarcadero RAD Studio, Delphi, & C++Builder Blogs

How To Migrate Legacy C++ Apps To Unicode

This brief article focuses on helping you migrate your legacy C++ Builder applications to Unicode. Based on information from Embarcadero consultants, it offers tips, tricks, and techniques for migrating legacy C++ Builder apps to a newer version.

We will look at working with Unicode in C++ Builder. As you may know, C++ Builder offers several choices for the string datatype. Your code can use C-style characters and strings, C++ string objects, or VCL string objects, and each of these has its own set of Unicode variations. Moreover, the Windows API provides both ANSI and Unicode variants of many functions.

We will look at how to use the "_TCHAR maps to" project option, which determines whether the Unicode preprocessor macros (UNICODE and _UNICODE) are defined. That in turn determines whether you get the ANSI variant or the wide-string (UTF-16) variant.

Furthermore, we will look at the standard Windows header tchar.h, which includes macros designed to let you write code that compiles as either ANSI or Unicode.

Finally, we will give examples of loading and saving Unicode characters to a file.

What do we need to know about Unicode in C++ Builder?

C++ Builder 2009 and later versions have full Unicode support throughout both the VCL and the runtime library. Unicode is critical for internationalization and localization.

C++ is a diverse language that permits many libraries and several programming paradigms, and only the code that uses the VCL needs to be Unicode-aware. This means you might not need to convert all your code. You can keep using the C runtime library, the STL, or the Windows API as before, and convert to Unicode only where you pass data to or from the VCL.

When is complete migration to Unicode necessary?

By completely migrating to Unicode you gain the full benefits of internationalization, but migrating only the VCL portion of your code can simplify the task of upgrading.

To completely convert to Unicode, we need to explore several techniques.

How do we update C++ Builder projects from previous versions?

There are 2 options to start:

  1. Open your project files in C++ Builder
    1. Let the IDE update your project file
    2. Rebuild the project
    3. Fix the warnings and errors that the build reports
  2. Do not let C++ Builder convert the project to a newer version
    1. Copy your files to a new folder
    2. Create a new project
    3. Add your source files to the new project

The second option takes more time up front but tends to be the more efficient route overall.

What you need to know about Unicode in VCL

The VCL offers six string classes, which support ANSI, UTF-8, and UTF-16 encodings.

Note: a code page is a character set, which can include numbers, punctuation marks, and other glyphs.

Because each code page is a separate compile-time type, using RawByteString has several advantages. It lets you write a single routine that can handle any code page. It removes the VCL overhead of doing code page conversions itself, and it prevents the possible loss of data that comes from automatically converting text into encodings that cannot represent some characters.

The good news with all of this is that most member functions of these new string classes operate just the same as they did for the old non-Unicode string classes.

AnsiStringT<CodePage> template example:

const wchar_t* data = L"Что ты говоришь?";
{
  // illustrates handling of Cyrillic data from 
  // CP 1251 -> CP_UTF8 -> UnicodeString without loss
  AnsiStringT<1251> cp1251Str = data;
  assert(cp1251Str.CodePage() == 1251);

  UTF8String utf8Str = cp1251Str;
  assert(utf8Str.CodePage() == CP_UTF8);

  UnicodeString us1 = cp1251Str;
  UnicodeString us2(utf8Str);

  assert(us1 == us2);
  assert(us1 == data);
  assert(us2 == data);
}

Is there anything I need to know about Unicode and the Windows API?

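The Windows API includes both Unicode and ANSI variants of many functions. A generic name such as MessageBox resolves to MessageBoxW or MessageBoxA depending on whether the UNICODE macro is defined, so the arguments you pass must match the variant you get.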

#if defined(UNICODE)
  ::MessageBox(Handle, Edit1->Text.c_str(), // wchar_t*
          _T("MessageBox - text from Edit1->Text.c_str()"), MB_OK);
#else
  ::MessageBox(Handle, AnsiString(Edit1->Text).c_str(),
          AnsiString("MessageBox").c_str(), MB_OK);
#endif
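
The way a generic API name resolves to its ANSI or Unicode variant can be sketched in portable C++. The names below (TextLengthA, TextLengthW, TextLength, and DEMO_UNICODE) are hypothetical stand-ins for a real pair such as MessageBoxA and MessageBoxW and for the UNICODE macro. This illustrates the pattern only and is not taken from the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical switch standing in for the UNICODE macro.
// Comment it out to get the narrow (ANSI) mapping instead.
#define DEMO_UNICODE

// "A" variant: takes narrow text and returns its length in char units.
std::size_t TextLengthA(const char* text) {
    assert(text != nullptr);
    return std::strlen(text);
}

// "W" variant: takes wide text and returns its length in wchar_t units.
std::size_t TextLengthW(const wchar_t* text) {
    assert(text != nullptr);
    return std::wcslen(text);
}

// The generic name is only a macro that selects one variant at compile time.
#ifdef DEMO_UNICODE
  #define TextLength TextLengthW
#else
  #define TextLength TextLengthA
#endif
```

With DEMO_UNICODE defined, a call such as TextLength(L"hello") compiles against the wide variant, while passing a narrow literal would be a compile-time type error. This is the same kind of mismatch the #if block above works around.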

What new C and C++ data types are there for C-style strings?

Why should I use TCHAR.H?

If you are planning a complete Unicode migration as part of an upgrade to the latest version of C++ Builder, character-width-agnostic code lets you prepare for the migration in your previous version of C++ Builder without breaking compilation.

Windows provides the tchar.h header file to help with this. Depending on whether the _UNICODE preprocessor macro is defined, tchar.h maps its generic macros, such as _TCHAR and _T(), to either their ANSI or their wide-character forms.
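
How a header like tchar.h achieves this can be sketched with a few hypothetical macros. DEMO_UNICODE, demo_tchar, DEMO_T, and demo_tcslen below are illustrative stand-ins for _UNICODE, _TCHAR, _T, and _tcslen. The real header defines many more mappings, so treat this as a minimal sketch of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical switch standing in for _UNICODE.
// Comment it out to build the narrow-character configuration.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
  typedef wchar_t demo_tchar;       // plays the role of _TCHAR
  #define DEMO_T(s) L##s            // plays the role of _T("...")
  #define demo_tcslen std::wcslen   // plays the role of _tcslen
#else
  typedef char demo_tchar;
  #define DEMO_T(s) s
  #define demo_tcslen std::strlen
#endif

// Width-agnostic code: the same source compiles in either configuration.
std::size_t CountSpaces(const demo_tchar* text) {
    assert(text != nullptr);
    std::size_t spaces = 0;
    for (std::size_t i = 0, n = demo_tcslen(text); i < n; ++i) {
        if (text[i] == DEMO_T(' ')) {
            ++spaces;
        }
    }
    return spaces;
}
```

A call such as CountSpaces(DEMO_T("a b c")) compiles unchanged whether or not DEMO_UNICODE is defined, which is what lets the same code build before and after the migration.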

Should I replace every String with AnsiString?

One common tip concerns whether you should simply replace all occurrences of String with AnsiString. It is better not to do that. Instead, use the following two conversion functions whenever one type is returned and the other type is required.

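#include <windows.h>
#include <string.h>
#include <wchar.h>

// Size of the static conversion buffers, in characters.
// The value is an assumption; size it for the longest string in your project.
#define STR_CONV_BUF_SIZE 2000

// Converts an ANSI (CP_ACP) string to UTF-16.
// Returns a static buffer: the next call overwrites it, and it is not thread-safe.
wchar_t* __fastcall UnicodeOf(const char* c)
{
  static wchar_t w[STR_CONV_BUF_SIZE];
  memset(w, 0, sizeof(w));
  // Pass size - 1 so the terminating null written by memset always survives.
  MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, c, (int)strlen(c), w, STR_CONV_BUF_SIZE - 1);
  return w;
}

// Converts a UTF-16 string to ANSI (CP_ACP), with the same static-buffer caveats.
char* __fastcall AnsiOf(const wchar_t* w)
{
  static char c[STR_CONV_BUF_SIZE];
  memset(c, 0, sizeof(c));
  WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, w, (int)wcslen(w), c, STR_CONV_BUF_SIZE - 1, NULL, NULL);
  return c;
}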

These two functions wrap two Windows API functions. MultiByteToWideChar converts from an encoding of your choice to UTF-16, and WideCharToMultiByte converts from UTF-16 to the encoding of your choice, for instance UTF-8 or any of the various ANSI encodings.
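
To make concrete what a UTF-16 to UTF-8 conversion involves, here is a portable sketch in standard C++. The function name Utf16ToUtf8 is hypothetical, and the code illustrates the encoding rules rather than replacing WideCharToMultiByte. Unpaired surrogates are replaced with U+FFFD, which is one possible policy among several.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-16 text to a UTF-8 byte string.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with a following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // stray low surrogate
        }
        // Emit one to four bytes depending on the code point's magnitude.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the Cyrillic letter U+0427 becomes the two bytes 0xD0 0xA7, which shows why a UTF-8 buffer can need more bytes than the UTF-16 string has characters.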

How to load and save Unicode characters in files?

The LoadFromFile and SaveToFile methods now take an additional parameter of type TEncoding. The TEncoding class has static properties that let you specify the encoding, such as ASCII, UTF-8, Unicode (UTF-16), or others.

ListBox1->Items->SaveToFile("c:\\temp\\myText.txt", TEncoding::UTF8);

if (FileExists("c:\\temp\\myText.txt")) {
  ListBox1->Items->LoadFromFile("c:\\temp\\myText.txt", TEncoding::UTF8);
} else {
  ShowMessage("c:\\temp\\myText.txt does not exist!");
}
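
Outside the VCL, the same round trip can be sketched with standard C++ streams. The helper names SaveUtf8Lines and LoadUtf8Lines are hypothetical. The sketch assumes the text is already UTF-8 encoded, and it writes a UTF-8 byte order mark at the start of the file, which the loader skips if present.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes UTF-8 encoded lines, preceded by a UTF-8 BOM.
bool SaveUtf8Lines(const std::string& path, const std::vector<std::string>& lines) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out << "\xEF\xBB\xBF";  // UTF-8 byte order mark
    for (const std::string& line : lines) {
        out << line << '\n';
    }
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the lines back, dropping a leading BOM if present.
bool LoadUtf8Lines(const std::string& path, std::vector<std::string>& lines) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;  // plays the role of the FileExists check above
    }
    lines.clear();
    std::string line;
    while (std::getline(in, line)) {
        // Only the very first line can start with the BOM.
        if (lines.empty() && line.compare(0, 3, "\xEF\xBB\xBF") == 0) {
            line.erase(0, 3);
        }
        lines.push_back(line);
    }
    return true;
}
```

Opening both streams in binary mode keeps the bytes untouched, so the lines read back are identical to the ones that were saved.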

Is there a video which shows how to migrate C++ apps to use Unicode?

You can learn more about migrating legacy C++ Builder projects to a newer version from this webinar:

Where can I learn more about how to migrate my C++ apps to Unicode?

We have only scratched the surface of Unicode conversion in this article. For much more comprehensive advice on migrating to Unicode, visit the Migration and Upgrade Center.

Check out everything you need to know about strings in C++ in this blog.
