
How To Migrate Legacy C++ Apps To Unicode


This brief article helps you migrate your legacy C++Builder applications to Unicode. Based on information from Embarcadero consultants, it collects tips, tricks, and techniques for moving legacy C++Builder apps to a newer version.

We will look at working with Unicode in C++Builder. For strings alone, C++Builder offers several choices: your code can use C-style characters and strings, C++ string objects, or VCL string objects, and each of these has its own set of Unicode variations. In addition, the Windows API provides both ANSI and Unicode variants of many functions.

  • New C and C++ data types for C-style strings
  • New Unicode VCL string classes

We will look at how to use the "_TCHAR maps to" project option, which determines whether or not the Unicode preprocessor macro is defined. That, in turn, determines whether you get the ANSI variant or the wide-string (UTF-16) variant of functions and types.

Furthermore, we will look at the standard Windows header <tchar.h>, which includes macros designed to let you write code that compiles as either ANSI or Unicode.

Finally, we will give examples of loading and saving Unicode text to files.

What do we need to know about Unicode in C++ Builder?

Starting with C++Builder 2009, both the VCL and the runtime library have full Unicode support. Unicode is critical for internationalization and localization.

C++ is a diverse language that permits many libraries and several programming paradigms, so only the code that works with the VCL needs to be Unicode-aware. This means you might not need to convert all of your code when you move to a newer version. You can keep using the C runtime library, the STL, or the Windows API as before, and convert to Unicode only when you pass data to or from the VCL.
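The boundary conversion described above can be sketched in portable, standard C++. This is an illustration under assumptions, not C++Builder's own mechanism: Utf8ToUtf16 is a hypothetical helper name, the non-VCL code is assumed to keep its text as UTF-8 in std::string, and std::u16string stands in for the UTF-16 data you would hand to a UnicodeString. In a real project you would more likely rely on the RTL's built-in conversions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical boundary helper (not a VCL or Windows API function): decodes
// UTF-8 bytes coming from C runtime/STL code into UTF-16 code units, the
// encoding UnicodeString uses. Only basic validation is performed.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Characters outside the Basic Multilingual Plane become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Keeping the conversion in one place at the VCL boundary means the rest of the non-VCL code can stay unchanged, which is the point of a partial migration.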

When is complete migration to Unicode necessary?

Migrating completely to Unicode gives you the full benefits of internationalization, but migrating only the VCL portion of your code can simplify the upgrade.

To completely convert to Unicode, we need to explore several techniques.

How do we update C++ Builder projects from previous versions?

There are two ways to start:

  1. Open your project files in C++Builder
    1. The IDE updates your project file
    2. Rebuild the project
    3. Review the warnings and errors, and fix them
  2. Do not let C++Builder convert the existing project file
    1. Copy your files to a new folder
    2. Create a new project
    3. Add your source files to the new project

The second option takes more time, but it is often the more effective one because you start from a clean, newly created project.

What do you need to know about Unicode in the VCL?

The VCL offers six string classes that support ANSI, UTF-8, and UTF-16 encodings.

Note: a code page is a character set, which can include letters, numbers, punctuation marks, and other glyphs.

  • AnsiString – corresponds to the old string class; it contains 8-bit char data in the system's default ANSI code page
  • UnicodeString – a new class that contains 16-bit wchar_t data in the UTF-16 encoding
  • WideString – exists from previous versions of RAD Studio and corresponds to the BSTR (basic string or binary string) data type used by COM, Automation, and interop functions. It contains 16-bit wchar_t data, just like UnicodeString
  • AnsiStringT – a class template that contains 8-bit char data encoded in any code page
  • UTF8String – an AnsiStringT instantiation that uses the UTF-8 encoding
  • RawByteString – contains 8-bit char data in an unspecified code page
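To make the difference between these encodings concrete without depending on the VCL, the sketch below uses standard C++ stand-ins (an assumption for illustration, not the VCL classes themselves): std::u16string for UTF-16 storage such as UnicodeString, and a std::string holding UTF-8 bytes for UTF8String. The same six Russian letters take 6 UTF-16 code units but 12 UTF-8 bytes. The helper CountUtf8CodePoints is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "Привет" (Russian for "hello") as UTF-16 code units, the layout used by
// UnicodeString and WideString (std::u16string is a portable stand-in).
const std::u16string kHelloUtf16 = u"\u041F\u0440\u0438\u0432\u0435\u0442";

// The same text as UTF-8 bytes, the layout used by UTF8String
// (std::string is a portable stand-in).
const std::string kHelloUtf8 = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

// Counts code points in UTF-8 by skipping continuation bytes (10xxxxxx).
std::size_t CountUtf8CodePoints(const std::string& s)
{
    std::size_t count = 0;
    for (char c : s)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

Both strings hold the same six characters; only the storage unit differs, which is why converting between string classes is about encoding rather than content.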

RawByteString has several advantages. Because each AnsiStringT code page is a separate compile-time type, you would otherwise need a separate routine for each code page; RawByteString lets you write a single routine that handles any code page. It also avoids the overhead of the VCL performing code page conversions itself, and it prevents possible data loss from automatically converting text into encodings that can't represent some characters.

The good news is that most member functions of the new string classes behave just as they did in the old non-Unicode string classes.

AnsiStringT<CodePage> template example:

  • We pass Cyrillic data from code page 1251 (Windows Cyrillic) to code page 65001 (UTF-8) and back to Unicode without data loss
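The round trip described above can be sketched without the VCL. This is an assumption-laden illustration, not the AnsiStringT template itself: the helper names are hypothetical, and the CP1251 table is deliberately partial (ASCII, the Cyrillic letters А-я at 0xC0-0xFF, plus Ё and ё), whereas the real code page also maps punctuation and other letters in 0x80-0xBF. It converts CP1251 bytes to UTF-8 (code page 65001) and back, so you can check that nothing is lost.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Partial CP1251 -> Unicode table: ASCII, 0xC0-0xFF (U+0410-U+044F), Yo/yo.
char32_t Cp1251ToCodePoint(unsigned char b)
{
    if (b < 0x80)  return b;
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);
    if (b == 0xA8) return 0x0401; // Ё
    if (b == 0xB8) return 0x0451; // ё
    throw std::invalid_argument("byte not covered by this partial table");
}

// Inverse of the partial table above.
unsigned char CodePointToCp1251(char32_t cp)
{
    if (cp < 0x80) return static_cast<unsigned char>(cp);
    if (cp >= 0x0410 && cp <= 0x044F)
        return static_cast<unsigned char>(0xC0 + (cp - 0x0410));
    if (cp == 0x0401) return 0xA8;
    if (cp == 0x0451) return 0xB8;
    throw std::invalid_argument("code point not representable in this table");
}

// CP1251 -> UTF-8 (code page 65001). Every code point in the table is below
// U+0800, so one or two UTF-8 bytes are enough.
std::string Cp1251ToUtf8(const std::string& cp1251)
{
    std::string out;
    for (char c : cp1251) {
        char32_t cp = Cp1251ToCodePoint(static_cast<unsigned char>(c));
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// UTF-8 -> CP1251; accepts only the one- and two-byte sequences produced above.
std::string Utf8ToCp1251(const std::string& utf8)
{
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char b = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        if (b < 0x80) {
            cp = b;
            i += 1;
        } else if ((b & 0xE0) == 0xC0 && i + 1 < utf8.size()) {
            unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
            cp = ((b & 0x1F) << 6) | (next & 0x3F);
            i += 2;
        } else {
            throw std::invalid_argument("sequence not handled by this sketch");
        }
        out.push_back(static_cast<char>(CodePointToCp1251(cp)));
    }
    return out;
}
```

The round trip is lossless here only because every character involved exists in both encodings; a character with no CP1251 mapping would make CodePointToCp1251 throw, which mirrors the data-loss risk mentioned for automatic conversions.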

Is there anything I need to know about Unicode and the Windows API?

The Windows API includes both Unicode and ANSI variants of many functions. For example:

  • MessageBoxA – takes ANSI strings
  • MessageBoxW – takes wide (UTF-16) strings

What new C and C++ data types are there for C-style strings?

  • char16_t (for example, u"Hello, World! \u263A") holds UTF-16 code units
    • on Windows it has the same 16-bit size and UTF-16 semantics as wchar_t
  • char32_t (for example, U"Hello, World! \u263A") holds UTF-32 code units
  • C++Builder also provides the AnsiStringT<CodePage> template, which lets you create your own AnsiString types
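A practical consequence of the UTF-16/UTF-32 split is that characters outside the Basic Multilingual Plane take two char16_t code units (a surrogate pair) but only one char32_t. The sketch below, in standard C++ with a hypothetical helper CountCodePoints, counts characters rather than code units in a UTF-16 string; it assumes mostly well-formed input and simply counts an unpaired surrogate as one character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in UTF-16: a high surrogate (0xD800-0xDBFF) followed by
// a low surrogate (0xDC00-0xDFFF) counts as a single character.
std::size_t CountCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i; // skip the low surrogate of the pair
        }
        ++count;
    }
    return count;
}
```

For example, u"\u263A" (the smiley in the literals above) is one char16_t, while u"\U0001F600" is two char16_t units but one char32_t in U"\U0001F600".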

Why should I use TCHAR.H?

If you are planning a complete Unicode migration as part of an upgrade to the latest version of C++Builder, writing character-width agnostic code lets you prepare for the migration in your current version of C++Builder without breaking compilation.

Windows provides the <tchar.h> header to help with this. Depending on whether the _UNICODE preprocessor macro is defined, <tchar.h> defines the following macros:

  • TCHAR, defined as
    • char for non-Unicode builds
    • wchar_t for Unicode builds
  • _T, which the preprocessor removes for non-Unicode builds and defines as L for Unicode builds
    • you write _T("Hello, world"), and the preprocessor turns it into a char literal ("Hello, world") or a wchar_t literal (L"Hello, world") as appropriate
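To see how character-width agnostic code stays compilable in both configurations, here is a portable sketch that mimics <tchar.h>. The names DEMO_TCHAR, DEMO_T, demo_tcslen and DEMO_UNICODE are hypothetical stand-ins for TCHAR, _T, _tcslen and _UNICODE, renamed because identifiers that start with an underscore and a capital letter are reserved and so the sketch builds outside Windows. In a C++Builder project you would include <tchar.h> and use the real names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T and _tcslen. Defining DEMO_UNICODE
// selects the wide variants, the way _UNICODE switches <tchar.h>.
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_T(s) L##s
#define demo_tcslen std::wcslen
#else
typedef char DEMO_TCHAR;
#define DEMO_T(s) s
#define demo_tcslen std::strlen
#endif

// Character-width agnostic code: compiles unchanged in either configuration.
std::size_t GreetingLength()
{
    const DEMO_TCHAR* greeting = DEMO_T("Hello, world");
    return demo_tcslen(greeting);
}
```

GreetingLength returns 12 in both builds; only the character type, the literal prefix and the length function change behind the macros.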

Should I replace String with AnsiString everywhere?

One of the tips addresses whether you should simply replace every occurrence of String with AnsiString. It is better not to do that, and instead to use two conversion functions whenever one type is returned and the other type is required, and vice versa.

These two functions use the Windows API functions WideCharToMultiByte, which converts from UTF-16 to the encoding of your choice (for instance, UTF-8 or any of the various ANSI encodings), and MultiByteToWideChar, which converts in the other direction.
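For illustration, here is what the UTF-16 to UTF-8 direction of such a conversion does, written in portable standard C++ rather than by calling WideCharToMultiByte. The helper name Utf16ToUtf8 is hypothetical, std::u16string stands in for UnicodeString data, and this covers only the UTF-8 case; converting to an ANSI code page needs a code page table and is where characters can be lost.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical portable helper: UTF-16 code units in, UTF-8 bytes out, similar
// in effect to WideCharToMultiByte with a UTF-8 target code page.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) { // high surrogate: needs a low one
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {                    // 1 byte: ASCII
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {            // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {          // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                            // 4 bytes: outside the BMP
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Rejecting unpaired surrogates is a design choice for the sketch; a production conversion might substitute U+FFFD instead.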

How to load and save Unicode characters in files?

The LoadFromFile and SaveToFile methods now take an additional parameter of type TEncoding. TEncoding provides static properties that select the encoding, such as TEncoding::ASCII, TEncoding::UTF8, and TEncoding::Unicode (UTF-16).
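In C++Builder itself you pass the encoding directly, for example `Memo1->Lines->SaveToFile(L"notes.txt", TEncoding::UTF8);`, where Memo1 and the file name are placeholders. To show what a UTF-8 file with a preamble looks like on disk, here is a portable standard C++ sketch that writes the UTF-8 byte-order mark (EF BB BF) followed by the text, then reads it back and strips the mark. The helper names are hypothetical, and whether a BOM is written in your RAD Studio version depends on the encoding's preamble and settings, so treat this as an illustration rather than a byte-for-byte description of SaveToFile.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// The UTF-8 byte-order mark (preamble).
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: writes the BOM followed by UTF-8 text.
bool SaveUtf8WithBom(const std::string& path, const std::string& utf8Text)
{
    std::ofstream file(path, std::ios::binary);
    if (!file)
        return false;
    file << kUtf8Bom << utf8Text;
    return static_cast<bool>(file);
}

// Hypothetical helper: reads the whole file and strips a leading UTF-8 BOM.
std::string LoadUtf8(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    std::string data((std::istreambuf_iterator<char>(file)),
                     std::istreambuf_iterator<char>());
    if (data.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0)
        data.erase(0, kUtf8Bom.size());
    return data;
}
```

Stripping the BOM on load lets the text round-trip unchanged, while the BOM on disk tells other readers which encoding the file uses.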

Is there a video which shows how to migrate C++ apps to use Unicode?

You can learn more about migrating legacy C++Builder projects to a newer version from the accompanying webinar.

Where can I learn more about how to migrate my C++ apps to Unicode?

We have only scratched the surface of Unicode conversion in this article. For much more comprehensive advice, explore the Migration and Upgrade Center to learn about migrating to Unicode.

Check out everything you need to know about String on C++ in this blog.


About author

Software Developer | CS(CyberSec) Undergrad at APU Malaysia | Delphi/C++ Builder Enthusiast | Microsoft Learn Student Ambassador | Microsoft Azure Certified
