This brief article helps you migrate legacy C++ Builder applications to Unicode. Based on information from Embarcadero consultants, it collects tips, tricks, and techniques for moving legacy C++ Builder apps to a newer version.
We will look at working with Unicode in C++ Builder. For example, C++ Builder offers several choices for the string data type. Your code can use C-style characters and strings, C++ string objects, or VCL string objects, and each of these has its own set of Unicode variations. Moreover, the Windows API provides both ANSI and Unicode variants of many functions. We will cover:
- New C and C++ data types for C-style strings
- New Unicode VCL string classes
We will look at how to use the "_TCHAR maps to" option, which determines whether or not the UNICODE preprocessor macro is defined. That in turn determines whether you get the ANSI variant or the wide-string (UTF-16) variant.
Furthermore, we will look at the standard Windows header tchar.h, which includes macros designed to let you write code that compiles as either ANSI or Unicode.
Finally, we will give examples of loading and saving Unicode characters to a file.
What do we need to know about Unicode in C++ Builder?
Since C++ Builder 2009, Unicode is fully supported throughout both the VCL and the runtime library. Unicode is critical for internationalization and localization.
C++ is a diverse language that permits the use of many libraries and several programming paradigms, and code that does not touch the VCL does not need to be Unicode-aware. This means you might not need to convert all of your code. You can keep using the C runtime library, the STL, or the Windows API as before, and convert to Unicode only where you pass data to or from the VCL.
When is complete migration to Unicode necessary?
By completely migrating to Unicode you gain the full benefits of internationalization, but migrating only the VCL portion of your code can simplify the task of upgrading.
To completely convert to Unicode, we need to explore several techniques.
How do we update C++ Builder projects from previous versions?
There are two options to start:
- Open your project files in C++ Builder
  - The IDE updates your project file
  - Rebuild it
  - Check the warnings and errors to fix
- Do not let C++ Builder convert the project to a newer version
  - Copy your files to a new folder
  - Create a new project
  - Add your source files to the project
The second option is usually the more effective one, but it takes more time.
What you need to know about Unicode in VCL
The VCL offers six string classes, which between them support ANSI, UTF-8, and UTF-16 encodings.
Note: a code page is a character set, which can include numbers, punctuation marks, and other glyphs.
- AnsiString – corresponds to the old string class. It contains 8-bit char data in the system's default ANSI code page
- UnicodeString – this is a new class. It contains 16-bit wchar_t data in the UTF-16 encoding
- WideString – this class exists from previous versions of RAD Studio and corresponds to the BSTR (basic string or binary string) data type used by COM, Automation, and Interop functions. It contains 16-bit wchar_t data, just like UnicodeString
- AnsiStringT – a class template that contains 8-bit char data encoded in any code page
- UTF8String – an AnsiStringT instantiation using the UTF-8 encoding
- RawByteString – contains 8-bit char data in an unspecified code page
Using RawByteString can give several advantages. Because each code page is a separate compile-time type, RawByteString lets you write a single routine that can handle any code page. It avoids the VCL overhead of doing code page conversions itself, and it prevents the possible loss of data from automatically converting text into encodings that can't represent some characters.
The good news with all of this is that most member functions of these new string classes operate just the same as they did for the old non-Unicode string classes.
AnsiStringT<CodePage> template example:
- We pass Cyrillic data from code page 1251 to code page 65001 (UTF-8) and back to Unicode without data loss

    const wchar_t* data = L"Что ты говоришь?";
    {
      // illustrates handling of Cyrillic data from
      // CP 1251 -> CP_UTF8 -> UnicodeString without loss
      AnsiStringT<1251> cp1251Str = data;
      assert(cp1251Str.CodePage() == 1251);

      UTF8String utf8Str = cp1251Str;
      assert(utf8Str.CodePage() == CP_UTF8);

      UnicodeString us1 = cp1251Str;
      UnicodeString us2(utf8Str);

      assert(us1 == us2);
      assert(us1 == data);
      assert(us2 == data);
    }

Is there anything I need to know about Unicode and the Windows API?
The Windows API includes both Unicode and ANSI variants of the functions that take strings. MessageBox, for example, comes in two forms:
- MessageBoxA – takes ANSI strings
- MessageBoxW – takes wide (UTF-16) strings

    #if defined(UNICODE)
      ::MessageBox(Handle, Edit1->Text.c_str(), // wchar_t*
                   _T("MessageBox - text from Edit1->Text.c_str()"), MB_OK);
    #else
      ::MessageBox(Handle, AnsiString(Edit1->Text).c_str(),
                   AnsiString("MessageBox").c_str(), MB_OK);
    #endif

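The choice between the two variants is made by the preprocessor. The following is a minimal, portable sketch of that mechanism and not the actual Windows headers: `ShowTextA`, `ShowTextW`, `ShowText`, and `TEXT_LITERAL` are hypothetical stand-ins for `MessageBoxA`, `MessageBoxW`, `MessageBox`, and `_T`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for an ANSI/Unicode function pair such as
// MessageBoxA/MessageBoxW. Each returns the number of characters it
// was given so the result can be checked.
inline std::size_t ShowTextA(const char* text)    { return std::strlen(text); }
inline std::size_t ShowTextW(const wchar_t* text) { return std::wcslen(text); }

// One generic name that the preprocessor maps to the A or W variant,
// depending on whether UNICODE is defined.
#ifdef UNICODE
  #define ShowText ShowTextW
  #define TEXT_LITERAL(s) L##s
#else
  #define ShowText ShowTextA
  #define TEXT_LITERAL(s) s
#endif

// This call site compiles unchanged in both configurations.
inline std::size_t ShowGreeting()
{
  return ShowText(TEXT_LITERAL("Hello"));
}

// Usage check: "Hello" has five characters in either configuration.
inline void CheckGreeting()
{
  assert(ShowGreeting() == 5);
}
```

Because the call site uses only the generic name, switching the build between ANSI and Unicode requires no source changes there.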
What new C and C++ data types are there for C-style strings?
- char16_t (example: u"Hello, World! \u263A") -> UTF-16
- same 16-bit code unit semantics as wchar_t on Windows
- char32_t (example: U"Hello, World! \u263A") -> UTF-32
- C++ Builder also has the AnsiStringT code page template, with which you can create your own AnsiString types
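As a small illustration of the difference between the two literal types, the sketch below uses only standard C++11. The variable names and the second code point, U+1F600, are choices made for this example and do not come from the original material.

```cpp
#include <cassert>
#include <string>

// U+263A lies in the Basic Multilingual Plane, so it needs a single
// UTF-16 code unit. U+1F600 lies outside it, so UTF-16 needs a surrogate
// pair, while UTF-32 still needs only one code unit.
const std::u16string smile16 = u"\u263A";
const std::u32string smile32 = U"\u263A";
const std::u16string grin16  = u"\U0001F600";
const std::u32string grin32  = U"\U0001F600";

// Usage check: size() counts code units, not characters.
inline void CheckCodeUnits()
{
  assert(smile16.size() == 1 && smile32.size() == 1);
  assert(grin16.size() == 2 && grin32.size() == 1);
}
```

The practical consequence is that the length of a UTF-16 string is not always the number of characters a user sees.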
Why should I use TCHAR.H?
If you are planning a complete Unicode migration as part of an upgrade to the latest versions of C++ Builder, character-width-agnostic code lets you prepare for the migration in your previous version of C++ Builder without breaking compilation.
Windows provides the tchar.h header file to help with this. Depending on whether the _UNICODE preprocessor macro is defined, this header defines the following macros for ANSI or Unicode builds.
- TCHAR, defined as
  - char for non-Unicode builds
  - wchar_t for Unicode builds
- _T, which is removed by the preprocessor for non-Unicode builds and is defined as L for Unicode builds
  - You write _T("Hello, world") and the preprocessor converts it to a char literal ("Hello, world") or a wchar_t literal (L"Hello, world") as appropriate
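A character-width-agnostic function written with these macros might look like the sketch below. `TextLength` and `kGreeting` are hypothetical names. The `#else` branch is an assumption added so that the sketch also builds where tchar.h is unavailable, and it mirrors only the non-Unicode mapping.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
  #include <tchar.h>   // provides _TCHAR, _T and _tcslen
#else
  // Assumed fallback for platforms without tchar.h (non-Unicode mapping only).
  #include <cstring>
  typedef char _TCHAR;
  #define _T(x) x
  #define _tcslen std::strlen
#endif

// Counts the characters before the terminator in either a char string
// or a wchar_t string, depending on how the build is configured.
inline std::size_t TextLength(const _TCHAR* text)
{
  return _tcslen(text);
}

// The literal becomes "Hello, world" or L"Hello, world" as appropriate.
const _TCHAR* const kGreeting = _T("Hello, world");

// Usage check: the greeting has 12 characters in either build.
inline void CheckTextLength()
{
  assert(TextLength(kGreeting) == 12);
}
```

Code written this way compiles in the older, non-Unicode project and keeps compiling after the "_TCHAR maps to" option is switched to wchar_t.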
Should I replace String with AnsiString?
One common question is whether you should simply replace every occurrence of String with AnsiString. It is better not to do that, and instead to use the two functions below. You can use them whenever one type is returned and the other type is required, and vice versa.
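
    // STR_CONV_BUF_SIZE is assumed to be defined elsewhere; its value is not shown here.
    // Both functions return a pointer to a static buffer, so each call overwrites
    // the previous result and neither function is thread-safe.
    wchar_t* __fastcall UnicodeOf(const char* c)
    {
      static wchar_t W[STR_CONV_BUF_SIZE];
      memset(W, 0, sizeof(W));
      // Pass one less than the buffer size so the result stays null-terminated.
      MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, c, (int)strlen(c), W, STR_CONV_BUF_SIZE - 1);
      return W;
    }

    char* __fastcall AnsiOf(wchar_t* w)
    {
      static char c[STR_CONV_BUF_SIZE];
      memset(c, 0, sizeof(c));
      // Pass one less than the buffer size so the result stays null-terminated.
      WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, w, (int)wcslen(w), c, STR_CONV_BUF_SIZE - 1, NULL, NULL);
      return c;
    }
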
These two functions wrap two Windows API functions. MultiByteToWideChar converts to UTF-16 from an encoding of your choice, and WideCharToMultiByte converts from UTF-16 to the encoding of your choice, for instance UTF-8 or any of the various ANSI encodings.
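To make the UTF-16 to UTF-8 direction concrete, here is a minimal, portable sketch of that conversion in standard C++. It is not how WideCharToMultiByte is implemented. `Utf16ToUtf8` is a hypothetical helper, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: encodes a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
inline std::string Utf16ToUtf8(const std::u16string& in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: combine it with the following low surrogate.
      if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
        ++i;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD; // low surrogate without a preceding high surrogate
    }

    if (cp < 0x80) {                       // 1 byte
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// Usage check: U+263A becomes the three bytes E2 98 BA.
inline void CheckUtf16ToUtf8()
{
  assert(Utf16ToUtf8(u"\u263A") == "\xE2\x98\xBA");
}
```

Unlike the fixed static buffers above, this version returns a std::string by value, so the result has no size limit and is not overwritten by the next call.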
How to load and save Unicode characters in files?
The LoadFromFile and SaveToFile methods now take an additional parameter of type TEncoding. The TEncoding class has static properties with which you can specify the encoding: ASCII, UTF-8, Unicode, or others.

    ListBox1->Items->SaveToFile("c:\\temp\\myText.txt", TEncoding::UTF8);
    if (FileExists("c:\\temp\\myText.txt")) {
      ListBox1->Items->LoadFromFile("c:\\temp\\myText.txt", TEncoding::UTF8);
    }
    else {
      ShowMessage("c:\\temp\\myText.txt does not exist!");
    }

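To show what choosing UTF-8 means at the byte level, the following portable sketch writes and reads UTF-8 text with standard C++ streams instead of the VCL. `SaveUtf8` and `LoadUtf8` are hypothetical helpers. The optional byte order mark (EF BB BF) handling is an assumption of the sketch and not a description of what TEncoding::UTF8 does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// The UTF-8 byte order mark (BOM), written as explicit bytes.
const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Hypothetical helper: writes lines that are already UTF-8 encoded,
// optionally preceded by the BOM.
inline void SaveUtf8(std::ostream& out, const std::vector<std::string>& lines,
                     bool writeBom)
{
  if (writeBom) out.write(kUtf8Bom, 3);
  for (std::size_t i = 0; i < lines.size(); ++i) out << lines[i] << '\n';
}

// Hypothetical helper: reads the lines back and strips a leading BOM.
inline std::vector<std::string> LoadUtf8(std::istream& in)
{
  std::vector<std::string> lines;
  std::string line;
  bool first = true;
  while (std::getline(in, line)) {
    if (first && line.compare(0, 3, kUtf8Bom) == 0) line.erase(0, 3);
    first = false;
    lines.push_back(line);
  }
  return lines;
}

// Usage check: a round trip through a stream returns the original lines.
inline void CheckRoundTrip()
{
  std::vector<std::string> lines(1, "\xE2\x98\xBA smile");
  std::stringstream stream;
  SaveUtf8(stream, lines, true);
  assert(LoadUtf8(stream) == lines);
}
```

The helpers take generic streams, so the same code works with a std::ofstream or std::ifstream opened in binary mode on a real file.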
Is there a video which shows how to migrate C++ apps to use Unicode?
You can learn more about migrating legacy C++ Builder projects to a newer version from this webinar:
Where can I learn more about how to migrate my C++ apps to Unicode?
We have only very briefly scratched the surface of Unicode conversion in this article. For much more comprehensive advice you can head over, explore, and learn about migrating to Unicode at the Migration and Upgrade Center.
Check out everything you need to know about String on C++ in this blog.