The recently released AMD Ryzen 9 5950x offers 16 cores and 32 threads, so let’s see what kind of performance we can get out of a parallel C++ compile with those 32 threads. At the time of this writing the AMD Ryzen 9 5950x has the highest single core CPU Benchmark score at around 3515. C++Builder is a rapid application development tool for building C++ Windows apps. It offers normal compilation, and the most recent version includes an add-on called TwineCompile that will use all 32 threads of the Ryzen 5950x powerhouse to compile multiple files in a C++ project simultaneously. In two previous posts we benchmarked the 5950x by compiling ~750k lines of code in Delphi and by building 300 native Windows apps in parallel in Delphi.
The project I used for testing the C++ parallel compile is a large C++ Windows app with 128 forms and, according to C++Builder, ~254,000 lines of C++. The forms are taken from the 50 project forms found in this C++ Cross Platform Samples repository. We reused each of the 50 forms two or three times to reach 128. Originally we built this project to benchmark the AMD Ryzen Threadripper 3990x, which has 64 cores and 128 threads. Once we had 128 forms in the project we added some generic C++ to each of the 128 units to bring them to over 1000 lines each. Keep in mind that each project is a different workload and results in your own projects may vary. Different C++ language features and project configurations can affect compile times.
The full specs on the AMD Ryzen 9 5950x benchmark machine are AMD Ryzen 9 5950x, 64GB DDR4 3200MHz RAM, 1TB NVMe SSD + 2TB HDD, NVIDIA GeForce RTX 3070 8GB, and Windows 10 Pro. To monitor the CPU and disk IO usage of C++Builder’s parallel compile I used Task Manager DeLuxe, or TMX (which is also built in Delphi). Task Manager DeLuxe is pretty amazing in the amount of information it provides about your Windows system. TMX is available from MiTeC, which also makes a wide variety of Delphi components that give you access to a lot of the same information found in TMX. Below is the 32 CPU thread view TMX provides. I took this screenshot during the normal synchronous C++Builder compile. You can see in the screenshot that the compile is really only using a single core at a time.
Next let’s take a look at the screenshot from Task Manager DeLuxe taken shortly after the C++ parallel compilation using TwineCompile in C++Builder. You can see in this screenshot that the compilation used all 32 threads. TMX also provides a handy CPU clock speed monitor as the AMD Ryzen 9 5950x turbo boosts up to 4.9 GHz (only 4.2 GHz in the screenshot). One interesting thing to note here is that because the turbo boosting from 3.9 GHz to 4.9 GHz is not consistent, the benchmarks change by a few seconds each run.
If you want to hear more about the AMD Ryzen 9 5950x CPU architecture AMD has a great video where they explain the Zen 3 architecture.
Let’s get to the comparison of the numbers. There are a number of different kinds of builds that can be done in C++Builder. These include a Debug build (-O0) and a Release build. On the Release build different optimization flags can be selected (-O1, -O2, and -O3). Each flag has a different optimization target. -O1 generates the smallest possible code, -O2 generates the fastest possible code, and -O3 generates the most optimized code. According to Embarcadero, -O3 gives speed improvements of up to twice the performance of -O2.
With the normal compile, the Debug build is the fastest of the four optimization levels: the Release builds took up to a minute longer. With the parallel compile the build was so fast in both Debug and Release mode that it hardly mattered, as all the times are close together. The first chart here is the Normal C++ Debug build (-O0) coming in at 396 seconds versus the Parallel C++ Debug build (-O0) coming in at 33 seconds (12X faster!). If we run the numbers on lines of code per second we get around 7,696 lines of code per second using the parallel TwineCompile for -O0. The normal synchronous -O0 Debug build comes in at 641 lines per second.
In the second chart we have the Normal C++ Release build (-O1) coming in at 404 seconds versus the Parallel C++ Release build (-O1) coming in at 32 seconds (~12X faster!). The parallel build time varies depending on the current speed of the turbo boost (anywhere between 3.9 GHz and 4.9 GHz). If we run the numbers on lines of code per second we get around 7,937 lines of code per second using the parallel TwineCompile for -O1. The normal synchronous -O1 build comes in at 628 lines per second.
In the third chart we have the Normal C++ Release build (-O2) coming in at 449 seconds versus the Parallel C++ Release build (-O2) coming in at 37 seconds (~12X faster!). The parallel build time varies depending on the current speed of the turbo boost (anywhere between 3.9 GHz and 4.9 GHz). If we run the numbers on lines of code per second we get around 6,864 lines of code per second using the parallel TwineCompile for -O2. The normal synchronous -O2 build comes in at 565 lines per second.
In the fourth and final chart we have the Normal C++ Release build (-O3) coming in at 450 seconds versus the Parallel C++ Release build (-O3) coming in at 36 seconds (~12X faster!). The parallel build time varies depending on the current speed of the turbo boost (anywhere between 3.9 GHz and 4.9 GHz). I saw between 36 seconds and 40 seconds here. If we run the numbers on lines of code per second we get around 7,055 lines of code per second using the parallel TwineCompile for -O3. The normal synchronous -O3 build comes in at 564 lines per second.
Suffice it to say the productivity boost from parallel compilation is significant. Being able to compile a large C++ app in around 30 seconds allows you to iterate faster (similar to the iteration speed you get in Delphi). I consider 128 forms and ~254k lines of code to be a large Windows project. It certainly is not a small project (2-3 forms) and it certainly is not a massive project (millions and millions of lines of code).
Now let’s compare the Delphi 10.4.1 compiler to the C++Builder parallel compile. In our first blog post in this series, the AMD Ryzen 9 5950x compiled generics-heavy Object Pascal code at around 61,500 lines per second, which extrapolates to 1 million lines of generics-heavy Object Pascal code in about 16 seconds. The fastest C++Builder parallel build (-O1) compiles 7,937 lines of code per second, which extrapolates to 1 million lines of C++ in ~126 seconds. The same -O1 build compiled synchronously managed 628 lines of code per second, which extrapolates to 1 million lines of C++ code in about 1,592 seconds. As you can see, the parallel compile is roughly an order of magnitude (about 12X) faster than the normal compile. Delphi is still around 8X faster per line, but the gap is far smaller than before. C++Builder with parallel compilation through TwineCompile on modern hardware can bring you close to the productivity of Delphi with the speed and power of C++ for your Windows applications.
Modern hardware like the AMD Ryzen 9 5950x is great with its 16 cores and 32 threads, but the Ryzen 9 5950x CPU is actually difficult to get hold of at the moment. What about using TwineCompile on an older machine? I have been using an i7-3770 with 4 cores and 8 threads as my daily driver for the last 8 years. The specs on this machine are roughly an Intel i7-3770, 16GB RAM, 1TB SSD, Windows 10 Home. Its CPU benchmark score for a single thread is 2069 vs. 3515 on the 5950x. The only upgrade I really made to it in 8 years was putting in a Samsung 860 EVO 1TB SSD, and that has made a huge difference to compile times. I used Task Manager DeLuxe again and took screenshots of the normal compile and the parallel compile on the i7-3770 8 thread machine. First we will show a normal compile in C++Builder. As you will see in the screenshot, it is only using around 30% of the CPU to compile the C++ code.
Next let’s take a look at the i7-3770 machine again, this time with C++Builder parallel compiling the same 128 form project of around 254,000 lines of code. As you will see, this time it is hitting all 4 cores and 8 threads and using the full power of the machine to compile.
Let’s see some numbers out of this machine when compiling the same 128 form C++Builder project synchronously and in parallel. The first chart here is the Normal C++ Debug build (-O0) coming in at 1023 seconds versus the Parallel C++ Debug build (-O0) coming in at 170 seconds (6X faster!). If we run the numbers on lines of code per second we get around 1,494 lines of code per second using the parallel TwineCompile for -O0. The normal synchronous -O0 Debug build comes in at 248 lines per second.
The second chart here is the Normal C++ Release build (-O2) coming in at 935 seconds versus the Parallel C++ Release build (-O2) coming in at 142 seconds (~6X faster!). If we run the numbers on lines of code per second we get around 1,788 lines of code per second using the parallel TwineCompile for -O2. The normal synchronous -O2 Release build comes in at 271 lines per second. One interesting thing I see here is that on the AMD Ryzen 9 5950x machine the Debug builds were faster than the Release builds, whereas on the older machine the Debug builds are slower. I don’t have any hard numbers, but I would guess this might be due to Debug builds being larger than Release builds, so the speed of the solid state drive comes into play.
As you can see, even on older hardware the C++Builder parallel compile provides a HUGE productivity boost with much faster compile times. If you have an older machine and you aren’t running an SSD like the Samsung 860 EVO, that is an easy upgrade to achieve much better performance over a normal hard drive. Or if you are running an older machine which is not at least quad core, you can pick up older quad core machines for relatively low cost.
In any event, regardless of the hardware you are running (as long as it has at least 2 cores) you will see a significant compile time boost for your C++ projects when using the latest C++Builder with parallel compilation through TwineCompile. In this blog post we benchmarked the latest AMD Ryzen 9 5950x with its 16 cores and 32 threads and have conclusively shown that it can make a huge difference for increasing your productivity through iteration speed. A relatively large Windows C++ project with 128 forms and over 254,000 lines of code can be compiled in around 30-40 seconds through parallel compilation with 16 cores and 32 threads. That is incredible. An older machine using normal synchronous compilation took anywhere between ~15 minutes and ~17 minutes for the same project!
Learn more about the C++ programming language and similar techniques in this article.
Now is a great time to be a C++ developer for building Windows (and iOS) apps in C++. We’ve seen how a single core on older hardware might take 60 minutes to compile a C++ project with 1 million lines of code which now only takes ~2 minutes using parallel compilation on modern hardware! Parallel compilation brings much needed productivity to C++ development without sacrificing the speed and power of C++’s runtime performance. C++Builder 10.4.1+ is the tool that can get you there.
Find out more about the C++Builder compiler over in the Embarcadero DocWiki.
Don’t have the latest version of C++Builder yet? Take a look.