Embarcadero RAD Studio, Delphi, & C++Builder Blogs

Threadripper 3990X: The Quest To Compile 1 BILLION Lines Of C++ On 64 Cores


RAD Studio is made up of Delphi and C++Builder. On the Delphi side, the Object Pascal compiler is a single-pass compiler and is not itself a parallel compiler. However, by compiling multiple projects in parallel, Delphi was able to compile 1 billion lines of Object Pascal code in about 5 minutes on a 16-core AMD Ryzen 9 5950X machine. I wanted to see if something similar was possible with C++. This post is part of our modern hardware series, where we explore the massive productivity gains that can be achieved on some of the fastest CPUs available at the time of writing in early 2021. Just how much is 1 billion lines of code? Take a look.

Parallel Compilation In C++Builder

C++Builder includes a number of different compilers, including the classic Borland compiler and modern Clang-based compilers for several platforms. Additionally, Embarcadero sponsors the open source Dev-C++, which bundles the TDM-GCC 9.2.0 compiler. TDM-GCC 9.2.0 ships with MAKE, which supports parallel compilation through its -j (jobs) command line switch. C++Builder has an add-on called TwineCompile that brings parallel compilation to C++Builder. Both C++Builder and Dev-C++ are built with Delphi.
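Conceptually, MAKE's -j switch caps how many independent jobs (typically one compiler invocation per source file) run at the same time. The C++ sketch below illustrates that idea with threads inside a single process. It is an illustration under that assumption, not how GNU make is implemented (make launches separate processes), and the function name RunJobsInParallel is hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper illustrating the idea behind "make -jN": run independent
// jobs (e.g. one compile per source file) with at most maxJobs in flight.
// Returns the highest number of jobs observed running at the same time.
int RunJobsInParallel(const std::vector<std::function<void()>>& jobs, int maxJobs) {
    std::atomic<std::size_t> next{0};  // index of the next job to hand out
    std::atomic<int> running{0};       // jobs currently executing
    std::atomic<int> peak{0};          // highest value 'running' has reached

    auto worker = [&] {
        for (std::size_t i = next++; i < jobs.size(); i = next++) {
            int now = ++running;
            int seen = peak.load();
            // Record a new peak without losing a concurrent update.
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {
            }
            jobs[i]();
            --running;
        }
    };

    // Never start more workers than there are jobs, and always at least one.
    int workers = std::max(1, std::min(maxJobs, static_cast<int>(jobs.size())));
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return peak.load();
}
```

With maxJobs set to 1 this degenerates to a serial build, which is roughly what MAKE does when -j is not given.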

During my investigations so far, TwineCompile seems to offer more functionality than MAKE jobs because it supports background compilation and some other productivity-enhancing features. Features like background compilation are up to the IDE to provide: Dev-C++ does not support them at this time, while C++Builder does through TwineCompile. Dev-C++ is a great native C++ IDE for Windows development, and C++Builder turns productivity up to the max with its visual designer, powerful built-in VCL and RTL, and enhanced parallel compilation features. The two IDEs are also built around different C++ compilers, so this is not entirely a direct comparison, and they really complement each other.

Third-party benchmarks (not the project in this blog post) for a 3990X with TwineCompile:

Parallel Compilation In Dev-C++

At the beginning of this quest, Dev-C++ did not support the MAKE -j flag, so that was the first task to complete. I updated Dev-C++ and released the new v6.3 version with parallel compilation (-j) built in as an option. It is also on by default for release builds, which should greatly reduce compile times for everyone using Dev-C++. An IDE update was required because the -j flag has to be passed to MAKE rather than added to the compiler command line. It took a few days to implement this and release v6.3. Bundled in with this release are all the bug fixes from the last two months and a second new feature that lets you select custom embedded console apps. Here are the release notes for Dev-C++ v6.3:

Version 6.3 – 30 January 2021
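The -j change matters because -j is an option to MAKE, which decides how many compiler processes to run, and not a g++ option. The C++ sketch below builds a MAKE command line with an optional job count. The helper name BuildMakeCommand, the mingw32-make.exe executable name, and the Makefile.win file name are assumptions for illustration; Dev-C++ itself is written in Delphi, so this is not its actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the -j switch is appended to the MAKE invocation,
// never to the per-file g++ flags that MAKE passes to the compiler.
// jobs == 0 means "no -j", i.e. a serial build.
std::string BuildMakeCommand(const std::string& makeExe,
                             const std::string& makefile,
                             int jobs) {
    std::string cmd = makeExe + " -f \"" + makefile + "\"";
    if (jobs > 0) {
        cmd += " -j" + std::to_string(jobs);  // bounded number of parallel jobs
    }
    cmd += " all";
    return cmd;
}
```

For example, BuildMakeCommand("mingw32-make.exe", "Makefile.win", 128) yields mingw32-make.exe -f "Makefile.win" -j128 all.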

Once I had a Dev-C++ IDE that could parallel compile the 1 billion lines of C++, I needed the actual AMD Threadripper 3990X with its 64 cores and 128 threads. The Threadripper scores lower per core on PassMark than the 5950X, but because it has more cores it has a higher overall score. The screenshot below from PassMark compares the two CPUs. As you can see, the 5950X has a single-thread score of 3491 while the 3990X scores 2553. However, the 3990X currently has an overall multi-core score of 80752, while the 5950X scores 46045.

Note: The video doesn’t mention the larger 64-core 3990X Threadripper used in this blog post.
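Putting the PassMark figures quoted above side by side gives the following ratios. This uses only those four scores plus the core counts of 64 and 16; the per-core figure is a simple division for comparison, not a PassMark metric.

```latex
\frac{80752}{46045} \approx 1.75 \;\text{(multi-core)}, \qquad
\frac{2553}{3491} \approx 0.73 \;\text{(single-thread)}, \qquad
\frac{80752/64}{46045/16} \approx \frac{1261.8}{2877.8} \approx 0.44 \;\text{(multi-core score per core)}
```

In other words, 4X the cores yields roughly 1.75X the multi-core score on this benchmark.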

ReliableSite.net offers cloud-based AMD Threadripper 3990X machines with 256GB of RAM, which fit the needs of this project. They offer two different Windows setups: Windows Server 2016 Standard and Windows Server 2019 Standard. I selected Windows Server 2016, and they tried to install the machine with that OS, but for whatever reason they were unable to, most likely due to a licensing issue with how Microsoft licenses CPUs and cores in Windows Server 2016 Standard. In any event, they switched it to Windows Server 2019 Standard and it installed just fine.

At this point we are up and running on the Threadripper Windows Server 2019 machine with C++Builder and TwineCompile, plus Dev-C++ v6.3 with its new built-in parallel compilation support. Everything tests out and runs great. C++Builder compiles the 1 million lines of C++ from a previous post 4X faster than the 5950X did, and Delphi compiles the 1 billion lines of Object Pascal projects 2.5X faster as well. We’ll leave those two comparisons for another post.

One of the tools I use in the modern hardware posts to gauge CPU usage is Task Manager DeLuxe (TMX) from MiTeC. Task Manager DeLuxe provides an amazing amount of information about your Windows system, and it features dark (very 2021) and light modes. MiTeC also makes a wide variety of Delphi components that give you access to much of the same information found in TMX; most of it is probably available for use in your own app through the MiTeC System Information Component Suite.

When I first loaded Task Manager DeLuxe on the 64-core Threadripper 3990X machine, it could not display the individual CPU graphs and threw an error. I have a commercial license for Task Manager DeLuxe, so I emailed Michal at MiTeC, and he solved the issue very quickly. He released a new version of Task Manager DeLuxe that loads and runs great on the 64-core machine.

The next task was to create the 1 billion line C++ project itself so we could compile it. I started with this Scimark2 project for Dev-C++ and developed a Delphi app to quickly generate the required number of lines of code. In the end I wanted to be able to actually run the application built from the 1 billion lines of C++. The Delphi app takes the LU.c and LU.h files and duplicates the last function, LU_factor(), as many times as needed to reach the desired number of lines. The function itself is 69 lines long, and to avoid name collisions each generated function’s name includes a file number and an iterator number.
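The post does not show the Delphi generator, so the following is only a C++ sketch of the same idea under stated assumptions: a function template containing a hypothetical @FUNC@ placeholder is copied repeatedly, and each copy is renamed using the file number and iteration number. The names GeneratedName, CopiesNeeded, and GenerateFileBody, and the LU_factor_<file>_<iter> naming pattern, are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical placeholder token marking where the function name goes.
const std::string kNameToken = "@FUNC@";

// Unique name per copy: file number + iteration number avoids collisions
// when many copies of LU_factor() are linked into one executable.
std::string GeneratedName(int fileNo, int iter) {
    return "LU_factor_" + std::to_string(fileNo) + "_" + std::to_string(iter);
}

// Copies of a linesPerCopy-line function needed to reach at least
// targetLines lines (rounded up, so the result may slightly overshoot).
long long CopiesNeeded(long long targetLines, long long linesPerCopy = 69) {
    return (targetLines + linesPerCopy - 1) / linesPerCopy;
}

// Emits 'copies' renamed copies of the template for one source file.
std::string GenerateFileBody(const std::string& functionTemplate, int fileNo, long long copies) {
    std::string out;
    for (long long i = 0; i < copies; ++i) {
        std::string fn = functionTemplate;
        const std::string name = GeneratedName(fileNo, static_cast<int>(i));
        // Replace every occurrence of the token in this copy.
        for (std::size_t pos = fn.find(kNameToken); pos != std::string::npos;
             pos = fn.find(kNameToken, pos + name.size())) {
            fn.replace(pos, kNameToken.size(), name);
        }
        out += fn;
    }
    return out;
}
```

For a 31,250-line file, CopiesNeeded(31250) asks for 453 copies of the 69-line function.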

I tried a number of different ways to slice up the C++ project: more files with fewer lines each, or fewer files with more lines each. (For the Delphi version, I used 250 projects with 4 million lines each.) For the C++ project, one slicing was 32,000 files with 31,250 lines per file. I arrived at this number through some testing: Dev-C++ seemed to do better with smaller files, more files keep more cores busy, and a large number of smaller files more closely mimics a real project. A second slicing was 10,666 files with 93,750 lines per file, and a third was 1,000 files with 1,000,000 lines of C++ per file. After the files are generated, the list of files is added to the Dev-C++ project file, which means Dev-C++ has to load that list into its project list.
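A rough sketch of the project-file bookkeeping, assuming the INI-style .dev layout Dev-C++ uses: a UnitCount key in the [Project] section and one [UnitN] section per source file. Only a few keys are written here, and the gen_<n>.cpp file names and the helper names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One [UnitN] section per generated file (assumed .dev layout, subset of keys).
std::string UnitSections(int fileCount) {
    std::ostringstream ini;
    for (int n = 1; n <= fileCount; ++n) {
        ini << "[Unit" << n << "]\n"
            << "FileName=gen_" << n << ".cpp\n"
            << "Compile=1\n"
            << "Link=1\n\n";
    }
    return ini.str();
}

// Line to place in the [Project] section so the IDE knows how many units exist.
std::string ProjectUnitCountLine(int fileCount) {
    return "UnitCount=" + std::to_string(fileCount);
}

// Lines per file when a fixed total is split across a number of files.
long long LinesPerFile(long long totalLines, long long files) {
    return totalLines / files;
}
```

The slicings above check out: 32,000 × 31,250 and 1,000 × 1,000,000 are exactly 1 billion lines, while 10,666 × 93,750 is 999,937,500, just under 1 billion.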

One bottleneck I discovered here is that Dev-C++ has code completion and symbol completion. These features parse the files in the project when it is opened, and suffice it to say they are not parallelized yet. Dev-C++ does eventually load the project, but it takes a while to handle the 32,000 files (and even the 10,666 files). Once I figured this out, I disabled code completion and symbol completion, which lets the 1 billion line C++ project load quickly. Dev-C++ doesn’t seem to have any trouble editing a file with 1 million lines, and it feels pretty snappy.

A second issue I ran into is that Delphi’s System.CPUCount reports 32 instead of the 128 threads. 32 might have been enough when System.CPUCount was written, but we’re way beyond that now. On the 5950X, which has 16 cores and 32 threads, it works great, but on the 3990X the value is incorrect. I reported this issue to the Embarcadero Quality Portal; in the meantime, the third-party NumCPULib4Pascal library should report the correct value. For now, I made a custom build of the Dev-C++ executable with 128 threads hard-coded.
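The post does not say why System.CPUCount returned 32. One plausible factor, and it is only an assumption here, is that Windows organizes logical processors into processor groups of up to 64, so APIs that only see the calling thread's group undercount a 128-thread CPU. The C++ sketch below asks Windows for the count across all groups with GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) and falls back to std::thread::hardware_concurrency() elsewhere. The helper names and the override parameter (mirroring the hard-coded 128) are hypothetical.

```cpp
#include <cassert>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Logical processors across all processor groups on Windows; portable
// fallback elsewhere. Never returns 0.
unsigned LogicalProcessorCount() {
#if defined(_WIN32) && defined(ALL_PROCESSOR_GROUPS)
    // Requires Windows 7 or later; counts every group, not just the caller's.
    const DWORD allGroups = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    if (allGroups > 0) return static_cast<unsigned>(allGroups);
#endif
    const unsigned hc = std::thread::hardware_concurrency();  // 0 means "unknown"
    return hc > 0 ? hc : 1;
}

// Job count for MAKE: an explicit override (like the hard-coded 128 in the
// custom Dev-C++ build) wins; otherwise use the detected count.
unsigned MakeJobCount(unsigned overrideJobs = 0) {
    return overrideJobs > 0 ? overrideJobs : LogicalProcessorCount();
}
```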

We’re almost ready for the 1 billion line compile! We have the hardware, the IDE, and the compilers in place, and we have the project sliced several different ways. Throughout the whole process I compiled different-size versions of the 1 billion line C++ project to track down and correct each of the issues mentioned above.

Let’s start with the 1 billion line project split into 32,000 files with 31,250 lines each. This project compiles and uses all of the cores as it should, but when it gets to linking the 32,000 object files into the executable, it stalls out: there are too many files to pass to the linker on a single command line. On Windows, the maximum length of the command line passed to CreateProcess is 32,767 characters, a limit that stems from the USHORT length field of the UNICODE_STRING structure used by the Windows API. The second project, with 10,666 files and 93,750 lines per file, also compiles but fails for the same reason.
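A quick way to see why 32,000 object files cannot fit: even at 8 characters per file name plus a separating space, the names alone need 288,000 characters, far beyond 32,767. The C++ sketch below estimates the length of a linker command line and checks it against that limit; the helper names and object file names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CreateProcess documents a 32,767-character maximum for the command line,
// including the terminating null character.
constexpr std::size_t kMaxCommandLineChars = 32767;

// Length of "<base> <obj1> <obj2> ..." plus the terminating null.
std::size_t CommandLineLength(const std::string& base, const std::vector<std::string>& objects) {
    std::size_t len = base.size();
    for (const auto& obj : objects) len += 1 + obj.size();  // space + file name
    return len + 1;                                         // terminating null
}

bool FitsOnCommandLine(const std::string& base, const std::vector<std::string>& objects) {
    return CommandLineLength(base, objects) <= kMaxCommandLineChars;
}
```

GCC and GNU ld can also read arguments from a response file passed as @file, which is a common way around this limit; the post does not say whether that was tried.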

The third project, with 1,000 files and 1,000,000 lines per file, compiles as well but more slowly, and it does not use all 128 hardware threads during compilation. Whether I select -j64, -j128, or -j (automatic) for MAKE, only around 34 of the 64 cores really fire up, even though MAKE does launch 64 g++ processes. It uses 81GB of RAM during this process, so it’s a good thing the machine has 256GB. Once all of the files compile, the build gets past the command line limit, but the linker itself crashes with an error while combining all of the object files into the executable. So far, the various command line arguments suggested on StackOverflow have not solved the issue.
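Memory is one of the things that can cap useful parallelism, so here is a rough C++ sketch of that arithmetic. It assumes the per-job memory can be estimated as observed peak RAM divided by the number of g++ processes (81GB / 64 for this run, and 156GB / 64 for the 500-file run, assuming it also ran 64 processes); the helper name JobsThatFit is hypothetical. By this estimate RAM alone would not hold the 1,000-file build below 64 jobs on a 256GB machine.

```cpp
#include <algorithm>
#include <cassert>

// How many compile jobs fit in RAM, capped by the number of hardware threads.
// perJobGB is an estimate, e.g. observed peak RAM / number of g++ processes.
int JobsThatFit(double totalRamGB, double perJobGB, int hardwareThreads) {
    if (perJobGB <= 0.0) return hardwareThreads;
    int byRam = static_cast<int>(totalRamGB / perJobGB);  // whole jobs only
    return std::max(1, std::min(byRam, hardwareThreads));
}
```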


After some more testing, it appears that the 2GB executable size limit (regardless of whether -mcmodel=medium or -mcmodel=large is used) is the roadblock behind the linker error. I was able to compile 100 files of 1,000,000 lines each, which generated a ~1.1GB executable. I then started using -Os (which optimizes for size), and that moved the project forward quite a bit. One interesting observation: the larger the executable, the lower the Scimark2 score. The first successful 1 billion line compile, using 1,000 files of 1,000,000 lines each and -Os, generated a 359MB executable in 1483 seconds (24.7 minutes). I also tried 500 files with 2,000,000 lines each, and that actually took longer. The default Scimark2 project runs 4X faster than the larger executable built with the extra 1 billion lines of code and -Os.

The 500-file project with 2,000,000 lines of code per file used up to 156GB of RAM but not all 64 cores

I don’t feel this compile time accurately represents the 3990X Threadripper, because with the 1 million and 2 million line file sizes not all of the cores are being used. I don’t know whether this is an issue with MAKE and g++, with the automatic -j setting that selects the number of cores to use, or an IO bottleneck the machine cannot keep up with. The smaller the files, the more cores the MAKE/g++ -j combination uses. I’ve also tried it with and without the -pipe flag, which uses pipes rather than temporary files between the stages of compilation. Interestingly, TwineCompile in C++Builder doesn’t seem to have the same limitation: when using it to parallel compile, all of the cores fire up instantly.

Quad Compile

In an effort to get a faster compile time on 1 billion lines of C++ code, I loaded up 4 instances of Dev-C++, each with a project of 250 files of 1,000,000 lines each, and compiled all 4 projects at the same time. This is similar to the 1 billion lines of Object Pascal code, where 250 projects with 4 million lines of code each were compiled. And here are the results of the quad compile.

Quad Instances Of Dev-C++
Note: On this screenshot, a bug shows 32 cores and 64 threads; it should read 64 cores and 128 threads.

Compilation results…

1 billion lines of C++ code in 15 minutes on the AMD 3990X Threadripper.
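Using only the two timings reported in this post (1483 seconds for the single-instance 1,000-file -Os build, and roughly 15 minutes, taken here as 900 seconds, for the quad compile), the throughput works out as follows. Because 15 minutes is a rounded figure, the speedup is approximate.

```latex
\frac{10^9\ \text{lines}}{1483\ \text{s}} \approx 6.7 \times 10^5\ \text{lines/s}, \qquad
\frac{10^9\ \text{lines}}{900\ \text{s}} \approx 1.1 \times 10^6\ \text{lines/s}, \qquad
\frac{1483\ \text{s}}{900\ \text{s}} \approx 1.65
```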

This project was a lot of fun, and there are all kinds of C++ flags for the TDM-GCC compiler, like -mtune=native and -mtune=znver2, that I haven’t tried with this setup (-mtune=znver3 targets Zen 3 CPUs and requires a newer GCC than 9.2). As we’ve seen in this post, software support for a modern machine with 64 cores and 128 threads is still getting the kinks worked out, but it generally works and delivers some serious computing power. C++Builder with TwineCompile is a powerful productivity solution for multi-core machines: it compiles 1 million lines of code very quickly and, thanks to its deep IDE integration, can work better than the MAKE/GCC parallel jobs feature. The open source Dev-C++ is a pretty powerful solution for large code bases and large files, and it could take more advantage of the Parallel Programming Library in Delphi. I was impressed by how responsive the Dev-C++ UI is during parallel compilation.

Ready to get started building Windows apps in C++? Try out C++Builder or Dev-C++ from the links below. You can also learn more about C++ and read other modern hardware articles featuring the 16-core 5950X machine below.

Additional Reading…

Learn more about programming with C++ on LearnCPlusPlus.org

AMD Ryzen 9 5950x Powerhouse Compiles Three Fourths Of A Million Lines Of Delphi Code In 12 Seconds

Parallel Compiling 300 Native Windows Apps In 45 Seconds With Delphi On An AMD Ryzen 9 5950x

Compile 1 Million Lines Of C++ In ~2 Minutes With A 16 Core Ryzen 9 5950x

Ryzen 9 5950x: One Billion Lines Of Delphi Code Compiled In ~5 Minutes On 16 Cores

Ryzen 9 5950x: Parallel Compile 124 Windows C++ Projects In ~1 Minute With 16 Cores

Get Two Powerful C++ Parallel Compilation IDEs

Download C++Builder for some massive productivity gains on multi-core machines.

Download Dev-C++ for a more basic C++ experience but now with powerful parallel compilation support.

Check out the full source code for Dev-C++ built in Embarcadero Delphi.

Take a look at Delphi as well, since it is used to build both Dev-C++ and C++Builder.
