If you are programming in C++ using Microsoft tools, you can use the traditional Visual Studio compiler. Or you can use LLVM as a front-end (ClangCL).
Let us compare their performance characteristics with a fast string transcoding library (simdutf). I use an up-to-date Visual Studio (2022) with the latest ClangCL component (based on LLVM 15). For building the library, we use the latest version of CMake. I will abstain from parallelizing the build: I use default settings. Hardware-wise, I use a Microsoft Surface Laptop Studio: it has a Tiger Lake Intel processor (i7-11370 @ 3.3 GHz).
After grabbing the simdutf library from GitHub, I prepare the build directory for standard Visual Studio:
> cmake -B buildvc
I do the same for ClangCL:
> cmake -B buildclangcl -T ClangCL
You may also build directly with LLVM:
> cmake -B buildfastclang -D CMAKE_LINKER="lld" -D CMAKE_CXX_COMPILER=clang++ -D CMAKE_C_COMPILER=clang -D CMAKE_RC_COMPILER=llvm-rc
For each build directory, I can build in Debug mode (--config Debug) or in Release mode (--config Release) with commands such as
> cmake --build buildvc --config Debug
The project builds an extensive test suite by default. I often rely on my Apple MacBook, and I build a lot of software using Amazon (AWS) nodes. I use an AWS c6i.large node (Intel Ice Lake running at 3.5 GHz, 2 vCPU).
The simdutf library and its testing suite build in a reasonable time as illustrated by the following table (Release builds). For comparison purposes, I also build the library using ‘WSL’ on the Microsoft laptop (Windows Subsystem for Linux).
|System||Processor||Compiler||Build time|
|MacBook Air||ARM M2 processor||LLVM 14||25 s|
|AWS/Linux||Intel Ice Lake processor||GCC 11||54 s|
|AWS/Linux||Intel Ice Lake processor||LLVM 14||54 s|
|WSL (Microsoft Laptop)||Intel Tiger Lake processor||GCC 11||1 min*|
|WSL (Microsoft Laptop)||Intel Tiger Lake processor||LLVM 14||1 min*|
On Intel processors, we build multiple kernels to support the various families of processors. On a 64-bit ARM processor, we only build one kernel, so there is less code to compile. Thus, once you account for the extra kernels, the build performance of the AWS/Linux system and the MacBook is somewhat comparable.
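To illustrate why several kernels mean more code to build, here is a minimal sketch of runtime kernel selection. It is not simdutf's actual code: the feature constants, the `kernel` struct and the `select_kernel` function are hypothetical names, and the kernel list is only illustrative (the `icelake` name is the one that appears in the benchmark command later in this post). A real library would query the processor instead of receiving the feature set as a parameter.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits. A real library would detect these at runtime
// by querying the processor (e.g., with CPUID on x64).
constexpr uint32_t FEATURE_NONE = 0;
constexpr uint32_t FEATURE_SSE42 = 1u << 0;
constexpr uint32_t FEATURE_AVX2 = 1u << 1;
constexpr uint32_t FEATURE_AVX512 = 1u << 2;

// One entry per compiled kernel. Each kernel is a separate copy of the
// algorithms, so more kernels means more code for the compiler to process.
struct kernel {
  std::string name;
  uint32_t required_features;
};

// Illustrative list, ordered from most capable to least capable.
inline std::vector<kernel> example_kernels() {
  return {{"icelake", FEATURE_SSE42 | FEATURE_AVX2 | FEATURE_AVX512},
          {"haswell", FEATURE_SSE42 | FEATURE_AVX2},
          {"westmere", FEATURE_SSE42},
          {"fallback", FEATURE_NONE}};
}

// Return the first kernel whose requirements are all met by the processor.
inline std::string select_kernel(uint32_t supported_features) {
  for (const kernel &k : example_kernels()) {
    if ((k.required_features & supported_features) == k.required_features) {
      return k.name;
    }
  }
  return "fallback";
}
```

Under these assumptions, `select_kernel(FEATURE_SSE42 | FEATURE_AVX2)` returns `"haswell"`. On a 64-bit ARM build, the list would contain a single entry, which is consistent with the shorter build.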
Let us switch back to Windows and build the library.
|Compiler||Debug||Release|
|Visual Studio (default)||2 min||2 min 15 s|
|ClangCL||2 min 51 s||3 min 12 s|
|Windows LLVM (direct with lld)||2 min||2 min 4 s|
Let us run an execution benchmark. We pick a UTF-8 Arabic file that we load in memory and that we transcode to UTF-16 using a fast AVX-512 algorithm. (The exact command is benchmark -P convert_utf8_to_utf16+icelake -F Arabic-Lipsum.utf8.txt).
|Build||Debug||Release|
|Visual Studio (default)||0.789 GB/s||4.2 GB/s|
|ClangCL||0.360 GB/s||5.9 GB/s|
|WSL GCC 11||(omitted)||6.3 GB/s|
|WSL LLVM 14||(omitted)||5.9 GB/s|
|AWS Server (GCC)||(omitted)||8.2 GB/s|
|AWS Server (clang)||(omitted)||7.7 GB/s|
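For readers who want to see how a GB/s figure like the ones above can be obtained, here is a minimal sketch. It is not the simdutf benchmark tool: `widen_ascii` is a hypothetical stand-in for a real transcoder (it is only correct for ASCII input), and keeping the fastest of several runs is an assumption on my part, one common convention among others.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a transcoder: widens each byte to a 16-bit unit.
// This is only a valid UTF-8 to UTF-16 conversion for pure ASCII input.
inline size_t widen_ascii(const char *in, size_t len, char16_t *out) {
  for (size_t i = 0; i < len; i++) {
    out[i] = static_cast<char16_t>(static_cast<unsigned char>(in[i]));
  }
  return len;
}

// One byte per nanosecond is one gigabyte (1e9 bytes) per second.
inline double gigabytes_per_second(size_t bytes, double nanoseconds) {
  return static_cast<double>(bytes) / nanoseconds;
}

// Time the conversion `repeats` times and report the best throughput.
inline double measure_throughput(const std::string &input, int repeats) {
  std::vector<char16_t> output(input.size());
  double best_ns = 0;
  for (int r = 0; r < repeats; r++) {
    auto start = std::chrono::steady_clock::now();
    size_t written = widen_ascii(input.data(), input.size(), output.data());
    auto stop = std::chrono::steady_clock::now();
    // Read the output so the compiler cannot discard the conversion.
    volatile char16_t sink = output.empty() ? char16_t(0) : output.back();
    (void)sink;
    if (written != input.size()) { return 0; }
    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    if (r == 0 || ns < best_ns) { best_ns = ns; }
  }
  return best_ns > 0 ? gigabytes_per_second(input.size(), best_ns) : 0;
}
```

Calling `measure_throughput(data, 10)` on a file loaded into a `std::string` gives a number in the same unit as the table. The input size in bytes is what gets divided by the elapsed time, not the output size.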
I draw the following tentative conclusions:
- There may be a significant performance difference between Debug and Release code (e.g., between 5x and 16x in my test).
- Compiling your Windows software with ClangCL may lead to better performance (in Release mode). In my test, I get a 40% speedup with ClangCL (5.9 GB/s versus 4.2 GB/s). However, compiling with ClangCL takes much longer. I have recommended that Windows users build their libraries with ClangCL and I maintain this recommendation.
- In Debug mode, the regular Visual Studio produces more performant code and it compiles faster than ClangCL.
Thus it might make sense to use the regular Visual Studio compiler in Debug mode as it builds fast and offers other benefits while testing the code, and then ClangCL in Release mode for the performance.
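The ratios behind these conclusions can be recomputed from the throughput table. The small sketch below does the arithmetic; the function name `speed_ratio` is my own, and the numbers are copied from the table.

```cpp
#include <cstdio>

// How many times faster `fast_gbps` is than `slow_gbps` (both in GB/s).
inline double speed_ratio(double fast_gbps, double slow_gbps) {
  return fast_gbps / slow_gbps;
}

// Print the ratios discussed in the conclusions, using the table values.
inline void print_ratios() {
  const double vs_debug = 0.789, vs_release = 4.2;      // Visual Studio
  const double clangcl_debug = 0.360, clangcl_release = 5.9; // ClangCL
  // Release versus Debug with Visual Studio: about 5.3x.
  std::printf("VS release/debug: %.1fx\n", speed_ratio(vs_release, vs_debug));
  // Release versus Debug with ClangCL: about 16.4x.
  std::printf("ClangCL release/debug: %.1fx\n",
              speed_ratio(clangcl_release, clangcl_debug));
  // ClangCL versus Visual Studio in Release mode: about 1.40x (a 40% speedup).
  std::printf("ClangCL/VS release: %.2fx\n",
              speed_ratio(clangcl_release, vs_release));
  // Visual Studio versus ClangCL in Debug mode: about 2.2x.
  std::printf("VS/ClangCL debug: %.1fx\n", speed_ratio(vs_debug, clangcl_debug));
}
```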
You may bypass ClangCL under Windows and build directly with clang with the LLVM linker for faster builds. However, I have not verified that you get the same speed.
No matter what I do, however, I seem to be getting slower builds under Windows than I expect. I am not exactly sure why building the code takes so much longer under Windows. It is not my hardware since building with Linux under Windows, on the same laptop, is fast.
Of course, parallelizing the build is the most obvious way to speed it up. Appending -- -m to my cmake --build command passes the -m flag to MSBuild, which enables parallel builds, and it could reduce my build times. I deliberately avoided parallel builds in this experiment.
WSL Update. Reader Quan Anh Mai asked me whether I had made a copy of the source files in the Windows Subsystem for Linux drive, and I had not. Doing so multiplied the speed by a factor of three. I included the better timings.
20 thoughts on “Regular Visual Studio versus ClangCL”
Cloud servers may have ramdisk mounted for tmp.
Did you build in WSL from its native file system or from the Windows one? WSL access to the Windows file system is notoriously slow, so it may be better to make a copy of the project in the WSL file system to measure the build speed. Thanks.
I did not make a copy.
Blog post updated.
Have you excluded your Windows source and build directories from antivirus scanning? In my experience that’s the most common reason for slow builds on Windows.
I disabled Defender temporarily and it did not impact the speed, at least not in a noticeable manner for me.
Might be related: https://dev.blog.documentfoundation.org/2023/02/21/telemetry-required-ask-users-first/
I am not getting build errors.
Sure, but the point was that there’s telemetry in the build process that many don’t expect, and that that might be related to slower builds.
I think it is worth reporting such a performance gap between clang-cl and native MSVC cl in Release mode, at least with a minimal reproducible example.
I think I provide enough information for reproduction in my blog post. I already formally reported that Visual Studio has non-competitive performance with SIMD kernels like simdutf.
If you’re using a recentish version of LLVM and CMake on windows, you don’t need to use the clang-cl driver at all. A trivial toolchain file for cmake like:
run in the VS development prompt will get it working with the normal clang frontend, so you can use the normal GCCish arguments.
If it’s linking that’s taking the time, adding the “-fuse-ld=lld” argument will speed it up a lot, as lld is much faster than link.exe.
I tried the following…
cmake -D CMAKE_LINKER="lld" -D CMAKE_CXX_COMPILER=clang++ -D CMAKE_RC_COMPILER=llvm-rc -B buildfastclang3
A debug build then takes 2 minutes and slightly over 2 minutes for a release build.
I don’t know if it applies here but one thing I’ve noticed in particular is that clang-cl tends to inline a lot more aggressively than MSVC.
Note that with Visual Studio 2019+, you can use the /Ob3 flag for more aggressive inlining.
/Ob2 is the default when using /O2
I haven’t tested if this actually makes a difference though.
I have found that Ob3 did not make a difference in a project of mine: https://github.com/simdjson/simdjson/issues/847
Note that you can give compiler hints to Visual Studio, telling it to inline a function. I think it works.
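As a sketch of what such a hint can look like: Visual Studio provides the `__forceinline` keyword, while GCC and clang provide the `always_inline` attribute. The macro name `EXAMPLE_FORCE_INLINE` and the function `add_and_mask` are hypothetical, chosen only for this illustration. These are strong hints and the compiler may still decline to inline in some cases.

```cpp
#include <cstdint>

// Pick the strongest inlining hint that the current compiler understands.
// clang-cl defines _MSC_VER, so it takes the first branch like Visual Studio.
#if defined(_MSC_VER)
#define EXAMPLE_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_FORCE_INLINE inline __attribute__((always_inline))
#else
#define EXAMPLE_FORCE_INLINE inline
#endif

// A small helper of the kind one typically wants inlined in a hot loop.
EXAMPLE_FORCE_INLINE uint32_t add_and_mask(uint32_t a, uint32_t b) {
  return (a + b) & 0xFF;
}
```

Whether the hint changes the generated code is something to check in the disassembly or with a benchmark, since the compiler may already inline the function on its own.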
The linker is lld, ldd prints dependencies for dynamic executables.
Thanks for catching the typographical error. I did use lld.