Float-parsing benchmark: Regular Visual Studio, ClangCL and Linux GCC

Windows users have choices when it comes to C++ programming. You may choose to stick with the regular Visual Studio. If you prefer, Microsoft makes available ClangCL which couples the LLVM compiler (commonly used by Apple) with the Visual Studio backend. Further, under Windows, you may easily build software for Linux using the Windows Subsystem for Linux.

Programmers often need to convert a string (e.g., 312.11) into a binary float-point number. It is a common task when doing data science.

I wrote a small set of benchmark programs to measure the speed. To run, I use an up-to-date Visual Studio (2022) with the latest ClangCL component, and the latest version of CMake.

After downloading it on your machine, you may build it with CMake. Under Linux, you may do:

cmake -B build && cmake --build build

And then you can execute like so:

./build/benchmarks/benchmark -f data/canada.txt

The Microsoft Visual Studio usage is similar except that you must specify the build type (e.g., Release):

cmake -B build && cmake --build build --config Release

For ClangCL, it is almost identical, except that you need to add -T ClangCL:

cmake -B build -T ClangCL && cmake --build build --config Release

Under Windows, the binary is not produced at build/benchmarks/benchmark but rather at build/benchmarks/Release/benchmark.exe, but the commands are otherwise the same. I use the default CMake flags corresponding to the Release build.

I run the benchmark on a laptop with Tiger Lake Intel processor (i7-11370 @ 3.3 GHz). It is not an ideal machine for benchmarking so I indicate the error margin on the speed measurements.

Among other libraries, my benchmark puts the fast_float library to the test: it is an implementation of the C++17 from_chars function for floats and doubles. Microsoft has its own implementation of this function which I include in the mix.

My benchmarking code remains identical (in C++) when I switch system or compiler: I simply recompile it.

Visual Studio std::from_chars 87 MB/s (+/- 20 %)
Visual Studio fast_float 285 MB/s (+/- 24 %)
ClangCL fast_float 460 MB/s (+/- 36 %)
Linux GCC11 fast_float 1060 MB/s (+/- 28 %)

We can observe that there are large performance differences. All the tests run on the same machine. The Linux build runs under the Windows Subsystem for Linux, and you do not expect the subsystem to run computations faster than Windows itself. The benchmark does not involve disk access. The benchmark is not allocating new memory.

It is not the first time that I notice that the Visual Studio compiler provides disappointing performance. I have yet to read a good explanation for why that is. Some people blame inlining, but I have yet to find a scenario where a release built from Visual Studio had poor inlining.

I am not ruling out methodological problems but I almost systematically find lower performance under Visual Studio when doing C++ microbenchmarks, despite using different benchmarking methodologies and libraries (e.g., Google Benchmark). It means that if there is a methodological issue, it is deeper than a mere programming typo.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

4 thoughts on “Float-parsing benchmark: Regular Visual Studio, ClangCL and Linux GCC”

  1. I think is not an issue with your metodology.
    I work as system programmer and in my experience, linux beats windows in every CPU, Disk and Memory benchmark. Even virtualized under windows many test still outperforms windows en several test. The only area where windows wins is in graphics.

    I dont have low level details about it but i suspect the cause is a combination of better optimization (ej, copy on write memory when fork a process), api exposed (system-v api vs win32) and complexity (linux boxes often run less process at idle state).

    And graphics stack in linux is getting better year at year. Now you even can run windows games in linux and some of them, run faster. Even “emulating” using proton.

    Regards.
    Regards.

    1. The Linux version of his tests runs on a Windows kernel (WSL). Generally speaking I doubt the OS has a serious impact on such micro benchmark.
      It’s the compiler that is benckmarked here. And VC++ is known to not be the best in speed, both for backend code generation but also efficient STL implémentation.
      https://developercommunity.visualstudio.com/t/vs2019-generates-inefficient-code-compared-to-clan/457819
      Something interesting to test would be clang on WSL because I wonder how much clang-cl differs from regular clang.

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.