Parsing floating-point numbers really fast in C#

Programmers often write out numbers as strings (e.g., 3.1416) and later want to read the numbers back from these strings. If you read and write JSON or CSV files, you do this work all of the time.

Previously, we showed that we could parse floating-point numbers at a gigabyte per second or better in C++ and in Rust, several times faster than the conventional approach. In Go 1.16, our approach improved parsing performance by up to a factor of two.

Not everyone programs in C++, Rust or Go. So what about porting the approach to C#? csFastFloat is the result!

For testing, we rely on two standard datasets, canada and mesh. The mesh dataset is made of “easy cases” whereas the canada dataset is more difficult. We use .NET 5 and an AMD Rome processor for testing.

parser                     canada                 mesh
Double.Parse (standard)    3 million floats/s     11 million floats/s
csFastFloat (new)          20 million floats/s    35 million floats/s
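
A figure such as "3 million floats/s" can be obtained by timing a loop over the lines to parse. The following is a minimal sketch of that kind of measurement, not the benchmark harness behind the table. It assumes a text file with one decimal number per line whose path is passed as the first command-line argument (a hypothetical setup), and it falls back to a few synthetic values otherwise. The class and method names are made up for the illustration.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;

static class ThroughputSketch
{
    // Parses every line `repeats` times with the standard library and
    // returns the throughput in millions of floats per second.
    public static double MillionFloatsPerSecond(string[] lines, int repeats)
    {
        double sink = 0; // summing the results keeps the parsing work observable
        var timer = Stopwatch.StartNew();
        for (int r = 0; r < repeats; r++)
        {
            foreach (string line in lines)
            {
                sink += double.Parse(line, CultureInfo.InvariantCulture);
            }
        }
        timer.Stop();
        if (double.IsNaN(sink)) Console.WriteLine("warning: the sum of the inputs is NaN");
        return (double)lines.Length * repeats / timer.Elapsed.TotalSeconds / 1e6;
    }

    public static void Main(string[] args)
    {
        // Assumption: args[0], if given, names a file with one number per line.
        string[] lines = args.Length > 0
            ? File.ReadAllLines(args[0])
            : new[] { "3.1416", "-75.5", "1e-10", "149" };
        MillionFloatsPerSecond(lines, 10); // warm-up pass so the JIT compiles the loop
        Console.WriteLine($"{MillionFloatsPerSecond(lines, 1000):F1} million floats/s");
    }
}
```

Numbers measured this way depend heavily on the input data, the runtime version and the processor, so they are only comparable when both parsers are run on the same machine and dataset.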

Importantly, the new approach should give exactly the same results: the speed does not come at the expense of accuracy.

Can this help in the real world? I believe that the most popular CSV (comma-separated values) parsing library in C# is CsvHelper. We patched CsvHelper so that it would use csFastFloat instead of the standard library. Out of a set of five float-intensive benchmarks, we found gains ranging from 8% to 2x. Your mileage will vary depending on your data and your application, but you should see some benefits.

Why would you see only an 8% gain some of the time? Because, in that particular case, only about 15% of the total running time has to do with number parsing. The more you optimize the rest of the processing, the larger the share of time spent parsing numbers, and the more benefit you should get out of fast float parsing.
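
This is Amdahl's law at work. As a rough worked example, assume that number parsing takes a fraction p = 0.15 of the running time and becomes s times faster while everything else is unchanged. The value of s for this particular benchmark is not reported, so the figures below are illustrative only.

```latex
\[
\text{speedup} = \frac{1}{(1 - p) + p/s}, \qquad
s = 2:\ \frac{1}{0.85 + 0.075} \approx 1.08, \qquad
s \to \infty:\ \frac{1}{0.85} \approx 1.18 .
\]
```

Under these assumptions, an overall gain of about 8% corresponds to number parsing running roughly twice as fast, and no parser, however fast, could push the overall gain beyond about 18%.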

The package is available on NuGet.
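
As a minimal sketch of what using the package might look like: the code below assumes that the NuGet package is named csFastFloat and that it exposes FastDoubleParser.ParseDouble, as described in the project's README. Please check the README for the current API. The UsageSketch class and its SameBits helper are hypothetical names introduced for the illustration. The sketch parses a few strings with both parsers and compares the bit patterns, which is one way to check that the results are exactly the same.

```csharp
// Assumes the package was added with: dotnet add package csFastFloat
using System;
using System.Globalization;
using csFastFloat;

static class UsageSketch
{
    // True when both parsers produce the same 64-bit pattern for `text`.
    public static bool SameBits(string text)
    {
        double expected = double.Parse(text, CultureInfo.InvariantCulture);
        double actual = FastDoubleParser.ParseDouble(text); // assumed API, per the README
        return BitConverter.DoubleToInt64Bits(expected)
            == BitConverter.DoubleToInt64Bits(actual);
    }

    public static void Main()
    {
        foreach (string text in new[] { "3.1416", "149", "-75.512345678901234", "1e-10" })
        {
            Console.WriteLine($"{text}: {(SameBits(text) ? "identical" : "different")}");
        }
    }
}
```

Comparing bit patterns rather than using == on the doubles avoids treating values such as 0.0 and -0.0 as equal when they are not the same result.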

Credit: The primary author is Carl Verret. We would like to thank Egor Bogatov from Microsoft who helped us improve the speed, changing only a few lines of code, by making use of his deep knowledge of C#.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

4 thoughts on “Parsing floating-point numbers really fast in C#”

  1. Hi Daniel, a couple of questions. I was just about to ask what data format you were using for some of the integer libraries when I realized that a lot of these are parsing text files.

    So when you say “parsing” floats or integers, should I understand that this means parsing a text representation of these values? Is that implied in the term “parse”, such that we wouldn’t say we were parsing if the data was binary?

    And then with these floats, I noticed the data files have a lot of content before the floats themselves. In many of the files, there are lots of leading zeros before each number (more than eight). What are those about? And then some files have a bunch of hex before each number, like this file:

    https://github.com/CarlVerret/csFastFloat/blob/master/TestcsFastFloat/data_files/tencent-rapidjson.txt

    What is that hex data? Are those supposed to be floats also? It seems like the floats come at the end of each row, after a lot of hex. An easier example is this one:

    58A8 43150000 4062A00000000000 149

    The float is 149, and the two longer strings in the middle are different hex representations of 149 as a float. But I don’t know what 58A8 is. Is csFastFloat doing anything with those hex strings? Which representation is actually being parsed?

    1. We parse strings representing numbers in decimal form.

      These files you are looking at are test files for internal use, and not part of the library. We use them for testing.
