Fast float parsing in practice

In our work parsing JSON documents as quickly as possible, we found that one of the most challenging problem is to parse numbers. That is, you want to take the string “1.3553e142” and convert it quickly to a double-precision floating-point number. You can use the strtod function from the standard C/C++ library, but it is quite slow. People who write fast parsers tend to roll their own number parsers (e.g., RapidJSON, sajson), and so we did. However, we sacrifice some standard compliance. You see, the floating-point standard that we all rely on (IEEE 754) has some hard-to-implement features like “round to even”. Sacrificing such fine points means that you can be off by one bit when decoding a string. As such, this never matters: double-precision numbers have more accuracy than any engineering project will ever need and a difference on the last bit is irrelevant. Nevertheless, it is mildly annoying.

A better alternative in C++ might be from_chars. Unfortunately, many standard libraries have not yet caught up the standard and they fail to support from_chars properly. One can get around this problem by using the excellent abseil library. It tends to be much faster than venerable strtod function.

Unfortunately, for our use cases, even abseil’s from_chars is much too slow. It can be two or three times slower than our fast-but-imperfect number parser.

I was going to leave it be. Yet Michael Eisel kept insisting that it should be possible to both follow the standard and achieve great speed. Michael gave me an outline. I was unconvinced. And then he gave me a code sample: it changed my mind. The full idea requires a whole blog post to explain, but the gist of it is that we can attempt to compute the answer, optimistically using a fast algorithm, and fall back on something else (like the standard library) as needed. It turns out that for the kind of numbers we find in JSON documents, we can parse 99% of them using a simple approach. All we have to do is correctly detect the error cases and bail out.

Your results will vary, but the next table gives the speed numbers from my home iMac (2017). The source code is available along with everything necessary to test it out (linux and macOS).

parser MB/s
fast_double_parser (new) 660 MB/s
abseil, from_chars 330 MB/s
double_conversion 250 MB/s
strtod 70 MB/s

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

15 thoughts on “Fast float parsing in practice”

  1. Unfortunately, many standard libraries have not yet caught up the standard and they fail to support from_chars properly.

    On Windows it’s highly optimized (by Stephan T. Lavavej himself). Possibly you could add, which std-libs are lagging, or bad.

    1. Possibly you could add, which std-libs are lagging, or bad.

      I’d be more interested in knowing which one support it. The only one I have seen mentioned is Visual Studio. This will no doubt improve in the future, thankfully.

      On Windows it’s highly optimized

      Currently, this would not produce portable code since most other standard libraries I have tried do not support it. One portable approach is to rely on abseil.

      1. Implicitly, you’ve added it now to the comments.

        With some fiddling (if very important and worth the trouble, and some digging in the relevant docs) one could create an object file with clang-cl and link that in on linux or with MinGW. The thing has more or less a c-api anyway.

    1. As far as I can tell, the “new” way to parse floats in C++ is to use “from_chars” and I address this both in my benchmarks and my post. If you are thinking about something else, would you kindly elaborate?

      1. Aha, ok. I didn’t know that it was based on the from_chars routine (I’m still on C++17). Seeing as this is also quite fast that sounds very good.
        Is there any chance the std::format can use your algorithm, or does that have to go through the committee?

    1. You are correct.

      RapidJSON has at least two fast-parsing mode. The fast mode, which I think is what you refer to, is indeed quite fast, but it can be off by one ULP, so it is not standard compliant. Boost Spirit similarly offers fast parsing, but it is not again not standard compliant.

      Our very own simdjson has also a fast number parsing mode…

        1. The file name is confusing; dtoa.c contains both dtoa() and strtod(). I suppose Albert meant the latter.

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.