Converting binary floating-point numbers to integers

You are given a floating-point number, e.g. a double type in Java or C++. You would like to convert it to an integer type… but only if the conversion is exact. In other words, you want to convert the floating-point number to an integer and check if the result is exact.

In C++, you could implement such a function using few lines of code:

bool to_int64_simple(double x, int64_t *out) {
  int64_t tmp = int64_t(x);
  *out = tmp;
  return tmp == x;
}

The code is short in C++, and it will compile to few lines of assembly. In x64, you might get the following:

        cvttsd2si rax, xmm0
        pxor    xmm1, xmm1
        mov     edx, 0
        cvtsi2sd        xmm1, rax
        mov     QWORD PTR [rdi], rax
        ucomisd xmm1, xmm0
        setnp   al
        cmovne  eax, edx

Technically, the code might rely on undefined behaviour.

You could do it in a different manner. Instead of working with high-level instructions, you could copy your binary floating-point number to a 64-bit word and use your knowledge of the IEEE binary64 standard to extract the mantissa and the exponent.

It is much more code. It also involves pesky branches. I came up with the following routine:

typedef union {
  struct {
    uint64_t fraction : 52;
    uint32_t exp_bias : 11;
    uint32_t sign : 1;
  } fields;
  double value;
} binary64_float;

bool to_int64(double x, int64_t *out) {
  binary64_float number = {.value = x};
  const int shift = number.fields.exp_bias - 1023 - 52;
  if (shift > 0) {
    return false;
  }
  if (shift <= -64) {
    if (x == 0) {
      *out = 0;
      return true;
    }
    return false;
  }
  uint64_t integer = number.fields.fraction | 0x10000000000000;
  if (integer << (64 + shift)) {
    return false;
  }
  integer >>= -shift;
  *out = (number.fields.sign) ? -integer : integer;
  return true;
}

How does it compare with the simple and naive routine?

I just wrote a simple benchmark where I iterate over many floating-point numbers in sequence, and I try to do the conversion. I effectively measure the throughput and report the number of nanosecond per function call. My benchmark does not stress the branch predictor. Unsurprisingly, the naive and simple approach can be faster.

system long version simple version
ARM M1 + LLVM 12 0.95 ns/call 1.1 ns/call
Intel 7700K + LLVM 13 1.6 ns/call 1.2 ns/call

I find reassuring that the simple approach is so fast. I was concerned at first that it would be burdensome.

It is possible that my long-form routine could be improved. However, it seems unlikely that it could ever be much more efficient than the simple version.

My source code is available.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

15 thoughts on “Converting binary floating-point numbers to integers”

  1. Very interesting post.
    Don’t you need to shift bits left by 1 before testing equality to zero (to account for +0 and -0) ?

  2. I’ve executed your code on one of my armhf machine and the results are surprising.

    $ time ./isint.g++
    499999999500000000
    39.8756
    499999999500000000
    96.3353

    real 4m32,431s
    user 4m32,237s
    sys 0m0,196s

    $ time ./isint.clang++
    499999999500000000
    28.1106
    499999999500000000
    98.3311

    real 4m12,893s
    user 4m12,060s
    sys 0m0,508s

    Your code is way faster on my armhf machine.
    Note that g++ (Debian 8.3.0-6) 8.3.0 and clang version 7.0.1-8+deb10u2 were not the most recent version.

  3. Very interesting post.
    Shouldn’t bits be left shifted by one bit before testing zero with (bits == 0) to account for +0 and -0 cases ?

    (my comment may have been lost the first time)

  4. As a special case, I wonder how fast it would be for a custom shortcut routine to convert a known-integer value stored in a 32- or 64-bit floating-point variable into a uint64_t value?

    What I mean is: If you happen to know that a floating-point value is an integer (which can certainly happen sometimes), can the conversion be performed measurably faster?

  5. Hello Daniel. In the modern C++ the legal way and optimal way for binary conversion between unrelated types is to use std::memcpy or std::bit_cast.

      1. I thought unions were only legal if you write and read from the same field (not possible to write a field an interpret the data another way by reading another one). But I may be mixing C/C++ standards

        1. It seems that you are correct:
          “It’s undefined behavior to read from the member of the union that wasn’t most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.”

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.