# GNU GCC on x86 does not round floating-point divisions to the nearest value

I know that floating-point arithmetic is a bit crazy on modern computers. For example, floating-point numbers are not associative:

```0.1+(0.2+0.3) == 0.599999999999999978
(0.1+0.2)+0.3 == 0.600000000000000089
```

But, at least, this is fairly consistent in my experience. You should simply not assume fancy properties like associativity to work in the real world.

Today I stumbled on a fun puzzle. Consider the following:

```double x = 50178230318.0;
double y = 100000000000.0;
double ratio = x/y;
```

If God did exist, the variable ratio would be 0.50178230318 and the story would end there. Unfortunately, there is no floating-point number that is exactly 0.50178230318. Instead it falls between the floating-point number 0.501782303179999944 and the floating-point number 0.501782303180000055.

It important to be a bit more precise. The 64-bit floating-point standard represents numbers as a 53-bit mantissa followed by a power of two. So let us spell it out the way the computer sees it:

```0.501782303179999944 == 4519653187245114  * 2 ** -53
0.501782303180000055 == 4519653187245115  * 2 ** -53
```

We have to pick the mantissa 4519653187245114 or the mantissa 4519653187245115. There is no way to represent exactly anything that falls in-between using 64-bit floating-point numbers. So where does 0.50178230318 fall exactly? We have approximately…

```0.50178230318 = 4519653187245114.50011795456 * 2 ** -53
```

So the number is best approximated by the largest of the two values (0.501782303180000055).

Goldberg in What every computer scientist should know about floating-point arithmatic tells us that rounding must be to the nearest value:

The IEEE standard requires that the result of addition, subtraction, multiplication and division be exactly rounded. That is, the result must be computed exactly and then rounded to the nearest floating-point number (using round to even). (…) One reason for completely specifying the results of arithmetic operations is to improve the portability of software. When a program is moved between two machines and both support IEEE arithmetic, then if any intermediate result differs, it must be because of software bugs, not from differences in arithmetic. Another advantage of precise specification is that it makes it easier to reason about floating-point. Proofs about floating-point are hard enough, without having to deal with multiple cases arising from multiple kinds of arithmetic. Just as integer programs can be proven to be correct, so can floating-point programs, although what is proven in that case is that the rounding error of the result satisfies certain bounds.

Python gets it right:

```>>> "%18.18f" % (50178230318.0/100000000000.0)
'0.501782303180000055'
```

JavaScript gets it right:

```> 0.50178230318 == 0.501782303180000055
true
> 0.50178230318 == 0.501782303179999944
false
```

So the story would end there, right?

Let us spin up the GNU GCC 7 compiler for x86 systems and run the following C/C++ program:

```#include <stdio.h>
int main() {
double x = 50178230318.0;
double y = 100000000000.0;
double ratio = x/y;
printf("x/y  = %18.18f\n", ratio);
}
```

Can you predict the result?

```\$ g++ -o round round.cpp
\$ ./round
x/y  = 0.501782303179999944
```

So GNU GCC actually picks the smallest and furthest value, instead of the nearest value.

You might think that it is a bug that should be reported, right?

There are dozens if not hundreds of similar reports to the GNU GCC team. They are being flagged as invalid.

Let me recap: the GNU GCC compiler may round the result of a division between two floating-point numbers to a value that is not the nearest. And it is not considered a bug.

The explanation is that the compiler first rounds to nearest using 80 bits and then rounds again (this is called double rounding). This is what fancy numerical folks call FLT_EVAL_METHOD = 2.

However, the value of FLT_EVAL_METHOD remains at 2 even if you add optimization flags such as -O2, and yet the result will change. The explanation is that the optimizer figures out the solution at compile-time and does so ignoring the FLT_EVAL_METHOD value. Why it is allowed to do so is beyond me.

Maybe you think it does not matter. After all, the value is going to be close, right? However, if you are an experienced programmer, you know the value of having deterministic code. You run the code and you always get the same results. If the results depend, some of the time, on your exact compiler flag, it makes your life much more difficult.

You can also try to pass GNU GCC the flags -msse -mfpmath=sse, as experts recommend, but as my script demonstrates, it does not solve the issue (and then you get FLT_EVAL_METHOD = -1). You need to also add an appropriate target (e.g., -msse -mfpmath=sse -march=pentium4). In effect, when using GNU GCC, you cannot get away from specifying a target. The flags -msse -mfpmath=sse alone will silently fail to help you.

Some people have recommended using other flags to switch the compiler in pc64 mode (-pc64). While it would fix this particular example, it does not fix the general problem of floating-point accuracy. It will just create new problems.

If you are confused as to why all of this could be possible without any of it being a bug, welcome to the club. My conclusion is that you should probably never compile C/C++ using GNU GCC for a generic x86 target. It is broken.

You can examine the assembly on godbolt.

Note: Please run my tests in the specified docker images so that you get the exact same configuration as I do.

### Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

## 60 thoughts on “GNU GCC on x86 does not round floating-point divisions to the nearest value”

1. Brian Kessler says:

I think you should be more specific and state that the issue is with the x87 floating point unit. The x87 has one internal precision that defaults to 80-bits, though this can be set to 32 or 64 bits using the FLDCW instruction to load the control word for the floating point unit. For most cases, people want and expect the internal precision to equal the operand precision (FLT_EVAL_METHOD=0), which is what SSE instructions do. Note that explicitly requesting SSE does actually work in gcc as long as the processor supports SSE. See here in godbolt https://godbolt.org/z/Xqi_S7

For your docker test, you are using the i386/ubuntu image and so you need to explicitly tell gcc that you have a machine with sse instructions, e.g. -march=pentium4, which gives the desired behavior:

```compiling round with -mfpmath=sse -march=pentium4 x/y = 0.501782303180000055 expected x/y (round to even) = 0.501782303180000055 Expected assumes FLT_EVAL_METHOD = 1 Equal Equal FE_DOWNWARD : 0.501782303179999944 FE_TONEAREST : 0.501782303180000055 FE_TOWARDZERO: 0.501782303179999944 FE_UPWARD : 0.501782303180000056 FLT_EVAL_METHOD = 0 ```

The real question is why is gcc emitting any x87 instructions (except for long double) on architectures that support sse. I would expect -march=pentium4 to use sse by default, but sse has to be specifically requested.

1. I think you should be more specific and state that the issue is with the x87 floating point unit. The x87 has one internal precision that defaults to 80-bits, though this can be set to 32 or 64 bits using the FLDCW instruction to load the control word for the floating point unit. For most cases, people want and expect the internal precision to equal the operand precision (FLT_EVAL_METHOD=0), which is what SSE instructions do.

Did you get the part where if you add the flag ‘-O2’ then FLT_EVAL_METHOD remains unchanged, but the rounding becomes correct (meaning that it rounds to the nearest 64-bit float)?

For most cases, people want and expect the internal precision to equal the operand precision (FLT_EVAL_METHOD=0), which is what SSE instructions do.

If the x87 unit can do it, why isn’t it the default since we agree that it is what most people want?

1. Brian Kessler says:

Looking at godbolt, gcc -O2 appears to compute the division at compile time using the precision of the operands. I don’t see any division instructions issued at all, so the FLT_EVAL_METHOD is not relevant in this case since the x87 is not actually performing the calculations. Compile time and run time floating point calculations not matching is a whole other can of worms 🙂

As mentioned above, x87 unit can perform the calculations only at a single given precision by altering the precision in the control word. It could emulate FLT_EVAL_METHOD=0, but the precision would have to be updated when switching between any calculations using float, double or long double. I imagine this might not be the default behavior because of the performance overhead of loading and restoring precision in the control word for different precisions. For code that mostly or solely is using doubles, the compiler could avoid adjusting the precision most of the time at the cost of tracking the precision state throughout the code. It would be interesting to see if any compiler has a mode where it adjusts the x87 precision automatically to emulate FLT_EVAL_MODE=0

1. I don’t see any division instructions issued at all, so the FLT_EVAL_METHOD is not relevant in this case since the x87 is not actually performing the calculations. Compile time and run time floating point calculations not matching is a whole other can of worms

This is insanity.

1. Jules Blok says:

This is insanity

This is C++

2. Shash says:

Looking at godbolt, gcc -O2 appears to compute the division at compile time using the precision of the operands. I don’t see any division instructions issued at all, so the FLT_EVAL_METHOD is not relevant in this case since the x87 is not actually performing the calculations.

That’s easy fixed – just move the division to a function so that gcc can’t optimize it away: https://godbolt.org/z/4Tfcup

Interestingly, this gives us

```x/y = 0.501782303179999944 ```

2. That Singh says:

It’s not a bug, at least not as you have related it. If the calculation gets performed at compile time (as you have said), then you have to deal with the fact that the compiler has different knowledge than the run time… And the more you tell it to optimize, the more you cause work to be done in the compiler environment.

2. Simon Lindholm says:

Even with SSE enabled you still don’t necessarily get complete IEEE 754 compliance, since GCC by default emits fused multiply-add instructions whenever it can:

`#include <stdio.h>`

``` ```

```double a = 0.1, b = 10.0, c = -1; int main() { double ab = a * b; printf("%.5e\n", ab + c); } ```

This prints 5.55112e-17 with GCC -O2 -march=native (or -mfma), and 0.00000e+00 with GCC -O2. (Can be turned off with -ffp-contract=off.)

3. A.J. S. says:

If God did exist, really? I stopped reading right after that, get your stuff together or don’t write at all.

1. For Real says:

Oh, relax.

2. Joe Zbiciak says:

You must be a blast at parties.

3. Andrew Penner says:

Is this an attempt to dismiss the original post based on word choice as opposed to content? If you stop reading at your first offensive content you will not read much on the web 🙂

4. J. B. says:

Someone holds (presumably) different religious beliefs to you and that means they should stop writing and “get it together” (what does that even mean in this context – it makes no sense)? Never mind the fact that it’s totally irrelevant, I’m more interested in why you are apparently so special that you get to be judge of who can write. (Not really though – you’re a joke and so is your god.)

5. Chops says:

Possibly you’re not neuro-typical and the real meaning of these words is lost on you. If you think a statement about the existence of god was made here, then you’ve completely misunderstood.

6. The Dude says:

There was a god, once. But he’s dead now.

7. Mark Gomersbach says:

What’s so wrong about that sentence to discard the rest of the content and respond with a weak attempt at insult?

8. Eddy Current says:

GOD is REAL, unless declared INTEGER

That leaves the question: REAL4 or REAL8?
Since neither 4 nor 8 equals 3 we can safely deny the existence of the so called trinity.

4. john mullee says:

https://docs.python.org/3/tutorial/floatingpoint.html

“Python keeps the number of digits manageable by displaying a rounded value instead

>

1 / 10
0.1
Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.

5. Joe Zbiciak says:

It gets weirder than that. I get different results if I compile for 32-bit vs. 64-bit:

```\$ gcc-7.1.0 -m64 r.c \$ ./a.out 0.501782303180000055 \$ gcc-7.1.0 -m32 r.c \$ ./a.out 0.501782303179999944 \$ gcc-9.2.0 -m64 r.c \$ ./a.out 0.501782303180000055 \$ gcc-9.2.0 -m32 r.c \$ ./a.out 0.501782303179999944 ```

The 32-bit SysV ABI is actually pretty terrible with regards to floating point rounding consistency.

Here’s another great example of just how messed up it is:

`#include <stdio.h>`

``` float fp_test(void) { volatile float a = (float)(1ull << 60); volatile float b = (float)(1 << 0); return a + b; } ```

```int main() { volatile float c = (float)(1ull << 60); printf("%g\n", fp_test() - c); return 0; } ```

If you compile this for the 32-bit and 64-bit ABIs, you’ll get different results:

```\$ gcc-9.2.0 -m32 -fno-inline abi.c -o abi-32 \$ gcc-9.2.0 -m64 -fno-inline abi.c -o abi-64 \$ ./abi-32 1 \$ ./abi-64 0 ```

That anomaly arises from the fact that the i386 SysV ABI has floating point values returned on the stack, and they retain more precision than if they were rounded to the requested size. (Page 38: i386 SysV ABI)

And there’s this additional caveat in the i386 SysV ABI:

Looking around, it seems GCC does implement a flag, `-frounding-math`, that should disable the assumption of the same rounding mode everywhere. But… Just go read the associated PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678

It looks like the closest thing I’ve found to an official position is at the bottom of the page here: https://gcc.gnu.org/wiki/FloatingPointMath

The tl;dr, as I see it? “On i386 and M68K, we throw are hands in the air and do the fast thing, not the pedantically bitwise accurate thing, as that’s what we think most folks actually want.”

Note on x86 and m68080 floating-point math
For legacy x86 processors without SSE2 support, and for m68080 processors, GCC is only able to fully comply with IEEE 754 semantics for the IEEE double extended (`long double`) type. Operations on IEEE double precision and IEEE single precision values are performed using double extended precision. In order to have these operations rounded correctly, GCC would have to save the FPU control and status words, enable rounding to 24 or 53 mantissa bits and then restore the FPU state. This would be far too expensive.

The extra intermediate precision and range may cause flags not be set or traps not be raised. Also, for double precision, double rounding may affect the final results. Whether or not intermediate results are rounded to double precision or extended precision depends on optimizations being able to keep values in floating-point registers. The option `-ffloat-store` prevents GCC from storing floating-point results in registers. While this avoids the indeterministic behavior just described (at great cost), it does not prevent accuracy loss due to double rounding.

On more modern x86 processors that support SSE2, specifying the compiler options `-mfpmath=sse -msse2` ensures all `float` and `double` operations are performed in SSE registers and correctly rounded. These options do not affect the ABI and should therefore be used whenever possible for predictable numerical results. For x86 targets that may not support SSE2, and for m68080 processors, use long double where exact rounding is required and explicitly convert to `float` or `double` when necessary. Using `long double` prevents loss of 80-bit accuracy when values must be spilled to memory. See x87note for further details.

6. Hmmm. I copied this to a file t.c, and complied -O3 on a an old Core i7-2670QM CPU, running linux mint 19.3

```[email protected]:~\$ gcc -O3 t.c [email protected]:~\$ ./a.out x/y = 0.501782303180000055 ```

This is gcc 7.5.0-3ubuntu~18.04. Could you describe your environment in greater detail?

1. I specify the docker image. I cannot be more precise than that. Have you tried running the script and generating the docker container?

1. Ok, I missed that you were doing this with a 32 bit target. Mine is a 64 bit platform. SSE and above will behave more sanely.

7. Warren Henning says:

Is it considered acceptable if the program output changes based on optimization levels? Changing -O0 to -O3 in that Godbolt link changed the program output for me: https://godbolt.org/z/rDTmn8 . I love computers!

8. Mathias Gaunard says:

The compiler is allowed to perform any operation with higher precision than requested.

There is apparently an interesting edge case in that your operation, when done with infinite precision, then rounded to 80-bit Intel, then rounded again to standard binary64, does not yield the same result than if done with infinite precision then directly rounded to standard binary64.

1. Al says:

Classic double rounding

9. Heikki Kultala says:

SSE does not support 64-bit double precision floating point numbers. It only supports 32-bit floating point numbers, so x87 is still used for double calculations.

Only SSE2 supports doubles.

10. Moschops says:

Is it a bug? I don’t know enough about the IEEE floating point specification, but is there a requirement to round to the nearest value rather than either of the two bounding values? Or if the requirement isn’t in that, where is it? What specifies that one of the two bounding values should be chosen over the other?

1. Mathias Gaunard says:

It is not a bug, see my comment.
It is allowed for the compiler to use arbitrarily more precision than requested. Sometimes (in some pretty specific conditions), that leads to an error of one bit in the mantissa.

11. Silviu says:

I think it matters how you compiled the GCC compiler itself, since those computations are done at compile time(i.e. the compiler(which is an executable compiled woth some other flags) does thrm for you).
Never bothered to check, just thought it might be relevant.

12. Michael K. says:

Hi Daniel,
I wonder what the existence of God has to do with floating point properties or some compiler implementations.
Though my faith is not always as big as I wish, I do believe in God. There are people who do not believe in God. I respect this. But I think it is arrogant to deny God, because men think, we have the total understanding of the world and the universe.
I have a deep sympathy fro the ancient scholars, who addressed themselfes to theological questions. (Have you heard about Gottfried Wilhelm Leibniz, Joseph Fourier, Blaise Pascal or Michael Faraday?)
Regards,
michaeL

I cannot say for sure, but I’m pretty confident he meant no insult. To me it looks more like a saying, perhaps a bit clumsily translated from French?

Back to the topic, I find it quite fascinating that there are so many interpretations to the standard, depending on how one reads it, of course. As a former member of a custom C compiler for an in-house DSP processor, I have to say my memory is quite a bit fade on the topic. Shame on me, though…

1. To me it looks more like a saying, perhaps a bit clumsily translated from French?

I do not think that it is a saying. It is a reference to the fact that we do not live in a universe where things work out nicely. There are ugly bits.

1. Saheed says:

Great sir, you just discovered one ugly bit (bug) in a system created by thousands of intelligent people. Permit me to point out that the very existence of God is reinforced by man’s continuos discovered of so many ugly bits we sneak into the perfect system He created, either deliberately or by mistake. Yet the system still works, even when for the most part we don’t know why.

While I totally disagree with the reaction of the first person that pointed this out. The fact remains that the literary construct:

If God did exist, the variable ratio would be 0.50178230318 and the story would end there. Unfortunately, there is no floating-point number that is exactly 0.50178230318.

is offensive to people of faith. The logical conclusion of those two statements denies the existence of God. It is totally uncalled for in the context of the subject at hand. Judging by the target audience of the subject, it should be no surprise that those statements are interpreted as such.

Your reply implies that it is a deliberate construct. I say it is just wrong and clearly inconsiderate.

1. Moschops says:

“I say it is just wrong and clearly inconsiderate.”

Well it’s not. It’s a not uncommon English turn of phrase; completely forgivable for a non-native English speaker to not know it, and native English speakers who’ve missed it simply aren’t reading enough.

2. Moschops says:

Exactly so. That’s the meaning as commonly understood. It is no comment at all about religion, or the existence of god, or anything like that. It’s akin to saying “in a just world”, or “in a sensible universe”. This obsession with the literal seems to be more common amongst programmers; joyfully throwing out the pleasures of communication and language.

2. José Luis says:

Worst: I had heard of Newton.

3. José Luis says:

Michel, don’t be so sensitive (or I may use the childish lenguage? ) about a common expresion (out of your house or church), we are talking on other dimension (a well known dimension in which gods are irrelevant as Laplace said to Napoléon Bonaparte).

13. Marshall Ward says:

For what it’s worth, this is what I get from GCC 10.1 with libc 2.31:

```x/y = 0.501782303180000055 ```

My machine is an AMD Ryzen 5 2600. Not yet sure how to compare the assembly output or various architecture compiler flags.

Our group has recently had to grapple with various floating point answer in changes in libm from libc 2.22 to 2.27, so it does seem some attention is being paid to these issues.

1. Marshall Ward says:

Just tested this on several GCC versions (6.1, 7.3, 8.1, 8.2, 9.1) on a Sandy Bridge Xeon, and also get the “correct” answer as above.

I am unfortunately not very versed in Docker, but FP repro is very important to our work and I’d like to understand this more, so I’ll try to try and familiarize with it.

2. Marshall Ward says:

Using `-m32` appears to give me the “wrong” answer:

```\$ ~/test/float\$ gcc -m32 div.c \$ ~/test/float\$ ./a.out x/y = 0.501782303179999944 ```

(I also now see Joe Zbiciak has observed the same…)

14. Tony Reix says:

With GCC v9 on Fedora32/x86_64 : OK :

gcc /tmp/C.c -o C ; ./C

x/y = 0.501782303180000055

gcc -O2 /tmp/C.c -o C ; ./C

x/y = 0.501782303180000055

gcc –version

gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)

With GCC v8.4 on AIX : OK :

gcc C.c -o C ; ./C

x/y = 0.501782303180000055

gcc -maix64 C.c -o C ; ./C

x/y = 0.501782303180000055

gcc -O2 -maix64 C.c -o C ; ./C

x/y = 0.501782303180000055

gcc -O2 -maix32 C.c -o C ; ./C

x/y = 0.501782303180000055

gcc –version

gcc (GCC) 8.4.0

g++ -O2 -maix32 C.c -o C ; ./C

x/y = 0.501782303180000055

g++ -O2 -maix64 C.c -o C ; ./C

x/y = 0.501782303180000055

Same with GCC v9.3 :

g++ -O2 -maix32 C.c -o C ; ./C

x/y = 0.501782303180000055

g++ -O2 -maix64 C.c -o C ; ./C

x/y = 0.501782303180000055

g++-9 -O2 -maix64 C.c -o C ; ./C

x/y = 0.501782303180000055

g++-9 -O0 -maix64 C.c -o C ; ./C

x/y = 0.501782303180000055

g++-9 -O0 -maix32 C.c -o C ; ./C

x/y = 0.501782303180000055

g++-9 -O2 -maix32 C.c -o C ; ./C

x/y = 0.501782303180000055

g++-9 –version

g++-9 (GCC) 9.3.0

You should use Pari/gp !

0.1+(0.2+0.3)

%1 = 0.60000000000000000000000000000000000000

(0.1+0.2)+0.3

%2 = 0.60000000000000000000000000000000000000

50178230318.0 / 100000000000.0

%1 = 0.50178230318000000000000000000000000000

Ha ha ha !!

15. Marc-Olivier Killijian says:

Hi Daniel, just to let you know that gcc 8.3.0 on Mac (target x86_64-apple-darwin18) works correctly and gives “x/y = 0.501782303180000055”.

16. A. Banker says:

The compiler chose to do banker’s rounding.
You think the right answer is this:

1100001010111011110101000011101111000010101110111101010000111011

but it is bounded below by this:

1100001010111011110101000011101011000010101110111101010000111010

Round to even will round to the lower value if it ends in 0.

17. zero says:

Did you try compiling with -mpc64? From the man page,

-mpc32
-mpc64
-mpc80
Set 80387 floating-point precision to 32, 64 or 80 bits. When -mpc32 is specified, the significands of
results of floating-point operations are rounded to 24 bits (single precision); -mpc64 rounds the
significands of results of floating-point operations to 53 bits (double precision) and -mpc80 rounds the
significands of results of floating-point operations to 64 bits (extended double precision), which is the
default
. When this option is used, floating-point operations in higher precisions are not available to
the programmer without setting the FPU control word explicitly.

``` Setting the rounding of floating-point operations to less than the default 80 bits can speed some programs by 2% or more. Note that some mathematical libraries assume that extended-precision (80-bit) floating- point operations are enabled by default; routines in such libraries could suffer significant loss of accuracy, typically through so-called "catastrophic cancellation", when this option is used to set the precision to less than extended precision. ```

18. zero says:

Go to godbolt.org with the godbolt code provided in this article, and add the compilation switch -mpc64. Problem gone. It looks like the solution is (advertised) in the manual.

1. @zero

Problem gone. It looks like the solution is (advertised) in the manual.

Are you sure?

I am not so sure. Yes. It will solve the 64-bit case, but does it ensures that the 32-bit floats do not suffer from the same double rounding problem? If the values are rounded to 53-bit mantissa and then rounded again to a smaller mantissa, the same problem could happen again.

As far as I can tell, there is only way correct way to solve the problem and it is to round once from the infinite precision computation to the desired computation, without any intermediate.

And then there is the warning about catastrophic failing of some libraries if you invoke this 64-bit flag. Does it include the standard mathematics library? It might, right?

As far as I can tell, the only proper fix is the SSE approach. But it is not enabled by default.

19. zero says:

Hi, did you find an example where operations with variables of type ‘float’ suffer from rounding due to performing the operation with data type ‘double’ before being rounded down to type ‘float’ again?

One could also look at why -mpcXX is there. Looking at the relevant Intel manuals, one will find notes about the precision control bits. At first glance, it looks like GCC might be simply parroting what the hardware is doing (this is also the argument why the GCC bug you referenced was marked as invalid — and note the same bug’s comments suggest -mpc64 and setting the default path to SSE rather than then x87, as per the GCC documentation). Did you look at the actual Intel manuals on this?

By the way, is the difference noted here within one ulp of the most precise result? Is that within the expected error for the operation in question? Is there an actual bug here? Of course, when exact values are required in compiled binaries, one could always specify the actual floating point value in binary (e.g. with a union).

Regarding the other libraries that might be affected, one would have to read the relevant documentation to see if the libraries one wants to use are compiled (binary libraries) or can be compiled (from sources) to a level of precision that is compatible with one’s intent. This is just good practice.

Finally, I am not celebrating this level of complexity…

20. Hi, did you find an example where operations with variables of type ‘float’ suffer from rounding due to performing the operation with data type ‘double’ before being rounded down to type ‘float’ again?

Well, the compiler is providing us no guarantee, right? How lucky do you feel?

I could investigate, but I have already ruled out this mode of GCC as unacceptable.

One could also look at why -mpcXX is there. Looking at the relevant Intel manuals, one will find notes about the precision control bits. At first glance, it looks like GCC might be simply parroting what the hardware is doing (this is also the argument why the GCC bug you referenced was marked as invalid — and note the same bug’s comments suggest -mpc64 and setting the default path to SSE rather than then x87, as per the GCC documentation). Did you look at the actual Intel manuals on this?

I don’t think Intel can be blamed.

For example, evidently the compiler is itself, at compile-time, evaluating the expression with full precision. The net result is that if you do “double x = value1; double y = value2; if(x/y = value1/value2) ..”, the expression will evaluate to false under some optimization levels but not others.

The fact that the expressions evaluate to different values at compile-time than at runtime has nothing to do with Intel.

By the way, is the difference noted here within one ulp of the most precise result? Is that within the expected error for the operation in question? Is there an actual bug here? Of course, when exact values are required in compiled binaries, one could always specify the actual floating point value in binary (e.g. with a union).

At compile-time, the evaluation is exact. It is only at runtime that it differs.

Regarding the other libraries that might be affected, one would have to read the relevant documentation to see if the libraries one wants to use are compiled (binary libraries) or can be compiled (from sources) to a level of precision that is compatible with one’s intent. This is just good practice.

The GNU math library publishes its ULP constraints, and you have (often) at most 1 ULP as a promise. Unfortunately, I found that this was optimistic: it does not apply when x87 is active… it can go much higher than stated in the documentation. Now, what happens when you set the rounding to 64, from what we can tell, is likely to be worse.

I have not investigated the matter more, and I do not plan to. It is not a viable platform as far as I am concerned. Not in 2020.

1. Wilco says:

I don’t think Intel can be blamed.

It is an x87 problem, you will not get this on other ISAs. There are other ISAs that also support 80-bit long double, but they have instructions that produce float or double results (for example 68k has fsmul, fdmul).

For example, evidently the compiler is itself, at compile-time,
evaluating the expression with full precision. The net result is that
if you do “double x = value1; double y = value2; if(x/y =
value1/value2) ..”, the expression will evaluate to false under some
optimization levels but not others.

That can only happen on x87. Floating point expressions will give the same results at compile-time and at run-time by default.

The fact that the expressions evaluate to different values at
compile-time than at runtime has nothing to do with Intel.

A user can explicitly switch on aggressive FP optimizations, change the rounding mode or use flush-to-zero. This will change the result at runtime, and that is expected. However on x87 results are different even if you don’t do any of this. And given all the bug reports, that is not obvious nor expected!

21. zero says:

(note SSE is enabled by default for 64 bits — enabling this default for 32 bits now would break conceivably every program that relied on default settings… not worth that chaos, especially when you can control the behavior with a switch)

22. Wilco says:

It’s worth updating the title – this is a x86-specific issue. Other ISAs like Arm, Power, MIPS etc do not have this issue – they fully support IEEE and give consistent answers.

It’s well-known among floating point experts that the x87 floating point unit has many serious IEEE conformance issues. Let’s ignore the transcendental functions which are not only very slow but also incredibly inaccurate.

Even when switched to the pc32/pc64 modes it is not possible to calculate floating point values correctly. This is because x87 always uses 15 exponent bits (it is still using the 80-bit format internally), so you still get double rounding results. Even more strangely, the value may change when a register is stored to memory – only then the extra exponent range is reduced to the correct IEEE value!

So if you use a value just computed in a register you may get a different result if it was stored to memory. A compiler may spill variables at any time. You can use the same variable several times from a register but if at some point it is stored/spilled, its value may suddenly change…

So an expression like (x > y && x > y) can evaluate to false even if the first x > y is true! Unbelievable, right?

To avoid this you need to use -ffloat-store to force register values to memory so that the extra rounding happens immediately and the value is now consistent. This option slows down x87 floating point code significantly. So you get to choose between fast and inconsistent, or consistently incorrectly rounded and slow…

In conclusion: if you care about floating point at all, do not ever use x87.

1. Wilco says:

You’re welcome. I learned much of this the hard way… I had the pleasure of debugging an instruction set simulator which used native floating point instructions to simulate IEEE floating point hardware. It worked fine for the developers (IIRC they used SPARC workstations), but it failed when we ran our IEEE testsuite on Windows. The developers were adamant the simulator was correct since our failing testcases passed on their system.

After some time we convinced them to switch to a software floating point library on x86 since otherwise we couldn’t ever trust the simulator to correctly emulate IEEE floating point (you can’t test all inputs to be sure). For similar reasons modern compilers use software floating point to evaluate floating point constants.

1. @Wilco

You don’t get the same problems by default with LLVM clang on x86. That’s what I mean by “Intel cannot be blamed”.

Can we blame Intel for producing a bad x87 design? Sure. But we are talking about something done in the 1980s… by people who probably retired a long time ago.

I would rather blame the people today who insists that we must keep on defaulting on x87 because otherwise… too many things might break.

I don’t understand this attitude. When old things are bad, it is you duty to leave them behind and move on.

Compilers should say “we no longer support x87, it is shit” and be done with it. If people want to keep supporting it, let them add crazy flags to their command lines.

1. Wilco says:

Yes I completely agree that the -fpmath default for 32-bit x86 should have been changed years ago in GCC. LLVM uses a different default, but unlike GCC it has no need to be “bug compatible” with many applications…

In general GCC maintainers tend to be very conservative. Change is being seen as risky, breaking software or creating unnecessary extra work. I managed to make various improvements to GCC defaults in the last few years, so there is progress, but unfortunately it is slow.

It would be useful to create requests to change the -fpmath default – if there is enough demand, maybe it will finally happen. Note the x86 backend maintainers (which include Intel employees of course) would have the final say.

23. Jamie says:

You can solve this properly with GCC’s fpu_control.h as described at:

http://christian-seiler.de/projekte/fpmath/

#include

fpu_control_t fpu_oldcw, fpu_cw;
_FPU_GETCW(fpu_oldcw); // store old cw
fpu_cw = (fpu_oldcw & ~_FPU_EXTENDED & ~_FPU_DOUBLE & ~_FPU_SINGLE) | precision;
_FPU_SETCW(fpu_cw);
// calculations here
_FPU_SETCW(fpu_oldcw); // restore old cw

Precision may be one of:

_FPU_SINGLE
Single precision
_FPU_DOUBLE
Double precision
_FPU_EXTENDED
Double-extended precision

24. The ALGLIB library ran into similar issues with floating point rouding, early on, and took the easy (but awkward) way out: by defining functions for each of the comparison operators. In my version (over on GitHub under AlgLib_cpp), I removed the functions and put the operators back in, and then made a note of the problem, and its resolution in the manual (Manual.htm, sections 3.1 and 5.5) – the problem being specific to the Intel FPU and to later processors that include it.

It required setting the FPU up to round to IEEE double floating point (fldcw 0x27f or _FPU_SETCW(0x27f) under GCC using the macro _FPU_SETCW from . It appeared that it was fixed later in GCC, though I don’t know which version the repair took place in.

So, in relation to to the issue you brought up, there is some history behind this issue showing that there’s been trail and error trying to get the rounding issues resolved.

You may subscribe to this blog by email.