In mathematics, we typically require equality to be an equivalence relation. That is, it should be reflexive: A should be equal to A. Moreover, it should be symmetric: if A is equal to B, then B should be equal to A. Finally, it should be transitive: if A is equal to B, and B is equal to C, then A must be equal to C.

These conditions appear to make perfect sense. Yet almost all programming languages fail to enforce reflexivity, symmetry and transitivity.

• In languages such as XPath, equality fails to be transitive. Indeed, consider the following numbers:

x = 10000000000000001;
y = 10000000000000000.0;
z = 10000000000000000;

Then x = y and y = z are true, but x = z is false. The problem is that XPath automatically promotes integer-to-float comparisons to float-to-float comparisons: converting x to a float rounds it to 10000000000000000.0, so x = y holds, while the pure integer comparison x = z sees the difference.

But even if you restrict yourself to integer types, XPath still fails to provide transitivity: the sequence (1,2) is equal to (2,3) (they share the item 2), which is equal to (3,4) (they share the item 3), yet (1,2) fails to be equal to (3,4).
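XPath's general comparison on sequences is existential: two sequences compare equal when some item of one equals some item of the other. Here is a minimal JavaScript model of that rule (the xpathEq helper is hypothetical, for illustration only):

```javascript
// XPath's "=" on sequences is existential: A = B holds if SOME item of A
// equals SOME item of B. A tiny model of that rule:
function xpathEq(a, b) {
  return a.some(x => b.some(y => x === y));
}

console.log(xpathEq([1, 2], [2, 3])); // true  (they share 2)
console.log(xpathEq([2, 3], [3, 4])); // true  (they share 3)
console.log(xpathEq([1, 2], [3, 4])); // false (no common item)
```

An existential rule like this can never be transitive in general, because "sharing an item" is not preserved across chains of sequences.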

• If you try the same test with the values x, y, z in JavaScript, you will find that transitivity is preserved because… x = z. That’s because JavaScript has no integer type: it uses finite-precision floating point numbers to represent all numbers.
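A quick check in Node or a browser console confirms that all three literals collapse to the same double:

```javascript
// All Numbers in JavaScript are IEEE 754 doubles. Above 2^53, consecutive
// integers are no longer representable: 10000000000000001 rounds to
// 10000000000000000, so all three literals denote the same value.
const x = 10000000000000001;
const y = 10000000000000000.0;
const z = 10000000000000000;
console.log(x === y); // true
console.log(y === z); // true
console.log(x === z); // true: transitivity holds, at the price of precision
```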

However, floating point numbers have problems of their own. For one thing, equality over them is not reflexive. That is, there is a value x such that x = x evaluates to false. (The NaN marker.) This is only one of several challenges that floating point numbers present.
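The irreflexive value is easy to exhibit in any language with IEEE 754 semantics:

```javascript
// NaN is the one double that is not equal to itself: IEEE 754 mandates
// that every ordinary comparison involving NaN is false.
const nan = 0 / 0;              // 0/0 produces NaN rather than an error
console.log(nan === nan);       // false
console.log(nan !== nan);       // true: the classic idiom for detecting NaN
console.log(Number.isNaN(nan)); // true: the readable modern alternative
```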

Conclusion. When working out a theory, it is easy to come up with simple axioms. People often think that software is just a representation of mathematics that can be whatever we want it to be, and that it should therefore obey intuitive and simple axioms. What we get, instead, are organically grown compromises. There is sometimes little difference between studying software and studying nature: both can be surprising.

Update: John Regehr pointed me to comparisons in PHP. It is slightly insane trying to figure out what is equal to what.

Update 2: Jeff Erickson reminded me of a very funny presentation by Gary Bernhardt.

1. Old news:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Comment by Anon — 3/9/2012 @ 14:57

2. Giving a theory a sound axiomatic structure is a hard job: ask Euclid, Peano, Hilbert…

The same is true for designing consistent and efficient finite precision arithmetic systems, both for human and machine computers! (Long before the introduction of computing machines, approximate numerical calculations were performed by great scientists like Gauss.)

While a naive mapping between exact number theory and finite-precision arithmetic may give paradoxical results, there is much more than “organically grown compromises” in the specification and implementation of, say, IEEE 754 floating point arithmetic.

In fact, just as an axiomatic system is the end point of a long and complex evolution (with errors, corrections, and changes of perspective), so the main driving force in the present-day deployment of computer arithmetic is a profound and mathematically sound understanding of finite-precision calculations.

Comment by Stefano Miccoli — 3/9/2012 @ 15:59

3. @Stefano

There is a good reason why equality over floating point numbers is not reflexive, but it is still a compromise. The designers were very concerned with performance, for example.

Comment by Daniel Lemire — 3/9/2012 @ 16:43

4. > There is a good reason why equality over floating point numbers is not reflexive, but it is still a compromise. The designers were very concerned with performance, for example.

Nope. Not performance. The lack of reflexivity comes directly from the fact that a base-2 number system (floating point numbers in most conventional computers are base-2) with limited precision cannot accurately represent all possible fractional base-10 values.

Just as base-10 cannot accurately represent the fraction 1/3, base-2 is worse. Any fractional value whose denominator has a prime factor other than 2 is an infinitely repeating base-2 value (http://en.wikipedia.org/wiki/Binary_numeral_system#Fractions_in_binary). And just as 0.3333 is an approximation in base-10 of 1/3, 0.000110011 in base-2 is only an approximation of 1/10.

When combined with a finite size representation (usually 32 or 64 or 128 bits) any fractional value which is actually a repeating base-2 value is approximated by floating point. It is this approximation that results in the lack of reflexivity.
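The approximation the comment describes is easy to observe directly:

```javascript
// 1/10 has no finite base-2 expansion, so 0.1 is stored as the nearest
// double. The rounding error surfaces as soon as you do arithmetic:
console.log(0.1 + 0.2);          // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3);  // false
// Printing more digits exposes the stored approximation of 0.1:
console.log((0.1).toFixed(20));  // 0.10000000000000000555
```

(Note that this shows inexactness, not irreflexivity: as the next comment points out, the approximation of 0.1 is still equal to itself.)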

Comment by Anon — 3/9/2012 @ 17:06

5. @Anon

I disagree.

(1) Only NaN has the property that it is different from itself. The number 0.000110011 in base 2 is equal to itself.

Formally, you do not need the NaN value. The software could simply bail out and throw an exception instead of using NaN. This would introduce compulsory branching throughout and hurt performance… though we could probably optimize some of the cost away given enough work.
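As a sketch of that alternative design (hypothetical, not how IEEE 754 or JavaScript behaves by default), a checked operation could throw instead of producing NaN, paying a branch on every result:

```javascript
// A hypothetical "checked" arithmetic style: instead of letting NaN
// propagate silently, test every result and throw. The extra branch on
// every operation is the performance cost mentioned above.
function checkedDiv(a, b) {
  const r = a / b;
  if (Number.isNaN(r)) {
    throw new RangeError("invalid operation: " + a + " / " + b);
  }
  return r;
}

console.log(checkedDiv(1, 4)); // 0.25
try {
  checkedDiv(0, 0);            // would silently yield NaN; here it throws
} catch (e) {
  console.log(e.message);      // invalid operation: 0 / 0
}
```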

(2) Arbitrary precision real number arithmetic is indeed possible. Symbolic algebra systems have been doing it for decades. But you won’t be crunching billions of numbers per second using those.

There is nothing very hard about programming a computer to keep track of 1/10; it is just hard to do it fast.
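A minimal sketch of that idea, using JavaScript BigInt to keep track of 1/10 exactly (the frac/add/equals helpers are hypothetical, not a library API; a real implementation would also reduce by the gcd):

```javascript
// Exact rational arithmetic: 1/10 is kept as a pair of integers, so
// adding it ten times gives exactly 1 (summing the double 0.1 does not).
function frac(num, den) {
  return { num: BigInt(num), den: BigInt(den) };
}
function add(a, b) {
  return { num: a.num * b.den + b.num * a.den, den: a.den * b.den };
}
function equals(a, b) {
  return a.num * b.den === b.num * a.den; // cross-multiply: no rounding
}

let sum = frac(0, 1);
for (let i = 0; i < 10; i++) sum = add(sum, frac(1, 10));
console.log(equals(sum, frac(1, 1))); // true: exactly one
```

Every operation here allocates and multiplies arbitrary-size integers, which illustrates the point: correctness is easy, speed is the hard part.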

(3) The problems I pointed out in my blog post have nothing to do with representing all possible fractional base-10 number values. Indeed, I do not even use fractions.

Comment by Daniel Lemire — 3/9/2012 @ 17:26

6. I think that failing these 3 rules is not the big problem. The main problem is that programming language creators deal with these issues in different ways than you might expect. For example, 2+3*4 is usually evaluated as 2+(3*4), but a language creator may instead define it as (2+3)*4.
And what is worse, each creator can define his own method of solving one particular problem, which can lead to different results. Even within one programming language you can get 2 different results. Some time ago I was following a Haskell tutorial. It showed how to make this table: [1.0,1.2,1.4,1.6,1.8,2.0]. The natural way, [1.0,1.2..2.0], gives you this: [1.0,1.2,1.4,1.5999999999999999,1.7999999999999998,1.9999999999999998]. Another method I worked out is map (/10) [10,12..20], which gives the proper results.
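The same drift is easy to reproduce outside Haskell; here is a sketch in JavaScript of the two strategies:

```javascript
// Strategy 1: repeated addition of 0.2 accumulates rounding error,
// just like Haskell's [1.0,1.2..2.0].
const drifted = [];
let v = 1.0;
for (let i = 0; i < 6; i++) { drifted.push(v); v += 0.2; }
console.log(drifted);
// [ 1, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998 ]

// Strategy 2: enumerate exact integers and divide once at the end.
// Each entry suffers only a single rounding, so every value prints cleanly.
const clean = [10, 12, 14, 16, 18, 20].map(n => n / 10);
console.log(clean); // [ 1, 1.2, 1.4, 1.6, 1.8, 2 ]
```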

Comment by Darek Nędza — 4/9/2012 @ 11:44

7. “We used to think that if we knew one, we knew two, because one and one are two. We are finding that we must learn a great deal more about ‘and’.”

–Sir Arthur Eddington, The Harvest of a Quiet Eye

Comment by B.J. Murphy — 4/9/2012 @ 19:38