The fallacy of absolute numbers
I often come across the following types of arguments in research papers:
- You could save 3 bits of storage for every value in your database. Surely that’s irrelevant. Nobody cares about saving 3 bits!
- You can sort arrays in 10 ms. Surely, that cannot be improved upon? You are already down to 10 ms and nobody cares about such small delays.
I hope you can see what is wrong with these statements.
I call it the fallacy of absolute numbers: you express a measure or a gain as an absolute value, and then infer optimality or near optimality because the number appears small (or large).
Remember: Saving 3 bits of storage out of 6 bits is a 2:1 compression ratio. Sorting in 5 ms instead of 10 ms doubles the speed.
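To make the arithmetic concrete, here is a minimal sketch in Python. The function names are hypothetical, chosen only for illustration, and the 64-bit case is an added assumption meant to show that the same absolute saving can be large or small depending on the baseline.

```python
def compression_ratio(original_bits, saved_bits):
    """Original size divided by the size left after the saving."""
    return original_bits / (original_bits - saved_bits)


def speedup(old_time, new_time):
    """How many times faster the new running time is than the old one."""
    return old_time / new_time


# Saving 3 bits out of 6 halves the storage: a 2:1 ratio.
print(compression_ratio(6, 3))

# Sorting in 5 ms instead of 10 ms doubles the speed.
print(speedup(10, 5))

# The same 3 bits saved on a 64-bit value (an assumed baseline)
# gives a ratio of 64/61, which is barely above 1.
print(compression_ratio(64, 3))
```

The absolute number (3 bits, 5 ms) is identical in each pair of cases; only the ratio tells you whether the gain matters.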
Disclaimer: I am sure that someone else has documented this fallacy, but I could not find any reference to it.
Montreal, Canada 
Frederick Mosteller coined the term numerator-only data for things like this.
Comment by John — 18/6/2010 @ 13:16
You’ve got to love blogging! Thanks!
I did read your blog post back then, I’m sure, but I never connected it with what I see in research papers.
Comment by Daniel Lemire — 18/6/2010 @ 14:06
Hi Daniel, love your blog.
I see your point, but…
I just did a Google search for ‘fish’, the results… “About 359,000,000 results (0.17 seconds) ”
Suppose Google told me that they could make it 100 times faster, just 0.0017 seconds!
I really would not care: for me, in this context, there is no difference between 0.17 seconds and even 0.00000000000017 seconds.
Of course, you might argue that if I build a crawler that calls Google a million times, then I would care. This is true, but there really are papers that make similar claims in domains for which we just don’t need a speedup.
One example is a paper on a faster way to do a calculation on human ancestor remains. They had a speed-up of a factor of two. However, all the prehistoric human ancestor remains we have could comfortably be placed in a small suitcase. Making the algorithm faster was polishing the wrong apple; we just don’t need to speed up that problem.
Comment by Anonymous — 20/6/2010 @ 20:34
When people talk about improving or comparing any algorithm, the only meaningful way to present it is the Pareto front. I learned about it too late; I should possibly write a blog post about it.
Comment by Alex Mikhalev — 21/6/2010 @ 6:13
On the other hand, beware of the fallacy of relative numbers: if my web site has the fastest growth in access, that may be because it went up from 1 access (myself) to, say, 100. This is usually clear with the rate of adoption of, say, new browsers.
Comment by Muhammad Alkarouri — 29/6/2010 @ 20:33