I often come across the following type of argument in research papers:

  • You could save 3 bits of storage for every value in your database. Surely that’s irrelevant. Nobody cares about saving 3 bits!
  • You can sort arrays in 10 ms. Surely, that cannot be improved upon? You are already down to 10 ms and nobody cares about such small delays.

I hope you can see what is wrong with these statements.

I call it the fallacy of absolute numbers: you express a measure or a gain as an absolute value, and then conclude that the result is optimal or nearly optimal because the number appears small (or large).

Remember: Saving 3 bits of storage out of 6 bits is a 2:1 compression ratio. Sorting in 5 ms instead of 10 ms doubles the speed.
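To make the arithmetic concrete, here is a minimal Python sketch. The helper name relative_gain is hypothetical and only serves the illustration; it turns a before/after pair into a ratio and a fraction saved, using the two examples above.

```python
def relative_gain(before, after):
    """Return (ratio, fraction_saved) for a cost that drops from `before` to `after`.

    Hypothetical helper for illustration; both costs must be positive
    and expressed in the same unit (bits, milliseconds, ...).
    """
    if before <= 0 or after <= 0:
        raise ValueError("costs must be positive")
    return before / after, (before - after) / before


# Storage: saving 3 bits out of 6 bits per value.
storage_ratio, storage_saved = relative_gain(6, 6 - 3)

# Sorting: 5 ms instead of 10 ms.
sort_ratio, sort_saved = relative_gain(10, 5)

print(f"storage: {storage_ratio:.1f}:1 compression, {storage_saved:.0%} saved")
print(f"sorting: {sort_ratio:.1f}x faster, {sort_saved:.0%} less time")
```

The same 3 bits saved on a 64-bit value would give a much smaller ratio, which is why the absolute number alone says little.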

Note: I am sure that someone else has documented this fallacy, but I could not find any reference to it.

5 Comments

  1. Frederick Mosteller coined the term numerator-only data for things like this.

    Comment by John — 18/6/2010 @ 13:16

  2. You’ve got to love blogging! Thanks!

    I did read your blog post back then, I’m sure, but I never connected it with what I see in research papers.

    Comment by Daniel Lemire — 18/6/2010 @ 14:06

  3. Hi Daniel, love your blog.
    I see your point, but…
    I just did a Google search for ‘fish’, and the results were: “About 359,000,000 results (0.17 seconds)”

    Suppose Google told me that they could make it 100 times faster, just 0.0017 seconds!

    I really would not care. For me, in this context, there is no difference between 0.17 seconds and even 0.00000000000017 seconds.

    Of course, you might argue that if I build a crawler that calls Google a million times, then I would care. This is true, but there really are papers that make similar claims in domains for which we just don’t need a speedup.

    One example is a paper on a faster way to do a calculation on human ancestor remains. They had a speed-up of a factor of two. However, all the prehistoric human ancestor remains we have could comfortably be placed in a small suitcase. Making the algorithm faster was polishing the wrong apple: we just don’t need to speed up that problem.

    Comment by Anonymous — 20/6/2010 @ 20:34

  4. When people talk about improving or comparing algorithms, the only meaningful way to present the results is the Pareto front. I learned about it too late; I should possibly write a blog post about it.

    Comment by Alex Mikhalev — 21/6/2010 @ 6:13

  5. On the other hand, beware of the fallacy of relative numbers: if my web site has the fastest growth in access, that may be because it went up from 1 access (myself) to, say, 100. This is usually clear with the rate of adoption of, say, new browsers.

    Comment by Muhammad Alkarouri — 29/6/2010 @ 20:33
