Better computational complexity does not imply better speed

A recent magazine article presents a theoretical result: Harvey and van der Hoeven have shown that you can multiply two n-bit integers using O(n log n) complexity. Roughly speaking, this means that as n grows large, there is a mathematical model of how long the resulting algorithm might run that grows like n log n in the worst case. That is, it does not get much worse as n gets larger, mathematically speaking. It is likely that this result will go into the textbook as an important contribution to theoretical computer science.

So far so good. But then the magazine articles then state that the authors have presented a faster way to compute multiplications. That is incorrect. At this point in time, they have not make this demonstration or even this claim, to my knowledge. Possibly they have an implementation somewhere and might be able to answer my questions, but I don’t expect so.

So what is the problem with claiming that they have a faster and more efficient way to do multiplications?

  • Computational complexities like O(n log n) are not a speed or even a time in the real world.
  • They are not even a model of the running time or speed. They are a mathematical asymptotic model for n large. How large must n be for a favorable model to theoretically beat a less favorable model mathematically? Possibly as large as the number of atoms in the universe. Or maybe even larger than that. I am not joking: ‪as far as I can tell, the Harvey and Van Der Hoeven would require more digits per integer than there are atoms in the universe to have a chance at being practical. Please pause to consider: if you are a scientist, you need to make predictions that apply to our universe otherwise you are doing mathematics. Mathematics is a fine activity, but it is not to be confused with science or engineering. I can hear people clamouring that, in the end, big n always win out. Let me quote a famous theoretical computer scientist on this topic:

    Asymptotics will eventually win out, as long as everything else stays fixed. But that’s the precise problem. Everything else doesn’t stay fixed.

  • But it gets worse: these are not scientific models. A scientific model would predict the running time of the algorithm given some implementation, within some error margin. However, these models do nothing of the sort. They are purely mathematical. They are not falsifiable. As long as they are mathematically correct, then they are always true. To be fair, some researchers like Knuth came up with models that closely mimic reasonable computers, but that’s not what people pursuing computational complexity bounds do, typically.
  • Importantly, this means that these asymptotic models make no prediction about the running time of actual code. If I ask Harvey and van der Hoeven how long (in seconds) their algorithm take to compute the multiplication between 1024-byte integers on my macbook, they cannot tell me for their paper alone. Thus they don’t know that it is actually faster than the big-integer library I am using (gmplib).
  • Maybe you would argue that, eventually, their algorithm would beat whatever gmplib has implemented, given large enough integers. But that’s not a scientific (i.e., falsifiable) statement. I could implement their algorithm and apply it to 100000-byte integers and get that their approach is no faster. You could then argue that I need to go even larger. If I do and fail again, you might argue that I need to go even larger. That is not how science or engineering should work. If you have a scientific model, it should make predictions that can be correct or false about the real world. And maybe you end up telling me that if I use 21024 bits per integer, then surely the implementation of the new algorithm is faster: but it is again an unscientific statement because there is no such thing as 21024-bit integers and there will never be in this universe.
  • Possibly Harvey and van der Hoeven have more precise bounds than just a big-O result. But even if they make falsifiable predictions about actual use cases, it does not follow that they are correct. They are working from a mathematical model. It is an idealized version of a computer. Without checking, you cannot tell whether your model is correct. You have to test it out. And, importantly, if you make flawed predictions, then your model is incorrect from a scientific point of view. And what are you modelling exactly? If you are assuming that a computer that manages petabytes of memory works the same as a computer that has gigabytes of memory, something is fishy.

I am not just nitpicking. This is of critical importance. Medical researchers cure Alzheimer’s or cancer in marvelous ways almost weekly using “animal models” or in vitro (in the laboratory). These breakthroughs almost never translate in the real world. It is ridiculously hard to go from models to applications.

If you do theoretical work and arrive at a model that suggests that you have a better, faster algorithm, then you are not nearly done. The map is not the territory. If you are good, then your model should match closely reality. But, no matter how smart you are, and no matter how clever your mathematical work is, you can only claim to be solving a problem faster after you have built it into a computer and recorded the elapsed time.

I am not diminishing Harvey and van der Hoeven accomplishments. If my understanding is correct, their names will be in textbooks for a very, very long time. It is well deserved based on the mathematical construction.

But we are not able to multiply integers faster thanks to their paper. This paper of theirs is not an engineering contribution. To be fair, it may lead to one such contribution. Working with models is often a useful way to do better engineering. However, you have to keep in mind that the model is not reality and that reality the only thing that matters at the end. If you lose track of this important fact, then you are falling back into prescientific thinking.

What if you disagree with my post? What if you think that Harvey and van der Hoeven’s contribution is indeed a step forward engineering-wise? Then you have a clear path forward: provide an implementation, benchmark it against well established software (e.g., gmplib) and show the benefits. No amount of white board arguments can make this requirement go away: show me the code.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

13 thoughts on “Better computational complexity does not imply better speed”

  1. It should be possible to calculate the tipping point when the authors algorithm outperforms gmplib without implementing & running the algorithm. Just don’t ignore all constants and lower order terms in the cost function. It would still be an approximation, so your remark remains valid, however it would make clear if the algorithm optimization still holds on a realistic computer system.

    1. it would make clear if the algorithm optimization still holds on a realistic computer system

      Importantly, in the breakthrough paper, the authors make no such claim as far as I can tell.

      I think you could make reasonable estimates if you are sufficiently careful, and it could be a scientific claim (i.e., that can be invalidated). Knuth has done such work, for example. It is uncommon.

  2. I think the primary reason they don’t even demonstrate their algorithm is – if I understood it correctly – that the algorithm doesn’t actually differ in running behavior from previous best theoretical run time complexity algorithms for reasonably sized inputs. Here being outside “reasonable” means that tipping point where differences start to show may require more storage than the whole observable universe could provide for a classical computer, plainly as a requirement for inputs that would show the difference. Surely this is well beyond any real-life machinery we could think for practical benchmarking…

  3. You seem a bit too harsh on purely mathematical algorithms research. Perhaps their algorithm isn’t generally better-in-practice, but their analysis opens the door to someone else making further progress later. Of course I do wonder if much of the research going on now will ever lead anywhere interesting. But in principle there can be value in foundational theoretical work.

  4. Better computational complexity does not imply better speed

    It implies there exists a point where it will have a better speed past a certain value of n.

    Mathematics is a fine activity, but it is not to be confused with science or engineering

    Good. It would be an embarrassment to be known as a scientist or an engineer.

    1. It implies there exists a point where it will have a better speed past a certain value of n.

      It does not.

      It would be an embarrassment to be known as a scientist or an
      engineer.

      People who look down on engineering should not make engineering claims.

  5. Good post, and fully agreed! Unfortunately this issue is prevalent even among software engineers. All too often people focus on theoretical worst-case behaviour without any regard for performance in the real world (eg. replacing Quicksort with a much slower sort).

    In my experience a good implementation of an algorithm with worst case of O(N * N) can seriously outperform a guaranteed O(N) algorithm (see eg. [1]). For the strstr case the constant factors in the O-notation vary by about about 2 orders of magnitude, and triggering the worst cases requires special randomized inputs which maximize the number of branch mispredictions in each implementation.

    So yes, claiming one algorithm is better than another solely based on theoretical complexity is completely missing the point. Another crazy assumption is that using O(N) temporary storage is just for “free”. Many string search algorithms require a pointer for every input character, ie. 8N bytes for N input bytes, making them completely useless in the real world.

    [1] https://github.com/bminor/glibc/blob/master/string/strstr.c

  6. Please pause to consider: if you are a scientist, you need to make predictions that apply to our universe otherwise you are doing mathematics.

    I think that’s the issue and it’s one of semantics. Theoretical CS has always had one foot firmly in the “pure math” camp. Based on my non-exhaustive Google search, it seems that a lot of people are in the “math is not science” camp because it lacks empirical verification or experiments involving natural phenomena. So therefore the “S” in “CS” becomes problematic for you.

    Really, who cares? At the level of the original paper and the claims of the author, no one is being fooled. Everyone in the know understands the difference between these theoretical bounds and real-world performance. Yes, it’s math result, not a “applied CS” result. A lot of other important results in CS qualify as “math”. What do you want the others to do? Refuse to publish their (IMO, very important) result because it doesn’t satisfy your “this universe, empirical” requirement?

    So I think you can only really pick a bone with the terminology in the Quanta article. Frankly, this is an article for lay people, even if Quanta still goes deeper than say “USA Today”. Yeah, it is very remiss for not mentioning that this technique is totally infeasible. I don’t think the bar is very high for this kind of article and honestly a lot of it is good. They could have been more honest by walking back some of the claims and including a good disclaimer about practical speeds, but I feel your attack here is broader than called for.

    1. Everyone in the know understands the difference between these
      theoretical bounds and real-world performance.

      Everyone who builds systems does. The authors of the cited paper almost surely do.

      1. Admittedly, I haven’t read the paper (maybe link it?), but do the others claim anything other than a pure mathematical result?

        That is, do they make the kind of “faster and applicable in the real world” claims that you object to, or do those only appear in the magazine?

  7. This problem of misusing asymptotic efficiency often takes place in the area of data structures.

    We tend as humans to simplify very complex topics into a simple number or description for comparison, and then we ignore the actual meaning of the simplification when we do a comparison.

    In the topic of data structures, big O notation usually means number of operations. However this becomes unrealistic for state of the art memory hierarchies, which is why, for example, cache oblivious algorithms use number of memory accesses for asymptotic efficiency instead.

    I think the point of the article is not to speak bad about the publication per se, but rather the media that claims existing multiply implementations are slower. The fact that they assume the ability to handle big enough numbers does not mean existing computers will handle it properly.

    It reminds me about the claims about quantum computers and their superior ability to tackle any problem than traditional computers.

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax