Classical Newtonian mechanics is always mathematically consistent. However, Newtonian mechanics assumes that bodies move without friction and that we stay far from the speed of light. When your car is stuck in the mud or you are running an intergalactic spaceship, frictionless Newtonian mechanics is the wrong model even though it remains mathematically consistent. Why do we still use Newtonian mechanics? Because it gets the job done in many practical cases.
Similarly, in computer science, we routinely analyze algorithms using the big-O notation. This notation is always mathematically consistent. In this sense, it is always right.
However, most computer scientists and engineers use the big-O notation as a model for real-world performance (at a high level). So if a computer scientist tells you that algorithm X runs in time O(n²) whereas an algorithm Y runs in time O(n log n), you expect that for some large but reasonable value of n and for some data, algorithm Y will be faster than algorithm X. If that does not happen, it does not mean that the big-O notation is mathematically wrong, but it does mean that it is wrong as a model for real-world performance. It must be rejected. That is, in such a case, the big-O notation fails to model real-world performance and is not useful as a scientific model. That’s just like saying that if you run an intergalactic spaceship, Newtonian mechanics is wrong. It is not up for debate: Newtonian mechanics will simply fail to model how your engine relates to the speed of your ship.
What do I mean by a large but reasonable value of n? First, we must agree that there is a limit. Just consider that our solar system is finite. We could spend all our resources on a single massive computer, but it would still be finite. Even if we were to expand the computer to encompass all of the universe, it would still be finite. So there is clearly a limit to how big n can be. In practice, this limit is set by the practical problems we encounter. If, for your problems, n is too small, then the big-O notation is the wrong model for you.
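To make this concrete, here is a minimal sketch of how you might test the model’s prediction on your own machine. It pits a textbook O(n²) sort against Python’s built-in O(n log n) sort; the choice of algorithms and input sizes is mine, purely for illustration.

```python
import random
import time

def insertion_sort(values):
    """A textbook O(n^2) sorting algorithm."""
    values = list(values)
    for i in range(1, len(values)):
        x = values[i]
        j = i - 1
        while j >= 0 and values[j] > x:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = x
    return values

# Big-O predicts that the built-in O(n log n) sort should win for some
# "large but reasonable" n; this checks where that happens on this machine.
for n in (10, 100, 1_000, 10_000):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    insertion_sort(data)
    quadratic = time.perf_counter() - start
    start = time.perf_counter()
    sorted(data)
    nlogn = time.perf_counter() - start
    print(f"n = {n:>6}: O(n^2) {quadratic:.6f}s, O(n log n) {nlogn:.6f}s")
```

If the crossover happens only at values of n you never encounter, the model is the wrong one for your problem, even though the mathematics is sound.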
To make matters worse, nobody uses the same program to process 10KB and to process 100TB of data. Suresh summarizes the problem:
Asymptotics will eventually win out, as long as everything else stays fixed. But that’s the precise problem. Everything else doesn’t stay fixed. Well before your n log n algorithm beats the n² algorithm, we run out of memory, or local cache, or something else, and the computational model changes on us.
So even if your algorithm would eventually win out for a value of n that is not outrageous, your asymptotic analysis can still be irrelevant because larger values of n are handled with a different architecture.
Ultimately, the big-O notation is a tremendously useful but crude tool. It is great to convey broad ideas. It can help to explain some simple decisions. For example, if you need to search for an element in a sorted array and you expect the array to be large, you might just say that you opt for a binary search instead of a sequential scan because the former has O(log n) complexity whereas the latter has O(n) complexity. It is unlikely that your colleagues will expect you to run benchmarks. In this case, the big-O notation captures and summarizes our knowledge of the problem.
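As a minimal sketch of that example (using Python’s standard bisect module; the array and target are made up for illustration), the operation counts make the O(log n) versus O(n) argument obvious without a benchmark:

```python
import bisect

def sequential_search(sorted_values, target):
    """O(n): scan the array until we find the target or pass it."""
    for i, value in enumerate(sorted_values):
        if value == target:
            return i
        if value > target:  # sorted input lets us stop early
            return -1
    return -1

def binary_search(sorted_values, target):
    """O(log n): halve the search range at every step."""
    i = bisect.bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return i
    return -1

data = list(range(0, 2_000_000, 2))        # one million sorted even numbers
print(binary_search(data, 1_234_568))      # about 20 comparisons
print(sequential_search(data, 1_234_568))  # about 600,000 comparisons
```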
However, when designing a software system, the fraction of your decisions that rely on big-O analysis is small. Good engineers rely on more sophisticated models and metrics. In this sense, it is unfair to compare the big-O notation with Newtonian mechanics. The latter allows you to model complex problems from start to finish with exact results (under assumptions that can be nearly realized). The big-O notation is far more limited in its applications. Of course, when it is applicable, the big-O notation is tremendously powerful.
Continue reading with my post Better computational complexity does not imply better speed.
This reminds me of a post, O(n) beats O(lg n), that I wrote a while ago:
http://willwhim.wpengine.com/2011/07/07/on-beats-olg-n/
Further, Jeff Ullman complained about people using easy test sets that show their algorithms working, while in the real world this would not happen. Some even cheat.
The same problem exists with the Big-O notation. Computer scientists derive new estimates using clever mathematical tricks. The problem is that these tricks introduce such enormous constants that (I believe) a majority of published Big-O formulas are totally bogus.
They are mathematically correct, but you would never get a sufficiently large n.
I recently optimised an algorithm that was part of a scientist’s MATLAB program. I described how I did it to a computer scientist, who poured scorn on my work: ‘Trivial! It changes nothing in the big-O sense.’
The scientist, whose workflow was now many times faster, bought me a case of wine.
I think another way to say this is that for real-world problems the constant matters. If you really want to know what is faster, you need to work harder and get a sharp bound. Big-O just tells you how the bound will scale with size. However, if the difference is between O(log(n)) and O(n), then the ratio of constants must be pretty big for log(n) not to win for any appreciable n.
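To put rough numbers on that last point (a back-of-the-envelope sketch with constants of my own choosing): for a cost of c1*log2(n) to lose to a cost of c2*n at size n, the ratio c1/c2 must exceed n/log2(n), which grows very quickly.

```python
import math

# How large must the constant ratio c1/c2 be for an O(log n) algorithm
# with cost c1*log2(n) to lose against an O(n) algorithm with cost c2*n?
for n in (10**3, 10**6, 10**9):
    ratio = n / math.log2(n)
    print(f"n = {n:>13,}: c1/c2 must exceed roughly {ratio:,.0f}")
```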
@Carson Chow. The problem is that it is not always log(n) vs. n. Quite often it is log^3(n) or, say, log^5(n). (I have a real example, of course.)
Mike, you need to find a computer scientist to talk to who isn’t an idiot.
In addition to this point, there is the smaller but important point that big-O gives only an asymptotic upper bound, and not necessarily a tight one. Use big-Omega (and big-Theta) for that. See http://mcw.wordpress.fos.auckland.ac.nz/2011/09/27/big-omicron-and-big-omega-and-big-theta/
So, if I may paraphrase, the big-O notation is a mathematically very useful notion that allows us to talk about algorithms very reasonably, and it often corresponds pretty directly to reality. But you have to take it with a grain of salt: hidden constant factors, the fact that it’s asymptotic, and the fact that it doesn’t (by itself, necessarily) take into account parallelism, caches, and other hardware issues are all things to worry about.
Great. My lecture in my algorithms class has all bases covered.
(Oh, and I forgot: it’s worst-case analysis, not average/typical/whatever-your-case-is analysis. That’s in my lecture too.)
@Michael
I would hope that everything in this blog post is standard knowledge and uncontroversial.
While I’d agree that the limitations of the O notation you describe should be uncontroversial (and should be clearly taught with the notation), I think you exaggerate when you say:
“However, when designing a software system, the fraction of your decisions that rely on big-O analysis is small. Good engineers rely on more sophisticated models and metrics.”
and
“The big-O notation is far more limited in its applications.”
In my experience, a great many basic decisions stem from understanding what your code is doing at the level of counting operations — is it linear, quadratic, n log n, etc. — and then fine-tuning for constant factors.
So while I agree with all of your caveats, I think of cases where the O-notation is not useful as still the exception rather than the rule.
Asymptotic analysis tends to be more relevant with polynomial complexities than with (poly-)logarithmic ones. After all, log n ≈ 30 is usually a reasonable estimate, while cache misses cost from tens to hundreds of CPU cycles.
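As a back-of-the-envelope illustration of that point (the cycle counts are rough figures, not measurements): a logarithmic factor stays small for any practical n, while a polynomial factor keeps growing.

```python
import math

# A log n factor never exceeds ~40-50 for any practical n, which is on
# the same order as a single cache miss (commonly quoted at roughly
# 100-300 CPU cycles). A polynomial factor such as n itself keeps growing.
for n in (10**6, 10**9, 10**12):
    print(f"n = {n:>15,}: log2(n) = {math.log2(n):5.1f}, n = {n:,}")
```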
Jouni, as far as I remember (let anybody correct me if I am wrong), quadratic-time algorithms for suffix-tree (or suffix-array) construction are faster in practice than linear-time ones.
Itman: A few years ago the fastest suffix array construction algorithms (at least under some assumptions) were indeed O(n^2 log n)-time. These days the fastest well-known implementation (under similar assumptions) is Yuta Mori’s libdivsufsort, which uses an O(n log n)-time algorithm.
The caveat is that the benchmarks are usually done with small inputs of up to ~100 MB in size. With such small inputs, memory locality and various heuristic tricks play a more important role than in the multi-gigabyte range.
Jouni,
Thank you for the update. Yes, this may be the case. But still, up to a “small” 100 MB input, the quadratic algorithms beat the classic LINEAR ones.
Itman: The quadratic complexities of those fast algorithms are basically rough upper bounds. A reasonably straightforward implementation of naive suffix sorting (sketched after the list below) takes O(nL log n) time, where L is the average LCP value. The fast “quadratic time” algorithms basically improve upon this in two ways:
1) They identify a subset of suffixes, whose sorted order can be used to induce the order of the rest of the suffixes. This typically improves the running time by a constant factor.
2) They use some heuristics to identify long repetitions and sort them in some other way. There are known bad cases for some of the heuristics, but even then the asymptotic complexity does not grow beyond the naive O(nL log n). For other heuristics, neither bad cases nor better time bounds are known.
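Here is what I mean by naive suffix sorting, as a minimal sketch (it materializes the suffixes only to keep the code short; a real implementation would compare them in place):

```python
def naive_suffix_array(text):
    """Sort suffix start positions by comparing the suffixes directly.

    Each comparison of two suffixes costs O(L) character comparisons on
    average, where L is the average LCP, so the comparison sort performs
    roughly O(n L log n) work in total.
    """
    # Materializing text[i:] as the sort key is wasteful (O(n^2) extra
    # memory); it is done here only to keep the sketch short.
    return sorted(range(len(text)), key=lambda i: text[i:])

print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```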
Now consider a snapshot of the English-language Wikipedia. In the first tens of megabytes, the average LCP is in the low tens. When we increase the dataset size to hundreds of megabytes or even a few gigabytes, L increases into the 50-100 range. Beyond that, L keeps rising, reaching the 500-1000 range when the dataset size is in the tens of gigabytes.
Then consider the human genome. Due to the long runs of Ns (unknown bases), the average LCP in the entire reference genome is in the hundreds of thousands. Yet this is exactly the kind of repetition the heuristics in the quadratic algorithms can easily handle. If we ignore the Ns, the average LCP drops below 100 for most of the chromosomes (I couldn’t find the numbers for the entire genome).
In both cases, asymptotic analysis gives us reasons to believe that the speed of the “quadratic” algorithms should be in the same neighborhood as the linear and O(n log n)-time algorithms, at least if we consider the typical ~100 MB test cases. Yet as the Wikipedia example suggests, the situation may change with larger inputs.