Jeffrey Ullman, a famous computer science professor, published an essay pushing back about the need to run experiments in computer science. Apparently, some conference reviewers gave him a hard time regarding a paper he submitted because it did not include experiments.
I am in general agreement with him: not every research paper should include experiments. That is obviously an excessive requirement. I also think that building theoretical models is very important. Experiments alone are unsatisfying. Indeed, science is not merely about running experiments. Imagine if we did not have Newton’s laws? Would we need to run experiments on every new object to see how it behaves when we push it? Experiments are often badly designed and misleading. I don’t tend to trust a paper that includes experiments more than another paper. In fact, a paper with a useful theoretical results can be much more powerful than any large set of experiments… since it is often easier to check the mathematics than to be sure that the experiments were carried on fairly. There is also a sense in which experiments can be pointless. For example, I was once asked to show experimentally that an algorithm based on external-memory sorting could scale up.
But, at some point, Ullman linked to my blog post A criticism of computer science: models or modèles? This brought mixed feelings. I think he misrepresented my opinions:
In this article, the author argues that one should never use a model that is not real running time on a real computer. For example, this author would not accept the O(n log n) lower bound on sorting, because it is based on counting comparisons rather than machine instructions executed.
I don’t think that my blog post says anything of the sort. I disagree with this statement even though Ullman is under the impression that it is what I believe. I think that our knowledge that sorting (in general) requires at least n log n comparisons is critical and fundamental. If you were to learn one thing from computer science, it would probably not be a bad choice. But it is also practically incorrect: you can use Pigeonhole sort or Radix sort to achieve sorting in linear time (so better than O(n log n)). And you are probably relying daily on software that sort data faster.
Models are always simplifications. However, if your model is right by definition… that is, if no comparison with real world could lead you to conclude that your model is wrong, then it is not a scientific model. It might be a tremendously useful mathematical construction… but it is not science.
In his essay, Ullman concludes from the model:
Returning to the sorting example, if you know one algorithm is O(n log n) and another is O(n2), you might not know which is really better for small n, but you know for certain that the first is better for matters a database researcher might be interested in.
Of course, Quicksort is an O(n2) algorithm that is often used by professional software engineers including those building database engines. My point is that Ullman’s models are much less informative than one might think.
It is a scientific fact that the sorting algorithm used in Java, Timsort, is often preferable to the classical algorithms like merge sort or quick sort. We can run extensive experiments to check that fact. There is also a theoretical analysis that explains why Timsort can often be better. That’s a scientific approach.
Working with models without experiments is akin to mathematics. If you model is right, you ought to be able to demonstrate it with experiments.
Not everything needs to be science. For example, Codd‘s work on relational algebra (the foundation for SQL) has been impactful. I am really happy we have SQL and everything that was built on it. But notice that it was built by people who didn’t merely come up with models… they implemented their ideas and had them face the real world.
The problem is that if we don’t require people to test out their ideas in the real world, we are going to get thousands of rambling researchers for every Codd. Sometimes, that is the impression that academic research in computer science gives.
Some researchers will eagerly point out that pure theory often proves useful in unexpected ways. But that is also true of art and philosophy. What sets our current civilization apart from the preceding ones is that we are founded on the scientific paradigm. The industrial revolution was possible because hackers built machines that worked, and irrespective of what the great minds of the time thought, they were adopted. In effect, results are what matters, not how smart you are or how prestigious your position is.
I do not believe we should do away with mathematics, art and philosophy, but we need to be watchful. Do you use a model because influential people are supporting it or because it fits the facts?
Continue reading with my post on Big-O notation and real-world performance.
5 thoughts on “Should computer scientists run experiments?”
Daniel — you’re not the “villain” in my story. But I disagree with (what I interpret as) your criticism of models that do not model absolutely every aspect of a computation. Would you dismiss computing the communication cost of a MapReduce algorithm, even though there are other costs involved (but communication cost typically dominates)? Do you dismiss any claim of a big-oh or big-omega bound, because no such claim can ever be falsified by a finite experiment?
Would you dismiss computing the communication cost of a MapReduce algorithm, even though there are other costs involved (but communication cost typically dominates)?
Do you dismiss any claim of a big-oh or big-omega bound, because no such claim can ever be falsified by a finite experiment?
If you want to be formal about it, no model can ever be proven wrong.
We both know that, in practice, if you plot the performance curves and you get something that is quadratic in n… you have effectively disproved the n log n model.
You may argue that formally, I have disproved nothing… but the point of a scientific model is to tell you something about the real world. If you constantly get quadratic curves… you can argue until you are blue in the face that it is really n log n… engineers are going to reject your performance model and use a better one.
So falsification is not a black and white thing. Rather, in practice, it becomes increasingly harder to fit a model to the facts. Now, that’s assuming you have a scientific model. If it is not a scientific model, then you can always fit it to any fact… because facts don’t matter.
Big-O notation may very well be _practically useless_, but it is certainly not _meaningless_, even in the context of a universe with a finite number of elementary particles.
_We both know that, in practice, if you plot the performance curves and you get something that is quadratic in nâ€¦ you have effectively disproved the n log n model._
No, we don’t. We both know that in practice, if one sorting algorithm has better performance on real-world data, then it has better performance on real-world data, by definition. But that’s not a falsification of the asymptotic complexity model, because that’s not what the asymptotic complexity model describes.
It’s not the _model’s_ fault if you’re using it wrong.
If you want to be formal about it, then big-O notation is a purely mathematical construct that has no bearing on reality.
Take sorting… just artificially pad any array so that it has at least 10^10 elements. You haven’t changed the big-O notation, but you have drastically altered the real-world performance for all practical cases.
If you insist on this interpretation, then yes, big-O notation belongs to math. departments.
But that is not how it is used. Clearly, Jeffrey’s intent is that his big-O analysis has some bearing on real world performance in this universe.
Model’s prediction and experimental results are separate things, but they are not unrelated. What the model says, if it’s constructed under assumptions similar to what holds for the experiment, is evidence about what the experimental results will be. And the shape of experimental results is evidence for the existence of a reasonable model that would predict it.
You may subscribe to this blog by email.