Many papers in Computer Science tell the following story:
- There is a pre-existing problem P.
- There are few relatively simple but effective solutions to problem P. Among them is solution X.
- We came up with a new solution X+ which is a clever variation on X. It looks good on paper.
- We ran some experiments and tweaked our results until X+ looked good. We found a clever way to avoid comparing X+ and X directly and fairly, as it might then become obvious that the gains are small, or even negative! We would gladly report negative results, but then our paper could not be published.
It is a very convenient story for reviewers: the story is simple and easy to assess superficially. The problem is that sometimes, especially if the authors are famous and the idea is compelling, the results will spread. People will adopt X+ and cite it in their work. And the more they cite it, the more enticing it is to use X+ as every citation becomes further validation for X+. And why bother with algorithm X given that it is older and X+ is the state-of-the-art?
Occasionally, someone might try both X and X+, and they may report results showing that the gains due to X+ are small, or negative. But they have no incentive to make a big deal of it because they are trying to propose yet another better algorithm (X++).
This process is called citogenesis. It is what happens when the truth is determined solely by the literature, not by independent experiments. Everyone assumes, implicitly, that X+ is better than X. They beauty of it is that you do not even need for anyone to have claimed so. You simply need to say that X+ is currently considered the best technique.
Some claim that science is self-correcting. People will stop using X+ or someone will try to make a name for himself by proving that X+ is no better and maybe worse than X. But in a business of science driven by publications, it is not clear why it should happen. Publishing that X+ is no better than X is an unimpressive negative result and those are rarely presented in prestigious venues.
John Regehr made a similar point about our inability to address mistakes in the literature:
in many cases an honest retrospective would need to be a bit brutal, for example to indicate which papers really just were not good ideas (of course some of these will have won â€œbest paperâ€ awards). In the old days, these retrospectives would have required a venue willing to publish them, (…), but today they could be uploaded to arXiv. I would totally read and cite these papers if they existed (…)
But there is hope! If problem P is a real problem, for example, a problem that engineers are trying to solve, then you can get actual and reliable validation. Good software engineers do not trust research papers: they run experiments. Is this algorithm faster, really? They verify.
We can actually see this effect. Talk to any Computer Scientist and he will tell you of clever algorithms that have never been adopted by the industry. Most often, there is an implication that industry is backward and that it should pay more attention to academic results. However, I suspect that in a lot of cases, the engineers have voted against X+ and in favor of X after assessing them, fairly and directly. That is what you do when you are working on real problems and really need good results.
Credit: This blog post was inspired by a comment made by Phil Jones on Google+.
10 thoughts on “Citogenesis in science and the importance of real problems”
May be this is an instance of what you’re saying:
Theory say that the performance of Dijkstra shortest path algorithm is best when using a Fibonacci Heap, but some experiments disagree.
It is a benign example of what I am pointing out. There are indeed countless engineering papers written every year using the Fibonacci heap. It is unclear *why* they use the Fibonacci heap because we now have overwhelming evidence that it is not worth it.
However, there are many hidden Fibonacci heaps out there in research papers.
Is this also a BIG reason why authors don’t make available their source code publicly? I believe it is.
Fibonacci heaps are still cited only because authors are too lazy to look past their textbooks to the more recent literature, where simpler, faster, and more practical data structures with the same theoretical guarantees have been known for years. (Fibonacci heaps are still cited in textbooks because _textbook_ authors are too lazy to look past _their_ textbooks to the more recent literature, where etc.)
The point where things completely come full circle is when the engineers come out with an open source package/library (e.g., pyX) that implements several of the techniques, including X, X+, and X++, making it easy for researchers to try them all.
From my personal experience, it’s the lack of experience and exposure to better / more advanced techniques that means that these better methods get left on the shelf.
The cognitive effort required to apply, let alone come up with, is prohibitive to their adoption, and yet the majority of developers I know would rather work things out from first principals and use inappropriate levels of abstraction, more for macho purposes than practical ones.
my 2c. 😉
It is even worse. Suppose I find that X+ is no better (or even worse) than X and thus mention X+ negatively in my paper. Very quickly, my mentioning is going to be reduced to “a citation” and increase X+’s citation count. Even with my negative result, I’ll actually help promote X+!
This sounds analogous to how software bloat can accumulate: http://en.wikipedia.org/wiki/Software_bloat
True. Good point.
Doesn’t this all come down to the need for more replication of findings in CS research, and the related problem of actually valuing this type of article?
Last year, there was a panel at the CHI conference on the topic of replication in the Human-Computer Interaction field.
See also: Graduate Student Perspectives on replication
You may subscribe to this blog by email.