Taking scientific publishing to the next level

Scientific publishing is wasteful. We spend much time perfecting irrelevant papers to get them through peer review. Meanwhile, important papers, the ones that thousands of researchers will have to study, remain filled with errors or suffer from suboptimal presentation. Surely, you have stumbled on an important paper and thought to yourself: this paper could use a couple of examples. Or maybe the important results are buried deep in irrelevant material because the authors did not know what was really significant when they wrote the paper. We patch the system by writing lecture notes and even textbooks, which are themselves obsolete soon after their publication.

We can do better:

  • Research papers should be subject to bug reports and feature requests. We need a Bugzilla for research papers. It would cost nearly nothing, but it would dramatically improve the important research papers. Moreover, young researchers could build up a reputation: finding and reporting bugs in the literature is a sign of leadership.
  • Research papers need versioning: the authors should revise their most important work, to fix bugs and improve the presentation. Important research papers should be perfected as much as possible. (Some open archives such as arXiv already have this function.)
  • When the authors are unwilling or unable to improve their important papers, then someone should create a derivative paper. For example, a graduate student could take a classical research paper and rewrite it to fix bugs and improve the presentation. As a community, researchers should promote licensing which specifically allows this type of derivative work. Researchers should also publish documents that can be easily edited (LaTeX source code or Microsoft Word documents). Many small fixes do not warrant yet another (possibly irrelevant) research paper. We need to be able to go back and patch older work for everyone’s benefit.
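
To make the proposals above a little more concrete, here is a minimal sketch in Python of what a paper with bug reports and versions might look like as a data structure. Everything in it is an assumption made for illustration: the names `Paper`, `BugReport`, `publish`, `report_bug`, `fix` and `open_bugs` are hypothetical, and this is not how Bugzilla, arXiv, or any existing system works.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BugReport:
    """A hypothetical bug report filed against one version of a paper."""
    reporter: str
    description: str
    reported_against: int            # version number the report refers to
    fixed_in: Optional[int] = None   # version that fixed it, if any


@dataclass
class Paper:
    """A hypothetical paper that keeps every revision and its bug reports."""
    title: str
    versions: List[str] = field(default_factory=list)    # text of each revision
    bugs: List[BugReport] = field(default_factory=list)

    def publish(self, text: str) -> int:
        """Store a new revision and return its 1-based version number."""
        self.versions.append(text)
        return len(self.versions)

    def report_bug(self, reporter: str, description: str) -> BugReport:
        """File a bug against the latest published version."""
        if not self.versions:
            raise ValueError("cannot report a bug against an unpublished paper")
        bug = BugReport(reporter, description, reported_against=len(self.versions))
        self.bugs.append(bug)
        return bug

    def fix(self, bug: BugReport, new_text: str) -> int:
        """Publish a revision that closes the given bug; older versions are kept."""
        version = self.publish(new_text)
        bug.fixed_in = version
        return version

    def open_bugs(self) -> List[BugReport]:
        """Return the bug reports that no version has fixed yet."""
        return [b for b in self.bugs if b.fixed_in is None]


paper = Paper("An important result")
paper.publish("Theorem 1 holds for all n.")
bug = paper.report_bug("graduate student", "Theorem 1 fails for n = 0.")
paper.fix(bug, "Theorem 1 holds for all n >= 1.")
print(len(paper.versions), len(paper.open_bugs()))
```

In this sketch, old versions are never overwritten, so a citation could point at a specific version number, and the reporter's name stays attached to each bug report, which is one possible way of giving credit for finding bugs.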

Yes, I am effectively saying that we should consider research papers like Open Source software.

Further reading: The Journal Manifesto 2.0 by Bill Gasarch and my Simplified Open Publishing Manifesto.

Credit: This blog post was motivated by an email exchange with Daniel Gayo Avello.

Update: Michael Nielsen sent me a pointer to his essay Micropublication and open source research.

22 thoughts on “Taking scientific publishing to the next level”

  1. Amen.

    I don’t think the human need for time and space limits would particularly hamper this approach to writing. Releases can be scheduled, compilations can be collected, pages can be counted, etc. as usual.

  2. I agree that our works should have their own dynamic collaborative documentation, just like software. But, right now, papers are like software releases, not software projects. Since papers have no edition number, they represent the state of the research at a given time. Once you release the software, you cannot override the same version with changes. If you do it, users/readers will get lost with features appearing and disappearing.

    Following the book model, if a paper had edition numbers, you would provide bug fixes, but still not new features. A Wikipedia article, however, is always updated. This means that the article IS always the current version of the whole software and may change dramatically from one day to the next. Citing in the project level may be dangerous, because things that were there to support your citation may disappear in the next day.

    Right now, there is no distinction between citing releases and citing projects. Can we publish and cite scientific findings in a Wikipedia style? Probably. But we need to change the way we cite first.

  3. Daniel,

    Interesting idea. It is becoming a recurrent subject lately, and I believe we already have the technology we need to make it happen.

    On a very basic level, one could use a Dropbox account (which comes with version control) or a Google document or Scribd to keep a “latest version of my research” copy of his or her papers. This might be a quick-and-dirty patch to get teachers to love the idea while a ‘standard’ infrastructure is put in place.

    This infrastructure could be something like: each university has its own Bugzilla/Trac system open for public/peer review, where each paper has its own ‘branch’ in the ‘research’ line, ‘bugs’ that have to be fixed, ‘patches’, etc. Something between a wiki and GitHub.

    Also, having a ‘diff’ of each paper would be great for another reason: learning how to do it. I believe that seeing how a paper is written and becomes a finished work is a fundamental part of the learning process for a graduate student. At the beginning of our careers probably we assumed people wrote papers the way they appear in journals… In my case that would get me very depressed (I’d never be able to have such clarity of mind! How do you know your idea would work right away?! Etc.).

    There will be a couple of issues to overcome: (i) copyright and closed-minded journals (those that do not allow you to retain ownership of your paper or to put it on your personal web page); (ii) researchers’ ego; (iii) human laziness(*); and (iv) being afraid somebody would ‘steal’ our work if we publish too early.

    (*) What do I mean by human laziness? The fact that having a clear deadline and a strict number of words or pages often forces you to re-think your work and to find better, more effective wording. Maybe make two pictures into a more precise one. Take out some irrelevant material. This, sometimes, is what makes you crank out a good paper. Otherwise we’d be happy with our work once we finish writing the first draft.

    All those ‘problems’, however, will slowly disappear if such a versioning system takes hold. I think it’s just a matter of time.

    One last question for you: what would you do to make this happen? How can I help? 🙂

  4. Daniel, I couldn’t agree with you more. You explain my own thoughts clearly and concisely! I really hope I live to see scientific research reach that point. It will open so many possibilities. (For one, enabling serious research contributions by people outside of academia.)

  5. Couldn’t agree more. I think researchers could benefit even more from adopting the development processes that come with the Open Source model. For example, by following the “Release Early, Release Often” scheme, you can get in touch with other researchers much faster than through traditional means, with the potential for self-organizing collaboration.

    See my blog post http://blog.mikiobraun.de/2010/01/open-source-process.html

  6. “Citing in the project level may be dangerous, because things that were there to support your citation may disappear in the next day.”

    In a way, that sounds like a feature. Things you cite may turn out to be false. How much better would it be if some conflict resolution software popped up “paper X now concludes this was false”, rather than risk building huge citation trees based on an original mistake? All citations of knowledge “decay”; this makes the process explicit.

    There’s also an analogy to conflict resolution in source control (when somebody commits a change that doesn’t mesh with your own local changes). If you point your citations to specific paragraphs, and perhaps annotate them with a brief explanation of what you’re citing, then anyone committing a fix to that area could be prompted with “are these citations still valid?” If not, helpful participants could look into finding other citations, or reworking the paper given this new knowledge.

  7. The idea of a Bugzilla for papers is great! But there’s one thing you have to worry about. Whether we like it or not, assigning credit for papers (and reputation) is the oily scum that greases the high-minded research enterprise. In this kind of open model, it’s not clear to me how credit gets assigned for work. I agree that citation models need to change, but if you’re citing the 7th patch of a paper after a series of bug fixes, are you citing any specific person? The authors? The union of authors and bug fixers? And how are these contributions weighted?

    I guess I’m wondering what an appropriate glide path would be to get to this future.

  8. Suresh hits the nail on the head. IMHO we are still using an inefficient research process mostly because our reward system is ready-made for that process.

    This is backwards: we should design a reward system that favors the emergence of more efficient research processes.

  9. I think we need a website that allows discussion on any paper ever published. That will serve two purposes: 1) This will allow people to report bugs, and other people/authors to clarify what the correct version of the sentence/paragraph/figure is. 2) It will help develop a healthy discussion around every paper. Both newbies and experts in the field can quickly learn about the merits of a work, how it relates to other works, and what the pitfalls of the paper might be. This has another advantage: right now, authors only worry about getting their papers accepted. Once there is a website that can list the weaknesses of a paper, the authors would be more concerned with writing a better paper that others cannot poke holes in.

  10. There are already plenty of places where you can do this. One example is CiteULike. What’s stopping anyone from adding comments and reviews to the paper? CiteULike has an excellent interface to track the comments with proper attribution.

    Well, the biggest problem for this is the motivation. The authors clearly don’t have any extra motivation to fix minor bugs in the paper (If they do, it would be a journal paper or an extended tech report). Why would others take this burden? We need some reward system to make this work.

  11. I like this idea. It is certainly worth more discussion. I am also fully in agreement with installing a post-publication review process. That is, having a generalized way to give feedback on and evaluate papers after they have been published, not only before.

  12. @Itman

    Publishing an erratum is not very satisfactory. However, in some journals, it counts as an article (technically)… so a scientist could extend his publication list with errata.

  13. Everything in life has advantages and disadvantages. Assessing quality under the current publication model is done heuristically, using number and order of authors (modified by disciplinary practices), forum impact factor and/or acceptance rate, citations, assessments by peers, etc. (And, yes, “heuristically” here is synonymous with “pulled out of the assessor(s) ass(es)”.)

    A publication model based on open-source repositories could substitute hard statistics for the above metrics: hits, comments, likes/dislikes, cites, patches, plus perhaps some that we haven’t considered yet (it’s a graph with multiple kinds of edges; what would be the meaning of each metric one could extract?).

  14. I often find bugs in papers, sometimes serious ones. My impression is that, unfortunately, scientists do not always care about fixing bugs in their papers. Overall, I feel that finding bugs is not a popular business. It may be that instead of building a reputation one will get a lot of enemies. Maybe our researchers should care more about their own bugs in the first place. Only then will we see bug reports on their papers.

  15. Daniel,
    This is a good point. Let’s hope things will change in the future. Already today, a few serious journals are truly on-line. How difficult should it be to fix it or, at least, publish a manual erratum?

  16. I like this idea. I find that often when doing research, you run into papers that have really serious yet not immediately obvious flaws. In a way, these sort of “pollute” the stream of progress, making it harder to publish a real paper that implements the technique correctly/effectively. The new work is just not seen as novel.
