Science is self-regulatory… really?

Many theoretical systems are self-regulatory. For example, in an idealized free market, prices fluctuate until supply meets demand and everyone pays a fair price. But such free markets are a mathematical abstraction.

The business of science should also be self-regulatory. Scientists who produce bad work should acquire poor reputations. We have journals with strict peer review, which should filter out insignificant and poor work. Yet I believe that the business of science fails to be self-regulatory. Ioannidis et al. (2010) make several observations that support my belief.

  • Myth 1: peer review is a sign of quality.

    There is what looks like a never-ending “bubble” in academic publishing. Nowadays, some authors co-author more than 100 papers annually, even though some of these same researchers published only 3 or 4 papers per year until their mid-forties or fifties. How do you explain that so many researchers suddenly became so prolific? The increasing competition for funding and jobs has a role to play… but can competition alone increase the productivity of researchers so much? It is doubtful.

    The dirty little secret in science is that you can endlessly resubmit your work to as many journals as you want. In fact, you can even simultaneously submit similar papers to several different journals. Eventually, if only through random chance, your work will be accepted. Why do people expect such a system to be self-regulatory? There is no penalty for writing poor papers, ignoring reviewers, or publishing junk… but there are great rewards for being prolific. As long as you stay away from outright fraud, there is no price to pay for empty work.

    The truth is that peer review is not regulation; peer review is an honor-based system: it only works well when both reviewers and authors are committed to the greater good. But there is no penalty for being evil! If I spot fraud or a substantial failure in a paper I review, the authors can simply resubmit the paper elsewhere, free of charge!

    Compare this with blogging: I could write 12 boring blog posts a day… and what would happen? People would quickly start ignoring me. I have a strong incentive to limit the frequency of my posts and keep the quality high if I want to attract and retain readers. And before you object that journals have the same incentive, consider that journals are not assessed on their readership: at best, they are measured by the number of citations they receive (the so-called Impact Factor).
  • Myth 2: even if journals publish junk, counting how many citations a paper has received will tell you how good it is.

    This assumes that authors cite the very best work after reading it carefully. According to Ioannidis et al., almost anything gets cited: two decades ago, only 45% of the papers indexed in the Web of Science received at least one citation within 5 years, but 88% of the medical papers published in 2002 were cited by 2007. Almost anything published this year will eventually be cited.

    But what about looking at how often papers get cited? The problem is that if citations are attributed somewhat at random, and you publish many papers, then eventually you will get lucky and collect a few highly cited papers. It is not a matter of producing quality, just quantity. And, of course, I frequently find great papers that have received few or no citations. The hard truth is that there is no substitute for reading the papers!

So you disagree with me? You believe that the business of science is self-regulatory? Then please, explain the mechanism.

Published by Daniel Lemire, a computer science professor at the University of Quebec (TELUQ).

17 thoughts on “Science is self-regulatory… really?”

  1. @Louis

    Right. So you can’t trust citation statistics alone to determine the validity of a paper. It is self-evident, but worth repeating.

    @yarbel

    I don’t recommend that anyone start reading more junk papers.

    Your proposal makes sense, but it touches a nerve: what does “citing a paper” mean?

  2. Myth or fact: are best paper awards really given to the best papers? Sometimes I read these papers and see a whole new problem space. Other times, authors use them to gain credibility and popularity.

  3. “more and more journals are using systems that automatically spot self-plagiarism”

    Consider this: it is actually advantageous to republish your own papers, even though your work is already widely available electronically. This is totally insane if you think about it!

    The simple fact that we need tools to detect self-plagiarism proves that there is a problem. If the system were self-regulating, we wouldn’t need this.

    “Moreover organizations like ACM and IEEE are enforcing blacklisting.”

    I’m very interested in these blacklists. Is this documented somewhere?

  4. An excellent post, thanks. I completely agree with the argument and think the problem is actually much worse, considering how a past record of publications increases your chances of being published. Most peer reviews, I believe, are far from fully blind.

    However, I completely disagree with the conclusion. It is simply a waste of time to read all the junk that is being published just to get a sense of what’s good and what’s not. There are quite a few substitutes for reading the full paper. We should think along the lines of better mechanisms for regulation instead of abandoning it altogether.

    My first suggestion would be to limit the number of allowable citations per paper to 60% of the current average. This would force authors to cite only what is crucial to their issue and would raise the credibility of the citation index.

  5. I do not think that the situation is so bad: more and more journals are using systems that automatically spot self-plagiarism, and paper submission systems like EDAS have this feature as well.

    Moreover organizations like ACM and IEEE are enforcing blacklisting. I know someone who was judged to have committed a form of self-plagiarism by submitting a paper to a conference and a variation of it to a journal. A reviewer spotted it; both papers were rejected, and he risked being blacklisted for some time.

  6. It is not quite clear to me what you mean by the phrase “science is self-regulatory”. And once you have spelled that out, is it something you really want? I want a science that explodes when measured by the amount of insight and benefit it generates.

  7. “The simple fact that we need tools to detect self-plagiarism proves that there is a problem. If the system were self-regulating, we wouldn’t need this.”

    Agreed.

    “Moreover organizations like ACM and IEEE are enforcing blacklisting.”

    “I’m very interested in these blacklists. Is this documented somewhere?”

    This is not well documented, but here are a couple of links:

    http://www.ieee.org/publications_standards/publications/rights/plagiarism/index.html
    IEEE has what they call a PAL (Prohibited Authors List).

    http://www.acm.org/publications/policies/plagiarism_policy
    ACM is less clear, but the case I mentioned happened at an ACM conference, so I think they have their own list.

  8. @Daniele

    You would need to ensure that the same paper, only slightly modified, keeps the same hash value with high probability. I bet that it would not be very long before someone could reverse engineer the hash function and break the system.

  9. One can imagine a centralized system to which different conferences and journals submit a ‘fingerprint’ of each paper. This fingerprint could be some form of perceptual hash of the textual content, in our case a stripped-down version of the PDF or PS. The fingerprint should guarantee the confidentiality of each submission by ensuring that it is computationally infeasible to reverse the fingerprint back to the actual text.

    The centralized system could then ‘fuzzily’ match new fingerprints against the database of stored fingerprints and flag papers that are almost identical. (A minimal code sketch of one possible fingerprint appears after these comments.)

    Also, the system could be informed of the dates when the reviewing process for a given paper begins and ends. This way the system could also spot concurrent submissions.

    I can see one problem with this approach: the centralized authority would know, after a paper becomes public, how many times it had been resubmitted, precisely because the perceptual hashing can spot similarities. But I am not sure whether this is a bug or a feature. Thoughts?

  10. I agree that security by obscurity would be a bad idea, and the perceptual hashing function would be public. Current state-of-the-art techniques in perceptual hashing (e.g., for images) are robust to a certain degree of noise. I need to do some research on similar systems for text, but I would bet that they are very good at detecting similarity even in the presence of slight or even large modifications of the text. The problem is closely related to plagiarism detection, with the added constraint of keeping only a limited-size perceptual hash rather than the full text.

  11. Good post! I completely agree.

    On a related note, a report of the House of Commons about peer review was recently published (July 2011). This is the report (Volume I):
    http://www.publications.parliament.uk/pa/cm201012/cmselect/cmsctech/856/856.pdf

    They have also published the additional written evidence from researchers, editors… really interesting (Volume II):
    http://www.publications.parliament.uk/pa/cm201012/cmselect/cmsctech/856/856vw.pdf

    Volume II contains a few proposals on how to improve peer review, as well as alternatives to it. In my opinion, it is necessary to find some alternative mechanism: something with a first phase of peer review and then a “continuous social review” that contrasts the paper with the use society makes of it. But this is difficult.

    Beatriz Barros (coordinator of the SISOB project)

  12. Now that content is reused in so many ways to gain online presence, it would really be important to have a strategy that lets you distinguish original works from mere remakes of others.
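To make the fingerprinting idea from comments 8–10 concrete, here is a minimal sketch in Python. The shingling-plus-MinHash construction, the function names, and the 0.8 threshold are my assumptions for illustration: the comments only ask for a limited-size fingerprint that is hard to reverse and tolerant of edits, and MinHash over word shingles is one standard way to get those properties.

    # Sketch only: a fixed-size text fingerprint built from word shingles
    # and MinHash. Near-duplicate texts share most shingles, hence most
    # minima, so comparing fingerprints estimates their similarity without
    # storing (or being able to recover) the original text.
    import hashlib
    import re

    NUM_HASHES = 128  # fingerprint length: fixed and small, as comment 9 requires

    def shingles(text, k=5):
        """Overlapping k-word shingles of the text."""
        words = re.findall(r"[a-z0-9]+", text.lower())
        return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

    def fingerprint(text):
        """MinHash signature: for each of NUM_HASHES seeded hash functions,
        keep the minimum hash value over all shingles."""
        sig = []
        for seed in range(NUM_HASHES):
            salt = seed.to_bytes(16, "little")  # blake2b accepts a 16-byte salt
            sig.append(min(
                int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8,
                                               salt=salt).digest(), "big")
                for s in shingles(text)))
        return sig

    def similarity(a, b):
        """Fraction of matching positions: an estimate of the Jaccard
        similarity of the two texts' shingle sets."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    # A central registry could store the fingerprint of every submission and
    # flag a new one when similarity(new, stored) > 0.8 (an illustrative
    # threshold, not something proposed in the comments).

Under these assumptions, each position of the signature retains only a single 64-bit minimum over many shingles, so reversing a fingerprint back to the text is infeasible in practice (the confidentiality requirement of comment 9), while small edits change only a few shingles, so most minima survive (the robustness concern of comment 8).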
