Scientists, businessmen and even spies are supposed to analyze data collaboratively. Are they?

If you are a scientist, you are familiar with the following type of research collaboration: a lowly student collects the data, crunches the numbers and plots the data. Other collaborators—such as the professor—merely comment on the tables and plots. Similarly, the CEO sees the pie chart, while the assistant crunches the numbers. That is vertical collaboration: you clean the basement and I will clean the main floor.

Yet, reliable data analysis requires horizontal collaboration.  Indeed, there are downsides to task specialization:

  • By never looking at the data, senior scientists and managers rely on experience and hearsay. Their incoming bandwidth is dramatically reduced. Nature is the best coauthor. Consider how the best American spies were fooled prior to 9/11 while all the data to catch the terrorists was available. Bandwidth is a requirement to be smart.
  • When a single person crunches the numbers, hard-to-detect errors creep in. The problem is serious: Ioannidis argued that most published research findings are false.
  • With nobody to review the source data, the sole data analyst is more likely to cheat. Why rerun these tests properly, when you can just randomly dismiss part of the data? People are lazy: when given no incentive, we take the easy way out.

The common justification for task specialization is that senior researchers and managers do not have the time. Yet, 30 years ago, researchers and managers did not type their own letters. Improve the tools, and reduce task specialization.

With Sylvie Noël, I decided to have a closer look. My preliminary conclusions are as follows:

  • There are adequate tools to support rich collaboration over data analysis. Collaboratories have been around for a long time. We have the technology! Yet, we may need a disruption: inexpensive, accessible and convenient tools. The current migration toward Web-based applications might help.
  • Given a chance, everyone will pitch in. To make our demonstration, we collected user data from sites such as IBM Many Eyes and StatCrunch. We then ran an Ochoa-Duval analysis. We find that the network of users within web-based data analysis tools is comparable to other Web 2.0 sites.

As a database researcher, I think that further progress lies with loosely coupled data (no big tables! no centralized tool!) and flexible visualization tools (stop the pie charts! go with tag clouds!). I am currently looking for new research directions on this problem. Any ideas?

Further reading

William Meehan, president of Jacksonville State University, got his Ph.D. by copying, largely word for word, the dissertation of another student. He did not even copy an obscure thesis published in some remote country. In fact, he copied the thesis of a fellow University of Alabama graduate. And wait for it: they graduated at nearly the same time. And three professors were on both dissertation committees.

Call me naïve, but I am surprised.  We all know there are bad apples. Students will cheat. But cheating on a Ph.D. dissertation must be extremely difficult. It takes guts to copy a dissertation submitted recently, at the same school. It should not be possible. The University of Alabama seems like a respectable school, with actual professors and Ph.D. programs. What happened?

The thesis supervisor ought to know. A supervisor must provide feedback throughout the student’s work, from the proposal stage to the final revision. Either he knew about the plagiarism (I doubt it), or else he played no role in supervising the student. The student came to him with a complete thesis. He read it over, made some minor comments, and approved it. Rubber-stamping a thesis should be as bad as plagiarism.

(It seems that Professor Howard Jones was his supervisor, though I am unsure.)

Further reading: Alabama college president accused of plagiarism (USA Today)
