Why senior researchers and managers should analyze data themselves…

Scientists, businessmen and even spies are supposed to analyze data collaboratively. Are they?

If you are a scientist, you are familiar with the following type of research collaboration: a lowly student collects the data, crunches the numbers and plots the data. Other collaborators—such as the professor—merely comment on the tables and plots. Similarly, the CEO sees the pie chart, while the assistant crunches the numbers. That is vertical collaboration: you clean the basement and I will clean the main floor.

Yet, reliable data analysis requires horizontal collaboration.  Indeed, there are downsides to task specialization:

  • By never looking at the data, senior scientists and managers rely on experience and hearsay. Their incoming bandwidth is dramatically reduced. Nature is the best coauthor. Consider how the best American spies were fooled prior to 9/11 while all the data to catch the terrorists was available. Bandwidth is a requirement to be smart.
  • When a single person crunches the numbers, hard-to-detect errors creep in. The problem is serious: Ioannidis argued that most published research findings are false.
  • With nobody to review the source data, the sole data analyst is more likely to cheat. Why rerun these tests properly, when you can just randomly dismiss part of the data? People are lazy: when given no incentive, we take the easy way out.

The common justification for task specialization is that senior researchers and managers do not have the time. Yet, 30 years ago, researchers and managers did not type their own letters. Improve the tools, and reduce task specialization.

With Sylvie Noël, I decided to have a closer look. My preliminary conclusions are as follows:

  • There are adequate tools to support rich collaboration over data analysis. Collaboratories have been around for a long time. We have the technology! Yet, we may need a disruption: inexpensive, accessible and convenient tools. The current migration toward Web-based applications might help.
  • Given a chance, everyone will pitch in. To make our demonstration, we collected user data from sites such as IBM Many Eyes and StatCrunch. We then ran an Ochoa-Duval analysis. We find that the network of users within web-based data analysis tools is comparable to other Web 2.0 sites.
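To give a rough feel for what "comparable to other Web 2.0 sites" can mean, here is a minimal sketch that summarizes how unevenly contributions are spread across users. It is not the Ochoa-Duval analysis itself, and the counts below are invented for illustration; the function names `top_share` and `gini` are my own.

```python
def top_share(counts, fraction=0.1):
    """Share of all contributions made by the most active `fraction` of users."""
    ordered = sorted(counts, reverse=True)
    k = max(1, int(round(len(ordered) * fraction)))
    return sum(ordered[:k]) / sum(ordered)


def gini(counts):
    """Gini coefficient of contributions: 0 means perfectly even participation."""
    ordered = sorted(counts)
    n = len(ordered)
    total = sum(ordered)
    # Rank-weighted sum over the ascending counts (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical data: number of uploads per user (not from any real site).
uploads_per_user = [1, 1, 1, 1, 1, 1, 1, 1, 1, 91]
print("top 10% share:", top_share(uploads_per_user))
print("gini:", gini(uploads_per_user))
```

A site where a handful of users produce most of the content would show a high top share and a Gini coefficient near 1; comparing such summaries across sites is one simple way to ask whether two communities behave alike.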

As a database researcher, I think that further progress lies with loosely coupled data (no big tables! no centralized tool!) and flexible visualization tools (stop the pie charts! go with tag clouds!). I am currently looking for new research directions on this problem. Any ideas?

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

8 thoughts on “Why senior researchers and managers should analyze data themselves…”

  1. There’s also a subtle bias when senior researchers don’t routinely look at data. Analyses that confirm expectations will not be reviewed, but analyses that contradict expectations will be reviewed. So some kinds of mistakes will be more likely to go undetected than others.

  2. Vertical collaboration, in my mind, is more analogous to “you clean the whole house; I’ll offer feedback on whether you did an adequate job or not.”

    Cool idea though. I wasn’t clear: are you calling for more collaboration inter-research group, or intra? I think one of the best advances, in software research, anyway, would be for mandatory public data warehousing for all papers. E.g., http://promise.site.uottawa.ca/SERepository/

    This would facilitate intra-group collaboration. From reports on climate modeling, that field has a healthy ‘co-opetition’ model: each group designs a model, but once built, they share ideas to improve them all.

  3. An interesting point. I agree that the whole process, as it is done today, is troublesome. Analyzing the data is the most important point in doing research, and it is basically the point in which professors are less likely to be involved.

    The main technical problem is that in many cases, the data goes through several processes, and it is almost impossible to present a nice big data structure that can be analyzed by everybody.

    Maybe a more modest approach is to leave traces of the data used in each stage. It’s not really a technical solution, and it can be implemented with a Google spreadsheet (in most cases) and some good will, but it is much better than burying the data in some PhD student’s computer.

  4. Can I offer a distinction, and/or ask for a clarification?

    You seem to be saying that there are two choices: Collaborators can either be (1) vertical and specialized, or (2) horizontal and general.

    I agree that (1) is a problem, because the “senior” collaborator never engages with the low-level data. But I don’t see why the solution has to be (2). Isn’t there a third option?

    “Horizontal and Specialized”

    What I mean is, both collaborators should work on the same underlying, low-level data. But the tools given to each person to do their analysis are specialized and different. Perhaps overlapping, but with a certain amount of complementarity as well. Again, this is not complementarity of the data; both collaborators have access to the full raw data. But the system actively helps them look at that same, shared data in different ways, so as to see patterns that the other person might not be seeing.

    At least, that’s the way my research group has been thinking about things, in our “collaborative exploratory search” work. Horizontality, but with specialization. Is this what you mean, too?

  5. @Daniel,

    Let me give a naive, oversimplified example: Suppose you had a tool that made one collaborator always see the raw data as a bar graph, and the other collaborator always see it as a pie chart. Then if the pie person ever saw a pattern that was less apparent in the bar view, the pie person could share that view with the bar person. But the point is, the tool would automatically (and hopefully helpfully) push both collaborators into different modes of seeing things. Horizontal, but specialized.

    Now, of course the world is more complicated than “I specialize in pie charts and you specialize in bar graphs”. But you get the idea.

  6. @Jeremy That’s an interesting approach to the issue and one that could work SO LONG AS everyone in the group is at the same level.

    Or let me put it this way. Let’s say Pie Chart Viewer is the boss and Bar Graph Viewer is the employee. Is Bar Graph Viewer actually going to point out to Pie Chart Viewer that there is something wrong with his view? Is the boss going to believe his employee or dismiss it since he can’t see anything “wrong” with his own personal view?

    Yes, this is an interaction problem that can happen in any small group, no matter what tools they are using, but I’d be worried that your approach might accidentally exacerbate it. Actually, this would make for a fascinating study 🙂

  7. @neil

    I wasn’t clear: are you calling for more collaboration inter-research group, or intra?

    Actually. Let us do away with the notion of “group” shall we? Let us talk about networks of researchers instead.
