So, you want to be a mad scientist?

Exceptional scientists are often a bit crazy:

  • Kurt Gödel, who suffered from paranoia, starved to death when his wife was hospitalized: he would not eat food prepared by anyone else.
  • John Nash suffered from paranoid schizophrenia.
  • Paul Erdős was a homeless itinerant for most of his life.
  • Henry Cavendish was so shy that he only communicated with his servants by writing notes.
  • Theodore Kaczynski (the Unabomber) became assistant professor at the University of California at Berkeley at the age of 25, before going to live in a cabin without electricity or running water.
  • Nikola Tesla was obsessive-compulsive and mysophobic.

Further reading: Scientists and their emotions.

Source: tvtropes via Daniel Lowd.

Three of my all-time most popular blog posts

  • Emotions killing your intellectual productivity: We all have to deal with setbacks. And even when things go our way, we can still remain frustrated. I offer pointers on how to remain productive despite your emotional state.
  • Turn your weaknesses into strengths: We all have weaknesses. Maybe you are unemployed. Or maybe you failed at getting a research grant. Maybe you live in a remote or poor area. I think that, within reason, many of your weaknesses can actually play in your favor if you adapt your strategy.
  • How reliable is science?: I got a lot of heat for this blog post. Basically, I believe that the business of science is unreliable. And I am not alone. The Nobel laureate Harry Kroto wrote: “The peer-review system is the most ludicrous system ever devised. It is useless and does not make sense (…)”. I couldn’t agree more as my post The hard truth about research grants makes clear.

Remarkable scientists without a Wikipedia page

I was surprised today to learn that Michael Ley’s Wikipedia page had been deleted (because it failed to indicate the significance of the subject). I have yet to meet anyone in Computer Science or Information Technology who does not know about the DBLP Computer Science Bibliography. Michael has received numerous prestigious awards for his work. He is a remarkable pioneer.

But there are other remarkable people without a Wikipedia page. Patrick O’Neil is another good example. Consider the citations that some of his papers have received (according to Google Scholar):

  • The dangers of replication and a solution: cited 1036 times;
  • The LRU-K page replacement algorithm for database disk buffering: cited 515 times;
  • Improved query performance with variant indexes: cited 421 times;
  • ORDPATHs: insert-friendly XML node labels: cited 273 times.

Chances are that if you have ever used a database engine at all, it implemented an algorithm or a technique related to O’Neil’s work.
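Since the LRU-K paper comes up above: its core idea (evict the page whose K-th most recent reference is oldest) can be sketched in a few lines. This is a simplified illustration, not the paper’s full algorithm (which also deals with correlated references and retained history); the class and method names are my own.

```python
class LRUKCache:
    """Minimal sketch of LRU-K eviction (K=2 by default)."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.clock = 0
        self.history = {}  # page -> up to k reference times, oldest first

    def access(self, page):
        """Record a reference to a page, evicting a victim if needed."""
        self.clock += 1
        if page not in self.history and len(self.history) >= self.capacity:
            self._evict()
        times = self.history.setdefault(page, [])
        times.append(self.clock)
        if len(times) > self.k:
            times.pop(0)  # keep only the k most recent references

    def _evict(self):
        # Evict the page with the oldest K-th most recent reference;
        # pages referenced fewer than K times are evicted first.
        def kth_reference(page):
            times = self.history[page]
            return times[0] if len(times) == self.k else float("-inf")

        victim = min(self.history, key=kth_reference)
        del self.history[victim]
```

Unlike plain LRU, a page touched only once (a sequential scan, say) is the first candidate for eviction, while pages referenced repeatedly are protected.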

How many other remarkable scientists don’t have a Wikipedia page?

Update: Thanks to Ragib Hasan and David Eppstein, these two computer scientists now have Wikipedia pages (see Ley and O’Neil).

Why you may not like your job, even though everyone envies you

In a provocative post, Matt Welsh—a successful tenured professor at Harvard—announced that he was leaving his academic job for an industry position. It created a serious malaise: his department chair (Michael Mitzenmacher) wrote a counterpoint answering the improbable question: “Why am I staying at Harvard?” To my knowledge, it was the first time a department chair from a prestigious university answered such a question publicly. Michael even went as far as arguing that, yes, indeed, he could get a job elsewhere. These questions are crazy if we consider that for every job advertised at Harvard, there are probably hundreds of highly qualified applicants.

But let me get back to Matt’s reason for leaving a comfortable and prestigious job at Harvard:

(…) all of that extra work only takes away from time spent building systems, which is what I really want to be doing. (…) At Google, I have a much more direct route from idea to execution to impact. I can just sit down and write the code and deploy the system, on more machines than I will ever have access to at a university. I personally find this far more satisfying than the elaborate academic process.

In other words, Matt is happier when his work is more immediately useful. But where does the malaise about his decision come from? After all, he will probably make as much money at Google, or even more. Matt is not alone, by the way: Matthew Crawford—a Ph.D. in philosophy—left a high-paying job at an American think tank for a job repairing motorcycles. His book Shop Class as Soulcraft tells his story.

I think that Matt’s decision might be hard to understand—at least, his department chair felt the need to explain it to us—because it calls into question the very core values of our society. These core values were explored by Veblen in his unconventional book The Theory of the Leisure Class. He argued that we are driven not by utility, but by social status. In fact, our society pushes us to seek high-prestige jobs rather than useful and productive jobs. In effect, a job doing research in Computer Science is more prestigious than an industry job building real systems, on the mere account that it is less immediately useful. Here are some other examples:

  • The electrician who comes and wires your house has a less prestigious job than the electrical engineer who manages vague projects within a large organization.
  • The programmer who outputs useful software has a less prestigious job than the software engineer who runs software projects producing software that nobody will ever use.
  • The scientist who tinkers in his laboratory has a less prestigious job than the scientist who spends most of his time applying for research grants.

Note how money is not always immediately relevant. Manual labor usually pays less, but that is almost beside the point. And indeed, plumbers make more than software developers in some parts of the world (like Montreal), even though software jobs are usually considered more desirable.

There are at least three problems with this social-status system:

  • Nature is the best teacher. Working on real problems makes you smart. The German philosopher Heidegger famously made this point with a hammer: to paraphrase him, it is not by staring at a hammer that we learn about hammers. Similarly, scientists who do nothing but abstract work in the context of funding applications are missing out. The best scientists work in the laboratory, in the field; they tinker.
  • By removing ourselves from the world, we risk becoming alienated. We become strangers to the world around us. Instead, we construct an incoherent virtual reality which often has much in common with Soviet-era industrialism. We must constantly remain vague because truth has become subjective. Whereas the hammer hits, whereas the software crashes, whereas the experiment fails… projects are always successful, marketing is always right and truth is arrived at by consensus. Yet, we know deep down that this virtual reality is unreal and we remain uneasy, trapped between reality and virtuality. The perfect example is the financial markets, which create abstract products with agreed-upon values. As long as everyone plays along, the system works. Nobody must ever say that the emperor is naked. Everyone must accept the lies. Everything becomes gray.
  • Human beings like to make their own stuff. We place considerably more value on what we make ourselves. You may be able to buy a computer for $200, but nothing will ever replace the computer you built yourself from scratch. It may be more economical to have programmers in India build your in-house software, but the satisfaction of building your own software is far greater than what you get by merely funding it. Repairing your own house is a lot more satisfying than hiring handymen.

To summarize: trading practical work for high-level positions is prestigious, but it may make you dumber, alienated and unhappy. Back when I was a graduate student, we used to joke about the accident. The accident is what happens to successful professors: they suddenly become uninteresting, pompous, and… frankly… a tad stupid.

Thankfully, there is hope. The current financial crisis, mostly because it could not happen according to most economists, was a wake-up call. The abstract thinkers may not be so reliable after all! The millions of college graduates who are underemployed in wealthy countries all around the globe have unanswered questions. Weren’t these high-level abstract college degrees supposed to pay for themselves?

How do we fix this broken caste system and bring back a healthier relationship with work? Alas, we cannot all become plumbers and electricians. But it seems to me that more and more people are realizing that the current system, with its neat white-collar jobs and rising inequalities, could be improved upon drastically. The do-it-yourself (or do-it-with-others) wave has been a revelation for me. Yes: Chinese factories can build digital thermometers much more cheaply than I can. But making your own digital thermometer is far more satisfying. Saving money by abstracting out reality is not a good deal. And of course, building real systems is not the same as finding money for your students to do it for you.

Further reading: Working long hours is stupid and Formal definitions are less useful than you think.

Public funding for science?

Terence Kealey has been arguing against the public funding of science. Is it efficient to fund science with government dollars? He argues that when science is mostly funded by large government agencies, other funding sources are effectively crowded out. He has two good historical examples. First, while France massively invested in research and academic institutions in the 17th and 18th centuries, it was the United Kingdom, not France, that gave birth to the industrial revolution and the accompanying scientific surge. Second, the United States led the world in technological innovation starting in the 19th century even though it had a comparatively underdeveloped academic system and no public research funding.

In short, whereas there is a correlation between wealth and scientific output, there is no evidence that public science funding generates economic growth. Moreover, government funding results in a concentration of power in the hands of a few politicians. Trusting politicians with almost all of the research funding is a tad insane. It is even crazier to think that politicians have science in mind when allocating funding.

Kealey argues that for every dollar invested by the government, more than a dollar is withdrawn from research by private investors. While I don’t know whether this is true, I do know that I have no idea how I would go about asking for private funding, outside government programs, for my research. How do you go about it? Do you post a video on, say, Kickstarter?

Note: I am a research grant recipient. The system has generally been good to me.

Can Science be wrong? You bet!

A common answer to my post on the reliability of science was that fraud is marginal and that, ultimately, science is self-correcting. That is true on one condition: that the science in question is bona fide science. Otherwise, I disagree that institutional science is self-correcting. It is self-correcting about as much as human beings are rational. That is, not often. A lot of what passes for science is actually cargo cult science. What looks like rigorous science may not be, no matter what the experts tell you. Don’t fool yourself: science is not the process of getting published in prestigious journals or a tool to get a tenured job. Richard Feynman defined science as the belief in the ignorance of experts.

Institutional science can be wrong or not even wrong for decades without any remorse:

  • Economists failed to predict or explain the last financial crisis. Yet they will not call their models into question. Philip Mirowski explains why: “The range in which dissent happens is so narrow. (…) The field got rid of methodological self-criticism.”
  • A large fraction of AI researchers have convinced themselves that intelligence must emerge from Prolog-like reasoning engines. This gave us twenty years of predictions that the future was in expert systems, and the last ten years spent predicting the rise of the Semantic Web. This ever-growing community of AI researchers is oblivious to its own failure to produce any useful result.
  • Like Fred Brooks, I’m amazed that in 2010, the waterfall method is taught in software engineering schools as the reference model. There is no evidence that it is beneficial and, in fact, much evidence that it is hurtful. That is, students would be better off learning nothing rather than learning to use the waterfall method. Yet, entire Ph.D. theses are still built on the assumption that the waterfall method is sound. Accordingly, criticizing the waterfall method on campus is a risky business.
  • The dominant paradigm of modern Theoretical Physics is String theory, which is not even a scientific theory.

We should not simply trust that self-correction will happen: biases are often self-reinforcing. Rather, we must ask how self-correction can happen. I think that all science must be verified by independently designed and independently reproduced experiments. For example, it is insufficient to verify the speed of light with one reproducible experiment. It must be possible for different researchers to come up independently with different experiments, which are all reproduced independently several times. And if everyone is working from the same data, the limitations of the data may never be revealed. And if there is no experiment, you are doing mathematics or art, not science.

Peer review does not lead to self-correction. Peer review increases quality, but it can also reinforce biases. In Information Retrieval, we often talk about the trade-off between precision and recall. Peer review improves precision, but degrades recall. If your primary goal is to please your peers, you won’t be tempted to point out the flaws in their research!
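The precision/recall analogy can be made concrete with a toy example (the paper identifiers and sets below are made up):

```python
def precision_recall(retrieved, relevant):
    """Precision: the fraction of what we kept that is good.
    Recall: the fraction of the good items that we kept."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall

relevant = {"p1", "p2", "p3", "p4"}              # the papers worth publishing
strict = {"p1", "p2"}                            # a harsh filter accepts little
lenient = {"p1", "p2", "p3", "p4", "p5", "p6"}   # a lax filter accepts a lot

print(precision_recall(strict, relevant))   # (1.0, 0.5): high precision, low recall
print(precision_recall(lenient, relevant))  # (~0.67, 1.0): low precision, high recall
```

A strict filter, like harsh peer review, keeps mostly good work but rejects good work too; relaxing it recovers the missed gems at the cost of letting in noise.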

However, I am optimistic for the future. The rise of Open Scholarship will allow outsiders to participate in the research process and keep it more honest.

How reliable is science?

It is not difficult to find instances of fraud in science: consider Ranjit Chandra, Woo-suk Hwang and Jan Hendrik Schön, all caught faking results.

How did these people fare after being caught?

  • Ranjit Chandra still holds the Order of Canada, as far as I can tell. According to Scopus, his 272 research papers were cited over 3000 times. As for his university? Let me quote Wikipedia: University officials claimed that the university was unable to make a case for research fraud because the raw data on which a proper evaluation could be made had gone missing. Because the accusation was that the data did not exist, this was a puzzling rationale.
  • According to Scopus, Woo-suk Hwang has been cited over 2000 times. Despite having faked research results and having committed major ethics violations, he has kept his job and… he is still publishing.
  • Despite all the retracted papers, Jan Hendrik Schön still has 1,200 citations according to Scopus. He lost his research job, but found an engineering position in Germany.

Conclusion: Scientific fraud is a low-risk, high-reward activity.

What is more critical is that we still equate peer review with correctness. The argument usually goes as follows: if it is important work, work that people rely upon, and it has been peer reviewed, then it must be correct. In sum, we think that conventional peer review + citations means validation. I think we are wrong:

  • Conventional peer review is shallow. Chandra, Hwang and Schön published faked results for many years in the most prestigious venues. The truth is that reviewers do not reproduce results. They usually do not have access to the raw data and software. And even if they did, they are unlikely to be motivated to redo all of the work to verify it.
  • Citations are not validations. Chandra, Hwang and Schön were generously cited. It is hardly surprising: impressive results are more likely to be cited. And doctored results are usually more impressive. Yet, scientists do not reproduce earlier work. Even if you do try to reproduce someone’s result, and fail, you probably won’t publish it. Indeed, publishing negative results is hard: journals are not interested. Moreover, there is a risk that it may backfire: the authors could go on the offensive. They could question your own competence.
  • There are many small frauds. Even without making up data, you can cheat by misleading the reader, by omission. You can present the data in creative ways, e.g. turn meaningless averages into hard facts by omitting the variance (see the fallacy of absolute numbers). These small frauds increase the likelihood that your paper will be accepted and then generously cited.
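The averages-without-variance trick is easy to illustrate. In this made-up example, two sets of benchmark timings share the same mean while telling very different stories:

```python
from statistics import mean, stdev

# Two made-up sets of benchmark timings (in ms) with identical averages.
steady = [10.0, 10.1, 9.9, 10.0, 10.0]
erratic = [2.0, 18.0, 5.0, 15.0, 10.0]

print(mean(steady), mean(erratic))    # both 10.0
print(stdev(steady), stdev(erratic))  # ~0.07 versus ~6.7
```

A paper reporting only “average: 10.0 ms” makes the two systems look identical, even though one is wildly unpredictable.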

How do we solve the problem? (1) By trusting unimpressive results more than impressive ones. (2) By being suspicious of popular trends. (3) By running our own experiments.

Further reading: Become independent of peer review, The purpose of peer review and Peer review is an honor-based system.

Source: Seth Roberts.

Manifesto for Half-Arsed Academic Research

  • Research results are more important than the number of publications or citations.
    This is fine. Yet, we don’t have time to read your papers. So, just keep publishing a lot of papers each year. And get your influential friends to cite you. That’s how we’ll know whether you are good.
  • Science and truth are more important than spin and marketing.
    Yes, but keep pretending you will solve world hunger. And align your research results with the current fashionable trends.
  • You cannot tell where the next science breakthrough is going to come from.
    Maybe. Still, we want a plan of your research activities for the next five years.

Further reading: The hard truth about research grants and The secret behind radical innovation.

Source: Manifesto for Half-Arsed Agile Software Development via John D. Cook.

Counterintuitive factors determining research productivity

  • Permanent researchers publish more when they are in smaller labs.
  • Having many Ph.D. students fails to improve productivity.
  • Funding has little effect on research productivity.

Reference: Carayol, N. and Matt, M., Individual and collective determinants of academic scientists’ productivity, Information Economics and Policy 18 (1), 2006.

Further reading (on this blog): To be smarter, ignore external rewards, Is collaboration correlated with productivity?, Big schools are no longer giving researchers an edge?

How to get everyone talking about your research!

Deolalikar claims to have solved the famous P versus NP problem. Is the proof correct? Some influential researchers doubt it: Scott Aaronson is betting $200,000 of his own money against Deolalikar.

What I find most interesting is that Deolalikar did not submit the paper to a journal, as far as I know. He didn’t even post it on arXiv like Perelman. Yet, he is receiving much attention. His name is being tweeted several times a minute. Many of the most influential theoretical computer scientists are reacting to the paper. He is getting the best peer review possible. Most similar papers don’t get so much attention.

Why is this paper different?

  • Everyone seems to agree that the paper is well written: it has nice (color!) figures and the reference section appears up-to-date and complete. If your result is important, communicate it well.
  • Deolalikar has published just a handful of papers in theoretical computer science, and none at the major conferences. But he has enough peer-reviewed research papers to be treated as a peer.
  • While I doubt he was hired to work on complexity theory, Deolalikar is an industry researcher at HP. Being paid to do research might make you more credible.

Further reading: Deolalikar’s publication list on DBLP, A Proof That P Is Not Equal To NP? by Lipton and P ≠ NP by Baker.

Update: Porreca has the best write-up on reactions to this paper.

Update 2: The consensus after two weeks is that the proof is wrong and unfixable.