Scientists are typically rather secretive about whatever they are working on right now. While in most universities, you can at least see where the researchers work, in some government laboratories, such as NRC, you would think that Russian spies are on every corner: how else can you explain the armed guards you find at the entrance of some buildings? I bet that some private laboratories are even better protected.

Initially, when I started this blog, I wanted to tell the world about what I was working on. On paper, it sounded like a nice approach. By sharing my ideas, I could get some early feedback and some extra references, and maybe even get some collaboration going.

While it may work for some, opening up my research ideas simply does not work for me. Explaining, clearly, what I work on is hard. I could sketch my ideas, but only a handful of people would grasp even half of what I would write. Moreover, many ideas never make it outside my office. I abandon most of my ideas, eventually. Thus, taking the time to explain my current set of ideas would be very wasteful.

So, you simply cannot tell what I am working on. I just won’t tell you. I will tell you to go read my papers.

Most researchers behave the same way. Interestingly, however, many researchers have another reason for behaving this way: they do not want to give their competition an edge. They do not divulge their ideas for the same reason they keep their data and their software secret: they want to make sure nobody can catch up to them.

Whenever several researchers are working toward the very same goal, this is a sensible concern. After all, being the first to solve a given scientific problem is important. Science is a winner-takes-all game, at least some of the time. Other times, people are simply misguided: keeping yourself out of the information-sharing loop only makes you less useful to the community and, ultimately, less important.

Me? If I were able to read minds… if I were able to see what other researchers are thinking about… I would most certainly not bother. I am already overwhelmed with carefully crafted papers on Google Scholar; I would certainly not care for the early drafts of my competitors. It helps that I do not feel like I have competitors. This is no accident since I apply Dijkstra’s rule:

Never tackle a problem of which you can be pretty sure that (now or in the near future) it will be tackled by others who are, in relation to that problem, at least as competent and well-equipped as you.

Since the end of World War II, at least half of all university professors in North America have tenure: they cannot be dismissed without adequate cause. This job security is earned: you need to be a professor for several years, and to perform well, before you can be granted tenure.

At several schools, a large fraction of the teaching positions do not lead to tenure. There are claims of a growing trend to hire more and more people on temporary positions. One justification for this trend might be that universities need the flexibility to adapt quickly to the market.

The same might be true of research institutions. While I worked at NRC, several projects were staffed by people whose contracts lasted only for the duration of a project.

One argument that I hear to justify tenure is that it saves money. Indeed, people will accept a lower salary if they have job security. Even if the job market is favorable, even if you could get more money elsewhere, few people like to change jobs frequently. Stability is nice.

But this argument is limited. What if the job market is difficult? If there is an oversupply of Ph.D.s, shouldn’t managers do away with tenure then?

Maybe not. Bland et al. have shown that tenure matters. According to their study, tenure-track professors perform significantly better than others:

faculty on tenure appointments are significantly more productive in research, more productive in education, more committed to their positions, (…)

I recently proposed that scientists should adopt the “find a readership or perish” motto. (A related goal for engineers might be “find users or perish.”) The goal is certainly not to have as many readers as possible, but having some serious readers matters.

I was chatting with Seb Paquet today and he came up with a good argument to support this view: it is very hard to have a significant readership without serving a useful function in a community. In other words, it is easier to write many empty/wrong papers than it is to attract a large readership with empty/wrong papers.


The Netflix competition is a $1 million game to build the best possible movie recommender system. It has already contributed to science tremendously by providing the largest freely available collaborative filtering data set (about 2GB): it is at least an order of magnitude larger than any other similar data set. It has also generated many valuable research papers. Among the interesting contributions is a paper showing that the anonymized data might not be so anonymized, after all.

However, Greg wonders whether the game itself will have a valuable output:

Participants may be overfitting to the strict letter of this contest. Netflix may find that the winning algorithm actually is quite poor at the task at hand — recommending movies to Netflix customers — because it is overoptimized to this particular contest data and the particular success metric of this contest.

Because I have written collaborative filtering papers in the past, on multidimensionality and rules, on the Slope One scheme and on the data normalization problem, people were quick to ask me if I would participate. The issue was quickly settled: the rules of the game forbid people from Quebec from participating. But privately, I expressed concerns that the game would be more about tuning and tweaking than about gaining new insights into the science of collaborative filtering. I never expressed these concerns publicly for fear that they might be misinterpreted.

I do not think that the next step in collaborative filtering is to find ways to improve accuracy according to some metric. I think this game got old circa 2000. I am rather looking forward to people coming up with drastically new problems and insights.

Disclaimer. If you are working on the Netflix game, please continue. I do not deny that it is an interesting engineering challenge.

Scientists are silly sometimes. For example, there is no standard way to figure out what a given researcher has published, nor to find out which papers appeared at a given conference in a given year. DBLP is one tool that tries to solve this problem for Computer Science. It is far from perfect, but its coverage is sometimes considered good enough.

For better or for worse, the reviewers of my last grant applications skipped my carefully crafted c.v. and jumped right into my DBLP publication list. (Remind me to no longer submit publication lists with my grant proposals and just write “see DBLP”.) I must admit that I routinely use DBLP to quickly determine the research interests of a given individual, or to find interesting conferences or journals.

One limitation of DBLP is that its search engine is just not very powerful. Things are improving thanks to Faceted DBLP. The search function itself is not more powerful, but you get aggregated statistics on individuals and conferences. Want to know whether a given conference grows in number of papers presented per year? Want to know the most prolific author at a given conference over the years? These queries are now trivial.

Try it out now!
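
For the curious, here is a rough idea of what such aggregated statistics amount to. This is only a sketch under my own assumptions: the records, the conference names and the two helper functions are made up for illustration, and it says nothing about how Faceted DBLP actually computes its facets.

```python
from collections import Counter

# Hypothetical bibliographic records: (conference, year, authors).
# These are made-up entries, not real DBLP data.
papers = [
    ("CONF-A", 2005, ["Alice", "Bob"]),
    ("CONF-A", 2006, ["Alice"]),
    ("CONF-A", 2006, ["Carol", "Alice"]),
    ("CONF-B", 2006, ["Bob"]),
]


def papers_per_year(records, conference):
    """Count the papers of one conference, grouped by year."""
    return Counter(year for conf, year, _ in records if conf == conference)


def most_prolific_author(records, conference):
    """Return (author, paper_count) for the top author, or None if no papers."""
    counts = Counter(
        author
        for conf, _, authors in records
        if conf == conference
        for author in authors
    )
    return counts.most_common(1)[0] if counts else None


if __name__ == "__main__":
    # Is the (hypothetical) conference growing? Print its yearly paper counts.
    for year, count in sorted(papers_per_year(papers, "CONF-A").items()):
        print(year, count)
    print(most_prolific_author(papers, "CONF-A"))
```

Both questions reduce to counting over a filtered list of records, which is why a faceted interface can answer them without a more powerful search function.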
