Creative workers commonly feel that they lack time. Yet most people will run out of energy before they run out of time. A single task that takes you 5 minutes (asking a Business Development Officer for Intellectual Property rights) can drain you for a week. Another task, like lecturing for 3 hours, can energize you for the rest of the week. Highly productive people do not have more time, but they may have more energy, better methods and better feedback on their progress.

I believe that three problems lead us to conclude we lack time:

  • You are spending too much time on boring tasks. To be productive, you need to work on projects you love. For this reason, creative people should pick their projects.
  • You fail to manage your projects. Without help, you can only keep track of about 7 projects or tasks at any one time. If you want to do more, a method is needed. Myself, I use GTD, but any method that scales up to a large number of projects will do. Without a method, you will drift to unessential tasks and then blame the lack of time to explain why important tasks went unattended.
  • You do not measure your progress. You need to get feedback about the quality and quantity of your work. Myself, I put my work under Subversion and get daily emails of what files changed. It is a crude but effective measure of my work. Also, tracking your projects carefully, at the task level, helps. Finally, having coworkers who react to your work is a blessing. Without a measure of your progress, you may realize too late that your projects did not progress and then blame the lack of time.
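
As an illustration of this kind of crude measurement, here is a minimal Python sketch that lists which files changed on each day. It is not my actual email setup: it assumes you have saved the output of `svn log --verbose --xml`, and the sample log, the paths and the function name `changed_files_per_day` are all made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical excerpt in the format produced by `svn log --verbose --xml`.
SAMPLE_LOG = """<log>
<logentry revision="101">
<author>alice</author>
<date>2008-11-03T14:02:11.000000Z</date>
<paths>
<path action="M" kind="file">/trunk/paper/intro.tex</path>
<path action="A" kind="file">/trunk/paper/figures/plot.py</path>
</paths>
<msg>Rewrote the introduction</msg>
</logentry>
<logentry revision="102">
<author>alice</author>
<date>2008-11-03T18:40:00.000000Z</date>
<paths>
<path action="M" kind="file">/trunk/paper/intro.tex</path>
</paths>
<msg>Fixed typos</msg>
</logentry>
<logentry revision="103">
<author>alice</author>
<date>2008-11-04T09:15:30.000000Z</date>
<paths>
<path action="M" kind="file">/trunk/code/index.cpp</path>
</paths>
<msg>Faster index construction</msg>
</logentry>
</log>"""


def changed_files_per_day(log_xml):
    """Map each day (YYYY-MM-DD) to the sorted list of distinct paths touched that day."""
    root = ET.fromstring(log_xml)
    per_day = defaultdict(set)
    for entry in root.iter("logentry"):
        # Subversion dates are ISO 8601; the first 10 characters are the day.
        day = entry.findtext("date", default="")[:10]
        for path in entry.iter("path"):
            per_day[day].add(path.text)
    return {day: sorted(paths) for day, paths in sorted(per_day.items())}


summary = changed_files_per_day(SAMPLE_LOG)
for day, paths in summary.items():
    print(day, len(paths), "file(s) changed")
```

A count of changed files says nothing about quality, but a day with no changes at all is hard to argue with.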

Among scientist-bloggers, the new buzzword is Mendeley: a social networking platform for scientists (Ricardo Vidal, Sylvie Noël, Misha Lemeshko, Michael Kuhn, …). The site is barely getting started and is still in early beta, so there are bugs and limitations. However, the London-based company has funding and a solid staff.

Their vision statement is compelling:

Mendeley is free social software for managing and sharing research papers. It is also a Web 2.0 site for discovering research trends and connecting to like-minded academics. To achieve our long-term vision of a “Last.fm for research”, we’re working with the former founding engineers of Skype and Last.fm’s former chairman.

Last night I created a profile. I got tired of entering my papers and stopped around 2005-2006. If you have 100 published papers, you are going to swear a lot. It is sad that you cannot just link to your existing publication list (such as arxiv.org).

Where I see the potential is in the social networking. It seems that all of the scientific networking I do is “hand-crafted”. I am hoping for more!

It is impossible to distinguish bogus work from high quality work objectively and systematically. You can sort work based on external attributes such as quality of the presentation, length, logical correctness, prestige of the authors, and methodology, but not on the significance of the work. Significance cannot be disproved at the time of the review. Even technical details can end up being fundamental ideas: this happens frequently in mathematics, where lemmas often outshine theorems in the long run.

I review several research papers every month, and several research funding proposals every year. At best, I can determine that something is badly presented. I can find logical or mathematical errors. Beyond this, my opinion is probably often wrong.

Here are a few things I have, or would have, categorized as crackpot ideas:

  • Back in 1990, I would have predicted that the WWW was impractical. How can you deal efficiently with broken links? Who is going to maintain all these links? Yet, it works. I almost never encounter a 404 (missing page) error.
  • Back in 1991, I would have laughed had anyone claimed that you could efficiently index and categorize over 8 billion dynamic Web pages, many of which appear and disappear frequently. Yet Google, Yahoo and many other search engines are able to index daily the content of my posts. They differentiate my content from webspam. They also determine the authority of my page. Yet, there is no central registry, no form of quality control, and so on. While they use technically sophisticated techniques, much of it works simply by brute force: keep revisiting and reindexing the sites you expect to change.
  • Not long ago, I had concluded that Twitter was a useless idea. Months later, I realize that Twitter offers ambient collaboration. I believe it caters to an essential need that had gone mostly unnoticed previously. (If you are not on Twitter, you ought to be.)
  • The first time I read about bitmap indexes, I thought it was a clever but limited technical trick with little scientific interest. (I just published two papers on bitmap indexes and I have more on the way!)
  • Jim Gray’s data cube idea is to work with a lattice of 2^d cuboids. Since, in data warehouses, we often have d large (d>15), the materialization of even a small fraction of these cuboids is impractical. Yet, it has been very fruitful both in industry and in academia.
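
To see why the data cube looked impractical, recall that the lattice has one cuboid per subset of the d dimensions, hence 2^d cuboids in total. Here is a minimal Python sketch that enumerates them for a toy case. The dimension names and the helper `cuboids` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def cuboids(dimensions):
    """Yield every cuboid, that is, every subset of dimensions to group by.

    The empty tuple is the apex (grand total) and the full tuple is the
    base cuboid (no aggregation).
    """
    for k in range(len(dimensions) + 1):
        for subset in combinations(dimensions, k):
            yield subset


dims = ["time", "store", "product"]  # hypothetical dimension names
lattice = list(cuboids(dims))

print(len(lattice))  # 8, which is 2**3
print(2 ** 15)       # 32768 cuboids when d = 15
```

Each cuboid is itself a table that can be large, so even at d = 15 you are choosing which few of the 32768 to materialize.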

Fortunately, if you merely discard the papers that fail to follow my guidelines, you already discard quite a number! Requiring papers to be without logical flaws and well written is often quite harsh!

Anyhow, there must be some link to evolution theory. I am sure that there have been new species that initially presented little interest, but ended up being of crucial importance.

For an entertaining take on this problem, see:  Simone Santini, We are sorry to inform you…, IEEE Computer, December 2005.

I have been arguing on this blog that while everyone knows diversity is a desirable property of recommender systems, there has been little work on the topic. To make my claim precise, I decided to list the papers addressing both recommender systems and diversity. I mean this list to be complete.

You can find a few more references and some analysis in our technical report:

Daniel Lemire, Stephen Downes, Sébastien Paquet, Diversity in open social networks, published online, October 2008.

If I am missing any paper, tell me!

Maybe this warrants a Wikipedia page?

Daniel Tunkelang comments on the recent progress in collaborative filtering:

(…) the machine learning community, much like the information retrieval community, generally prefers black box approaches, (…) If the goal is to optimize one-shot recommendations, they are probably right. But I maintain that the process of picking a movie, like most information seeking tasks, is inherently interactive, (…)

I disagree with him. Even for non-interactive recommendations, the Machine Learning community is off-track for two reasons:

  • They fail to take into account diversity. In Information Retrieval, we know that if precision is high (all documents are relevant) but recall is low (few of the relevant documents are presented), then the system is poor. There is no such balance in collaborative filtering. Precision above all else is the goal. This is wrong. Diversity metrics must be used.
  • They work over static data sets. A system like Netflix is not static and so, accuracy on a static data set might not be a good predictor for real-world performance. The problem is intrinsically nonlinear. People will rate different items, and they will rate differently, if you change the recommender system. The feedback loop may work against you or in your favour. The effect might be large or small. As far as I can tell, I am the only one who keeps pointing out this fundamental, but never addressed, limitation of working over static data sets. Update: This has absolutely nothing to do with online versus batch algorithms.
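
To make the diversity point concrete, here is a minimal Python sketch in which two recommendation lists have the same precision and recall but very different diversity. The metric used, the average pairwise Jaccard distance between genre sets, is just one possible choice that I assume for illustration. The movie identifiers and genres are made up.

```python
from itertools import combinations


def precision(recommended, relevant):
    """Fraction of the recommended items that are relevant."""
    if not recommended:
        return 0.0
    return sum(1 for item in recommended if item in relevant) / len(recommended)


def recall(recommended, relevant):
    """Fraction of the relevant items that were recommended."""
    if not relevant:
        return 0.0
    return len(set(recommended) & relevant) / len(relevant)


def intra_list_diversity(recommended, genres):
    """Average pairwise Jaccard distance between the genre sets of the items."""
    def jaccard_distance(a, b):
        union = genres[a] | genres[b]
        if not union:
            return 0.0
        return 1.0 - len(genres[a] & genres[b]) / len(union)

    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical catalogue: every movie below is relevant to the user.
genres = {
    "A": {"scifi"}, "B": {"scifi"}, "C": {"scifi"},
    "D": {"drama"}, "E": {"comedy"}, "F": {"documentary"},
}
relevant = set(genres)

narrow = ["A", "B", "C"]  # three science-fiction movies
varied = ["A", "D", "E"]  # three different genres

for name, recs in (("narrow", narrow), ("varied", varied)):
    print(name, precision(recs, relevant), recall(recs, relevant),
          intra_list_diversity(recs, genres))
```

Both lists score 1.0 on precision and 0.5 on recall, yet the first has a diversity of 0.0 and the second 1.0. An evaluation based on accuracy alone cannot tell them apart.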

See also my post Netflix: an interesting Machine Learning game, but is it good science?

Disclaimer: I organized the ACM KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition along with people like Yehuda Koren. Yehuda is among the candidates to win the Netflix prize. I do not encourage the Netflix competition. I just do not think that it will solve our big problems.
