The Netflix competition is nearly over. The main lesson is that ensemble methods, which combine the predictions of many models, are the way to gain accuracy.

The recommender system community moves on. Immediate questions come to mind:

  • Researchers continue to use the Netflix data set. Will it remain freely available?
  • We need to study the effect of a 10% accuracy gain (measured by RMSE) on user satisfaction. How do we go about it?
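To make the 10% figure concrete, here is a minimal sketch of how a root-mean-square error and a relative gain can be computed. The function names and the numbers in the usage comments are made up for illustration; they are not taken from the competition.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical usage: a baseline at RMSE 1.0 and a new model at RMSE 0.9
# correspond to a 10% gain.
gain = relative_improvement(1.0, 0.9)
```

Note that a 10% drop in RMSE says nothing by itself about whether users notice the difference, which is exactly the open question.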

Otherwise, it seems that future research is bound by the available data. Models and theory alone have never had much of an impact on the field. Accordingly, there has been a surge of research on recommender systems using social network sites as a data source. Alas, social data is sparse, heterogeneous and ephemeral. Is all the low-hanging fruit gone?

Goodreads is a Social Web site about books. They need a recommender system. Thus, they issued a challenge: design their recommendation engine, and they will read your resume. I suppose this is the poor man’s version of the Netflix challenge.

There is nothing wrong with the challenge, but I wish the current Goodreads engineers would publish their own recommender system. How hard can it be to do item-based analysis?
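For readers who have not seen it, item-based analysis can be sketched in a few lines. This is only an illustration under assumptions: ratings are held in a plain dictionary, similarity is the cosine between item rating vectors, and the book titles and user names are invented. It is not a description of what Goodreads runs.

```python
import math
from collections import defaultdict

def item_vectors(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    items = defaultdict(dict)
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            items[item][user] = rating
    return items

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(ratings, user, top_n=3):
    """Rank unseen items by the similarity-weighted average of the user's ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, candidate_vector in items.items():
        if candidate in seen:
            continue
        weighted_sum = total_weight = 0.0
        for item, rating in seen.items():
            similarity = cosine(candidate_vector, items[item])
            weighted_sum += similarity * rating
            total_weight += abs(similarity)
        if total_weight > 0:
            scores[candidate] = weighted_sum / total_weight
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data: users, books and ratings are invented.
toy_ratings = {
    "ann": {"dune": 5, "neuromancer": 5, "emma": 1},
    "bob": {"dune": 4, "neuromancer": 5},
    "cat": {"emma": 5, "persuasion": 5, "dune": 1},
    "dan": {"dune": 5, "emma": 1},
}
suggestions = recommend(toy_ratings, "dan")
```

A production system would precompute the item-to-item similarities offline and deal with rating bias and sparsity, but the core idea is no more complicated than this.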

Update: Students at Stanford built and tested a recommender system for Goodreads.

Most database users know row-oriented databases such as Oracle or MySQL. In such engines, the data is organized by rows. Database researcher and guru Michael Stonebraker has been advocating column-oriented databases. The idea is quite simple: by organizing the data into columns, we can compress it more efficiently (using simple ideas like run-length encoding). He even founded a company, Vertica, to sell this idea.
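As a rough sketch of why columns compress well, consider run-length encoding applied to a single column pulled out of a small table. The table contents and function names below are invented for illustration, and real engines use more elaborate schemes.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical table stored row by row: (country, city).
rows = [
    ("CA", "Montreal"),
    ("CA", "Toronto"),
    ("CA", "Vancouver"),
    ("US", "Boston"),
    ("US", "Seattle"),
]

# Column-oriented view: all the values of one attribute stored together.
country_column = [country for country, _city in rows]

encoded = rle_encode(country_column)  # [("CA", 3), ("US", 2)]
```

Stored by rows, the repeated country codes are interleaved with city names and the runs disappear. Stored by columns, and especially when the table is sorted, long runs are common and five values shrink to two pairs.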

Daniel Tunkelang is back from SIGMOD: he reports that column-oriented databases have grabbed much mindshare. While I did not attend SIGMOD, I am not surprised. Daniel Abadi was awarded the 2008 SIGMOD Jim Gray Doctoral Dissertation Award for his excellent thesis on Column-Oriented Database Systems. Such great work supported by influential people such as Stonebraker is likely to get people talking.

But are column-oriented databases the next big thing? No.

  • Column stores have been around for a long time in the form of bitmap and projection indexes. Conceptually, there is little difference. (See my own work on bitmap indexes.)
  • While it is trivial to change or delete a row in a row-oriented database, it is harder in column-oriented databases. Hence, applications are limited to data warehousing.
  • Column-oriented databases are faster for some applications, sometimes by two orders of magnitude, especially on low selectivity queries. Yet part of these gains is due to the recent evolution of our hardware. Hardware configurations where reading data sequentially is very cheap favor sequential organization of the data such as column stores. What might happen in the world of storage and microprocessors in the next ten years?
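To illustrate the first point, here is a minimal sketch of a bitmap index, which stores a table attribute by attribute much like a column store does. It uses Python integers as uncompressed bitmaps; the column contents and function names are invented, and real bitmap indexes compress their bitmaps.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_of(bitmap):
    """List the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical columns of a four-row table.
city = ["Montreal", "Toronto", "Montreal", "Boston"]
status = ["gold", "gold", "basic", "gold"]

city_index = build_bitmap_index(city)
status_index = build_bitmap_index(status)

# WHERE city = 'Montreal' AND status = 'gold' becomes a bitwise AND.
matches = rows_of(city_index["Montreal"] & status_index["gold"])
```

Each attribute is indexed on its own and queries touch only the attributes they mention, which is the same access pattern a column store exploits.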

I believe Nicolas Bruno said it best in Teaching an Old Elephant New Tricks:

(…) some C-store proponents argue that C-stores are fundamentally different from traditional engines, and therefore their benefits cannot be incorporated into a relational engine short of a complete rewrite (…) we (…) show that many of the benefits of C-stores can indeed be simulated in traditional engines with no changes whatsoever.  Finally, we predict that traditional relational engines will eventually leverage most of the benefits of C-stores natively, as is currently happening in other domains such as XML data.

That is not to say that you should avoid Vertica’s products or stay away from research on column-oriented databases. However, do not bet your career on them. The hype will not last.

(For a contrarian point of view, read Abadi and Madden’s blog post on why column stores are fundamentally superior.)

Apparently, it is prestigious to write research papers with people from other countries. Funding agencies routinely favor collaboration between different universities.

Presumably collaboration improves productivity? Maybe not:

(…), there is no clear evidence that correlation exists between the resort to extramural collaboration and the overall performance of a research institution

Reference: Giovanni Abramo, Ciriaco Andrea D’Angelo, Flavia Di Costa, Research collaboration and productivity: is there correlation? High Educ (2009) 57:155–171.
