Netflix: an interesting Machine Learning game, but is it good science?

picture by Mr. Guybrarian

The Netflix competition is a $1 million game to build the best possible movie recommender system. It has already contributed to science tremendously by providing the largest freely available collaborative filtering data set (about 2 GB): it is at least an order of magnitude larger than any other similar data set. It has also generated many valuable research papers. Among the interesting contributions is a paper showing that the anonymized data might not be so anonymous after all.

However, Greg wonders whether the game itself will produce valuable results:

Participants may be overfitting to the strict letter of this contest. Netflix may find that the winning algorithm actually is quite poor at the task at hand — recommending movies to Netflix customers — because it is overoptimized to this particular contest data and the particular success metric of this contest.

Because I have written collaborative filtering papers in the past, on multidimensionality and rules, on the Slope One scheme, and on the data normalization problem, people were quick to ask me whether I would participate. The issue was quickly settled: the rules of the game forbid people from Quebec from participating. But privately, I expressed concerns that the game would be more about tuning and tweaking than about learning new insights into the science of collaborative filtering. I never expressed these concerns publicly for fear that they might be misinterpreted.
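For readers unfamiliar with it, here is a minimal sketch of the weighted Slope One idea: predict a user's rating of an item from the average rating differentials between that item and the items the user has already rated, weighted by how often each pair was co-rated. This is an illustrative toy, not the code from the paper; the dictionary layout, function names, and example ratings are all made up for the sketch.

```python
# Toy sketch of weighted Slope One. `ratings` is assumed to be a
# dict: user -> {item: rating}. Illustrative only.
from collections import defaultdict

def slope_one_train(ratings):
    """Average pairwise deviations dev[i][j] and their support counts."""
    freq = defaultdict(lambda: defaultdict(int))
    dev = defaultdict(lambda: defaultdict(float))
    for items in ratings.values():
        for i, r_i in items.items():
            for j, r_j in items.items():
                if i != j:
                    freq[i][j] += 1
                    dev[i][j] += r_i - r_j
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= freq[i][j]
    return dev, freq

def slope_one_predict(user_ratings, item, dev, freq):
    """Predict a rating for `item` from a single user's known ratings."""
    num = den = 0.0
    for j, r_j in user_ratings.items():
        n = freq.get(item, {}).get(j, 0)
        if n:
            num += (dev[item][j] + r_j) * n
            den += n
    return num / den if den else None

# Hypothetical example: predict how "u2" might rate "matrix".
ratings = {"u1": {"matrix": 4, "titanic": 3, "alien": 5},
           "u2": {"titanic": 4, "alien": 4}}
dev, freq = slope_one_train(ratings)
print(slope_one_predict(ratings["u2"], "matrix", dev, freq))  # 4.0
```

The appeal of the scheme is that the "model" is just a table of average differences, so training is a single pass over the ratings and prediction is trivially fast.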

I do not think that the next step in collaborative filtering is to find ways to improve accuracy according to some metric. I think this game got old circa 2000. I am rather looking forward to people coming up with drastically new problems and insights.

Disclaimer. If you are working on the Netflix game, please continue. I do not deny that it is an interesting engineering challenge.

2 thoughts on “Netflix: an interesting Machine Learning game, but is it good science?”

  1. I tend to agree with the intuition that the systems being thrown at this are overfitting to the data set. The KorBell system is a hodgepodge of different methods that seems unlikely to generalize to anything else without a lot of tweaking. I also agree that metrics like root mean squared error and mean absolute error have both reached the limit of their usefulness (there seems to be a collaborative filtering equivalent of a sound barrier). That said, I guess we can always hope the prize purse will bring someone to the field who makes a breakthrough.

  2. Daniel,

    You made a blunt statement: “I do not think that the next step in collaborative filtering is to find ways to improve accuracy according to some metric. I think this game got old circa 2000.” My equally blunt response is that competitors quickly found that the methods developed up to circa 2006 had grown old and could not lead to significant improvements or to further insight into the data. That is why, thanks to the Netflix challenge, quite a few innovations were developed by competitors during the past year.

    It will take some time to fully recognize and appreciate these innovations. Certainly, better familiarity with the methods themselves is required. The chosen RMSE error measure (an excellent choice in my eyes, but that’s another topic) tends to understate the impression of progress because of that square root… (see the small numeric sketch after these comments).
    However, a deeper look into the new developments reveals some important contributions to the field: (1) Improvements in accuracy will definitely have an impact on user experience. E.g., our studies show that an 8% drop in RMSE translates into a very significant improvement in the quality of the top-K recommendations. (2) The key innovations are not specific to the contest but general, and a company like Netflix can leverage them to obtain further improvements by integrating the extra information that it holds. (3) Almost all of the new methods are scalable and computationally efficient (as required by the size of the Netflix dataset, which is much larger than previous ones).

    I sympathize with your wish to think bigger, beyond improving prediction error, but we should never forget the basics and the important impact they have on the quality of recommenders.

    Best wishes for the new year,
    Yehuda
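To make the point about the square root concrete, here is a small numeric sketch. The figures are hypothetical, not taken from the contest leaderboard; they only illustrate that a relative drop in RMSE corresponds to roughly twice the relative drop in the underlying mean squared error, so progress can look smaller than it is.

```python
# Hypothetical scores, for illustration only: the square root in RMSE
# roughly halves the apparent relative improvement compared with MSE.
rmse_before, rmse_after = 0.95, 0.87

mse_before, mse_after = rmse_before ** 2, rmse_after ** 2
rmse_gain = 1 - rmse_after / rmse_before   # about an 8.4% relative drop
mse_gain = 1 - mse_after / mse_before      # about a 16.1% relative drop

print(f"RMSE improvement: {rmse_gain:.1%}, MSE improvement: {mse_gain:.1%}")
```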
