How to win the Netflix $1,000,000 prize?

Yehuda Koren, one of the winners of the Netflix contest so far, was kind enough to send me a pointer to a recent paper he co-wrote, Chasing $1,000,000: How we won the Netflix progress prize (the link is to a PDF document; see the 4th page and following).

Their approach is based on a linear combination of a large number of predictors. Their work is difficult to summarize because it is so sophisticated and complex. Nevertheless, it is worth trying to see what lessons can be learned.

First, some generic observations that are not very surprising, but nice nevertheless:

  • All data distributions are very skewed. A single movie can receive 200,000 ratings whereas a large fraction of the movies is rated fewer than 200 times. Some users have rated 10,000 movies or more whereas most users have rated around 100 movies.
  • Ratings on movies with higher variance (you either like it or hate it) are more informative.

Here are some principles I take away from their work:

  • Singular Value Decomposition is useful to get overall trends.
  • Nearest-neighbor methods are better at picking up strong interactions inside small sets of related movies.
  • Nearest-neighbor methods should discard uninformative neighbors.
  • If you discard the ratings themselves and only look at who rated which movie, you still get useful predictors that complement the rating-based ones.
  • Regularization is important (they use ridge regression), as expected. A toy sketch combining some of these ideas follows below.
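
As a concrete illustration of how these pieces can fit together, here is a toy sketch in Python. It is entirely my own illustration, not the team's code: a rank-k SVD reconstruction stands in for the global-trend predictor, a cosine-similarity item-neighbor average stands in for the local predictor, and the two are blended on the observed ratings with closed-form ridge regression. The toy matrix and the values of k and lam are made up.

```python
import numpy as np

# Toy user x movie rating matrix (0 = missing), values in 1..5.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 0, 4, 1, 2],
    [1, 1, 0, 5, 4],
    [0, 2, 1, 4, 5],
    [5, 5, 4, 0, 1],
], dtype=float)
observed = R > 0

# Predictor 1: rank-k SVD of the mean-filled matrix captures overall trends.
global_mean = R[observed].mean()
R_filled = np.where(observed, R, global_mean)
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
k = 2
P_svd = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Predictor 2: item-based neighbor average (cosine similarity between movies).
norms = np.linalg.norm(R_filled, axis=0)
S = (R_filled.T @ R_filled) / np.outer(norms, norms)
np.fill_diagonal(S, 0.0)
P_knn = (R_filled @ S) / S.sum(axis=0)

# Blend the two predictors with ridge regression:
# w = argmin ||X w - y||^2 + lam ||w||^2, solved in closed form.
X = np.column_stack([P_svd[observed], P_knn[observed]])
y = R[observed]
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

P_blend = w[0] * P_svd + w[1] * P_knn
print("blend weights:", w)
print("predicted rating for user 0, movie 2:", round(P_blend[0, 2], 2))
```

On real Netflix-scale data you would need sparse data structures, a proper treatment of missing entries, and cross-validated regularization, and the blend would be fit on a held-out probe set rather than on the training ratings; the only point here is the structure of blending several predictors under a ridge penalty.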

There is, in their work, a very clear trade-off: the ability to explain the recommendations is sacrificed in favor of accuracy. This is somewhat dictated by the rules of the game, I suppose. They acknowledge this fact: “when recommending a movie to a user, we don’t really care why the user will like it, only that she will.” Presumably, neither the engineer nor the manager running the system, nor the user, should care why the recommendation was made. I have argued the exact opposite, and so have others. I hope we can agree to disagree on this one. (I have said it before: my goal in life is to make people smarter, not to make smarter machines.)

Note. They claim that no justification can be found in the literature for the similarity measures; it is all arbitrary. I think Yehuda did not read my paper Scale and Translation Invariant Collaborative Filtering Systems, where I suggested that the predictions of an algorithm should not change if we transform the data in some sensible way. Of course, this may not be enough to determine what the similarity measures must be, but it allows you to quickly discard some choices.
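
To show what the invariance argument buys you, here is a small numerical check, again my own illustration rather than anything from either paper: under a positive affine change of the rating scale (r → a·r + b with a > 0), the Pearson correlation between two users is unchanged, while plain cosine similarity on raw ratings is not. The invariance requirement alone is thus enough to discard raw-ratings cosine as a similarity measure.

```python
import numpy as np

def cosine(u, v):
    # Plain cosine similarity on raw vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson(u, v):
    # Pearson correlation = cosine similarity of the centered vectors.
    return cosine(u - u.mean(), v - v.mean())

u = np.array([5.0, 3.0, 4.0, 1.0])
v = np.array([4.0, 2.0, 5.0, 2.0])

a, b = 2.0, 1.0  # e.g. moving from a 1..5 scale to a 3..11 scale
u2, v2 = a * u + b, a * v + b

print("cosine :", round(cosine(u, v), 4), "->", round(cosine(u2, v2), 4))   # changes
print("pearson:", round(pearson(u, v), 4), "->", round(pearson(u2, v2), 4)) # unchanged
```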

4 thoughts on “How to win the Netflix $1,000,000 prize?”

  1. I subscribed to your blog after finding your work on CF for research on Netflix.

    A few things I’m curious about and would appreciate reading your thoughts on:

    - SVMs barely get a mention from competitors. It seems odd that they wouldn’t be used. Do you know if performance is the issue? They are, of course, among the worst for explainability.

    - Dates seem to have a large effect on ratings, but all I find is people saying they can’t take advantage of them to improve their scores. Have you heard of any good research on this? With several percentage points of variation in the average score over the year, it boggles my mind that nothing can be squeezed out of it.

  2. As for dates… certainly, if you knew when the user entered his ratings, or at least when the user was active, you could leverage this information… but otherwise, it is hard to see where to go.

    There is published work on taking into account the time factor, mostly to dampen older ratings; a small sketch of the idea appears at the end of this comment.

    I do not have the faintest idea how SVMs would fare on Netflix. However, scalability is definitely a serious issue for all algorithms. Most academic machine-learning work is done on relatively small data sets, and Netflix is huge in comparison, though it is still modest compared to what you face in industry.
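
    For what it is worth, a common way to dampen older ratings is an exponential time decay; the sketch below is purely illustrative (the decay scale tau and the toy numbers are invented).

```python
import numpy as np

def decayed_mean(ratings, ages_in_days, tau=180.0):
    """Exponentially time-decayed average rating; tau sets the decay scale."""
    ratings = np.asarray(ratings, dtype=float)
    weights = np.exp(-np.asarray(ages_in_days, dtype=float) / tau)
    return float(np.sum(weights * ratings) / np.sum(weights))

# A 5-star rating from two years ago counts far less than a recent 2-star one.
print(decayed_mean([5, 2], [730, 7]))   # about 2.05
print(np.mean([5, 2]))                  # plain average: 3.5
```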

  3. Two comments…

    About explainability:
    I am the last one to argue against the importance of explaining the recommendations to the user. (See my [email protected]…) However, the seeming trade-off between parsimonious explanation and accuracy is much exaggerated, especially when considering real-life systems rather than idealized situations. Basically, you want to push as far as you can on both frontiers. Top-notch accuracy lets you stay with the more confident recommendations, and/or assume more risk, which allows you to dare to surprise the customer with a not-so-popular recommendation specially tailored for him. Then, at a *second stage*, you may need to explain “why”. To this end, you are going to utilize the best explanation techniques at your disposal. And BTW, contrary to some belief, latent factor models, such as SVD, won’t prevent you from coming up with working explanations.

    About dates:
    We did utilize them. The description is scattered across our papers. Overall, they are far less helpful than we hoped, especially considering the significant transitions over time that are present in the data. Maybe that won’t be true in other datasets.

    Regards,
    Yehuda

  4. In my view, scale invariance is one of the indicators of a robust algorithm throughout numerical analysis. Any algorithm that can be defeated by a simple transformation of the data is not correctly identifying the information in the data. It was Peter Deuflhard who first introduced me to this insight with a paper on affine invariance.
