Recommender systems: where are we headed?

Daniel Tunkelang comments on the recent progress in collaborative filtering:

(…) the machine learning community, much like the information retrieval community, generally prefers black box approaches, (…) If the goal is to optimize one-shot recommendations, they are probably right. But I maintain that the process of picking a movie, like most information seeking tasks, is inherently interactive, (…)

I disagree with him. Even for non-interactive recommendations, the Machine Learning community is off-track for two reasons:

  • They fail to take into account diversity. In Information Retrieval, we know that if precision is high (all documents are relevant) but recall is low (few of the relevant documents are presented), then the system is poor. There is no such balance in collaborative filtering. Precision above all else is the goal. This is wrong. Diversity metrics must be used.
  • They work over static data sets. A system like Netflix is not static, and so accuracy on a static data set might not be a good predictor of real-world performance. The problem is intrinsically nonlinear: people will rate different items, and rate them differently, if you change the recommender system. The feedback loop may work against you or in your favour. The effect might be large or small. As far as I can tell, I am the only one who keeps pointing out this fundamental but never-addressed limitation of working over static data sets. Update: This has absolutely nothing to do with online versus batch algorithms.
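To make the diversity point concrete, here is a minimal sketch of one common diversity metric: intra-list diversity, the average pairwise distance between the items in a recommendation list. The genre attributes, the Jaccard-based distance, and the toy catalogue are all invented for illustration.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over sets of item attributes (e.g., genres)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def intra_list_diversity(recommended, attributes):
    """Average pairwise distance over all pairs of items in the list."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(attributes[i], attributes[j])
               for i, j in pairs) / len(pairs)

# Toy catalogue: movie -> set of genres (made-up data).
attributes = {
    "A": {"action", "thriller"},
    "B": {"action", "thriller"},   # near-duplicate of A
    "C": {"comedy", "romance"},
}
print(intra_list_diversity(["A", "B"], attributes))  # near-duplicates: 0.0
print(intra_list_diversity(["A", "C"], attributes))  # disjoint genres: 1.0
```

A recommender tuned purely for accuracy can score well while returning lists like ["A", "B"]; a diversity metric like this one penalizes such near-duplicate lists.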

See also my post Netflix: an interesting Machine Learning game, but is it good science?

Note: I organized the ACM KDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition along with people like Yehuda Koren. Yehuda is among the candidates to win the Netflix prize. I do not oppose the Netflix competition. I just do not think that it will solve our big problems.

8 thoughts on “Recommender systems: where are we headed?”

  1. The NYT article actually does raise a few issues that you mention – such as the importance of diversity (through Maes’ complaint about narrow-mindedness).

    Also, I’ve elaborated on the problems of RMSE on our blog – it was interesting to see Koren’s comment to your Dec 07 post about RMSE giving a misleading measure of progress.

  2. Regarding your criticisms of machine learning, there is research in that field that considers diversity constraints and non-static data sets.

    Section 5.1 of this paper by Smola and Le explicitly considers diversity constraints for ranking problems, of which collaborative filtering is a special case.

    There is also quite a lot of research in ML on what you call “non-static” data sets. However, the ML community refers to this as “online” learning. Stochastic gradient descent is a well-known and practical example of this type of algorithm.
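For readers unfamiliar with the idea, here is a minimal sketch of an online, stochastic-gradient-descent update for a latent-factor rating model, applied one rating at a time as the data streams in. The model size, learning rate, and toy rating stream are invented for illustration; this is generic SGD matrix factorization, not any specific published algorithm.

```python
import random

def sgd_update(P, Q, user, item, rating, lr=0.01, reg=0.05):
    # One stochastic gradient step on the squared error of a single rating:
    # this is what lets the model adapt as each new rating arrives.
    err = rating - sum(p * q for p, q in zip(P[user], Q[item]))
    for k in range(len(P[user])):
        p, q = P[user][k], Q[item][k]
        P[user][k] += lr * (err * q - reg * p)
        Q[item][k] += lr * (err * p - reg * q)
    return err

random.seed(0)
k = 8  # number of latent factors (arbitrary choice)
P = {u: [0.05 + 0.1 * random.random() for _ in range(k)] for u in range(3)}
Q = {i: [0.05 + 0.1 * random.random() for _ in range(k)] for i in range(3)}

# Simulated stream of (user, item, rating) triples, processed one at a time.
stream = [(0, 1, 4.0), (1, 2, 2.0), (2, 0, 5.0)] * 200
for user, item, rating in stream:
    sgd_update(P, Q, user, item, rating)
```

After the stream is consumed, the dot product of `P[0]` and `Q[1]` approximates the observed rating 4.0 (shrunk slightly toward zero by the regularizer).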

    There is even research that addresses both of your shortcomings at once. For example, Crammer and Singer have a paper that provides an online algorithm for ranking and applies it to the EachMovie data set. Searching for “online learning” and “ranking” reveals more along these lines.

  3. There is also quite a lot of research in ML on what you call “non-static” data sets.

    Here is the problem again. When working with a static data set, such as what Crammer and Singer do with the EachMovie data set, they ignore the fact that, in practice, the ratings are influenced by the collaborative filtering algorithm! If you change the algorithm, you will collect different ratings. That’s because your users browse the movies, say, based on what the recommender suggests (for example, Amazon says 30% of their sales are due to the recommenders they use)… so they will rate different items, and rate them differently, if you change the algorithm. In turn, this will influence the algorithm, which will then change how it influences the users.

    This is like the polls right before the election. The polls are supposed to measure how people vote, but in fact, they influence people… there is a feedback loop.

    That has *absolutely* nothing to do with whether you do batch or online processing. Online versus batch is a performance issue; I am talking about how the *algorithm* changes the *data* (and vice versa).
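The feedback loop described above can be sketched in a toy simulation: when the recommender decides which items users see, the ratings it collects concentrate on whatever it already promotes, regardless of how the underlying model is trained. The appeal probabilities and the "push the most popular item" policy are made up purely to illustrate the effect.

```python
import random

random.seed(1)

# True user preferences (unknown to the system): item -> probability of a
# positive rating. All numbers here are invented for illustration.
true_appeal = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6}

def simulate(rounds, biased):
    counts = {item: 0 for item in true_appeal}     # ratings collected per item
    positives = {item: 0 for item in true_appeal}
    for _ in range(rounds):
        if biased and max(counts.values()) > 0:
            # The recommender pushes whatever currently looks best, so that
            # item is the one users see -- and therefore the one they rate.
            item = max(counts, key=lambda i: positives[i] / max(counts[i], 1))
        else:
            item = random.choice(list(true_appeal))
        counts[item] += 1
        positives[item] += random.random() < true_appeal[item]
    return counts

print(simulate(1000, biased=False))  # ratings spread across the catalogue
print(simulate(1000, biased=True))   # ratings lock onto one promoted item
```

The biased run collects almost all of its ratings for a single item, so any algorithm later trained on that log inherits the bias of the recommender that produced it, exactly the loop a static data set cannot capture.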

    Section 5.1 of this paper by Smola and Le explicitly considers diversity constraints for ranking problems, of which collaborative filtering is a special case.

    Thanks. This recent non-peer-reviewed paper does look good. But it is hardly representative of the algorithmic research done in collaborative filtering. The diversity issue has not been entirely ignored in collaborative filtering: I have a survey report somewhere of what was done, and several people, from way back, talked about diversity in recommender systems. However, the diversity work is tiny and largely ignored.

    Why? Because it is a lot easier to measure accuracy. So almost all the work (99%) focuses on this one issue above all else.

  4. I see what you mean by non-static data now and take your point. However, I disagree that online processing has “absolutely nothing to do” with non-static data since online methods are able to track non-stationary targets. That is, if the results of the algorithm are changing the distributions underlying the data then as more data is taken into account the algorithm will adapt to them.

    I also agree that there is not much work on diversity measures but I thought you were too quick to discount ML research with a sweeping statement so felt compelled to offer a counter-example.

  5. I see what you mean by non-static data now and take your point. However, I disagree that online processing has “absolutely nothing to do” with non-static data since online methods are able to track non-stationary targets. That is, if the results of the algorithm are changing the distributions underlying the data then as more data is taken into account the algorithm will adapt to them.

    An online algorithm will have a tighter feedback loop, but ultimately, you are limited by how quickly your users can react and input data. Hence, the difference between a batch algorithm run every day, and an online algorithm that adapts on the fly, might not be so large.

    Of course, I would favor the online algorithm given a chance… 😉 But Google seems to do well with batch indexing algorithms. I understand that they run PageRank in batch mode… and they seem to do fine.
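For concreteness, here is a minimal sketch of batch PageRank via power iteration, the kind of offline computation alluded to above. The three-page link graph is invented; nothing here reflects Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Batch power iteration: recompute all ranks from the full link graph."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            share = damping * rank[n] / len(outs)  # rank split among out-links
            for m in outs:
                new[m] += share
        rank = new
    return rank

# Toy link graph: page -> pages it links to (no dangling nodes).
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

Each pass recomputes every rank from the whole graph; nothing adapts as individual links arrive, which is what makes it a batch rather than an online algorithm.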

    I also agree that there is not much work on diversity measures but I thought you were too quick to discount ML research with a sweeping statement so felt compelled to offer a counter-example.

    I am equally critical of my own work and of the work in any domain. It is by seeking flaws that we make progress. And I have received my share of criticism from the ML community, and from the TCS community as well. (Note that I have published papers in ML and TCS journals/conferences. I refuse to live in closed gardens.)

    Please go read my papers and criticize them! Publicly! If people can’t take criticism, they should stay home, in the labs, and never publish.

    However, my impression is that the ML community suffers from the same flaw as any of these tightly integrated communities: it becomes strongly biased. See my post Encouraging diversity in science for a related discussion.

    I believe research should not occur within groups, but within networks. The communities should be open, not closed. Single-minded people (“accuracy above all else”) should be left behind. Science requires us to be open minded, to have a dialogue not only with people who “think alike” but also with people who think differently, so that we can do “richer” science.

  6. I definitely agree with your last point. I’m also wary of very tightly knit groups. On that note, you’ll be happy to hear that I’m presenting a paper at a conference on Australian literary culture next month. Is that diverse enough for you? 🙂
