• You can build an effective recommender system with as few as two users.
  • As you have more users, you tend to have more training data. Hence, you may have more accurate recommendations.
  • More accurate recommendations may not be important to your users.
  • The exact count of your users may not matter as much as the diversity of your users.
  • A good rule of thumb is that you should have many more users than you have items to recommend.
  • Given the right algorithms, your accuracy will improve monotonically with the number of users and the amount of training data.
  • Users may enter feedback data to correct the assumptions of your recommender system and thus improve it over time.

Explanation: The title of my blog post is the subject of an email I got recently. It is a very popular question.

Acknowledgment: Andre inspired me to write this post.

I have written that solid-state drives (SSDs), as found in recent laptops such as the MacBook Air, nearly bridge the gap between internal and external memory. Indeed, the difference between disk and RAM went from three orders of magnitude to one!

There is a catch, however. SSDs can have terrible random write performance: at least two orders of magnitude slower than sequential writes!

Kevin Burton points out that, as a workaround, you can use a log-structured file system. In effect, random writes are replaced by appends at the end of a log of changes. There are certainly cases where log-structured file systems are appropriate (I don’t know much about them), but are they appropriate for external-memory B-trees or hash tables?
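To make the "appends at the end of a log" idea concrete, here is a minimal sketch in Python. It is not how any particular log-structured file system works: the `AppendOnlyStore` class, its method names, and the in-memory index are all hypothetical, chosen only to show that an update to an existing key becomes a sequential append rather than an in-place (random) write.

```python
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store (hypothetical): every update is appended to one log file."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, length) of the latest value in the log
        self.log = open(path, "ab+")  # append mode: writes always land at the end

    def put(self, key, value):
        data = value.encode("utf-8")
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()  # current end of the log
        self.log.write(data)
        self.log.flush()
        self.index[key] = (offset, len(data))  # older bytes for this key become garbage

    def get(self, key):
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length).decode("utf-8")

    def close(self):
        self.log.close()


path = os.path.join(tempfile.mkdtemp(), "store.log")
store = AppendOnlyStore(path)
store.put("a", "first")
store.put("b", "second")
store.put("a", "third")  # overwriting "a" is still an append
print(store.get("a"))  # prints: third
log_size = os.path.getsize(path)  # the log keeps the stale "first" as well
store.close()
```

The price is visible in `log_size`: stale values stay in the log until something compacts it, and the index that maps keys to offsets has to live somewhere. That is part of why it is unclear whether the approach suits an external-memory B-tree or hash table.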

Some systems, however, are designed to avoid random writes. For example, Google’s Bigtable sorts data in memory before writing it to disk. Random writes are also minimized with most column-based databases and indexes such as C-Store and bitmap indexes.
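The "sort in memory, then write" pattern can be sketched in a few lines of Python. This is only an illustration of the general idea, not Bigtable's actual design: the `SortedBuffer` class, the `limit` parameter, and the tab-separated run files are assumptions made for the example.

```python
import os
import tempfile


class SortedBuffer:
    """Toy write buffer (hypothetical): collect updates in RAM, flush them sorted."""

    def __init__(self, directory, limit=3):
        self.directory = directory
        self.limit = limit  # number of buffered keys that triggers a flush
        self.buffer = {}    # random updates are absorbed here, in memory
        self.runs = []      # paths of the sorted files written so far

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.limit:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        path = os.path.join(self.directory, f"run-{len(self.runs):04d}.txt")
        # One sequential pass over the sorted keys: no random writes to disk.
        with open(path, "w", encoding="utf-8") as out:
            for key in sorted(self.buffer):
                out.write(f"{key}\t{self.buffer[key]}\n")
        self.runs.append(path)
        self.buffer.clear()


directory = tempfile.mkdtemp()
table = SortedBuffer(directory, limit=3)
for key, value in [("pear", "1"), ("apple", "2"), ("mango", "3")]:
    table.put(key, value)
with open(table.runs[0], encoding="utf-8") as run:
    keys_on_disk = [line.split("\t")[0] for line in run]
print(keys_on_disk)  # prints: ['apple', 'mango', 'pear']
```

Keys arrive in arbitrary order, but the disk only ever sees whole sorted files written front to back. Reads then have to consult several runs (or merge them), which is the trade-off this kind of design accepts.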

It is an interesting time to be a database researcher!
