Data has natural layouts:

  • text is written from the first to the last word,
  • database tables are written one row at a time,
  • Google presents results one document at a time,
  • the early recommender systems compared users to other users,
  • discussions are organized in newsgroups and posting boards by topic,
  • research papers are organized in journals and conferences,
  • objects have attributes (a ball is red), and from these attributes we determine similarities between objects.

Using a database terminology, these are horizontal layouts.

We can rotate these models to create vertical layouts:

  • Instead of writing text sequentially, we can store the locations of each word in an inverted index.
  • Instead of writing database tables row-by-row (e.g., Oracle, MySQL), we can write databases column by column (e.g., C-Store/Vertica, LucidDB, Sybase IQ, and my Lemur Bitmap Index Library).
  • Instead of presenting results sets one document at a time, we present tag clouds and use faceted search to support exploration. Thus, instead of listing documents, we focus on attributes (date, topic, author).
  • Recommender systems are often more scalable when they compare items instead of users: the most famous example is Greg Linden‘s Amazon recommender (if you liked this book, you may like…). For example, the Slope One algorithms outperform many user-to-user algorithms.
  • The social web started out with topic-oriented newsgroups and posting boards, but it is not dominated by user-centric blogs and social sites (such as Facebook or Twitter). Since then, we have realized that user-oriented blogs can be preferable.
  • While research papers are published in conferences and journals, I argue that we should turn this around and organize research papers by author through author-specific feeds.
  • Some AI researchers are suggesting that relations might be primary whereas attributes would be secondary.
Many of the best solutions are hybrids. For example, text search sometimes require full-text indexes such as suffix arrays and Oracle recently announced a row/column hybrid.
Take away message: If you are stuck, try to rotate your data model. If neither the vertical nor the horizontal model is a good fit, create an hybrid.

2 Comments

  1. I tried rotating my data model and ended up in dimension n….

    Comment by Stephen Downes — 4/9/2009 @ 10:10

  2. I agree; thinking about things vertically, rather than horizontally, can be quite interesting.

    Did some work on this idea, for information retrieval. Came up with a very simple, yet effective, approach, simply by thinking vertically:

    http://www.fxpal.com/publications/FXPAL-PR-08-467.pdf

    Comment by jeremy — 4/9/2009 @ 14:08

Sorry, the comment form is closed at this time.

« Blog's main page

Powered by WordPress