Tagging as a new information retrieval paradigm

The Sydney Morning Herald is reporting that tagging is popular. Tags are a Web 2.0 feature popularized by the Canadian Web site Flickr (possibly the largest and most popular multimedia database ever built, before YouTube came along). Essentially, tags allow us to replace semantically rigid taxonomies with more workable labels. These labels may be part of a hierarchy ("java" would be part of "programming"), but these hierarchies are typically not written down anywhere. When using tags, a given object will typically receive many tags, unlike in taxonomies, where we often try to have mutually exclusive categories. I like to think of tags as multidimensional, collaborative, and personalized taxonomies.
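To make the contrast concrete, here is a minimal sketch in Python of a taxonomy versus a tagging system. All the names in it (the users, the objects, the tags, and the helpers `build_tag_index` and `find`) are made up for illustration; it is not how Flickr or any other site actually stores its tags.

```python
from collections import defaultdict

# Taxonomy: each object sits in exactly one (mutually exclusive) category.
taxonomy = {
    "post-about-jvm-tuning": "programming/java",
    "post-about-photo-sharing": "web",
}

# Tags: each object can carry many labels, and each user applies their own.
# Maps (user, object) -> set of tags. All names are hypothetical.
taggings = {
    ("alice", "post-about-jvm-tuning"): {"java", "performance"},
    ("bob", "post-about-jvm-tuning"): {"java", "toread"},
    ("alice", "post-about-photo-sharing"): {"flickr", "web2.0"},
}


def build_tag_index(taggings):
    """Invert the taggings into tag -> set of objects, for retrieval by tag."""
    index = defaultdict(set)
    for (_user, obj), tags in taggings.items():
        for tag in tags:
            index[tag].add(obj)
    return index


def find(index, *tags):
    """Return the objects that carry all of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


index = build_tag_index(taggings)
print(sorted(find(index, "java")))
print(sorted(find(index, "java", "toread")))
```

The taxonomy gives each object a single place to live, whereas the tag index lets the same object be reached from several directions, and from labels contributed by different people. The implicit hierarchy ("java" under "programming") appears nowhere in the tag data, which is the point made above.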

According to a recent survey, 28 percent of internet users have tagged content, and 7 percent have done so on a typical day. The number is perhaps not surprising if you consider that YouTube, Flickr, del.icio.us, and so on, are very popular sites.

What I want to know is how many people actually use these tags for information retrieval.

Well, I realized recently that Google Mail is tag-based (its “labels” are nothing but tags!). So all heavy Google Mail users are tag users. However, I use maybe only 10 labels; most of the others are never used, and recently Google took away all the tags I had not used in a long time (which was a relief!). I have used tags more actively while preparing a conference, to categorize the various emails (paperunderreview, answerfromreviewer, and so on), but again, there were very few tags.

Also, whereas the posts on my blog used to belong to a taxonomy, increasingly, things are “degrading” (gracefully) into a tagging system (the hierarchies are getting less useful over time). However, I mostly use the tagging to help my readers find other posts they might like. I never actually use the tags to find my way around my blog. I have no idea whether readers actually use these tags. I doubt it.

Slashdot started using tags recently, but I am not sure whether they will prove useful. I certainly never use them.

I still think that tags go in the right direction, generally speaking. I am not sure we have the right recipe yet, but I could be wrong.

You can already find a tag cloud on my site, but it uses automatically parsed content. Look for it if you are interested. No, it does not appear on the front page. In fact, I find that most sites do not make a bird’s-eye view of their tags easy to reach. You typically have to look for it. If tags were used often, wouldn’t they be easier to find?

What about you? Do you use tags? Do you find them useful?

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

One thought on “Tagging as a new information retrieval paradigm”

  1. I use tags in combination with a hierarchical organization system. For example, my repository of research papers exists in a single folder, with the name of each file being the paper's title. I find setting the name to the title is useful for the times that I search via browsing.

    Beyond that single folder, I don’t organize them; I use tags. The tags aren’t meant so much as an organizational tool as a useful “search hook”. If I need a document, I almost always search for it via Spotlight. Any tags I assign beyond the text indexing that Spotlight does are for “meta” terms that may not appear in the paper itself.

    Music, photos and video clips are all managed via iTunes, where I make extensive use of the metadata. Importing can be time-consuming, but it’s worth it later on. I tried the hierarchical approach and I didn’t care for it, mainly because I was too pedantic about making things fit together “just right”. I resigned myself to the “lump it all in one place” approach and I find I think about it a lot less.

    I don’t think tags are a replacement for a folder system, but together they can be very useful.
