Searching the Blogosphere
Searching the blogosphere ought to be easier than ever. We have several dedicated tools:
Alas, I have found that the blogosphere is simply not searchable at the present time, and I suspect that most of my readers don’t use these search tools on a regular basis. Because the blogosphere is so dynamic, it is really hard to filter. Web sites and the web topology are static enough to give Google and its competitors a chance (though a slim one). Collaborative filtering may help you find interesting blogs, though I have yet to think of, or see, an algorithm that can do better than word of mouth. I used to think that filtering blog posts using regular expressions would lead to interesting results, but short of using a lot of bandwidth, it just doesn’t work. Fixed regular expressions are not dynamic enough, and they tend to gather a lot of low-quality content.
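To make the last point concrete, here is a minimal sketch of a fixed regular-expression filter over blog posts. The pattern and the three posts are made up for illustration; real posts would come from RSS feeds. The filter keeps anything that mentions a keyword, so it happily lets through an off-topic post that merely says "database".

```python
import re

# A fixed pattern: any post mentioning one of these phrases is kept.
INTERESTING = re.compile(r"\b(data mining|collaborative filtering|database)\b", re.IGNORECASE)

def filter_posts(posts):
    """Return the titles of posts whose title or body matches the fixed pattern."""
    return [title for title, body in posts if INTERESTING.search(title + " " + body)]

# Hypothetical (title, body) pairs standing in for feed entries.
posts = [
    ("Slope One and collaborative filtering", "A simple predictor for ratings."),
    ("My weekend", "Lost my database password again, lol."),
    ("Lower bounds for sorting", "A short proof in the comparison model."),
]

print(filter_posts(posts))
```

The second post passes the filter only because it contains the word "database". The pattern does not adapt to what the reader actually finds valuable.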
So? Well, this explains why I read so many Theoretical Computer Science blogs… even though I don’t really belong: I’m more of a database/data mining researcher. Besides the people I know well enough that they will always be on my list, I’m mostly migrating slowly from one blog to another, though I’ve pretty much reached a local maximum. I know there is more going on out there, but I cannot get to it.
Someone said that if you didn’t get into the RSS game by 2002, you were too late. I’m sorry, but I feel like the winning team hasn’t arrived yet. Anyone care to go for a start-up?
Montreal, Canada 
How about this:
1. A crawler scans a post/blog and uses the Yahoo Contextual API (sorry, don’t have the link on me now) to find buzzwords, aka tags.
2. Tags get registered in some nifty DB (I like MySQL) and are then used in collaborative filtering engines, such as Slope One.
That would solve the biggest problem of tagging: instead of self-defined tags, Yahoo would take care of that. So, if I typically write about development and then rant about web design in ONLY one post, Yahoo would maybe treat “design” as a lower-weighted tag compared to “ruby”, “rails” and “collaborativefiltering”.
Comment by Kunal Anand — 2/11/2005 @ 23:45
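The weighting idea in the comment above can be sketched by giving each tag a weight equal to the fraction of a blog's posts that carry it. This is only an assumption about how such weights might be computed, not how Yahoo's service works. The API itself is not called here: the per-post tag sets are hypothetical and stand in for whatever a tag extractor would return.

```python
from collections import Counter

def tag_weights(posts_tags):
    """Weight each tag by the fraction of a blog's posts that carry it.

    posts_tags: list of tag sets, one per post (assumed output of some tag extractor).
    """
    counts = Counter(tag for tags in posts_tags for tag in set(tags))
    n = len(posts_tags)
    return {tag: c / n for tag, c in counts.items()}

# Hypothetical blog: three development posts and a single rant about design.
posts_tags = [
    {"ruby", "rails"},
    {"ruby", "collaborativefiltering"},
    {"rails", "collaborativefiltering", "ruby"},
    {"design"},
]

weights = tag_weights(posts_tags)
# Sort by decreasing weight, breaking ties alphabetically for a stable display.
print(sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])))
```

Here "design" ends up with weight 0.25 while "ruby" gets 0.75. That matches the intuition that a one-off topic should count for less.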
Kunal: If I get you right, the Yahoo Contextual API looks in the current page for buzzwords. Hmmm… I don’t like a “business plan” that is based on some API by Yahoo…
But maybe you have something there!
Comment by Daniel Lemire — 3/11/2005 @ 9:54
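Step 2 of Kunal’s proposal hands the data to a collaborative filtering engine such as Slope One. Below is a minimal sketch of the weighted Slope One predictor. It assumes the data takes the form of reader-to-blog ratings; the reader and blog names are hypothetical. For each blog j that the reader rated, it averages how much other readers rated the target above j. It then combines those offsets, weighting each one by how many readers rated both blogs.

```python
from collections import defaultdict

def slope_one_predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`.

    ratings: dict mapping user -> dict mapping item -> numeric rating.
    Returns None when no other user rated both `target` and an item `user` rated.
    """
    diff_sum = defaultdict(float)  # sum of (rating[target] - rating[j]) over co-raters
    count = defaultdict(int)       # number of users who rated both target and j
    for other in ratings.values():
        if target not in other:
            continue
        for j, r_j in other.items():
            if j != target:
                diff_sum[j] += other[target] - r_j
                count[j] += 1
    num, den = 0.0, 0
    for j, r_uj in ratings[user].items():
        if j != target and count[j] > 0:
            deviation = diff_sum[j] / count[j]  # average offset of target over j
            num += (deviation + r_uj) * count[j]
            den += count[j]
    return num / den if den else None

# Hypothetical ratings of blogs by three readers.
ratings = {
    "alice": {"blogA": 5, "blogB": 3, "blogC": 2},
    "bob": {"blogA": 3, "blogB": 4},
    "carol": {"blogB": 2, "blogC": 5},
}

print(round(slope_one_predict(ratings, "carol", "blogA"), 2))
```

Carol never rated blogA. The prediction combines the offset of blogA over blogB (an average of 0.5 across two readers) with its offset over blogC (3, from one reader), which gives 13/3, roughly 4.33.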