Searching the blogosphere ought to be easier than ever. We have several dedicated tools:

Alas, I have found that the blogosphere is simply not searchable at the present time, and I suspect that most of my readers don’t use these search tools on a regular basis. Because the blogosphere is so dynamic, it is really hard to filter. Web sites and the web topology are static enough to give Google and its competitors a chance (though a slim one). Collaborative filtering may help you find interesting blogs, though I have yet to think of, or see, an algorithm that can do better than word of mouth. I used to think that filtering blog posts using regular expressions would lead to interesting results, but short of using a lot of bandwidth, it just doesn’t work. Fixed regular expressions are not dynamic enough and they tend to gather a lot of low-quality content.
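As a rough illustration of why a fixed pattern struggles, here is a minimal sketch in Python. The pattern and the three sample posts are invented for the example; they only show that a static regular expression can let keyword-stuffed junk through while missing a relevant post that uses different wording.

```python
import re

# A fixed pattern, chosen once and never updated (hypothetical example).
PATTERN = re.compile(r"\bcollaborative filtering\b", re.IGNORECASE)


def filter_posts(posts):
    """Keep only the posts whose text matches the fixed pattern."""
    return [post for post in posts if PATTERN.search(post)]


# Invented sample posts, not real data.
posts = [
    "A survey of collaborative filtering algorithms such as slope one",
    "BUY CHEAP WATCHES!!! collaborative filtering collaborative filtering",
    "Recommender systems: predicting ratings from similar users",
]

matches = filter_posts(posts)
for post in matches:
    print(post)
```

The second post is spam yet matches, and the third is on topic yet is dropped, because the pattern knows nothing about quality or about new vocabulary.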

So? Well, this explains why I read so many Theoretical Computer Science blogs… even though I don’t really belong: I’m more of a database/data mining researcher. Besides the people I know well enough that they will always belong to my list, I’m mostly slowly migrating from one blog to another, though I’ve pretty much reached a local maximum. I know there is more going on out there, but I cannot get to it.

Someone said that if you didn’t get into the RSS game by 2002, you were too late. I’m sorry, but I feel like the winning team hasn’t arrived yet. Anyone care to go for a start-up?

2 Comments »

  1. How about this:

    1. A crawler scans a post/blog and uses the Yahoo Contextual API (sorry don’t have the link on me now) to find buzz words – aka tags.

    2. Tags get registered in some nifty DB, I like MySQL :) , and then used in collaborative filtering engines, such as slope one.

    That would solve the biggest problem of tagging – instead of self-defined tags, Yahoo would take care of that. So, if I typically write about development and then ranted about web design in ONLY one post, Yahoo would maybe treat “design” as a lighter weighted tag compared to “ruby” “rails” “collaborativefiltering”.

    Comment by Kunal Anand — 2/11/2005 @ 23:45

  2. Kunal: If I get you right, the Yahoo Contextual API looks in the current page for buzzwords. Hmmm… I don’t like a “business plan” which is based on some API by Yahoo… ;-) But maybe you have something there!

    Comment by Daniel Lemire — 3/11/2005 @ 9:54
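
The second step proposed in the first comment (tags fed into a collaborative filtering scheme such as slope one) could be sketched roughly as follows. This is only a sketch under stated assumptions: the tags are assumed to have been extracted already (no Yahoo API is called), the profile names, tags and weights are made up, and the function name `slope_one_predict` is hypothetical. It uses the weighted variant of slope one, where each average difference counts in proportion to the number of profiles supporting it.

```python
def slope_one_predict(ratings, user, target):
    """Predict `user`'s weight for the tag `target` with weighted Slope One.

    `ratings` maps a profile name to a dict of {tag: weight}.
    Returns None when no other profile shares a tag with `user`
    and also has a weight for `target`.
    """
    numerator = 0.0
    denominator = 0
    for other_tag, user_value in ratings[user].items():
        if other_tag == target:
            continue
        # Differences (target - other_tag) over profiles that have both tags.
        diffs = [
            profile[target] - profile[other_tag]
            for profile in ratings.values()
            if target in profile and other_tag in profile
        ]
        if not diffs:
            continue
        average_diff = sum(diffs) / len(diffs)
        # Weight each estimate by the number of supporting profiles.
        numerator += (average_diff + user_value) * len(diffs)
        denominator += len(diffs)
    if denominator == 0:
        return None
    return numerator / denominator


# Made-up tag weights per profile (assumption: higher means more interest).
ratings = {
    "alice": {"ruby": 5.0, "rails": 4.0, "design": 1.0},
    "bob": {"ruby": 4.0, "rails": 3.0},
    "carol": {"ruby": 2.0, "design": 3.0},
}

prediction = slope_one_predict(ratings, "bob", "design")
print(prediction)
```

Here the prediction for "bob" on "design" combines two estimates: one from "ruby" (supported by two profiles) and one from "rails" (supported by one), so the better-supported difference weighs more.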
