Yuhong published her Working Notes in preparation for IJCAI, where our paper was accepted.

She had the great idea of inviting people to comment on her review of related papers:

I read more papers according to IJCAI reviews. I noted down my summary on these papers. I hope some reviewers or the authors of the reference papers can read my blog, so that we can have more discussion.

I have never seen anyone use a blog for this exact purpose before. Let’s see if it works!


Mathemagenic discusses research blogging; based on her experience, she found that it covers the following tasks:
  • publishing / dissemination / announcements (of papers, presentations, events by me and others)
  • research process
    • reflections
    • emotions
  • event blogging
    • notes
    • reflections
    • event planning (including travel planning)
  • paper blogging (notes on papers I read)
  • asking for help (explicit)
  • “enculturation” into research (reflection/learning on research culture, practices, tricks of the trade, etc.)
  • articulation
    • articulation of personal experiences (relevant for PhD)
    • articulation of problems/questions (may be implicit call for help, but often just thinking aloud)
  • writing-related (this is the difficult one)
    • drafting/testing pieces that are supposed to go into a paper
    • giving space to pieces that do not fit into a paper
  • reflections on methodology

I just became aware of the Attention.XML specification. Its authors motivate Attention.XML as follows:

  • How many sources of information must you keep up with?
  • Tired of clicking the same link from a dozen different blogs?
  • RSS readers collect updates, but with so many unread items, how do you know which to read first?
  • Attention.XML is designed to solve these problems and enable a whole new class of blog and feed related applications.

Technically, Attention.XML is about making available to others the posts and feeds you like and the ones you dislike.

Attention.XML is an XML file (specifically an XOXO file) that contains an outline of feeds/blogs, where each feed itself is an outline, and each post is also an outline under the feed. This hierarchical outline structure is then annotated with per-feed and per-post information, such as the last time the feed/post was accessed, the time spent on the feed/post, recent times of feed/post access, user-set (dis)approval of posts, etc.
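To make the outline structure described above more concrete, here is a rough Python sketch that serializes feeds and posts into an XOXO-style nested list: an `<ol class="xoxo">` whose items carry `<dl>` key/value annotations. The property names used in the example (`lastread`, `duration`, `vote`) are hypothetical placeholders, not field names taken from the Attention.XML specification, and the classes are my own illustration rather than any existing library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Post:
    title: str
    url: str
    # Hypothetical per-post attention data, e.g. {"duration": "120"}.
    attention: Dict[str, str] = field(default_factory=dict)


@dataclass
class Feed:
    title: str
    url: str
    # Hypothetical per-feed attention data, e.g. {"lastread": "..."}.
    attention: Dict[str, str] = field(default_factory=dict)
    posts: List[Post] = field(default_factory=list)


def _add_item(parent: ET.Element, title: str, url: str,
              attention: Dict[str, str]) -> ET.Element:
    # Each outline entry is an <li> holding a link plus a <dl> of annotations.
    li = ET.SubElement(parent, "li")
    link = ET.SubElement(li, "a", href=url)
    link.text = title
    if attention:
        dl = ET.SubElement(li, "dl")
        for key, value in attention.items():
            ET.SubElement(dl, "dt").text = key
            ET.SubElement(dl, "dd").text = value
    return li


def to_xoxo(feeds: List[Feed]) -> str:
    # Top-level outline of feeds; each feed's posts form a nested outline.
    root = ET.Element("ol", {"class": "xoxo"})
    for feed in feeds:
        feed_li = _add_item(root, feed.title, feed.url, feed.attention)
        if feed.posts:
            posts_ol = ET.SubElement(feed_li, "ol")
            for post in feed.posts:
                _add_item(posts_ol, post.title, post.url, post.attention)
    return ET.tostring(root, encoding="unicode")


feeds = [
    Feed("Example blog", "http://example.com/feed",
         {"lastread": "2005-04-01T10:00:00Z"},
         [Post("A post", "http://example.com/1",
               {"duration": "120", "vote": "approve"})]),
]
print(to_xoxo(feeds))
```

The nesting mirrors the feed/post hierarchy directly, so a collaborative filtering service would only need to walk two levels of lists to extract per-post signals such as approval votes.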

The idea is then to use collaborative filtering to find out what you may like.
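As a rough illustration of what such collaborative filtering could look like, here is a minimal sketch of a weighted Slope One predictor (the scheme from the SDM'05 paper cited in this post). It assumes ratings are stored as a plain dictionary mapping each user to their item ratings; the function names and toy data are hypothetical. For each pair of items, it computes the average rating difference among users who rated both, then predicts a missing rating from the user's existing ratings, weighting each item pair by how many users support it.

```python
from collections import defaultdict
from typing import Dict, Optional

# user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def compute_deviations(ratings: Ratings):
    # dev[j][i]: average of (rating of j - rating of i) over users who rated both.
    # count[j][i]: number of such users.
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, rj in user_ratings.items():
            for i, ri in user_ratings.items():
                if i != j:
                    diff_sum[j][i] += rj - ri
                    count[j][i] += 1
    dev = {j: {i: diff_sum[j][i] / count[j][i] for i in diff_sum[j]}
           for j in diff_sum}
    return dev, count


def predict(user_ratings: Dict[str, float], item: str,
            dev, count) -> Optional[float]:
    # Weighted Slope One: each rated item i contributes (dev[item][i] + r_i),
    # weighted by the number of users who rated both item and i.
    num, den = 0.0, 0
    for i, ri in user_ratings.items():
        if i == item:
            continue
        c = count.get(item, {}).get(i, 0)
        if c > 0:
            num += (dev[item][i] + ri) * c
            den += c
    return num / den if den else None  # None: no co-rated items to go on


ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, count = compute_deviations(ratings)
print(predict(ratings["lucy"], "A", dev, count))  # 13/3, about 4.33
```

Here the A/B deviation (0.5) is backed by two users while the A/C deviation (3.0) is backed by only one, so the B-based estimate gets twice the weight.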

This sounds like a great idea, except for what Dare Obasanjo points out:

The only cloud I see on the horizon is that if anyone figures out how to do this right, it is unlikely that it will be made available as an open pool of data. The ‘attention.xml’ for each user would be demographic data that would be worth its weight in gold to advertisers. If Bloglines could figure out my likes and dislikes right down to what blog posts I’d want to read, I find it hard to imagine why the Bloglines team would make that information available to anyone including the user. For comparison, it’s not like Amazon makes my ‘attention.xml’ for books and CDs available to myself or their competitors.

It seems to me that what we need is a legal solution. We need to require companies that use publicly available Attention.XML files to give back (à la GPL). For example, if you use my Attention.XML, then you need to make yours available. This way, companies like Bloglines would be forced to either use only internal data, or else make their data sets available when requested to do so.

Indeed, Attention.XML is very different from RSS. With RSS, you provide content that you want to be used… everyone wants more readers, so RSS is a winner. But Attention.XML provides my preferences, and why would I share them? What do I gain? Why would a company share my preferences if not for financial gain?

(For further reading on collaborative filtering, see Slope One Predictors for Online Rating-Based Collaborative Filtering [SDM'05] and Scale And Translation Invariant Collaborative Filtering Systems [Journal of Information Retrieval, 2005].)

Update: See PyLucene instead, which relies on Java Lucene.

Some crazy folks ported the famous search engine Lucene to Python, and the result is called Lupy!

Lupy is a full-text indexer and search engine written in Python. It is a port of Jakarta Lucene 1.2 to Python. Specifically, it reads and writes indexes in Lucene binary format. Like Lucene, it is sophisticated and scalable. Lucene is a polished and mature project and you are encouraged to read the documentation found at the Lucene home page.
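Lupy’s internals are not described here, but the core data structure behind full-text indexers such as Lucene is an inverted index, which maps each term to the documents that contain it. A toy Python version might look like this; the naive tokenizer and the AND-only queries are my own simplifying assumptions, and the class is not Lupy’s or Lucene’s API.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Naive tokenizer: lowercase, keep runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of ids of the documents containing that term
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        # Conjunctive (AND) query: intersect the posting sets of all terms.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Lucene is a full-text search engine")
index.add(2, "Lupy is a port of Lucene to Python")
print(sorted(index.search("lucene python")))  # [2]
```

Real engines like Lucene also store term frequencies and positions, rank the results, and persist the index in a compact binary format; none of that is attempted in this sketch.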

Who needs Java? No really, who needs Java?


According to the Toronto Sun, an American website breached the publication ban imposed by the Gomery Commission (see Wikipedia if you don’t know what this is about).

AN AMERICAN website has breached the publication ban protecting a Montreal ad exec’s explosive and damning testimony at the AdScam inquiry. The U.S. blogger raised the ire of the Gomery commission this weekend by publishing extracts from testimony given in secret by Jean Brault last Thursday.

It took me about 60 seconds to find and read the blog in question. I’m not going to help you find it, except to say that it is out there on the Web, and what is on the Web can usually be found easily.

Do publication bans even make sense in the Web era?

In this particular case, having this leak can prove very useful: what if the commission doesn’t lift the ban quickly? What’s the point of a censored public inquiry? I think that the most immediate consequence here is that you can’t keep the information from the public so easily. This is a good thing. Information is freedom.

However, it would have been better for the authors of the leak to keep quiet for a few weeks… maybe even a month or so. Individuals have a right to privacy and a fair trial. This is why there was a ban: so these people could go to trial without already being found guilty by association.

On the other hand, the judge should have made the inquiry private at this point. In the information age, you can’t have a secret public inquiry. The judge assumed that only the media can spread information quickly. That assumption is outdated: the blogosphere has far more bandwidth. And you can’t enforce a ban on the blogosphere. You just can’t.
