Tags are all the buzz these days. Tagyu is an interesting tool that claims to suggest tags based on the text content of a page. I’d like to see a description of its algorithm, but I can’t find one.
- http://www.daniel-lemire.com/ gets the tags “firefox” “web2.0”.
- http://www.daniel-lemire.com/en/ gets the tag “job”.
- http://www.daniel-lemire.com/fr/ gets the tags “france” and “uqam”.
The tags for my blog seem to make sense, but the tags for my home pages (French and English) are really bad. Tagging my French home page with “france”? Maybe because it is written in French? That is a bit of a stretch. Tagging my English home page with “job”? No, I don’t think so.
The problem is interesting and I bet there are solid solutions, but we are not there yet.
I also question whether collaborative tags have a future. I must admit I don’t use them, so I won’t comment much further, but the approach is a bit too empirical for my taste.
http://aida.homelinux.net/wordpress/ gets the tags: itunes, drm, linux, music. My homepage doesn’t mention any of these; I’m not impressed.
Tagyu doesn’t do a great job on home pages because they are “about” too many things. Blog home pages typically cover several subjects on a single page, which can confuse the tool.
Tagyu works by finding documents similar to your text and determining how they have been tagged by others. The tags suggested come from the tags on related documents.
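To make that description concrete, here is a minimal sketch of tag suggestion by nearest neighbours. It assumes a plain bag-of-words representation compared with cosine similarity, a hypothetical `tagged_corpus` of (text, tags) pairs, and similarity-weighted voting over the tags of the `k` closest documents. None of this comes from Tagyu, whose actual algorithm is not published. The names `suggest_tags`, `SAMPLE_CORPUS`, `k` and `max_tags` are illustrative only.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case word counts; keeps dotted names such as del.icio.us intact."""
    return Counter(re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_tags(text, tagged_corpus, k=3, max_tags=5):
    """Suggest tags for `text` from the tags of its k most similar documents.

    Hypothetical scheme: each neighbour votes for its own tags with a weight
    equal to its similarity to `text`; neighbours with zero similarity are ignored.
    """
    query = bag_of_words(text)
    scored = [(cosine(query, bag_of_words(doc)), tags) for doc, tags in tagged_corpus]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = Counter()
    for similarity, tags in neighbours:
        if similarity > 0:
            for tag in tags:
                votes[tag] += similarity
    return [tag for tag, _ in votes.most_common(max_tags)]


# Hypothetical, hand-made corpus standing in for "documents tagged by others".
SAMPLE_CORPUS = [
    ("firefox extensions and web browser tips", ["firefox", "browser"]),
    ("tagging and folksonomy on del.icio.us", ["tagging", "del.icio.us"]),
    ("job openings for professors at a university", ["job", "university"]),
]

if __name__ == "__main__":
    print(suggest_tags("a blog post about tagging tools and del.icio.us", SAMPLE_CORPUS, k=2))
```

The sketch also hints at why home pages fare poorly: a page covering many subjects shares a few words with many unrelated neighbours, so their tags end up in the vote alongside the relevant ones.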
If I send the text of this blog entry to Tagyu, then I get the following tags:
tools, tagging, blog, tags, del.icio.us
Hi Mr. Lemire,
I think part of the problem lies in the way some systems handle tags, not in tags themselves.
First, suppose we consider tags to be the main topics of a digital document (text, video, etc.). Then a few tags (between 2 and 8) will describe the meaning of the document’s content. They could be keywords present in the document or terms semantically related to it.
That said, systems that handle tags would need to check the meaning of, and the semantic links between, the tags used to describe a digital document in order to know what those tags really describe.
Regards,
Frédérick.