Many researchers advocate the use of metadata to help find or recommend content automatically. Metadata is certainly useful when aggregating content for human beings: I read the titles of research papers before reading the papers themselves. However, machines do better when they can access at least some of the content (Lin, 2009). Moreover, metadata is of little value in ranking answers (Hawking and Zobel, 2007).
I think that researchers cling to metadata because that is how we have indexed books for so long. When I was a kid, full-text search in a library was unthinkable. Yet, there is no escape: everything is miscellaneous. Folksonomies and ontologies will not save the day. When working with machines, let go of metadata and embrace the full content.
I am particularly puzzled by a common research approach. Take an object. Extract metadata. Then compare objects with one another using the metadata, or use the metadata for retrieval. I understand that this may constitute a useful form of dimensionality reduction. Yet, researchers frequently fail to check whether it is necessary to extract metadata at all.
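One way to run that check is to rank the same objects twice, once using only a metadata field and once using the full content, and see whether the results differ. The following is a minimal sketch under toy assumptions: the two documents, the `title` and `body` field names, and the helper names `bag_of_words`, `cosine` and `rank` are all hypothetical, and plain bag-of-words cosine similarity stands in for whatever similarity measure a real system would use.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs, field):
    """Rank document ids by similarity between the query and one field."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(d[field])), d["id"]) for d in docs]
    # Highest score first; ties are broken by id to keep the order stable.
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical toy collection: "title" plays the role of metadata,
# "body" plays the role of the full content.
docs = [
    {"id": "A", "title": "Notes on search",
     "body": "inverted index compression techniques"},
    {"id": "B", "title": "Index compression",
     "body": "a survey about cooking"},
]

query = "inverted index compression"
by_metadata = rank(query, docs, "title")
by_content = rank(query, docs, "body")
print("metadata only:", by_metadata)
print("full content: ", by_content)
```

When the two rankings agree on a realistic collection, the metadata may be a reasonable shortcut. When they disagree, as they do in this contrived case, the full content is telling you something the metadata is not.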
- David Hawking and Justin Zobel, Does topic metadata help with Web search? Journal of the American Society for Information Science 58 (5), 2007.
- Jimmy Lin, Is searching full text more effective than searching abstracts? BMC Bioinformatics 10:46, 2009.
Credit: Thanks to Andre Vellino for motivating this post.