Yahoo! to exploit more metadata

Long ago, search engines stopped using the metadata available in the header of HTML pages, because people would lie or enter misleading data by mistake. Many web sites still provide Dublin Core metadata as part of their HTML, but this data is known to be misleading, incomplete and wrong. There is no evidence that metadata can enhance search. Period.
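To make the trust problem concrete, here is a minimal sketch of how header metadata can be read, using only Python's standard `html.parser` module. The sample page, the `DublinCoreParser` class and the `extract_dublin_core` helper are hypothetical illustrations, not anything a search engine is known to use. The point is that the extraction step is trivial, while nothing in the header guarantees that its claims match the page.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Dublin Core elements are conventionally named "DC.something".
        if name.lower().startswith("dc.") and content is not None:
            self.metadata[name] = content


def extract_dublin_core(html):
    """Return the Dublin Core claims a page makes about itself."""
    parser = DublinCoreParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Hypothetical page: the header claims one thing, the body says another.
SAMPLE_PAGE = """
<html><head>
<title>Cheap watches</title>
<meta name="DC.title" content="Peer-reviewed physics research">
<meta name="DC.creator" content="A. Einstein">
</head><body>Buy cheap watches here!</body></html>
"""

claimed = extract_dublin_core(SAMPLE_PAGE)
print(claimed)
```

The parser happily reports whatever the page author typed, which is exactly why such self-declared metadata is hard to rely on without knowing who published it.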

Nevertheless, Yahoo! announced that it is going to enhance its search results with RDF metadata. They give LinkedIn as an example: apparently, LinkedIn pages are filled with metadata waiting to be exploited. Using this metadata is a great idea because LinkedIn can be trusted. Some other things would make sense, like GeoRSS. It would be great to know where some pages say they live.

Extracting metadata from one trusted web site is one thing. Exploiting the metadata out there is another.

A few things should be pointed out:

  • As far as I can tell, Yahoo! is not talking about using metadata to improve its result sets in general. That would fail. They merely want to better describe the links found and maybe provide specialized services. If I were them, I would go around and entice various important web sites (Amazon to begin with!) to provide more trusted metadata. They probably have been doing just that.
  • Besides some specific instances, I do not see how it will make their search engine better than Google. No matter what, the vast majority of web sites will contain no metadata, or wrong metadata.
  • There is no talk of non-trivial inference engines. Yahoo! still won’t be able to tell you whether G. W. Bush is a drunk or not.
  • Graduate students worldwide, stay calm. I could not find one occurrence of the word ontology in Yahoo!’s post. They are talking about RDF, not OWL. So you can stop describing the whole world in an RDF graph.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

One thought on “Yahoo! to exploit more metadata”

  1. Hello Daniel,

    If I can add to your post, I think we all have the intuition that metadata *can* be useful. If there’s no clear success so far, I like to believe that it is not because it is useless.

    I think metadata will not significantly improve Yahoo’s “ranking” function for Web pages. However, there’s clearly something to gain at the level of “query parsing”.

    Try typing “montreal restaurant” in Google. One of the first hits is a map. It means that Google understands that “montreal” is a location and “restaurant” is a business. It may look simple, but query parsing is a nightmare! Ambiguity is everywhere and there’s no context to leverage in a query. However, take a look at the series of patents by Google, Yahoo, Microsoft, etc. about local search. They give an idea of how they tackle the problem.

    In the case of this new initiative by Yahoo, I think the hope is to use similar techniques. And at some point, if someone types the query “Product manager in San Francisco”, Yahoo could put the LinkedIn profile of Amit Kumar in the hit list.
