Disambiguate words using Wikipedia

A common problem in information retrieval is that words are ambiguous. That is a fancy way of saying that you cannot tell the meaning of a word when you take it out of context. Some people claim that this problem must be solved by using the Semantic Web. I have long argued that the Semantic Web is more of a solution in search of a problem.

We already have some good strategies for disambiguation, but I have wondered recently why we can’t use Wikipedia to disambiguate words. After all, Wikipedia knows the difference between Java (the island) and Java (the programming language). It turns out that Google has implemented and patented this very idea!

Bunescu, R. and Pasca, M., Using Encyclopedic Knowledge for Named Entity Disambiguation, EACL-06, 2006.
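To make the idea concrete, here is a minimal sketch of how encyclopedia text could pick a sense for "Java". It is not the method from the paper above, just a plain bag-of-words comparison: the context around the ambiguous word is compared, using cosine similarity, with a short description of each candidate article. The two descriptions, the stopword list, and the names `CANDIDATES`, `bag_of_words`, `cosine`, and `disambiguate` are all made up for illustration; a real system would work from the full article text.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the text of two Wikipedia articles.
# These short descriptions are invented for this example.
CANDIDATES = {
    "Java (island)":
        "island of Indonesia with volcanoes, rice fields and the capital Jakarta",
    "Java (programming language)":
        "object-oriented programming language with classes, a compiler "
        "and a virtual machine",
}

# A tiny, ad hoc stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "for", "i", "in", "of", "on", "the", "to", "we", "with"}


def bag_of_words(text):
    """Count the lowercase alphabetic words in text, skipping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, candidates=CANDIDATES):
    """Return the candidate title whose description best matches the context."""
    ctx = bag_of_words(context)
    return max(candidates, key=lambda title: cosine(ctx, bag_of_words(candidates[title])))


if __name__ == "__main__":
    print(disambiguate("I wrote a compiler for a new programming language in Java"))
    print(disambiguate("We flew to Jakarta and hiked volcanoes on Java"))
```

When no context word matches any description, every score is zero and the first candidate wins by default, so a real system would need a fallback such as picking the most commonly linked sense.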

See? Who needs RDF to disambiguate words?


Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

One thought on “Disambiguate words using Wikipedia”

  1. And even more recently:

    Cucerzan, S. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. EMNLP-CoNLL Joint Conference. Prague, 2007.

    (Silviu Cucerzan works for Microsoft)
