Toward the Commoditization of Natural Language Processing

In a remarkable paper, Peter Turney shows that, using a simple family of algorithms and freely available software, one can automatically determine analogies, synonyms, antonyms, and associations between words. Here is the beginning of the abstract:

Recognizing analogies, synonyms, antonyms, and associations appear to be four distinct tasks, requiring distinct NLP algorithms. In the past, the four tasks have been treated independently, using a wide variety of algorithms. These four semantic classes, however, are a tiny sample of the full range of semantic phenomena, and we cannot afford to create ad hoc algorithms for each semantic phenomenon; we need to seek a unified approach.

I do not work in Natural Language Processing (NLP) per se, but this sounds like commoditization to me in the sense that you no longer need to design, learn and tweak custom algorithms. If you have enough data, you can do NLP after learning one (remarkably simple) family of algorithms. Peter Norvig might approve.
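To make the idea of one approach for several tasks concrete, here is a minimal sketch. It is not Turney's algorithm, whose details are not given in this post. It assumes a made-up toy corpus, describes each word pair by the phrases that join the two words, and labels a new pair by its most similar labeled pair. The names `CORPUS`, `TRAINING`, `pair_features`, `cosine`, and `classify` are all hypothetical.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only (not real data).
CORPUS = [
    "hot is the opposite of cold",
    "big is the opposite of small",
    "fast is the opposite of slow",
    "large is another word for big",
    "quick is another word for fast",
    "tiny is another word for small",
]


def pair_features(x, y, corpus=CORPUS):
    """Count the phrases that appear between x and y in the corpus."""
    feats = Counter()
    for sentence in corpus:
        words = sentence.split()
        if x in words and y in words:
            i, j = words.index(x), words.index(y)
            if i < j:
                feats[" ".join(words[i + 1:j])] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(pair, labeled_pairs):
    """Return the label of the training pair most similar to `pair`."""
    target = pair_features(*pair)
    best = max(
        labeled_pairs,
        key=lambda item: cosine(target, pair_features(*item[0])),
    )
    return best[1]


# Hypothetical labeled examples: one per relation.
TRAINING = [
    (("hot", "cold"), "antonym"),
    (("large", "big"), "synonym"),
]

print(classify(("fast", "slow"), TRAINING))
print(classify(("quick", "fast"), TRAINING))
```

The same code handles both relations; only the labeled examples change. That is the sense in which the approach looks like a commodity: a new semantic task needs new training data, not a new algorithm.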

In the database research world, commoditization is already an accomplished fact. Database researchers have been wondering about their relevance for about ten years.

Peter might argue that in such a context, researchers should become bold and daring. Computer Science researchers should choose crazy problems.

Reference: Peter Turney, A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations, Coling 2008, August 2008.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “Toward the Commoditization of Natural Language Processing”

  1. Alas, this is still grunt work driven by the “competition imperative”, typical of what I criticised as “nifty promising results” in another domain (software), but the basic flaw is the same and was well stated 25 years ago by Marcel Schoppers:
    “If AI has made little obvious progress it may be because we are too busy
    trying to produce useful systems before we know how they should work.”

    As you say, Daniel: being sane, most researchers work on problems where it is plausible they can make some progress in a few months by working in small increments each day.
    But THIS is “the problem”, not the way to a solution, and it stems directly from the rules of publishing (irrespective of the “goodness” of peer reviewing) and from the need for a career.
    Early scientists, from Newton to maybe somewhere in the middle of the XIX century, didn’t have such pressing economic constraints and were able to speculate more freely on abstract questions, not in the El Naschie way of course, LOL (though… Newton delved into many kooky topics…).

  2. Daniel, thanks for your kind words. My algorithm is only a small increment, as Kevembuangga notes. I believe that science always proceeds by small increments. I give an informal description of the paper here.
