How close are AI systems to human-level intelligence? The Allen AI challenge.

With respect to artificial intelligence, some people are squarely in the “optimist” camp, believing that we are “nearly there” as far as producing human-level intelligence. Microsoft co-founder Paul Allen has been somewhat more prudent:

While we have learned a great deal about how to build individual AI systems that do seemingly intelligent things, our systems have always remained brittle—their performance boundaries are rigidly set by their internal assumptions and defining algorithms, they cannot generalize, and they frequently give nonsensical answers outside of their specific focus areas.

So Allen does not believe that we will see human-level artificial intelligence in this century. But he nevertheless generously created a foundation aiming to develop such human-level intelligence, the Allen Institute for Artificial Intelligence (AI2).

The Institute is led by Oren Etzioni, who obviously shares some of Allen’s “pessimistic” views. Etzioni has made it clear that he feels that the recent breakthroughs of Google’s DeepMind (i.e., beating the best human beings at Go) should not be exaggerated. As an example, Etzioni pointed to the fact that their research paper search engine (Semantic Scholar) can differentiate between the significant citations and the less significant ones. DeepMind’s engine works by looking at many, many examples and learning from them, which is possible because these examples are clearly and objectively classified (we know who wins and who loses a given game of Go). But there is no win/lose label on the content of research papers. In other words, human beings become intelligent in an unsupervised manner, often working from few examples and few objective labels.

To try to assess how far off we are from human-level intelligence, the Allen Institute launched a game where people had to design an artificial intelligence capable of passing 8th-grade science tests. They gave generous prizes to the best three teams. The questions touch various scientific domains:

  • How many chromosomes does the human body cell contain?
  • How could city administrators encourage energy conservation?
  • What do earthquakes tell scientists about the history of the planet?
  • Describe a relationship between the distance from Earth and a characteristic of a star.

So how far are we from human-level intelligence? The Institute published the results in a short paper.

Interestingly, all three top scores were very close (within 1%). The first prize went to Chaim Linhart who scored 59%. My congratulations to him!

How good is 59%? That’s the glass half-full, glass half-empty problem. Possibly, the researchers from the Allen Institute do not think it qualifies as human-level intelligence. I do not think that they set a threshold ahead of time. They don’t tell us how many human beings can’t manage to get even 59%. But I think that they now set the threshold at 80%. Is this because that’s what human-level intelligence represents?

All three winners expressed that it was clear that applying a deeper, semantic level of reasoning with scientific knowledge to the questions and answers would be the key to achieving scores of 80% and beyond, and to demonstrating what might be considered true artificial intelligence.

It is also unclear whether 59% represents the best an AI could do right now. We only know that the participants in the game organized by the Institute could not do better at this point. What score are the researchers from the Allen Institute able to get on their own game? I could not find this information.

What is interesting, however, is that, for the most part, the teams threw lots of data into a search engine and used information retrieval techniques combined with basic machine learning algorithms to solve the problem. If you are keeping track, this is reminiscent of how DeepMind managed to beat the best human player at Go: use good indexes over lots of data coupled with unsurprising machine learning algorithms. Researchers from the Allen Institute appear to think that this outlines our current limitations:

In the end, each of the winning models found the most benefit in information retrieval based methods. This is indicative of the state of AI technology in this area of research; we can’t ace an 8th grade science exam because we do not currently have AI systems capable of going beyond the surface text to a deeper understanding of the meaning underlying each question, and then successfully using reasoning to find the appropriate answer.
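To make “information retrieval based methods” concrete, here is a minimal sketch of how such a system might pick an answer to a multiple-choice question. It is not the winners’ code: the three-sentence `CORPUS`, the function names, and the simple IDF-weighted overlap score are all assumptions made for illustration, standing in for a real search engine over a large text collection.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; real entries indexed far larger text collections.
CORPUS = [
    "A human body cell contains 46 chromosomes.",
    "Earthquakes occur when tectonic plates shift along faults.",
    "Stars that are farther from Earth appear dimmer to an observer.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def idf_weights(docs):
    """Give rarer tokens a higher weight (a smoothed inverse document frequency)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log(1 + n / count) for tok, count in df.items()}

def score(query, doc, idf):
    """Sum the weights of the distinct query tokens that also appear in the document."""
    doc_tokens = set(tokenize(doc))
    return sum(idf[tok] for tok in set(tokenize(query)) if tok in doc_tokens)

def answer(question, choices, docs=CORPUS):
    """Pick the choice whose 'question + choice' query best matches some document."""
    idf = idf_weights(docs)

    def best_match(choice):
        query = question + " " + choice
        return max(score(query, doc, idf) for doc in docs)

    return max(choices, key=best_match)

if __name__ == "__main__":
    print(answer("How many chromosomes does the human body cell contain?",
                 ["23", "46", "92"]))
```

Nothing in this sketch understands the question. It only rewards surface overlap between the text of the question, a candidate answer, and a stored sentence, which is exactly the limitation the quote above describes.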

(The researchers from the Allen Institute invite us to go play with their own artificial intelligence called Aristo. So they do have a system capable of taking 8th-grade tests. Where are the scores?)

So, how close are we to human-level artificial intelligence? My problem with this question is that it assumes we have an objective metric. When you try to land human beings on the Moon, there is an objective way to assess your results. By their own admission, the Allen Institute researchers tell us that computers can probably already pass Alan Turing’s test, but they (rightfully) dismiss the Turing test as flawed. Reasonably enough, they propose passing 8th-grade science tests as a new metric. It does not seem far-fetched to me at all that people could, soon, build software that can ace 8th-grade science tests. Certainly, there is no need to wait until the end of this century. But if I build an artificial intelligence that can ace these tests, would they then say that I have cracked human-level artificial intelligence? I suspect that they would not.

And then there is a slightly embarrassing fact: we can already achieve super-human intelligence. Go back to 1975 but bring the Google search engine with you. Put it in a box with flashy lights. Most people would agree that the search engine is nothing less than a very advanced artificial intelligence. There would be no doubt.

Moreover, unlike human intelligence, Google’s intelligence is beyond our biology. There are billions of human brains… it makes no practical sense to limit computers to what brains can do when it is obviously more profitable to build machines that can do what brains cannot do. We do not ask for cars that walk like we do or for planes that fly like birds… why would we want computers that think like we do?

Given our limited knowledge, the whole question of assessing how close we are to human-level intelligence looks dangerously close to a philosophical question… and I mean this in a pejorative sense. I think that many “optimists” looking at the 59% score would say that we are very close to human-level intelligence. Others would say that they only got 59% by using a massive database. But we should have learned one thing: science, not philosophy, is the engine of progress and prosperity. Until we can make it precise, asking whether we can achieve human-level intelligence with software is an endlessly debatable question akin to asking how many angels fit in a spoon.

Still, I think we should celebrate the work done by the Allen Institute. Not because we care necessarily about mimicking human-level intelligence, but because software that can pass science tests is likely to serve as an inspiration for software that can read our biology textbooks, look at experimental data, and maybe help us find cures for cancer or Alzheimer’s. The great thing about an objective competition, like passing 8th-grade science tests, is that it cuts through the fog. There is no need for marketing material and press releases. You get the questions and your software answers them. It does well or it does not.

And what about the future? It looks bright:

In 2016, AI2 plans to launch a new, $1 million challenge, inviting the wider world to take the next big steps in AI (…)

8 thoughts on “How close are AI systems to human-level intelligence? The Allen AI challenge.”

  1. I dislike the phrase “human-level intelligence” because it implies the existence of a simple, objective intelligence metric (which I think Daniel would agree, does not exist).

    A modest proposal: What about the phrase “human-mimic intelligence”? I think this more accurately reflects the questions that sometimes get asked (“humans can do X; can AIs do X?”), and has a nice mildly pejorative ring to my ear.

    1. Right. So I think that “human-level intelligence” is a subjective and needlessly debatable term.

      That’s like saying… “we’ll agree that you have flying machines when they are indistinguishable from birds… otherwise you do not have bird-level flying machines”.

      What we actually want is superhuman intelligence… like how any smartphone can locate the nearest McDonald’s anywhere in the world in seconds and tell you exactly how to get there. There is nothing “human” about it but it is clearly “intelligence”.

      1. I would in turn argue that the focus on super-human intelligence is wrong. For one thing, there is no good way to define intelligence. For another thing, there is no good way to define what super-human is. A calculator is super-human (Oren Etzioni), so what? Human brains are surprisingly weak in some areas, but likewise are surprisingly strong in others.

        A focus on super-human “intelligence” is also wrong, because we should be solving real problems instead. This focus creates incentives to beat humans on some tests, but consequences of these wins are not clear. A calculator beats humans in math, Deep Blue beats humans in chess, IBM Watson beats humans in Jeopardy, and AlphaGo beats humans in Go. Translating these wins into real life applications turns out to be super complex.

        The reason for this is that, speaking in machine learning terms, we are overfitting to specific problems, specific data sets, etc. One should be careful not to do so. The real focus should be on real-world problems.

        1. A focus on super-human “intelligence” is also wrong, because we should be solving real problems instead.

          I agree with the second part of your statement.

          As for the first part, I agree with you that it needs care. When I use the term, I refer to technology that extends the capabilities of human beings… but, of course, all technologies do that in a way… starting with the hammer. Maybe I should be more careful with the term.

          This focus creates incentives to beat humans on some tests, but consequences of these wins are not clear.

          Regarding AlphaGo, it did show conclusively that deep learning is a powerful tool (for some problems). Regarding the current Allen Institute test, it did show the power of information retrieval. Basically, a finely tuned search engine can nearly pass (59%) science tests.

          Translating these wins into real life applications turns out to be super complex.

          Yes. Thankfully, we have tens of thousands of brilliant engineers on the job.

          The reason for this is that, speaking in machine learning terms, we are overfitting to specific problems, specific data sets, etc. One should be careful not to do so. The real focus should be on real-world problems.

          The ability of machines to specialize is not necessarily a fault. I like hammers, but I also use screwdrivers.

          Right now, if you have open-ended problems, you need human beings… but we have no shortages of human beings so that’s ok.

          I am sure we will get to a point where the same machine that learned to play Go can play tennis thanks to a robotic body… but I am not sure I care.

          Probably, the machine that maps the route I need to take to get to the dentist has little to do with the machine that tells me about the latest movies… but I don’t need all of these machines to work the same, to be based on the same principles.

          Nature was limited. It could not evolve a brain to make paintings, a brain to hunt, a brain to care for the young… it needed to integrate all functions into one machine. Moreover, this machine could not use too much energy and it had to be robust (with respect to injuries, diseases and so forth).

          We are not similarly limited.

          I would add that since we already have the brains we do, the last thing we need are machines that can replace us. Rather, we need specialized machines that can extend our reach.

          It is not that it would not be interesting to build general artificial intelligence… but I think it would be most interesting from a philosophical point of view.

  2. Fei-Fei Li of Stanford has said that supervised deep learning (using deep CNNs) is at its best somewhere about the threshold of the intelligence of a 3-year-old, as it relates to the large-scale visual recognition ImageNet challenge.

    Designing domain-specific AI tailored to pass 8th-grade science tests shouldn’t be as hard, considering that all three winners thought that “a deeper, semantic level of reasoning with scientific knowledge to the questions and answers would be key to achieving scores of 80% and beyond.”

    LSTMs (https://en.wikipedia.org/wiki/Long_short-term_memory) and Google’s n-gram model appear promising, but a fundamental problem often overlooked by researchers is that a high degree of semantic level of reasoning requires semantic unambiguity. English is an ambiguous language, and anyone who has tried using voice search (Alexa, Cortana, et al.) would attest to it.

    My guess is that machine learning would probably take a magic leap forward after a novel English-like unambiguous context-sensitive intermediate mapping language is invented. Here’s an interesting read on Quora: https://www.quora.com/What-is-the-reason-behind-saying-that-Sanskrit-is-the-most-suitable-language-for-programming

    1. supervised deep learning (using deep CNNs) is at its best somewhere about the threshold of the intelligence of a 3-year-old

      Deep learning was a key ingredient of AlphaGo, and AlphaGo appears to be far beyond the abilities of a 3-year-old child.

      a high degree of semantic level of reasoning requires semantic unambiguity (…) My guess is that machine learning would probably take a magic leap forward after a novel English-like unambiguous context-sensitive intermediate mapping language is invented.

      I would rather think that we are doing away with formal reasoning as the cornerstone of intelligence.

      Further reading:

      When bad ideas will not die: from classical AI to Linked Data
      http://lemire.me/blog/2014/12/02/when-bad-ideas-will-not-die-from-classical-ai-to-linked-data/

  3. I think the question of an “AI” is, at its core, more about quicker human thinking. If we can get a computer brain to do most of the thinking we do (except much faster), then we humans can focus on the things only a human mind can do. This way, in theory, we would be able to use this technology to advance faster than ever before.
    To illustrate, it’s like a person using Google search to apply information to a situation or task. If it were just the person, it would take longer for that person to think of or learn all of the necessary information and apply it to the situation or task. With Google search acting like a hive mind and information hub, the information is found and available in seconds. All information known to man is there now. Now this person can bring that information to the situation or task almost immediately with the aid of Google search. The task will be completed with higher efficiency than a lone person could manage. This, but on a higher technological scale, is what we need from an “AI”.

    P.S. Could quantum mechanics possibly be applied here? As in, once we advance quantum mechanical processing.

    1. Well, quite frankly, most people think of A.I. as something that ‘feels’; they think of robots like WALL-E or Chappie. But the real question is: how can we define whether a being is sentient if we ourselves do not fully understand what consciousness is? The real task at hand is creating something with a base amount of human input (the same as natural instincts for us Homo sapiens) and then allowing the program to write itself, just as our brains react independently to the data collected by our sensory organs.
