Artificial intelligence is mostly a matter of engineering?

Unless you live under a rock, you know by now that AlphaGo, an artificial intelligence, has beaten a world champion at the game of Go. After Tic Tac Toe, Checkers and Chess, Go was the last conventional board game at which computers could not beat the best human players. We have made history!

John Langford reminds us that defeating human beings at Go is not the same thing as achieving human-level intelligence in general. The AlphaGo software could possibly be used to play a good game of Chess, but there are other, simpler games, easy for human beings to master, that AlphaGo-like software would find difficult.

Real progress often looks mundane in retrospect. For example, we have improved our health tremendously in the last few decades by smoking less. That does not look like a technological breakthrough, but to the scientists who had to build the case against smoking using science, I am sure it feels like a victory for science.

So how do we assess AlphaGo? Is it real progress or just hype?

Let us put aside the fact that there is clearly hype involved. It is a publicity stunt for Google. Starting today, they will have a much easier time recruiting the best researchers. The value of their brand went up.

But is there any meat?

To put things in perspective, let us recall what the New York Times was telling us in 1997:

When or if a computer defeats a human Go champion, it will be a sign that artificial intelligence is truly beginning to become as good as the real thing. (NYT 1997)

But how did AlphaGo succeed?

AlphaGo uses a lot of hardware. According to their Nature paper:

Evaluating policy and value networks requires several orders of magnitude more computation than traditional search heuristics. To efficiently combine MCTS (Monte Carlo Tree Search) with deep neural networks, AlphaGo uses an asynchronous multi-threaded search that executes simulations on CPUs, and computes policy and value networks in parallel on GPUs. The final version of AlphaGo used 40 search threads, 48 CPUs, and 8 GPUs. We also implemented a distributed version of AlphaGo that exploited multiple machines, 40 search threads, 1,202 CPUs and 176 GPUs. The Methods section provides full details of asynchronous and distributed MCTS.
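The combination the paper describes, tree search guided by a learned policy prior and a learned value estimate, can be sketched in miniature. This is a toy illustration, not DeepMind's code: the `Node` class, the `policy` and `value` stand-ins, and the constants below are all made up for the example.

```python
import math

# Toy sketch of MCTS guided by policy/value functions, in the spirit
# of the AlphaGo paper. The `policy` prior and the `value` estimate
# are hypothetical stand-ins, not DeepMind's networks.

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a): prior from the policy network
        self.visits = 0         # N(s, a): visit count
        self.value_sum = 0.0    # W(s, a): accumulated value
        self.children = {}      # action -> Node

    def q(self):
        # Q(s, a): mean value of this action so far
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.0):
    # PUCT-style selection: Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a)),
    # balancing exploitation (Q) against prior-guided exploration.
    total = sum(child.visits for child in node.children.values())
    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=score)

def best_move(legal_actions, policy, value, n_simulations=100):
    # Expand the root with policy priors, run simulations that back up
    # value estimates, then play the most-visited move (as AlphaGo does).
    root = Node(prior=1.0)
    for action in legal_actions:
        root.children[action] = Node(prior=policy(action))
    for _ in range(n_simulations):
        action, child = select_child(root)
        child.visits += 1
        child.value_sum += value(action)  # stand-in for a value-network call
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

With a prior that favors one move and a value function that confirms it, `best_move` concentrates its simulations on that move. The real system, of course, backs values up through a deep tree and batches the network evaluations on GPUs, which is exactly the engineering problem the quoted passage is about.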

Miles Brundage wrote a critical analysis of the more recent incarnation of AlphaGo:

AlphaGo (…) used 280 GPUs and 1920 CPUs. This is significantly more computational power than any prior reported Go program used, and a lot of hardware in absolute terms.

Deep Blue, the system that defeated Kasparov, had 11 GFLOPS whereas a modern iPhone has close to 200 GFLOPS. A single GPU today can deliver about 7000 GFLOPS. So AlphaGo has computing capabilities that are maybe hundreds of thousands of times what Deep Blue had.
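The arithmetic behind that estimate is easy to check, using only the round numbers quoted above (176 GPUs for the distributed version, roughly 7000 GFLOPS per GPU, 11 GFLOPS for Deep Blue, and ignoring the 1,202 CPUs entirely):

```python
# Back-of-the-envelope check of the "hundreds of thousands of times" claim.
deep_blue_gflops = 11      # Deep Blue, 1997
gpu_gflops = 7_000         # one modern GPU, roughly
alphago_gpus = 176         # distributed AlphaGo, per the Nature paper

ratio = alphago_gpus * gpu_gflops / deep_blue_gflops
print(f"AlphaGo / Deep Blue ≈ {ratio:,.0f}x")  # → AlphaGo / Deep Blue ≈ 112,000x
```

So the claim holds at the order-of-magnitude level, even before counting the CPUs or the hardware used for training.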

This is aside from all the hardware used to train the software. Don’t expect a JavaScript version of AlphaGo to run in your browser and to do well any time soon. AlphaGo runs on powerful hardware and makes full use of it.

Simply put, we should not be surprised if we get qualitatively different results when we throw hundreds of times more computing power at a problem.

If “all” it takes to build superhuman intelligences is more hardware… and the ability to use it… then it is good news. Though hundreds of GPUs and thousands of CPUs is a lot, Google, Amazon, Microsoft, the NSA, Apple… could all throw a lot more power at the problem.

Does it make sense that more computation power, coupled with relatively simple algorithms, could be the path to superhuman intelligence? Naively, I’d ask why not?

It looks like all we need is the right hardware, a regular team of great scientists and engineers, and a few short years. No need for great unpredictable conceptual breakthroughs. No need for thousands of people.

I predict that if researchers today could look at the software of our personal assistants in the future, they would be disappointed. “That’s all it does? We knew about most of these techniques in 2016…”

Update: At its annual conference in May 2016, Google announced that AlphaGo has been running on custom hardware, the Tensor Processing Unit, which is reportedly 10 times faster than anything else on a per-Watt basis.

37 thoughts on “Artificial intelligence is mostly a matter of engineering?”

    1. @Evan

      There are a few issues to consider.

      1. When Hassabis says “one machine”, he does not mean “one processor”. He probably refers to a machine worth the price of dozens of commodity PCs, at least.

      2. How much hardware was used during the training phase?

      1. What is even more astounding is the team of 100+ PhDs (5x the size of the IBM Watson team) that worked on this highly specialized problem. If deep learning is so easy, why would you need so many people?

        1. As someone working at a genomics institute, that doesn’t surprise me at all. The real world is messy, and 100+ PhDs isn’t that much. When you do something for the first time, it may involve a lot of routine work, which still requires deeper understanding of the problem.

              1. Doesn’t this conflict with your usual “no genius” stance?

                Technology acts as a multiplier. My favorite example is Greg Linden. He implemented Amazon’s recommender system (“If you like this book, you might like…”). Greg is in the top 1% of engineers… very smart… but he does not have magical powers. What he implemented is relatively simple, and it was probably challenging mostly because of the scale involved at Amazon.

                Now, what happened? Well. It changed the world. We all know about this feature. It has changed how we think of e-Commerce.

                So yes, I would say that there is strong evidence that a few smart people can use software to make a huge difference in the world… but they don’t need to be Einstein-like geniuses. Hard work, a solid education and lots of focus are probably all that is needed.

            1. Machine learning tends to be more about the data than the software. Instead of having a well-defined problem to solve, you start with only a general idea of what you’re trying to achieve. You then explore various interpretations of the data and different approaches to solving the problem, tweaking the solutions until you have something good enough. To me, that looks more like a biological problem than a typical computer science problem.

          1. I counted 21 on this paper, not counting the founder! You don’t take into account the prior art either (e.g., the mentioned Monte Carlo tree search). Actually, the same was true for Deep Blue and IBM Watson (lots of prior art). Hundreds of PhDs making advances over dozens of years for a highly specialized AI task. But, yeah, what if AI is just a matter of computational power? Well, this may be true, but there is no evidence for this yet.

            1. I counted 21 on this paper, not counting the founder!

              I count 20 authors but that’s the whole team. There is then a division of labor (described in the manuscript). Some wrote the search routines, some wrote the deep learning software, some set up the testing framework, some worked on the papers…

              You don’t take into account the prior art either (e.g., the mentioned Monte Carlo tree search).

              According to Wikipedia, the Monte Carlo method goes back to the 1940s… https://en.wikipedia.org/wiki/Monte_Carlo_tree_search#History

              They also used deep learning, which dates back to the 1980s…
              https://en.wikipedia.org/wiki/Deep_learning

              Of course, they use the modern version that relies on GPU computing to be practical… but by now, it is not new.

              By their own accounts, they have used well known techniques coupled with superb engineering and good hardware.

              Hundreds of PhDs making advances over dozens of years for a highly specialized AI task.

              I think that’s precisely what did not happen here. Here is what happened. A small team (~20 people) worked from scratch over two years… using mostly standard machine-learning expertise, plus lots of expensive hardware, plus some of their own domain knowledge… and they cracked the problem.

              I think you are selling DeepMind short here. It is not a company that set out to solve Go. They want to solve AI. All of it.

              To solve AI, all of it, you can’t solve every specialized problem using specific solutions and hundreds of PhDs. You need generic tools that are widely applicable.

              They clearly mean to solve various other games, health problems and so forth… using very similar techniques.

              But, yeah, what if AI is just a matter of computational power? Well, this may be true, but there is no evidence for this yet.

              My concluding statement is: If “all” it takes to build superhuman intelligences is more hardware… and the ability to use it… then it is good news.

              You know me well enough, Leo, to know that I know that it is hard to make good use of computing resources. Using thousands of CPUs and hundreds of GPUs, and using them well, is hard.

              These 20 people are no doubt amazing people.

              But the point still stands: a small team (20 is not large) with the right tools is all you need.

              This should not surprise us. It did not take 2000 engineers to invent the GPS, the transistor, the plane, the modern engine, the car, the computer… Once you have the right tools, enough resources and political support, inventions fall into place.

              It appears that AI is following that pattern. When we barely had the computers to solve Chess, it happened. We now barely have the computing resources to solve Go, and it happened. And so forth.

              Let me qualify. I don’t mean that “I” could have solved Go the way they did given the computing resources. But there are many human beings, lots of smart people… given the possibilities, some team, somewhere, is bound to get it done… as long as we encourage it.

              To be clearer, had the DeepMind team been killed, Facebook would probably have solved Go a year or two later.

              I think this is precisely why we should worry about the end of Moore’s law. If our computing performance stalls, it is possible that this kind of progress would stall as well. We need to push forward.

              In other words, performance and engineering matter, a great deal.

              1. I am not selling anybody or anything short. I am just reminding you what John Langford reminded us: good performance in Go crucially depends on the effectiveness of the Monte Carlo tree search. This was found by trial and error over a period of many years. It was verified only recently, but before AlphaGo was created.

                For one thing, one should give credit to people who invented the algorithm AND demonstrated its effectiveness in Go.

                For another thing, it has nothing to do with hardware (though having more hardware obviously helps here).

                This is just what John Langford said.

                Another problem with your post is that it reads like: being clever doesn’t matter, we can just have more hardware. I have to disagree here again because, clearly, all the widely publicized milestones (Chess, Jeopardy, Go) were about being clever: call it engineering or whatever.

                In fact, when you say that AlphaGo is just the result of engineering, it is you who are selling them short. I am pretty sure it was a lot of hard-core research, not just engineering.

                1. (…) good performance in Go crucially depends on the effectiveness of the Monte Carlo tree search. This was found by trial and error over a period of many years. It was verified only recently, but before AlphaGo was created.

                  Yes. But I think nobody had combined MCTS and deep learning. At least, not in the way AlphaGo did it. As far as I can tell, it was a non-obvious, though not entirely counter-intuitive, insight.

                  For another thing, it has nothing to do with hardware (though having more hardware obviously helps here). This is just what John Langford said.

                  I am not sure what in Langford’s article you refer to.

                  Deep Blue, the system that defeated Kasparov, had 11 GFLOPS whereas a modern iPhone has close to 200 GFLOPS. A single GPU today can deliver about 7000 GFLOPS. So AlphaGo has computing capabilities that are maybe hundreds of thousands of times what Deep Blue had.

                  Is your contention that the Deep Blue team could have defeated the best Go players had they been cleverer using only 11 GFLOPS?

                  Another problem with your post is that it reads like: being clever doesn’t matter, we can just have more hardware. I have to disagree here again because, clearly, all the widely publicized milestones (Chess, Jeopardy, Go) were about being clever: call it engineering or whatever.

                  The DeepMind people were clever. The people behind Watson and Deep Blue were clever. No doubt about that. But they couldn’t have done what they did with ENIAC.

                  Watson became possible at the point where having tens of thousands of GFLOPS no longer risked bankrupting IBM. Not before.

                  I am pretty sure it was a lot of hard core research, not just engineering.

                  If by hard-core you mean “academic publication focused research” then the answer is no.

                  Here are the DBLP pages of the founders of DeepMind:

                  http://dblp.uni-trier.de/pers/hd/l/Legg:Shane

                  http://dblp.uni-trier.de/pers/hd/h/Hassabis:Demis

                  http://dblp.uni-trier.de/pers/hd/s/Suleyman:Mustafa

                  1. My reference to Langford got mixed up. I didn’t mean to claim he said anything about hardware.

                    Hardware is a necessary condition, but it is not a sufficient one. For example, IBM Watson worked on a single computer, but it took too long to answer.

                    If by hard-core you mean “academic publication focused research” then the answer is no.

                    I am not sure hard-core research and academic publication are always the same thing. Often they are, but not always.

                    1. Hardware is a necessary condition, but it is not a sufficient one. For example, IBM Watson worked on a single computer, but it took too long to answer.

                      You can get AlphaGo to run on a Raspberry Pi or a 386. I mean, it is all about Turing machines, right?

                      But hardware matters a great deal.

                      You know how it is, Leo. When you are programming, you need to test your ideas out. If it takes forever to test the simplest idea, progress is going to be slow.

                      If your progress is too slow, the project will die. Either you will get discouraged, or people will stop funding you or… something more interesting will come along.

                      The fact is, it is really hard to be “ahead of your time”.

                      This is just an extension of “the medium is the message”. In theory, the Internet is nothing new. You write text, other people read it. So there is nothing, on the surface, that we can do with the Internet that we couldn’t do before. I mean, we could be exchanging letters right now.

                      The fact that things get easier means that we can start working on new problems that were too hard before.

                      Of course, it is not just the hardware. You need to have the engineering talent to use it. You need the resources, the encouragement, the courage, the focus and so forth. But the starting point is access to good tools.

              2. While using “commodity vector” processors (GPUs) and Intel CPUs is helpful I would argue that the chief challenge for AI generally is that the hardware platform is insufficiently plastic. A key facet of the brain is that it changes over time in structure.

                Thankfully Intel has invested in Altera which could result in a general processing environment which is both fixed (IA) and plastic (FPGA). It is going to take some work to make FPGAs be more general programming friendly and the notion of computing in space (FPGAs) vs. computing in time (CPUs) is something that will have to be contemplated and worked on for some time.

                With a more plastic processing model I suspect that we’d see even more emergent properties of an “AI” game-master and perhaps even the ability to allow something the scale of a Raspberry Pi to be a Go or Chess champion.

                1. As far as I can tell, FPGAs are not much faster than GPUs and CPUs, but they can be ten times more energy efficient. At scale, this can matter a great deal, I suppose.

                  I do not know whether they have other benefits.

          2. PS: I am not implying that this wasn’t a great feat, not even close. But claims like “intelligence is solved now” and “AI is merely more computing power”… Excuse me, but I don’t see these claims substantiated.

            1. intelligence is solved now

              No. Intelligence isn’t solved. We can’t even do as well as a bee using computer vision.

              What I am suggesting is that we may have lots of what we need to build drones that are just as smart as bees. We may not need so many conceptual breakthroughs. We may just need more focus on good engineering.

        2. Great point! Deep learning is not easier; it just gives you larger and more powerful building blocks. But the number of engineers involved does not seem to decrease.

          1. @Benoit

            AlphaGo was built by 20 engineers over 2 years.

            There are many open source projects, including some I have been involved with, that have multiple times this number of engineers.

      2. Agreed! Certainly the single machine is still expensive and beefy, but we also know how much harder it is to scale vertically vs. horizontally. I just thought it was interesting that the utility of adding more hardware diminished that quickly, especially considering that as Google they likely have access to as much hardware as they possibly want.

        1. I think that it might be an unfair comparison. It is probable that all instances of AlphaGo benefit from some of the same training. So it is not like one team worked with a single machine all along while another team had all of Google’s resources… All instances benefited from tremendous computational resources.

  1. There was a conceptual breakthrough, namely Monte Carlo Tree Search in 2006. Without it, you could be using the entire Google data center and still not make a dent in the problem.

      1. Monte Carlo is indeed much older, but Rémi Coulom is the one who made it work for tree search and for Go in 2006. I believe this is widely acknowledged, and that he deserved to be on stage in Seoul with the DeepMind team today.
        Back to your point about general intelligence, I agree with you that it may be mostly a matter of having enough computational resources, but I also believe that some conceptual breakthroughs are still needed. Maybe not “100 Nobel prizes away” as some put it, but still a few at least.

        1. Given that all of the current AI is a bag of tricks that work in specific and very limited environments, we may be even 1000 Nobel prizes away (in terms of the number of breakthroughs, not years). Likely, we will be slowly gaining speech, vision, and language processing capabilities in a very incremental way, by inventing one new trick after another.

          But it is impossible to know for sure because we know almost nothing about the brain. With new evidence comes the understanding that brains are much more complicated than was previously thought. One recent example, but there are many more: http://news.discovery.com/human/life/memory-10-times-more-massive-than-thought-160121.htm
          I wouldn’t be surprised if, in 10 years, we discover the brain can hold 10 times what we think it can remember today.

          1. Given that all of the current AI is a bag of tricks that work in specific and very limited environments, (…)

            I think I could program an application that looks at pictures and yells “butterfly” when it sees a butterfly in, probably, less than an hour, using nothing but JavaScript and publicly available APIs and libraries. And my application would work well too. It could even yell “butterfly” in the language of your choice.

            Let me add to this. I think that a smart high school student could build the same application in about the same time, having learned to program a bit beforehand.

            Of course, if I need to recognize the flowers from my garden by name, that’s a bit more difficult. I don’t know how to do that, but give me a team of 20 great engineers and two years… and I bet I can write an application that can satisfy paying customers.

            Likely, we will be slowly gaining speech, vision, and language processing capabilities in a very incremental way, by inventing new and new tricks.

            I think that what you call “tricks”, I might call “engineering”.

            I am willing to bet with you that within 10 years, we will have human-level speech recognition. And that almost all of it will be achieved using techniques that are known today.

      2. To be fair, the credit assignment for the contributions is a bit more complex. Some steps I am personally aware of:
        1. UCB algorithm (bandits = finite stochastic optimization) (Finite-time Analysis of the Multiarmed Bandit Problem, Auer, Cesa-Bianchi, Fischer, 2002).
        2. UCT algorithm as an extension of UCB (generalization to game-tree search) (Bandit based Monte-Carlo Planning, Kocsis, Szepesvari, 2006).
        3. UCT successfully applied to Go (Rémi Coulom).

        I think each step is a non-trivial / non-obvious extension of the previous one and thus deserves recognition.

        Note also that the first paper is a theory paper with a non-intuitive, yet simple, algorithm. It has had impact in practice (you can actually find its key idea almost unchanged in the AlphaGo Nature paper) by making people aware that simple explore-exploit techniques (like constant/local exploration) are sometimes bound to fail.
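For readers who have not seen it, the UCB1 rule from that first paper fits in a few lines. A minimal sketch with a made-up two-armed Bernoulli bandit (the arm means, round count, and seed are arbitrary choices for the example):

```python
import math
import random

def ucb1(rewards, pulls, t):
    # UCB1 (Auer, Cesa-Bianchi, Fischer, 2002): pick the arm maximizing
    #   empirical mean + sqrt(2 * ln(t) / n),
    # trading off exploitation (first term) against exploration (second).
    def score(arm):
        n = pulls[arm]
        if n == 0:
            return float("inf")  # pull every arm at least once
        return rewards[arm] / n + math.sqrt(2 * math.log(t) / n)
    return max(range(len(pulls)), key=score)

def run_bandit(arm_means, rounds=2000, seed=0):
    # Simulate Bernoulli arms; return how often each arm was pulled.
    rng = random.Random(seed)
    rewards = [0.0] * len(arm_means)
    pulls = [0] * len(arm_means)
    for t in range(1, rounds + 1):
        arm = ucb1(rewards, pulls, t)
        rewards[arm] += 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
    return pulls

pulls = run_bandit([0.4, 0.6])  # the better arm ends up pulled far more often
```

UCT then applies this same explore-exploit score at every node of a game tree, which is the non-obvious extension the comment above credits to Kocsis and Szepesvari.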

  2. The simple mention of raw computing power in today’s phones versus that of Deep Blue can be misleading for some readers. I think we should not forget the power requirements to sustain such computing power over extended periods of time and the resulting efforts in cooling technology. Today’s phones may have that much raw computing power but they are still not there in terms of power delivery and cooling technologies.

  4. It’s hard to interpret an AI triumph if you want to know how close we are to surpassing “natural” intelligence. We see ourselves as smart, but we know humans didn’t evolve to play go. We may well be quite bad at it. I suspect you’re better off examining animals where we do have an idea of what they evolved to do, and there AI can look pretty miserable.

    Small insects seem to crush our efforts in AIs for robotics. I don’t know how much power a bee brain has, but it can’t be much. And yet, they can fly, pathfind, recognize faces, communicate, dance, and work together to build their impressive hives. That kind of capability still seems a way off, especially in such a tiny package.

    1. Small insects seem to crush our efforts in AIs for robotics. I don’t know how much power a bee brain has, but it can’t be much. And yet, they can fly, pathfind, recognize faces, communicate, dance, and work together to build their impressive hives.

      Right. I use bees as an example myself a lot. Our autonomous drones do a lot of what bees do, but using many more tricks and seemingly less intelligence. I usually get criticized for saying so, but I think it is true… we can’t even mimic the intelligence of a bee on a supercomputer.

      That kind of capability still seems a way off, especially in such a tiny package.

      Up until recently, an autonomous drone would cost hundreds of thousands of dollars if not more. You can now purchase one on Amazon for less than $2000. If you live in a big city, you have probably seen athletes running around followed by an autonomous drone filming them.

      Many of us have semi-autonomous cleaning robots in our homes. They are outrageously silly and loud… but kettles started out this way too.

      Early days, of course, but I would not bet against autonomous-drone technology at this point in time.
