Post-Blade-Runner trauma: From Deep Learning to SQL and back

Just after posting my review of the movie Blade Runner 2049, I went to attend the Montreal Deep Learning summit. Deep learning is this “new” artificial-intelligence paradigm that has taken the software industry by storm. Everything from image recognition to voice recognition and even translation has been improved by deep learning. Folks who were working on these problems have often been displaced by it.

There has been a lot of “bullshit” in artificial intelligence, things that were supposed to help but did not really help. Deep learning does work, at least when it is applicable. It can read labels on pictures. It can identify a cat in a picture. Some of the time, at least.

How do we know it works for real? It works for real because we can try it out every day. For example, Microsoft has a free iPhone app called “Seeing AI” that lets you take arbitrary pictures; it can tell you what is in the picture with remarkable accuracy. You can also go to deepl.com and get great translations, presumably based on deep-learning techniques. The standard advice I give is not to trust the academic work: it is too easy to publish remarkable results that do not hold up in practice. However, when Apple, Google and Facebook start to put a technique in their products, you know that there is something to the idea… because engineers who expose users to broken techniques get instant feedback.

Besides speakers from lots of wealthy corporations, the event featured talks by three highly regarded professors in the field: Yoshua Bengio (Université de Montréal), Geoffrey Hinton (University of Toronto) and Yann LeCun (New York University). Some described it as a historic event to see these three together in Montreal, the city that saw some of the first contemporary work on deep learning. Yes, deep learning has Canadian roots.

For some context, here is what I wrote in my Blade Runner 2049 review:

Losing data can be tragic, akin to losing a part of yourself.

Data matters a lot. A key reason deep learning works in 2017 is that we have lots of labeled data, along with the computers to process this data at an affordable cost.

Yoshua Bengio spoke first. As I was listening to him, I randomly went to my blog… only to discover that the blog was gone! No more data.

My blog engine (WordPress) makes it difficult to find out what happened. It complained about not being able to connect to the database, which sent me on a wild hunt to find out why it could not connect. It turns out that the database access was fine. So why was my blog dead?

I had brought my smartphone and an iPad to the event. A tablet with a pen is a much better supporting tool when attending a talk: holding a laptop on your lap is awkward.

Next, Geoffrey Hinton gave a superb talk, though I am sure non-academics will think less of him than I do. He presented recent, hands-on results. Though LeCun, Bengio and Hinton supposedly agree on most things, I felt that Hinton presented things differently. He is clearly not very happy about deep learning as it stands. One gets the impression that he feels that whatever they have “works”, but that just because it “works” does not make it the right approach.

Did I mention that Hinton predicted that computers would have common-sense reasoning within 5 years? He did not repeat this prediction at this event, though he did hint that major breakthroughs in artificial intelligence could happen as early as next week. He is an optimistic fellow.

Well. The smartest students are flocking to deep learning labs if only because that is where the money is. So people like Hinton can throw graduate students at problems faster than I can write blog posts.

What is the problem with deep learning? For the most part, it is a brute-force approach. Throw in lots of data, lots of parameters, lots of engineering and lots of CPU cycles, and out come good results. But don’t even ask why it works. That is not clear.

“It is supervised gradient descent.” Right. So is Newton’s method.
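
To make the quip concrete, here is a toy comparison in Python; the function, step size and starting point are arbitrary choices of mine, not anything from the talks. Both methods are just simple iterative update rules, and neither description tells you much about why deep learning succeeds:

```python
# Minimize f(x) = (x - 3)^2 + 1 with plain gradient descent and with Newton's method.

def f_prime(x):
    return 2 * (x - 3)   # first derivative of f

def f_second(x):
    return 2.0           # second derivative of f (constant for this quadratic)

x_gd = x_newton = 0.0
for _ in range(25):
    x_gd -= 0.1 * f_prime(x_gd)                           # gradient descent: small step along the slope
    x_newton -= f_prime(x_newton) / f_second(x_newton)    # Newton's method: rescale the step by the curvature

print(x_gd, x_newton)  # both approach the minimum at x = 3; Newton lands on it in a single step
```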

I once gave a talk about the Slope One algorithm at the University of Montreal. It is an algorithm that I designed and that has been widely used in e-commerce systems. In the paper describing it, we set forth the following requirements (a small sketch follows the list):

  • easy to implement and maintain: all aggregated data should be easily interpreted by the average engineer and algorithms should be easy to implement and test;
  • updateable on the fly;
  • efficient at query time: queries should be fast.
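
To make the first point concrete, here is a minimal sketch of the basic (unweighted) Slope One scheme in Python. The ratings and names are made up for illustration, but the quantities it aggregates, average rating differences between pairs of items, are exactly the kind of data an average engineer can inspect and check by hand:

```python
def slope_one_predict(ratings, user, target_item):
    """Basic (unweighted) Slope One prediction.

    ratings: dict mapping each user to a dict of {item: rating}.
    Returns a predicted rating for `user` on `target_item`,
    or None if there is nothing to go on.
    """
    predictions = []
    for other_item, user_rating in ratings[user].items():
        if other_item == target_item:
            continue
        # Average difference between the two items, over users who rated both.
        diffs = [
            r[target_item] - r[other_item]
            for r in ratings.values()
            if target_item in r and other_item in r
        ]
        if diffs:
            deviation = sum(diffs) / len(diffs)
            predictions.append(user_rating + deviation)
    return sum(predictions) / len(predictions) if predictions else None


# Toy data: predict Carol's rating for "item2".
data = {
    "alice": {"item1": 5, "item2": 3},
    "bob":   {"item1": 3, "item2": 4},
    "carol": {"item1": 2},
}
print(slope_one_predict(data, "carol", "item2"))  # 1.5
```

Updating on the fly amounts to maintaining running sums and counts of these pairwise differences, and answering a query is a handful of additions.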

I don’t know if Bengio was present when I gave this talk, but it was not well received. Every point of motivation I put forward contradicts deep learning.

To an artificial-intelligence person, it sure seems that I am on the losing side of history on this one. But I do not do artificial intelligence; I do data engineering. I am the janitor who gets you the data you need at the right time. If I do my job right, artificial-intelligence folks won’t even know I exist. But you should not make the mistake of thinking that data engineering does not matter. That would be about as bad as assuming that there is no plumbing in your building.

Back to deep learning. In practical terms, even if you throw deep learning behind your voice assistant (e.g., Siri), it will still not be able to “understand” you. It may be able to answer common queries correctly, but anything that is unique will throw it off entirely. And your self-driving car? It relies on very precise maps, and it is likely to get confused by anything “unique”.

There is an implicit assumption in the field that deep learning has finally captured how the brain works. But that does not seem to be quite right. I submit to you that no matter how “deep” your deep learning gets, you will not pass the Turing test.

The way the leading deep-learning researchers describe it is by saying that they have not achieved “common sense”. Common sense can be described as the ability to interpolate or predict from what you know.

How close is deep learning to common sense? I don’t think we know, but I think Hinton believes that common sense might require quite different ideas.

I pulled out my iPad, and I realized after several precious minutes that the database had been wiped clean. I am unsure what happened… maybe a faulty piece of code?

Because I am old, I have seen these things happen before: I destroyed the original files of my Ph.D. thesis despite having several backup copies. So I have multiple independent backups of my blog data. I had never needed this backup data before now.

Meanwhile, I heard Yoshua Bengio tell us that there is no question now that we are going to reach human-level intelligence, as a segue into his social concerns regarding how artificial intelligence could end up in the wrong hands. In the “we are going to reach human-level intelligence”, I heard the clear indication that he included himself as a researcher: he means to say that we are within striking distance of having software that can match human beings at most tasks.

Because it is 2017, I was constantly watching my Twitter feed and noticed that someone I follow had tweeted about one of the talks, so I knew he was around. I tweeted at him, suggesting we meet. He tweeted back, suggesting we meet for drinks upstairs. I replied that I was doing surgery on a web application using an iPad.

It was the end of the day by now; everyone was gone. Well, almost: the Quebec finance minister was giving a talk, telling us how his government was acutely aware of the importance of artificial intelligence and how it meant to use artificial intelligence to help fight tax fraud.

Anyhow, I copied a blog backup file to the blog server. I googled the right command to load a backup file into my database. I was a bit nervous at this point. Sweating it, as they say.

You see, even though I taught database courses for years, wrote research papers about databases, and even designed my own engines, I still have to look up most commands whenever I actually work on a database… because I so rarely need to do it. Database engines in 2017 are like gasoline engines… we know that they are there, but we rarely have to interact directly with them.
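
For the record, the command in question is mundane. A WordPress blog typically sits on MySQL or MariaDB, so restoring a dump amounts to feeding the SQL file back into the database. Here is a minimal sketch in Python, with hypothetical file, user and database names:

```python
import subprocess

# Restore a SQL dump into the blog's database (hypothetical names throughout).
# This simply pipes the dump into the mysql command-line client.
with open("blog-backup.sql", "rb") as dump:
    subprocess.run(
        ["mysql", "-u", "blog_user", "-p", "blog_db"],  # -p prompts for the password
        stdin=dump,
        check=True,  # raise an error if the restore fails
    )
```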

The minister finished his talk. Lots of investment coming. I cannot help thinking about how billions have already been invested in deep learning worldwide. Honestly, at this point, throwing more money in the pot won’t help.

After a painful minute, the command I had entered returned. I loaded up my blog and there it was. Though as I paid more attention, I noticed that the last entry, my Blade Runner 2049 post, was gone. This makes sense: my backups run on a daily basis, so the database was probably wiped out before my script could grab a copy of that post.

What do you do when the data is gone?

Ah. Google keeps a cached copy of my posts to serve them to you faster. So I went to Twitter, looked up the tweet where I shared my post, followed the link and, sure enough, Google served me the cached copy. I grabbed the text, copied it over and recreated the post manually.

My whole system is somewhat fragile. Securing a blog and doing backups ought to be a full-time occupation. But I am doing ok so far.

So I go meet my friend for drinks, relaxed. I snap a picture or two of the Montreal landscape while I am at it. Did I mention that I took the pictures on my phone and immediately shared them with my wife, who is an hour away? It is all instantaneous, you know.

He suggests that I could use artificial intelligence in my own work, you know, to optimize software performance.

I answer with some skepticism. The problems we face in data engineering are often architectural problems. That is, it is not the case that we have millions of labeled instances from which to learn. And, often, the challenge is to come up with a whole new category, a whole new concept, a whole new architecture.

As I walk back home, I listen to a podcast where people discuss the manner in which artificial intelligence can exhibit creativity. The case is clear that there is nothing magical in human creativity. Computers can write poems, songs. One day, maybe next week, they will do data engineering better than us. By then, I will be attending research talks prepared by software agents.

As I get close to home, my wife texts me. “Where are you?” I text her back. She says that she is 50 meters away. I see in the distance, it is kind of dark, a lady with a dog. It is my wife with her smartphone. Not a word is spoken, but we walk back home together.

12 thoughts on “Post-Blade-Runner trauma: From Deep Learning to SQL and back”

  1. It looks like we are actually very far. I don’t think, e.g., that there is a single implementation of an end-to-end dialog system that learned to do useful things. For example, how do you train a system that will book hotels, flights, rental cars, shop on Amazon, etc.? It needs to read forms, fill them in and do a lot of things. Only if humans decompose these tasks into much smaller ones that heavily rely on classification, collect and clean data, and architect training pipelines does deep learning come into play.

  2. Lovely post.

    It’s interesting, but exactly the same thing could be said of humans:
    – they are not easy to implement and maintain: their aggregated data cannot be easily interpreted by the average engineer and algorithms (if any) are not easy to implement and test;
    – they are not updateable on the fly;
    – they are not efficient at query time: queries are not fast.

    By contrast, humans *do* satisfy the two criteria you dropped from the original version of this post:
    – they work with little data if needed;
    – they are accurate within reason.

  3. The hype is heavy indeed. My framework for thinking about this is closed-world problems vs open-world problems.
    Machine learning is closed-world; artificial intelligence is, by definition, open-world: the dictionary defines intelligence as the “ability to react in new situations”.

    Closed-world problems (like chess or go) can and will be beaten into submission through massive computing power learning very wiggly models – the problem space will be sufficiently densely explored that it becomes an interpolative ML problem dealing with “old situations”.

    Open-world problems (like driving a car, or designing a meaningful data warehouse schema) will not be solved in the foreseeable future. They are extrapolative, *-intelligence problems with a large “new situation” content. The best we have on “general AI” is Douglas Hofstadter stuff, really.

    The open-world character, in conjunction with the number of 9s required, is one of the reasons self-driving cars will not be in production for at least another 10 years.

  4. I’m happy to see that you didn’t mention IBM in the list of companies. Extraordinary disclaimer: I work for a company that would like to be like IBM. I have a PhD in AI, but not in machine learning, so maybe it’s not AI by current standards/hype.

    Laptops are made for laps. Awkwardness is probably the most salient characteristic of academics.

    My not-as-educated-as-I’d-like guess is that DL works, but it’s not the end of the road for AI. I hope the next steps don’t get patented…

    Relevant: https://xkcd.com/1838/

    If you want to optimize performance in data engineering, put in a bit of old AI (logic) and you will have a lot of optimization to do to make it work again.

    WRT AGI, IMHO it’s the 90% syndrome. The data used is becoming bigger and bigger, yet we may not be getting any closer at all. On the negative side, AI would be the last problem that humanity needs to solve. On the positive one, we still have to figure out many things before that day, like a way to manage the economic transition.

  5. Nice mind-wandering post! It was nice to meet you [again] at a summit somewhere!

    One thing that always amazes me about deep learning is, as you mention as well, how brute-force it is. What an awful amount of joules and compute dollars wasted over and over just replicating existing experiments. Can we share these hidden layers? Can we share these building blocks? Currently we are headed straight for an oligopoly where only big corporations or a select few research centers can afford to train these architectures.

    The end goal always seems to be human-level intelligence, but are we gonna be able to reach it in a world of limited resources?

    1. Michel: When I started my post, I was planning to include our meeting… it would have made the story even nicer… maybe next time! We ought to coordinate to make sure to meet again!

      Regarding power usage… We cannot build a drone that can fly as well as a bee. Yet we know that it is possible to do so while using very little power (a bee’s brain uses barely any power at all) and very little training. This suggests that scaling up will require more than just brute force.

  6. I strongly recommend Hinton’s talk on what is wrong with convolutional neural nets:

    (https://www.youtube.com/watch?v=rTawFwUvnLE)

    The essential complaint is “yes, it kind of works, if you have enough data, but we need an approach that is not so data hungry”. He has a long, detailed and compelling example about vision, and shows what it would take to build something that has better long-term prospects, at least for vision. Part of the argument is that people do not need to see an object from all kinds of different angles before learning to recognize it, but current machine learning algorithms do need this. Something is being missed, and Hinton has a suggestion for what this is.

    The reason I like this talk so much is that Hinton really knows the biology as well as the engineering, thinks deeply about the task, and has no need to indulge in hype.
