Science and Technology links (October 20th, 2017)

John Carmack is a famous game designer. In a recent interview, he urged programmers to shy away from trying to build ever more realistic virtual-reality games, because he expects that hardware capabilities will not keep up.

Fraction of Americans who are obese: 36.5%. And it keeps on rising.

Brian Wansink is a famous Cornell professor who works on nutrition and gets cited thousands of times per year in academic journals. Obviously, his fellow researchers think very highly of him. The American government is using his research to set policy. Is that warranted? He recently published a paper that might be used to justify interventions on kids. There is just one little problem… in his own words:

We made a mistake in the age group we described in the JAMA article. We mistakenly reported children ranging from 8 to 11 years old; however, the children were actually 3 to 5 years old

Doing good research, as any scientist will tell you, requires careful attention to detail, even small details. We all make mistakes, but we also all have to be careful. Misreporting the age of the participants by such a large margin does not seem like a small mistake, however. It is the second time that Wansink has made this same mistake; it also happened in 2012. There are currently 50 misconduct allegations against Wansink, and three of his papers have been retracted. A total of 150 statistical mistakes were found in just one of his papers. I covered the Wansink case earlier this year, predicting that Cornell would not fire him. His laboratory is still running. You can read extensive reports on the incredible amount of fraud involved. Wansink’s Wikipedia page is rather dismissive of the problems. If you think I am being too hard on him, go read Gelman’s take on the issue.

Amy Cuddy was a professor at Harvard University who became famous for her “power poses” theory. She has been cited thousands and thousands of times in research articles. Sadly, her work could not be reproduced. Once under scrutiny, it came apart: not only could it not be reproduced, but the initial evidence, even when set in the best light, was flimsy. Her statistical analysis makes little sense. And she defended it. The New York Times has a piece on her that portrays her as a victim. The narrative is that while it was once ok to publish work without any expectation that others could reproduce it, and without strong statistical analysis, the rules changed suddenly, and she was a victim of this change. I have to disagree with this narrative. It has never been “ok” in science for others to fail to reproduce your work. Feynman wrote back in the 1970s:

We’ve learned from experience that the truth will come out. Other experimenters will repeat your experiment and find out whether you were wrong or right. (…) And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven’t tried to be very careful in this kind of work. And it’s this type of integrity, this kind of care not to fool yourself, that is missing to a large extent (…)

Though I would agree that anything resembling an ad hominem attack is outside the realm of science, gaining fame by taking down the results of someone else is entirely fair game. It is a good thing that we reward people who take a chance and try to be critical of the established theories. In fact, that is what science is all about. It is not a secondary activity for ill-intentioned people. It is the very nature of science to take what you are told and to re-examine it. The comparison between Wansink and Cuddy raises questions, however. Why is Wansink allowed to go on with massive funding while Amy Cuddy had to drop her Ivy League professorship? Wansink is an all-out fraud whereas Cuddy appears to have simply fooled herself. In my view, Cuddy could go on, saying “we screwed up”. Wansink should be banned from science. He has damned himself.

While researching the Cuddy article, I ended up on Hacker News, and then on a blog post by Frank McSherry. It is a courageous blog post where McSherry takes apart some of the best papers from database research (including the best paper from SIGMOD 2017) and shows that you cannot trust their results: they present their (often overengineered) approaches in a good light when a simple baseline would show that their results are not very good. Though McSherry chose to go after VLDB and SIGMOD (some of the best venues for database research), you should not conclude that things get better at lesser venues. Maybe it is a coincidence, but in the last week, I reviewed four submissions to journals, all of them proposing new, faster algorithms. Three of them did not even offer a benchmark. Because, apparently, it is good enough to argue that your algorithm is faster, and actually running it is just a waste of time… The fourth one did include a benchmark, but working backward from its results, one has to conclude that it takes 1,000 CPU cycles to add two numbers together.
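
(To illustrate the working-backward arithmetic: given a reported running time, an input size and a clock frequency, you get cycles per element in one line. The numbers below are hypothetical, chosen only to produce a figure on the order of 1,000 cycles.)

```python
# Working backward from a reported benchmark (hypothetical numbers,
# chosen only to illustrate the calculation):
n = 1_000_000      # array elements processed
seconds = 0.33     # reported running time
clock_hz = 3.0e9   # a typical 3 GHz processor

cycles_per_element = clock_hz * seconds / n
print(f"{cycles_per_element:.0f} CPU cycles per element")  # prints: 990 CPU cycles per element
```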

There has apparently been a very large decline in the number of flying insects.

“Across several metrics, organic agriculture actually proves to be more harmful for the world’s environment than conventional agriculture.”

Oddly, it appears that using small needles on your scalp can help you grow more hair. Microneedling is apparently used to diminish wrinkles, and it seems to work on the scalp as well, promoting hair growth. I am somewhat skeptical.

Bloomberg has a nice article on Fanuc, the leading producer of industrial robots. They are crazily secretive but also massively successful. Indirectly, they make a lot of what we all buy, through the robots that they sell to China or Tesla.

Not satisfied with having shown that they could beat the best human beings at Go, the DeepMind engineers now report that they can do so with software that teaches itself the game without any help. It plays against itself. This is a big deal because it opens the door for general solutions. Down this path is a machine that could learn to play just about any game, as long as it can play against itself.

Dyslexia is a common condition where people experience difficulty reading. It appears that it may be caused by a defect in the eyes, as opposed to a problem in the brain. This reminds us that we don’t know what causes dyslexia.

Young blood appears to rejuvenate old kidneys, in mice. This suggests that it would make sense for young people to receive organs donated by old people. It might also suggest that your blood keeps track of how old you are. Certainly, there is something in our blood that can tell our age, but if we were to change our blood, could we become more youthful? We just do not know, but we will, hopefully, find out within a decade or so.

It appears that there are serious methodological errors in Piketty’s famous book “Capital in the Twenty-First Century”. It seems, in fact, that the data is pretty much useless. So maybe the book is best read as a very long opinion piece that tells us that economic inequality is rising fast.

Our neurons are covered and protected by a myelin layer, basically a layer of fat. When it erodes, we are in trouble. It appears that a common allergy drug could repair our myelin in some cases.

Cancer cells are characterized by the fact that they produce energy using fermentation, which turns sugar into energy somewhat inefficiently. It now seems that an abundance of sugar promotes fermentation and thus, maybe, cancer.

The CEO of the largest software hosting site (GitHub) thinks that programmers may be replaced by robots in the foreseeable future. I presume that he thinks his company will supervise the software that replaces programmers?

Caloric restriction extends lifespan in most species. It appears that ketone bodies might mimic this effect. Ketone bodies are normally produced in our bodies when we go a long time without eating sugar. It seems that ingesting ketone bodies from external sources might be an alternative to adopting a sugar-free diet.

Why virtual reality (VR) might matter more than you think….

I have heard it claimed that the novelist William Gibson uttered his famous quote, “the future is already here — it’s just not very evenly distributed”, for the first time after experiencing virtual reality, decades ago.

We are fast arriving at a point where virtual reality will be dirt cheap and will work really well.

A core issue right now, and this might surprise you, is that most people, including those who have tried virtual-reality goggles, cannot really say what virtual reality is.

The naïve answer is that virtual reality provides an immersive, three-dimensional view of a world. That is, when thinking about virtual reality, people think about the display. And they could be excused for doing so, given that the physical devices appear to focus so much on displaying pixels. We have goggles with embedded screens, and so forth.

But, actually, I submit to you that the display is not entirely essential. Of course, you need perception for an experience to make sense, but you could have virtual reality without any light whatsoever. You would probably have to focus on sounds, touch, and smell.

Virtual reality also does not need to be realistic. It is not at all obvious that the more realistic the representation, the better it is. You could have great experiences with a cartoonish world. That would side-step the uncanny-valley issue. I actually suspect that some of the best applications of virtual reality will not involve photo-realistic renderings.

What actually matters with virtual reality is that it engages your whole body. That’s the crucial point. When you use a computer, your fingers (mostly three of them on each hand) do most of the work. I can sit in my campus office working, and because the lights are automated, the room might go dark just as I am finishing off a sentence… because I am hardly moving at all when I work in a traditional manner with my computer.

If you were paralyzed, virtual reality would not help you in the least. At a minimum, for virtual reality to make any kind of sense, you must be able to move your head around. It is not so with traditional computing where as long as you can move your arms and use your fingers, your head can be mostly stationary.

I believe this explains, in part, how virtual reality affects our perception of the flow of time. Virtual reality is somewhat tiring, compared with sitting at a desk, so fifteen minutes of interaction in virtual reality feel (as far as the body is concerned) as tiring as hours sitting at a desk. Thus, time is somewhat accelerated in virtual reality.

But I also theorize that virtual reality affects how you think in a less trivial manner. It favors embodied cognition. An athlete or a chef has a particular type of intelligence where the space around them becomes an extension of their own mind.

It is easy to dismiss such ideas as verging on mysticism. Yet it is undeniable that we think differently when our bodies are involved. I have now reached a point where I draw a clear line between in visu meetings and videoconferences. They are drastically different experiences, resulting in very different cognitive outcomes. For example, I believe that it makes no sense to conduct job interviews using video conferencing. And I say this as a nerd who avoids social interactions whenever possible.

That is, the view that we are brains in a jar is hopelessly naïve and wrong. The idea that we “think with our brains” is, in my view, only true as a first approximation. There is a continuum between our brain cells and the objects around us. A spider without a web is a useless animal. The spider uses its web as an extension of itself, to measure distances, track directions, and even as a perception device. Human beings do not have physical webs coming out of their hands, but we are simply much more advanced spiders, with the ability to create our own webs, like the World Wide Web.

I believe that many of the paradigm shifts that we have encountered as intellectuals come about through changes that have little to do with pure reason and a lot to do with our bodies and their perception:

  • Museums often present very little textual information. Mostly, you get to see, and often touch, artifacts. It is through the presentation of inanimate objects that people acquire a feeling of how things were many centuries ago. Try, as an experiment, to view a three-dimensional representation of the same object on a screen. It is not the same! The idea that you should collect and display objects to convey information is not entirely trivial, and yet we take it for granted today.
  • Though we might credit much of the rise of statistics to the formal mathematical results introduced by famous mathematicians… I believe that we should rather credit authors such as Playfair for introducing the modern-day line graph (in 1786!). If that’s all you had, you could still effectively study inequality and climate change. But plots are much less rational than they appear: if you were to present line charts to people and ask them to describe what they see, they would have a hard time elaborating beyond a first-level interpretation. And the resulting linguistic description would not allow others to understand what was in the graph. There is more in a graph than we can tell. In some sense, it is also easier to lie with statistics than with a plot: try plotting your own weight over the last few months… and compare the result with whatever statistical rationalization you might come up with. Lying with a plot requires a more deliberate attitude. I believe that there is a deeper story to be told about the relationship between the emergence of science and the scientific method: it seems clear that the line graph preceded science. I believe that it might have played an important role.
  • The industrial revolution came about after we got to experience automatons, these popular toys from the Victorian era (and earlier) where one could see gears moving underneath. The physical reality of these devices and the fact that you could, as a kid, look at them and eventually hold the gears in your hands, probably made a huge difference.
  • The early computers were programmed using plugs and cards… but soon we imported the keyboard into computing… the keyboard is an obvious cognitive extension first created to help us make music more precisely. Without the keyboard we would not have modern-day programming, that much is certain. Isn’t it amazing how we went from musical instrument to software programming?

All of these examples illustrate how altering our environment even in a minute way allowed us to think better.

My theory is that there are entire threads of thought, that we cannot have yet, that we cannot even imagine, but that virtual reality will enable.

There are still massive challenges, however. One of them is affordance. For example, many virtual-reality games and systems use the concept of “teleportation” to move you from one point to another. In my view, this is deeply wrong: it uses your hand as a pointing device, just as you would do in conventional computing. Grabbing and moving objects, interacting with objects in general, is awkward in virtual reality. I don’t think we know how to enter text in virtual reality. There is also a bandwidth issue. The screens of current virtual-reality goggles have a relatively low resolution which makes reading small fonts difficult, and reading in general is unpleasant. Interactions are also at a relatively large scale: you cannot use fine motor control to flip a small switch. Everything has to be large and clunky.

Still, I think that chances are good that new world-changing paradigms will be made possible by virtual reality. It should allow us to build better webs, as the spiders that we are.

The Harvey-Weinstein scientific model

It is widely believed that science is the process by which experts collectively decide on the truth and post it up in “peer-reviewed journals”. At that point, once you have “peer-reviewed research articles”, the truth is known.

Less naïve people raise the bar somewhat. They are aware that individual research articles could be wrong, but they believe that science is inherently self-correcting. That is, if the scientific community reports a common belief, then this belief must be correct.

Up until at least 1955, well-regarded biology textbooks reported that we had 24 pairs of chromosomes. The consensus was that we had 24 pairs. It had to be right! It took a young outsider to contradict the consensus. In case you are wondering, we have 23 pairs of chromosomes, a fact that is not difficult to verify.

The theory of continental drift was ridiculed for decades. Surely, continents could not move. But they do. It took until 1968 before the theory could be accepted.

Surely, you might object, that is not how science works. Surely, correct theories win out quickly.

But that’s not what science is. In Feynman’s words:

Our freedom to doubt was born out of a struggle against authority in the early days of science. It was a very deep and strong struggle: permit us to question — to doubt — to not be sure. I think that it is important that we do not forget this struggle and thus perhaps lose what we have gained.

That is, science is about doubting everything, especially the experts.

Many people struggle with this idea… that truth should be considered to be an ideal that we can’t ever quite reach… that we have to accept that constant doubt, about pretty much everything, is how we do serious work… it is hard…

Let me take recent events as an illustration. Harvey Weinstein was a movie producer who, for decades, abused women. Even though it was widely known, it was never openly reported and denounced.

How could this be? How could nobody come forward for years and years and years… when the evidence is overwhelming?

What is quite clear is that it happens, all the time. It takes a lot of courage to face your peers and tell them that they are wrong. And you should expect their first reaction to be rejection. Rejection is costly.

The scientific model is not what you are taught in school… it is what Feynman describes… an ability to reject “what everyone knows”… to speak a different truth than the one “everyone knows”.

The public consensus was that Harvey Weinstein was a respected businessman. The consensus is not about truth. It is a social construction.

Science is not about reaching a consensus, it is about doubting the consensus. Anyone who speaks of a scientific consensus badly misunderstands how science works.

Science and Technology links (October 13th, 2017)

Rodney Brooks, who commercialized robots that can vacuum your apartment, has written a great essay on artificial intelligence. It is worth reading.

There is some concern that the computers necessary to control a self-driving car will use so much power that they will significantly increase the energy usage of our cars.

Facebook will commercialize the Oculus Go, a $200 standalone virtual-reality headset.

Bee-level intelligence

How close are we to having software that can emulate human intelligence? It is hard to tell. One problem with human beings is that we have large brains, with an almost uncountable number of synapses. We have about 86 billion neurons. This does not seem far from the 4.3 billion transistors that can be found in the latest iPhone… but a neuron cannot be compared to a transistor, which is a very simple thing.

We could more fairly compare transistors with synapses (connections between neurons). Human males have about 150 trillion synapses and human females about 100 trillion synapses.

We do not have any computer that approaches 100 trillion transistors. This means that even if we thought we had the algorithms to match human intelligence, we could still fall short simply because our computers are not powerful enough.

But what about bees? Bees can fly, avoid obstacles, find flowers, come back, tell other bees where to find the flowers, and so forth. Bees can specialize, they can communicate, they can adapt. They can create amazing patterns. They can defeat invaders. They can fight for their life.

I don’t know about you, but I don’t know any piece of software that exhibits this kind of intelligence.

The honey bee has fewer than a million neurons and about a billion synapses. And all of that is determined by only about 15,000 genes (most of which probably have nothing to do with the brain). I don’t know how much power a bee’s brain requires, but it is far, far less than the least powerful computer you have ever seen.

Bees don’t routinely get confused. We don’t have to debug bees. They tend to survive even as their environment changes. So while they may be well tuned, they are not relying on very specific and narrow programs.

Our most powerful computers have hundreds of billions of transistors. This is clearly not far from the computing power of a bee, no matter how you add things up. I have a strong suspicion that most of our computers have far more computing power than a bee’s brain.
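
For what it is worth, here is the back-of-envelope arithmetic in a few lines of Python, using the rounded figures above (the 300-billion transistor count is my own rough stand-in for “hundreds of billions”):

```python
# Rounded figures from the text; the big-computer transistor count is a rough stand-in.
human_synapses = 150e12            # ~150 trillion synapses (human male)
bee_synapses = 1e9                 # ~1 billion synapses (honey bee)
iphone_transistors = 4.3e9         # latest iPhone chip
big_computer_transistors = 300e9   # "hundreds of billions" of transistors

print(f"human synapses per big-computer transistor: {human_synapses / big_computer_transistors:.0f}")
print(f"iPhone transistors per bee synapse: {iphone_transistors / bee_synapses:.1f}")
```

Synapse for transistor, the human brain remains a few hundredfold beyond our biggest machines, while a single phone chip already has several times more transistors than a bee has synapses.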

What about training? Worker bees only live a few weeks. Within hours after birth, they can spring into action, all of that from very little information.

What I am really looking forward to, as the next step, is not human-level intelligence but bee-level intelligence. We are not there yet, I think.

Post-Blade-Runner trauma: From Deep Learning to SQL and back

Just after posting my review of the movie Blade Runner 2049, I went to attend the Montreal Deep Learning summit. Deep Learning is this “new” artificial-intelligence paradigm that has taken the software industry by storm. Image recognition, voice recognition, and even translation have all been improved by deep learning. Folks who were working on these problems have often been displaced by deep learning.

There has been a lot of “bullshit” in artificial intelligence, things that were supposed to help but did not really help. Deep learning does work, at least when it is applicable. It can read labels on pictures. It can identify a cat in a picture. Some of the time, at least.

How do we know it works for real? It works for real because we can try it out every day. For example, Microsoft has a free app for iPhones called “Seeing AI” that lets you take arbitrary pictures. It can tell you what is on the picture with remarkable accuracy. You can also go to deepl.com and get great translations, presumably based on deep-learning techniques. The standard advice I provide is not to trust the academic work. It is too easy to publish remarkable results that do not hold up in practice. However, when Apple, Google and Facebook start to put a technique in their products, you know that there is something of a good idea… because engineers who expose users to broken techniques get instant feedback.

Besides lots of wealthy corporations, the event featured talks by three highly regarded professors in the field: Yoshua Bengio (Université de Montréal), Geoffrey Hinton (University of Toronto) and Yann LeCun (New York University). Some described it as a historic event to see these three together in Montreal, the city that saw some of the first contemporary work on deep learning. Yes, deep learning has Canadian roots.

For some context, here is what I wrote in my Blade Runner 2049 review:

Losing data can be tragic, akin to losing a part of yourself.

Data matters a lot. A key feature that makes deep learning work in 2017 is that we have lots of labeled data, along with the computers to process this data at an affordable cost.

Yoshua Bengio spoke first. As I was listening to him, I randomly went to my blog… only to discover that the blog was gone! No more data.

My blog engine (WordPress) makes it difficult to find out what happened. It complained about not being able to connect to the database, which sent me on a wild hunt to find out why it could not connect. It turns out that the database access was fine. So why was my blog dead?

I carried with me to the event my smartphone and an iPad. A tablet with a pen is a much better supporting tool when attending a talk. Holding a laptop on your lap is awkward.

Next, Geoffrey Hinton gave a superb talk, though I am sure non-academics will think less of him than I do. He presented recent, hands-on results. Though LeCun, Bengio and Hinton supposedly agree on most things, I felt that Hinton presented things differently. He is clearly not very happy about deep learning as it stands. One gets the impression that he feels that whatever they have “works”, but the fact that it “works” does not make it the right approach.

Did I mention that Hinton predicted that computers would have common-sense reasoning within 5 years? He did not mention this prediction at the event I was at, though he did hint that major breakthroughs in artificial intelligence could happen as early as next week. He is an optimistic fellow.

Well. The smartest students are flocking to deep learning labs if only because that is where the money is. So people like Hinton can throw graduate students at problems faster than I can write blog posts.

What is the problem with deep learning? For the most part, it is a brute-force approach. Throw in lots of data, lots of parameters, lots of engineering and lots of CPU cycles, and out come good results. But don’t even ask why it works. That is not clear.

“It is supervised gradient descent.” Right. So is Newton’s method.
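
To make the quip concrete, here is a toy sketch showing that both are just iterative update rules applied to a loss function; saying “it is gradient descent” describes the mechanics, not why it works so well on enormous models:

```python
# A toy one-parameter loss; both methods are nothing more than update rules.
def loss(x):     return (x - 3.0) ** 2 + 1.0
def grad(x):     return 2.0 * (x - 3.0)
def hessian(x):  return 2.0

def gradient_descent(x, lr=0.1, steps=50):
    for _ in range(steps):
        x -= lr * grad(x)          # follow the negative gradient
    return x

def newton(x, steps=5):
    for _ in range(steps):
        x -= grad(x) / hessian(x)  # rescale the step by the curvature
    return x

print(gradient_descent(0.0), newton(0.0))  # both converge to x = 3
```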

I once gave a talk about the Slope One algorithm at the University of Montreal. It is an algorithm that I designed and that has been widely used in e-commerce systems. In that paper, we set forth the following requirements:

  • easy to implement and maintain: all aggregated data should be easily interpreted by the average engineer and algorithms should be easy to implement and test;
  • updateable on the fly;
  • efficient at query time: queries should be fast.
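
For context, Slope One itself is simple enough to sketch in a few lines, which is precisely what made those requirements attainable. Here is a minimal Python sketch of the weighted variant (the data layout and the names are mine, not taken from the paper):

```python
from collections import defaultdict

def build_deviations(ratings):
    """ratings: {user: {item: score}}.
    Returns average pairwise rating differences and co-rating counts."""
    dev, count = defaultdict(dict), defaultdict(dict)
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i == j:
                    continue
                dev[i][j] = dev[i].get(j, 0.0) + (ri - rj)
                count[i][j] = count[i].get(j, 0) + 1
    for i in dev:
        for j in dev[i]:
            dev[i][j] /= count[i][j]
    return dev, count

def predict(user_ratings, target, dev, count):
    """Weighted Slope One prediction of `target` for one user."""
    num = den = 0.0
    for j, rj in user_ratings.items():
        c = count.get(target, {}).get(j, 0)
        if j == target or c == 0:
            continue
        num += (dev[target][j] + rj) * c
        den += c
    return num / den if den else None

ratings = {"alice": {"a": 5, "b": 3},
           "bob":   {"a": 4, "b": 2, "c": 4},
           "carol": {"b": 3, "c": 5}}
dev, count = build_deviations(ratings)
print(predict(ratings["alice"], "c", dev, count))  # predicted rating for item "c"
```

The aggregated data is just a table of average rating differences between pairs of items, which an average engineer can inspect, and which can be updated incrementally as new ratings arrive.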

I don’t know if Bengio was present when I gave this talk, but it was not well received. Every point of motivation I put forward contradicts deep learning.

It sure seems that I am on the losing side of history on this one, if you are an artificial intelligence person. But I do not do artificial intelligence, I do data engineering. I am the janitor that gets you the data you need at the right time. If I do my job right, artificial intelligence folks won’t even know I exist. But you should not make the mistake of thinking that data engineering does not matter. That would be about as bad as assuming that there is no plumbing in your building.

Back to deep learning. In practical terms, even if you throw deep learning behind your voice assistant (e.g., Siri), it will still not be able to “understand” you. It may be able to answer common queries correctly, but anything that is unique will throw it off entirely. And your self-driving car? It relies on very precise maps, and it is likely to get confused by anything “unique”.

There is an implicit assumption in the field that deep learning has finally captured how the brain works. But that does not seem to be quite right. I submit to you that no matter how “deep” your deep learning gets, you will not pass the Turing test.

The way the leading deep-learning researchers describe it is by saying that they have not achieved “common sense”. Common sense can be described as the ability to interpolate or predict from what you know.

How close is deep learning to common sense? I don’t think we know, but I think Hinton believes that common sense might require quite different ideas.

I pulled out my iPad, and I realized after several precious minutes that the database had been wiped clean. I am unsure what happened… maybe a faulty piece of code?

Because I am old, I have seen these things happen before: I destroyed the original files of my Ph.D. thesis despite having several backup copies. So I have multiple independent backups of my blog data. I had never needed this backup data before now.

Meanwhile, I heard Yoshua Bengio tell us that there is no question now that we are going to reach human-level intelligence, as a segue into his social concerns regarding how artificial intelligence could end up in the wrong hands. In the “we are going to reach human-level intelligence”, I heard the clear indication that he included himself as a researcher. That is, he means to say that we are within striking distance of having software that can match human beings at most tasks.

Because it is 2017, I was always watching my Twitter feed and noticed that someone I follow had tweeted about one of the talks, so I knew he was around. I tweeted at him, suggesting we meet. He tweeted back, suggesting we meet for drinks upstairs. I replied that I was doing surgery on a web application using an iPad.

It was the end of the day by now; everyone was gone. Well, almost: the Quebec finance minister was giving a talk, telling us about how his government was acutely aware of the importance of artificial intelligence. He was telling us about how they mean to use artificial intelligence to help fight tax fraud.

Anyhow, I copied a blog backup file up to the blog server. I googled the right command to load a backup file into my database. I was a bit nervous at this point. Sweating it, as they say.

You see, even though I taught database courses for years, wrote research papers on databases, and even designed my own engines, I still have to look up most commands whenever I actually work on a database… because I just so rarely need to do it. Database engines in 2017 are like gasoline engines… we know that they are there, but we rarely have to interact directly with them.

The minister finished his talk. Lots of investment coming. I cannot help thinking about how billions have already been invested in deep learning worldwide. Honestly, at this point, throwing more money in the pot won’t help.

After a painful minute, the command I had entered returned. I loaded up my blog and there it was. Though as I paid more attention, I noticed that the last entry, my Blade Runner 2049 post, was gone. This makes sense because my backups run daily, so my database was probably wiped out before my script could grab a copy of that post.

What do you do when the data is gone?

Ah. Google keeps cached copies of my posts to serve them to you faster. So I went to Twitter, looked up the tweet where I shared my post, followed the link and, sure enough, Google served me the cached copy. I grabbed the text, copied it over and recreated the post manually.

My whole system is somewhat fragile. Securing a blog and doing backups ought to be a full-time occupation. But I am doing ok so far.

So I go meet my friend for drinks, relaxed. I snap a picture or two of the Montreal landscape while I am at it. Did I mention that I grabbed the pictures on my phone and immediately shared them with my wife, who is an hour away? It is all instantaneous, you know.

He suggests that I could use artificial intelligence in my own work, you know, to optimize software performance.

I answer with some skepticism. The problems we face in data engineering are often architectural problems. That is, it is not the case that we have millions of labeled instances to learn from. And, often, the challenge is to come up with a whole new category, a whole new concept, a whole new architecture.

As I walk back home, I listen to a podcast where people discuss the manner in which artificial intelligence can exhibit creativity. The case is clear that there is nothing magical in human creativity. Computers can write poems, songs. One day, maybe next week, they will do data engineering better than us. By then, I will be attending research talks prepared by software agents.

As I get close to home, my wife texts me. “Where are you?” I text her back. She says that she is 50 meters away. In the distance, in the near dark, I see a lady with a dog. It is my wife with her smartphone. Not a word is spoken, but we walk back home together.

On Blade Runner 2049

Back in 1982, an incredible movie came out: Blade Runner. It told the story of “artificial human beings” (replicants) that could pass as human beings, but had to be hunted down. The movie was derived from a novel by Philip K. Dick.

It took many years for people to “get” Blade Runner. The esthetic of the movie was like nothing else we had seen at the time. It presented a credible and dystopian futuristic Los Angeles.

As a kid, I was so deeply engaged in the movie that I quietly wrote my own fan fiction, on my personal typewriter. Kids like me did not own computers at the time. To be fair, most of the characters in Blade Runner also do not own computers, even though the movie is set in 2019. As in the Blade Runner universe, I could not distribute my prose other than on paper. It has now been lost.

Denis Villeneuve made a follow-up called Blade Runner 2049. You should go see it.

One of the core points that Dick made in his original novel was that human beings could be like machines while machines could be like human beings. Villeneuve’s Blade Runner 2049 embraces this observation… to the point where we can reasonably wonder whether any one of the characters is actually human. Conversely, it could be argued that they are all human.

Like all good science fiction, Blade Runner 2049 is a commentary about the present. There is no smartphone, because Blade Runner has its own technology… like improbably dense cities and flying cars… but the authors could not avoid our present even if they wanted to.

What we find in Blade Runner 2049 are companies that manage memories, as pictures and short films. And, in turn, we find that selection of these memories has the ability to change us… hopefully for the better. Yet we find that truth can be difficult to ascertain. Did this event really happen, or is it “fake news”? Whoever is in charge of managing our memories can trick us.

Blade Runner 2049 has voice assistants. They help us choose music. They can inform us. They can be interrupted and upgraded. They come from major corporations.

In Blade Runner 2049, there is a cloud (as in “cloud computing”) that makes software persistent and robust. Working outside the cloud remains possible if you do not want to be tracked, with the caveat that the information can easily be permanently destroyed.

Losing data can be tragic, akin to losing a part of yourself.

Death, life, it all comes down to data. That is, while it was easy prior to the scientific revolution to view human beings as special, as having a soul… the distinction between that which has a soul and that which does not becomes increasingly arbitrary in the scientific (and information) age. I am reminded of Feynman’s observation:

To note that the thing I call my individuality is only a pattern or dance, that is what it means when one discovers how long it takes for the atoms of the brain to be replaced by other atoms. The atoms come into my brain, dance a dance, and then go out – there are always new atoms, but always doing the same dance, remembering what the dance was yesterday. (Richard Feynman, The value of science, 1955)

Science and Technology links (October 6th, 2017)

In 2011, Bollen et al. published a paper with powerful claims entitled Twitter Mood Predicts the Stock Market. The paper has generated a whole academic industry. It has been cited 3000 times, led to the creation of workshops… and so forth. Lachanski and Pav recently tried to reproduce the original claims:

Constructing multiple measures of Twitter mood using word-count methods and standard sentiment analysis tools, we are unable to reproduce the p-value pattern that they found. We find evidence of a statistically significant Twitter mood effect in their subsample, but not in the backwards extended sample, a result consistent with data snooping. We find no evidence that our measures of Twitter mood aid in predicting the stock market out of sample.

Timothy Gowers (world-famous mathematician) writes about the death and interests of (world-famous mathematician) Vladimir Voevodsky:

(…) in his field there were a number of examples of very serious work, including by him, that turned out to have important mistakes. This led him to think that the current practice of believing in a result if enough other experts believe in it was no longer sustainable.

What is most amazing, to me, is how academics are utterly convinced that their own work is beyond reproach. When you ask, “have you had a competing lab reproduce your experiments?”, the answer, invariably, is that it is unnecessary. Yet even mathematicians recognize that they have a serious problem avoiding mistakes. It should be clear that, of all scholars, mathematicians should have the least significant problems in this respect. Nobody “cheats” in mathematics, as far as the mathematical truth is concerned. Truth is not an ambiguous concept in mathematics: you are right or wrong. Yet leading mathematicians grow concerned that “truth” is hard to assess.

This week, the Nobel prizes for 2017 were awarded. No woman received a Nobel prize. At a glance, it looks like Caucasian scholars dominate. No Japanese, no Chinese, no Korean researcher that I can see. On a related note, Japan seems to be losing its status as a research power. (To be clear, I do not believe that Caucasian men make better researchers as far as genetics is concerned.)

One of the winners of the medicine Nobel prize, Jeffrey Hall, is quite critical of the research establishment:

I can’t help feel that some of these ‘stars’ have not really earned their status. I wonder whether certain such anointees are ‘famous because they’re famous.’ So what? Here’s what: they receive massive amounts of support for their research, absorbing funds that might be better used by others. As an example, one would-be star boasted to me that he’d never send a paper from his lab to anywhere but Nature, Cell, or Science. These submissions always get a foot in the door, at least. And they are nearly always published in one of those magazines — where, when you see something you know about, you realize that it’s not always so great.

Authorea has published a list of eight research papers that were initially rejected, but which ended up being rewarded by a Nobel prize. This is an illustration of the fact that it is very difficult for even the best experts to recognize the significance of some work as it happens.

It appears that under the Trump presidency, the Food and Drug Administration (FDA) has been approving new drugs at twice the “normal” rate.

Google will commercialize headphones that should allow you to understand (through just-in-time translation) over 40 different languages. The Pixel Buds will sell for under $200.

My wife asked me, the other day, whether people in China used hyperlinks made of Chinese characters. I assumed so. It turns out that the answer involves something called Punycode, which is a way to encode arbitrary Unicode characters using only ASCII (English-only) characters.
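
If you are curious, Python ships with both a Punycode codec and an IDNA codec, so you can see the mapping for yourself (I am not asserting the exact encoded bytes here, run it to see):

```python
import codecs

# Raw Punycode: a non-ASCII label becomes an ASCII-only string.
print(codecs.encode("中文", "punycode"))
# IDNA wraps Punycode with the "xn--" prefix used in actual domain names.
print(codecs.encode("中文", "idna"))
```

So a link can display Chinese characters while the underlying domain name remains plain ASCII.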

Breast cancer is less lethal:

From 1989 to 2015, breast cancer death rates decreased by 39%, which translates to 322,600 averted breast cancer deaths in the United States.

To be clear, we are still very far from having a breast-cancer cure, let alone a cancer cure.

On a related note, over half of the new cancer drugs approved in recent years do not improve your odds of surviving cancer. What happens, apparently, is that drugs are approved on the basis of narrow measures that may or may not translate into concrete health benefits.

How likely is a medical professional to prescribe unnecessary therapies? More so than we’d like:

We find that the patient receives an overtreatment recommendation in more than every fourth visit.

One of the effects of aging is that our telomeres get shorter. With every cell division, this non-coding piece of your DNA gets shorter and shorter. At the margin, this may affect the ability of your tissues to regenerate. TRF1 protects your telomeres, and it appears that having more of it could be helpful:

We further show that increasing TRF1 expression in both adult (1-year-old) and old (2-year-old) mice using gene therapy can delay age-associated pathologies.

A common theme in politics these days is “inequality”: some people are getting richer while others are not. In turn, this inequality can be related to a falling share of income that goes toward salaries. That is, a society where most of the wealth is distributed as salaries tends to be more “equal”. Right now, it appears that a lot of wealth goes into property values in highly desirable areas, for example. Since the poor do not own buildings in Manhattan, they do not benefit from this kind of wealth distribution. So why aren’t we getting larger salaries? Researchers from Harvard, Princeton and the London School of Economics believe that this fall could be explained by our low productivity. Of course, even if they are correct, this just leads us to another question: why aren’t we getting more productive?

In related news, some scholars from Stanford believe that research productivity is way down. Apparently, good new ideas are getting harder to find. From my corner of the world (software), this looks like an incredible claim. I cannot keep up, even superficially, with a narrow subset of the software industry. There are significant advances left and right… too much for my little brain. Speaking for myself, I certainly have no shortage of what appear to me to be good ideas. I am mostly limited by my ability to find the energy to pursue them… and by the fact that I want to spend quality time with my family. I cannot believe that all the researchers, many much smarter than I’ll ever be, are finding it harder to come up with new ideas.

Baboons can learn to recognize English words:

It has recently been shown that Guinea baboons can be trained to discriminate between four-letter words (e.g., TORE, WEND, BOOR, TARE, KRIS) and nonwords (e.g., EFTD, ULKH, ULNX, IMMF) simply by rewarding them for correct lexicality decisions. The number of words learned by the six baboons ranged from 81 to 307 (after some 60,000 trials), and they were reported to respond correctly to both novel words and nonwords with above chance performance.

It looks like the company that is reinventing the taxi industry, Uber, might pull out of Quebec, Canada. At issue is the requirement to have several days of training before one can act as a taxi driver.

Worms live longer at lower temperatures due to differences in gene expression.

My iPad Pro experiment

Years ago, I placed a bet with Greg Linden, predicting that tablets like the iPad would replace PCs. The bet did not end well for me. My own analysis is that I lost the bet primarily because I failed to foresee the market surge of expensive and (relatively) large smartphones.

Though there is no denying that I lost to Greg, I don’t think that I was wrong regarding the fundamental trends. Consider: during the third quarter of 2017, Apple sold 4.3 million laptops. That’s not bad. But Apple sold 11.4 million iPads, nearly three times as many. The real story, of course, is that Apple sold over 40 million iPhones, and a large fraction of these iPhones have been top-of-the-line models.

For comparison, PC shipments worldwide are at around 60 million per quarter. So 11.4 million iPads is nothing to sneeze at, but it is no PC killer. About 40 million tablets are sold every quarter. The fight between PCs and tablets has not been very conclusive so far. Though tablet sales have stagnated, and even diminished, many PCs look a lot like tablets. A fair assessment is that we are currently at a draw.

This is somewhat more optimistic than Greg’s prediction from 2011:

I am not saying that tablets will not see some sales to some audience. What I am saying is that, in the next several years, the audience for tablets is limited, tablet sales will soon stall around the same level where netbook sales stalled, and the PC is under no threat from tablets.

Who even remembers what a netbook is today?

In any case, we are not yet at the point where people are dumping their PCs (or MacBooks) in favor of tablets. I would argue that people probably use PCs a lot less than they used to, relatively speaking, because they do more work on their smartphones. I tend to process much of my email on my smartphone.

But PCs are still around.

I’m not happy about this.

Even though I have had an iPad ever since Apple made them, I never tried to make the switch professionally. This summer, I finally did so. My 12-inch iPad Pro has been my primary machine for the last couple of months. I got an Apple keyboard as well as an Apple Pencil.

Let me establish the context a bit. I am a university professor and department chair. I have an active research program, with graduate students. I write code using various programming languages. I write papers using LaTeX. I have to do lots of random office work.

Here are my thoughts so far:

  • It works. You can use an iPad as a primary machine. This was not at all obvious to me when I started out.
  • It creates envy. Several people who watched me work decided to give it a try. This is a sign that I look productive and happy on my tablet.
  • Some of the worst experiences I have are with email. Simply put, I cannot quickly select a large chunk of text (e.g., 500 lines) and delete it. Selecting text on an iPad is infuriating. It works well when you want to select a word or two, but there seems to be no way to select a large chunk. Why is this a problem? Because when replying to emails, I keep my answers short, so I want to delete most if not all of the original message and quote just the necessary passage.
  • The pain of selecting text affects pretty much every application where text is involved.
  • Copy and paste is unnecessarily difficult. I don’t know how to select just the text without formatting. Sometimes I end up selecting the link instead of the text related to the link.
  • Microsoft Office works on an iPad. I am not a heavy user, but for what I need to do, it is fine.
  • Apple has its own word processor called “Pages”. It works, but it won’t spell check in French (it does on a MacBook).
  • The hardware is nice, but more finicky than it should be. Both the Pencil and the foldable keyboard tend to disconnect. The keyboard can be frustrating: it is held by magnets, and if you move the iPad around, the magnets might shift and the keyboard might disconnect. It is not clear how to reconnect it systematically, so I end up “fiddling with it”. It is not as bad as I make it sound, and I don’t think anyone has ever witnessed my troubles, but they would see a very frustrated man.
  • Cloud computing is your friend. Dropbox, iCloud, Google Drive…
  • Reviewing PDF documents is nice. I use an app called “PDF Expert” which allows me to comment on the documents very handily.
  • While the iPad can multitask, I have not yet figured out how to put this ability to good use.
  • My employer expects me to use Java applets to fill out some forms. I can barely do it with my MacBook. It is a no go on the iPad.
  • Blogging works. I am using my iPad right now. However, it is not obvious how to do grammar and spell checks while typing within a web app. So I am making more mistakes than I should.
  • LaTeX works. I am using an app called TeXPad. It cannot build my documents locally, but it works once I tell it to use a cloud engine. I am also thinking that Overleaf could be a solution. However, neither TeXPad on iOS nor Overleaf provides a great experience when using LaTeX. To be fair, LaTeX is hardly user friendly in the best of times.
  • iPads are not designed as developer machines. If you want to experiment with Swift, it is fairly easy to create “Swift playgrounds”, but that’s mostly for the purpose of learning the language. However, I am getting a good experience using ssh to connect to remote Linux boxes.

So my workflow is currently something of a hybrid. I have a cheap MacBook (not a MacBook pro!) that I use maybe 10% of the time. The rest of the time, I rely on the iPad.

Why do this? It is an experiment. So far, it has been instructive.

So what are the benefits? My impression is that replacing a laptop with a tablet makes me more productive at some things. For example, I spend more time reading on my iPad than I would on my laptop.

Stream VByte: first independent assessment

In an earlier post, I announced Stream VByte, claiming that it was very fast. Our paper was peer reviewed (Information Processing Letters) and we shared our code.

Still, as Feynman said, science is the belief in the ignorance of experts. It is not because I am an expert that you should trust me.

There is a high-quality C++ framework to build search engines called Trinity. Its author, Mark Papadakis, decided to take Stream VByte out for a spin to see how well it does. Here is what Mark had to say:

As you can see, Stream VByte is over 30% faster than the second fastest, FastPFOR in decoding, where it matters the most, and also the fastest among the 3 codecs in encoding (though not by much). On the flip side, the index generated is larger than the other two codecs, though not by much (17% or so larger than the smallest index generated when FastPFOR is selected).
This is quite an impressive improvement in terms of query execution time, which is almost entirely dominated by postings list access time (i.e integers decoding speed).

I was pleased that Mark found the encoding to be fast: we have not optimized this part of the implementation at all… because everyone keeps telling me that encoding speed is irrelevant. So it is “accidentally fast”. It should be possible to make it much, much faster.

Mark points out that Stream VByte does not compress quite as well, in terms of compression ratios, as other competitive alternatives. That’s to be expected because Stream VByte is a byte-oriented format, not a bit-oriented format. However, Stream VByte really shines with speed and engineering convenience.
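
For readers unfamiliar with the format, the idea behind Stream VByte is to store the 2-bit length codes (one per integer, four per control byte) in a stream separate from the data bytes, so the decoder always knows the lengths of the next four integers without branching on the data itself. Here is a minimal scalar sketch of that layout in Python; the real implementation is written in C and decodes four integers at a time using SIMD shuffles:

```python
def stream_vbyte_encode(values):
    """Encode 32-bit integers into a control stream (2-bit length codes,
    four per control byte) and a separate data stream of 1-4 bytes each."""
    control, data = bytearray(), bytearray()
    for block_start in range(0, len(values), 4):
        ctrl = 0
        for pos, v in enumerate(values[block_start:block_start + 4]):
            nbytes = max(1, (v.bit_length() + 7) // 8)   # 1..4 little-endian bytes
            ctrl |= (nbytes - 1) << (2 * pos)
            data += v.to_bytes(nbytes, "little")
        control.append(ctrl)
    return bytes(control), bytes(data)

def stream_vbyte_decode(control, data, count):
    values, offset = [], 0
    for i in range(count):
        nbytes = ((control[i // 4] >> (2 * (i % 4))) & 3) + 1
        values.append(int.from_bytes(data[offset:offset + nbytes], "little"))
        offset += nbytes
    return values

nums = [3, 1024, 70000, 5, 2**31 - 1]
control, data = stream_vbyte_encode(nums)
assert stream_vbyte_decode(control, data, len(nums)) == nums
```

Because a full control byte is spent on every group of four integers, the encoded size is at least 1.25 bytes per integer, which helps explain why the compression ratio trails bit-oriented codecs like FastPFOR even as the decoding speed pulls ahead.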