The Harvey Weinstein scientific model

It is widely believed that science is the process by which experts collectively decide on the truth and publish it in “peer-reviewed journals”. At that point, once you have “peer-reviewed research articles”, the truth is known.

Less naïve people raise the bar somewhat. They are aware that individual research articles could be wrong, but they believe that science is inherently self-correcting. That is, if the scientific community reports a common belief, then this belief must be correct.

Up until at least 1955, well-regarded biology textbooks reported that we had 24 pairs of chromosomes. The consensus was that we had 24 pairs. It had to be right! It took a young outsider to contradict the consensus. In case you are wondering, we have 23 pairs of chromosomes, a fact that is not difficult to verify.

The theory of continental drift was ridiculed for decades. Surely, continents could not move. But they do. It took until 1968 before the theory could be accepted.

Surely, that’s not how science works. Surely, correct theories win out quickly.

But that’s not what science is. In Feynman’s words:

Our freedom to doubt was born out of a struggle against authority in the early days of science. It was a very deep and strong struggle: permit us to question — to doubt — to not be sure. I think that it is important that we do not forget this struggle and thus perhaps lose what we have gained.

That is, science is about doubting everything, especially the experts.

Many people struggle with this idea… that truth should be considered to be an ideal that we can’t ever quite reach… that we have to accept that constant doubt, about pretty much everything, is how we do serious work… it is hard…

Let me take recent events as an illustration. Harvey Weinstein was a movie producer who, for decades, abused women. Even though it was widely known, it was never openly reported and denounced.

How could this be? How could nobody come forward for years and years and years… when the evidence is overwhelming?

What is quite clear is that it happens, all the time. It takes a lot of courage to face your peers and tell them that they are wrong. And you should expect their first reaction to be rejection. Rejection is costly.

The scientific model is not what you are taught in school… it is what Feynman describes… an ability to reject “what everyone knows”… to speak a truth different from the one “everyone knows”.

The public consensus was that Harvey Weinstein was a respected businessman. The consensus is not about truth. It is a social construction.

Science is not about reaching a consensus, it is about doubting the consensus. Anyone who speaks of a scientific consensus badly misunderstands how science works.

Science and Technology links (October 13th, 2017)

Rodney Brooks, who commercialized robots that can vacuum your apartment, has written a great essay on artificial intelligence. It is worth reading.

There is some concern that the computers necessary to control a self-driving car will use so much power that they will significantly increase the energy usage of our cars.

Facebook will commercialize the Oculus Go, a $200 standalone virtual-reality headset.

Bee-level intelligence

How close are we to having software that can emulate human intelligence? It is hard to tell. One problem with human beings is that we have large brains, with an almost uncountable number of synapses. We have about 86 billion neurons. This does not seem far from the 4.3 billion transistors that can be found in the latest iPhone… but a neuron cannot be compared with a transistor, which is a very simple device.

We could more fairly compare transistors with synapses (connections between neurons). Human males have about 150 trillion synapses and human females about 100 trillion synapses.

We do not have any computer that approaches 100 trillion transistors. This means that even if we thought we had the algorithms to match human intelligence, we could still fall short simply because our computers are not powerful enough.
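
To put these numbers side by side, here is a quick back-of-the-envelope calculation in Swift, using only the figures already mentioned above:

// Back-of-the-envelope comparison using the figures from this post.
let synapsesInHumanBrain = 150e12         // roughly, for a male human brain
let transistorsInLatestPhoneChip = 4.3e9  // the latest iPhone processor
print(synapsesInHumanBrain / transistorsInLatestPhoneChip) // roughly 35,000 phone chips per brain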

But what about bees? Bees can fly, avoid obstacles, find flowers, come back, tell other bees where to find the flowers, and so forth. Bees can specialize, they can communicate, they can adapt. They can create amazing patterns. They can defeat invaders. They can fight for their life.

I don’t know about you, but I don’t know any piece of software that exhibits this kind of intelligence.

The honey bee has fewer than a million neurons and about a billion synapses. And all of that is determined by only about 15,000 genes (most of which probably have nothing to do with the brain). I don’t know how much power a bee’s brain requires, but it is far, far less than the least powerful computer you have ever seen.

Bees don’t routinely get confused. We don’t have to debug bees. They tend to survive even as their environment changes. So while they may be well tuned, they are not relying on very specific and narrow programs.

Our most powerful computers have hundreds of billions of transistors. That is at least comparable to the computing power of a bee, no matter how you add things up. I have a strong suspicion that most of our computers have far more computing power than a bee’s brain.

What about training? Worker bees only live a few weeks. Within hours after birth, they can spring into action, all of that from very little information.

What I am really looking forward to, as the next step, is not human-level intelligence but bee-level intelligence. We are not there yet, I think.

Post-Blade-Runner trauma: From Deep Learning to SQL and back

Just after posting my review of the movie Blade Runner 2049, I went to attend the Montreal Deep Learning summit. Deep learning is this “new” artificial-intelligence paradigm that has taken the software industry by storm. Everything from image recognition to voice recognition and even translation has been improved by deep learning. Folks who were working on these problems have often been displaced by deep learning.

There has been a lot of “bullshit” in artificial intelligence, things that were supposed to help but did not really help. Deep learning does work, at least when it is applicable. It can read labels on pictures. It can identify a cat in a picture. Some of the time, at least.

How do we know it works for real? It works for real because we can try it out every day. For example, Microsoft has a free app for iPhones called “Seeing AI” that lets you take arbitrary pictures. It can tell you what is in the picture with remarkable accuracy. You can also go to deepl.com and get great translations, presumably based on deep-learning techniques. The standard advice I provide is not to trust the academic work. It is too easy to publish remarkable results that do not hold up in practice. However, when Apple, Google and Facebook start to put a technique in their products, you know that there is something to the idea… because engineers who expose users to broken techniques get instant feedback.

Besides lots of wealthy corporations, the event featured talks by three highly regarded professors in the field: Yoshua Bengio (Université de Montréal), Geoffrey Hinton (University of Toronto) and Yann LeCun (New York University). Some described it as a historic event to see these three in Montreal, the city that saw some of the first contemporary work on deep learning. Yes, deep learning has Canadian roots.

For some context, here is what I wrote in my Blade Runner 2049 review:

Losing data can be tragic, akin to losing a part of yourself.

Data matters a lot. A key feature that makes deep learning work in 2017 is that we have lots of labeled data, along with the computers to process this data at an affordable cost.

Yoshua Bengio spoke first. Then, as I was listening to Yoshua Bengio, I randomly went to my blog… only to discover that the blog was gone! No more data.

My blog engine (WordPress) makes it difficult to find out what happened. It complained about not being able to connect to the database, which sent me on a wild goose chase to find out why it could not connect. Turns out that the database access was fine. Why was my blog dead?

I carried with me to the event my smartphone and an iPad. A tablet with a pen is a much better supporting tool when attending a talk. Holding a laptop on your lap is awkward.

Next, Geoffrey Hinton gave a superb talk, though I am sure non-academics will think less of him than I do. He presented recent, hands-on results. Though LeCun, Bengio and Hinton supposedly agree on most things, I felt that Hinton presented things differently. He is clearly not very happy about deep learning as it stands. One gets the impression that he feels that whatever they have “works”, but it is not because it “works” that it is the right approach.

Did I mention that Hinton predicted that computers would have common-sense reasoning within 5 years? He did not mention this prediction at the event I was at, though he did hint that major breakthroughs in artificial intelligence could happen as early as next week. He is an optimistic fellow.

Well. The smartest students are flocking to deep learning labs if only because that is where the money is. So people like Hinton can throw graduate students at problems faster than I can write blog posts.

What is the problem with deep learning? For the most part, it is a brute force approach. Throw in lots of data, lots of parameters, lots of engineering and lots of CPU cycles, and out come good results. But don’t even ask why it works. That is not clear.

“It is supervised gradient descent.” Right. So is Newton’s method.
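
If the phrase sounds abstract, here is a toy illustration in Swift of what “supervised gradient descent” means: fit a single parameter to labeled examples by repeatedly nudging it against the gradient of the error. This is my own minimal sketch, not code from any deep-learning framework; real systems do the same thing with millions of parameters.

// Toy supervised gradient descent: learn w so that y ≈ w * x.
let examples: [(x: Double, y: Double)] = [(x: 1, y: 2.1), (x: 2, y: 3.9), (x: 3, y: 6.2)] // labeled data
var w = 0.0                 // the single parameter being learned
let learningRate = 0.01
for _ in 0..<1000 {
  // gradient of the mean squared error with respect to w
  var gradient = 0.0
  for (x, y) in examples {
    gradient += 2 * (w * x - y) * x
  }
  gradient /= Double(examples.count)
  w -= learningRate * gradient  // take a small step downhill
}
print(w)  // close to 2, the slope that best fits the examples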

I once gave a talk about the Slope One algorithm at the University of Montreal. It is an algorithm that I designed and that has been widely used in e-commerce systems. In the corresponding paper, we set forth the following requirements (a minimal sketch of the scheme follows the list):

  • easy to implement and maintain: all aggregated data should be easily interpreted by the average engineer and algorithms should be easy to implement and test;
  • updateable on the fly;
  • efficient at query time: queries should be fast.
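
To show why these requirements are easy to meet, here is a minimal Swift sketch of the basic (unweighted) Slope One scheme. I recompute the average rating deviations on the fly to keep the example short; a real deployment would precompute them and update them incrementally, which is what makes the scheme updateable on the fly.

// Ratings for one user: item identifier -> rating.
typealias Ratings = [String: Double]

// Average difference between the ratings of items a and b,
// over the users who rated both (nil if no such user exists).
func averageDeviation(_ a: String, _ b: String, users: [Ratings]) -> Double? {
  var total = 0.0
  var count = 0
  for ratings in users {
    if let ra = ratings[a], let rb = ratings[b] {
      total += ra - rb
      count += 1
    }
  }
  return count > 0 ? total / Double(count) : nil
}

// Predict a user's rating for `item`: for every item the user already
// rated, add the average deviation, then average the predictions.
func predict(_ item: String, user: Ratings, users: [Ratings]) -> Double? {
  var total = 0.0
  var count = 0
  for (other, rating) in user where other != item {
    if let dev = averageDeviation(item, other, users: users) {
      total += rating + dev
      count += 1
    }
  }
  return count > 0 ? total / Double(count) : nil
}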

I don’t know if Bengio was present when I gave this talk, but it was not well received. Every point of motivation I put forward contradicts deep learning.

It sure seems that I am on the losing side of history on this one, if you are an artificial intelligence person. But I do not do artificial intelligence, I do data engineering. I am the janitor that gets you the data you need at the right time. If I do my job right, artificial intelligence folks won’t even know I exist. But you should not make the mistake of thinking that data engineering does not matter. That would be about as bad as assuming that there is no plumbing in your building.

Back to deep learning. In practical terms, even if you throw deep learning behind your voice assistant (e.g., Siri), it will still not be able to “understand” you. It may be able to answer common queries correctly, but anything that is unique will throw it off entirely. And your self-driving car? It relies on very precise maps, and it is likely to get confused by anything “unique”.

There is an implicit assumption in the field that deep learning has finally captured how the brain works. But that does not seem to be quite right. I submit to you that no matter how “deep” your deep learning gets, you will not pass the Turing test.

The way the leading deep-learning researchers describe it is by saying that they have not achieved “common sense”. Common sense can be described as the ability to interpolate or predict from what you know.

How close is deep learning to common sense? I don’t think we know, but I think Hinton believes that common sense might require quite different ideas.

I pulled out my iPad, and I realized after several precious minutes that the database had been wiped clean. I am unsure what happened… maybe a faulty piece of code?

Because I am old, I have seen these things happen before: I destroyed the original files of my Ph.D. thesis despite having several backup copies. So I have multiple independent backups of my blog data. I had never needed this backup data before now.

Meanwhile, I heard Yoshua Bengio tell us that there is no question now that we are going to reach human-level intelligence, as a segue into his social concerns regarding how artificial intelligence could end up in the wrong hands. In “we are going to reach human-level intelligence”, I heard the clear indication that he included himself as one of the researchers who will get us there. That is, he means that we are within striking distance of having software that can match human beings at most tasks.

Because it is 2017, I was constantly watching my Twitter feed and noticed that someone I follow had tweeted about one of the talks, so I knew he was around. I tweeted at him, suggesting we meet. He tweeted back, suggesting we meet for drinks upstairs. I replied that I was doing surgery on a web application using an iPad.

It was the end of the day by now, everyone was gone. Well. The Quebec finance minister was giving a talk, telling us about how his government was acutely aware of the importance of artificial intelligence. He was telling us about how they mean to use artificial intelligence to help fight tax fraud.

Anyhow, I copied a backup file of the blog up to the server. I googled the right command to load a backup file into my database. I was a bit nervous at this point. Sweating it, as they say.

You see, even though I taught database courses for years, and wrote research papers about it, even designed my own engines, I still have to look up most commands whenever I actually work on a database… because I just so rarely need to do it. Database engines in 2017 are like gasoline engines… we know that they are there, but rarely have to interact directly with them.

The minister finished his talk. Lots of investment coming. I cannot help thinking about how billions have already been invested in deep learning worldwide. Honestly, at this point, throwing more money in the pot won’t help.

After a painful minute, the command I had entered returned. I loaded up my blog and there it was. Though as I paid more attention, I noticed that the last entry, my Blade Runner 2049 post, was gone. This makes sense because my backups are made on a daily basis, so my database was probably wiped out before my script could grab a copy of that post.

What do you do when the data is gone?

Ah. Google keeps cached copies of my posts to serve them to you faster. So I went to Twitter, looked up the tweet where I shared my post, followed the link and, sure enough, Google served me the cached copy. I grabbed the text, copied it over and recreated the post manually.

My whole system is somewhat fragile. Securing a blog and doing backups ought to be a full-time occupation. But I am doing ok so far.

So I go meet my friend for drinks, relaxed. I snap a picture or two of the Montreal landscape while I am at it. Did I mention that I took the pictures on my phone and immediately shared them with my wife, who is an hour away? It is all instantaneous, you know.

He suggests that I could use artificial intelligence in my own work, you know, to optimize software performance.

I answer with some skepticism. The problems we face in data engineering are often architectural problems. That is, it is not the case that we have millions of labeled instances to learn from. And, often, the challenge is to come up with a whole new category, a whole new concept, a whole new architecture.

As I walk back home, I listen to a podcast where people discuss the manner in which artificial intelligence can exhibit creativity. The case is clear that there is nothing magical in human creativity. Computers can write poems, songs. One day, maybe next week, they will do data engineering better than us. By then, I will be attending research talks prepared by software agents.

As I get close to home, my wife texts me. “Where are you?” I text her. She says that she is 50 meters away. I see in the distance, it is kind of dark, a lady with a dog. It is my wife with her smartphone. No word was spoken, but we walk back home together.

On Blade Runner 2049

Back in 1982, an incredible movie came out, Blade Runner. It told the story of “artificial human beings” (replicants) that could pass as human beings, but had to be hunted down. The movie was derived from a novel by Philip K. Dick.

It took many years for people to “get” Blade Runner. The esthetic of the movie was like nothing else we had seen at the time. It presented a credible and dystopian futuristic Los Angeles.

As a kid, I was so deeply engaged in the movie that I quietly wrote my own fan fiction, on my personal typewriter. Kids like me did not own computers at the time. To be fair, most of the characters in Blade Runner also do not own computers, even if the movie is set in 2019. Like in the Blade Runner universe, I could not distribute my prose other than on paper. It has now been lost.

Denis Villeneuve made a follow-up called Blade Runner 2049. You should go see it.

One of the core points that Dick made in his original novel was that human beings could be like machines while machines could be like human beings. Villeneuve’s Blade Runner 2049 embraces this observation… to the point where we can reasonably wonder whether any one of the characters is actually human. Conversely, it could be argued that they are all human.

Like all good science fiction, Blade Runner 2049 is a commentary about the present. There is no smartphone, because Blade Runner has its own technology… like improbable dense cities and flying cars… but the authors could not avoid our present even if they wanted to.

What we find in Blade Runner 2049 are companies that manage memories, as pictures and short films. And, in turn, we find that the selection of these memories has the ability to change us… hopefully for the better. Yet we find that truth can be difficult to ascertain. Did this event really happen, or is it “fake news”? Whoever is in charge of managing our memories can trick us.

Blade Runner 2049 has voice assistants. They help us choose music. They can inform us. They can be interrupted and upgraded. They come from major corporations.

In Blade Runner 2049, there is a cloud (as in “cloud computing”) that makes software persistent and robust. Working outside the cloud remains possible if you do not want to be tracked, with the caveat that the information can easily be permanently destroyed.

Losing data can be tragic, akin to losing a part of yourself.

Death, life, it all comes down to data. That is, while it was easy prior to the scientific revolution to view human beings as special, as having a soul… the distinction between that which has a soul and that which does not becomes increasingly arbitrary in the scientific (and information) age. I am reminded of Feynman’s observation:

To note that the thing I call my individuality is only a pattern or dance, that is what it means when one discovers how long it takes for the atoms of the brain to be replaced by other atoms. The atoms come into my brain, dance a dance, and then go out – there are always new atoms, but always doing the same dance, remembering what the dance was yesterday. (Richard Feynman, The value of science, 1955)

Science and Technology links (October 6th, 2017)

In 2011, Bollen et al. published a paper with powerful claims entitled Twitter Mood Predicts the Stock Market. The paper has generated a whole academic industry. It has been cited 3000 times, led to the creation of workshops… and so forth. Lachanski and Pav recently tried to reproduce the original claims:

Constructing multiple measures of Twitter mood using word-count methods and standard sentiment analysis tools, we are unable to reproduce the p-value pattern that they found. We find evidence of a statistically significant Twitter mood effect in their subsample, but not in the backwards extended sample, a result consistent with data snooping. We find no evidence that our measures of Twitter mood aid in predicting the stock market out of sample.

Timothy Gowers (world-famous mathematician) writes about the death and interests of (world-famous mathematician) Vladimir Voevodsky:

(…) in his field there were a number of examples of very serious work, including by him, that turned out to have important mistakes. This led him to think that the current practice of believing in a result if enough other experts believe in it was no longer sustainable.

What is most amazing, to me, is how academics are utterly convinced that their own work is beyond reproach. When you ask, “have you had a competing lab reproduce your experiments?”, the answer, invariably, is that it is unnecessary. Yet even mathematicians recognize that they have a serious problem avoiding mistakes. It should be clear that, of all scholars, mathematicians should have the least significant problems in this respect. Nobody “cheats” in mathematics, as far as the mathematical truth is concerned. Truth is not an ambiguous concept in mathematics: you are right or you are wrong. Yet leading mathematicians grow concerned that “truth” is hard to assess.

This week, the Nobel prizes for 2017 were awarded. No woman received a Nobel prize. At a glance, it looks like Caucasian scholars dominate. No Japanese, no Chinese, no Korean researcher that I can see. On a related note, Japan seems to be losing its status as a research power. (So we are clear, I do not believe that Caucasian men make better researchers as far as genetics is concerned.)

One of the winners of the medicine Nobel prize, Jeffrey Hall, is quite critical of the research establishment:

I can’t help feel that some of these ‘stars’ have not really earned their status. I wonder whether certain such anointees are ‘famous because they’re famous.’ So what? Here’s what: they receive massive amounts of support for their research, absorbing funds that might be better used by others. As an example, one would-be star boasted to me that he’d never send a paper from his lab to anywhere but Nature, Cell, or Science. These submissions always get a foot in the door, at least. And they are nearly always published in one of those magazines — where, when you see something you know about, you realize that it’s not always so great.

Authorea has published a list of eight research papers that were initially rejected, but which ended up being rewarded by a Nobel prize. This is an illustration of the fact that it is very difficult for even the best experts to recognize the significance of some work as it happens.

It appears that under the Trump presidency, the Food and Drug Administration (FDA) has been approving new drugs at twice the “normal” rate.

Google will commercialize headphones that should allow you to understand (through just-in-time translation) over 40 different languages. The Pixel Buds will sell for under $200.

My wife asked me, the other day, whether people in China used hyperlinks made of Chinese characters. I assumed so. It turns out that the answer involves something called Punycode, which is a way to encode arbitrary Unicode characters using only ASCII characters.

Breast cancer is less lethal:

From 1989 to 2015, breast cancer death rates decreased by 39%, which translates to 322,600 averted breast cancer deaths in the United States.

To be clear, we are still very far from having a breast-cancer cure, let alone a cancer cure.

On a related note, over half of the new cancer drugs approved in recent years do not improve your odds of surviving cancer. What happens, apparently, is that drugs are approved on the basis of narrow measures that may or may not translate into concrete health benefits.

How likely is a medical professional to prescribe unnecessary therapies? More so than we’d like:

We find that the patient receives an overtreatment recommendation in more than every fourth visit.

One of the effects of aging is that our telomeres get shorter. With every cell division, this non-coding piece of your DNA gets shorter and shorter. At the margin, this may affect the ability of your tissues to regenerate. TRF1 protects your telomeres, and it appears that having more of it could be helpful:

We further show that increasing TRF1 expression in both adult (1-year-old) and old (2-year-old) mice using gene therapy can delay age-associated pathologies.

A common theme in politics these days is “inequality”: some people are getting richer while others are not. In turn, this inequality can be related to a falling share of income that goes toward salaries. That is, a society where most of the wealth is distributed as salaries tends to be more “equal”. Right now, it appears that a lot of wealth goes into property values in highly desirable areas, for example. Since the poor do not own buildings in Manhattan, they do not benefit from this kind of wealth distribution. So why aren’t we getting larger salaries? Researchers from Harvard, Princeton and the London School of Economics believe that this fall could be explained by our low productivity. Of course, even if they are correct, this just leads us to another question: why aren’t we getting more productive?

In related news, some scholars from Stanford believe that research productivity is way down. Apparently, good new ideas are getting harder to find. From my corner of the world (software), this looks like an incredible claim. I cannot keep up, even superficially, with a narrow subset of the software industry. There are significant advances left and right… too much for my little brain. Speaking for myself, I certainly have no shortage of what appear to me to be good ideas. I am mostly limited by my ability to find the energy to pursue them… and by the fact that I want to spend quality time with my family. I cannot believe that all the researchers, many much smarter than I’ll ever be, are finding it harder to come up with new ideas.

Baboons can learn to recognize English words:

It has recently been shown that Guinea baboons can be trained to discriminate between four-letter words (e.g., TORE, WEND, BOOR, TARE, KRIS) and nonwords (e.g., EFTD, ULKH, ULNX, IMMF) simply by rewarding them for correct lexicality decisions. The number of words learned by the six baboons ranged from 81 to 307 (after some 60,000 trials), and they were reported to respond correctly to both novel words and nonwords with above chance performance.

It looks like the company that is reinventing the taxi industry, Uber, might pull out of Quebec, Canada. At issue is the requirement to complete several days of training before one can act as a taxi driver.

Worms live longer at lower temperatures due to changes in gene expression.

My iPad Pro experiment

Years ago, I placed a bet with Greg Linden, predicting that tablets like the iPad would replace PCs. The bet did not end well for me. My own analysis is that I lost the bet primarily because I failed to foresee the market surge of expensive and (relatively) large smartphones.

Though there is no denying that I lost to Greg, I still don’t think that I was wrong regarding the fundamental trends. Consider the third quarter of 2017: Apple sold 4.3 million laptops. That’s not bad. But Apple sold 11.4 million iPads, nearly three times as many. The real story, of course, is that Apple sold over 40 million iPhones, and a large fraction of these iPhones were top-of-the-line models.

For comparison, PC shipments worldwide are at around 60 million per quarter. So 11.4 million iPads is nothing to sneeze at, but it is no PC killer. About 40 million tablets are sold every quarter. The fight between PCs and tablets has not been very conclusive so far. Though tablet sales have stagnated, and even diminished, many PCs look a lot like tablets. A fair assessment is that we are currently at a draw.

This is somewhat more optimistic than Greg’s prediction from 2011:

I am not saying that tablets will not see some sales to some audience. What I am saying is that, in the next several years, the audience for tablets is limited, tablet sales will soon stall around the same level where netbook sales stalled, and the PC is under no threat from tablets.

Who even remembers what a netbook was?

In any case, we are not yet at the point where people are dumping their PCs (or MacBooks) in favor of tablets. I would argue that people probably use PCs a lot less than they used to, relatively speaking, because they do more work on their smartphones. I tend to process much of my email on my smartphone.

But PCs are still around.

I’m not happy about this.

Even though I have had an iPad ever since Apple started making them, I never tried to make the switch professionally. This summer, I finally did so. My 12-inch iPad Pro has been my primary machine for the last couple of months. I got an Apple keyboard as well as a pencil.

Let me establish the context a bit. I am a university professor and department chair. I have an active research program, with graduate students. I write code using various programming languages. I write papers using LaTeX. I have to do lots of random office work.

Here are my thoughts so far:

  • It works. You can use an iPad as a primary machine. This was not at all obvious to me when I started out.
  • It creates envy. Several people who watch me work decide to give it a try. This is a sign that I look productive and happy on my tablet.
  • Some of the worst experiences I have are with email. Simply put, I cannot quickly select a large chunk of text (e.g., 500 lines) and delete it. Selecting text on an iPad is infuriating. It works well when you want to select a word or two, but there seems to be no way to select a large chunk. Why is this a problem? Because when replying to emails, I keep my answers short, so I want to delete most if not all of the original message and quote just the necessary passage.
  • The pain that is selecting text affects pretty much all applications where text is involved.
  • Copy and paste is unnecessarily difficult. I don’t know how to select just the text without formatting. Sometimes I end up selecting the link instead of the text related to the link.
  • Microsoft Office works on an iPad. I am not a heavy user, but for what I need to do, it is fine.
  • Apple has its own word processor called “Pages”. It works, but it won’t spell check in French (it does on a MacBook).
  • The hardware is nice, but more finicky than it should be. Both the pencil and the foldable keyboard tend to disconnect. The keyboard can be frustrating as it is held by magnets, but if you move the iPad around, the magnets might move and the keyboard might disconnect. It is not clear how to reconnect it systematically, so I end up “fiddling with it”. It is not as bad as I make it sound, and I don’t think anyone has ever been witness to my troubles, but they would see a very frustrated man.
  • Cloud computing is your friend. Dropbox, iCloud, Google Drive…
  • Reviewing PDF documents is nice. I use an app called “PDF Expert” which allows me to comment on the documents very handily.
  • While the iPad can multitask, I have not yet figured out how to put this ability to good use.
  • My employer expects me to use Java applets to fill out some forms. I can barely do it with my MacBook. It is a no go on the iPad.
  • Blogging works. I am using my iPad right now. However, it is not obvious how to do grammar and spell checks while typing within a web app. So I am making more mistakes than I should.
  • LaTeX works. I am using an app called TeXPad. It cannot build my documents locally, but it works once I tell it to use a cloud engine. I am also thinking that Overleaf could be a solution. However, neither TeXPad on iOS nor Overleaf provides a great experience when using LaTeX. To be fair, LaTeX is hardly user friendly in the best of times.
  • iPads are not designed as developer machines. If you want to experiment with Swift, it is fairly easy to create “Swift playgrounds”, but that’s mostly for the purpose of learning the language. However, I am getting a good experience using ssh to connect to remote Linux boxes.

So my workflow is currently something of a hybrid. I have a cheap MacBook (not a MacBook pro!) that I use maybe 10% of the time. The rest of the time, I rely on the iPad.

Why do this? It is an experiment. So far, it has been instructive.

So what are the benefits? My impression is that replacing a laptop with a tablet makes me more productive at some things. For example, I spend more time reading on my iPad than I would on my laptop.

Stream VByte: first independent assessment

In an earlier post, I announced Stream VByte, claiming that it was very fast. Our paper was peer reviewed (Information Processing Letters) and we shared our code.

Still, as Feynman said, science is the belief in the ignorance of experts. It is not because I am an expert that you should trust me.

There is a high-quality C++ framework to build search engines called Trinity. Its author, Mark Papadakis, decided to take Stream VByte out for a spin to see how well it does. Here is what Mark had to say:

As you can see, Stream VByte is over 30% faster than the second fastest, FastPFOR in decoding, where it matters the most, and also the fastest among the 3 codecs in encoding (though not by much). On the flip side, the index generated is larger than the other two codecs, though not by much (17% or so larger than the smallest index generated when FastPFOR is selected).
This is quite an impressive improvement in terms of query execution time, which is almost entirely dominated by postings list access time (i.e integers decoding speed).

I was pleased that Mark found the encoding to be fast: we have not optimized this part of the implementation at all… because everyone keeps telling me that encoding speed is irrelevant. So it is “accidentally fast”. It should be possible to make it much, much faster.

Mark points out that Stream VByte does not compress quite as well, in terms of compression ratios, as other competitive alternatives. That’s to be expected because Stream VByte is a byte-oriented format, not a bit-oriented format. However, Stream VByte really shines with speed and engineering convenience.
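
For readers who wonder what the format looks like, here is a rough scalar sketch of the decoding step in Swift. The production code is SIMD-accelerated C, so treat this as an illustration of the layout rather than a faithful port, and check the reference implementation for the exact bit ordering within the control bytes.

// Each group of four integers gets one control byte: four 2-bit fields give
// the byte length (1 to 4) of each integer. Control bytes and data bytes are
// kept in two separate streams, which is what enables fast vectorized decoding.
func decodeStreamVByte(control: [UInt8], data: [UInt8], count: Int) -> [UInt32] {
  var result = [UInt32]()
  result.reserveCapacity(count)
  var controlIndex = 0
  var dataIndex = 0
  while result.count < count {
    let controlByte = control[controlIndex]
    controlIndex += 1
    for i in 0..<4 where result.count < count {
      let length = Int((controlByte >> (2 * i)) & 3) + 1 // 1 to 4 bytes
      var value: UInt32 = 0
      for j in 0..<length { // assemble the integer, least significant byte first
        value |= UInt32(data[dataIndex + j]) << (8 * j)
      }
      dataIndex += length
      result.append(value)
    }
  }
  return result
}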

How smart is Swift with abstraction? A trivial experiment with protocols

Apple’s Swift programming language has the notion of a “protocol”, which is similar to an interface in Java or Go. So we can define a protocol that has a single function.

public protocol Getter {
  func get(_ index : Int) -> Int
}

We need to define at least one class that has the prescribed “get” method. For good measure, I will define two of them.

public final class Trivial1 : Getter {
  public func get(_ index : Int) -> Int {
    return 1
  }
}
public final class Trivial7 : Getter {
  public func get(_ index : Int) -> Int {
    return 7
  }
}

If you are familiar with Java, this should look very familiar.

Then we can define functions that operate on the new protocol. Let us sum 100 values:

public func sum(_ g : Getter) -> Int {
  var s = 0
  for i in 1...100 {
     s += g.get(i)
  }
  return s
}

Clearly, there are possible optimizations with the simple implementations I have designed. Is Swift smart enough?

Let us put it to the test:

public func sum17(_ g1 : Trivial1, _ g7 : Trivial7) -> Int {
  return sum(g1) + sum(g7)
}

This compiles down to

  mov eax, 800

That is, Swift is smart enough to figure out, at compile time, the answer.

To be clear, this is exactly what you want to happen. Anything less would be disappointing. This is no better than what C and C++ compilers do.

Still, we should never take anything for granted as programmers.

What is nice, also, is that you can verify this answer with a trivial script:

wget https://raw.githubusercontent.com/lemire/SwiftDisas/master/swiftdisas.py
swiftc -O prot.swift
python swiftdisas.py prot sum17

Compared to Java, where code disassembly requires praying to the gods under a full moon, this is really nice.

Science and Technology links (September 29th, 2017)

Elon Musk presented his plans for space exploration. It is pretty close to science fiction (right out of Star Trek) with the exception that Musk has a track record of getting things done (e.g., Tesla).

In the US, women are doing much better in college than men:

Women earned majority of doctoral degrees in 2016 for 8th straight year and outnumber men in grad school 135 to 100.

Things are even more uneven in the Middle East:

At the University of Jordan, the country’s largest university, women outnumber men by a ratio of two to one—and earn higher grades in math, engineering, computer-information systems, and a range of other subjects.

Things are pretty grim on the education front in my home town:

Last year alone, only 40.6 percent of the boys (…) at the French-language Commission scolaire de Montréal graduated in five years.

Our local (Montreal) deep-learning star, Yoshua Bengio, called for the breakup or regulation of tech leaders:

We need to create a more level playing field for people and companies, (…) AI is a technology that naturally lends itself to a winner take all, the country and company that dominates the technology will gain more power with time. More data and a larger customer base give you an advantage that is hard to dislodge. Scientists want to go to the best places. The company with the best research labs will attract the best talent. It becomes a concentration of wealth and power.

Somewhat ironically, Bengio has created a successful start-up called Element AI in Montreal that is very generously funded.

There is a widely held view that “peer-reviewed research”, the kind that appears in scientific journals, can be trusted. Should you make serious decisions on the assumption that research is correct? You should not. For anything that matters, you should recheck everything:

Empirical social science research—or at least non-experimental social science research—should not be taken at face value. Among three dozen studies I reviewed, I obtained or reconstructed the data and code for eight. Replication and reanalysis revealed significant methodological concerns in seven and led to major reinterpretations of four. These studies endured much tougher scrutiny from me than they did from peer reviewers in order to make it into academic journals. Yet given the stakes in lives and dollars, the added scrutiny was worth it. So from the point of view of decision-makers who rely on academic research, today’s peer review processes fall well short of the optimal.

What people fail to appreciate, even people who should know better, is that peer review mostly involves reading (sometimes quickly) what an author has written on a topic. It is enough to, sometimes, catch the most obvious errors. However, there is no way that I, as a referee, can catch methodological errors deep down in the data processing. And even if I could verify the results, I cannot very well fight against people who are not being entirely honest.

Processor speeds increase over time. Though it is true that the likes of Intel have had trouble easily milking more performance out of new engineering processes, there are regular gains. Certainly, Apple has had no trouble making iPhones faster, year after year. But what about memory? Memory is also getting faster. The new DDR4 standard memory can be about 50% faster than the previous standard, DDR3. That’s pretty good. However, this gain is misleading because it only factors in “throughput” (how fast you can read data). It is true that planes are much faster than cars, but it takes a long time to get on a plane, and planes don’t necessarily land or take off every minute. So planes are fast, but they have a high latency. Memory is getting faster, but latency is a constant:

At this point in the discussion, we need to note that when we say “true latencies are remaining roughly the same,” we mean that from DDR3-1333 to DDR4-2666 (the span of modern memory), true latencies started at 13.5ns and returned to 13.5ns. While there are several instances in this range where true latencies increased, the gains have been by fractions of a nanosecond. In this same span, speeds have increased by over 1,300 MT/s, effectively offsetting any trace latency gains.

This means that though we can move a lot of data around, the minimal delay between the time when you request the data, and the time you get the data, is remaining the same. You can request more data than before, but if your software does not plan ahead, it will still remain stuck, idle.
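
The arithmetic behind the quote is simple: true latency in nanoseconds is the CAS latency (in clock cycles) divided by the memory clock, and the memory clock is half the transfer rate since DDR memory transfers data twice per cycle. The CAS values below are typical figures I am assuming for the sake of illustration, not numbers taken from the quoted article.

// True latency (ns) = CAS latency / memory clock (MHz) * 1000,
// with memory clock = transfer rate / 2 (DDR: two transfers per cycle).
func trueLatencyInNanoseconds(casLatency: Double, transferRateMTs: Double) -> Double {
  let clockMHz = transferRateMTs / 2
  return casLatency / clockMHz * 1000
}

print(trueLatencyInNanoseconds(casLatency: 9, transferRateMTs: 1333))  // DDR3-1333, CL9: about 13.5 ns
print(trueLatencyInNanoseconds(casLatency: 18, transferRateMTs: 2666)) // DDR4-2666, CL18: about 13.5 ns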

Version 9 of the Java language has just been released. There has apparently been a lot of engineering done, but I see little in the way of new features I care about. However, I can happily report that they have deprecated the applet API. This means that by the next release we might finally get rid of Java applets. Meanwhile, my employer still requires me to manage my budget and place orders using a Java applet reliant on some Oracle technology. I find it encouraging to learn that even Oracle admits that Java applets are a thing of the past, better left in the past. (Disclosure: for a time, I paid my bills designing Java applets for medical imaging! It was super fun!)