Science and Technology links (October 6th, 2017)

In 2011, Bollen et al. published a paper with bold claims entitled Twitter Mood Predicts the Stock Market. The paper has generated a whole academic industry: it has been cited 3000 times, led to the creation of workshops… and so forth. Lachanski and Pav recently tried to reproduce the original claims:

Constructing multiple measures of Twitter mood using word-count methods and standard sentiment analysis tools, we are unable to reproduce the p-value pattern that they found. We find evidence of a statistically significant Twitter mood effect in their subsample, but not in the backwards extended sample, a result consistent with data snooping. We find no evidence that our measures of Twitter mood aid in predicting the stock market out of sample.

Timothy Gowers (world-famous mathematician) writes about the death and interests of (world-famous mathematician) Vladimir Voevodsky:

(…) in his field there were a number of examples of very serious work, including by him, that turned out to have important mistakes. This led him to think that the current practice of believing in a result if enough other experts believe in it was no longer sustainable.

What is most amazing, to me, is how academics are utterly convinced that their own work is beyond reproach. When you ask, “have you had a competing lab reproduce your experiments?”, the answer, invariably, is that it is unnecessary. Yet even mathematicians recognize that they have a serious problem avoiding mistakes. Of all scholars, mathematicians should have the least trouble in this respect. Nobody “cheats” in mathematics, as far as mathematical truth is concerned. Truth is not an ambiguous concept in mathematics: you are right or you are wrong. Yet leading mathematicians are growing concerned that “truth” is hard to assess.

This week, the Nobel prizes for 2017 were awarded. No woman received a Nobel prize. At a glance, it looks like Caucasian scholars dominate. No Japanese, Chinese or Korean researcher that I can see. On a related note, Japan seems to be losing its status as a research power. (So that we are clear, I do not believe that Caucasian men make better researchers as far as genetics is concerned.)

One of the winners of the Nobel prize in medicine, Jeffrey Hall, is quite critical of the research establishment:

I can’t help feel that some of these ‘stars’ have not really earned their status. I wonder whether certain such anointees are ‘famous because they’re famous.’ So what? Here’s what: they receive massive amounts of support for their research, absorbing funds that might be better used by others. As an example, one would-be star boasted to me that he’d never send a paper from his lab to anywhere but Nature, Cell, or Science. These submissions always get a foot in the door, at least. And they are nearly always published in one of those magazines — where, when you see something you know about, you realize that it’s not always so great.

Authorea has published a list of eight research papers that were initially rejected, but which ended up being rewarded by a Nobel prize. This is an illustration of the fact that it is very difficult for even the best experts to recognize the significance of some work as it happens.

It appears that under the Trump presidency, the Food and Drug Administration (FDA) has been approving new drugs at twice the “normal” rate.

Google will commercialize headphones that should allow you to understand (through just-in-time translation) over 40 different languages. The Pixel Buds will sell for under $200.

My wife asked me, the other day, whether people in China used hyperlinks made of Chinese characters. I assumed so. It turns out that the answer involves something called punycode which is a way to encode arbitrary characters as ASCII (English-only) characters.

Breast cancer is less lethal:

From 1989 to 2015, breast cancer death rates decreased by 39%, which translates to 322,600 averted breast cancer deaths in the United States.

To be clear, we are still very far from having a breast-cancer cure, let alone a cancer cure.

On a related note, over half of the new cancer drugs approved in recent years do not improve your odds of surviving cancer. What happens, apparently, is that drugs are approved on the basis of narrow measures that may or may not translate into concrete health benefits.

How likely is a medical professional to prescribe unnecessary therapies? More so than we’d like:

We find that the patient receives an overtreatment recommendation in more than every fourth visit.

One of the effects of aging is that our telomeres get shorter. With every cell division, this non-coding piece of your DNA gets shorter and shorter. At the margin, this may affect the ability of your tissues to regenerate. TRF1 protects your telomeres, and it appears that having more of it could be helpful:

We further show that increasing TRF1 expression in both adult (1-year-old) and old (2-year-old) mice using gene therapy can delay age-associated pathologies.

A common theme in politics these days is “inequality”: some people are getting richer while others are not. In turn, this inequality can be related to a falling share of income that goes toward salaries. That is, a society where most of the wealth is distributed as salaries tends to be more “equal”. Right now, it appears that a lot of wealth goes into property values in highly desirable areas, for example. Since the poor do not own buildings in Manhattan, they do not benefit from this kind of wealth distribution. So why aren’t we getting larger salaries? Researchers from Harvard, Princeton and the London School of Economics believe that this fall could be explained by our low productivity. Of course, even if they are correct, this just leads us to another question: why aren’t we getting more productive?

In related news, some scholars from Stanford believe that research productivity is way down. Apparently, good new ideas are getting harder to find. From my corner of the world (software), this looks like an incredible claim. I cannot keep up, even superficially, with a narrow subset of the software industry. There are significant advances left and right… too much for my little brain. Speaking for myself, I certainly have no shortage of what appear to me to be good ideas. I am mostly limited by my ability to find the energy to pursue them… and by the fact that I want to spend quality time with my family. I cannot believe that researchers, many of them much smarter than I will ever be, are having a harder time coming up with new ideas.

Baboons can learn to recognize English words:

It has recently been shown that Guinea baboons can be trained to discriminate between four-letter words (e.g., TORE, WEND, BOOR, TARE, KRIS) and nonwords (e.g., EFTD, ULKH, ULNX, IMMF) simply by rewarding them for correct lexicality decisions. The number of words learned by the six baboons ranged from 81 to 307 (after some 60,000 trials), and they were reported to respond correctly to both novel words and nonwords with above chance performance.

It looks like the company that is reinventing the taxi industry, Uber, might pull out of Quebec, Canada. At issue is the requirement to undergo several days of training before one can act as a taxi driver.

Worms live longer at lower temperatures due to changes in gene expression.

My iPad Pro experiment

Years ago, I placed a bet with Greg Linden, predicting that tablets like the iPad would replace PCs. The bet did not end well for me. My own analysis is that I lost the bet primarily because I failed to foresee the market surge of expensive and (relatively) large smartphones.

There is no denying that I lost to Greg, but I don’t think that I was wrong regarding the fundamental trends. During the third quarter of 2017, Apple sold 4.3 million laptops. That’s not bad. But Apple sold 11.4 million iPads, nearly three times as many. The real story, of course, is that Apple sold over 40 million iPhones, and a large fraction of these were top-of-the-line iPhones.

For comparison, PC shipments worldwide are at around 60 million per quarter. So 11.4 million iPads is nothing to sneeze at, but it is no PC killer. About 40 million tablets are sold every quarter. The fight between PCs and tablets has not been very conclusive so far. Though tablet sales have stagnated, and even diminished, many PCs look a lot like tablets. A fair assessment is that we are currently at a draw.

This is somewhat more optimistic than Greg’s prediction from 2011:

I am not saying that tablets will not see some sales to some audience. What I am saying is that, in the next several years, the audience for tablets is limited, tablet sales will soon stall around the same level where netbook sales stalled, and the PC is under no threat from tablets.

Who even remembers what a netbook is today?

In any case, we are not yet at the point where people are dumping their PCs (or MacBooks) in favor of tablets. I would argue that people probably use PCs a lot less than they did in the past, relatively speaking, because they do more work on their smartphones. I tend to process much of my email on my smartphone.

But PCs are still around.

I’m not happy about this.

Even though I have had an iPad ever since Apple started making them, I never tried to make the switch professionally. This summer, I finally did. My 12-inch iPad Pro has been my primary machine for the last couple of months. I got an Apple keyboard as well as an Apple Pencil.

Let me establish the context a bit. I am a university professor and department chair. I have an active research program, with graduate students. I write code using various programming languages. I write papers using LaTeX. I have to do lots of random office work.

Here are my thoughts so far:

  • It works. You can use an iPad as a primary machine. This was not at all obvious to me when I started out.
  • It creates envy. Several people who have watched me work have decided to give it a try. This is a sign that I look productive and happy on my tablet.
  • Some of my worst experiences have been with email. Simply put, I cannot quickly select a large chunk of text (e.g., 500 lines) and delete it. Selecting text on an iPad is infuriating. It works well when you want to select a word or two, but there seems to be no way to select a large chunk. Why is this a problem? Because when replying to emails, I keep my answers short, so I want to delete most if not all of the original message and quote just the necessary passage.
  • The pain of selecting text affects pretty much every application where text is involved.
  • Copy and paste is unnecessarily difficult. I don’t know how to select just the text without formatting. Sometimes I end up selecting the link instead of the text related to the link.
  • Microsoft Office works on an iPad. I am not a heavy user, but for what I need to do, it is fine.
  • Apple has its own word processor called “Pages”. It works, but it won’t spell check in French (it does on a MacBook).
  • The hardware is nice, but more finicky than it should be. Both the Pencil and the foldable keyboard tend to disconnect. The keyboard can be frustrating: it is held by magnets, and if you move the iPad around, the magnets might shift and the keyboard might disconnect. It is not clear how to reconnect it systematically, so I end up “fiddling with it”. It is not as bad as I make it sound, and I don’t think anyone has ever witnessed my troubles, but if they did, they would see a very frustrated man.
  • Cloud computing is your friend. Dropbox, iCloud, Google Drive…
  • Reviewing PDF documents is nice. I use an app called “PDF Expert” which allows me to comment on the documents very handily.
  • While the iPad can multitask, I have not yet figured out how to put this ability to good use.
  • My employer expects me to use Java applets to fill out some forms. I can barely do it with my MacBook. It is a no-go on the iPad.
  • Blogging works. I am using my iPad right now. However, it is not obvious how to do grammar and spell checks while typing within a web app. So I am making more mistakes than I should.
  • LaTeX works. I am using an app called TeXPad. It cannot build my documents locally, but it works once I tell it to use a cloud engine. I am also thinking that Overleaf could be a solution. However, neither TeXPad on iOS nor Overleaf provides a great experience when using LaTeX. To be fair, LaTeX is hardly user friendly in the best of times.
  • iPads are not designed as developer machines. If you want to experiment with Swift, it is fairly easy to create “Swift playgrounds”, but that’s mostly for the purpose of learning the language. However, I am getting a good experience using ssh to connect to remote Linux boxes.

So my workflow is currently something of a hybrid. I have a cheap MacBook (not a MacBook Pro!) that I use maybe 10% of the time. The rest of the time, I rely on the iPad.

Why do this? It is an experiment. So far, it has been instructive.

So what are the benefits? My impression is that replacing a laptop with a tablet makes me more productive at some things. For example, I spend more time reading on my iPad than I would on my laptop.

Stream VByte: first independent assessment

In an earlier post, I announced Stream VByte, claiming that it was very fast. Our paper was peer reviewed (Information Processing Letters) and we shared our code.

Still, as Feynman said, science is the belief in the ignorance of experts. It is not because I am an expert that you should trust me.

There is a high-quality C++ framework to build search engines called Trinity. Its author, Mark Papadakis, decided to take Stream VByte out for a spin to see how well it does. Here is what Mark had to say:

As you can see, Stream VByte is over 30% faster than the second fastest, FastPFOR in decoding, where it matters the most, and also the fastest among the 3 codecs in encoding (though not by much). On the flip side, the index generated is larger than the other two codecs, though not by much (17% or so larger than the smallest index generated when FastPFOR is selected).
This is quite an impressive improvement in terms of query execution time, which is almost entirely dominated by postings list access time (i.e integers decoding speed).

I was pleased that Mark found the encoding to be fast: we have not optimized this part of the implementation at all… because everyone keeps telling me that encoding speed is irrelevant. So it is “accidentally fast”. It should be possible to make it much, much faster.

Mark points out that Stream VByte does not compress quite as well, in terms of compression ratios, as other competitive alternatives. That’s to be expected because Stream VByte is a byte-oriented format, not a bit-oriented format. However, Stream VByte really shines with speed and engineering convenience.

How smart is Swift with abstraction? A trivial experiment with protocols

Apple’s Swift programming language has the notion of “protocol” which is similar to an interface in Java or Go. So we can define a protocol that has a single function.

public protocol Getter {
  func get(_ index : Int) -> Int
}

We need to define at least one class that has the prescribed “get” method. For good measure, I will define two of them.

public final class Trivial1 : Getter {
  public func get(_ index : Int) -> Int {
    return 1
  }
}
public final class Trivial7 : Getter {
  public func get(_ index : Int) -> Int {
    return 7
  }
}

If you know Java, this should look very familiar.

Then we can define functions that operate on the new protocol. Let us sum 100 values:

public func sum(_ g : Getter) -> Int {
  var s = 0
  for i in 1...100 {
     s += g.get(i)
  }
  return s
}

Clearly, there are possible optimizations with the simple implementations I have designed. Is Swift smart enough?

Let us put it to the test:

public func sum17(_ g1 : Trivial1, _ g7 : Trivial7) -> Int {
  return sum(g1) + sum(g7)
}

This compiles down to

  mov eax, 800

That is, Swift is smart enough to figure out the answer at compile time.

To be clear, this is exactly what you want to happen. Anything less would be disappointing. This is no better than C and C++.

Still, we should never take anything for granted as programmers.

What is nice, also, is that you can verify this answer with a trivial script:

wget https://raw.githubusercontent.com/lemire/SwiftDisas/master/swiftdisas.py
swiftc -O prot.swift
python swiftdisas.py prot sum17

Compared to Java, where code disassembly requires praying to the gods under a full moon, this is really nice.

Science and Technology links (September 29th, 2017)

Elon Musk presented his plans for space exploration. It is pretty close to science fiction (right out of Star Trek) with the exception that Musk has a track record of getting things done (e.g., Tesla).

In the US, women are doing much better in college than men:

Women earned majority of doctoral degrees in 2016 for 8th straight year and outnumber men in grad school 135 to 100.

Things are even more uneven in the Middle East:

At the University of Jordan, the country’s largest university, women outnumber men by a ratio of two to one—and earn higher grades in math, engineering, computer-information systems, and a range of other subjects.

Things are pretty grim on the education front in my home town:

Last year alone, only 40.6 percent of the boys (…) at the French-language Commission scolaire de Montréal graduated in five years.

Our local (Montreal) deep-learning star, Yoshua Bengio, called for the breakup or regulation of tech leaders:

We need to create a more level playing field for people and companies, (…) AI is a technology that naturally lends itself to a winner take all, the country and company that dominates the technology will gain more power with time. More data and a larger customer base give you an advantage that is hard to dislodge. Scientists want to go to the best places. The company with the best research labs will attract the best talent. It becomes a concentration of wealth and power.

Somewhat ironically, Bengio has created a successful start-up called Element AI in Montreal that is very generously funded.

There is a widely held view that “peer-reviewed research”, the kind that appears in scientific journals, can be trusted. Should you make serious decisions on the assumption that research is correct? You should not. For anything that matters, you should recheck everything:

Empirical social science research—or at least non-experimental social science research—should not be taken at face value. Among three dozen studies I reviewed, I obtained or reconstructed the data and code for eight. Replication and reanalysis revealed significant methodological concerns in seven and led to major reinterpretations of four. These studies endured much tougher scrutiny from me than they did from peer reviewers in order to make it into academic journals. Yet given the stakes in lives and dollars, the added scrutiny was worth it. So from the point of view of decision-makers who rely on academic research, today’s peer review processes fall well short of the optimal.

What people fail to appreciate, even people who should know better, is that peer review mostly involves reading (sometimes quickly) what an author has written on a topic. It is sometimes enough to catch the most obvious errors. However, there is no way that I, as a referee, can catch methodological errors deep down in the data processing. And even if I could verify the results, I cannot very well fight against people who are not being entirely honest.

Processor speeds increase over time. Though it is true that the likes of Intel have had trouble easily milking more performance out of new engineering processes, there are regular gains. Certainly, Apple has had no trouble making iPhones faster, year after year. But what about memory? Memory is also getting faster. Memory built to the new DDR4 standard can be about 50% faster than the previous standard, DDR3. That’s pretty good. However, this gain is misleading because it only captures “throughput” (how much data you can read per unit of time). It is true that planes are much faster than cars, but it takes a long time to get on a plane, and planes don’t necessarily land or take off every minute. So planes are fast, but they have high latency. Memory is getting faster in terms of throughput, but latency has remained a constant:

At this point in the discussion, we need to note that when we say “true latencies are remaining roughly the same,” we mean that from DDR3-1333 to DDR4-2666 (the span of modern memory), true latencies started at 13.5ns and returned to 13.5ns. While there are several instances in this range where true latencies increased, the gains have been by fractions of a nanosecond. In this same span, speeds have increased by over 1,300 MT/s, effectively offsetting any trace latency gains.

This means that though we can move a lot of data around, the minimal delay between the time when you request the data and the time when you get it remains the same. You can request more data than before, but if your software does not plan ahead, it will still sit idle, waiting.
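
To make the quoted numbers concrete, here is a back-of-the-envelope sketch in C. The CAS latencies below (9 cycles for DDR3-1333, 18 cycles for DDR4-2666) are typical values that I am assuming for illustration, not figures taken from the article:

#include <stdio.h>

// Back-of-the-envelope check: true latency (in ns) is the CAS latency
// (in memory-clock cycles) times the clock period. For DDR memory, the
// transfer rate in MT/s is twice the clock frequency in MHz, so the
// period in nanoseconds is 2000 divided by the transfer rate.
double true_latency_ns(double cas_latency, double megatransfers_per_s) {
  return cas_latency * 2000.0 / megatransfers_per_s;
}

int main(void) {
  // typical CAS latencies, assumed for illustration
  printf("DDR3-1333, CL9 : %.1f ns\n", true_latency_ns(9, 1333));  // about 13.5 ns
  printf("DDR4-2666, CL18: %.1f ns\n", true_latency_ns(18, 2666)); // about 13.5 ns
  return 0;
}

Twice the transfer rate, twice the CAS latency: the product stays at roughly 13.5 ns.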

Version 9 of the Java language has just been released. A lot of engineering has apparently gone into it, but I see little in the way of new features I care about. However, I can happily report that the applet API has been deprecated. This means that by the next release we might finally get rid of Java applets. Meanwhile, my employer still requires me to manage my budget and place orders using a Java applet reliant on some Oracle technology. I find it encouraging to learn that even Oracle admits that Java applets are a thing of the past, better left in the past. (Disclosure: for a time, I paid my bills designing Java applets for medical imaging! It was super fun!)

Stream VByte: breaking new speed records for integer compression

In many information systems, we work with arrays of integers. For example, maybe you need to keep track of which records in a database contain a given value. This sort of mapping can be expressed as an array of integers.

These arrays can end up taking up a large fraction of your memory or disk space. Though storage is cheap, having too much data can hurt performance. And, at the margin, storage is not always cheap: disks fail and need to be replaced, a labor-intensive process. Adding more hardware to your system adds to the running costs and to the complexity.

So we often want to compress this data. However, our primary motivation for compression is often performance and engineering, rather than saving space per se. We want compression techniques that will enhance our overall performance and reduce cost… Getting the absolute best compression can be a net negative if you end up with a slower system.

One should have some notion of computer architecture when dealing with these problems. We have disks and networks from which data flows into memory (RAM) and the CPU cache. Commonly used data is typically stored in memory (RAM) for fast reuse, but it needs to be brought into the CPU cache before it can be processed. Moving data between memory (RAM) and the CPU cache is a performance bottleneck, so you want to do as little of it as you can.

If you are doing data processing and your data is constantly moving back and forth between memory and cache, you are probably going to end up in a state where your CPU wastes most of its cycles waiting for data. You do not want that.

That is where compression can help: it can be faster to bring data from RAM in compressed form and uncompress it in the cache than it can be to just copy uncompressed data from RAM to CPU cache.

So, we are not uncompressing from the disk to the disk, or even from the RAM to the RAM. In an ideal world, the important data is in RAM and we want to bring it to the CPU cache where it will be uncompressed and processed. At no point do you want to bring back the uncompressed data to RAM, as that would be wasteful.

We also prefer simple data structures and simple algorithms. This gives us more room, as engineers, to write optimized code that operates directly on the data, without relying on black-box functions.

So how do you compress integers efficiently? Most people think of something like zip or gzip. If they are somewhat more sophisticated, they think of Snappy or Zstandard.

This all works, but these techniques do not know that you are compressing integers. So you end up executing a lot of generic code and getting at least an order of magnitude less performance than you could.

If you are really performance conscious, you are probably going to use specialized compression techniques dedicated to integers.

These techniques work on the assumption that even though programming languages dedicate a fixed 32 bits or a fixed 64 bits to each integer, there are many instances where far fewer bits would suffice. That is, you can often assume that your arrays of integers contain only small integers.

What? I hear you say that you cannot assume that your identifiers are all small integers? Ah. That’s where a nifty trick comes in: though you cannot always assume that your integers are small, you can often assume that they are sorted. And the successive differences between these sorted integers are often small.

So you can either compress arrays containing small integers or compress the successive differences between sorted integers. From the successive differences, you can reconstruct the original integers by computing a prefix sum: if you have y[0] = x[0], y[1] = x[1] - x[0], y[2] = x[2] - x[1],... you can compute y[0], y[0] + y[1], y[0] + y[1] + y[2],... to recover x[0], x[1], x[2],... There are fast techniques to compute a prefix sum and, better, you can embed the prefix sum within the decoding routines so that it is all done within registers.
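
Here is a minimal scalar sketch of this idea in C (an illustration, not the actual library code; the fast implementations vectorize these loops and fuse the prefix sum with the decoder, but the logic is the same):

#include <stddef.h>
#include <stdint.h>

// keep x[0]; replace every other value by its difference with the previous value
void delta_encode(uint32_t *x, size_t n) {
  if (n == 0) return;
  for (size_t i = n - 1; i > 0; i--) {
    x[i] -= x[i - 1];
  }
}

// prefix sum: recover the original sorted values from the differences
void prefix_sum(uint32_t *x, size_t n) {
  for (size_t i = 1; i < n; i++) {
    x[i] += x[i - 1];
  }
}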

So how do you compress and uncompress integers? The most widespread technique is probably VByte (also known as VarInt, varint, escaping and so forth). It is what search engines like Lucene rely upon. It is also part of Google’s Protocol Buffers. And it is natively supported in the Go programming language under the name varint.

VByte is a “byte-oriented scheme”. What it does is to first try to store a given integer value using a single byte. If it cannot, it uses two bytes, and so forth. Within each byte, we reserve the most significant bit as a control bit: the bit is set to 0 when the byte is the last one in a coded integer, otherwise, it is set to 1.

This is pretty good, and it can be super fast, as long as your integers all fit in one byte. Once you start having integers that require different numbers of bytes, the performance goes down fast, due to branch mispredictions.
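
As a rough sketch, here is what a scalar VByte encoder and decoder for a single integer might look like in C, using the convention just described (a set most significant bit means that more bytes follow). This is an illustration, not the code of any particular library:

#include <stddef.h>
#include <stdint.h>

// write 'value' using as few bytes as possible; return the number of bytes written
size_t vbyte_encode_one(uint32_t value, uint8_t *out) {
  size_t written = 0;
  while (value >= 128) {
    out[written++] = (uint8_t)((value & 127) | 128); // control bit set: more bytes follow
    value >>= 7;
  }
  out[written++] = (uint8_t)value; // last byte: control bit is zero
  return written;
}

// read one value; return the number of bytes consumed
size_t vbyte_decode_one(const uint8_t *in, uint32_t *value) {
  uint32_t result = 0;
  int shift = 0;
  size_t consumed = 0;
  uint8_t byte;
  do {
    byte = in[consumed++];
    result |= (uint32_t)(byte & 127) << shift;
    shift += 7;
  } while (byte & 128);
  *value = result;
  return consumed;
}

The do/while loop in the decoder is precisely the branch that becomes hard to predict when the byte lengths vary from one integer to the next.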

So what do we do?

  • According to Google’s top engineer, Jeff Dean, Google replaced VByte with something called varint-GB. The idea is simply to pack integers in blocks of four. In this manner, you can hope to drastically reduce the number of mispredicted branches.
  • Amazon’s Stepanov (who also invented C++’s STL) and his collaborators looked at vectorizing VByte. The idea is to replace the scalar instructions with vector instructions that operate on several integers at once. All modern processors have vector instructions, including the processor in your smartphone, so it makes sense.

    They failed to get good performance out of a vectorized VByte. They also failed to get good performance out of a vectorized version of varint-GB, but they did very well with their own alternative (varint-G8IU), which they patented. What varint-G8IU does is to pack as many integers as possible into a block having a fixed size (e.g., 8 bytes).

    Their varint-G8IU works well because it is super convenient for the processor to decode a fixed number of bytes at each iteration.

  • One of Indeed.com’s top engineers, Jeff Plaisance, wondered whether we could vectorize VByte efficiently despite what Stepanov et al. had found. The benefit would be super fast decoding without having to change the data format.

    That turns out to be non-trivial engineering, but it can be done. I co-authored a scheme called Masked VByte with Jeff Plaisance and Nathan Kurz. It is not as fast as Amazon’s patented varint-G8IU, but it is much faster than good old VByte.

This left us at a stage where the fastest byte-oriented integer compression scheme was Stepanov et al.’s varint-G8IU algorithm. I found it tempting to ask whether we could do better. One of my motivations was that, varint-G8IU being patented, it cannot be safely used in many open-source projects. Yet many big-data applications rely on open-source software (e.g., Apache Spark). My friend Nathan Kurz came up with a pretty good design, which we call Stream VByte, that is not only patent-free but also faster.

We first have to understand why Stepanov et al. got poor performance out of a vectorized version of Jeff Dean’s varint-GB. The varint-GB format stores data in chunks of four compressed integers, but these chunks have a variable size: each chunk is made of one control byte followed by a variable number of “data bytes”. Thus you have to nearly fully decode one chunk before you can even know where the next chunk starts. This creates a bad data dependency that makes it very hard to accelerate varint-GB with fancy instructions.

Our processors are superscalar, so they can do many things at once (many instructions per CPU cycle). But for this to work, we need to keep them fed. Data dependencies stop that. They stall the processors.

So what we did was to reorder the data. Instead of interleaving the data bytes and the control bytes, Stream VByte stores the control bytes continuously, and the data bytes in a separate location. This means that you can decode more than one control byte at a time if needed. This helps keep the processor fully occupied.

So how well do we do? Let me quote from our paper:

We show that Stream VByte decoding can be up to twice as fast as varint-G8IU decoding over real data sets. In this sense, Stream VByte establishes new speed records for byte-oriented integer compression, at times exceeding the speed of the memcpy function. On a 3.4 GHz Haswell processor, it decodes more than 4 billion differentially-coded integers per second from RAM to L1 cache.

The paper was also co-authored with Christoph Rupp, the architect of the fast upscaledb key-value store.

I think our paper is nice and readable. It has been accepted for publication in Information Processing Letters, a pretty good journal for this sort of short communications.

What about the code? We have a software package written in C under a liberal license.

I should mention that unlike Amazon, we did not patent our approach. We want you to use this in your commercial and open-source projects!

How complicated is the code? Here is the gist of it using SSE intrinsics:

uint8_t C = lengthTable[control];   // number of compressed bytes consumed, between 4 and 16
__m128i Data = _mm_loadu_si128((__m128i *) databytes);
__m128i Shuf = _mm_loadu_si128((__m128i *) shuffleTable[control]);
Data = _mm_shuffle_epi8(Data, Shuf); // final decoded data
databytes += C;

If you are not familiar with intrinsics, this code might look intimidating, but it is really just a few simple and cheap instructions.

I should stress that if you are stuck with a machine without vector instructions, or with a programming language that does not support vector instructions, it is not hard to encode and decode the data reasonably efficiently with ordinary (scalar) code. You are never going to get the same performance as with SIMD instructions, but it won’t be terrible.
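
For illustration, here is what such a scalar decoder might look like in C. It assumes the layout described above: one control byte per group of four integers, two bits per integer giving the byte length minus one, the first integer described by the two low-order bits, and little-endian data bytes. The actual library is more careful (it handles leftover integers and other architectures) and much faster thanks to the SIMD shuffle:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

// decode 'count' integers (assumed to be a multiple of 4) from a stream of
// control bytes and a separate stream of data bytes; assumes a little-endian CPU
void streamvbyte_decode_scalar(const uint8_t *control, const uint8_t *data,
                               uint32_t *out, size_t count) {
  for (size_t i = 0; i < count; i += 4) {
    uint8_t key = control[i / 4];
    for (size_t j = 0; j < 4; j++) {
      size_t length = (size_t)(key & 3) + 1; // 1, 2, 3 or 4 data bytes
      uint32_t value = 0;
      memcpy(&value, data, length);          // copy the low-order bytes
      out[i + j] = value;
      data += length;
      key >>= 2;                             // next 2-bit descriptor
    }
  }
}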

Future work:

  • The code is currently limited to x64/x86 processors, but we are looking for help to port it over to ARM processors. That should be an interesting project! (Resolved: the library now fully supports ARM processors, including NEON optimizations.)
  • Like varint-GB and varint-G8IU, we are assuming that the uncompressed integers fit in 32-bit words. That’s quite practical, but it would be interesting to extend it to 64-bit integers.
  • We have C code that can be easily used within C++ code, but it would be tempting to extend support to other programming languages like Rust, Swift and so forth. (Update: There is now a Rust port as well as a Go port.)
  • When your integers are signed, even if they are small, our approach does not work as such. But it can be made to work with zigzag encoding.
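
Zigzag encoding is not specific to Stream VByte: it is a standard trick (used by Google’s Protocol Buffers, among others) that interleaves negative and positive values so that numbers of small magnitude become small unsigned integers. A minimal sketch in C:

#include <stdint.h>

// 0, -1, 1, -2, 2, ... are mapped to 0, 1, 2, 3, 4, ...
uint32_t zigzag_encode(int32_t value) {
  return ((uint32_t)value << 1) ^ (value < 0 ? 0xFFFFFFFFu : 0u);
}

int32_t zigzag_decode(uint32_t value) {
  uint32_t magnitude = value >> 1;
  return (value & 1) ? -(int32_t)magnitude - 1 : (int32_t)magnitude;
}

Once zigzag-encoded, the values are plain small unsigned integers and can be compressed like any others.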

Runtime constants: Swift

In an earlier post, I reported that if you have a variable in C++ that the compiler can determine to be effectively constant (that is, in practice, its value never changes), then the C++ optimizing compiler will treat it as if it were an actual constant. The same thing is not true in a language like Go.

Some readers asked about Swift, Apple’s new language.

Let us consider a fair comparison:

fileprivate var length = 10

func sum(array : [Int]) -> Int {
    var answer = 0
    for i in 0..<length {
        answer = answer &+ array[i]
    }
    return answer
}

So we have a function that depends on a variable length that is very specifically not defined as a constant. Swift, like Go and C++, has a notion of a constant: I could have written let length = 10. However, I declared it with the var keyword, which means that the compiler must assume that I can modify the value of length.

Yet I declared length as fileprivate, which is the closest thing to C++’s static: it means that if the variable is accessed at all, it must be accessed within the current file (meaning the actual text file containing the Swift code). Consequently, the Swift compiler can examine the code contained in the file to check whether length is ever modified. In my case, it never is.

So what happens?

Like in C++, it gets compiled to a tight set of additions:

add rax, qword ptr [rdi + 32]
add rax, qword ptr [rdi + 48]
add rax, qword ptr [rdi + 56]
add rax, qword ptr [rdi + 64]
add rax, qword ptr [rdi + 72]
add rax, qword ptr [rdi + 80]
add rax, qword ptr [rdi + 88]
add rax, qword ptr [rdi + 96]
add rax, qword ptr [rdi + 104]

What this means is that the Swift compiler correctly deduced that length is a constant. You do have to turn on optimizations (-O), of course.

But is Swift safe?

Let us do something bad:

var x  = [1, 2, 3, 4, 5, 6, 7, 8, 9];

print(sum(array:x))

My array is too short. In C++, this would lead the sum function to read memory out of bounds. If you compiled your code just right, you might get a warning, but the C++ language itself does not do anything to help you.

In Swift, you get the following:

$ swiftc -O mytest.swift
$ ./mytest
(crash)
$

That is, by default Swift relies on bounds checking. This means that your program will crash if you try to access memory out of bounds.

It is important to note that I do mean “crash”. It will not throw a nice exception. Your program just dies a horrible death.

Still, Swift can be used without bounds checking:

$ swiftc -Ounchecked mytest.swift
$ ./mytest
45

I got “45” but you could possibly get different values.

That is, with normal optimization levels (typical of a release), you have bound checking and crashing programs. With the crazy optimization level (-Ounchecked), you get no safety.

So the Swift compiler is a state-of-the-art optimizing compiler, at least as far as deducing runtime constants is concerned. That’s not surprising, but it is nice to know.

Runtime constants: Go vs. C++

When programming in C++, you can rely on your compiler to do some interesting optimizations. Let us look at this simple code that sums up the values in an array (passed by pointer):

static int length = 10;

uint64_t sum(uint64_t * a) {
  uint64_t s = 0;
  for(int k = 0; k < length; k++)
      s += a[k];
  return s;
}

This code never says that the length variable is a constant. However, the static keyword restricts the scope of the variable. That is, the compiler can examine the “compilation unit”, and if no part of the code modifies the variable, it can be certain that no legal C++ code elsewhere can modify it, because outside code simply does not know about the variable length.

Incredibly, maybe, a compiler like GNU GCC or LLVM’s clang has no problem generating very fast assembly code that makes use of the fact that length can be determined to be a constant:

add     rax, qword ptr [rdi]
add     rax, qword ptr [rdi + 16]
add     rax, qword ptr [rdi + 24]
add     rax, qword ptr [rdi + 32]
add     rax, qword ptr [rdi + 40]
add     rax, qword ptr [rdi + 48]
add     rax, qword ptr [rdi + 56]
add     rax, qword ptr [rdi + 64]
add     rax, qword ptr [rdi + 72]

That is, the compiler recognizes that the function is just the summation of 10 values. There is no need to declare the variable to be a constant.

Of course, for this to happen, you have to turn on optimization flags.

What about Go, the programming language developed within Google and strongly inspired by C?

We can try to do something similar…

package main

import "fmt"

var length int = 10

func Sum(x []int) int {
    totalx := 0
    for i := 0; i < length; i++ {
        totalx += x[i]
    }
    return totalx
}

func main() {
    var x = make([]int, length)
    fmt.Println(Sum(x))
}

From this code, the Go compiler has enough knowledge to figure out that the loop is over 10 elements. In fact, it could optimize away the main function as just printing out zero. What does it do?

If you examine the assembly (go build && go tool objdump -S -s Sum gotest), you notice that Go is unable or unwilling to do this optimization. It compiles the code “naively” with lots of safety checks.

The Go wiki has a list of compiler optimizations, and there is no such optimization on the list. The list of optimizations might not be complete, but we should not be surprised that the Go compiler can’t make this optimization.

We could help Go somewhat by declaring length to be a constant using the const keyword. We could then expect a slightly more efficient loop. Even so, because Go does not do loop unrolling, the result is not going to be anywhere close to what a C++ compiler can do.

It is worth pointing out that Go is ten years old, that it is staffed by some of the best and most experienced programmers in the business, and that it is funded by one of the richest companies in the world. It is, reportedly, a dominant language in cloud computing.

(Before Go advocates come down hard on me, I should add that I like Go. I program in Go. But Go is not C++ or C.)

Benchmarking algorithms to visit all values in an array in random order

In an earlier post, I described how to visit all values in an array in a pseudo-random order quite fast. The idea is simple: given an array of size n, pick a value a that is coprime with n. Then, for any value of b, the expression (a x + b) modulo n visits all values in [0,n) exactly once as x goes from 0 to n-1. The word “coprime” is a fancy way of saying that the two numbers do not have a non-trivial (different from 1) common divisor.

I published my Java code.

It is nice because you can visit all values while using very little memory and without changing the order of the values in your array.

After some trivial optimization, the resulting code is as simple as this application of the ternary operator

value = (value + prime < n) ? value + prime : value + prime - n;

where we initialize value to a random number in [0,n), and we set prime to be an adequately chosen random number that is coprime with n and not too small.

(The ternary operator x ? y : z just means “if x is true, return y, otherwise return z”.)

Though the ternary operator looks like a branch, recent compilers are likely to compile this code, when targeting recent processors, as a conditional move. While conditional moves used to be slow and generally a bad idea, they are now quite cheap: very few CPU cycles are needed.
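
Putting the pieces together, a self-contained sketch in C might look as follows. The step and the starting point would normally be chosen at random; here they are derived from n just to keep the example short, and I assume that n is small enough (below 2^31) that value + prime cannot overflow:

#include <stdint.h>

// greatest common divisor, used to check that the step is coprime with n
static uint32_t gcd(uint32_t a, uint32_t b) {
  while (b != 0) {
    uint32_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// visit every index in [0,n) exactly once, in a scrambled order,
// incrementing a counter at each visited index
// (for simplicity, this sketch assumes n is at least 3)
void visit_all(uint32_t *counts, uint32_t n) {
  uint32_t prime = n / 2 + 1;           // candidate step, smaller than n
  while (gcd(prime, n) != 1) prime++;   // make it coprime with n
  uint32_t value = n / 3;               // starting point in [0,n)
  for (uint32_t i = 0; i < n; i++) {
    counts[value] += 1;
    // same as (value + prime) % n, but typically compiled to a conditional move
    value = (value + prime < n) ? value + prime : value + prime - n;
  }
}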

Many readers insisted that I was silly to design a method that works directly in [0,n). All one needs to do, they said, is to find the smallest power of two greater than or equal to n (let us call it 2^L), visit all the values in [0,2^L), and skip over the values that fall outside [0,n) (up to half of the candidates) in the process.

We have many nice algorithms to visit all values in an interval [0,n) exactly once. One approach is to use a linear congruential generator. (The term linear congruential generator, or LCG, is a fancy way of saying that you have a recursive linear function involving a modulo reduction. There is a bit of non-trivial mathematics from the 1960s involved. Many widely used random number generators rely on an LCG.)

In the end, the code then looks something like this…

lcg = (a * lcg + 1) & (size2 - 1);
while (lcg >= size) {
  lcg = (a * lcg + 1) & (size2 - 1);
}

That is, we move to the next value within [0,2^L), and we must keep hopping to another value as long as we are outside of [0,n). In the worst case, we may need to hop n/2 times, so you cannot guarantee that this code has a constant-time complexity.

More critically, this creates mispredicted branches. Mispredicted branches can cost tens of CPU cycles!

So while it looks like the clever and natural approach, padding your sizes to the nearest power of two has potential downsides.

So what is the performance?

I rewrote my code in C because I hate benchmarking in Java. My benchmark takes values from one source array of integers and writes them, shuffled, into another array. I record the number of CPU cycles used per value. I think that the absolute minimum on an x64 processor would be one cycle per value, because you cannot store more than one word per cycle.

n         | fast ([0,n)) | LCG ([0,2^L), then [0,n))
3,500     | 4.4          | 6.9
24,500    | 4.3          | 12.9
171,500   | 4.1          | 17.6
1,200,500 | 4.3          | 23.1
8,403,500 | 4.1          | 32.1

Clearly, my weakly random approach is very fast compared to the power-of-two LCG. Interestingly, my approach is consistently fast as the arrays get larger.

What could be causing such a large difference in performance? Thankfully, Linux makes it very easy to measure the number of mispredicted branches. So we can check what happens in this respect with our fast code when the array size grows…

$ ./run-fast.sh
100
             5,756      branch-misses
1000
             5,550      branch-misses
10000
             6,163      branch-misses
100000
             6,417      branch-misses
1000000
             7,448      branch-misses
10000000
             8,154      branch-misses
100000000
            13,444      branch-misses

So the number of mispredicted branches is tiny and grows very slowly. What about the approach that consists in iterating over values in a power-of-two array, using branches to fall back within [0,n)?

$ ./run.sh
100
             6,584      branch-misses
1000
             6,481      branch-misses
10000
           315,945      branch-misses
100000
         1,724,468      branch-misses
1000000
         2,519,027      branch-misses
10000000
       374,239,031      branch-misses
100000000
     1,873,408,379      branch-misses

It is an entirely different matter. For some large values of n, the processor is unable to accurately predict branches. The performance collapses.

I should add that neither of these techniques is likely to generate unbiased random shuffles. If you work for an online casino, please don’t use these techniques to shuffle cards. They might be good enough, however, if you need to select randomly where some computing jobs should go.

I should also add that we could adapt the LCG so that it works directly in the interval [0,n). This could potentially generate better random numbers. Sadly, while there is well-established mathematics to help you find the right parameters so that your recurrence formula visits all values exactly once, I am not aware of equally convenient mathematics that you can apply to get “nice randomness”.

Science and Technology links (September 22nd, 2017)

Mass is preserved. When you lose fat, where does the fat go? It has to go somewhere. No, it is not transformed into “energy” or “heat”: neither “energy” nor “heat” has atomic mass. I asked the question on social networks and most people got the wrong answer. This YouTube video gives the correct answer.

I had been assuming that the US consumption of electricity was on the rise. I have gadgets all around me, and these use electricity, right? Actually, no: electricity usage, in absolute terms, is stagnant in the US, which means that per capita usage is down:

Economic growth in advanced economies does not require increased energy consumption. Real GDP rose 12.7% in the U.S. between 2008 and 2015. During the same time period, electric consumption declined by 0.3%.

Climate scientists badly failed to predict CO2 emissions:

Global CO2 emission intensity increased despite all major scenarios projecting a decline.

Influential climate scientists have also revised their predictions:

Computer modelling used a decade ago to predict how quickly global average temperatures would rise may have forecast too much warming, a study has found. (…) The Earth warmed more slowly than the models forecast, meaning the planet has a slightly better chance of meeting the goals set out in the Paris climate agreement, including limiting global warming to 1.5C above pre-industrial levels. (…) The study, published this week in the journal Nature Geoscience, does not play down the threat which climate change has to the environment, and maintains that major reductions in emissions must be attained.(…) But the findings indicate the danger may not be as acute as was previously thought.

So you think you should take your antibiotics even after you feel fine? No. Exposing your body to antibiotics longer than needed is counterproductive, as it helps develop antibiotic resistance. Doctor Brad Spellberg is the chief medical officer for Los Angeles County, and he writes:

There are some chronic infections, such as tuberculosis, where you do indeed have to take long courses of antibiotics, not to prevent resistance but rather to cure the infection. But for most acute bacterial infections, short courses of antibiotics result in equivalent cure rates and with less chance of causing the emergence of antibiotic resistance among the bacteria in and on your body.

Llewelyn et al. (2017) write:

However, the idea that stopping antibiotic treatment early encourages antibiotic resistance is not supported by evidence, while taking antibiotics for longer than necessary increases the risk of resistance.

(Source: HackerNews).

Uber has taken the world by storm, using a small mobile app to allow people to either offer cab services or call a cab, without the infrastructure that normally supports cab services. London will not renew Uber’s license. The authorities fear that Uber is a threat to Londoners’ safety and security.

Human beings’ death rate follows a Gompertz law, which means that the probability that you will die goes up exponentially, year after year. This explains why, even though there are billions of us, none of us gets to live beyond 120 years of age. Not all living beings follow a Gompertz law. Plants do not. However, wooden utility poles do follow a Gompertz law, just like us.
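
To see what exponential growth in mortality does, here is a toy computation in C. The baseline hazard and the doubling time are rough assumptions for illustration only (mortality roughly doubling every eight years), not fitted actuarial values:

#include <math.h>
#include <stdio.h>

int main(void) {
  double hazard = 0.0005;      // assumed probability of dying within the year at age 30
  double doubling_years = 8.0; // assumed mortality doubling time
  double survival = 1.0;
  for (int age = 30; age <= 120; age++) {
    double p_death = hazard * pow(2.0, (age - 30) / doubling_years);
    if (p_death > 1.0) p_death = 1.0;
    survival *= 1.0 - p_death;
    if (age % 30 == 0) {
      printf("probability of reaching age %d: %g\n", age, survival);
    }
  }
  return 0;
}

With any parameters in this ballpark, the probability of reaching 120 is effectively zero, no matter how many billions of us there are.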

P. D. Mangan reports that exercise helps keep cancer away:

Exercise prevents cancer (…) A recent meta-analysis found up to 42% less risk for 10 different cancers, including esophageal, liver, lung, kidney, and many other cancers. (…) An interesting question is how exercise prevents cancer, and some recent research sheds light on this. (…) Fat tissue generates cytokines that promote the proliferation of cancer cells, and physical activity diminishes or abolishes the effect, which is dose-dependent, i.e. more exercise means less cancer promoting cytokines. (…) In animals (mice) that were implanted with tumor cells, voluntary running reduced tumor growth by over 60%. The researchers believe that exercise mobilized natural killer (NK) cells, which attack cancer.