How many reversible integer operations do you know?

Most operations on a computer are not reversible… meaning that once done, you can never go back. For example, if you divide integers by 2 to get a new integer, some information is lost (whether the original number was odd or even). With fixed-size arithmetic, multiplying by two is also irreversible because you lose the value of the most significant bit.

Let us consider fixed-size integers (say, 32-bit integers). We want functions that take one fixed-size integer as input and output another fixed-size integer.

How many reversible operations do we know?

  1. Trivially, you can add or subtract by a fixed quantity. To reverse the operation, just flip the sign or switch from add to subtract.
  2. You can compute the exclusive or (XOR) with a fixed quantity. This operation is its own inverse.
  3. You can multiply by an odd integer. You’d think that reversing such a multiplication could be accomplished by a simple integer division, but that is not the case. Still, it is reversible. By extension, the carryless (or polynomial) multiplication supported by modern processors can also be reversible.
  4. You can rotate the bits right or left using the ror or rol instructions on an Intel processor, or with a couple of shifts such as ((x >>> -b) | (x << b)) or ((x << -b) | (x >>> b)) in Java. To reverse, just rotate the other way. If you care about signed integers, there is an interesting variation that is also invertible: the "signed" rotate (defined as ((x >> -b) | (x << b)) in Java), which propagates the sign bit of the two's complement encoding.
  5. You can XOR the rotations of a value as long as you have an odd number of them.
  6. You can compute the addition of a value with its shifts (e.g., x + ( x << a) ). This is somewhat equivalent to multiplication by an odd integer.
  7. You can compute the XOR of a value with its shifts (e.g., x ^ ( x >> a) or x ^ ( x << a) ). This is somewhat equivalent to a carryless (or polynomial) multiplication.
  8. You can reverse the bytes of an integer (bswap on Intel processors). This function is its own inverse. You can also reverse the order of the bits (rbit on ARM processors).
  9. (New!) Jukka Suomela points out that you can do bit interleaving (e.g., interleave the least significant 16 bits with most significant 16 bits) with instructions such as pdep on Intel processors. You can also compute the lexicographically-next-bit permutation.

You can then compose these operations, generating new reversible operations.

Pedantic note: some of these operations are not reversible on some hardware and in some programming languages. For example, signed integer overflows are undefined in C and C++.

Let us talk about the Luddite problem…

This morning I woke up to an interview on the radio (yes, I still have a radio somehow) with pharmacists who try to fulfill prescriptions by mail. This is 2016. I can probably get almost all the chemical compounds necessary to create most drugs delivered to my home from China… but I need to go wait in line if I need penicillin. For the most part, we fulfill prescriptions the way we did in the 1970s.

Medicine is largely held back to the 1970s in general. I cannot help but gasp at doctors who proudly display their stethoscope… straight out of the nineteenth century.

Sometimes, as is the case in schools, we put on a veneer of novelty… For example, many school programs proudly include a tablet these days… Oh! Come on! Strapping a tablet onto an obsolete system is not going to modernize it.

My own school, up to a few months ago, had the following “online enrollment system”. You clicked and it gave you a PDF which you printed and sent by mail. That was “online” to them because (gulp!) the form was available as a PDF online. I can afford to mock my employer because they have since fixed this silly issue… while many other schools have not.

So, evidently, we are openly refusing to move into the future… And I think it draws support from public opinion. If most people thought it ridiculous that a stethoscope is the state of the art in 2016, most doctors would quickly upgrade, forced to do so by social pressure.

The reason doctors still have nineteenth-century stethoscopes is the same reason we keep lecturing… even if we know that a lecture is a highly inefficient pedagogical approach (barely better than nothing).

It is what people want, it is what people expect. Many people fear what is different, new… even if they won’t admit their fear. Or rather, they prefer to paint their fear as “caution”. (Because having doctors use anything but nineteenth-century stethoscopes would be incautious!)

Recently, we have made progress regarding gene editing. It is, of course, a potential cure for tragic genetic diseases. It is a potent tool against cancer… we can take immune cells, tweak them and reinject them… to fight off currently incurable cancers. It could also be part of a cure for the diseases of aging… tweak some of your cells and instead of being a weak 75-year-old who walks with a cane and can barely carry grocery bags… you can be a 75-year-old who is as strong as a 30-year-old and just as unlikely to fall…

Yet 68% of Americans are worried about gene editing. They are not excited; they are worried.

So what is going on?

There is widespread ignorance about science and technology. For example, we have people with computer chips in their heads right now to alleviate Parkinson’s symptoms. Many of us have elderly family members with cochlear implants, but most people don’t understand that these are computers hooked up to people’s brains… Sex changes are common nowadays; some surgery and hormone therapy does the trick. You do not need a vagina or even a penis to procreate… we have had in vitro fertilization since the 1970s… it is quite common today. People don’t understand the evolution of agriculture, and they know little about pesticides… They don’t bother learning about the issues; they just want reassuring labels.

But ignorance only explains so much of the fear…

There is also a very strong Luddite agenda backed by what I call “nature worship”.

There is a vast collection of religious nuts and nature worshipers who believe that human beings should not meddle in the order of things. That they have no right to do so… And if they do, bad things will happen.

It is surprising how many echoes of nature worship we find in the media. For example, in the latest Star Trek movie, one of the characters finds some biological technology that allows him to stop aging and make himself stronger. Well, he is a vampire. So we are to believe that centuries from now, human beings will cross the chasm between stars in starships… but we will be biologically identical to what we are now. We won’t be stronger. We won’t have better endurance. We will still age and need glasses. We won’t run faster or punch harder. Soldiers on the battlefield will be like us but with “phasers” instead of guns. And if someone tries to enhance himself… well… he must be some kind of evil vampire, right?

Here is the truth: Nature is only plentiful after we harness it. Almost everything we eat today was heavily modified by human beings… from the meat to every single vegetable you can think of. It is “unnatural” to drink cow milk. It is “unnatural” to have vast fields of genetically modified wheat that we turn into bread. (Hint: none of you has eaten the original wheat unmodified by human intervention.) It is natural to die of infection before the age of 5. If you want to know “what nature intended”, then maybe it is a raving bunch of starving and infected human beings…

Human technologies are only “unnatural” if you choose to define it this way. In truth, human beings emerged out of natural selection. From the very beginnings, human beings rose above other species. We cook our food (that’s “unnatural”!) which allows us to gain a lot of calories from food we could barely subsist from in the past…

Still. It was a close one. Nature is brutal. We are the only surviving human species. All others were wiped out. We were nearly wiped out… we are all descendants of a few thousand survivors.

You can be quite certain that we only survived because we made the best out of our brain and technology.

I think that what is natural for human beings is to develop ever better technology to improve the condition of mankind. And, yes, this involves creating artificial blood, putting chips in people’s brains to keep them from shaking, editing their genes so that they don’t have to live in a bubble… finding ways to regrow limbs and reverse aging.

This is the very nature of human beings… spiders create silk… we create technology… When human beings decide to forgo technology, they are like birds who decide to forgo flight…

By forgoing technology, we forgo our very existence. Without technology we would not survive. It is part of us just like silk is part of the spider.

Even knowledgeable people who are not nature worshipers often oppose technology by default. These people adopt the precautionary principle. In effect, they say that new technologies expose us to untold dangers… and that we should stick with the tried and true.

It sounds like a reasonable point of view… but it is also very dangerous even if it sounds “prudent”. Back in the 1960s, we underwent the “green revolution”. Before this technological revolution, there were very serious concerns that millions would soon starve. We simply did not have the technology to feed 6 or 7 billion people. Then we exposed rice, corn and wheat to radiation, creating mutant seeds… in effect, accelerating evolution. Today, all of us, including those who eat “organic food”, are fed from these mutant seeds. To go back, you would need to first get rid of billions of people. Now… that wasn’t the first time we used “risky” technology to save millions. At the turn of the twentieth century, we adopted chemical fertilizers… without them, millions more would have died.

So the precautionary principle would have led to the death of millions of us. Not so prudent, eh?

In the near future, there will be 10 billion people on Earth, if we develop the technology to feed them… otherwise, untold numbers will starve. Deciding that the current technology is good enough may very well condemn millions to death and starvation… Is that prudent?

Today, hundreds of thousands of people a day die of age-related diseases. Finding a cure for these diseases would be as important as the green revolution, if not more so. It involves tinkering with our fundamental metabolism. It might require gene editing, stem-cell technologies, having computer chips embedded in our bodies…

Each time we push forward, some people feel that we are getting “less natural”. They think that, surely, there is a limit… Of course, there are limits, but holding back technology has a tremendous cost. If you say “oh… let us pass on gene editing… it sounds dangerous…”, then you are condemning millions to die and suffer.

But what about this limit that we will reach, where the Earth would disintegrate, maybe because there are too many people on it…?

It helps to have knowledge. There are more forests in Europe today than there were centuries ago. In fact, as farming becomes more efficient, we are able to relinquish more land to the wild. By artificially keeping our productivity low (as with “organic agriculture”), we are stuck having to use all of the available land.

If not for immigration from poorer countries, most of the advanced countries (in Europe, North America, Japan) would have falling populations. If you are worried at all about overpopulation, then you need to look not at where technology is plentiful, but at where it is lacking: in rural Africa…

If we could, somehow, double the life expectancy of human beings, in time the populations of the advanced countries would resume their decline… because it is fecundity, and not longevity, that drives population.

But should you even be worried about overpopulation? That’s doubtful. Some of the richest places on Earth are also the most densely populated. People are not better off, healthier or smarter in the middle of nowhere.

And, though it is currently unfeasible, it seems clear that it is only a matter of time before we start populating space and the oceans. Our bodies cannot sustain life in space nor can we swim unaided at the bottom of the ocean, but our descendants will be better, stronger…

Technology is natural for human beings. Luddites are spiders refusing to use their silk.

Combine smart people with crazily hard projects

Back in college, professors assigned crazily hard problems… and I was forced to talk with my peers to figure out how they fared… and eventually to team up with some of them. The same pattern has repeated itself again and again in my life: smart people and really hard problems. That’s how I am able to learn new things…

It is a powerful mix. Smart people challenge you while simultaneously allowing you to build up your confidence as you eventually match their expertise. Hard problems force you to fail, which is how you learn.

Neither of these things is particularly scarce today. There are billions of people connected to the Internet… many of them much smarter than you… yet all of them a few bits away from you… And we have more hard problems at our disposal than ever before in the history of humanity…

I think that smart people meeting hard problems are more than just a learning opportunity… I think that’s how civilization moves forward.

Sadly, however, I fear that this potent mix is too often absent:

  • Most office work is organized so as to push hard problems, and failure, to the margin. The only kind of failures that are routinely present are “zero-sum failures”. Maybe the budget this year is fixed, and so only some projects will be funded… maybe your project will fail to secure funding. You will have failed, but not as a result of working on a difficult problem… it is “artificial failure”: you did not even try.
  • Smart people are often also pushed to the margins. Smart people are those who can make a dent in hard problems… but if the hard problems are pushed to the margin, then who needs them? When employers come to see me, a college professor, they never ask for “the smartest student I know”, they ask for “a good programmer”. They never say that they have hard and interesting problems to offer… they talk about being able to offer good salaries…

You might think that colleges would act as a safe haven… but I fear that they often do not. To succeed in academia, it helps to appear smart… but tackling hard problems is entirely optional. In fact, the secret for getting a research grant is… propose to do something you already know how to do… That’s not just a theoretical concern of mine. Just last week, I was in a meeting with a smart young professor. She explained to us, quite openly, that she intentionally left an important but difficult idea out of her latest research grant application… The implication was clearly that since it was a hard problem she did not know how to tackle, including it as a target would stack the odds against her. The net result is that grant applications are often conservative, if not openly boring. They must appear new and exciting… but the underlying problems must be easy if not already solved.

So where do you find smart people working on hard problems? I don’t think you find them in managed environments… I think you find them in the cracks…

And I think that it contributes to making the future really hard to predict. The future is often literally being built by friends in a dark garage… it does not get planned by Wall Street or the government.

Common sense in artificial intelligence… by 2026?

Lots of people want to judge machine intelligence based on human intelligence. It dates back to Turing, who proposed his eponymous Turing test: can machines “pass” as human beings? Turing, being clever, was aware of how biased this test was:

If the man were to try and pretend to be the machine he would clearly make a very poor showing. He would be given away at once by slowness and inaccuracy in arithmetic. May not machines carry out something which ought to be described as thinking but which is very different from what a man does?

I expect that we will eventually outgrow our anthropocentrism and view what machines really offer: a new kind of intelligence.

In any case, from an economics perspective, it matters a great deal whether machines can do exactly what human beings can do. Calum Chace has published a new book on this topic: The Economic Singularity: Artificial Intelligence and the Death of Capitalism. Chace’s excellent book is the latest in a stream of books hinting that we may soon all be unemployable simply because machines are better than us at most jobs.

To replace human beings at most jobs, machines need to exhibit what we intuitively call “common sense”. For example, if someone just bought a toaster… you do not try to sell them another toaster (as so many online ad systems do today).

Common sense is basic knowledge about how the world of human beings works. It is not rule-based. It is not entirely logical. It is a set of heuristics almost all human beings quickly acquire. If computers could be granted a generous measure of common sense, many believe that they could make better employees than human beings. Whatever one might think about economics, there is an interesting objective question… can machines achieve “common sense” in the near future?

It seems that Geoff Hinton, a famous computer scientist, predicted that within a decade, we would build computers with common sense. These are not computers that are smarter than all of us at all tasks. These are not computers with a soul. They are merely computers with a working knowledge of the world of human beings… computers that know our conventions, they know that stoves are hot, that people don’t usually own twelve toasters and so forth.

Chace recently placed a bet with a famous economist, Robin Hanson, that Hinton is right at 50-to-1 odds. This means that Hanson is very confident that computers will be unable to achieve common sense in the near future.

Hanson is not exactly a Luddite who believes that technology will stall. In fact, Hanson also has an excellent book, The Age of Em, which describes a world where brains have been replaced with digital computers. Our entire civilization is made of software. I have covered some of the content of Hanson’s book on my blog before… for example, Hanson believes that software grows old and becomes senile.

I think that both Hanson and Chace are very well informed on the issues, but they have different biases.

What is my own take?

The challenge for people like Chace who allude to an economic singularity where machines take over the economy… is that we have little to no evidence that such a thing is coming. For all the talk about massive unemployment coming up… the unemployment rates are really not that high. Geoff Hinton thinks that machines will soon acquire common sense… but is that an easy problem? We have no clue right now how to go about solving it. It is hard to even define it.

As for Hanson, the problem is that betting against what we can do 10 years in the future is very risky. Ten years ago, we did not have iPhones; today’s iPhone is more powerful than a PC from ten years ago. People at the beginning of the twentieth century thought that it would take a million years to get a working aeroplane, whereas it took a mere ten years…

I must say that despite the challenge, I am with Chace. At 50-to-1 odds, I would bet for the software industry. The incentive to offer common sense is great. After all, you can’t drive a car, clean a house or serve burgers without some common sense. What the deep learning craze has taught us is that it is not necessary for us to understand how the software works for the software to be effective. With enough data, enough computing power and trial and error, there is no telling what we can find!

Let us be more precise… what could we expect from software having common sense? It is hard to define because it is a collection of small pieces… all of which are easy to program individually. For example, if you are lying on the floor yelling “I’m hurt”, common sense dictates that we call emergency services… but it is possible that Apple’s Siri can already do this.

We have the Winograd Schema Challenge but it seems to be tightly tied to natural language processing… I am not sure understanding language and common sense are the same thing. For example, many human beings are illiterate and yet they can be said to have common sense.

So I offer the following “test”. Every year, new original video games come out. Most of them come with no instructions whatsoever. You start playing and you figure it out as you go… using “common sense”. So I think that if some piece of software is able to pick up a decent game from Apple’s AppStore and figure out how to play competently within minutes… without playing thousands of games… then it will have an interesting form of common sense. It is not necessary for the software to play at a “human level”. For example, it would be fine if it only played simple games at the level of a 5-year-old. The key in this test is diversity. There are a great many different games, and even when they share the same underlying mechanic, they can look quite different.

Is it fair to test software intelligence using games? I think so. Games are how we learn about the world. And, frankly, office work is not all that different from a (bad) video game.

Accelerating PHP hashing by “unoptimizing” it

Hashing is a software trick that can map strings to fixed-length integers, such as 32-bit integers. It is ubiquitous in modern software.

Languages like Java and PHP have the ability to store strings with their corresponding hash values. Still, the hash value must be computed at least once.

How much of a burden can this be? Suppose that we use 10 cycles per byte to hash a string. For a long 100-kilobyte string, that would be about a million CPU cycles. If your CPU runs at 2 GHz, you have 2 billion cycles per second. Hence, hashing your string should take no more than half a millisecond. Put another way, you can hash 2000 such strings per second.

Simon Hardy-Francis pointed out to me that this can still represent a performance bottleneck if your PHP application needs to repeatedly load large new strings.

So what does PHP use as a hash function? It fundamentally uses the Java hash function: a simple polynomial hash with an odd multiplier (and thus coprime with 2).

for (int i = 0; i < len; i++) {
  hash = 33 * hash + str[i];
}

(Java multiplies by 31 instead of 33 but it is the same idea.)

A polynomial hash function with an odd multiplier is found everywhere and has a long history. It is the hash function used by the Karp-Rabin string search algorithm.

As I have pointed out in another post, for better performance, you want to unroll this function like so…

for (; i + 3 < len; i += 4) {
   h = 33 * 33 * 33 * 33 * h 
       + 33 * 33 * 33 * str[i] 
       + 33 * 33 * str[i + 1] 
       + 33 * str[i + 2] 
       + str[i + 3];
}
for (; i < len; i++) {
   h = 33 * h + str[i];
}

The reason this might help is that it breaks the data dependency: instead of having to wait for the previous multiplication to finish before another one can be issued, you can issue one new multiplication per cycle for up to four cycles in a row. Unrolling more might accelerate the code further.

The PHP developers implement the hash function with an extra optimization, however. Crediting Bernstein for the idea, they point out that…

the multiply operation can be replaced by a faster operation based on just one shift plus either a single addition or subtraction operation

It is true that a shift followed by an addition might be slightly cheaper than a multiplication, but modern compilers are quite good at working this out on their own. They can transform your multiplications by a constant as they see fit.

In any case, the PHP implementation is an optimized version of the following…

for (int i = 0; i < len; i++) {
  hash = ((hash << 5) + hash) + str[i];
}

The code is actually quite a bit more complicated because it is heavily unrolled, but it is algorithmically equivalent. Their code strongly discourages the compiler from ever using a multiplication.

So are the PHP developers correct? Should we work hard to avoid multiplications in C using Bernstein’s trick? Let us put this theory to the test on a recent x64 processor. As usual, my code is available.

Polynomial hashing (cycles per byte) on Intel Skylake: PHP (no multiplier) vs. PHP (with multiplier)

The multiplication-free PHP approach is 33% slower! Gregory Pakosz pointed out that you can do even better by unrolling the version with multiplier further, reaching 1.5 cycles per byte.

Embedded processors with slow multiplications might give different outcomes. But then, where do you expect PHP processes to run? Overwhelmingly, they run on Intel processors produced in the last ten years… and these processors have fast multipliers.

So I think that the PHP developers are leaving performance on the table. They could easily optimize the computation of the hash function without changing the result of the function. What is more, the code would be more readable if they left the multiplications in! If you need to multiply by 33, just do it in the simplest possible manner! If it is cheaper to do a shift, the compiler can probably figure it out before you do. If you do not trust your compiler, then, at least, run benchmarks!

Let us look at the larger issue. How fast are 1.5 or 1.75 cycles per byte? Not very fast. Google’s CityHash uses about 0.25 cycles per byte whereas the state-of-the-art CLHash uses about 0.1 cycles per byte on recent Intel processors. So with a more modern algorithm, PHP developers could multiply the speed of their hash functions… but that’s for another day.

Augmented reality becomes mainstream

My go-to reference lately about the near future has been the 2006 novel Rainbows End by Vernor Vinge. The novel is set in 2025 and the author depicts a world where augmented reality is ubiquitous. Kids still go to school, but instead of presenting the workings of a turbine using a PowerPoint deck, you can make a working turbine appear out of thin air in the classroom.

Augmented reality is the addition of a layer to the world using computing. One powerful and ubiquitous tool provided by modern computing is GPS: many of your devices can tell where they are on the planet within a few meters (and sometimes better). It has been used for gaming for many years. For example, I have played geocaching, Munzee, Ingress… and yesterday I played Pokémon Go.

Pokémon Go differs from the previous GPS-based games because of its massive popularity. Though the game has been released for barely a week, journalists estimate that it has 10 million users worldwide.

Some will object that Pokémon Go is not “really” an augmented reality game. Indeed, though it projects a small animal onto the image of your smartphone camera to give you the illusion that the animal is right there… the underlying software is not based on computational vision. It would simply not be possible in 2016… but all that matters to the player is that it “works”, it is convincing.

In comparison, I have put on Microsoft’s Hololens headset… It is considered to be “true” augmented reality in the sense that it realistically projects objects on top of your normal view… you can tilt your head and the object stays put. But playing Pokémon Go with Microsoft’s Hololens would be a miserable experience. For one thing, nobody would want to walk around in the street with a bulky headset. And it is debatable whether the Hololens projections feel more real than a Pokémon in Pokémon Go.

I don’t know how long Pokémon Go will thrive. Will it still be around in a year? Who knows? What really matters is that millions of people have now experienced the taste of augmented reality. There is no turning back.

The race is on to produce more convincing augmented reality hardware and software.

And this puts us right on track for the future described by Rainbows End.

Why does it matter? In the long run, augmented reality represents another pathway to extend human abilities.

Virtual Reality: First impressions with the HTC Vive

I just got my hands on some virtual-reality (VR) goggles. Specifically, we have an “HTC Vive”. We are still in the early days of VR and given that these goggles cost about $1,000, not everyone will get to try them outside a demo room. I thought I’d share my impressions.

  • There are indications that the HTC Vive needs a powerful PC. Because I am disorganized, I ended up with the HTC Vive goggles, but no corresponding powerful gaming PC. So I used what I had: my son’s gaming PC. A 5-year-old box. It fails to meet the “minimum” requirements set by HTC, but at no point did we ever encounter any performance problem. To be fair, I did not try to run any demanding game… simply because I have none to test… Still, it seems to me that the belief that VR requires very powerful machines might be overstated.
  • The HTC hardware is fantastic. It looks good and it is sturdy. I am sure that it will all look ridiculous in a few years, but it is quite usable today. It feels good. The HTC Vive comes with two great controllers.
  • Setting up the HTC Vive is a bit harder than just putting on the goggles. You need to set up a room with sensors at each end. Still, it is no big deal. The only annoying hardware issue we had was pairing the controllers with the system; it was a source of confusion. The hardest part was finding where to click to pair the controllers.
  • … which brings me to the software. The software is a bit flaky, as most software tends to be. It looks good and it generally works, but once the server stopped responding and we had to “kill” it; another time, a demo insisted that we press the “system” key even though doing so never worked. Even so, the software is quite good already.
  • So how is the experience? Great. It simply works. Every demo I tried was convincing. It is just like I imagined it. Better than I imagined it in fact because my previous encounters (years ago) with VR were unconvincing.

So where do I see this going in the next couple of years?

  • The hardware is basically good enough. I am sure I will sound like a fool in five years when the current VR hardware looks obsolete, but I do not expect it to get a whole lot better, qualitatively speaking. What I do expect is that we will get cheaper versions that work nearly as well. Already, the upcoming Sony PlayStation VR is going to cost half as much as an HTC Vive.
  • Content is a problem right now. That is, you can get the goggles working, but you are left feeling that there ought to be something more interesting to do with them… What I hope we will see is an explosion of new applications and games.

What is next for me? I am getting a Sony PlayStation VR for my home. I was still slightly on the fence, but playing with the HTC Vive convinced me that the hardware was mature enough.

In time, I want to set up the HTC Vive so that I can program my own prototypes. As a scientist and engineer, I want to find out what else I can do with these goggles.

Fast random shuffling

In a random shuffle, you want to take the elements of a list and reorder them randomly. In a “fair” random shuffle, all possible permutations must be equally likely. It is surprisingly hard to come up with a fair algorithm. Thankfully, there is a fast and easy-to-implement algorithm: the Fisher-Yates shuffle. It is a rather intuitive algorithm and there are YouTube videos about it… so, I will just point you at a piece of C code:

for (uint32_t i = size; i > 1; i--) {
   uint32_t p = random_bounded(i); // random index in [0, i)
   swap(array + i - 1, array + p); // swap the values at i - 1 and p
}

What can we expect to limit the speed of this algorithm? Let me assume that we do not use fancy SIMD instructions or parallelization.

If the input array is not in the cache, and we cannot fetch it in time or it is just too large, then cache misses will dominate the running time. So let us assume that the array is in the CPU’s cache.

If we have N input words, we go through the loop N – 1 times. At each iteration of the loop, you need to read two values and write two other values. A recent x64 processor can only store one value to memory per cycle, so we cannot do better than two cycles per input word. In the very next iteration, you may need to read one of the recently written values. So, two cycles per input word is probably optimistic.

What else could be the problem? The generation of the random numbers could hurt us. Let us assume that we are given a random number generation routine that we cannot change. For this blog post, I will stick with PCG.

What remains? Notice how the Fisher-Yates shuffle requires numbers in a range. The typical techniques to generate random numbers in a range involve frequent divisions.

For example, you might want to look at how the Go language handles it:

func (r *Rand) Int31n(n int32) int32 {
	max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
	v := r.Int31()
	for v > max {
		v = r.Int31()
	}
	return v % n
}

This function always involves two divisions. Java, the PCG library… all involve at least one division per function call, often many more than one. Sadly, divisions are many times more expensive than any other operation, even on recent processors.

In an earlier blog post, I showed how to (mostly) get around divisions.

In general, no map from all 32-bit integers to a range can be perfectly fair. In practice, the effect is quite small unless your range is close to the maximal value of an integer. Thus you can simply use the following function:

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit = random32(); // 32-bit random number
  uint64_t multiresult = random32bit * range;
  return (uint32_t) (multiresult >> 32);
}

Maybe you feel bad about introducing a slight bias. You probably should not since the random-number generation itself is unlikely to be perfect.

Still, we can correct the bias. Recall that some of the values are mapped ceil(4294967296/range) times whereas others are mapped floor(4294967296/range) times. By sometimes redrawing a new random value, we can avoid entirely the bias (this technique is called rejection sampling):

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit = random32(); // 32-bit random number
  uint64_t multiresult = random32bit * range;
  uint32_t leftover = (uint32_t) multiresult;
  if (leftover < range) {
    uint32_t threshold = -range % range; // 2^32 mod range
    while (leftover < threshold) {
      random32bit = random32();
      multiresult = random32bit * range;
      leftover = (uint32_t) multiresult;
    }
  }
  return (uint32_t) (multiresult >> 32);
}

This looks quite a bit worse, but the “if” clause containing divisions is very rarely taken. Your processor is likely to mostly ignore it, so the overhead of this new function is smaller than it appears.

So how do we fare? I have implemented these functions in C, using them to compute a random shuffle. Before each shuffle, I ensure that the array is in the cache. I report the number of clock cycles used per input word, on a recent Intel processor (Skylake). As usual, my code is available.

Random shuffle timings, varying the range function:

range function                   cycles per input word
PCG library                      18.0
no division, no bias             7
no division (with slight bias)   6

Avoiding divisions makes the random shuffle run more than twice as fast.

Could we go faster? Yes, if we use a cheaper/faster random number generator. However, keep in mind that without SIMD instructions or multi-core processing, we cannot realistically hope to reach the lower bound of 2 cycles per input word. That is, I claim that no function can be more than 3 times faster than the fastest function we considered.

You can save a little bit (half a cycle per input word) if you replace the 32-bit PCG calls by 64-bit calls, processing input words in pairs. Using SIMD instructions, we could go even faster, but I do not have access to a SIMD-accelerated PCG implementation… We could, of course, revisit the problem with different random-number generators.

A fast alternative to the modulo reduction

Suppose you want to pick an integer at random in a set of N elements. Your computer has functions to generate random 32-bit integers; how do you transform such numbers into indexes smaller than N? Suppose you have a hash table with a capacity of N. Again, you need to transform your hash values (typically 32-bit or 64-bit integers) down to an index smaller than N. Programmers often get around this problem by making sure that N is a power of two, but that is not always ideal.

We want a map that is as fair as possible for an arbitrary integer N. That is, ideally, we would want that there are exactly 2^32/N values mapped to each value in the range {0, 1, …, N − 1} when starting from all 2^32 32-bit integers.

Sadly, we cannot have a perfectly fair map if 2^32 is not divisible by N. But we can have the next best thing: we can require that there be either floor(2^32/N) or ceil(2^32/N) values mapped to each value in the range.

If N is small compared to 2^32, then this map could be considered as good as perfect.

The common solution is to do a modulo reduction: x mod N. (Since we are computer scientists, we define the modulo reduction to be the remainder of the division, unless otherwise stated.)

uint32_t reduce(uint32_t x, uint32_t N) {
  return x % N;
}

How can I tell that it is fair? Well. Let us just run through the values of x starting with 0. You should be able to see that the modulo reduction takes on the values 0, 1, …, N − 1, 0, 1, … as you increment x. Eventually, x arrives at its last value (2^32 − 1), at which point the cycle stops, leaving the values 0, 1, …, (2^32 − 1) mod N with ceil(2^32/N) occurrences, and the remaining values with floor(2^32/N) occurrences. It is a fair map with a bias for smaller values.

It works, but a modulo reduction involves a division, and divisions are expensive. Much more expensive than multiplications. A single 32-bit division on a recent x64 processor has a throughput of one instruction every six cycles with a latency of 26 cycles. In contrast, a multiplication has a throughput of one instruction every cycle and a latency of 3 cycles.

There are fancy tricks to “precompute” a modulo reduction so that it can be transformed into a couple of multiplications as well as a few other operations, as long as N is known ahead of time. Your compiler will make use of them if N is known at compile time. Otherwise, you can use a software library or work out your own formula.

But it turns out that you can do even better! That is, there is an approach that is easy to implement, and provides just as good a map, without the same performance concerns.

Assume that x and N are 32-bit integers, and consider the 64-bit product x * N. You have that (x * N) div 2^32 is in the range, and it is a fair map.

uint32_t reduce(uint32_t x, uint32_t N) {
  return ((uint64_t) x * (uint64_t) N) >> 32 ;
}

Computing (x * N) div 2^32 is very fast on a 64-bit processor. It is a multiplication followed by a shift. On a recent Intel processor, I expect that it has a latency of about 4 cycles and a throughput of at least one call every 2 cycles.

So how fast is our map compared to a 32-bit modulo reduction?

To test it out, I have implemented a benchmark where you repeatedly access random indexes in an array of size N. The indexes are obtained either with a modulo reduction or our approach. On a recent Intel processor (Skylake), I get the following number of CPU cycles per access:

modulo reduction    fast range

So it is four times faster! Not bad.

As usual, my code is freely available.

What can this be good for? Well… if you have been forcing your arrays and hash tables to have power-of-two capacities to avoid expensive divisions, you may be able to use the fast range map to support arbitrary capacities without too much of a performance penalty. You can also generate random numbers in a range faster, which matters if you have a very fast random number generator.

So how can I tell that the map is fair?

By multiplying by N, we take integer values in the range [0, 2^32) and map them to multiples of N in [0, N * 2^32). By dividing by 2^32, we map all multiples of N in [0, 2^32) to 0, all multiples of N in [2^32, 2 * 2^32) to one, and so forth. To check that this is fair, we just need to count the number of multiples of N in intervals of length 2^32. This count must be either ceil(2^32/N) or floor(2^32/N).

Suppose that the first value in the interval is a multiple of N: that is clearly the scenario that maximizes the number of multiples in the interval. How many will we find? Exactly ceil(2^32/N). Indeed, if you draw sub-intervals of length N, then every complete sub-interval begins with a multiple of N, and if there is any remainder, then there will be one extra multiple of N. In the worst-case scenario, the first multiple of N appears at position N − 1 in the interval. In that case, we get floor(2^32/N) multiples. To see why, again, draw sub-intervals of length N. Every complete sub-interval ends with a multiple of N.

This completes the proof that the map is fair.

For fun, we can be slightly more precise. We have argued that the number of multiples is maximized when a multiple of N appears at the very beginning of the interval of length 2^32. At the end, we get an incomplete sub-interval of length 2^32 mod N. If, instead of having the first multiple of N appear at the very beginning of the interval, it appeared at offset 2^32 mod N, then there would not be room for the incomplete sub-interval at the end. This means that whenever a multiple of N occurs before offset 2^32 mod N, we shall have ceil(2^32/N) multiples, and otherwise we shall have floor(2^32/N) multiples.

Can we tell which outcomes occur with frequency floor(2^32/N) and which occur with frequency ceil(2^32/N)? Yes. Suppose we have an output value k. We need to find the location of the first multiple of N no smaller than k * 2^32. This location is ceil(k * 2^32 / N) * N, so its offset within the interval is ceil(k * 2^32 / N) * N − k * 2^32, which we just need to compare with 2^32 mod N. If it is smaller, then we have a count of ceil(2^32/N); otherwise we have a count of floor(2^32/N).


(Update: I have made the proof more intuitive following a comment by Kendall Willets.)

I do not use a debugger

I learned to program with BASIC back when I was twelve. I would write elaborate programs and run them. Invariably, they would surprise me by failing to do what I expected. I would struggle for a time, but I’d eventually give up and just accept that whatever “bugs” I had created were there to stay.

It would take me a decade to learn how to produce reliable and useful software. To this day, I still get angry with people who take it for granted that software should do what they expect without fail.

In any case, I eventually graduated to something better: Turbo Pascal. Turbo Pascal was a great programming language coupled with a fantastic programming environment that is comparable, in many ways, to modern integrated development environments (IDEs). Yet it is three decades old. It had something impressive: you could use a “debugger”. What this means is that you could run through the program, line by line, watching what happened to variables. You could set breakpoints where the program would halt and give you control.

At the time, I thought that programming with a debugger was the future.

Decades later, I program in various languages, C, JavaScript, Go, Java, C++, Python… and I almost never use a debugger. I use fancy tools, and I certainly do use tools that are called debuggers (like gdb), but I almost never step through my programs line-by-line watching variable values. I almost never set breakpoints. I say “almost” because there are cases where a debugger is the right tool, mostly on simple or quick-and-dirty projects, or in contexts where my brain is overwhelmed because I do not fully master the language or the code. This being said, I do not recall the last time I used a debugger as a debugger to step through the code. I have a vague recollection of doing so to debug a dirty piece of JavaScript.

I am not alone. In five minutes, I was able to find several famous programmers who took positions against debuggers or who reported barely using them.

I should make it clear that I do not think that there is one objective truth regarding tools. It is true that our tools shape us, but there is a complex set of interactions between how you work, what you do, who you work with, what other tools you are using and so forth. Whatever works for you might be best.

However, the fact that Linus Torvalds, who is in charge of a critical piece of our infrastructure made of 15 million lines of code (the Linux kernel), does not use a debugger tells us something about debuggers.

Anyhow, so why did I stop using debuggers?

Debuggers were conceived in an era where we worked on moderately small projects, with simple processors (no threads, no out-of-order execution), simple compilers, relatively small problems and no formal testing.

For what I do, I feel that debuggers do not scale. There is only so much time in life. You either write code, or you do something else, like running line-by-line through your code. Doing “something else” means (1) rethinking your code so that it is easier to maintain or less buggy, or (2) adding smarter tests so that, in the future, bugs are readily identified. Investing your time in this manner makes your code better in a lasting manner… whereas debugging your code line-by-line fixes one tiny problem without improving your process or your future diagnostics. The larger and more complex the project gets, the less useful the debugger gets. Will your debugger scale to hundreds of processors and terabytes of data, with trillions of closely related instructions? I’d rather not take the risk.

My ultimate goal when I work on a difficult project is that when problems arise, as they always do, it should require almost no effort to pinpoint and fix them. Relying on a debugger as your first line of defense can be a bad investment; you should always try to improve the code first.

Rob Pike (one of the authors of the Go language) once came to a similar conclusion:

If you dive into the bug, you tend to fix the local issue in the code, but if you think about the bug first, how the bug came to be, you often find and correct a higher-level problem in the code that will improve the design and prevent further bugs.

I don’t want to be misunderstood, however. We need to use tools, better tools… so that we can program ever more sophisticated software. However, running through the code line-by-line checking the values of your variables is no way to scale up in complexity and it encourages the wrong kind of designs.