Common sense in artificial intelligence… by 2026?

Lots of people want to judge machine intelligence based on human intelligence. It dates back to Turing who proposed his eponymous Turing test: can machines “pass” as human beings? Turing, being clever, was aware of how biased this test was:

If the man were to try and pretend to be the machine he would clearly make a very poor showing. He would be given away at once by slowness and inaccuracy in arithmetic. May not machines carry out something which ought to be described as thinking but which is very different from what a man does?

I expect that we will eventually outgrow our anthropocentrism and view what machines really offer: a new kind of intelligence.

In any case, from an economics perspective, it matters a great deal whether machines can do exactly what human beings can do. Calum Chace has published a new book on this topic: The Economic Singularity: Artificial intelligence and the death of capitalism. Chace’s excellent book is the latest in a stream of books hinting that we may soon all be unemployable simply because machines are better than us at most jobs.

To replace human beings at most jobs, machines need to exhibit what we intuitively call “common sense”. For example, if someone just bought a toaster… you do not try to sell them another toaster (as so many online ad systems do today).

Common sense is basic knowledge about how the world of human beings works. It is not rule-based. It is not entirely logical. It is a set of heuristics almost all human beings quickly acquire. If computers could be granted a generous measure of common sense, many believe that they could make better employees than human beings. Whatever one might think about economics, there is an interesting objective question… can machines achieve “common sense” in the near future?

It seems that Geoff Hinton, a famous computer scientist, predicted that within a decade, we would build computers with common sense. These are not computers that are smarter than all of us at all tasks. These are not computers with a soul. They are merely computers with a working knowledge of the world of human beings… computers that know our conventions, that know that stoves are hot, that people don’t usually own twelve toasters, and so forth.

Chace recently placed a bet with a famous economist, Robin Hanson, that Hinton is right at 50-to-1 odds. This means that Hanson is very confident that computers will be unable to achieve common sense in the near future.

Hanson is not exactly a Luddite who believes that technology will stall. In fact, Hanson has written an excellent book, The Age of Ems, that describes a world where brains have been replaced with digital computers. Our entire civilization is made of software. I have covered some of the content of Hanson’s book on my blog before… for example, Hanson believes that software grows old and becomes senile.

I think that both Hanson and Chace are very well informed on the issues, but they have different biases.

What is my own take?

The challenge for people like Chace who allude to an economic singularity where machines take over the economy… is that we have little to no evidence that such a thing is coming. For all the talk about massive unemployment coming up… unemployment rates are really not that high. Geoff Hinton thinks that machines will soon acquire common sense… and it may look like an easy problem, but we have no clue right now how to go about solving it. It is hard to even define it.

As for Hanson, the problem is that betting against what we can do 10 years in the future is very risky. Ten years ago, we did not have iPhones. Today’s iPhone is more powerful than a PC from ten years ago. People at the beginning of the twentieth century thought that it would take a million years to get a working aeroplane, whereas it took a mere ten years…

I must say that despite the challenge, I am with Chace. At 50-to-1 odds, I would bet for the software industry. The incentive to offer common sense is great. After all, you can’t drive a car, clean a house or serve burgers without some common sense. What the deep learning craze has taught us is that it is not necessary for us to understand how the software works for the software to be effective. With enough data, enough computing power and trial and error, there is no telling what we can find!

Let us be more precise… what could we expect from software having common sense? It is hard to define because common sense is a collection of small pieces… all of which are easy to program individually. For example, if you are lying on the floor yelling “I’m hurt”, common sense dictates that we call emergency services… and it is possible that Apple’s Siri can already do this.

So I offer the following “test”. Every year, new original video games come out. Most of them come with no instructions whatsoever. You start playing and you figure it out as you go… using “common sense”. So I think that if some piece of software is able to pick up a decent game from Apple’s AppStore and figure out how to play competently within minutes… without playing thousands of games… then it will have an interesting form of common sense. It is not necessary for the software to play at “human level”. For example, it would be ok if it only played simple games at the level of a 5-year-old. The key in this test is diversity. There are a great many different games, and even when they have the same underlying mechanic, they can look quite a bit different.

Is it fair to test software intelligence using games? I think so. Games are how we learn about the world. And, frankly, office work is not all that different from a (bad) video game.

Accelerating PHP hashing by “unoptimizing” it

Hashing is a software trick that can map strings to fixed-length integers, such as 32-bit integers. It is ubiquitous in modern software.

Languages like Java and PHP have the ability to store strings with their corresponding hash values. Still, the hash value must be computed at least once.

How much of a burden can this be? Suppose that we use 10 cycles per byte to hash a string. For a long 100-kilobyte string, that would be about a million CPU cycles. If your CPU runs at 2 GHz, you have 2 billion cycles per second. Hence, hashing your string should take no more than half a millisecond. Put another way, you can hash 2000 such strings per second.

Simon Hardy-Francis pointed out to me that this can still represent a performance bottleneck if your PHP application needs to repeatedly load large new strings.

So what does PHP use as a hash function? Fundamentally, it uses the Java hash function: a simple polynomial hash with an odd multiplier (and thus coprime with 2)…

for (int i = 0; i < len; i++) {
  hash = 33 * hash + str[i];
}

(Java multiplies by 31 instead of 33 but it is the same idea.)
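
For reference, here is what this loop might look like as a self-contained C function. This is just a sketch: the function name hash33 and the starting value are illustrative, not PHP’s exact implementation.

#include <stdint.h>
#include <stddef.h>

// times-33 polynomial hash (sketch); the starting value is a common
// convention and is not meant to match PHP's internal constant
uint32_t hash33(const char *str, size_t len) {
  uint32_t hash = 5381;
  for (size_t i = 0; i < len; i++) {
    hash = 33 * hash + (unsigned char) str[i];
  }
  return hash;
}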

A polynomial hash function with an odd multiplier is found everywhere and has a long history. It is the hash function used by the Karp-Rabin string search algorithm.

As I have pointed out in another post, for better performance, you want to unroll this function like so…

for (; i + 3 < len; i += 4) {
   h = 33 * 33 * 33 * 33 * h 
       + 33 * 33 * 33 * str[i] 
       + 33 * 33 * str[i + 1] 
       + 33 * str[i + 2] 
       + str[i + 3];
}
for (; i < len; i++) {
   h = 33 * h + str[i];
}

The reason this might help is that it breaks the data dependency: instead of having to wait for the previous multiplication to finish before another one can be issued, you can issue one new multiplication per cycle for up to four cycles in a row. Unrolling more might accelerate the code further.

The PHP developers implement the hash function with an extra optimization, however. Crediting Bernstein for the idea, they point out that…

the multiply operation can be replaced by a faster operation based on just one shift plus either a single addition or subtraction operation

It is true that a shift followed by an addition might be slightly cheaper than a multiplication, but modern compilers are quite good at working this out on their own. They can transform your multiplications by a constant as they see fit.

In any case, the PHP implementation is an optimized version of the following…

for (int i = 0; i < len; i++) {
  hash = ((hash << 5) + hash) + str[i];
}

The code is actually quite a bit more complicated because it is heavily unrolled, but it is algorithmically equivalent. Their code strongly discourages the compiler from ever using a multiplication.

So are the PHP developers correct? Should we work hard to avoid multiplications in C using Bernstein’s trick? Let us put this theory to the test on a recent x64 processor. As usual, my code is available.

Polynomial hashing (cycles per byte) on Intel Skylake

  PHP (no multiplier): 2.35
  PHP (with multiplier): 1.75

The multiplication-free PHP approach is 33% slower! Gregory Pakosz pointed out that you can do even better by unrolling the version with multiplier further, reaching 1.5 cycles per byte.

Embedded processors with slow multiplications might give different outcomes. But then, where do you expect PHP processes to run? Overwhelmingly, they run on Intel processors produced in the last ten years… and these processors have fast multipliers.

So I think that the PHP developers are leaving performance on the table. They could easily optimize the computation of the hash function without changing the result of the function. What is more, the code would be more readable if they kept the multiplications! If you need to multiply by 33, just do it in the simplest possible manner! If it is cheaper to do a shift, the compiler can probably figure it out before you do. If you do not trust your compiler, then, at least, run benchmarks!

Let us look at the larger issue. How fast are 1.5 or 1.75 cycles per byte? Not very fast. Google’s CityHash uses about 0.25 cycles per byte whereas the state-of-the-art CLHash uses about 0.1 cycles per byte on recent Intel processors. So with a more modern algorithm, PHP developers could multiply the speed of their hash functions… but that’s for another day.

Augmented reality becomes mainstream

My go-to reference lately about the near future has been the 2006 novel Rainbows End by Vernor Vinge. The novel is set in 2025 and the author depicts a world where augmented reality is ubiquitous. Kids still go to school, but instead of presenting the working of a turbine using a PowerPoint deck, you can make a working turbine appear out of thin air in the classroom.

Augmented reality is the addition of a layer to the world using computing. One powerful and ubiquitous tool provided by modern computing is GPS: many of your devices can tell where they are on the planet within a few meters (and sometimes better). It has been used for gaming for many years. For example, I have played geocaching, Munzee, Ingress… and yesterday I played Pokémon Go.

Pokémon Go differs from the previous GPS-based games because of its massive popularity. Though the game has been released for barely a week, journalists estimate that it has 10 million users worldwide.

Some will object that Pokémon Go is not “really” an augmented reality game. Indeed, though it projects a small animal onto the image from your smartphone camera to give you the illusion that the animal is right there… the underlying software is not based on computer vision. That would simply not be possible in 2016… but all that matters to the player is that it “works”; it is convincing.

In comparison, I have put on Microsoft’s Hololens headset… It is considered to be “true” augmented reality in the sense that it realistically projects objects on top of your normal view… you can tilt your head and the object stays put. But playing Pokémon Go with Microsoft’s Hololens would be a miserable experience. For one thing, nobody would want to walk around in the street with a bulky headset. And it is debatable whether the Hololens projections feel more real than a Pokémon in Pokémon Go.

I don’t know how long Pokémon Go will thrive. Will it still be around in a year? Who knows? What really matters is that millions of people have now experienced the taste of augmented reality. There is no turning back.

The race is on to produce more convincing augmented reality hardware and software.

And this puts us right on track for the future described by Rainbows End.

Why does it matter? In the long run, augmented reality represents another pathway to extend human abilities.

Virtual Reality: First impressions with the HTC Vive

I just got my hands on some virtual-reality (VR) goggles. Specifically, we have an “HTC Vive”. We are still in the early days of VR, and given that these goggles cost about $1000, not everyone will get to try them outside a demo room. I thought I’d share my impressions.

  • There are indications that the HTC Vive needs a powerful PC. Because I am disorganized, I ended up with the HTC Vive goggles, but no corresponding powerful gaming PC. So I used what I had: my son’s gaming PC. A 5-year-old box. It fails to meet the “minimum” requirements set by HTC, but at no point did we ever encounter any performance problem. To be fair, I did not try to run any demanding game… simply because I have none to test… Still, it seems to me that the belief that VR requires very powerful machines might be overstated.
  • The HTC hardware is fantastic. It looks good and it is sturdy. I am sure that it will all look ridiculous in a few years, but it is quite usable today. It feels good. The HTC Vive comes with two great controllers.
  • Setting up the HTC Vive is a bit harder than just putting on the goggles. You need to set up a room with sensors at each end. Still, it is no big deal. The only annoying hardware issue we had was pairing the controllers with the system. It was a source of confusion. The hardest part was finding out where to click to pair the controller.
  • … which brings me to the software. The software is a bit flaky, like most software tends to be. It looks good and it generally works, but once the server stopped responding and we had to “kill it”, and another time the demo insisted that we press the “system” keys even though doing so never worked. Even so, the software is quite good already.
  • So how is the experience? Great. It simply works. Every demo I tried was convincing. It is just like I imagined it. Better than I imagined it in fact because my previous encounters (years ago) with VR were unconvincing.

So where do I see this going in the next couple of years?

  • The hardware is basically good enough. I am sure I will sound like a fool in five years when the current VR hardware looks obsolete, but I do not expect it to get a whole lot better, qualitatively speaking. What I do expect is that we will get cheaper versions that work nearly as well. Already, the upcoming Sony PlayStation VR is going to cost half as much as an HTC Vive.
  • Content is a problem right now. That is, you can get the goggles working, but you are left feeling that there ought to be something more interesting to do with them… What I hope we will see is an explosion of new applications and games.

What is next for me? I am getting a Sony PlayStation VR for my home. I was still slightly on the fence, but playing with the HTC Vive convinced me that the hardware was mature enough.

In time, I want to set up the HTC Vive so that I can program my own prototypes. As a scientist and engineer, I want to find out what else I can do with these goggles.

Fast random shuffling

In a random shuffle, you want to take the elements of a list and reorder them randomly. In a “fair” random shuffle, all possible permutations must be equally likely. It is surprisingly hard to come up with a fair algorithm. Thankfully, there is a fast and easy-to-implement algorithm: the Fisher-Yates shuffle. It is a rather intuitive algorithm and there are YouTube videos about it… so, I will just point you at a piece of C code:

for (i=size; i>1; i--) {
   int p = random_bounded(i); // number in [0,i)
   swap(array+i-1, array+p); // swap the values at i-1 and p
}
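
For concreteness, here is a self-contained sketch of the same loop wrapped in a function. The random_bounded function is the one discussed below, the swap helper is spelled out, and the sketch assumes the array size fits in 32 bits.

#include <stdint.h>
#include <stddef.h>

uint32_t random_bounded(uint32_t range); // fair random integer in [0, range), see below

static void swap(uint32_t *a, uint32_t *b) {
  uint32_t tmp = *a;
  *a = *b;
  *b = tmp;
}

// Fisher-Yates shuffle of the first 'size' elements of 'array'
void shuffle(uint32_t *array, size_t size) {
  for (size_t i = size; i > 1; i--) {
    size_t p = random_bounded((uint32_t) i); // number in [0, i)
    swap(array + i - 1, array + p);          // swap the values at i - 1 and p
  }
}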

What can we expect to limit the speed of this algorithm? Let me assume that we do not use fancy SIMD instructions or parallelization.

If the input array is not in the cache, and we cannot fetch it in time or it is just too large, then cache faults will dominate the running time. So let us assume that the array is in the CPU’s cache.

If we have N input words, we go through the loop N – 1 times. At each iteration of the loop, you need to read two values and write two other values. A recent x64 processor can only store one value to memory per cycle, so we cannot do better than two cycles per input word. In the very next iteration, you may need to read one of the recently written values. So, two cycles per input word is probably optimistic.

What else could be the problem? The generation of the random numbers could hurt us. Let us assume that we are given a random number generation routine that we cannot change. For this blog post, I will stick with PCG.

What remains? Notice how the Fisher-Yates shuffle requires numbers in a range. The typical techniques to generate random numbers in a range involve frequent divisions.

For example, you might want to look at how the Go language handles it:

func (r *Rand) Int31n(n int32) int32 {
	max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
	v := r.Int31()
	for v > max {
		v = r.Int31()
	}
	return v % n
}

This function always involves two divisions. Java, the PCG library… all involve at least one division per function call, often many more than one. Sadly, divisions are many times more expensive than any other operation, even on recent processors.

In an earlier blog post, I showed how to (mostly) get around divisions.

In general, no map from all 32-bit integers to a range can be perfectly fair unless the size of the range divides 2^32. In practice, the effect is quite small unless your range is close to the maximal value of a 32-bit integer. Thus you can simply use the following function:

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit = random32(); // 32-bit random number
  uint64_t multiresult = random32bit * range;
  return multiresult >> 32;
}

Maybe you feel bad about introducing a slight bias. You probably should not since the random-number generation itself is unlikely to be perfect.

Still, we can correct the bias. Recall that some of the values are mapped ceil(4294967296/range) times whereas others are mapped floor(4294967296/range) times. By sometimes redrawing a new random value, we can avoid entirely the bias (this technique is called rejection sampling):

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit = random32(); // 32-bit random number
  uint64_t multiresult = random32bit * range;
  uint32_t leftover = (uint32_t) multiresult;
  if (leftover < range) {
    uint32_t threshold = -range % range; // 2^32 mod range
    while (leftover < threshold) {
      random32bit = random32();
      multiresult = random32bit * range;
      leftover = (uint32_t) multiresult;
    }
  }
  return multiresult >> 32;
}

This looks quite a bit worse, but the “if” branch containing the division is very rarely taken. Your processor is likely to mostly ignore it, so the overhead of this new function is smaller than it appears.

So how do we fare? I have implemented these functions in C, using them to compute a random shuffle. Before each shuffle, I ensure that the array is in the cache. I report the number of clock cycles used per input word, on a recent Intel processor (Skylake). As usual, my code is available.

Random shuffle timings, varying the range function (cycles per input word)

  PCG library: 18.0
  Go-like: 20.1
  Java-like: 12.1
  no division, no bias: 7
  no division (with slight bias): 6

Avoiding divisions makes the random shuffle run twice as fast.

Could we go faster? Yes, if we use a cheaper/faster random number generator. However, keep in mind that without SIMD instructions or multi-core processing, we cannot realistically hope to reach the lower bound of 2 cycles per input word. That is, I claim that no function can be more than 3 times faster than the fastest function we considered.

You can save a little bit (half a cycle per input word) if you replace the 32-bit PCG calls by 64-bit calls, processing input words in pairs. Using SIMD instructions, we could go even faster, but I do not have access to a SIMD-accelerated PCG implementation… We could, of course, revisit the problem with different random-number generators.
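
For illustration, a sketch of that last idea might look like the following, reusing the swap helper from above and assuming a 64-bit generator named random64(), with the slightly biased multiply-shift reduction:

// sketch: one 64-bit random draw feeds two iterations of the Fisher-Yates loop
// (assumes the array size fits in 32 bits and a slight bias is acceptable)
void shuffle_pairs(uint32_t *array, size_t size) {
  size_t i = size;
  for (; i > 2; i -= 2) {
    uint64_t r = random64(); // assumed 64-bit random number generator
    uint32_t p1 = (uint32_t) (((r & 0xFFFFFFFF) * i) >> 32); // low half: index in [0, i)
    swap(array + i - 1, array + p1);
    uint32_t p2 = (uint32_t) (((r >> 32) * (i - 1)) >> 32);  // high half: index in [0, i - 1)
    swap(array + i - 2, array + p2);
  }
  if (i == 2) { // final swap when the size is even
    uint32_t p = (uint32_t) (((random64() & 0xFFFFFFFF) * 2) >> 32); // index in [0, 2)
    swap(array + 1, array + p);
  }
}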

A fast alternative to the modulo reduction

Suppose you want to pick an integer at random in a set of N elements. Your computer has functions to generate random 32-bit integers; how do you transform such numbers into an index smaller than N? Suppose you have a hash table with a capacity N. Again, you need to transform your hash values (typically 32-bit or 64-bit integers) down to an index smaller than N. Programmers often get around this problem by making sure that N is a power of two, but that is not always ideal.

We want a map that is as fair as possible for an arbitrary integer N. That is, ideally, we would want exactly 2^32/N values mapped to each value in the range {0, 1, …, N − 1} when starting from all 2^32 32-bit integers.

Sadly, we cannot have a perfectly fair map if 2^32 is not divisible by N. But we can have the next best thing: we can require that there be either floor(2^32/N) or ceil(2^32/N) values mapped to each value in the range.

If N is small compared to 2^32, then such a map can be considered as good as perfect.

The common solution is to do a modulo reduction: x mod N. (Since we are computer scientists, we define the modulo reduction to be the remainder of the division, unless otherwise stated.)

uint32_t reduce(uint32_t x, uint32_t N) {
  return x % N;
}

How can I tell that it is fair? Well. Let us just run through the values of x starting with 0. You should be able to see that the modulo reduction takes on the values 0, 1, …, N − 1, 0, 1, … as you increment x. Eventually, x arrives at its last value (2^32 − 1), at which point the cycle stops, leaving the values 0, 1, …, (2^32 − 1) mod N with ceil(2^32/N) occurrences, and the remaining values with floor(2^32/N) occurrences. It is a fair map with a bias for smaller values.
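
For intuition, consider a scaled-down example using 2^4 = 16 in place of 2^32. With N = 5, the values 0 through 15 map to 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0: the output 0 occurs ceil(16/5) = 4 times, while 1, 2, 3 and 4 each occur floor(16/5) = 3 times.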

It works, but a modulo reduction involves a division, and divisions are expensive. Much more expensive than multiplications. A single 32-bit division on a recent x64 processor has a throughput of one instruction every six cycles with a latency of 26 cycles. In contrast, a multiplication has a throughput of one instruction every cycle and a latency of 3 cycles.

There are fancy tricks to “precompute” a modulo reduction so that it can be transformed into a couple of multiplications as well as a few other operations, as long as N is known ahead of time. Your compiler will make use of them if N is known at compile time. Otherwise, you can use a software library or work out your own formula.

But it turns out that you can do even better! That is, there is an approach that is easy to implement, and provides just as good a map, without the same performance concerns.

Assume that x and N are 32-bit integers, consider the 64-bit product x * N. You have that (x * N) div 2^32 is in the range, and it is a fair map.

uint32_t reduce(uint32_t x, uint32_t N) {
  return ((uint64_t) x * (uint64_t) N) >> 32 ;
}

Computing (x * N) div 2^32 is very fast on a 64-bit processor. It is a multiplication followed by a shift. On a recent Intel processor, I expect that it has a latency of about 4 cycles and a throughput of at least one call every 2 cycles.

So how fast is our map compared to a 32-bit modulo reduction?

To test it out, I have implemented a benchmark where you repeatedly access random indexes in an array of size N. The indexes are obtained either with a modulo reduction or with our approach. On a recent Intel processor (Skylake), I get the following number of CPU cycles per access:

  modulo reduction: 8.1
  fast range: 2.2

So it is nearly four times faster! Not bad.

As usual, my code is freely available.

What can this be good for? Well… if you have been forcing your arrays and hash tables to have power-of-two capacities to avoid expensive divisions, you may be able to use the fast range map to support arbitrary capacities without too much of a performance penalty. You can also generate random numbers in a range faster, which matters if you have a very fast random number generator.
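
For instance, the change amounts to something like this (a sketch with illustrative names: h is a 32-bit hash value, capacity and n are arbitrary sizes, and random32() is the generator from the previous post):

size_t slot = reduce(h, capacity);     // instead of h % capacity, or forcing capacity to a power of two
size_t index = reduce(random32(), n);  // a random index in [0, n), instead of random32() % n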

So how can I tell that the map is fair?

By multiplying by N, we take integer values in the range [0, 2^32) and map them to multiples of N in [0, N * 2^32). By dividing by 2^32, we map all multiples of N in [0, 2^32) to 0, all multiples of N in [2^32, 2 * 2^32) to one, and so forth. To check that this is fair, we just need to count the number of multiples of N in intervals of length 2^32. This count must be either ceil(2^32/N) or floor(2^32/N).

Suppose that the first value in the interval is a multiple of N: that is clearly the scenario that maximizes the number of multiples in the interval. How many will we find? Exactly ceil(2^32/N). Indeed, if you draw sub-intervals of length N, then every complete sub-interval begins with a multiple of N, and if there is any remainder, then there will be one extra multiple of N. In the worst-case scenario, the first multiple of N appears at position N − 1 in the interval. In that case, we get floor(2^32/N) multiples. To see why, again, draw sub-intervals of length N. Every complete sub-interval ends with a multiple of N.

This completes the proof that the map is fair.
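
If you prefer an empirical check, the same claim can be verified by brute force on a scaled-down 16-bit analogue (2^16 standing in for 2^32). This is a quick sketch, separate from the benchmark code linked above:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  const uint64_t SPACE = 1 << 16; // stand-in for 2^32
  for (uint64_t N = 1; N <= 300; N++) {
    uint32_t *count = calloc(N, sizeof(uint32_t));
    for (uint64_t x = 0; x < SPACE; x++) {
      uint64_t bucket = (x * N) >> 16; // analogue of (x * N) div 2^32
      count[bucket]++;
    }
    for (uint64_t k = 0; k < N; k++) {
      // every output must occur floor(2^16/N) or ceil(2^16/N) times
      if (count[k] < SPACE / N || count[k] > (SPACE + N - 1) / N)
        printf("fairness violated for N = %llu\n", (unsigned long long) N);
    }
    free(count);
  }
  printf("check complete\n");
  return 0;
}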

For fun, we can be slightly more precise. We have argued that the number of multiples was maximized when a multiple of N appears at the very beginning of the interval of length 2^32. At the end, we get an incomplete sub-interval of length 2^32 mod N. If instead of having the first multiple of N appear at the very beginning of the interval, it appeared at index 2^32 mod N, then there would not be room for the incomplete sub-interval at the end. This means that whenever a multiple of N occurs before 2^32 mod N, then we shall have ceil(2^32/N) multiples, and otherwise we shall have floor(2^32/N) multiples.

Can we tell which outcomes occur with frequency floor(2^32/N) and which occur with frequency ceil(2^32/N)? Yes. Suppose we have an output value k. We need to find the location, within the interval, of the first multiple of N no smaller than k * 2^32. This location is ceil(k * 2^32 / N) * N − k * 2^32, which we just need to compare with 2^32 mod N. If it is smaller, then we have a count of ceil(2^32/N), otherwise we have a count of floor(2^32/N).


(Update: I have made the proof more intuitive following a comment by Kendall Willets.)

I do not use a debugger

I learned to program with BASIC back when I was twelve. I would write elaborate programs and run them. Invariably, they would surprise me by failing to do what I expected. I would struggle for a time, but I’d eventually give up and just accept that whatever “bugs” I had created were there to stay.

It would take me a decade to learn how to produce reliable and useful software. To this day, I still get angry with people who take it for granted that software should do what they expect without fail.

In any case, I eventually graduated to something better: Turbo Pascal. Turbo Pascal was a great programming language coupled with a fantastic programming environment that is comparable, in many ways, to modern integrated development environments (IDEs). Yet it is three decades old. It had something impressive: you could use a “debugger”. What this means is that you could run through the program, line by line, watching what happened to variables. You could set breakpoints where the program would halt and give you control.

At the time, I thought that programming with a debugger was the future.

Decades later, I program in various languages: C, JavaScript, Go, Java, C++, Python… and I almost never use a debugger. I use fancy tools, and I certainly do use tools that are called debuggers (like gdb), but I almost never step through my programs line-by-line watching variable values. I almost never set breakpoints. I say “almost” because there are cases where a debugger is the right tool, mostly on simple or quick-and-dirty projects, or in contexts where my brain is overwhelmed because I do not fully master the language or the code. This being said, I do not recall the last time I used a debugger to step through code. I have a vague recollection of doing so to debug a dirty piece of JavaScript.

I am not alone. In five minutes, I was able to find several famous programmers who took positions against debuggers or who reported barely using them.

I should make it clear that I do not think that there is one objective truth regarding tools. It is true that our tools shape us, but there is a complex set of interactions between how you work, what you do, who you work with, what other tools you are using and so forth. Whatever works for you might be best.

However, the fact that Linus Torvalds, who is in charge of a critical piece of our infrastructure made of 15 million lines of code (the Linux kernel), does not use a debugger tells us something about debuggers.

Anyhow, so why did I stop using debuggers?

Debuggers were conceived in an era where we worked on moderately small projects, with simple processors (no threads, no out-of-order execution), simple compilers, relatively small problems and no formal testing.

For what I do, I feel that debuggers do not scale. There is only so much time in life. You either write code, or you do something else, like running line-by-line through your code. Doing “something else” means (1) rethinking your code so that it is easier to maintain or less buggy (2) adding smarter tests so that, in the future, bugs are readily identified effortlessly. Investing your time in this manner makes your code better in a lasting manner… whereas debugging your code line-by-line fixes one tiny problem without improving your process or your future diagnostics. The larger and more complex the project gets, the less useful the debugger gets. Will your debugger scale to hundreds of processors and terabytes of data, with trillions of closely related instructions? I’d rather not take the risk.

My ultimate goal when I work on a difficult project is that when problems arise, as they always do, it should require almost no effort to pinpoint and fix them. Relying on a debugger as your first line of defense can be a bad investment; you should always try to improve the code first.

Rob Pike (one of the authors of the Go language) once came to a similar conclusion:

If you dive into the bug, you tend to fix the local issue in the code, but if you think about the bug first, how the bug came to be, you often find and correct a higher-level problem in the code that will improve the design and prevent further bugs.

I don’t want to be misunderstood, however. We need to use tools, better tools… so that we can program ever more sophisticated software. However, running through the code line-by-line checking the values of your variables is no way to scale up in complexity and it encourages the wrong kind of designs.

How fast is tabulation-based hashing? The downsides of Zobrist…

In practice, hashing is the process of taking an input, such as a string, and turning it into an integer value. It is a fundamental tool in programming, as most software relies on hashing in one way or another. We often expect hashing to appear “random” so that any two strings are unlikely to have the same hash value.

One of the earliest and simplest forms of hashing is Zobrist hashing. Suppose you want to hash strings of up to 16 characters. For each of the 16 character positions, you generate a table made of 256 randomly chosen words. To hash a given string, you pick the first character, look up the corresponding chosen word in the first table, then you pick the second character and look up the word in the second table… and you XOR the retrieved words. In C, you get something like this:

for (size_t i = 0; i < length; i++)
      h ^= hashtab[i][s[i]];
return h;
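
For completeness, a self-contained version might look like the following. This is a sketch, not the library referenced at the end of this post; the table hashtab must be filled once with random 64-bit words before hashing anything.

#include <stdint.h>
#include <stddef.h>

#define MAXLEN 16 // longest string this table can hash

// one table of 256 random 64-bit words per character position,
// to be initialized once from a random source
uint64_t hashtab[MAXLEN][256];

uint64_t zobrist(const char *s, size_t length) {
  uint64_t h = 0;
  for (size_t i = 0; i < length; i++) {
    h ^= hashtab[i][(unsigned char) s[i]];
  }
  return h;
}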

Zobrist hashing is an instance of a more general class of hashing said to be tabulation-based.

The benefits of Zobrist are obvious. It is simple. It can be implemented in seconds. It has excellent theoretical properties and is generally very convenient.

The first downside is also obvious: Zobrist hashing uses a lot of memory. To be able to hash all 4-byte strings to 64-bit values, you need 8 KB. But let us ignore this obvious downside.

State-of-the-art universal hash functions such as CLHash can process over 10 bytes per cycle on a recent Intel processor. The good old CityHash can process over 4 bytes per cycle.

How fast can Zobrist hashing be? To get a better idea, I implemented Zobrist hashing as a small C library. How fast is it? Not very fast. I can barely process 0.65 bytes per cycle when hashing repeatedly the same short string, taking into account the function-call overhead. Thus, tabulation-based hashing is probably about an order of magnitude slower than commonly used fast hash functions, assuming you can avoid cache misses.

Software reference. zobristhashing: Zobrist hashing in C

The strange case of the copyright of open-source software

Economists make a grave mistake when they fail to mention open-source software as one of the critical innovations of our era. Open-source software offers a great reference for the type of innovation we will get in the post-industrial era. To economists, open-source software does not matter because it does not count for much toward the GDP… but I think it is a critical hidden variable.

The Internet runs largely on open-source software, from its servers (Apache) to its operating systems (Linux). The current smartphone war is being won by Android, an open-source system that uses the open-source Linux kernel. Cloud computing is largely a matter of open-source software, again based on Linux.

In our current legal system, we can own information through the concept of copyright. You would think that who owns this software would be a critical issue.

So who does? In many instances, it is one company, such as Oracle and Google. Or sometimes it might be a particular university. But in a lot of important cases, the copyright is simply owned by whoever wrote it (which might include companies):

  • Linux is copyrighted by “Linus Torvalds and others who actually wrote it”.
  • Most web browsers (Chrome, Opera, Safari) are based on WebKit. Who owns WebKit? Its authors. WebKit, like Linux, started out as a one-man project (KHTML by Lars Knoll).

Why is it worth noting?

Many would assume that critical pieces of software should be owned and controlled by a large company, a reputed university or another impressive organization. For end-user software, that is often the case. But much of the underlying software is written by regular programmers without much fanfare… It is put together without permission.

My theory is that open-source software is an invisible college for programmers. Software programmers have an impact that is out of proportion with how much they get paid (and they get paid well enough) in part because of all the brilliant processes they have invented. Programmers are masters at collaboration. And open-source software is one of these ingredients.

Many people, including many professional programmers, remain unaware… they often work for a corporation, and deal with other programmers working for other serious-sounding companies… they never stop to consider that much of the underlying infrastructure is built through an invisible college.

I asked a few people who owned Linux. Many people told me that it was the Linux Foundation. That’s false. I asked how open Apple was regarding its browser software, and most people told me that Apple was secretive and that all their software was built in-house in a proprietary fashion. That’s false.

All these computer companies along with a good fraction of the very best independent programmers take part in the invisible college that is open-source software. This invisible college is not based on authority. Prestige buys you little good will. Contributions are what matters in the end. And the copyright ownership is a reflection of this underlying culture.

To be smart, work on problems you care about

Never do anything that bores you. My experience in science is that someone is always telling you to do something that leaves you flat. Bad idea. I’m not good enough to do something I dislike. In fact, I find it hard enough to do something that I like. (James Watson)

A high school student who is preparing for college recently asked me a few questions by email. He wants to study computer science. I thought that his email was great so I will reply publicly. Let us call him John.

John says… I intend to major in Computer Science (though I’ve also considered majoring and/or minoring in neuroscience or math, of some sort). So, I’ll obviously be taking math courses. Now, for most of my life, I’ve pretty much hated math. I had a poor early educational experience in general and came to despise math, even though I typically got good grades.

I don’t think we should be surprised by John’s feelings about mathematics. We spend a great deal of time teaching stupid things like how to divide 46712 by 13 using long division, with pen and paper. Or adding fractions like 3/5 and 7/9. That would be like getting students to take 5 years of Microsoft Office classes and then being surprised that they hate “computer science”.

How do you approach learning new math concepts? And, in particular, how do you understand formulas? (…) I feel like I am probably missing the bigger picture somehow, and failing to gain a true understanding of math.

I work from examples that I care about. In fact, I avoid learning new mathematical concepts for their own sake. It is a matter of motivation. Having a practical problem I care about activates my intelligence. Boredom makes me dumb.

Have you learned “new math” since obtaining your Ph.D.? When you go to work every day, do you gain new insights and learn things about math, like, say, a programmer would?

I do not use a lot of mathematics as a computer scientist beyond the basics: set theory, Boolean logic, elementary probability theory, some descriptive statistics, functions and relations, linear algebra. You can go a very long way with this small amount of mathematics.

In my experience, mathematics suffers from a diminishing return effect. The basics are very, very useful… but more advanced mathematics often turns into solutions seeking problems. It is best to learn the most advanced mathematics as needed.

John says… A lot of math involves calculations, but I’ve begun to realize that calculating isn’t really what math is about. It seems math is a far more abstract topic and calculating is just an “unfortunate” chore that must be done before having the real fun. Problem is, up to now I’ve always thought that the calculations were the math, and that was all there was to it. How do you abstract away from the rote number crunching and gain an intuitive grasp of math concepts? Must one be good at quick mental calculations and basic math to truly understand and enjoy math?

Fast and accurate number crunching might be useful to impress teachers. But computers are about a billion times better than the best human being could ever be.

John says… How do you learn new math? When you do, have you always found it necessary to work through lengthy sets of problems involving the new concept? If so, how do you learn math outside of textbooks and college courses when problem sets are not present? Have you always just learned from college courses? Do you find math as presented in textbooks dry? Is there, perhaps, another, better way to learn math?

I got rid of my books several years ago. I am a university professor, but if you come to my office, you will find no textbooks. For some classes I teach, I expect the students to get a textbook, but that’s mostly to satisfy students’ expectations.

Solving problems is definitively the best way to learn mathematics. For most topics, there are thousands of great problems online. There are also great YouTube videos and so forth.

It is not that textbooks are bad… but learning mathematics from a handful of textbooks is like trying to learn about love by reading a handful of romance novels. You need to get out there, find your own references.

Obviously, you have a Ph.D. in math. If you could do things over again, would you get the Ph.D., or not? Would you major in another field?

Degrees, especially advanced degrees, are overrated. You sometimes need degrees to get through the door. I bet that it is a lot easier to get an interview with Google if you have a computer science degree. If you want to get a research job, the Ph.D. makes things easier. But, the degree itself is a small thing.

Think of it as being clean. It is hard to get a job if you are dirty. But if you clean yourself up, you may still never get the job you want. The fact is that it is hard to get good jobs, even with a computer-science degree.

Whether you have a college degree or not, only about a third of all workers are enthusiastic about their work. A degree gives you access, potentially, to higher paying jobs… but it won’t make your career.

So don’t just do a degree. Do something cool and fun on the side. Build a mobile app. Build a VR mini-game. Design a website. Build a new AI. Find something you like doing and get good at it. It comes down to the same idea: work on examples you care about.

Would I still get a Ph.D.? That’s a bad question. A better question: do I advise students to get PhDs? No. The rule is that you should only get a Ph.D. if you cannot imagine living without one. Would I discourage my own boys from getting a Ph.D.? Yes. Do I encourage them to think about college? No. I repeatedly, tenaciously, discourage my kids from focusing on diplomas and degrees. I keep telling them not to wait for the teachers. Do cool stuff now, on your own.

A degree can be a useful tool, but that’s all it is. If that’s all you have, you have no career. You are at a high risk of ending up in a dead-end job if you get a job at all. There are more boring jobs out there than you know.

Do you have any suggestions or advice for a soon-to-be college student? I’m a pretty solid introvert, and mostly keep to myself – so I’m not looking for the “Don’t party, focus on your studies” advice :). I’m more looking for intellectual sort of advice. Perhaps suggestions on how to get more out of college than I otherwise might?

  • Courses, professors, and grades matter less than you might think. Make sure you get the required grade. Make sure you take the right course, but don’t worry more than needed about it. In the real world, employers don’t care whether you aced your math courses.
  • The people you interact with (and that’s not necessarily the professors) matter a great deal. If you are studying a topic, try to meet people, either on campus or online, who are interested in this topic.
  • Don’t focus on the degree and the courses. Build strong side interests. Acquire a demonstrable expertise. Do something that people might care about and then tell the world. Remember that it is how Linus Torvalds started. It is how the web started. Who cares whether Linus got a good grade in math? Just do stuff, and do not wait passively for your teachers to tell you what to do. Again, do projects you care about; you are going to learn a lot more than you could otherwise. And learn to tell the world about what you do. Communicating your ideas is just as important as doing the cool work. Fanciful mathematics should be far down the list of priorities unless that is your thing.

What if you are not interested in anything, and you prefer to just go to your classes and maximize time spent in front of the TV? If you don’t know what to do, and you prefer being told what to do, that’s fine. Just don’t complain later when people pick your work for you.