Testing non-cryptographic random number generators: my results

In software, we use random number generators to emulate “randomness” in games, simulations, probabilistic algorithms and so on. There are many definitions of what it means to be random, but in practice, what we do is run statistical tests on the output of the random number generators.

These tests are not perfect, because even a purely random sequence could fail particular tests just out of bad luck. Extraordinarily bad events do happen on occasion. However, we can repeat these tests and if they keep failing, we can gain confidence that something is wrong. By “wrong”, we mean that the output does not quite look random.

One concern is that running these tests can be difficult, and inconvenient. To alleviate this problem, I created a GitHub repository where you can find scripts that should allow you to test several different random number generators using a single command line.

I am focusing on fast non-cryptographic random number generators, the type that most programmers use for hash tables, simulations, games, and so forth. I stress that we are not interested in cryptographic security. Cryptography is a whole other world that we are going to leave to cryptographers.

The contenders

I chose the following generators since they are widespread and fast:

  • splitmix64 is a random number generator that is part of the standard Java API (it is the algorithm behind SplittableRandom). It produces 64-bit numbers.
  • pcg32 and pcg64 are instances of the PCG family designed by O’Neill. They produce either 32-bit or 64-bit outputs.
  • xorshift32 is a classical xorshift random number generator you can read about in textbooks.
  • Finally, xorshift128plus and xoroshiro128plus are recently proposed random number generators by Vigna et al. They produce 64-bit numbers.

No doubt I could have added many more…
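
To give a flavor of how small these generators are, here is a sketch in C of two of the contenders, closely following the widely circulated public-domain reference implementations; the seed values below are arbitrary choices of mine for illustration, not anything prescribed by the authors.

#include <stdint.h>
#include <stdio.h>

// splitmix64, after Vigna's public-domain reference code
static uint64_t splitmix64_x = 42; // arbitrary seed
static uint64_t splitmix64_next(void) {
  uint64_t z = (splitmix64_x += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// pcg32, after O'Neill's minimal public-domain implementation
static uint64_t pcg32_state = 0x853c49e6748fea9bULL; // arbitrary seed
static uint64_t pcg32_inc = 0xda3e39cb94b95bdbULL;   // stream selector, must be odd
static uint32_t pcg32_next(void) {
  uint64_t oldstate = pcg32_state;
  pcg32_state = oldstate * 6364136223846793005ULL + pcg32_inc;
  uint32_t xorshifted = (uint32_t)(((oldstate >> 18) ^ oldstate) >> 27);
  uint32_t rot = (uint32_t)(oldstate >> 59);
  return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
}

int main(void) {
  printf("splitmix64: %llx\n", (unsigned long long)splitmix64_next());
  printf("pcg32: %x\n", (unsigned)pcg32_next());
  return 0;
}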

Speed

First, I decided to look at raw speed on recent x64 processors:

  • xoroshiro128plus: 0.85 cycles per byte
  • splitmix64: 0.89 cycles per byte
  • xorshift128plus: 0.91 cycles per byte
  • pcg64: 1.27 cycles per byte
  • pcg32: 1.81 cycles per byte
  • xorshift32: 2.33 cycles per byte

Basically, splitmix64 and the generators proposed by Vigna et al. are roughly equally fast. O’Neill’s PCG schemes are slightly slower. I should point out that all of them are much faster than whatever your C library provides (rand).
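
If you would like to reproduce this kind of measurement, the idea fits in a few lines of C: call the generator in a tight loop and divide the elapsed time-stamp-counter cycles by the number of bytes produced. The sketch below is mine; the iteration count, the seed and the use of the __rdtsc intrinsic are illustrative assumptions, and a careful benchmark would also repeat the measurement and control the CPU frequency.

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h> // __rdtsc, available with GCC and Clang on x64

static uint64_t x = 42; // arbitrary seed

// same splitmix64 recurrence as in the sketch above
static uint64_t splitmix64_next(void) {
  uint64_t z = (x += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

int main(void) {
  const uint64_t count = 100000000; // 100 million 64-bit values
  uint64_t sum = 0;                 // keeps the compiler from discarding the loop
  uint64_t start = __rdtsc();
  for (uint64_t i = 0; i < count; i++) {
    sum += splitmix64_next();
  }
  uint64_t cycles = __rdtsc() - start;
  printf("%.2f cycles per byte (checksum %llx)\n",
         (double)cycles / (double)(count * sizeof(uint64_t)),
         (unsigned long long)sum);
  return 0;
}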

Let us move on to statistical testing.

Practically Random

A convenient testing framework is Practically Random (PractRand). It is a recently proposed piece of code that will eat as many random bytes as you feed it and check them for randomness. You can let the program run for as long as you want. I went with a nice round number: I test 64GB of random bytes.

Only splitmix64 and the PCG schemes pass Practically Random.

You can, however, make xoroshiro128plus pass if you turn it into a 32-bit generator by dropping the least significant bits. Naturally, if you do so, you halve the effective speed of the generator. You might be able to do well by discarding fewer than 32 bits, but I did not investigate this approach because I prefer generators that produce either 32 bits or 64 bits.
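
The glue code needed to feed a generator to Practically Random is short, because the tester reads the bytes to be checked from standard input. Here is a minimal sketch of mine, using xorshift32 (with Marsaglia's textbook seed) as the generator under test; how you invoke the tester itself depends on your installation, so treat the command at the end as an assumption.

#include <stdint.h>
#include <stdio.h>

// Marsaglia's xorshift32 with his textbook default seed
static uint32_t y = 2463534242u;

static uint32_t xorshift32(void) {
  y ^= y << 13;
  y ^= y >> 17;
  y ^= y << 5;
  return y;
}

int main(void) {
  // stream raw 32-bit outputs forever; the statistical tester decides when to stop
  for (;;) {
    uint32_t value = xorshift32();
    fwrite(&value, sizeof(value), 1, stdout);
  }
}

You would then pipe the stream into the tester, along the lines of ./stream | RNG_test stdin32, adjusting the command to your PractRand build.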

TestU01

(Temporary warning: it appears that I ran the tests three times with the same seed. I will have to rerun my tests and update these results accordingly. Some of the failures I report could get discarded if they cannot be easily reproduced with other seeds. The failures I report are real, but concern just one seed.)

Another well-established framework is L’Ecuyer’s TestU01. I run TestU01 in “big crush” mode using three different seeds. Only when I see repeated failures with distinct seeds do I report a failure.

  • xoroshiro128+ fails CollisionOver, HammingIndep, MatrixRank and RandomWalk1
  • splitmix64 fails CollisionOver and MaxOft
  • xorshift128+ fails CollisionOver and MatrixRank
  • pcg32 fails CollisionOver
  • pcg64 fails Run
  • xorshift32 fails at almost everything

Evidently, L’Ecuyer’s big crush benchmark is hard to pass. I should stress that it is likely possible to cause more failures by changing the conditions of the tests. That is, the fact that I do not report a failure does not mean that none exists; I may simply not have detected it.

TestU01 provides p-values for test failures: I choose to go with whatever the authors of the benchmark defined as a failure.
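
For TestU01, the wrapping code is also short. Here is a minimal sketch of mine, not necessarily how the scripts in my repository do it: it plugs a 32-bit generator into the big crush battery through TestU01's unif01_CreateExternGenBits helper, which wraps a plain C function returning 32-bit values. The choice of xorshift32 and its seed are mine, for illustration.

#include <stdint.h>
#include "unif01.h"   // TestU01: generic generator wrappers
#include "bbattery.h" // TestU01: predefined test batteries, including big crush

// generator under test: xorshift32 with an arbitrary seed
static uint32_t y = 2463534242u;

static unsigned int xorshift32(void) {
  y ^= y << 13;
  y ^= y >> 17;
  y ^= y << 5;
  return y;
}

int main(void) {
  unif01_Gen *gen = unif01_CreateExternGenBits("xorshift32", xorshift32);
  bbattery_BigCrush(gen); // runs the full big crush battery and prints a report
  unif01_DeleteExternGenBits(gen);
  return 0;
}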

Discussion

What should you conclude? Speaking for myself, I am quite pleased at splitmix64’s results. It is very fast, already available in Java, and so forth.

Going forward!

Part of my motivation when writing this blog post was Vigna’s remark: “Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*.”

I am hoping that this blog post helps fill this gap. Evidently, my analysis is incomplete and we need to keep investigating. However, I hope that I have given you an interest in testing random number generators. If you grab my GitHub repository, it should be easy to get started.

Speaking for myself, as an independent party, I would rather have independent assessments. It is fine for O’Neill and Vigna to have Web sites where they compare their work to other techniques, but it is probably best for all of us if we collectively review these techniques independently. Please join me. To get you started: it is possible that I made mistakes. Please apply Linus’s Law and help dig out the bugs!

What would be interesting is to document these random number generators better. For example, Xoroshiro128+ has a Wikipedia entry (which looks messy to me), but the other schemes I have considered do not, as far as I can tell, have Wikipedia entries, yet they are worthy of documentation in my opinion.

Disclaimer: I have no vested interest whatsoever in the success or failure of any of these generators. As a Java programmer, I am slightly biased in favor of splitmix64, however.

Cracking random number generators (xoroshiro128+)

In software, we generate random numbers by calling a function called a “random number generator”. Such functions have hidden states, so that repeated calls to the function generate new numbers that appear random. If you know this state, you can predict all future outcomes of the random number generator. O’Neill, a professor at Harvey Mudd College, advocates against using random number generators that make such predictions trivially easy.

I like to call this process “cracking” because it is akin to getting access to secret information you are not supposed to have as a user of this function. Also, you could possibly use this approach to win at online poker, if the sites were silly enough to use a simple random number generator. They are not so stupid, are they?

The JavaScript engine inside the Google Chrome browser uses the XorShift128+ random number generator, created by Vigna. Douglas Goddard, a security expert, explains how one can crack this generator. He uses it to “hack the JavaScript lottery”. I’m hoping that no online casino relies on XorShift128+. Probably not.

XorShift128+ is a relatively weak random number generator. Blackman and Vigna recommend upgrading to the stronger xoroshiro128+.

You can examine the xoroshiro128+ function. It is simple enough, but it does not look like you could easily crack it at first glance.

Should you use it for your online casino? Maybe not.

I was intrigued by a blog post where John D. Cook illustrates how you could crack xoroshiro128+. John used examples provided to him by O’Neill.

O’Neill explains that the cracking process is trivial and takes much less than a second. In effect, given two consecutive 64-bit “random” numbers, you can often completely infer the inner state of the random number generator, and thus predict all future numbers.

O’Neill did not tell me how to do it, and yet I managed it in the middle of a busy Monday. So it is not nearly as hard as it may appear at first.

I’m not going to reveal O’Neill’s secret, as it would be no fun, but I can do it in about 5 lines of code. If you crack it, please don’t reveal the secret right away. Instead, consider posting simply a proof that you cracked it. (Update: at least one person did it quickly after reading my blog post.)

How can I prove that I can do it? One way to do it is to start with a string of 16 bytes, say one that spells out “Daniel Lemire” (my name) in ASCII characters with appropriate padding to make it 16 bytes. From this string, I can infer the original seed necessary so that the random number generator produces these 16 bytes in sequence.

I make the short C code available below. It starts from an apparently random seed for the random number generator, and then calls the xoroshiro128+ function several times. From a bash shell with a C compiler, you can type cc -o name name.c && ./name|hexdump -C to execute it. Sure enough, it will display my name.

#include <stdint.h>
#include <stdio.h>

uint64_t s[2]; // the 128-bit state of xoroshiro128+

static inline uint64_t rotl(const uint64_t x, int k) {
	return (x << k) | (x >> (64 - k));
}

// one step of xoroshiro128+: returns s[0] + s[1] and updates the state
uint64_t next(void) {
	const uint64_t s0 = s[0];
	uint64_t s1 = s[1];
	const uint64_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
	s[1] = rotl(s1, 36); // c
	return result;
}

int main(int argc, char **argv) {
    freopen(NULL, "wb", stdout); // make sure stdout is in binary mode
    s[0] = 0x922AC4EB35B502D9ULL; // apparently random seed (engineered so the output spells out a name)
    s[1] = 0xDA3AA4832B8F1D27ULL;
    for(int k = 0; k < 5; k++) {
        uint64_t value = next();
        fwrite((void*) &value, sizeof(value), 1, stdout);
    }
    return 0;
}
$ cc -o name name.c && ./name|hexdump -C
00000000  00 20 44 61 6e 69 65 6c  00 4c 65 6d 69 72 65 20  |. Daniel.Lemire |
00000010  62 c2 f9 75 7d 37 39 99  54 d8 dd a3 08 5e cd a9  |b..u}79.T....^..|

I should stress that you cannot “brute force” this problem. The state space spans 128 bits. That’s way too many degrees of freedom!

With some elementary algebra, you can bring the degrees of freedom down to 64 bits (the first output is simply the sum of the two 64-bit state words, so knowing one output pins down half of the unknowns), but that’s still too many for brute force.

So you need some cleverness to crack it, but not a lot. A smart high school student can do it.

What does it mean?

  • It means that if I have access to a few outputs of xoroshiro128+, then I can infer all future and previous random numbers.
  • It also means that if I can manipulate the seed of the function, I can make it produce almost any sequence of 16 bytes I desire.

I should point out that the same is true of most random number generators in widespread use today. Cryptographic random number generators should probably be used if you want to open a casino.

Further reading: Hacking casinos by cracking random number generators is a real issue, as explained in Russians engineer a brilliant slot machine cheat (Wired). You might also enjoy my post Testing non-cryptographic random number generators: my results.

Science and Technology links (August 18th, 2017)

We’d like, one day, to transplant pig organs into human beings. Sadly, this is currently very dangerous because, even though pigs are very similar to us, the small differences are cause for concern. Harvard’s George Church and his collaborators have shown that we can use gene editing (CRISPR) to make pig organs safer for human beings. It is believed that we could do the first pig-to-human transplantations within two years.

Dota 2 is a competitive video game where several human beings fight against each other. A startup called OpenAI, started by Elon Musk, has defeated a single good player at the game. Details are still scarce, but it seems like a worthwhile accomplishment, though it is not the case that human players are now obsolete.

The world’s oldest man, Israel Kristal, died. He was a Holocaust survivor and suffered tremendously, one suspects, as a young man. He was 113 years old.

You can use Google to dictate your documents in 119 languages.

Our hearts do not repair themselves very well. They inexorably age. Scientists have demonstrated that by injecting stem cells, they could rejuvenate old hearts. It works, in rats. CNN covered the study.

Arthritis is becoming more common, even after correcting for age and weight. We don’t know why.

There is no such thing as a healthy overweight person:

Conversely, irrespective of metabolic health, overweight and obese people had higher coronary heart disease risk than lean people. These findings challenge the concept of ‘metabolically healthy obesity’, encouraging population-wide strategies to tackle obesity.

Some company is making a tiny disk drive with a 50TB capacity. That’s a lot. (The American Library of Congress uses 15 TB.)

Damore, Google: my thoughts

So an engineer called Damore wrote a memo that opposed Google’s diversity policies and he was fired. Here is a question-and-answer session with me. I am both the interviewer and the interviewee.

  • Did you read Damore’s memo?

    No. Not all of it. It wasn’t inspiring. To be fair, I am not sure that Damore meant it to become a best-seller.

    I did appreciate the plots where he illustrates the difference between population statistics and individual differences.

  • Wait! You actually looked at Damore’s memo? You should burn…

    No. It is fine to read anything you like.

  • Do you think that Damore should have been fired?

    Yes.

    As an employee, you have to tread carefully when opposing management and your colleagues. I’d say that this extends even to colleges. Writing an essay that effectively explains to your superiors that what they are doing is anti-science, even if you are right, is problematic. Even more so if you occupy a junior position.

    It matters that Damore wrote his memo as a Google employee. Had he written an essay about the topic on a blog, or in a scientific journal of some sort, where he would have obfuscated the context sufficiently, things would be quite different.

    Note that even if management invites comments, that is no reason to start being critical.

    That is not to say, of course, that you should never be critical in the workplace. But unless you are explicitly mandated to provide a critical counterpoint to a given policy, you should abstain.

    More significantly, Damore became both a legal and reputational liability. Irrespective of the facts, keeping Damore around could give the impression that Google did have a toxic work environment for women. It is possible to build a convincing narrative around what Damore wrote.

    Note that it does not matter one iota if Damore is right scientifically.

  • Daniel, you are ignorant of the climate at Google. People can speak freely and even insult managers!

    Maybe so. But anecdotes do not invalidate general rules. Lots of very old people have smoked for decades. It does not mean that smoking is safe.

    It is my belief that what Damore did was dangerous.

    Put yourself in the shoes of a manager. Your management practices are not perfect. You know this. But it is a compromise between what is possible and what would be ideally possible. When employees start questioning your actions, from their point of view, everything is easy. “Your diversity policies are failing, scrap them”. However, you have another point of view, from higher-up. The criticism from low-level employees is not helping you nearly as much as they think it does, and it can make running your business harder.

    What are you going to do with employees that make your life harder as a manager? If you do not somehow dispose of them, you risk having a harder and harder time being efficient.

  • But Damore only had good intentions!

    He thought that Google’s policies were stupid and he wanted to put pressure on them so that they would be abolished. His intentions were clearly to put pressure on the management of the company. Had he been interested in scholarship, he could have written a scholarly paper in his spare time.

    You can go work for another employer, or start your own company, or let the policies run their course, or work to go up in the company and write your own policies.

    Damore’s behavior after being fired exposes him somewhat. The Goolag t-shirt… the interviews with some online personalities that are believed to be anti-women… So unwise…

  • So employees should never criticize employers?

    Sure, they should and they do. But how you go about it matters.

  • So you still have to attend diversity seminars even if you think it is stupid?

    Yes, if that’s what your boss suggests you do. Or you can go find another job elsewhere.

  • But that’s terrible!

    Yes, earning a six-figure income for pushing code and occasionally attending diversity seminars is terrible.

  • So you think we should ban sexy characters in video games?

    No. I have a figurine of 2B the android on my desk.

  • Wait a minute! You are a sexist pig!

    (Me leaving the studio…)

On Melissa O’Neill’s PCG random number generator

Computers often need random numbers. Most times, random numbers are not actually random… in the sense that they are the output of a mathematical function that is purely deterministic. And it is not even entirely clear what “really random” would mean. It is not clear that we live in a randomized universe… it seems more likely that our universe is deterministic but that our limited access to information makes randomness a useful concept. Still, very smart people have spent a lot of time defining what random means, and it turns out that mathematical functions can be said to produce “random” outputs in a reasonable sense.

In any case, many programmers have now adopted a new random number generator called PCG, designed by Professor Melissa O’Neill from Harvey Mudd College.

What O’Neill did is quite reasonable. She asked herself whether we could produce better random number generators, she wrote a paper and published code. The result was quickly adopted by engineers worldwide.

She also submitted her paper for consideration in what I expect to be a good, well-managed journal.

Her manuscript grew lengthy over time and maybe exceeded some people’s stylistic sensibilities; she justifies herself in this manner:

I prefer to write papers that are broadly accessible. I’d rather write a paper that can be enjoyed by people who are interested in the topic than one that can only be understood by a tiny number of experts. I don’t agree with the philosophy that the more impenetrable the paper, the better the work must be! Describing desirable qualities in detail seemed to be necessary for the paper to make sense to anyone not deeply entrenched in the field. Doing so also seemed necessary for anyone in the field who only cared about a subset of the qualities I considered desirable—I would need to convince them that the qualities they usually didn’t care about were actually valuable too.

As I pointed out, she had a real-world impact:

While attending PLDI and TRANSACT in June of 2015, I got one of the first clues that my work had had real impact. I can’t remember the talk or the paper, but someone was saying how their results had been much improved from prior work by switching to a new, better, random number generator. At the end I asked which one. It was PCG.

Meanwhile, at least one influential researcher (whose work I respect) had harsh words publicly for her result:

I’d be extremely careful before taking from granted any claim made about PCG generators. Wait at least until the paper is published, if it ever happens. (…) Several claims on the PCG site are false, or weasel words (…) You should also be precise about which generator you have in mind—the PCG general definition covers basically any generator ever conceived. (…) Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*.

Her paper was not accepted. She put it in those terms:

What was more interesting were the ways in which the journal reviewing differed from the paper’s Internet reception. Some reviewers found my style of exposition enjoyable, but others found it too leisurely and inappropriately relaxed. (…) An additional difference from the Internet reaction was that some of the TOMS reviewers felt that what I’d done just wasn’t very mathematically sophisticated and was thus trivial/uninteresting. (…) Finally, few Internet readers had complained that the paper was too long but, as I mentioned earlier, the length of the paper was a theme throughout all the reviewing. (…) Regarding that latter point, I am, on reflection, unrepentant. I wanted to write something that was broadly accessible, and based on other feedback I succeeded.

I emailed O’Neill questions a couple of times, but she never got back to me.

So we end up with this reasonably popular random number generator, based on a paper that you can find online. As far as I can tell, the work has not been described and reviewed in a standard peer-reviewed manner. Note that though she is the inventor, nothing precludes us from studying her work and writing papers about it.

John D. Cook has been doing some work in this direction on his blog, but I think that if we believe in the importance of formal scientific publications, then we ought to cover PCG in such publications, if only to say why it is not worth consideration.

What is at stake here is whether we care for formal scientific publications. I suspect that Cook and O’Neill openly do not care. The reason you would care, fifty years ago, is that without the formal publication, you would have a hard time distributing your work. That incentive is gone. As O’Neill points out, her work is receiving citations, and she has significant real-world impact.

At least in software, there has long been a relatively close relationship between engineering and academic publications. These do not live in entirely separate worlds. I do not have a good sense as to whether they are moving apart. I think that they might be. Aside from hot topics like deep learning, I wonder whether the academic publications are growing ever less relevant to practice.

Further reading: You might also enjoy my post Testing non-cryptographic random number generators: my results.

Bubbling up is lowering empathy at a civilization scale

Computer networks are a fantastic invention. When they came into my life, I remember spending hours, sometimes days, arguing with people I violently disagreed with. At first glance, it looks like a complete waste of time, but experience has taught me that it is tremendously precious. Not because it changes minds, but because it keeps minds open.

My friend Seb Paquet invented the concept of ridiculously easy group formation. Seb observed in the early 2000s that it was now possible to create “groups” using web tools with very little effort. I have always been suspicious of groups. I prefer open networks to tightly organized groups. Still, I initially viewed Seb’s observation positively. People organizing more easily ought to foster a more collaborative society.

Maybe it did, at least initially, have very positive outcomes, but I am now very concerned with one particularly vicious side effect: people organize along “party lines” for conflict. Instead of arguing with people we disagree with, instead of seeking common ground, we “unfriend” them. In effect, we are creating cognitive bubbles of like-minded individuals around us. I believe that this might lower empathy at a civilization scale.

I don’t want to bring everything back to Donald Trump, but he is an obligatory reference. It is not that Trump is so interesting to me, but what he reveals is important. A week before the election, a student of mine asked me if Trump could get elected. I categorically said no. Trump could not get elected. How could he, given that all the smart people I knew were predicting he would not?

Then Trump was elected.

You can call it an accident if you want, but I don’t think that is a fair characterization. What this exposed to me is that I had bubbled up too much. I was no longer able to see the world in a sufficiently clear manner to make reasonable predictions. I should have been able to anticipate Trump’s election. I wasn’t. I failed.

I can be excused because I am Canadian. But the US is our powerful neighbor so I ought to know what is going on over there.

So what did I do? I decided to subscribe to blogs and podcasts of people who do support Trump. Scott Adams, the cartoonist behind Dilbert, is first on my list. I have been following Scott Adams and he has exposed me to a whole other way to view the world. At least as far as Trump is concerned, Scott Adams has a much better predictive record than anyone else I know.

It is not that Scott is right and other people are wrong. It would be nice if we could divide things up into right and wrong, black and white. It is never that simple, even when you think it should be.

But I don’t think that’s what most people who did not see the election of Trump coming did. I suspect that many of them just doubled down on the bubbling. They started to actively seek out anyone who might not toe the party line and to exclude them from their view. So we got a more divided world.

We have lots of experience with such political bubbles. Fascism, Soviet-era Russia and Mao’s China are glaring examples of what this can lead to in the most extreme cases: brutal violence. We don’t want that. It is hell on Earth.

And it may be reassuring to think that it is your side that will crush the other side, but that’s very dangerous thinking. The very people who are “on your side” may soon either turn against you or hurt you indirectly. That’s what history taught us. Very few people did well under Mao.

We are all tempted by virtue signaling. Given any topic, we look around in our group and tend toward what is perceived as the “moral thing”. It is a powerful force. Sadly, it is also what makes fascism possible in the first place. It is at the core of our inner Nazi.

I believe that, inadvertently maybe, we have built tools (software and laws) that favor virtue signaling and bubbling up over growing empathy for diverse points of view. We create groups and disconnected networks and get stuck in a repeating feedback loop. Our positions become fossils, unable to evolve or change once set.

A friend of mine once remarked how surprisingly difficult it can be to host a blog where people come to vehemently disagree. And that is something we are losing. Walled gardens with algorithmically selected posts are replacing blogs. The bloggers who remain often close their comment feeds, or close them to “the other side”. We are pushing the debates to the edges, down in the YouTube comments where all we get is toxicity.

I am not arguing for a renewal of blogging, but I am arguing that “bubbling up” should become pejorative. We should not celebrate people who cater to people sharing a given dogma. We should look at these people with suspicion. And, yes, maybe we need to make it easier, through technology, for people to find diverse points of views. So, on a given issue, instead of presenting to users only whatever is most likely to please them, we should present them a spectrum of views.

And we definitely need to stop needlessly characterizing individuals. It’s not just the name calling per se that should stop, but the clustering of individuals into the acceptable and the unacceptable. Identity politics should go.

It is hard to take this path; it is much easier to continue with virtue signaling and bubbles. But I think that most of us don’t feel that we really belong in these closed groups in the first place. Most of us have nuanced points of view. Most of us don’t like to divide the world in two. Most of us are open and undecided. Most of us are interested in talking with people who have different points of view. We cherish it.

It takes social engineering to make the worst of us come out. The Nazi regime had to forcefully close down Jewish stores because Germans, even under Nazi rule, liked to do business with Jews. If we take force out of the equation, people do get along, even when they disagree.

Final quotes:

  • “He who knows only his own side of the case knows little of that. His reasons may be good, and no one may have been able to refute them. But if he is equally unable to refute the reasons on the opposite side, if he does not so much as know what they are, he has no ground for preferring either opinion… Nor is it enough that he should hear the opinions of adversaries from his own teachers, presented as they state them, and accompanied by what they offer as refutations. He must be able to hear them from persons who actually believe them…he must know them in their most plausible and persuasive form.” (John Stuart Mill)

  • Confronting, hearing, and countering offensive speech we disagree with is a skill. And one that should be considered a core requirement at any school worth its salt.

    In that regard, recent incidents suggest that colleges are fundamentally failing their students in imparting these skills. In just the past few weeks, from one campus to another and another and another, liberal students have silenced conservative speakers with violence, outrage, and threats. This collection of heckler’s vetoes is the farthest thing from a victory for the progressive causes these students champion.

    These incidents have not shut down a single bad idea. To the contrary, they’ve given their opponents’ ideas credence by adding the power of martyrdom. When you choose censorship as your substantive argument, you lose the debate.

    (Lee Rowland, Senior Staff Attorney, ACLU)

Optimizing polynomial hash functions (Java vs. Swift)

In software, hash functions are ubiquitous. They map arbitrary pieces of data (strings, arrays, …) to fixed-length integers. They are the key ingredient of hash tables which are how we most commonly implement maps between keys and values (e.g., between someone’s name and someone’s phone number).

A couple of years ago, I pointed out that you could almost double the speed of the default hash functions in Java with a tiny bit of effort. I find it remarkable that you can double the performance of standard ubiquitous functions so easily.

Richard Startin showed that this remains true today with Java 9. I used String hashing as an example; Richard makes the same demonstration using the Arrays.hashCode function, but the idea is the same.

You might object that, maybe, the performance of hash functions is irrelevant. That might be true for your application, but it gives you a hint as to how much you can speed up your software by tweaking your code.

In any case, I decided to use this experiment as a comparison point with Apple’s Swift language. Swift is the default language when building iOS applications whereas Java is the default language when building Android applications… and, in this sense, they are competitors.

I am not, for example, trying to determine whether Swift is better than Java. This sort of question is meaningless. However, I am trying to gain some perspective on the problem.

Whereas Java offers Arrays.hashCode as a way to hash arrays, I believe that Swift flat out abstains from helping. If you need to hash arrays, you have to roll your own function.

So let us write something in Swift that is equivalent to Java, a simple polynomial hash function:

func simpleHash(_ array : [Int]) -> Int {
  var hash = 0
  for x in array {
    hash = hash &* 31 &+ x
  }
  return hash
}

There are ampersands everywhere because Swift crashes on overflows. So if you have a 64-bit system and you write (1<<63)*2, then your program halts. This is viewed as being safer. You need to prefix your operators with the ampersand to keep the code running.

We can “unroll” the loop, that is process the data in blocks of four values. You can expect larger blocks to provide faster performance, albeit with diminishing returns.

Of course, if you are working with tiny arrays, this optimization is useless, but in such cases, you probably do not care too much about the performance of the hash function.

The code looks a bit more complicated, but what we have done is not sophisticated:

func unrolledHash(_ array : [Int]) -> Int {
  var hash = 0
  let l = array.count/4*4
  for i in stride(from:0,to:l,by:4) {
    hash = hash &* 31  &* 31  &* 31  &* 31  
           &+ array[i]  &* 31  &* 31  &* 31 
           &+ array[i + 1]  &* 31  &* 31    
           &+ array[i + 2]  &* 31  
           &+ array[i + 3]
  }
  for i in stride(from:l,to:array.count,by:1) {
      hash = hash &* 31 &+ array[i]
  }
  return hash
}

I have designed a little benchmark that hashes a large array using Swift 3.0. I run it on a Linux box with a recent processor (Intel Skylake) running at 3.4 GHz. Like in the Java experiments, the unrolled hash function is nearly twice as fast:

  • simple hash: 0.9 ns/element
  • unrolled hash: 0.5 ns/element

In the unrolled case, we are using about 1.7 CPU cycles per element value (0.5 ns × 3.4 GHz), against roughly 3 CPU cycles in the simple case.

Swift 4.0 is around the corner, so I tried running the benchmark with a pre-release version of Swift 4.0; the performance difference remained.

As usual, my code is available. It should run under Linux and macOS. It might possibly run under Windows if you are the adventurous type.

So, at a glance, Swift does not differ too much from Java: the relative performance gap between hand-tuned hash functions and naive hash functions is the same.

Obviously, it might be interesting to extend these investigations beyond Java and Swift. My current guess is that the gap will mostly remain. That is, I conjecture that while some optimizing compilers will be able to unroll the loop, none of them will do as well as the simple manual unrolling. In effect, I conjecture Java and Swift are not being particularly dumb.

Science and Technology links (August 11th, 2017)

It looks like the Java programming language might finally get in-language support for vector instructions. These instructions are supported by modern processors and can multiply the processing speed… but they often require different algorithms. Language designers have often ignored vector instructions, relying instead on optimizing compilers to “auto-vectorize” the code. Optimizing compilers are very good, but experience shows that language support for performance features pays.

It seems that armies worldwide are toying with the idea of smart full-face military helmets. These helmets are bulletproof, have night vision and heat detection, and offer augmented reality. Sounds like something from Star Wars.

Whey is a common byproduct of the transformation of milk. It is cheap. Bodybuilders use whey protein supplements to grow bigger muscles. It seems that supplementing with whey protein might be a good strategy for all of us, young and old. Personally, I remain somewhat cautious. We don’t know the long-term effects of loading up on protein. It is a popular thing to do, but that does not make it safe.

Parkinson’s is a terrible disease that currently cannot be stopped or even slowed. A diabetes drug called exenatide appeared to halt the progression of the disease in a small trial. If this could be verified, it would be a historic event.

Science and Technology links (August 4th, 2017)

Lifting a lot of small weights and eating protein regularly builds muscle mass. There is no need for heavy weights and hormones matter less than you think.

There is some evidence for life on Saturn’s largest moon, Titan. If there is life there, it is going to be quite different from life on Earth.

Though dementia affects millions of people, the causes remain mysterious and the cure elusive. There is mounting evidence that Alzheimer’s and dementia might have a bacterial origin. Or maybe not, we don’t know.

In Alzheimer’s, people progressively lose their memories. Are these memories lost or buried? It seems that it is the latter. In mice affected by a model of Alzheimer’s, researchers were able to bring back the lost memories.

Is Alzheimer’s a disease specific to human beings? It seems that chimpanzees also suffer from Alzheimer’s.

We are able to crack safes using inexpensive robots.

Last week, I reported that US researchers genetically modified human embryos for the first time using new technology (CRISPR). The Nature paper is out and we can study the details. It turns out that they properly corrected a genetic defect inherited from the male donor. The experiment worked, but not quite how the researchers expected: instead of replacing the faulty gene with the proposed replacement, the process duplicated the maternal gene.

Stem cells can be used to make stronger bones.

There is some evidence that the hypothalamus (part of our brain) controls ageing, and that stem cell therapy in the brain could rejuvenate us. The evidence remains weak.

Science and Technology links (July 27th, 2017)

Currently, damage to the retina is largely viewed as irreversible. However, some researchers were able to generate retinal cells in mice.

Toyota is reportedly ready to commercialize, within 5 years, a new type of battery called “solid state”. These batteries would be lighter than current lithium batteries and would work well over a wider range of temperatures.

It is possible that the Moon might be rich in water, after all. This would make the Moon an attractive destination since water has many useful applications.

Though it has already happened elsewhere in the world, scientists have edited the genes of human embryos for the first time in the US. It is unlikely to lead to the production of genetically superior children on a large scale, at least for the foreseeable future.