Science and Technology links (January 19th, 2019)

  1. Losing even just a bit of weight can be enough to restore fertility in women.
  2. Digital technology does not appear to have a significant negative effect on teenagers, according to an article published by Nature.
  3. According to an article published by Nature, woody vegetation cover over sub-Saharan Africa increased by 8% over the past three decades. This is part of a larger trend: the Earth is getting greener.
  4. The number of children in India has peaked. This suggests that the population of India will peak soon as well.
  5. Scientists are looking for chemical factors that either decrease or increase with age: normalizing these factors could slow or reverse aging. Nature reports on a factor called MANF which decreases with age in mice, human beings and flies. It seems that MANF supplements could have anti-aging benefits.
  6. Netflix is approaching 150 million subscribers. That is about 2% of the world’s population, and it almost certainly underestimates the true reach of Netflix, since a single subscription often serves a whole household.
  7. In an article from 2004 in the reputable Guardian newspaper, we can read:

    A secret report, suppressed by US defence chiefs and obtained by The Observer, warns that major European cities will be sunk beneath rising seas as Britain is plunged into a ‘Siberian’ climate by 2020. Nuclear conflict, mega-droughts, famine and widespread rioting will erupt across the world. (…) Already (…) the planet is carrying a higher population than it can sustain. By 2020 ‘catastrophic’ shortages of water and energy supply will become increasingly harder to overcome, plunging the planet into war.

  8. Our brains are made of white and grey matter. Obesity correlates with lower grey matter volume in the brain.
  9. Men and women have (statistically) different cognitive abilities, with women being better at verbal tasks whereas men are better at spatial tasks. This survey concludes that homosexuals have abilities closer to the other sex:

    The meta-analysis revealed that homosexual men performed like heterosexual women in both male-favouring (e.g., spatial cognition) and female-favouring (e.g., verbal fluency) cognitive tests, while homosexual women performed like heterosexual men only in male-favouring tests. The magnitude of the sexual orientation difference varied across cognitive domains (larger for spatial abilities).

Faster intersections between sorted arrays with shotgun

A common problem within databases and search engines is to compute the intersection between two sorted arrays. Typically, one array is much smaller than the other.

The conventional strategy is the “galloping intersection”. In effect, you go through the values in the small array and, for each one, do a binary search in the large array. A binary search is a simple but effective algorithm to search through a sorted array. Given a target, you compare it with the midpoint value. If your target is smaller than the midpoint value, you search in the first half of the array; otherwise, you search in the second half. You recurse through the array in this manner, cutting the search space in half each time. Thus the search time is logarithmic.
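Sketched in C, a binary search needs only a few lines (an illustrative lower-bound version, with names of my own, not the CRoaring code):

```c
#include <stddef.h>
#include <stdint.h>

// A minimal binary search (lower-bound flavor): returns the index of the
// first element that is >= target, or n when every element is smaller.
// The search space is cut in half at each step, so it runs in O(log n).
size_t binary_search(const uint32_t *array, size_t n, uint32_t target) {
    size_t low = 0, high = n;
    while (low < high) {
        size_t mid = low + (high - low) / 2; // midpoint, avoiding overflow
        if (array[mid] < target) {
            low = mid + 1;  // target must be in the second half
        } else {
            high = mid;     // target is in the first half (or at mid)
        }
    }
    return low;
}
```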

If the small array has M elements and the large array has N elements, then the complexity of a galloping search is O(M log N). In fact, you can be more precise: you never need more than M * log N + M comparisons.
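A complete galloping intersection can then be sketched as follows (again an illustrative version with names of my own, not the CRoaring code). Since both arrays are sorted, each binary search can resume where the previous one ended:

```c
#include <stddef.h>
#include <stdint.h>

// Sketch of a galloping intersection: for each value of the small array,
// binary-search the large array, starting from the previous match position.
// Writes the common values to 'out' and returns how many were found.
// Overall, this uses O(M log N) comparisons.
size_t intersect_galloping(const uint32_t *small, size_t m,
                           const uint32_t *large, size_t n,
                           uint32_t *out) {
    size_t count = 0;
    size_t start = 0; // both arrays are sorted: never search before 'start'
    for (size_t i = 0; i < m; i++) {
        uint32_t target = small[i];
        size_t lo = start, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (large[mid] < target) lo = mid + 1;
            else hi = mid;
        }
        start = lo;
        if (lo < n && large[lo] == target) out[count++] = target;
    }
    return count;
}
```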

Can you do better? You might.

Let me describe an improved strategy which I call “shotgun intersection”. It has been in production use for quite some time, through the CRoaring library, a C/C++ implementation of Roaring Bitmaps.

The idea is that a galloping search implies multiple binary searches in sequence through basically the same array. Doing them consecutively might not be best. A binary search, when the large array is not in cache, is memory-bound: it waits for the memory subsystem to deliver the data. So you are constantly waiting. What if you did something else while you wait? What about starting right away on the next binary search?

That is how a shotgun search works. You take, say, the first four values from the small array. You load the midpoint value from the large array, then you compare all of your four values against this midpoint value. If the target value is larger, you set a corresponding index so that the next search will hit the second half of the array. And so forth. In effect, shotgun search does many binary searches at once.
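A minimal sketch of a four-way shotgun search in C might look like this (illustrative only, not the CRoaring implementation): the four searches advance in lockstep, so their loads from the large array can be in flight at the same time.

```c
#include <stddef.h>
#include <stdint.h>

// Four interleaved binary searches: at each round we compute one midpoint
// per target and adjust that target's own search interval. The four loads
// from 'array' are independent, so the processor can overlap them.
void shotgun_search4(const uint32_t *array, size_t n,
                     const uint32_t targets[4], size_t result[4]) {
    size_t low[4] = {0, 0, 0, 0};
    size_t high[4] = {n, n, n, n};
    int active = 1;
    while (active) {
        active = 0;
        for (int k = 0; k < 4; k++) {
            if (low[k] < high[k]) {
                size_t mid = low[k] + (high[k] - low[k]) / 2;
                if (array[mid] < targets[k]) {
                    low[k] = mid + 1;  // next probe hits the second half
                } else {
                    high[k] = mid;     // next probe hits the first half
                }
                active = 1;
            }
        }
    }
    // each result is the index of the first element >= the corresponding target
    for (int k = 0; k < 4; k++) result[k] = low[k];
}
```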

I make my Java code available, if you want a full implementation.

Does it help? It does. Sometimes it helps a lot. Let us intersect an array made of 32 integers with an array made of 100 million sorted integers. I use a Cannonlake processor with Java 8.

1-way (one binary search at a time) 1.3 microseconds
4-way (shotgun) 0.9 microseconds

Credit: Shotgun intersections are based on an idea and an initial implementation by Nathan Kurz. I’d like to thank Travis Downs for inspiring discussions.

Science and Technology links (January 12th, 2019)

  1. You can buy a 512GB memory card for $140 from Amazon.
  2. We have been told for decades to avoid saturated fats, the kind found in meat, cheese and butter. After an extensive review, Grasgruber et al. conclude:

    Our results do not support the association between cardiovascular diseases and saturated fat, which is still contained in official dietary guidelines. In the absence of any scientific evidence connecting saturated fat with cardiovascular diseases, these findings show that current dietary recommendations regarding cardiovascular diseases should be seriously reconsidered.

  3. Stoet and Geary propose a new measure of gender inequality and conclude…

    We found that low levels of human development are typically associated with disadvantages for girls and women, while medium and high levels of development are typically associated with disadvantages for boys and men. Countries with the highest levels of human development are closest to gender parity, albeit typically with a slight advantage for women.

  4. Budish argues that Bitcoin would be hacked if it became sufficiently important economically.
  5. We can use deep-learning software to reconstruct intelligible speech from the human auditory cortex.
  6. Nelson’s research indicates that there will be more calories available per capita in 2050 than today, in all income quintiles, and even in an extreme climate change scenario. However, both obesity and dietary shortages (e.g., vitamin A) will remain a challenge:

    We must shift our emphasis from food security to nutrition security.

Science and Technology links (January 5th, 2019)

  1. There are nearly 70,000 centenarians in Japan.
  2. China’s population fell by 1.27 million in 2018.
  3. Obesity is associated with increased senescent cell burden and neuropsychiatric disorders, including anxiety and depression. Clearing senescent cells could alleviate these symptoms.
  4. Women who come from families where the mother is the breadwinner are less likely to pursue a science degree at university level.
  5. Stronger men live longer:

    Muscular strength is inversely and independently associated with death from all causes and cancer in men, even after adjusting for cardiorespiratory fitness and other potential confounders.

  6. Inflammation is a crude but effective part of our immune system that serves as a first line of defense. It is necessary but tends to get out of hand as we age. Arai et al. found that your level of inflammation is a potential predictor of how likely you are to reach extreme old age:

    Inflammation is the prime candidate amongst potential determinants of mortality, capability and cognition up to extreme old age

  7. An article published by Nature argues for rejuvenation strategies:

    Ageing is associated with the functional decline of all tissues and a striking increase in many diseases. Although ageing has long been considered a one-way street, strategies to delay and potentially even reverse the ageing process have recently been developed. Here, we review four emerging rejuvenation strategies—systemic factors, metabolic manipulations, senescent cell ablation and cellular reprogramming—and discuss their mechanisms of action, cellular targets, potential trade-offs and application to human ageing.

  8. Childhood leukaemia is a feature of developed societies and may have to do with children being insufficiently exposed to bacteria and infections.
  9. China has landed a rover on the far side of the Moon.
  10. While climate change models predict that as the world warms, biomass will decompose more quickly, which would send a lot more carbon dioxide into the atmosphere, new research finds that it may work the other way around: warmer temperatures may reduce decomposition and carbon emissions.
  11. You can defeat Google’s captchas (meant to prove you are a human being) by taking the audio test and feeding it back to Google’s services to get a transcription.

Memory-level parallelism: Intel Skylake versus Intel Cannonlake

All programmers know about multicore parallelism: your CPU is made of several nearly independent processors (called cores) that can run instructions in parallel. However, our processors are parallel in many different ways. I am interested in a particular form of parallelism called “memory-level parallelism” where the same processor can issue several memory requests. This is an important form of parallelism because current memory subsystems have high latency: it can take dozens of nanoseconds or more between the moment the processor asks for data and the time the data comes back from RAM. The general trend has not been a positive one in this respect: in many cases, the more advanced and expensive the processor, the higher the latency. To compensate for the high latency, we have parallelism: you can ask for many data elements from the memory subsystem at the same time.

In earlier work, we showed that current Intel processors (Skylake microarchitecture) are limited to about ten concurrent memory requests whereas Apple’s A12 processor scales to 40 or more memory requests.

Intel just released a more recent microarchitecture (Cannonlake) and we have been putting it to the test. Is Intel improving?

It seems so. In a benchmark where you randomly access a large array, using a number of separate paths (which I call “lanes”), we find that the Cannonlake processor appears to support twice as many concurrent memory requests as the Skylake processor.
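The benchmark idea can be sketched as follows (an illustrative version with names of my own, not our actual benchmark code): the array contains indices into itself, each lane chases its own chain, and within a lane each load depends on the previous one, so the number of lanes bounds how many memory requests can be in flight.

```c
#include <stddef.h>
#include <stdint.h>

// Pointer-chasing through 'array' (which must contain valid indices into
// itself) along several independent lanes. Within a lane, each load depends
// on the previous one; across lanes, the loads are independent and can
// overlap in the memory subsystem.
uint64_t chase_lanes(const uint32_t *array, int lanes, size_t steps) {
    uint32_t index[16]; // supports up to 16 lanes
    for (int k = 0; k < lanes; k++) index[k] = (uint32_t)k;
    uint64_t sum = 0;
    for (size_t s = 0; s < steps; s++) {
        for (int k = 0; k < lanes; k++) {
            index[k] = array[index[k]]; // dependent load: the next address is
                                        // only known once this load completes
            sum += index[k];
        }
    }
    return sum; // checksum, also prevents the compiler from removing the loads
}
```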

The Skylake processor has lower latency (70 ns/query) compared to the Cannonlake processor (110 ns/query). Nevertheless, the Cannonlake is eventually able to beat the Skylake processor in bandwidth by a wide margin (12 GB/s vs 9 GB/s).

The story is similar to the Apple A12 experiments.

This suggests that even though future processors may not have lower latency when accessing memory, we might be better able to hide this latency through more parallelism.

Even if you are writing single-threaded code, you ought to think more and more about parallelism.

Our code is available.

Credit: Though all the mistakes are mine, this is joint work with Travis Downs.

Further details: Processors access the memory through pages. By default, many Intel systems have “small” pages (4kB). When doing random accesses in large memory regions, you are likely to touch many pages, so that you incur expensive “page misses” that lead to “page walks”. It is possible to use larger page sizes, even “huge pages”. But since memory is allocated in pages, you may end up with many under-utilized pages if they are too large. In practice, under-utilized pages (sometimes called “memory fragmentation”) can be detrimental to performance. To get the good results above, I use huge pages. Because there is just one large memory allocation in my tests, memory fragmentation is not a concern. With small pages, the Cannonlake processor loses its edge over Skylake: they are both limited to about 9 concurrent requests. Thankfully, on Linux, programmers can request huge pages with a madvise call when they know it is a good idea.
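On Linux, such a request might look as follows (a minimal sketch with a name of my own; MADV_HUGEPAGE is only advice, which the kernel is free to ignore):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

// Allocate 'size' bytes with mmap and advise the kernel to back the region
// with (transparent) huge pages. Returns NULL on failure.
void *alloc_advise_huge(size_t size) {
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return NULL;
    // best effort: the kernel may ignore the advice
    madvise(buf, size, MADV_HUGEPAGE);
    return buf;
}
```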

Important science and technology findings in 2018

  1. The Gompertz-Makeham law predicts statistically the mortality rate of human beings. The key takeaway is that it is an exponential function: every few years, the mortality rate of a human being doubles. This is not unique to human beings: most mammals and many other animals have an exponentially rising mortality rate over time. It does not affect all animals, however. Lobsters do not appear to age like we do. Many trees age in reverse, meaning that their mortality rate diminishes over time. In 2018, a scientist who had studied naked mole rats for decades published an analysis of over 3,000 of them and found that their mortality rate remains constant throughout their lives. We do not know why naked mole rats age differently from most other mammals.
  2. Type 1 diabetes is when your pancreas is unable to supply insulin to your cells. Though it can be treated with expensive and inconvenient insulin shots, there is no cure. In 2018, we found that a heart-disease drug can partially reverse type 1 diabetes. This could make some diabetics less dependent on insulin.
  3. Though artificial intelligence has been making a lot of progress in tasks such as image classification and game playing, we are still a long way from being able to intelligently animate human-like body parts like hands. Simple tasks like folding laundry or turning a door knob are still a massive challenge. In 2018, researchers from OpenAI have trained a human-like robot hand to manipulate objects like we would.
  4. Older people tend to have a less efficient immune system. In 2018, we learned that we can at least partially reverse age-related immune-system decline using drugs: the treatment substantially reduced infections in older people.
  5. The human genome project set out in 1990 to map the human chromosomes. We thought at the time that human beings had 100,000 genes; they have only about 25,000. The map was completed in 2003. Yet applications of the human genome project have been scarce. In 2018, the first gene-silencing drug was approved in the USA.
  6. Electrocardiograms (ECG) have been used since the 19th century to monitor human hearts. The first commercially available ECG machines were produced at the beginning of the 20th century, but they remained specialized devices mostly used within hospitals. They are also somewhat invasive. In 2018, Apple released a watch with government-approved ECG (heart monitoring) capabilities.
  7. CRISPR/Cas9 is a technique developed in 2012 to edit the genes of living organisms. It is unclear whether it is safe to use on human beings. Using this technique, a Chinese researcher helped produce the first genetically modified babies, who may be immune to HIV.

Science and Technology links (December 29th, 2018)

  1. Low-dose radiation from A-bombs elongated lifespan and reduced cancer mortality relative to un-irradiated individuals (Sutou, 2018):

    The US National Academy of Sciences (NAS) presented the linear no-threshold hypothesis (LNT) in 1956, which indicates that the lowest doses of ionizing radiation are hazardous in proportion to the dose. This spurious hypothesis was not based on solid data. (…) A-bomb survivors (…) showed longer than average lifespans. Average solid cancer death ratios of(…) A-bomb survivors (…) were lower than the average for Japanese people (…), essentially invalidating the LNT model. Nevertheless, LNT has served as the basis of radiation regulation policy. If it were not for LNT, tremendous human, social, and economic losses would not have occurred in the aftermath of the Fukushima Daiichi nuclear plant accident. For many reasons, LNT must be revised or abolished, with changes based not on policy but on science.

    This is important work. I am surprised at how few people know about hormesis. Many people assume that if you avoid stress, toxins and challenges, you will maximize your health and longevity. That is just flat out wrong.

  2. In climate talks, we use year 1850 as a reference: the implicit goal is to maintain Earth’s global temperature close to the global temperature that existed in year 1850 (say within 1.5 degrees). To my knowledge, nothing makes year 1850 special. In fact, in the absence of both ancient and recent carbon emissions from agriculture and industrialization, current global average temperatures would likely have been about 1.3 degrees lower than they were around 1850 (Vavrus et al., Nature 2018). We were headed down toward another glaciation and, instead, due to human beings, we are headed toward warmer and warmer temperatures. Left unchecked, neither direction is desirable. As argued by Deutsch in The Beginning of Infinity, we have no choice but to accept that there is no such thing as “sustainability” (which assumes an ideal steady state) and that we must learn to engineer our climate.
  3. Human beings do not originate from a single region in Africa: the story of our origin is complicated, involving a mix of different populations and cultures.
  4. McGuff and Little (2009) write:

    there is no additional physiological advantage afforded to one’s body, including endurance or cardio benefits, by training that lasts more than six to nine minutes a week.

  5. The videogame Fortnite led to 3 billion dollars in profit for its creators. It has 200 million players.
  6. In older mammals, the skin loses fat. Zhang et al. restored the ability of skin in older mice to store fat, thus making the skin more resistant to some infections.
  7. Students who make friends and study with them tend to do better. This is painfully obvious to anyone who has given serious thought to how schooling works.

Science and Technology links (December 22nd 2018)

  1. For equity reasons, many people advocate for double-blind peer review, meaning that the author does not know who the reviewer is, nor does the reviewer know who the author is. It is believed (despite little hard positive evidence and some contrary evidence) that this is surely beneficial to female authors and minorities. Cox and Montgomerie find that:

    These analyses suggest that double-blind review does not currently increase the incidence of female authorship in the journals studied here. We conclude, at least for these journals, that double-blind review does not benefit female authors and may, in the long run, be detrimental.

    Why do they say it may be detrimental?

    Firstly, making everything anonymous is hard work. And resources are finite: we are already struggling to find the time and people to review manuscripts, so adding to the burden has a cost. Thus we must ensure that there are comparable benefits. You shouldn’t think for a minute that making science harder and more expensive will necessarily benefit women and minorities. Raising the costs usually works against inclusion.

    Secondly, they observe empirically that publications by women are less likely to appear in double-blind-review journals than in conventional journals. Why is that? If double-blind reviews were obviously beneficial to women, they would flock to the double-blind journals… but they do not. Either women are misguided or else, more likely, double-blind reviews do not favor women.

    And, finally, there may be substantial benefits to the authors and the community in the reviewers knowing who they are. It is simply a fact that the identity of the authors is an important factor when assessing a piece of scientific work. Ultimately, we tend to reward sustained high quality work with more credibility. You want authors to have skin in the game: if their work is bad, then they should pay a price (more scrutiny of their work in the future) and when their work is consistently good, they should be rewarded (given more implicit trust).

  2. The price of lithium-ion batteries has fallen by 73% between 2010 and 2016. (Source: Bloomberg) However, it does not mean that these batteries are getting better at a similar rate: it seems that even though the price is falling, the physical quality of the batteries remains similar.
  3. Scientists have created viable human hair follicles from cultured human cells. (Source: Nature)
  4. There is interest in NAD supplements for anti-aging purposes. Xie et al. (2018) show cognitive benefits due to NAD supplements in old mice.
  5. Bernstein et al. (2018) point out that continuous and sustained social interactions reduce individual exploration. In other words, to be highly original, you do need to close your door and stop answering emails and phone calls for a time.

Fast Bounded Random Numbers on GPUs

We often use random numbers in software in applications such as simulations or machine learning. Fast random number generators tend to produce integers in [0,2^32) or [0,2^64). Yet many applications require integers in a given interval, say the interval {0,1,2,3,4,5,6,7}. Thus standard libraries frequently provide a way to request an integer within a range.

How does it work underneath? Typically, a 64-bit or 32-bit integer is produced, and then it is converted into an integer within the desired range. Unfortunately, as pointed out in detail by Melissa O’Neill, doing so without introducing undue biases is slow. That is, it can take more time to convert the integer to the range than to produce the random integer in the first place!
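To see where the time goes, here is a sketch of a conventional unbiased approach in C (OpenBSD-style rejection; the generator is a toy splitmix64-like function and the names are mine). It needs two division-type operations (the two % below), and division instructions are slow:

```c
#include <stdint.h>

// toy 64-bit generator (splitmix64-style), for illustration only
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;
static uint64_t random64(void) {
    uint64_t z = (rng_state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// OpenBSD-style unbiased bounded generation: reject the few values that
// would make the final modulo biased, then reduce. Both '%' operations
// compile to division instructions.
uint64_t bounded_by_rejection(uint64_t s) {
    uint64_t threshold = -s % s; // == 2^64 mod s: size of the biased region
    uint64_t x;
    do {
        x = random64();
    } while (x < threshold);     // rejects with probability below 1/2
    return x % s;                // now uniform over [0, s)
}
```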

In Fast Random Integer Generation in an Interval, we show how to drastically accelerate this computation compared to the standard libraries. The main trick is to avoid as much as possible the use of division instructions (since they are slower). Bernardt Duvenhage has a great talk on the application of this technique in Python. There is an illustrative benchmark on GitHub.

In the paper, there is a precautionary note about the applicability of this technique to GPUs. Indeed, there are substantial differences between general-purpose 64-bit processors and common GPUs (32-bit). A reader, Norbert Juffa, reached out to me to point out that the note might be unwarranted. Juffa wrote a benchmark using the NVIDIA API (CUDA) to support his claim.

The fast function that avoids divisions as much as possible can be expressed using a few lines of C.

// returns a random value in [0,s)
uint64_t nearlydivisionless (uint64_t s)  {
    uint64_t x = random64 ();
    // compute ( x * s ) >> 64
    uint64_t h = __umul64hi (x, s);
    uint64_t l = x * s;
    if (l < s) {
        uint64_t t = -s % s; // == 2^64 mod s
        while (l < t) {
            x = random64 ();
            h = __umul64hi (x, s);
            l = x * s;
        }
    }
    return h;
}

What Juffa does is to generate 10,000,000 integers in the interval [0,500,000] from Marsaglia’s KISS64 random number generator. On a Quadro P2000 Nvidia card, he shows that using an approach that minimizes the use of divisions is much faster.

OpenBSD-like 5 ms
Java-like 2.9 ms
Our approach 1.4 ms

I make Juffa’s code available.

Sorting strings properly is stupidly hard

Programming languages make it hard to sort arrays properly. Look at how JavaScript sorts arrays of integers:

> v = [1,3,2,10]
[ 1, 3, 2, 10 ]
> v.sort()
[ 1, 10, 2, 3 ]

You need a magical incantation to get the right result:

> v.sort((a,b)=>a-b)
[ 1, 2, 3, 10 ]

Though this bad default can create bugs, it is probably not the source of too many frustrations. However, sorting strings alphabetically is a real problem. Let us see how various languages do it.


In JavaScript (Node.js):

> var v = ["e", "a", "é","f"]
> v.sort()
[ 'a', 'e', 'f', 'é' ]
> v = ["a","b","A","B"]
> v.sort()
[ 'A', 'B', 'a', 'b' ]


In Python:

>>> x=["e","a","é","f"]
>>> x.sort()
>>> x
['a', 'e', 'f', 'é']
>>> x=["a","A","b","B"]
>>> x.sort()
>>> x
['A', 'B', 'a', 'b']


In Swift:

  1> var x = ["e","a","é","f"]
  2> x.sorted()
$R0: [String] = 4 values {
  [0] = "a"
  [1] = "e"
  [2] = "f"
  [3] = "é"
}
  1> var x = ["a","b","A","B"]
  2> x.sorted()
$R1: [String] = 4 values {
  [0] = "A"
  [1] = "B"
  [2] = "a"
  [3] = "b"
}

As far as I can tell, by default, these languages apply crude code-point sorting. Human beings understand that the characters e, é, E, É, è, ê, and so forth, should be considered as the same letter (e) with accents. There are exceptions to this rule, but the default of sorting accented characters after the letter ‘z’ is just not reasonable. The way case is handled is patently stupid. You might prefer A to come before a, or vice versa, but no human being would ever sort the letters as A,B,a,b or a,b,A,B.

There are standards for sorting strings, such as the Unicode Collation Algorithm.

To get a sensible default, programming languages force you to use complicated code. In JavaScript, it is burdensome but easy enough:

> v.sort(Intl.Collator().compare)
[ 'a', 'e', 'é', 'f' ]

However, I am not sure what the equivalent is in Python and Swift. It does not jump out at me in the documentation of the respective standard libraries. I did not even look into it for other popular programming languages like Go, C++, and so forth.
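For what it is worth, even plain C has offered locale-aware comparison since C89 through strcoll, which follows the current LC_COLLATE locale. A sketch (the helper names are mine, and whether ‘é’ sorts sensibly depends on which locales your system provides):

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

// qsort comparator that delegates to strcoll, which compares according to
// the current LC_COLLATE locale instead of raw bytes like strcmp.
static int compare_collate(const void *a, const void *b) {
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

// Sort an array of strings using the current locale's collation rules.
// Call setlocale(LC_COLLATE, "") first to pick up the user's locale;
// in the default "C" locale, strcoll degrades to byte comparison.
void sort_strings(const char **strings, size_t count) {
    qsort(strings, count, sizeof(char *), compare_collate);
}
```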

It is unacceptably difficult to do the “right thing”. The net result is that many programmers do not sort strings properly. If you use a natural language with non-English characters, you see the effect in many applications. It looks bad.

Thankfully, most major software products get it right: Microsoft Office, Google Docs, Apple apps… all do the right thing. The problem creeps up in small-budget applications. I have to use one such application in my daily life as an employee of a public university, and it annoys me.

We should do better.

Further reading: International Components for Unicode