Computers often need random numbers. Most times, random numbers are not actually random… in the sense that they are the output of a mathematical function that is purely deterministic. And it is not even entirely clear what “really random” would mean. It is not clear that we live in a randomized universe… it seems more likely that our universe is deterministic but that our limited access to information makes randomness a useful concept. Still, very smart people have spent a lot of time defining what random means, and it turns out that mathematical functions can be said to produce “random” outputs in a reasonable sense.
In any case, many programmers have now adopted a new random number generator called PCG and designed by professor Melissa O’Neill from Harvey Mudd college.
What O’Neill did is quite reasonable. She asked herself whether we could produce better random number generators, she wrote a paper and published code. The result was quickly adopted by engineers worldwide.
She also submitted her paper for consideration in what I expect to be a good, well-managed journal.
Her manuscript became lengthy in time and maybe exceeded some people’s style sensibilities, she justifies herself in this manner:
I prefer to write papers that are broadly accessible. I’d rather write a paper that can be enjoyed by people who are interested in the topic than one that can only be understood by a tiny number of experts. I don’t agree with the philosophy that the more impenetrable the paper, the better the work must be! Describing desirable qualities in detail seemed to be necessary for the paper to make sense to anyone not deeply entrenched in the field. Doing so also seemed necessary for anyone in the field who only cared about a subset of the qualities I considered desirableâ€”I would need to convince them that the qualities they usually didn’t care about were actually valuable too.
As I pointed out, she had a real-world impact:
While attending PLDI and TRANSACT in June of 2015, I got one of the first clues that my work had had real impact. I can’t remember the talk or the paper, but someone was saying how their results had been much improved from prior work by switching to a new, better, random number generator. At the end I asked which one. It was PCG.
Meanwhile, at least one influential researcher (whose work I respect) had harsh words publicly for her result:
I’d be extremely careful before taking from granted any claim made about PCG generators. Wait at least until the paper is published, if it ever happens. (…) Several claims on the PCG site are false, or weasel words (…) You should also be precise about which generator you have in mindâ€”the PCG general definition covers basically any generator ever conceived. (…) Note that (smartly enough) the PCG author avoids carefully to compare with xorshift128+ or xorshift1024*.
Her paper was not accepted. She put it in those terms:
What was more interesting were the ways in which the journal reviewing differed from the paper’s Internet reception. Some reviewers found my style of exposition enjoyable, but others found it too leisurely and inappropriately relaxed. (…) An additional difference from the Internet reaction was that some of the TOMS reviewers felt that what I’d done just wasn’t very mathematically sophisticated and was thus trivial/uninteresting. (…) Finally, few Internet readers had complained that the paper was too long but, as I mentioned earlier, the length of the paper was a theme throughout all the reviewing. (…) Regarding that latter point, I am, on reflection, unrepentant. I wanted to write something that was broadly accessible, and based on other feedback I succeeded.
I emailed O’Neill questions a couple of times, but she never got back to me.
So we end up with this reasonably popular random number generator, based on a paper that you can find online. As far as I can tell, the work has not been described and reviewed in a standard peer-reviewed manner. Note that though she is the inventor, nothing precludes us to study her work and write papers about it.
John D. Cook has been doing some work in this direction on his blog, but I think that if we believe in the importance of formal scientific publications, then we ought to cover PCG in such publications, if only to say why it is not worth consideration.
What is at stake here is whether we care for formal scientific publications. I suspect that Cook and O’Neill openly do not care. The reason you would care, fifty years ago, is that without the formal publication, you would have a hard time distributing your work. That incentive is gone. As O’Neill points out, her work is receiving citations, and she has significant real-world impact.
At least in software, there has long been a relatively close relationship between engineering and academic publications. These do not live in entirely separate worlds. I do not have a good sense as to whether they are moving apart. I think that they might be. Aside from hot topics like deep learning, I wonder whether the academic publications are growing ever less relevant to practice.
Further reading: You might also enjoy my post Testing non-cryptographic random number generators: my results.
11 thoughts on “On Melissa O’Neill’s PCG random number generator”
That’s interesting. I chose Sebastiano Vigna’s last PRNG based mostly on that it was the last thing published on the internet and that the claims made by the author were bold and generally reasonable. I never bothered to run the benchmarks myself and verify the claims. I just took them for granted. I guess the peope who chose PCG in their projects just did it when it was the last cool thing published. There is a room for accident in software engineering.
What you did is reasonable. Certainly, Vigna has been clear on how he recommends against PCG and he is a good researcher and programmer. Of course, we should remain open to the possibility that the people who went with PCG are correct in their choice. I refer you to John Cook who has been doing comparisons.
For what it’s worth I’ve run ~500 testu01 crush battery runs on the main PCG variants and xorshift128+ and xorshiro with no questionable p-values, so I feel OK using any of these with sample size per problem <= 10^5 which is way more than I ever care about. John Cook mostly talks about DIEHARDER and the NIST suites, which I personally don't have much confidence in (too easy to pass). Of course to make me feel like a jerk he's just put up a post using PractRand (which I know nothing about) on all of these…so it goes:
I’d like to clarify that I did not mean to suggest in any way that Vigna’s RNG were not good choices.
You also might be interested in Xoroshiro:
I think you’re misunderstanding the peer-review-and-publishing process. Ending up published is supposed to be at the end of a peer-review process. Though many “scientific journals” now exist where no such thing happens, as has been demonstrated, e.g. through fake papers, time and again. Not ending up published doesn’t mean no peer review happened. That happens, formally at each submission, presumably, and informally any time the author receives comments from other readers of the paper.
So what happens at “peer review”? Well, in this case it appears the paper didn’t convey enough of that “one of us”-smell for some reviewers to assent to publishing. Because the prose was too accessible, maybe? (Which is a silly proposition since science is useless if inaccessible, so its prevalence is a problem.) That doesn’t mean the work is any good, just that it is accessible.
And that is perhaps the core problem with any scientific work these days. You still have to read the paper and make up your own mind. The number of people who’ve just copied the ideas into (at least seemingly) working code says nothing about how good the presented method is, either; all it does is give some indication of popularity. But that is a different thing.
I think we are in agreement but I’d like you to consider the possibility that I am less naive about the process than you suggest.
As a someone who builds computer systems, I’m far more likely to use something like PCG — something with clear and easy to read public documentation and a public code repository — than I am to use something from a paper. If the paper is only available in a paywalled journal I’m unlikely to grab it.
My experience has been that the closer in spirit an academic project is to an open source project, the more likely it will be to have working code I can use and contribute to.
I know that academic papers are intended to describe novel ideas, not working code, but I’ve found that unimplementable ideas are plentiful. Novel ideas that also good engineering solutions are much rarer.
That’s an important point. O’Neill made it easy to use the works.
Melissa seems to be right that PRNG are largely verified benchmarks, hopefully well designed and possibly research published statistical tests.
I imagine all these tests that all these professors have devised and applied will eventually fault the PCG in some way as they fault other generators. She is also very right to advertise that many of the worst generators are still defaults in our modern compilers and programming languages.
The current default Golang math/rand.go is a very scary choice, because nobody seems to have been able to track down the creator of that algorithm, nor any publication describing it.
PCG will not disappear just because the professor has quirks when it comes to academia and publishing — loads of quirky ones out there. As you indicate also, no evidence seems to have surface to reject PCG, are academics not trying hard enough to fault the algorithm?
Someone expert in testing randomness may want to write a statistical test to fault PCG soon because it seems it may become a prime choice for Go 2.0 and future versions of C++. After all, most software developers, compiler developers and hardware developers love benchmarks, and the benchmarks that exist say this one is better. Problem is maybe that we have a selection effect here, someone just hasn’t yet written a benchmark that specifically faults PCG?
Some criticisms of PCG:
You may subscribe to this blog by email.