ARM and Intel have different performance characteristics: a case study in random number generation

In my previous post, I reviewed a new fast random number generator called wyhash. I commented that I expected it to do well on x64 processors (Intel and AMD), but not so well on ARM processors.

Let us review wyhash again:

uint64_t wyhash64_x; 


uint64_t wyhash64() {
  wyhash64_x += 0x60bee2bee120fc15;
  __uint128_t tmp;
  tmp = (__uint128_t) wyhash64_x * 0xa3b195354a39b70d;
  uint64_t m1 = (tmp >> 64) ^ tmp;
  tmp = (__uint128_t)m1 * 0x1b03738712fad5c9;
  uint64_t m2 = (tmp >> 64) ^ tmp;
  return m2;
}

(Source code)

It is only two multiplications (plus a few cheap operations like add and XOR), but these are full multiplications producing a 128-bit output.

Let us compare it with a similar but more conventional generator (splitmix64), developed by Steele et al. and part of the Java standard library:

uint64_t splitmix64_x;

uint64_t splitmix64(void) {
  splitmix64_x += 0x9E3779B97F4A7C15;
  uint64_t z = splitmix64_x;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EB;
  return z ^ (z >> 31);
}

We still have two multiplications, but many more operations. So you would expect splitmix to be slower. And it is, on my typical x64 processor.

Let me reuse my benchmark where I simply sum up 524288 random integers and record how long it takes…

            Skylake (x64)   Skylark (ARM)
wyhash      0.5 ms          1.4 ms
splitmix    0.6 ms          0.9 ms

According to my tests, on the x64 processor, wyhash is faster than splitmix. When I switch to my ARM server, wyhash becomes slower.

The difference is that computing the most significant 64 bits of a 64-bit by 64-bit product on an ARM processor requires a separate and potentially expensive instruction.
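On a 64-bit ARM (AArch64) processor, the low 64 bits of the product come from the mul instruction while the high 64 bits require a second instruction (umulh), whereas an x64 processor produces both halves with a single multiplication instruction. Here is a minimal sketch of the full multiplication that wyhash relies on (the helper name is mine):

#include <stdint.h>

// Full 64-bit by 64-bit multiplication producing a 128-bit result.
// On x64, compilers emit a single mul (or mulx) that writes both halves;
// on AArch64, they emit mul (low half) plus umulh (high half).
typedef struct {
  uint64_t low;
  uint64_t high;
} full_product_t;

static inline full_product_t full_multiply(uint64_t a, uint64_t b) {
  __uint128_t product = (__uint128_t)a * b;
  full_product_t answer;
  answer.low = (uint64_t)product;
  answer.high = (uint64_t)(product >> 64);
  return answer;
}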

Of course, your results will vary depending on your exact processor and exact compiler.

Note: I have about half a million integers, so if you double my numbers, you will get a rough estimate of the number of nanoseconds per 64-bit integer generated.

Update 1: W. Dijkstra correctly pointed out that wyhash could not possibly be several times faster than splitmix in a fair competition. I initially reported bad results with splitmix, but after disabling autovectorization (-fno-tree-vectorize), the results are closer. He also points out that results are vastly different on other ARM processors like Falkor and ThunderX2.

Update 2: One reading of this blog post is that I am claiming to compare Intel vs. ARM and to declare one better than the other. That was never my intention. My main message is that the underlying hardware matters a great deal when trying to determine which code is fastest.

Update 3: My initial results made the ARM processor look bad. Switching to a more recent compiler (GNU GCC 8.3) resolved the issue.

The fastest conventional random number generator that can pass Big Crush?

In software, we sometimes want to generate (pseudo-)random numbers. The general strategy is to have a state (e.g., a 64-bit integer) and modify it each time we want a new random number. From this state, we can derive a “random number”.

How do you know that you have generated something that can pass as a random number? A gold standard in this game is L’Ecuyer’s Big Crush benchmark. It is a set of statistical tests. It is not sufficient to “pass” Big Crush to be a good random number generator, but if you can’t even pass Big Crush, you are in trouble.

When I need a super fast and super simple random number generator that qualifies, I go for Lehmer’s generator:

__uint128_t g_lehmer64_state;

uint64_t lehmer64() {
  g_lehmer64_state *= 0xda942042e4dd58b5;
  return g_lehmer64_state >> 64;
}

(Source code)

Once compiled for an x64 processor, the generator boils down to two 64-bit multiplication instructions and one addition. It is hard to beat! The catch is that there is a non-trivial data dependency between the calls when using the same state for each call: you may need to complete the two multiplications before you can start work on the next function call. Because our processors are superscalar (meaning that they can execute several instructions per cycle), this is a genuine concern. You can break this data dependency by having effectively two generators, using one and then the other.
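Here is a minimal sketch of this two-generator trick, assuming two independently seeded states (the names are mine):

#include <stdint.h>

__uint128_t g_lehmer64_state_a; // must be seeded independently
__uint128_t g_lehmer64_state_b;

// Produce two 64-bit values per call. The two multiplications do not
// depend on each other, so a superscalar processor can overlap them.
static inline void lehmer64_pair(uint64_t *out_a, uint64_t *out_b) {
  g_lehmer64_state_a *= 0xda942042e4dd58b5;
  g_lehmer64_state_b *= 0xda942042e4dd58b5;
  *out_a = (uint64_t)(g_lehmer64_state_a >> 64);
  *out_b = (uint64_t)(g_lehmer64_state_b >> 64);
}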

Lehmer’s generator passes Big Crush. There are many other fast generators that pass basic statistical tests, like PCG64, xorshift128+, but if you want raw speed, Lehmer’s generator is great.

Recently, a new fast contender has been brought to my attention: wyhash. It is closely related to a family of random number generators and hash functions called MUM and designed by Vladimir Makarov (there is a nearly identical generator by Makarov called mum-prng). The new contender works as follows:

uint64_t wyhash64_x; 


uint64_t wyhash64() {
  wyhash64_x += 0x60bee2bee120fc15;
  __uint128_t tmp;
  tmp = (__uint128_t) wyhash64_x * 0xa3b195354a39b70d;
  uint64_t m1 = (tmp >> 64) ^ tmp;
  tmp = (__uint128_t)m1 * 0x1b03738712fad5c9;
  uint64_t m2 = (tmp >> 64) ^ tmp;
  return m2;
}

(Source code)

Wyhash (and presumably mum-prng) passes rigorous statistical tests.

On an x64 processor, the function requires two multiplications, one addition and two XORs. If you are counting, that’s only two instructions more than Lehmer’s generator. Like generators from the PCG family, wyhash updates its seed very simply, and so you can pipeline the generation of two or more random numbers with minimal data dependency between them: as soon as one addition is completed, you can start work on the second number.

Both of these generators might be less performant on ARM processors due to the high cost of computing the full 128-bit product on these architectures. They are also both harder to implement in a portable way.
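For instance, the __uint128_t type used above is a GCC and Clang extension that is not available with every compiler. A portable (if slower) fallback might compute the full product from 32-bit halves; this is only a sketch, not the code used in my benchmarks:

#include <stdint.h>

// Compute the full 128-bit product of two 64-bit words without the
// __uint128_t extension: split each operand into 32-bit halves and
// combine the four partial products. Returns the high 64 bits and
// writes the low 64 bits through the pointer.
static inline uint64_t multiply_high_portable(uint64_t a, uint64_t b,
                                              uint64_t *low) {
  uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo;
  uint64_t p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo;
  uint64_t p3 = a_hi * b_hi;
  uint64_t carry = ((p0 >> 32) + (p1 & 0xFFFFFFFF) + (p2 & 0xFFFFFFFF)) >> 32;
  *low = p0 + (p1 << 32) + (p2 << 32);
  return p3 + (p1 >> 32) + (p2 >> 32) + carry;
}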

This being said, which is faster on my x64 processor?

Let us run the experiments. I am going to work over sets of 524288 random numbers. I am using a Skylake processor and GNU GCC 8. I make my source code available.

First, I just sum up the random numbers being generated.
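In code, the loop is essentially the following sketch (timing code, repetitions and the alternative generators are omitted):

#include <stddef.h>
#include <stdint.h>

// Sum 524288 random 64-bit integers; returning the sum prevents the
// compiler from optimizing the generator calls away.
uint64_t sum_random(size_t count) {
  uint64_t sum = 0;
  for (size_t i = 0; i < count; i++) {
    sum += wyhash64(); // or lehmer64(), splitmix64(), ...
  }
  return sum;
}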

wyhash 0.51 ms
Lehmer’s 0.63 ms
Lehmer’s (two gen.) 0.48 ms
Lehmer’s (three gen.) 0.37 ms

From run to run, my margin of error is about 0.02 ms.

Next I am going to store the random numbers in an array.

wyhash 0.6 ms
Lehmer’s 0.6 ms
Lehmer’s (two gen.) 0.6 ms
Lehmer’s (three gen.) 0.4 ms

So using three Lehmer generators is best.

Of course, using parallel generators in this manner could be statistically unsafe. One would want to run further tests.

Credit: Monakov suggested going to three generators. The post was updated accordingly.

Further Reading: There were apparently some Hacker News comments on both a new hash function (XXH3) and wyhash.

Don’t read your data from a straw

It is common for binary data to be serialized to bytes. Data structures like indexes can be saved to disk or transmitted over the network in such a form. Many serialized data structures can be viewed as sets of ‘integer’ values. That is the case, for example, of a Roaring bitmap. We must then read back this data. An integer could be serialized as four bytes in a stream of bytes.

We are thus interested in the problem where we want to convert an array of bytes into an array of integer values.

In C or C++, you can safely convert from a stream of bytes to in-memory values using a function like ‘memcpy’. It is fast. Or you can just cheat and do a “cast”.
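A minimal sketch of the memcpy approach in C (the function name is mine; it assumes the bytes are already in the machine’s native byte order):

#include <stdint.h>
#include <string.h>

// Convert a buffer of bytes into 32-bit integers with memcpy. Unlike a
// raw pointer cast, this is safe with respect to alignment and strict
// aliasing, and compilers typically turn it into a plain copy.
void bytes_to_ints(const uint8_t *bytes, uint32_t *recipient, size_t count) {
  memcpy(recipient, bytes, count * sizeof(uint32_t));
}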

What do you do in Java?

A convenient approach is to wrap the byte array into an InputStream and then wrap that InputStream into a DataInputStream, like in this example where we convert an array of bytes into an array of integers:

byte[] array = ...
int[] recipient = new int[N / 4];
DataInput di = new DataInputStream(new 
            ByteArrayInputStream(array));
for(int k = 0; k < recipient.length; k++)
    recipient[k] = di.readInt();

The benefit of this approach is improved abstraction: you do not care whether the data comes from an array of bytes or from disk; it is all the same code. If you have serialization and deserialization code, it is probably written in terms of OutputStream and InputStream anyhow, so why not reuse that perfectly good code?

However, Java offers a performance-oriented concept called a ByteBuffer to represent an array of bytes. It is not as high level as an input stream since it assumes that you do have, somewhere, an array of bytes.

You can achieve the same conversion as before using a ByteBuffer instead:

byte[] array = ...
int[] recipient = new int[N / 4];
ByteBuffer bb = ByteBuffer.wrap(array);
bb.asIntBuffer().get(recipient);

Here is the time required to convert 1 million 32-bit integers on my 4.2GHz 2018 iMac:

DataInputStream 10 ms
ByteBuffer 1 ms

That is, the ByteBuffer is 10x faster. My code is available.

Because I have 1 million integers, we can convert these timings into “time per integer”: the ByteBuffer approach achieves a speed of one 32-bit integer converted per nanosecond. Given that my iMac can probably execute something like a dozen operations per nanosecond, that’s not impressive… but it is at least a bit respectable. The DataInputStream takes 10 nanoseconds (or something like 40 or 50 cycles) per integer: it is grossly inefficient.

This has interesting practical consequences. In the RoaringBitmap project, Alvarado pointed out that it is faster to create a bitmap using a ByteBuffer backend, and then convert it back into a normal data structure, than to construct it directly from an input stream. And the difference is not small.

Practically, this means that it may be worth it to provide a special function that can construct a bitmap directly from a ByteBuffer or a byte array, bypassing the stream approach. (Thankfully, we have bitmaps backed with ByteBuffer to support applications such as memory-file mapping.)

Speaking for myself, I was under the impression that Java would do a fine job reading from a byte array through an input stream. At the very least, we found that ByteArrayInputStream is not the right tool. That a ByteBuffer would be fast is not so surprising: as far as I can tell, they were introduced in the standard API precisely for performance reasons. However, a factor of ten is quite a bit more than I expected.

In any case, it is a perfectly good example of the problem whereby abstractions force you to consume data as if it came through a straw. Streams and iterators are handy abstractions, but they often lead you astray with respect to performance.

Further reading. Ondrej Kokes has reproduced these results in Go.

Science and Technology links (March 16th 2019)

    1. There is mounting evidence that clearing old cells (senescent cells) from old tissues has a rejuvenating effect. There are very few such cells in most cases, but they cause a disproportionate amount of problems. Anderson et al. find that this effect might even extend to our hearts:

      clearance of senescent cells in mice alleviates detrimental features of cardiac ageing, including myocardial hypertrophy and fibrosis.

      Cells in our hearts do not divide very much or at all, but the authors find that they still “age” due to genetic damage.

    2. Dyakonov, a famous physicist, is quite critical of the notion that quantum computers might be the future of computing:

The huge amount of scholarly literature that’s been generated about quantum-computing is notably light on experimental studies describing actual hardware. (…) There is a tremendous gap between the rudimentary but very hard experiments that have been carried out with a few qubits and the extremely developed quantum-computing theory, which relies on manipulating thousands to millions of qubits to calculate anything useful. That gap is not likely to be closed anytime soon.

We should always be skeptical regarding negative predictions. However, Dyakonov indicates that there is overwhelming emphasis on theory at the expense of practice. This is often a bad sign: theory is useful but disconnected theory is easily overrated.

 

Age and the productivity of professors

Professors tend to earn more as they become older. Woolley believes that professors’ contributions take a sharp turn downward after a certain point. If this is correct and professors retire at an increasingly older age, we have a problem.

What does the research say about this? We can look at published papers as a measure of productivity. A few decades ago, older science professors were significantly less productive than their younger colleagues, but this pattern has disappeared according to Kyvik and Olsen (Scientometrics, 2008):

In the humanities and the social sciences, productivity patterns by age have been relatively stable over the two decades. Increasing age does not seem to bring about lower levels of scientific and scholarly publishing. In the social sciences it is in fact staff over 60 years of age who are the most productive. (…) The largest changes have taken place in the natural sciences and technology. While staff over fifty years of age were significantly less productive than their colleagues aged between forty and fifty in the earliest period, in the latest period there were no such differences.

Gingras et al. (PLoS One, 2008) find that professors often remain productive until their retirement:

older professors who stay active in research keep their productivity at a high level until their retirement. We also found that the average scientific impact of professors decreases steadily from the beginning of their careers until about 50 years old, and then increases again.

It is true that professors’ salaries fail to closely match their productivity, but that’s mostly because a small number of professors do most of the hard research and top-notch teaching, while the compensation plans tend to be much more egalitarian. There are also rather striking gaps between institutions which fail to reflect individual differences.

I don’t think we should be concerned with professors who choose to delay or avoid retirement. Retirement is often a bad deal for society. Our pension plans are often underwater.

Of course, we should urge people to contribute and to stay up-to-date. But that applies more broadly. You can be in your 40s and be out-of-touch.

Multiplying by the inverse is not the same as the division

In school, we learn that division is the same as multiplying by the inverse (or reciprocal), that is, x / y = x * (1/y). In software, this does not work as-is when working with integers since 1/y is not an integer when y is an integer (except for y = 1 or y = -1). However, computers have floating-point numbers, which are meant to emulate real numbers. It is reasonable to expect that x / y = x * (1/y) will work out with floating-point numbers. It does not.

I woke up this morning reading a blog post by Reynolds on how to compute the inverse quickly when the divisor is known ahead of time. Reynolds is concerned with performance, but his post reminded me of the fact that, in general, multiplying by the inverse is not the same as dividing.

For example, dividing by 3.1416 and multiplying by 1/3.1416 do not result in the same numbers, when doing the computation on a computer.

To prove it, let me pull out my node (JavaScript) console…

> x = 651370000000
> x / 3.1416 * 3.1416
651370000000
> invd = 1 / 3.1416
> x * invd * 3.1416
651370000000.0001

Maybe only JavaScript is broken? Let us try in Python…

>>> x = 651370000000
>>> x / 3.1416 * 3.1416
651370000000.0
>>> invd = 1 / 3.1416
>>> x * invd * 3.1416
651370000000.0001
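The same discrepancy shows up in C, since JavaScript numbers, Python floats and C doubles are all (typically) the same 64-bit IEEE floating-point type; the exact digits printed may vary with your platform:

#include <stdio.h>

int main() {
  double x = 651370000000.0;
  double invd = 1.0 / 3.1416;
  printf("%.4f\n", x / 3.1416 * 3.1416); // expect 651370000000.0000
  printf("%.4f\n", x * invd * 3.1416);   // expect something like 651370000000.0001
  return 0;
}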

Always keep in mind that floating-point numbers are different from real numbers… for example, half of all floating-point numbers are in the interval [-1,1].

Further reading: What every computer scientist should know about floating-point arithmetic and Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries

Science and Technology links (March 9th 2019)

  1. In 2007-2008, the recipient of a bone-marrow therapy (Timothy Brown) was cured of HIV. Another person was cured following a bone marrow transplant. And then yet another. The trick seems to be that some of us are naturally immune to the HIV virus: if such a person donates bone marrow to an HIV-infected individual, it appears to cure them. So it is effectively a gene therapy. Bone-marrow transplantations are never going to be dirt cheap, but it should be possible to do genetic manipulations to achieve the same result, at much lower cost. News agencies report that this makes three people who might have been cured, but there is also another patient who might credibly have been cured earlier, after receiving hydroxyurea.
  2. The majority (53%) of all doctoral degrees are granted to women in the US: women are the clear majority in health and biological sciences. The majority of medical students are women.
  3. Most start-ups that claim to do artificial intelligence do no such thing in practice. Basically, artificial intelligence acts as a marketing buzzword.
  4. Most commodity processors have instructions that operate over wide registers, processing a large number of bits per instruction. We call them SIMD instructions. The ARM processors in your phone have 128-bit SIMD instructions, whereas Intel processors offer SIMD instructions up to 512 bits wide. The wider the registers, the faster you can go; you can also use them to save on power usage. This week I learned that some Qualcomm processors (found in Android phones) support 1024-bit SIMD instructions. (Credit: Wunkolo)
  5. In the US, spending on music is reaching a record high, most of it digital music.
  6. The Queen, Elizabeth II, is on Instagram. She signs “Elizabeth R.”. She is in her 90s.

Should our kids use pencils or keyboards?

In Montreal, most kids have to write on paper using a pencil. They have paper dictionaries. My kids have spent an enormous amount of time learning to write in cursive using pencils, and no time at all (in school) learning to touch type with a keyboard.

I do have pens, which I use to take notes on post-its. But that is about all. All my notes are in electronic form, they are searchable, and I have an integrated spell-check everywhere. I expect that anyone who writes all day solely using pencil and paper is either an artist or someone close to retirement.

On this note, my kids swear that most of their teachers can’t touch type (type on a keyboard without looking at it), while they themselves could touch type before they could write in cursive.

When I was a kid, I owned a typewriter, and later a computer (a TRS-80). Whenever I could, I would do my assignments on a typewriter or computer. The result was invariably nicer looking and more reusable. Typing used to be a skill limited to a few secretaries, but it is now ubiquitous in all walks of life. I never imagined that 30 years later, my kids would still have to write long texts using pencil and paper.

I should qualify this statement: students with diagnosed learning disabilities, and that’s a sizeable fraction of all students these days, can use a computer and a keyboard in class. My kids are envious. Why can’t they use a computer? Why indeed!

They are now learning to do algebra, by hand, as if you couldn’t trivially solve algebraic equations on a computer. I am a computer scientist and I was trained as a mathematician, but I never solve quadratic equations on paper. The risk of mistake is too high. I rely on a computer. Literally anyone can go to WolframAlpha, enter an equation, and instantly get the correct answer back. The challenge in modern-day mathematics is elsewhere. The dirty, error-prone algebra is better done by computers. No engineer or accountant could remain employed while refusing to use computers; the last engineers and accountants who worked without computers retired long ago.

There may come a time when our computers stop working but if you think it is likely, you should learn to fight and hunt, not to do mathematics the way people did in 1920.

I could comment at length on what they are learning. Computing the height of a cone given its total area and apothem was a recent problem one of my kids had to solve. And the teachers are eager to set the total area to a nice value like 405.112, just so that it will look more boring and intimidating. I swear that the problems are so artificial that you’d think these instructors live in another dimension. Who has ever had to compute the height of a cone given that its total area is 405.112? Who?

Before algebra, they had to learn to do long divisions. I had an argument with a teacher about it, when I pointed out that no adult ever does a long division. She pointed out that, sure, they will use calculators, but they need to know how the calculator does it. I objected that computers and calculators do divisions using circuits, that these circuits could be taught… but that it had little to do with doing long divisions by hand.

In any case, I am not alone in thinking that schools are out of touch. Many people are asking schools to become more modern.

In Montreal, a famous education professor, the kind of professor who consults with the government to decide what kids will do, wrote an essay to oppose such reforms. He cites some of my close colleagues who are building up the resistance.

I answered back on Twitter, with some of my usual objections. If educators want to claim that they are in sync with the rest of society, let us at least move beyond the pencil and the paper.

I was offered back a famous study claiming that reading texts on paper helps comprehension (as opposed to reading text on computers). I guess that the plan is to raise the new generation of kids to print out all their Facebook feeds before reading them? Because let us face it: if you cannot read and understand electronic text, you are handicapped anywhere but in a school.

The fact is that you have to understand text on a screen. There is no choice. I find it scary that the people in charge of our kids would reject screens as a medium for text.

The typical belief of reactionary anti-science educators is that technology is bad for your brain. You will hear stories about mobile phones, the Internet or video games making kids stupid. We can only be saved with paper books and dictionaries. Before that, television was going to destroy us. Long ago, some philosophers thought that books were disastrous: only the spoken word could carry wisdom. Successive layers of technology get fought back. That’s fine. What is not fine is the fabrication out of thin air of “facts” to support this gut-driven rejection.

What do more recent studies show? They show that for eager students who are given time to become familiar with screens, there is no benefit to paper over screens. Other studies question the superiority of paper: Subrahmanyam et al. found that reading the texts on paper did not make a significant difference, compared with computer conditions. Some researchers found that boys and reluctant readers prefer screens. Porion et al. find that provided we meet all the paper versus computer presentation conditions (text structure and length, screen size, several types of questions measuring comprehension and memory), on-screen reading performances can be improved, and even become comparable to paper-based reading performances. Fesel et al. find that reading hypertext may foster deeper understanding in children.

I could have kept going but as you can see, it is far from an objective truth that people learn better on paper than on a screen. What is certain, however, is that many teachers are unfamiliar with computer technology and prefer to use paper.

How much of the ongoing focus on pencil and paper is a reflection on the preferences and biases of teachers and parents?

What about typing?

Cochran-Smith (1991) showed that when students have to type their documents, they produce fewer spelling errors and longer texts. This was reaffirmed in more recent meta-analyses by Goldberg et al. (2003) and Graham et al. (2012). A key determinant of success is that students must be well trained to type: don’t compare students who have 12 years of training with pen and paper with students who have hardly ever touched a keyboard. Christensen (2004) also found that typing produced better narratives.

Ouellette and Tims (2014) found no difference on a word‐recognition and spelling test between the typing and handwriting conditions. Masterson and Apel (2006) found no difference in quality of spelling between the handwriting and typing conditions.

What about paper dictionaries versus electronic dictionaries? Chen found that there are no significant differences between electronic dictionary and paper dictionary use in comprehension, production and retention of vocabulary, although the former is significantly faster than the latter.

I did about one hour of research for this blog post. The scientific evidence seems clear to me. If you account for various factors, including skills and familiarity, there is no inherent superiority of the pencil over the keyboard, of the paper over the screen. The fact that schools in Montreal provide keyboards and screens to the many students with learning disabilities also supports the non-superiority view.

And yet I am told by teachers all around, by education researchers and professors that, obviously, we use pencil and paper because it is better for the students. It seems to me to be merely a reflection of their own preferences.

They invoke some mysterious knowledge, some untold studies. They dismiss my own survey made entirely of peer-reviewed scientific papers. They argue from authority. We are the educators, we know what is best. They evoke neuroscience, biology, psychology, but without ever sticking to precise falsifiable (scientific) claims.

Yes, we urgently need to reconnect schooling with reality. People use computers to do mathematics or they get fired. People use computers to type and publish documents or they never find work. We spell-check using computers, not with a paper dictionary. We no longer write letters with pen and paper.

Reactionary education experts are not fooling young boys like mine. They know you are disconnected. They know you can’t touch type. They know you can’t find answers outside of books and conferences.

Yes we also need to teach kids about entrepreneurship because they are not going to have factory jobs where they have to write reports on paper. Many of them are going to be self-employed.

Yes, we need to teach them about critical thinking because the hierarchical world of experts is gone and fake news is everywhere.

But that’s a bit scary, isn’t it? What if the kids start asking for the scientific evidence against using screens to read? What if they actually go through it all? What if they question the conclusions that the reactionary educators have?

No, you don’t need to teach all of them about trigonometry and the volume of a cone. Is that the best you have? It is not good enough. You are failing our kids.

Enforcing the use of paper dictionaries in schools for pedagogical reasons in 2019 is as scientific as rejecting vaccination for fear of autism.

Science and Technology links (March 2nd, 2019)

  1. Shapiro in his book The New Childhood argues that we should stop worrying so much about how much time our kids spend on video games and on their phone. He calls for a pivot in how we view childhood:

    The procedural rhetoric of the classroom helped kids become accustomed to the organizational conventions of the Industrial Age meritocracy. They were primed for a world of corporate CEOs, rabbis, ministers, mayors, foremen, and bosses. They watched how classroom aides turned to teachers for validation, how teachers were required to treat condescending principals as experts, how principals acquiesced to administrators. The power dynamics of authority were obvious. And underneath, an implicitly hierarchical understanding of intelligence was simultaneously taking hold. Kids learned to imagine that information and knowledge are transmitted through pyramid schemes—disseminated downward and outward from a single point of authority. They learned to see socioeconomic status and professional achievement as indicators of superior character and wisdom. Intelligence became something you hoard, like wealth. Ideas were owned like property. And perhaps, at one time, this model was appropriate. But in a world of Google searches, Reddit forums, blog posts, and social news streams, the old steadfast devotion to an intellectual pecking order becomes a liability. Today’s learning routines need to be structured in horizontal ways that mimic the skill sets required to live a fulfilled and productive adult life in a connected world. Classrooms should be redesigned to resemble co-working spaces. A cohort of peers should be assessed collectively, their capacity to openly exchange knowledge and skills prioritized above individual achievement. Teachers should imagine themselves as Sherpas, who are available to guide students through self-directed project-based learning activities. The new educators are not experts, but rather curators of driplike and participatory intellectual engagements.

    He has a very “constructivist” take for those of us who are into educational theory.

  2. With a new release of Microsoft Excel, you can take a picture of a table (e.g., from a research article) and have it appear as a spreadsheet.
  3. Professors typically do not get paid when they publish. In fact, the opposite is true. Publishers often ask the authors to pay publication fees while asking them to relinquish all rights to the publisher. Furthermore, the publisher almost always sells access to the scientific publications to academic libraries. To be clear, this money almost always comes from the government, either through research grants or through the budgets of academic libraries, yet governments have done little to “regulate” publishers. Being a publisher can be a nice business. The University of California has decided to pull out of a subscription deal with one of the largest publishers (Elsevier). A few points are worth keeping in mind:
    • Just like we don’t want journalists to go away, we also want to maintain paid editors and support staff around academic publications. It is reasonable that some of our research dollars end up in the pockets of people curating research articles. Sadly, funding agencies are often not keen on openly funding scientific publications. That’s a mistake in my opinion.
    • The bulk of the power of publishers comes through copyright law. Copyright is granted by the state to authors. At any time governments could revise copyright law: I have long advocated that academic work should be free of copyright. What is the moral case for granting someone a nearly perpetual monopoly on a research paper funded entirely by the government? In any case, these authors, often either professors or students, turn around and grant this copyright, for free, to the publisher, knowing that the publisher will turn around and charge their own school (and government) for the right of access to this work. Even as government funding agencies ask professors to make their work accessible to the public, the compliance rate is often almost nil. Evidently, professors and students are complicit.
    • The system we have today evolved in large part thanks to librarians who supported the strict copyright policies benefiting publishers, above and beyond what the law prescribed. For example, many librarians are reluctant to index academic work that does not come from a “copyright holding” publisher.
    • A big deal is made of the fact that Elsevier is “for profit”. However, other organizations charge as much as or more than Elsevier, such as IEEE or one of several university presses (e.g., Oxford). They use the same business model. In some cases, they can be worse. My own university once consulted me as to whether we should subscribe to IEEE journals, and I urged them not to: they charged us much more than Elsevier would. Discriminating between publishers based on whether or not they are “for profit” seems wrong: a better metric is whether they provide good value.

Parsing JSON quickly: early comparisons in the wild

JSON is arguably the standard data interchange format on the Internet. It is text-based and needs to be “parsed”: the input string needs to be transformed into a convenient data structure. In some instances, when the data volume is large, ingesting JSON is a performance bottleneck.

Last week, we discreetly made available a new library to parse JSON called simdjson. The objective of the library is to parse large volumes of JSON data as quickly as possible, possibly reaching speeds in the gigabytes per second, while providing full validation of the input. We parse everything and still try to go fast.

As far as we can tell, it might be the first JSON parser able to process data at a rate of gigabytes per second.

The library quickly became popular, and for a few days it was the second most popular repository on GitHub. As I am writing these lines, more than a week later, the library is still the second “hottest” C++ library on GitHub, ahead of famous machine-learning libraries like tensorflow and opencv. I joked that we had beaten, for a time, deep learning: tensorflow is a massively popular deep-learning library.

We have a paper full of benchmarks, and my co-author (Geoff) has a great blog post presenting our work.

As is sure to happen when a piece of software becomes a bit popular, a lot of interesting questions are raised. Do the results hold to scrutiny?

One point that we have anticipated in our paper and in the documentation of the software is that parsing small JSON inputs is outside of our scope. There is a qualitative difference between parsing millions of tiny documents and a few large ones. We are only preoccupied with sizeable JSON documents (e.g., 50kB and more).

With this scope in mind, how are our designs and code holding up?

There is already a C# port by Egor Bogatov, a Microsoft engineer. He finds that in several instances, his port is several times faster than the alternatives. I should stress that his code is less than a week old.

The C++ library has been available for less than a week, so it is still early days. Nevertheless, clever programmers have published prototypical bindings in Python and JavaScript (Node). In both instances, we are able to beat the standard parsers in some reasonable tasks. A significant fraction of the running time is spent converting raw results from C++ into either Python or JavaScript objects. But even with this obstacle, the Python wrapper can be several times faster than the standard Python JSON parser.

It is always going to require significant engineering to get great performance when you need to interface tightly with a high-level language like Python or JavaScript…  When you really need the performance, the trick is to push the computation down within the guts of the C/C++ code.

Where next? I do not know. We have many exciting ideas. Porting this design to ARM processors is among them.