On the cost of converting ASCII to UTF-16

Many programming languages, such as Java, JavaScript and C#, represent strings using UTF-16 by default. In UTF-16, each ‘character’ uses 16 bits. To represent the more than one million Unicode characters, some special ‘characters’ can be combined in pairs (surrogate pairs), but for much of the common text, one character is truly 16 bits.
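As an aside, the surrogate-pair encoding for code points beyond U+FFFF is simple arithmetic. Here is a sketch (the function name is my own, the arithmetic is standard UTF-16):

```cpp
#include <cstdint>

// Split a code point above U+FFFF into a UTF-16 surrogate pair.
void to_surrogate_pair(uint32_t codepoint, uint16_t *high, uint16_t *low) {
  uint32_t v = codepoint - 0x10000;        // 20 bits remain
  *high = uint16_t(0xD800 + (v >> 10));    // upper 10 bits
  *low = uint16_t(0xDC00 + (v & 0x3FF));   // lower 10 bits
}
```

For instance, the code point U+1F600 maps to the pair 0xD83D, 0xDE00.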

Yet much of the text content processed in software is simple ASCII. Strings of numbers for example are typically just ASCII. ASCII characters can be represented using only 7 bits.

This implies that software frequently has to convert ASCII to UTF-16. In practice, it amounts to little more than interleaving our ASCII bytes with zero bytes. We can model such a function with a simple C loop.

void toutf16(const uint8_t *array, size_t N,
             uint16_t *out) {
  for (size_t i = 0; i < N; i++) {
    out[i] = array[i];
  }
}

How expensive do we expect this code to be?

Compared to a simple copy from N bytes to N bytes, we are writing an extra N bytes. With code that reads and writes a lot of data, it is often sensible to use the number of written bytes as a cost model.

In terms of instructions, an x64 processor can use SIMD instructions to accelerate the processing. However, you would hope that most processors can do this processing at high speed.
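On x64, for example, SSE2 instructions can widen 16 bytes at a time by interleaving them with a register of zeros. This is only a sketch (the function name is my own, and it falls back to the scalar loop on non-x64 targets):

```cpp
#include <cstdint>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Widen ASCII bytes to UTF-16 code units (assumes pure ASCII input).
void toutf16_fast(const uint8_t *array, size_t N, uint16_t *out) {
  size_t i = 0;
#if defined(__SSE2__)
  __m128i zero = _mm_setzero_si128();
  for (; i + 16 <= N; i += 16) {
    __m128i in = _mm_loadu_si128(reinterpret_cast<const __m128i *>(array + i));
    // interleave the low and high 8 input bytes with zero bytes
    __m128i lo = _mm_unpacklo_epi8(in, zero);
    __m128i hi = _mm_unpackhi_epi8(in, zero);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out + i), lo);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out + i + 8), hi);
  }
#endif
  for (; i < N; i++) { out[i] = array[i]; } // scalar tail (or fallback)
}
```

On a little-endian system, each interleaved byte pair (ASCII byte, zero byte) is exactly the expected 16-bit code unit.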

I wrote a benchmark in C and ran it on different systems. I use a small input ASCII string (10kB). I measure the throughput based on the input size.

                                     toutf16    memcpy
AMD Zen 2 (x64), GNU GCC 8, -O3      24 GB/s    46 GB/s
Apple M1, clang 12                   35 GB/s    68 GB/s

Of course results will vary and I expect that it is entirely possible to greatly accelerate my C function. However, it seems reasonable to estimate that the computational cost alone might be twice that of a memory copy. In practice, it is likely that memory allocation and structure initialization might add a substantial overhead when copying ASCII content into a UTF-16 string.

Science and Technology links (February 6th 2021)

  1. You can use artificial intelligence and satellite images to count the number of elephants found in the wild.
  2. It appears that a billion people on Earth now use an iPhone. The number would be higher if not for the pandemic.
  3. A supplement used by body builders (alpha-ketoglutarate) made old mice healthier, and they lived longer as a result. The treated mice looked biologically younger.
  4. According to an article published in Nature, the Antarctic continent has not warmed in the last seven decades. The sea ice area has also grown.
  5. It appears that children with autism become less active physically over time compared to neurotypical children.
  6. We are getting a signal from the closest star to our own (Proxima Centauri) and the signal might be a sign of intelligent life.
  7. New research suggests that adding cheese and red wine to the diet daily, and lamb on a weekly basis, may also improve long-term cognitive outcomes.

Number Parsing at a Gigabyte per Second

Computers typically rely on binary floating-point numbers. Most often they span 64 bits or 32 bits. Many programming languages call them double and float. JavaScript represents all its numbers, by default, with a 64-bit binary floating-point number type.

Human beings most often represent numbers in decimal notation, such as 0.1 or 1e-1. Thus many systems store numbers in decimal notation using ASCII text. The software must go from binary floating-point numbers to ASCII and back. There has been much work done on the serialization (from binary floating-point numbers to ASCII) but comparatively less work on the deserialization (from ASCII to binary floating-point numbers).

Typically, reading decimal numbers and converting them to binary floating-point numbers is slow. How slow? Often on the order of 200 MB/s. That is much slower than your disk, if you have a fast disk: a PlayStation 5 has a disk capable of over 5 GB/s of bandwidth.
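To get a rough sense of this on your own system, you can time the conventional strtod over a text buffer. This is only a sketch of my own (the buffer layout and use of wall-clock timing are arbitrary choices):

```cpp
#include <cstdlib>
#include <string>
#include <chrono>

// Parse every number in 'buffer' with strtod and return the throughput
// in MB/s based on the input size in bytes.
double strtod_mbps(const std::string &buffer) {
  auto start = std::chrono::steady_clock::now();
  double sum = 0;
  const char *p = buffer.c_str();
  char *end = nullptr;
  for (;;) {
    double v = std::strtod(p, &end);
    if (end == p) break; // no more numbers
    sum += v;
    p = end;
  }
  auto stop = std::chrono::steady_clock::now();
  double seconds = std::chrono::duration<double>(stop - start).count();
  volatile double sink = sum; // keep the sum live so the loop is not optimized away
  (void)sink;
  return (double(buffer.size()) / 1e6) / seconds;
}
```

Fill the buffer with realistic numbers (many digits, varied exponents) if you want figures comparable to the benchmarks discussed here.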

You can do much better. I finally published a manuscript that explains a better approach: Number Parsing at a Gigabyte per Second. Do not miss the acknowledgements section of the paper: this was joint work with really smart people.

The benchmarks in the paper are mostly based on the C++ library fast_float. The library requires a C++11 standard compliant compiler. It provides functions that closely emulate the standard C++ from_chars functions for float and double types. It is used by Apache Arrow and Yandex ClickHouse. It is also part of the fastest YAML library in the world. These from_chars functions are part of the C++17 standard. To my knowledge, only Microsoft has implemented them at this point: they are not available in GNU GCC.

On my Apple M1 MacBook, using a realistic data file (canada), we find that fast_float far exceeds a gigabyte per second, getting close to 1.5 GB/s. The conventional C function (strtod) provided by the default Apple standard library does quite poorly on this benchmark.

What about other programming languages?

A simplified version of the approach is now part of the Go standard library, thanks to Nigel Tao and other great engineers. It accelerated Go processing while helping to provide exact parsing. Nigel Tao has a nice post entitled The Eisel-Lemire ParseNumberF64 Algorithm.

What about Rust? There is a Rust port. Unsurprisingly, the Rust version is a match for the C++ version, speed-wise. Here are the results using the same file and the same processor (Apple M1):

from_str (standard) 130 MB/s
lexical (popular lib.) 370 MB/s
fast-float 1200 MB/s

There is an R binding as well, with the same great results:

On our machine, fast_float comes out as just over 3 times as fast as the next best alternative (and this counts the function calls and all, so pure parsing speed is still a little better).

A C# port is in progress and preliminary results suggest we can beat the standard library by a healthy margin. I am hoping to get a Swift and Java port going this year (help and initiative are invited).

Video. Last year, I gave a talk at Go Systems Conf SF 2020 entitled Floating-point Number Parsing w/Perfect Accuracy at GB/sec. It is on YouTube.

Further reading. See my earlier posts… Fast float parsing in practice (March 2020 blog post) and Multiplying backward for profit (April 2020 blog post).

Science and Technology links (January 24th 2021)

  1. Year 2020 was great for PC makers. We are selling more and more PCs. Reportedly, Sony sold 3.4 million PlayStation 5 in only four weeks, a record. The demand for the Facebook Quest 2 VR headset is reportedly several times the demand for the original Quest. Valve, the company that makes the Index VR headset, is unable to make enough to meet demand. The demand for computer chips is so high that car manufacturers ran out of chips and had to close car factories. The demand for workers in information technology has remained strong throughout 2020.
  2. There might be a way to predict autism in children by studying the sperm of the father.
  3. As we age, we accumulate senescent cells. In chronic large quantities, these cells are harmful. Thankfully, there are now effective therapies to remove these cells, at least in mice. Doing so makes the old mice smarter. This suggests that the senescent cells make them dumber in the first place.
  4. Supplementing old mice with a hormone that our bodies generate during exercise makes them run twice as hard.
  5. Researchers rejuvenated old human cells by 30 years. The process is tricky and must be done in a laboratory, but it is conceivable that human tissue could be rejuvenated in a laboratory, prior to transplantation.
  6. Cold water immersion following an endurance workout probably reduces your muscle growth. So go easy with the cold showers and baths!

Science and Technology links (January 16th 2021)

  1. You can tell people’s political affiliation by image recognition technology.
  2. There are far fewer stars and galaxies than we thought. The universe is relatively small. (The source article has been revised with different conclusions.)
  3. Dog ownership conferred a 31% risk reduction for cardiovascular death.
  4. People with high total cholesterol or LDL-C live just as long or longer than people with low cholesterol. Statin trials have been unable to lower total mortality; no statin trial has succeeded in lowering mortality in women, elderly people, or diabetics; and cholesterol-lowering with statins has been associated with many serious side effects.
  5. Eating fat with potatoes is probably a bad idea if you are hoping to lose weight.

Science and Technology links (January 9th 2021)

  1. The Earth is spinning faster and faster: The 28 fastest days on record (since 1960) all occurred in 2020, with Earth completing its rotation about its axis milliseconds quicker than average.
  2. We are soon getting a new Wi-Fi standard called Wi-Fi 6: it supports data transmission at over 1 GB/s, nearly three times the speed of previous standards.
  3. Eli Dourado predicts:

    By the middle of the decade, augmented reality will be widely deployed, in the same way that smart watches are today. Glasses will be computing devices. Every big tech company has a glasses project at a relatively mature stage in the lab today. The need for the glasses to understand context could result in much smarter digital assistants than today’s Siri, Alexa, and so on.

    I agree with Dourado.

  4. Vitamin D3 may reduce the risk of developing advanced cancer.
  5. High-tech running shoes improve elite marathon performance.
  6. In our brains, neurons are connected together by their dendrites. It seems that, at least in human beings, large dendrites are capable of computation on their own.
  7. A study of 145 journals in various fields, spanning 1.7 million authors, revealed that there was no obvious bias against authors with female names and, in fact, female authors might even be slightly favored by referees. Such a study does not show that sexism does not exist. It does not show that journals have never been biased against female authors.

Memory access on the Apple M1 processor

When a program is mostly just accessing memory randomly, a standard cost model is to count the number of distinct random accesses. The general idea is that memory access is much slower than most other computational tasks.

Furthermore, the cost model can be extended to count “nearby” memory accesses as free. That is, if I read a byte at memory address x and then I read a byte at memory address x+1, I can assume that the second byte comes “for free”.

This naive memory-access model is often sensible. However, you should always keep in mind that it is merely a model. A model can fail to predict real performance.

How might it fail? A CPU core can issue multiple memory requests at once. So if I need to access 7 memory locations, I can issue 7 memory requests and wait for them. It is likely that waiting for 7 memory requests is slower than waiting for a single memory request, but is it 7 times slower?

The latest Apple laptop processor, the M1, apparently has a lot of memory-level parallelism. It looks like a single core has about 28 levels of memory parallelism, and possibly more.

Such a high degree of memory-level parallelism makes it less likely that our naive random-memory model applies.

To test it out, I designed the following benchmark where I compare three functions. The first one just grabs pairs of randomly selected bytes and it computes a bitwise XOR between them before adding them to a counter:

  for(size_t i = 0; i < 2*M; i+= 2) {
    answer += array[random[i]] ^ array[random[i + 1]];
  }

We compare against a 3-wise version of this function:

  for(size_t i = 0; i < 3*M; i+= 3) {
    answer += array[random[i]] ^ array[random[i + 1]]
              ^ array[random[i + 2]];
  }

Our naive memory-access cost model predicts that the second function should be 50% more expensive. However, many other models (such as a simple instruction count) would also predict a 50% overhead.

To give our naive memory-access model a run for its money, let us throw in a 2-wise version that also accesses nearby values (with one-byte offset):

  for(size_t i = 0; i < 2*M; i+= 2) {
    int idx1 = random[i];
    int idx2 = random[i + 1];
    answer += array[idx1] ^ array[idx1 + 1]
           ^ array[idx2]  ^ array[idx2 + 1];
  }

Our naive memory-access cost model would predict that the first and last functions should have about the same running time while the second function should be 50% more expensive.

Let us measure it out. I use a 1GB array and I report the average time spent in nanoseconds on each iteration.

2-wise    8.9 ns
3-wise   13.0 ns
2-wise+  12.5 ns

At first glance, our naive memory-access model is validated: the 3-wise function is 46% more expensive than the 2-wise function. Yet we should not be surprised because most reasonable models would make such a prediction since in almost every way, the function does 50% more work.

It is more interesting to compare the two 2-wise functions: the last one is 40% more expensive than the first. This contradicts our prediction. And so, at least in this instance, our simple memory-access cost model fails us on the Apple M1 processor.
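For reference, the three fragments above can be assembled into a self-contained program. This sketch uses a much smaller array than the 1 GB used in my measurements, so it illustrates the logic, not the timings:

```cpp
#include <cstdint>
#include <cstddef>

// 2-wise: XOR pairs of randomly selected bytes.
uint64_t twowise(const uint8_t *array, const uint32_t *random, size_t M) {
  uint64_t answer = 0;
  for (size_t i = 0; i < 2 * M; i += 2) {
    answer += array[random[i]] ^ array[random[i + 1]];
  }
  return answer;
}

// 3-wise: XOR triples of randomly selected bytes.
uint64_t threewise(const uint8_t *array, const uint32_t *random, size_t M) {
  uint64_t answer = 0;
  for (size_t i = 0; i < 3 * M; i += 3) {
    answer += array[random[i]] ^ array[random[i + 1]] ^ array[random[i + 2]];
  }
  return answer;
}

// 2-wise+: like 2-wise, but also read the byte right after each location.
uint64_t twowise_plus(const uint8_t *array, const uint32_t *random, size_t M) {
  uint64_t answer = 0;
  for (size_t i = 0; i < 2 * M; i += 2) {
    uint32_t idx1 = random[i];
    uint32_t idx2 = random[i + 1];
    answer += array[idx1] ^ array[idx1 + 1] ^ array[idx2] ^ array[idx2 + 1];
  }
  return answer;
}
```

Note that the random indices must stay one short of the array size, since 2-wise+ reads one byte past each selected location.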


  1. My source code is available. The run-to-run variability is relatively high on such a test, but the conclusion is robust, on my Apple M1 system.
  2. I posted the assembly online.
  3. Importantly, I do not predict that other systems will follow the same pattern. Please do not run this benchmark on your non-M1 PC and expect comparable results.
  4. This benchmark is meant to be run on an Apple MacBook with the M1 processor, compiled with Apple’s clang compiler. It is not meant to be used on other systems.

Peer-reviewed papers are getting increasingly boring

The number of researchers and peer-reviewed publications is growing exponentially. It has been estimated that the number of researchers in the world doubles every 16 years and the number of research outputs is increasing even faster.

If you accept that published research papers are an accurate measure of our scientific output, then we should be quite happy. However, Cowen and Southwood take an opposing point of view and represent this growth as a growing cost without associated gains.

(…) scientific inputs are being produced at a high and increasing rate, (…) It is a mistake, however, to infer that increases in these inputs are necessarily good news for progress in science (…) higher inputs are not valuable per se, but instead they are a measure of cost, namely how much is being invested in scientific activity. The higher the inputs, or the steeper the advance in investment, presumably we might expect to see progress in science be all the more impressive. If not, then perhaps we should be worried all the more.

So are these research papers that we are producing in greater numbers… the kind of research papers that represent real progress? Bhattacharya and Packalen conclude that though we produce more papers, science itself is stagnating because of worsening incentives that focus research on low-risk/no-reward ventures as opposed to genuine progress:

This emphasis on citations in the measurement of scientific productivity shifted scientist rewards and behavior on the margin toward incremental science and away from exploratory projects that are more likely to fail, but which are the fuel for future breakthroughs. As attention given to new ideas decreased, science stagnated.

Thurner et al. concur in the sense that they find that “out-of-the-box” papers are getting harder to find:

over the past decades the fraction of mainstream papers increases, the fraction of out-of-the-box decreases

Surely, the scientists themselves have incentives to course correct and encourage themselves to produce more important and exciting research papers?

Collison and Nielsen challenge scientists and institutions to tackle this perceived diminishing scientific productivity:

Most scientists strongly favor more research funding. They like to portray science in a positive light, emphasizing benefits and minimizing negatives. While understandable, the evidence is that science has slowed enormously per dollar or hour spent. That evidence demands a large-scale institutional response. It should be a major subject in public policy, and at grant agencies and universities. Better understanding the cause of this phenomenon is important, and identifying ways to reverse it is one of the greatest opportunities to improve our future.

If we believe that research papers are becoming worse, that fewer of them convey important information, then the rational approach is to downplay them. Whenever you encounter a scientist and they tell you about how many papers they have published, or where they were published, or how many citations they got… you should not mock the scientist in question, but you ought to bring the conversation to another level. What is the scientist working on and why is it important work? Dig below the surface.

Importantly, it does not mean that we should discourage people from publishing many papers, any more than we generally discourage programmers from writing many lines of code. Everything else being equal, people who love what they are doing, and who are good at it, will do more of it. But nobody would mistake someone who writes a lot for a good writer if they aren’t.

We need to challenge the conventional peer-reviewed research paper, by which I refer to a publication that was reviewed by 2 to 5 peers before getting published. It is a relatively recent innovation that may not always be for the best. People like Einstein did not go through this process, at least not in their early years. Research used to be more like “blogging”. You would write up your ideas and share them. People could read them and criticize them. This communication process can be carried out by different means: some researchers broadcast their research meetings online.

Peer-reviewed research papers allow you to “measure” productivity. How many papers in top-tier venues did researcher X produce? And that is why the model grew so strong.

There is nothing wrong with people seeking recognition. Incentives are good. But we should reward people for the content of their research, not for the shallow metadata we can derive from their resume. If you have not read and used someone’s work, you have no business telling us whether they are good or bad.

The other related problem is the incestuous relationship between researchers and assessment. Is the work on theory X important? “Let us ask people who work on theory X.” No. You have to have customers, users, people who have incentives to provide honest assessments. A customer is someone who uses your research in an objective way. If you design a mathematical theory or a machine-learning algorithm and an investment banker relies on it, they are your customer (whether they are paying you or not). If it fails, they will stop using it.

It seems like peer-reviewed research papers could establish this kind of customer-vendor relationship where you get a frank assessment. Unfortunately, it fails as you scale it up. The customers of a research paper are the independent readers, that much is true, but reviewers are readers who have their own motivations.

You cannot easily “fake” customers. We do so sometimes, with movie critics, say. But movie critics have an incentive to give you recommendations you can trust.

We could try to emulate the movie critic model in science. I could start reviewing papers on my blog. I would have every incentive to be a good critic because, otherwise, my reputation might suffer. But it is an expensive process. Being a movie critic is a full time job. Being a research paper critic would also be a full time job.

What about citations? Well, citations are often granted by your nearest peers. If they are doing work that resembles yours, they have no incentive to take it down.

In conclusion, I do find it credible that science might be facing a sort of systemic stagnation brought forth by a set of poorly aligned incentives. The peer-reviewed paper accepted at a good venue as the ultimate metric seems to be at the core of the problem. Further, the whole web of assessment in modern science often seems broken. It seems that, on an individual basis, researchers ought to adopt the following principles:

  1. Seek objective feedback regarding the quality of your own work using “customers”: people who would tell you frankly if your work was not good. Do not mistake citations or “peer review” for such an assessment.
  2. When assessing another researcher’s work, try your best to behave as a customer who has some distance from the research. Do not count inputs and outputs as quality metrics. Nobody would describe Stephen King as a great writer merely because he published many books. If you are telling me that Mr Smith is a great researcher, then you should be able to tell me about his research and why it is important.

Further reading:

My Science and Technology review for 2020

    1. The original PlayStation game console (1994) was revolutionary thanks in part to its CD drive that could read data at an astonishing 0.3 MB/s. In 2020, the PlayStation 5 came out with 5 GB/s of disk bandwidth, over 15,000 times the bandwidth of the original PlayStation.
    2. The Samsung S10+ phone can be purchased with 1 TB of storage. It is enough storage to record everything you hear in daily life for ten years or to store everything you see for several weeks. It is relatively easy to buy a 4 TB SSD for your PC, but difficult to go much higher.
    3. Drones were used to keep Europeans in check during the COVID-19 pandemic: they take people’s temperature and issue fines.
    4. In the state of New York, people can get married legally by videoconference.
    5. Virtually all kids and college students have taken online classes in 2020 in the developed world. It is now a widely held view (though not uncontested) that the future of colleges is online.
    6. UCLA researchers have achieved widespread rejuvenation in old mice through blood plasma dilution, a relatively simple process; they plan to conduct clinical trials in human beings “soon”. (Other reference.)

Science and Technology links (December 26th 2020)

  1. Researchers used a viral vector to manipulate eye cells genetically to improve the vision of human beings.
  2. Seemingly independently, researchers have reported significant progress regarding the solution of the Schrödinger equation using deep learning: Puppin et al., Hermann et al.
  3. The Dunning-Kruger Effect Is Probably Not Real. I am becoming quite upset about the many effects in psychology that fail to be independently verified. And I’d feel better if it were only a problem in psychology.
  4. Can the technology behind COVID-19 vaccines lead to other breakthroughs?

In 2011, I predicted that the iPhone would have 1TB of storage in 2020

Someone reminded me of a prediction I made in 2011:

At the time, an iPhone could hold at most 32 GB of data, so 1 TB sounded insane.

Unfortunately, Google Plus is no more so you cannot see the plot showing my projection and I lost it as well. Yet we can build a table:

2010 iPhone 4 32 GB
2012 iPhone 5 64 GB
2014 iPhone 6 128 GB
2016 iPhone 7 256 GB
2018 iPhone XS 512 GB
2019 iPhone 11 Pro 512 GB
2020 iPhone 12 Pro 512 GB

How did my prediction fare? I got it wrong, of course, but I think it was remarkably prescient. It seems obvious that Apple could have gone with 1 TB but chose not to. The Samsung Galaxy S10+ comes with 1 TB of storage.

Some analysts predict that the iPhone 13 might have 1 TB of storage.

Science and Technology links (December 19th 2020)

    1. The Flynn effect is the idea that people get smarter over time (generation after generation). The negative Flynn effect refers to recent observations suggesting that people are getting dumber. It seems that there is no negative Flynn effect after all. We are not getting dumber.
    2. Year 536 was one of the worst years to be alive. Temperatures fell 1.5°C to 2.5°C during the summer and crops failed. Mass starvation soon followed. Cold weather is deadly.
    3. A drug reversed age-related cognitive decline in mice within a few days.
    4. Glucosamine, a popular supplement, reduces mortality. It may not do much against joint pain, however.
    5. Singapore will have flying electric taxi services.
    6. Japan’s population is projected to fall from a peak of 128 million in 2017 to less than 53 million by the end of the century.
    7. NASA spent $23.7 billion on the Orion spacecraft, which flew once. Meanwhile, the private company SpaceX received less than $20 billion in funding, executed more than 100 launches to orbit, made vertical landing work, and more.
    8. We are working far fewer hours.

Cognitive biases

One-sided bet: People commonly assume implicitly that their actions may only have good outcomes. For example, increasing the minimum wage in a country may only benefit the poor. Taking a lottery ticket only has the upside of possibly winning a lot of money. Believing in God can only have benefits. And so forth. In truth, most actions are two-sided. They have good and bad effects.

Politician’s syllogism: We must do something, this is something so we must do it. “We must fight climate change, we can tax oil, so we must tax oil.” If there is a problem, it is important to assess the actions we could take and not believe that because they are actions in response to a real problem, they are intrinsically good.

Confirmation bias: “I believe that there are extraterrestrials, I have collected 1000 reports confirming their presence” (but I am blind to all of the negative evidence). People tend to make up their mind first and then to seek to rationalize their opinion whereas they should do the opposite.

Historical pessimism bias: “Human life was so much better 2 centuries ago!” Yet by almost any measure, human beings have better lives today.

Virtual reality… millions but not tens of millions… yet

In February 2016, I placed a bet against Greg Linden in these terms:

within the next three years, starting in March of this year, we would sell at least 10 million VR units a year (12 continuous months) worldwide.

According to some sources, around 5 million units were sold each year in 2019 and 2020. Nobody is claiming that nearly 10 million units were sold in a single year. Thus I conceded the bet to Greg and paid $100 to the Wikipedia foundation. Greg has a blog post on this bet.

I believe that both Greg and myself agree that though we have not reached the 10-million-unit threshold yet, we will in a few short years. You should expect a non-linear growth: as more headsets are sold, more applications are built, and thus more headsets are sold…

It is important to put yourself in the context in which this bet was made. At the time, three VR headsets were about to be released (Facebook’s Oculus Rift, the HTC Vive and the PlayStation VR). As far as I know, neither Greg nor myself had any experience whatsoever with these headsets. The Oculus Rift was to ship with a game controller, so we had reasons to be skeptical about the hardware quality.

I expected that selling 10 million units a year was a long shot. I expected, at best, a close call. Yet I still expected that we would sell millions of units even if I lost, which I believe is what happened. I expected that at least one of the current players (Oculus, Sony and HTC) would fold while at least one new player would enter the market. It seems that HTC initially bet the farm on this market but reduced its presence over time, while the Valve Index was a nice surprise.

I acquired several headsets. It turns out that the hardware exceeded my expectations. People who complain about the bulky headsets have often not followed through the various iterations. Hardware can always be lighter and finer, but the progress has exceeded my expectations.

I also built a few software prototypes of my own, and it was remarkably easy. Both the software and the hardware aspects worked out much better than I expected, but the killer applications have not emerged yet.

My own laboratory acquired headsets and built prototypes. It took me months to reach rather elementary realizations. Explaining VR is harder than it sounds. No, it is not like moving from a 2D surface to a 3D one. It is an embodied experience. And that is where I conjecture the real difficulty lies. We are all familiar with video games, movies and the web. But we have a much harder time thinking about VR and what it can and cannot do.

Let me revisit critically my statements from 2016:

  1. Virtual reality is a major next step so that backers will be generous and patient.
    It is unclear to me how much truth there was in this statement. Certainly Facebook, Valve and HTC have invested a lot, but I kept hearing about start-ups folding early. The fact that hardly anyone made a lot of money did not help. Meanwhile, many of the people working in VR can quickly switch to more profitable non-VR projects, so the talented individuals do not stick around.
  2. I’d be surprised if the existing Oculus Rift sold more than a few hundred thousand units. It is just too expensive. It is just not going to be on sale at Walmart.
    The Oculus Rift is on sale at Walmart for $300. But I am correct regarding the unit sales: the Oculus Rift did not sell in the millions of units.
  3. But within two years, we can almost guarantee that the hardware will either be twice as good or cost half as much. With any luck, in two years, you will be able to buy a computer with a good VR headset for a total of less than $1000 at Walmart.
    I did not foresee that standalone headsets like the Oculus Quest would essentially match the original PC headsets at a fraction of the cost. The Oculus Quest is under $500. Cheaper than a game console. It is light (500 g) and it has a high resolution (1832×1920 per eye). It has a low-latency 72 Hz display. Six degrees of freedom. Sadly, you must tie it to your Facebook account, which is a turn-off for many people. There are rumours of very good Chinese headsets but they have not been commercialized yet where I live.
  4. A company like Sony has more than enough time in three years to bring the prices down and get game designers interested. Will the technology be good enough to attract gamers? If it is, then it might just be possible to sell 10 million units in a year.
    Sony released the PlayStation 5 without stressing VR. Half-Life: Alyx was one of the best-selling games of 2020, but it did not sell in the millions. There are good VR video games, but very few high-budget ventures.

Conclusion. VR did not see the same kind of explosive growth that other technologies have seen. But the infrastructure has been built and the growth will happen. Prices have fallen and quality has jumped up. Sooner than you think, VR will enter your life if it hasn’t yet.

Converting floating-point numbers to integers while preserving order

Many programming languages have a number type corresponding to the IEEE binary64. In many languages such as Java or C++, it is called a double. A double value uses 64 bits and it represents a significand (or mantissa) multiplied by a power of two: m * 2^p. There is also a sign bit.

A simpler data type is the 64-bit unsigned integer. It is a simple binary representation of all integers from 0 to 2^64 − 1.

In a low-level programming language like C++, you can access a double value as if it were an unsigned integer. After all, bits are bits. For some applications, it can be convenient to regard floating-point numbers as if they were simple 64-bit integers.

In C++, you can do the conversion as follows:

uint64_t to_uint64(double x) {
    uint64_t a;
    memcpy(&a, &x, sizeof(x));
    return a;
}

Though it looks expensive, an optimizing compiler might turn such code into something that is almost free.

In such an integer representation, a double value looks as follows:

  • The most significant bit is the sign bit. It has value 1 when the number is negative, and it has value 0 otherwise.
  • The next 11 bits are usually the exponent code (which determines p).
  • The other bits (52 of them) are the significand.

If you set aside infinite values and the not-a-number codes, a comparison between two floating-point numbers is almost trivially the same as a comparison between two integer values.

If you know that all of your numbers are positive and finite, then you are done. They are already in sorted order. The following comparison function should suffice:

bool is_smaller(double x1, double x2) {
    uint64_t i1 = to_uint64(x1);
    uint64_t i2 = to_uint64(x2);
    return i1 < i2;
}

If your values can be negative, then you minimally need to flip the sign bit: we want large values to have their most significant bit set, and small values to have it unset. But flipping one bit is not enough: negative values with a large absolute value must map to small integers. To achieve this, you need to negate all of the bits, but only when the sign bit is set. It turns out that a clever programmer has worked out an efficient solution:

uint64_t sign_flip(uint64_t x) {
   // credit http://stereopsis.com/radix.html
   // when the most significant bit is set, we need to
   // flip all bits
   uint64_t mask = uint64_t(int64_t(x) >> 63);
   // in all cases, we need to flip the most significant bit
   mask |= 0x8000000000000000;
   return x ^ mask;
}

You now have an efficient comparator between two floating-point values using integer arithmetic:

bool generic_comparator(double x1, double x2) {
    uint64_t i1 = sign_flip(to_uint64(x1));
    uint64_t i2 = sign_flip(to_uint64(x2));
    return i1 < i2;
}

For finite numbers, we have shown how to map floating-point numbers to integer values while preserving order. The map is also invertible.

Sometimes you are working with floating-point numbers but would rather process integers. If you only need to preserve order, you can use such a map.

My source code is available.

ARM MacBook vs Intel MacBook: a SIMD benchmark

In my previous blog post, I compared the performance of my new ARM-based MacBook Pro with my 2017 Intel-based MacBook Pro. I used a number parsing benchmark. In some cases, the ARM-based MacBook Pro was nearly twice as fast as the older Intel-based MacBook Pro.

I think that the Apple M1 processor is a breakthrough in the laptop industry. It has allowed Apple to sell the first ARM-based laptop that is really good. It is not just the chip, of course. It is everything around it. For example, I fully expect that most people who buy these new ARM-based laptops will never realize that they are not Intel-based. The transition is that smooth.

I am excited because I think it will drive other laptop makers to rethink their designs. You can buy a thin laptop from Apple with a 20-hour battery life and the ability to do intensive computations as well as a much larger and heavier laptop would.

(This blog post has been updated after I corrected a methodological mistake: I was running the Apple M1 processor under x64 emulation.)

Yet I did not think that the new Apple processor was better than Intel processors in all things. One obvious caveat is that I am comparing the Apple M1 (a 2020 processor) with an older Intel processor (released in 2017). But I thought that even older Intel processors could have an edge over the Apple M1 in some tasks, and I wanted to make this clear. I did not think it was controversial. Yet I was criticized for making the following remark:

In some respect, the Apple M1 chip is far inferior to my older Intel processor. The Intel processor has nifty 256-bit SIMD instructions. The Apple chip has nothing of the sort as part of its main CPU. So I could easily come up with examples that make the M1 look bad.

This rubbed many readers the wrong way. They pointed out that ARM processors do have 128-bit SIMD instructions called NEON. They do. In some ways, the NEON instruction set is nicer than the x64 SSE/AVX one. Recent Apple ARM processors have four execution units capable of SIMD processing while Intel processors only have three. Furthermore, the Intel execution units have more restrictions. Thus 64-bit ARM NEON routines will outperform comparable SSE2 (128-bit SIMD) Intel routines despite the fact that they both work over 128-bit registers. In fact, I have a blog post making this point by using the iPhone’s processor.

But it does not follow that the 128-bit ARM NEON instructions are generally a match for the 256-bit SIMD instructions Intel and AMD offer.

Let us test out the issue. The simdjson library offers SIMD-heavy functions to minify JSON and validate UTF-8 inputs. I wrote a benchmark program that loads a file in memory and then repeatedly calls the minify and validate function, looking for the best possible speed. Anyone with a MacBook and Xcode should be able to reproduce my results.

The vectorized UTF-8 validation algorithm is described in Validating UTF-8 In Less Than One Instruction Per Byte (published in Software: Practice and Experience).

The simdjson library relies on an abstraction layer so that functions are implemented using higher-level C++ which gets translated into efficient SIMD intrinsic functions specific to the targeted system. That is, we are not comparing different hand-tuned assembly functions. You can check out the UTF-8 validation code for yourself online.

Let us look at the results:

                                      minify     UTF-8 validate
Apple M1 (2020 MacBook Pro)           6.6 GB/s   33 GB/s
Intel Kaby Lake (2017 MacBook Pro)    7.7 GB/s   29 GB/s
Intel/M1 ratio                        1.2        0.9

As you can see, the older Intel processor is slightly superior to the Apple M1 in the minify test.

Of course, it is only one set of benchmarks. There are many confounding factors. Did the algorithmic choices favour the AVX2 ISA? It is possible. Thankfully all of the source code is available so any such bias can be assessed.

ARM MacBook vs Intel MacBook

Up to yesterday, my laptop was a large 15-inch MacBook Pro. It contains an Intel Kaby Lake processor (3.8 GHz). I just got a brand-new 13-inch 2020 MacBook Pro with Apple’s M1 ARM chip (3.2 GHz).

How do they compare? I like precise data points.

Recently, I have been busy benchmarking number parsing routines where you convert a string into a floating-point number. That seems like an interesting comparison. In my basic tests, I generate random floating-point numbers in the unit interval (0,1) and I parse them back exactly. The decimal significand spans 17 digits.

I run the same benchmarking program on both machines. I compile both benchmarks identically, using Apple's built-in Xcode system with the LLVM C++ compiler. Evidently, the binaries differ since one is an ARM binary and the other is an x64 binary. Both machines have been updated to the most recent compiler and operating system.

My results are as follows:

            Intel x64    Apple M1    difference
strtod      80 MB/s      115 MB/s    40%
abseil      460 MB/s     580 MB/s    25%
fast_float  1000 MB/s    1800 MB/s   80%

My benchmarking software is available on GitHub. To reproduce, install Apple's Xcode (with command-line tools) and CMake (install for command-line use), and type cmake -B build && cmake --build build && ./build/benchmarks/benchmark. It uses the default Release mode in CMake (flags -O3 -DNDEBUG).

I do not yet understand why the fast_float library is so much faster on the Apple M1. It contains no ARM-specific optimization.

Note: I dislike benchmarking on laptops. In this case, the tests are short and I do not expect the processors to be thermally constrained.

Update. The original post had the following statement:

In some respect, the Apple M1 chip is far inferior to my older Intel processor. The Intel processor has nifty 256-bit SIMD instructions. The Apple chip has nothing of the sort as part of its main CPU. So I could easily come up with examples that make the M1 look bad.

This turns out to be false. See my post ARM MacBook vs Intel MacBook: a SIMD benchmark.

Science and Technology (December 5th 2020)

  1. Researchers find that older people can lose weight just as easily as younger people.
  2. Google DeepMind claims to have solved the protein folding problem, an important problem in medicine. This breakthrough could greatly accelerate drug development and lead to new cures. Yet, not everyone is convinced that they actually solved the problem.
  3. “Indian Americans have risen to become the richest ethnicity in America, with an average household income of $126,891 (compared to the US average of $65,316). (…) Almost 40% of all Indians in the United States have a master’s, doctorate, or other professional degree, which is five times the national average.” (source)
  4. There is a popular idea in the US currently: we should just forgive all student debts. Catherine and Yannelis find that “universal and capped forgiveness policies are highly regressive, with the vast majority of benefits accruing to high-income individuals.”
  5. Researchers successfully deployed advanced genetic engineering techniques (based on CRISPR) against cancer in mice.
  6. Researchers rejuvenated the cells in the eyes of old mice, restoring their vision. (Source: Nature.)
  7. Remember all those studies claiming that birth order determined your fate, with older siblings going more into science and younger siblings pursuing more artistic careers? It seems that these results do not replicate very well under re-analysis. The effects are much weaker than initially believed and they do not necessarily go in the expected direction.
  8. Older people (over 70) have less zinc in their blood. Their zinc level predicts their mortality rate. The more zinc, the less likely they are to die.
  9. Shenzhen (China) has truly driverless cars on the roads.
  10. Centenarians have low levels of blood sugar, and they are less likely to suffer from diabetes than adults in general.
  11. We have an actual treatment to help people suffering from progeria, a crippling disease.
  12. Eating eggs is quite safe.
  13. The state-of-the-art in image processing includes convolutional neural networks (CNN). Though it gives good results, it is a computationally expensive approach. Google has adapted a technique from natural-language processing called transformers to the task and they report massive gains in computational efficiency.

Interview by Adam Gordon Bell

A few weeks ago, Adam Gordon Bell had me on his podcast. You can listen to it. Here is the abstract:

Did you ever meet somebody who seemed a little bit different than the rest of the world? Maybe they question things that others wouldn’t question or said things that others would never say. Daniel is a world-renowned expert on software performance, and one of the most popular open source developers. If you measure by GitHub followers. Today, he’s gonna share his story. It involves time at a research lab, teaching students in a new way. it will also involve upending people’s assumptions about IO performance. Elon Musk And Julia Roberts will come up a little bit more than you might expect.

I would not describe myself as “world-renowned” at anything, but Adam needs to do a bit of promotion. My interview is right after an interview with Brian Kernighan: he is world-renowned.

I also do not think that I am “different from the rest of the world”, though I have maybe given more thought than most to the need to be different. I have always been preoccupied with trying to do work that others do not do: sadly, it is much harder than it sounds.

I usually talk mostly about my work, but Adam wanted to go a bit personal, like how I was initially struggling at school.

Further reading: After giving this interview, I read Paul Graham’s latest essay. If you liked my interview, you will probably enjoy Graham’s essay. You might enjoy his essay in any case.

Java Buffer types versus native arrays: which is faster?

When programming in C, one has to allocate and de-allocate memory by hand. It is an error-prone process. In contrast, newer languages like Java often manage their memory automatically. Java relies on garbage collection. In effect, memory is allocated as needed by the programmer, and then Java figures out that some piece of data is no longer needed, and it retrieves the corresponding memory. The garbage collection process is fast and safe, but it is not free: despite decades of optimization, it can still cause major headaches to developers.

Java has native arrays (e.g., the int[] type). These arrays are typically allocated on the “Java heap”. That is, they are allocated and managed by Java as dynamic data, subject to garbage collection.

Java also has Buffer types such as the IntBuffer. These are high-level abstractions that can be backed by native Java arrays but also by other data sources, including data that is outside of the Java heap. Thus you can use Buffer types to avoid relying so much on the Java heap.

But my experience is that it comes with some performance penalty compared to native arrays. I would not say that Buffers are slow. In fact, given a choice between a Buffer and a stream (DataInputStream), you should strongly favour Buffer types. However, they are not as fast as native arrays in my experience.

I can create an array of 50,000 integers, either with “new int[50000]” or as “IntBuffer.allocate(50000)”. The latter should essentially create an array (on the Java heap) wrapped with an IntBuffer “interface”.

A possible intuition is that wrapping an array with a high-level interface should be free. Though it is true that high-level abstractions can come with no performance penalty (and sometimes even performance gains), whether they do is an empirical matter. You should never just assume that your abstraction comes for free.

Because I am making an empirical statement, let us test it out empirically with the simplest test I can imagine. I am going to add one to every element in the array/IntBuffer.

// native array
for (int k = 0; k < s.array.length; k++) {
    s.array[k] += 1;
}
// IntBuffer
for (int k = 0; k < s.buffer.limit(); k++) {
    s.buffer.put(k, s.buffer.get(k) + 1);
}

I get the following results on my desktop (OpenJDK 14, 4.2 GHz Intel processor):

int[]       2.5 µs
IntBuffer   12 µs

That is, arrays are over 4 times faster than IntBuffers in this test.

You can run the benchmark yourself if you’d like.

My expectation is that many optimizations that Java applies to arrays are not applied to Buffer types.

Of course, this tells us little about what happens when Buffers are used to map values from outside of the Java heap. My experience suggests that things can be even worse.

Buffer types have not made native arrays obsolete, at least not as far as performance is concerned.