Science and Technology links (November 18th 2018)

  1. It seems that reducing your carbohydrate (sugar) intake might be a good way to lose weight:

    lowering dietary carbohydrate increased energy expenditure during weight loss maintenance. This metabolic effect may improve the success of obesity treatment, especially among those with high insulin secretion.

    I should warn that this study is about lowering carbohydrate intake, not getting rid of it entirely.

  2. 85% of the more than $100bn a year spent on medical research globally is avoidably wasted
  3. Collison and Nielsen write:

    science has slowed enormously per dollar or hour spent. That evidence demands a large-scale institutional response. It should be a major subject in public policy, and at grant agencies and universities

    While I accept their demonstration, it is not clear what (if anything in particular) is causing this lack of productivity.

    Collison and Nielsen fall short of offering a solution. Maybe we ought to reinvent discovery?

  4. A man is going to court so that he can be considered 20 years younger than his birth date indicates.

Simple table size estimates and 128-bit numbers (Java Edition)

Suppose that you are given a table. You know the number of rows, as well as how many distinct values each column has. For example, you know that there are two genders (in this particular table). Maybe there are 73 distinct age values. For a concrete example, take the standard Adult data set, which is made of 48842 rows.

How many distinct entries do you expect the table to have? That is, if you remove all duplicate rows, what is the number of rows left?

There is a standard formula for this problem: Cardenas’ formula. It uses a simplistic model in which the columns are independent of one another. In practice, it will tend to overestimate the number of distinct rows. However, despite its simplicity, it often works really well.

Let p be the product of all column cardinalities, and let n be the number of rows; then the Cardenas estimate is p * (1 - (1 - 1/p)^n). Simple, right?

You can implement it in Java easily enough…

double cardenas64(long[] cards, int n) {
    // p: the product of the column cardinalities
    double product = 1;
    for (int k = 0; k < cards.length; k++) {
        product *= cards[k];
    }
    // Cardenas' estimate: p * (1 - (1 - 1/p)^n)
    return product * (1 - Math.pow(1 - 1.0 / product, n));
}

So let us put in the numbers… my column cardinalities are 16,16,15,5,2,94,21648,92,42,7,9,2,6,73,119; and I have 48842 rows. So what is Cardenas’ prediction?

Zero.

At least, that’s what the Java function returns.
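
For concreteness, here is the call that produces this result (a minimal sketch, assuming the cardenas64 function above is in scope):

long[] cards = {16, 16, 15, 5, 2, 94, 21648, 92, 42, 7, 9, 2, 6, 73, 119};
System.out.println(cardenas64(cards, 48842)); // prints 0.0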

Why is that? The first problem is that 1 - 1/p rounds to exactly 1 in double precision when p is that large (1/p is far smaller than the machine epsilon). And even if you could compute 1 - 1/p accurately enough, raising it to the power 48842 is a problem.

So what do you do?

You can switch to something more accurate than double precision, namely quadruple precision (also called binary128). There are no native 128-bit floats in Java, but you can emulate them using the BigDecimal class. The code gets much uglier. Elegance aside, I assumed it would be a walk in the park, but I found that the implementation of the power function was numerically unstable, so I had to roll my own (from multiplications).

The core function looks like this…

double cardenas128(long[] cards, int n) {
    BigDecimal product = product(cards);
    // 1/p
    BigDecimal oneover = BigDecimal.ONE.divide(product,
        MathContext.DECIMAL128);
    // 1 - 1/p
    BigDecimal proba = BigDecimal.ONE.subtract(oneover,
        MathContext.DECIMAL128);
    // (1 - 1/p)^n
    proba = lemirepower(proba, n);
    // p - p * (1 - 1/p)^n = p * (1 - (1 - 1/p)^n)
    return product.subtract(
        product.multiply(proba, MathContext.DECIMAL128),
        MathContext.DECIMAL128).doubleValue();
}
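
The snippet relies on two helpers that are not shown here: product, which multiplies all the cardinalities into a BigDecimal, and lemirepower, the power function rolled from multiplications. A minimal sketch of what they can look like (this version uses exponentiation by squaring and requires java.math.BigDecimal and java.math.MathContext; see my source code below for the actual implementation):

BigDecimal product(long[] cards) {
    BigDecimal p = BigDecimal.ONE;
    for (long c : cards) {
        p = p.multiply(BigDecimal.valueOf(c), MathContext.DECIMAL128);
    }
    return p;
}

// a power function built from multiplications only (exponentiation by squaring)
BigDecimal lemirepower(BigDecimal base, int exponent) {
    BigDecimal result = BigDecimal.ONE;
    BigDecimal b = base;
    for (int e = exponent; e > 0; e >>= 1) {
        if ((e & 1) == 1) {
            result = result.multiply(b, MathContext.DECIMAL128);
        }
        b = b.multiply(b, MathContext.DECIMAL128);
    }
    return result;
}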

It scales up to billions of rows and up to products of cardinalities that do not fit in any of Java’s native types. Though the computation involves fancy data types, it is probably more than fast enough for most applications.

My source code is available.

Update: You can avoid 128-bit numbers by using the log1p(x) and expm1(x) functions; they compute log(x + 1) and exp(x) - 1 in a numerically stable manner. Since (1 - 1/p)^n = exp(n * log1p(-1/p)), we have 1 - (1 - 1/p)^n = -expm1(n * log1p(-1/p)). The updated code is as follows:

double cardenas64(long[] cards, int n) {
    double product = 1;
    for (int k = 0; k < cards.length; k++) {
        product *= cards[k];
    }
    return product *
        -Math.expm1(Math.log1p(-1.0 / product) * n);
}

(Credit: Harold Aptroot)
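
With the same column cardinalities as before, this updated function no longer collapses to zero; a minimal sanity check (assuming the function above):

long[] cards = {16, 16, 15, 5, 2, 94, 21648, 92, 42, 7, 9, 2, 6, 73, 119};
// prints an estimate very close to 48842: the product of cardinalities dwarfs
// the number of rows, so the model expects nearly all rows to be distinct
System.out.println(cardenas64(cards, 48842));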

Memory-level parallelism: Intel Skylake versus Apple A12/A12X

Modern processors execute instructions in parallel in many different ways: multi-core parallelism is just one of them. In particular, processor cores can have several outstanding memory access requests “in flight”. This is often described as “memory-level parallelism”. You can measure the level of memory-level parallelism your processor has by traversing an array randomly, either by following one path or by following several different “lanes”. We find that recent Intel processors have about “10 lanes” of memory-level parallelism.

It has been reported that Apple’s mobile processors are competitive (in raw power) with Intel processors. So a natural question is to ask whether Apple’s processors have more or less memory-level parallelism.

The kind of memory-level parallelism I am interested in has to do with out-of-cache memory accesses. Thus I use a 256MB block of memory. This is large enough not to fit into a processor cache. However, because it is so large, we are likely to suffer from a virtual-memory-related fault. This can significantly limit memory-level parallelism if the page sizes are too small. By default on the Linux distributions I use, the pages span 4kB (whether on 64-bit ARM or x64). Empirically, that is too small. Thankfully, it is easy to reconfigure the pages so that they span 2MB or more (“huge pages”). On Apple’s devices, whether it be an iPhone or an iPad Pro, I believe that the pages always span 16kB and that this cannot be easily reconfigured.

Before I continue, let me present the absolute timings (in seconds) using a single lane (thus no memory-level parallelism). Apple makes two versions of its most recent processor: the A12 (in the iPhone) and the A12X (in the iPad Pro).

Intel Skylake (4kB pages): 0.73 s
Intel Skylake (2MB pages): 0.61 s
Apple A12 (16kB pages): 0.96 s
Apple A12X (16kB pages): 0.97 s
Apple A10X (16kB pages): 1.15 s

According to these numbers, the Intel server has the edge over the Apple mobile devices. But that’s only part of the story. What happens as you increase the number of lanes (while keeping the code single-threaded) is interesting: the Apple processors start to beat the Intel Skylake in absolute, raw speed.

Another way to look at the problem is to measure the “speedup” due to memory-level parallelism: we divide the time it takes to traverse the array using one lane by the time it takes using X lanes. We see that the Intel Skylake processor is limited to about a 10x or 11x speedup whereas the Apple processors go much higher.

Thoughts:

  1. I’d be very interested in knowing how Qualcomm and Samsung processors compare.
  2. It goes without saying that my server-class Skylake machine uses a lot more power than the iPhone.
  3. If I could increase the page size on iOS, we would get even better numbers for the Apple devices.
  4. The fact that the A12 has higher timings when using a single lane suggests that its memory subsystem has higher latency than a Skylake-based PC. Why is that? Could Apple just crank up the frequency of the DRAM memory and beat Intel throughout?
  5. Why is Intel limited to 10x memory-level parallelism? Why can’t they do what Apple does?

Credit: I owe much of the design of the experiment and C++ code to Travis Downs, with help from Nathan Kurz. The initial mobile app for Apple devices was provided by Benoît Maison; you can find it on GitHub along with the raw results and a “console” version that runs under macOS and Linux. I owe the A12X numbers to Stuart Carnie and the A12 numbers to Victor Stewart.

Further reading: Memory Latency Components

Science and Technology links (November 10th, 2018)

  1. It already takes more energy to operate Bitcoin than to mine actual gold. Cryptocurrencies are responsible for millions of tons of CO2 emissions. (Source: Nature)
  2. “Half of countries have fertility rates below the replacement level, so if nothing happens the populations will decline in those countries” (source: BBC)
  3. According to Dickenson et al., 8.6% of us (7.0% of women and 10.3% of men) have difficulty controlling sexual urges and behaviors.
  4. A frequently prescribed drug family (statins) can increase your risk of suffering from ALS by a factor of 10 or 100.
  5. Countries where people are expected to live longest in 2040 are Spain, Japan, Singapore, Switzerland, Portugal, Italy, Israel, France, Luxembourg, and Australia. Not included in this list is the USA.
  6. Smart mirrors could monitor your mood, fitness, anxiety levels, heart rate, skin condition, and so forth.
  7. When you are trying to determine whether a drug is effective, it is tempting to look at published papers and see whether they all agree on the efficacy of the drug. This may be quite wrong: Turner et al. show a strong bias whereby negative results often go unpublished.

    Studies viewed by the FDA as having negative or questionable results were, with 3 exceptions, either not published (22 studies) or published in a way that, in our opinion, conveyed a positive outcome (11 studies). According to the published literature, it appeared that 94% of the trials conducted were positive. By contrast, the FDA analysis showed that 51% were positive. Separate meta-analyses of the FDA and journal data sets showed that the increase in effect size ranged from 11 to 69% for individual drugs and was 32% overall.

    Simply put, it is far easier and more profitable to publish positive results, so that’s what you get.

    This means that, by default, you should always downgrade the optimism of the literature.

    Simply put: don’t be too quick to believe what you read, even if it comes in the form of a large set of peer-reviewed research papers.

  8. Richard Jones writes “Motivations for some of the most significant innovations weren’t economic”.
  9. Cable and satellite TV is going away.
  10. “What if what students really want is not to be learners, but alumni?” People will prefer an academically useless program from Harvard to a complete graduate program from a lowly school because they badly want to say that they went to Harvard.
  11. Drinking coffee abundantly protects from neurodegenerative diseases.

Measuring the memory-level parallelism of a system using a small C++ program?

Our processors can issue several memory requests at the same time. In a multicore processor, each core has an upper limit on the number of outstanding memory requests, which is reported to be 10 on recent Intel processors. In this sense, we would like to say that the level of memory-level parallelism of an Intel processor is 10.

To my knowledge, there is no portable tool to measure memory-level parallelism, so I took fifteen minutes to throw together a C++ program. The idea is simple: we visit N random locations in a big array. We make sure that the processor cannot tell which location we will visit next before the previous location has been visited. There is a data dependency between memory accesses. We can break this memory dependency by dividing up the task between different “lanes”. Each lane is independent (a bit like a thread). The total number of data accesses is fixed. Up to some point, having more lanes should speed things up due to memory-level parallelism. I used the term “lane” so that there is no confusion with “threads” and multicore processing: my code is entirely single-threaded.

  size_t howmanyhits_perlane = howmanyhits / howmanylanes;
  for (size_t counter = 0; counter < howmanyhits_perlane; counter++) {
    for (size_t i = 0; i < howmanylanes; i++) {
      // the next index depends on the running sum: accesses within a lane
      // form a dependency chain, but the lanes are independent of each other
      size_t laneindexes = hash(lanesums[i] + i);
      lanesums[i] += bigarray[laneindexes];
    }
  }

Methodologically, I increase the number of lanes until adding one more benefits the overall speed by less than 5%. Why 5%? No particular reason: I needed a threshold of some kind. I suspect that I slightly underestimate the maximal amount of memory-level parallelism: it would take a finer analysis to make a more precise measure.

I run the test three times and check that it gives the same integer value all three times. Here are my (preliminary) results:

Intel Haswell: 7
Intel Skylake: 9
ARM Cortex A57: 5

My code is available.

On a multicore system, there is more memory-level parallelism, so a multithreaded version of this test could deliver higher numbers.

Credit: The general idea was inspired by an email from Travis Downs, though I take all of the blame for how crude the implementation is.

Science and Technology links (November 3rd, 2018)

  1. Bitcoin, the cryptocurrency, could greatly accelerate climate change, should it succeed beyond its current speculative state.
  2. Crows can solve novel problems very quickly with tools they have never seen before.
  3. The new video game Red Dead Redemption 2 made $725 million in three days.
  4. Tesla, the electric car company, is outselling Mercedes Benz and BMW while making a profit.
  5. Three paralyzed men are able to walk again thanks to spinal implants (source: New York Times). There are nice pictures.
  6. Human beings live longer today than ever. In the developed world, between 1960 and 2010, life expectancy at birth went up by nearly 20 years. It consistently goes up by about 0.12 years per year. However, it is not yet clear how aging and death have evolved over time. Some believe that there is a “compression” effect: more and more of us reach a maximum, and then we suddenly all die at around the same age. This would be consistent with a hard limit on human lifespan and I think it is the scenario most biologists would expect. There is also the opposite model: while most of us die at around the same age, some lucky ones survive much longer. According to Zuo et al. (PNAS) both models are incorrect statistically. Instead, the curve is advancing as a wave front. This means that as far as death is concerned, being 68 today is much like being 65 a generation ago. This is surprising.

    (…) we find no support for an approaching limit to human lifespan. Nor do our results suggest that endowments, biological or other, are a principal determinant of old-age survival.

    Assuming that Zuo et al. are correct, I do not think we have a biological model at the ready to explain this statistical phenomenon.

  7. Suppose that you gave a cocktail of drugs approved for human consumption to worms. By how much do you think you could extend their lifespan? The answer is at least by a factor of two. They tried their best cocktails with fruit flies and showed benefits there as well. It is much harder to manipulate the lifespan of large mammals like human beings, but these results support the theory that drug cocktails could increase human lifespans. They may already be doing so.
  8. Amazon is hiring fewer workers, maybe because it is getting better at automation (speculative). It seems that Amazon is mostly denying the story, hinting that it is still creating more and more jobs.
  9. No primate, except for human beings, undergoes menopause. Very few animals have menopause: primarily some whales and human beings. I don’t think we know why menopause evolved.
  10. Total direct greenhouse gas emissions from U.S. livestock have declined 11.3 percent since 1961, while production of livestock meat has more than doubled.
  11. Male and female animals respond very differently to anti-aging strategies and they age very differently:

    One particularly odd thing in humans is that though women live longer, they are nonetheless more prone to miserable but non-deadly ailments such as arthritis (…) Lethal illnesses such as heart disease and cancer strike men more often. Although Alzheimer’s strikes women more than men, for unknown reasons.

    We do not know why there is such a sharp difference between males and females regarding health and longevity. However, some believe that the current historical fact that women live many years longer than men is due to antibiotics having disproportionately helped the health of women.

  12. Vegans more frequently suffer from bone fractures.
  13. Teaching by presenting worked examples seems to be most efficient. Students get the best grades with the least work. This appears self-evident to me. It is curious why worked examples are not more prevalent in teaching.
  14. A company called Grifols claims to have a drug that can measurably slow down the progression of Alzheimer’s. For context, we currently have no therapy to slow or reverse Alzheimer’s, so even a small positive effect would be a tremendous breakthrough. However, there have been many, many false reports regarding Alzheimer’s, and this one appears quite preliminary.

Science and Technology links (October 28th, 2018)

  1. If you take kids born in the 1980s, who do you think did better, the rich kids or the poor kids? The answer might surprise you:

    The children from the poorest families ended up twice as well-off as their parents when they became adults. The children from the poorest families had the largest absolute gains as well. Children raised in the top quintile did no better or worse than their parents once those children became adults.

  2. Some of our cells become senescent: they are dysfunctional and create trouble. We believe that they contribute to age-related diseases. Fisetin is a drug (available as a supplement) that kills senescent cells and extends (median and maximal) lifespan in mice. I do not recommend taking fisetin at this time, unless you are a mouse.
  3. Vegetarians report lower self-esteem, lower psychological adjustment, less meaning in life, and more negative moods. I have no idea what to make of this apparently robust finding. I was a vegetarian in my 20s and I was also subject to depression, but I would never have thought that my depression was due to not eating meat.
  4. The sea rises at a rate of 3 mm per year. It has been rising for thousands of years. Taking into account the acceleration that we anticipate due to climate change, we can expect the sea to have risen by 65 cm by 2100. Does that mean that islands will go under? Maybe not: in one study, only 14% of islands exhibited a reduction in area whereas 43% increased in size.
  5. Most processors today, outside the tiny embedded ones, use a 64-bit architecture, which means that they can process data in chunks of 64 bits very quickly. This has all sorts of benefits. A 32-bit processor, for example, has trouble counting to 5 billion. It is difficult, if not impossible, for a 32-bit software application to use more than 4GB of memory. Microsoft still publishes Windows in two editions, the 32-bit edition and the 64-bit edition. The purpose of the 32-bit edition is to support legacy applications. The two major graphics card makers (AMD and NVIDIA) have now stopped producing drivers for 32-bit operating systems. Thus, at least as far as gaming is concerned, 32-bit Windows is dying. Microsoft has promoted 64-bit Windows by default on new computers since at least 2009.
  6. It seems that 70% of American soldiers are “overweight”. I find it hard to believe that 60% of all American marines are overweight. Because this was determined using the body-mass-index approach, it is also possible that American soldiers are simply very muscular. Yet another statistic tells us that nearly 40% of all soldiers have a chronic medical condition and 8.6% take sleeping pills. So maybe American soldiers are not as fit as I would expect.
  7. It is often believed that men who have more testosterone have an easier time building muscle mass. It turns out that this is false: the amount of testosterone is not relevant in healthy young men.
  8. In the USA, health care costs are predicted to continue to grow at a rate of over 4%. The economy as a whole is predicted to grow at a rate between 1.4% and 2% a year over the long term. The net result is a gap of about 2% a year. If sustained over many decades, this gap would lead to the bulk of the American economy being invested in health spending. People who are 65 years old or older account for a third of all health spending, while young females (19 to 44) spend twice as much as their male counterparts.
  9. Cheese and yogurt consumption is correlated with fewer cardiovascular diseases.
  10. The Haruhi Problem seeks the smallest string containing all permutations of a set of n elements. The first known solution to this problem was published anonymously on an anime posting board. A formal analysis is being written up.
  11. Cardiorespiratory fitness is associated with longevity:

    In this cohort study of 122 007 consecutive patients undergoing exercise treadmill testing, cardiorespiratory fitness was inversely associated with all-cause mortality without an observed upper limit of benefit. Extreme cardiorespiratory fitness (≥2 SDs above the mean for age and sex) was associated with the lowest risk-adjusted all-cause mortality compared with all other performance groups.

Is WebAssembly faster than JavaScript?

Most programs running on web sites are written in JavaScript. There are still a few Java applets and other plugins hanging around, but they are considered obsolete at this point.

While JavaScript is superbly fast, some people feel that we ought to do better. That’s where WebAssembly comes in. It is a binary (“pre-compiled”) format that is made to load quickly. It still needs to get compiled or interpreted, but, at least, you do not need to parse JavaScript source code.

The general idea is that you write your code in C, C++ or Rust, then you compile it to WebAssembly. In this manner, you can port existing C or C++ programs so that they run on Web pages. That’s obviously useful if you already have the C and C++ code, but less appealing if you are starting a new project from scratch. It is far easier to find JavaScript front-end developers in almost any industry, except maybe gaming.

I think it is almost surely going to be more labor intensive to program web applications using WebAssembly.

In any case, I like speed, so I was interested and asked a student of mine (M. Fall) to work on the problem. We picked small problems with hand-crafted code in C and JavaScript.

Here are the preliminary conclusions:

  1. In all cases we considered, the total WebAssembly files were larger than the corresponding JavaScript source code, even without taking into account that the JavaScript source code can be served in compressed form. This means that if you are on a slow network connection, JavaScript programs will start faster.

    The story may change if you build large projects. Moreover, we compared against human-written JavaScript, and not automatically generated JavaScript.

  2. Once the WebAssembly files are in the cache of the browser, they load faster than the corresponding JavaScript source code, but the difference is small. Thus if you are frequently using the same application, or if the web application resides on your machine, WebAssembly will start faster. However, the gain is small. One reason why the gain is small is that JavaScript loads and starts very quickly.
  3. WebAssembly (compiled with full optimization) is often slower than JavaScript during execution, and when WebAssembly is faster, the gain is small. Browser support is also problematic: while Firefox and Chrome have relatively fast WebAssembly execution (with Firefox being better), we found Microsoft Edge to be quite terrible. WebAssembly on Edge is really slow.

    Our preliminary results contradict several reports, so you should take them with a grain of salt. However, benchmarking is ridiculously hard especially when a language like JavaScript is involved. Thus anyone reporting systematically better results with WebAssembly should look into how well optimized the JavaScript really is.

While WebAssembly might be a compelling platform if you have a C++ game you need to port to the Web, I would bet good money that WebAssembly is not about to replace JavaScript for most tasks. Simply put, JavaScript is fast and convenient. It is going to be quite difficult to do better in the short run.

WebAssembly still deserves attention since its uptake has been fantastic. For online games, it surely has a bright future.

More content: WebAssembly and the Death of JavaScript (video) by Colin Eberhardt

Further reading: Egorov’s Maybe you don’t need Rust and WASM to speed up your JS; Haas et al., Bringing the Web up to Speed with WebAssembly; Herrera et al., WebAssembly and JavaScript Challenge: Numerical program performance using modern browser technologies and devices.

Science and Technology links (October 20th, 2018)

  1. Should we stop eating meat to combat climate change? Maybe not. White and Hall worked out what would happen if the US stopped using farm animals:

    The modeled system without animals (…) only reduced total US greenhouse gas emissions by 2.6 percentage units. Compared with systems with animals, diets formulated for the US population in the plants-only systems (…) resulted in a greater number of deficiencies in essential nutrients. (source: PNAS)

    Of concern when considering farm animals are methane emissions. Methane is a potent greenhouse gas, with the caveat that it is short-lived in the atmosphere unlike CO2. Should we be worried about methane despite its short life? According to the American EPA (Environmental Protection Agency), total methane emissions have been falling consistently for the last 20 years. That should not surprise us: greenhouse gas emissions in most developed countries (including the US) have peaked some time ago. Not emissions per capita, but total emissions.

    So beef, at least in the US, is not a major contributor to climate change. But we could do even better. Several studies like Stanley et al. report that well managed grazing can lead to carbon sequestration in the grassland. Farming in general could be more environmentally effective.

    Of course, if people consume less they will have a smaller environmental footprint, but going vegan does not imply that one consumes less. If you save in meat but reinvest in exotic fruits and trips to foreign locations, you could keep your environmental footprint the same.

    There are certainly countries where animal grazing is an environmental disaster. Many industries throughout the world are a disaster and we should definitely put pressure on the guilty parties. But, in case you were wondering, if you live in a country like Canada, McDonald’s not only serves locally produced beef, but it also requires that the beef be produced in a sustainable manner.

    In any case, there are good reasons to stop eating meat, but in the developed countries like the US and Canada, climate change seems like a bogus one.

    There are also good reasons to keep farm animals. For example, it is difficult to raise an infant without cow milk, and in most countries it is illegal to sell human milk. Several parents have effectively killed their children by trying to raise them vegan (1, 2). It is relatively easy to match protein and calories with a vegan diet, but meat and milk are nutrient-dense foods: it requires some expertise to do away with them.

    Further reading: No, giving up burgers won’t actually save the planet (New York Post).

    (Special thanks to professor Leroy for providing many useful pointers.)

  2. News agencies reported this week that climate change could bring back the plague and the black death that wiped out Europe. The widely reported prediction was made by Professor Peter Frankopan while at the Cheltenham Literary Festival. Frankopan is a history professor at Oxford.
  3. There is an inverse correlation between funding and scientific output, meaning that beyond a certain point, you start getting less science for your dollars.

    (…) prestigious institutions had on average 65% higher grant application success rates and 50% larger award sizes, whereas less-prestigious institutions produced 65% more publications and had a 35% higher citation impact per dollar of funding. These findings suggest that implicit biases and social prestige mechanisms (…) have a powerful impact on where (…) grant dollars go and the net return on taxpayers investments.

    It is well documented that there are diminishing returns in research funding. Concentrating your research dollars into too few individuals is wasteful. My own explanation for this phenomenon is that, Elon Musk aside, we all have cognitive bottlenecks. One researcher might fruitfully carry two or three major projects at the same time, but once they supervise too many students and assistants, they become a “negative manager”, meaning that they make other researchers no more productive and often less productive. They spend less and less time optimizing the tools and instruments.

    If you talk with graduate students who work in lavishly funded laboratories, you will often hear (when the door is closed) about how poorly managed the projects are. People are forced into stupid directions, they do boring and useless work to satisfy project objectives that no longer make sense. Currently, “success” is often defined by how quickly you can acquire and spend money.

    But how do you optimally distribute research dollars? It is tricky because, almost by definition, almost all research is worthless. You are mining for rare events. So it is akin to venture capital investing: you want to invest in many start-ups that have high potential.

  4. A Nature column tries to define what makes a good PhD student:

    the key attributes needed to produce a worthy PhD thesis are a readiness to accept failure; resilience; persistence; the ability to troubleshoot; dedication; independence; and a willingness to commit to very hard work — together with curiosity and a passion for research. The two most common causes of hardship in PhD students are an inability to accept failure and choosing this career path for the prestige, rather than out of any real interest in research.

Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)

When receiving bytes from the network, we often assume that they are Unicode strings, encoded using something called UTF-8. Sadly, not all streams of bytes are valid UTF-8. So we need to check the strings. It is probably a good idea to optimize this check as much as possible.

In earlier work, we showed that you could validate a string using as little as 0.7 cycles per byte, using commonly available 128-bit SIMD registers (in C). SIMD stands for Single Instruction, Multiple Data; it is a way to parallelize the processing on a single core.

What if we use 256-bit registers instead?

Reference naive function: 10 cycles per byte
fast SIMD version (128-bit): 0.7 cycles per byte
new SIMD version (256-bit): 0.45 cycles per byte

That’s good, almost twice as fast.

A common scenario is that your input is made entirely of ASCII characters. It is much faster to check that a string is made of ASCII characters than to check that it is made of valid UTF-8 characters. Indeed, to check that a string is made of ASCII characters, you only have to check that one bit per byte (the most significant bit) is zero, since ASCII uses only 7 bits per byte.

It turns out that only about 0.05 cycles per byte are needed to check that a string is made of ASCII characters. Maybe up to 0.08 cycles per byte. That makes us look bad.

You could start checking the file for ASCII characters and then switch to our function when non-ASCII characters are found, but this has a problem: what if the string starts with a non-ASCII character followed by a long stream of ASCII characters?

A quick solution is to add an ASCII path. Each time we read a block of 32 bytes, we check whether it is made of 32 ASCII characters, and if so, we take a different (fast) path. Thus if it happens frequently that we have long streams of ASCII characters, we will be quite fast.

The new numbers are quite appealing when running benchmarks on ASCII characters:

new SIMD version (256-bit): 0.45 cycles per byte
new SIMD version (256-bit), w. ASCII path: 0.088 cycles per byte
ASCII check (SIMD + 256-bit): 0.051 cycles per byte

My code is available.