Is WebAssembly faster than JavaScript?

Most programs running on web sites are written in JavaScript. There are still a few Java applets and other plugins hanging around, but they are considered obsolete at this point.

While JavaScript is superbly fast, some people feel that we ought to do better. That’s where WebAssembly comes in. It is a binary (“pre-compiled”) format that is made to load quickly. It still needs to get compiled or interpreted, but, at least, you do not need to parse JavaScript source code.

The general idea is that you write your code in C, C++ or Rust, then you compile it to WebAssembly. In this manner, you can port existing C or C++ programs so that they run on Web pages. That’s obviously useful if you already have the C and C++ code, but less appealing if you are starting a new project from scratch. It is far easier to find JavaScript front-end developers in almost any industry, except maybe gaming.
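
For illustration, here is a minimal sketch of this pipeline, assuming the Emscripten toolchain (emcc); the function and file names here are mine, not from any particular project:

// compile with something like: emcc add.c -O3 -s WASM=1 -o add.js
// (this emits add.wasm along with JavaScript glue code to load it)
#include <emscripten.h>

// EMSCRIPTEN_KEEPALIVE prevents the function from being stripped,
// so it remains callable from the JavaScript side
EMSCRIPTEN_KEEPALIVE
int add(int x, int y) {
  return x + y;
}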

I think it is almost surely going to be more labor intensive to program web applications using WebAssembly.

In any case, I like speed, so I asked a student of mine (M. Fall) to work on the problem. We picked small problems with hand-crafted code in C and JavaScript.

Here are the preliminary conclusions:

  1. In all cases we considered, the total WebAssembly files were larger than the corresponding JavaScript source code, even without taking into account that the JavaScript source code can be served in compressed form. This means that if you are on a slow network connection, JavaScript programs will start faster.

    The story may change if you build large projects. Moreover, we compared against human-written JavaScript, and not automatically generated JavaScript.

  2. Once the WebAssembly files are in the cache of the browser, they load faster than the corresponding JavaScript source code, but the difference is small. Thus if you frequently use the same application, or if the web application resides on your machine, WebAssembly will start faster. However, the gain is small, in part because JavaScript already loads and starts very quickly.
  3. WebAssembly (compiled with full optimization) is slower than JavaScript during execution. Browser support is also problematic: while Firefox and Chrome have relatively fast WebAssembly execution (with Firefox being better), we found Microsoft Edge to be quite terrible. WebAssembly on Edge is really slow.

    Our preliminary results contradict several reports, so you should take them with a grain of salt. However, benchmarking is ridiculously hard especially when a language like JavaScript is involved. Thus anyone reporting systematically better results with WebAssembly should look into how well optimized the JavaScript really is.

While WebAssembly might be a compelling platform if you have a C++ game you need to port to the Web, I would bet good money that WebAssembly is not about to replace JavaScript for most tasks. Simply put, JavaScript is fast and convenient. It is going to be quite difficult to do better in the short run.

WebAssembly still deserves attention since its uptake has been fantastic. For online games, it surely has a bright future.

More content: WebAssembly and the Death of JavaScript (video) by Colin Eberhardt

Further reading: Haas et al., Bringing the Web up to Speed with WebAssembly; Herrera et al., WebAssembly and JavaScript Challenge: Numerical program performance using modern browser technologies and devices.

Science and Technology links (October 20th, 2018)

  1. Should we stop eating meat to combat climate change? Maybe not. White and Hall worked out what would happen if the US stopped using farm animals:

    The modeled system without animals (…) only reduced total US greenhouse gas emissions by 2.6 percentage units. Compared with systems with animals, diets formulated for the US population in the plants-only systems (…) resulted in a greater number of deficiencies in essential nutrients. (source: PNAS)

    Of concern when considering farm animals are methane emissions. Methane is a potent greenhouse gas, with the caveat that it is short-lived in the atmosphere, unlike CO2. Should we be worried about methane despite its short life? According to the American EPA (Environmental Protection Agency), total methane emissions have been falling consistently for the last 20 years. That should not surprise us: greenhouse gas emissions in most developed countries (including the US) peaked some time ago. Not emissions per capita, but total emissions.

    So beef, at least in the US, is not a major contributor to climate change. But we could do even better. Several studies like Stanley et al. report that well managed grazing can lead to carbon sequestration in the grassland. Farming in general could be more environmentally effective.

    Of course, if people consume less they will have a smaller environmental footprint, but going vegan does not imply that one consumes less. If you save in meat but reinvest in exotic fruits and trips to foreign locations, you could keep your environmental footprint the same.

    There are certainly countries where animal grazing is an environmental disaster. Many industries throughout the world are a disaster and we should definitely put pressure on the guilty parties. But, in case you were wondering, if you live in a country like Canada, McDonald’s not only serves locally produced beef, it also requires that the beef be produced in a sustainable manner.

    In any case, there are good reasons to stop eating meat, but in developed countries like the US and Canada, climate change seems like a bogus one.

    There are also good reasons to keep farm animals. For example, it is difficult to raise an infant without cow milk, and in most countries it is illegal to sell human milk. Several parents have effectively killed their children by trying to raise them vegan (1, 2). It is relatively easy to match protein and calories with a vegan diet, but meat and milk are nutrient-dense foods: it requires some expertise to do away with them.

    Further reading: No, giving up burgers won’t actually save the planet (New York Post).

    (Special thanks to professor Leroy for providing many useful pointers.)

  2. News agencies reported this week that climate change could bring back the plague and the black death that wiped out Europe. The widely reported prediction was made by Professor Peter Frankopan while at the Cheltenham Literary Festival. Frankopan is a history professor at Oxford.
  3. There is an inverse correlation between funding and scientific output: beyond a certain point, you start getting less science for your dollars.

    (…) prestigious institutions had on average 65% higher grant application success rates and 50% larger award sizes, whereas less-prestigious institutions produced 65% more publications and had a 35% higher citation impact per dollar of funding. These findings suggest that implicit biases and social prestige mechanisms (…) have a powerful impact on where (…) grant dollars go and the net return on taxpayers investments.

    It is well documented that there are diminishing returns in research funding. Concentrating your research dollars in too few individuals is wasteful. My own explanation for this phenomenon is that, Elon Musk aside, we all have cognitive bottlenecks. One researcher might fruitfully carry two or three major projects at the same time, but once they supervise too many students and assistants, they become a “negative manager”, meaning that they make other researchers no more productive and often less productive. They spend less and less time optimizing the tools and instruments.

    If you talk with graduate students who work in lavishly funded laboratories, you will often hear (when the door is closed) about how poorly managed the projects are. People are forced into stupid directions; they do boring and useless work to satisfy project objectives that no longer make sense. Currently, “success” is often defined by how quickly you can acquire and spend money.

    But how do you optimally distribute research dollars? It is tricky because, almost by definition, almost all research is worthless. You are mining for rare events. So it is akin to venture capital investing. You want to invest in many start-ups that have high potential.

  4. A Nature column tries to define what makes a good PhD student:

    the key attributes needed to produce a worthy PhD thesis are a readiness to accept failure; resilience; persistence; the ability to troubleshoot; dedication; independence; and a willingness to commit to very hard work — together with curiosity and a passion for research. The two most common causes of hardship in PhD students are an inability to accept failure and choosing this career path for the prestige, rather than out of any real interest in research.

Validating UTF-8 bytes using only 0.45 cycles per byte (AVX edition)

When receiving bytes from the network, we often assume that they are Unicode strings, encoded using something called UTF-8. Sadly, not all streams of bytes are valid UTF-8. So we need to check the strings. It is probably a good idea to optimize this check as much as possible.

In earlier work, we showed that you could validate a string using as little as 0.7 cycles per byte, using commonly available 128-bit SIMD registers (in C). SIMD stands for Single-Instruction-Multiple-Data; it is a way to parallelize the processing on a single core.

What if we use 256-bit registers instead?

| function | speed |
|---|---|
| Reference naive function | 10 cycles per byte |
| fast SIMD version (128-bit) | 0.7 cycles per byte |
| new SIMD version (256-bit) | 0.45 cycles per byte |

That’s good, almost twice as fast.

A common scenario is that the input is made of ASCII characters. It is much faster to check that a string is made of ASCII characters than to check that it is made of valid UTF-8 characters. Indeed, to check that a string is made of ASCII characters, you only have to check that one bit per byte is zero (since ASCII uses only 7 bits per byte).
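
For illustration, here is a minimal sketch of such an ASCII test over eight bytes at a time, using a plain 64-bit word (the function name is mine):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// eight characters are all ASCII exactly when the most significant
// bit of every byte is zero
bool is_ascii_eight_bytes(const unsigned char *chars) {
  uint64_t val;
  memcpy(&val, chars, 8);
  return (val & 0x8080808080808080ULL) == 0;
}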

It turns out that only about 0.05 cycles per byte are needed to check that a string is made of ASCII characters. Maybe up to 0.08 cycles. That makes us look bad.

You could start checking the file for ASCII characters and then switch to our function when non-ASCII characters are found, but this has a problem: what if the string starts with a non-ASCII character followed by a long stream of ASCII characters?

A quick solution is to add an ASCII path. Each time we read a block of 32 bytes, we check whether it is made of 32 ASCII characters, and if so, we take a different (fast) path. Thus if it happens frequently that we have long streams of ASCII characters, we will be quite fast.
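
Here is a minimal sketch of what such a 32-byte ASCII test could look like with AVX2 intrinsics (the function name is mine and the actual code may differ): the sign bit of each byte is set exactly when the byte is non-ASCII, so a movemask of zero means the whole block is ASCII.

#include <immintrin.h>
#include <stdbool.h>
#include <stdint.h>

// check whether a 32-byte block is made entirely of ASCII characters
static bool is_ascii_block(const uint8_t *src) {
  __m256i block = _mm256_loadu_si256((const __m256i *)src);
  // movemask collects the most significant bit of each byte
  return _mm256_movemask_epi8(block) == 0;
}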

The new numbers are quite appealing when running benchmarks on ASCII characters:

| version | speed |
|---|---|
| new SIMD version (256-bit) | 0.45 cycles per byte |
| new SIMD version (256-bit), w. ASCII path | 0.088 cycles per byte |
| ASCII check (SIMD + 256-bit) | 0.051 cycles per byte |

My code is available.

Validating UTF-8 bytes (Java edition)

Strings are just made of bytes. We send and receive bytes over the network all the time. If you know that the bytes you are receiving form a string, then chances are good that it is encoded as UTF-8. Sadly not all streams of bytes can be valid UTF-8 strings. Thus you should check that your bytes can be safely parsed as strings.

In earlier work, I showed that you needed as little as 0.7 cycles per byte to do just that validation. That was in C using fancy instructions.

What about Java?

Designing a good benchmark is difficult. I keep things simple. I generate a 1002-byte UTF-8 string made of random (non-ASCII) characters. Then I check how quickly different functions can validate it.

Here are the contenders:

  1. You can use the standard Java API to entirely decode the bytes, and report false when there is an error:

    CharsetDecoder decoder =
        StandardCharsets.UTF_8.newDecoder();
    try {
      decoder.decode(ByteBuffer.wrap(mydata));
    } catch (CharacterCodingException ex) {
      return false;
    }
    return true;
    
  2. You can try a branchless finite-state-machine approach:

    boolean isUTF8(byte[] b) {
        int length = b.length;
        int s = 0; // state of the finite-state machine (0 is the accepting state)
        for (int i = 0; i < length; i++) {
            // map the byte to its character class, then apply the state transition
            s = utf8d[256 + s + (utf8d[b[i] & 0xFF] & 0xFF)];
        }
        return s == 0;
    }
    

    … where utf8d is some array where the finite-state machine is coded.

  3. Finally, you can use the isWellFormed function from the Guava library. It simply tries to find the first non-ASCII character and then engages in a straightforward series of if/then/else comparisons.

Here are the timings in nanoseconds per 1002-byte string. I estimate that my processor runs at about 3.4 GHz on average during the test (verified with perf).

| function | time | speed |
|---|---|---|
| Java API | 2000 ns | 6.7 cycles per byte |
| branchless | 2200 ns | 7.5 cycles per byte |
| Guava’s if/then/else | 750 ns | 2.6 cycles per byte |

My code is available.

The most obvious limitation of my benchmark is that Guava’s if/then/else approach is sensitive to branch mispredictions, while my benchmark might not be rich enough to trigger difficult-to-predict branches.

Nobel-prize winner Romer on innovation and higher education

Romer was one of the winners of the Nobel prize in economics this year (2018). He wrote about higher education and innovation. One of his proposals is the introduction of more generous scholarships for students in science and engineering.

His starting point is that to innovate and get richer, we need more and more people doing R&D. Investing more in R&D is not sufficient. You might think that, in the long run, it could be. If firms spend more on R&D, then the wages of scientists and engineers will go up, and this will attract more scientists. However, students lack this information.

On the one hand, giving money to undergraduate students who want to pursue science and engineering is a good way to signal to these students that society wants more scientists and engineers. At the graduate level, though the current funding might be considered adequate, it is very much in the hands of (older) professors who can be quite directive. The general intuition, I believe, is that Romer wants to give back to young and bright students some freedom and power in deciding what they want to do. In particular, I think Romer believes that some of these students might be interested in doing work that is less “conventional” (i.e., maybe harder to publish in conventional venues) but more “useful”. In effect, he wants to fight against stagnation in higher education.

In his own words:

Unfortunately, in the last 20 years, innovation policy in the United States has almost entirely ignored the structure of our institutions of higher education. As a result, government programs that were intended to speed up the rate of technological progress may in fact have had little positive effect.

This pattern of outcomes, increased numbers of Ph.D. recipients and steadily worsening academic job prospects, can be explained by increased subsidies for Ph.D. training.

The picture that emerges from this evidence is one dominated by undergraduate institutions that are a critical bottleneck in the training of scientists and engineers, and by graduate schools that produce people trained only for employment in academic institutions (…)

I am not exactly sure why all student unions are not pushing his ideas to governments. It sure sounds attractive: give engineering students more money and get better growth.

I love universities. I have spent almost all my life in them. But, like Romer, I fear that there is a bit too much stagnation. There is insufficient pressure to innovate and too many incentives favouring excess conservatism. Giving back power, not to younger professors, but to actual students, is probably a wise move.

Yet I am somewhat skeptical of Romer’s overall view that increasing the supply of engineers is key. A great many students trained in engineering do nothing even vaguely resembling R&D, and when they do, they do not do it for long. Simply put, graduates go where the jobs and the money are. It is not the case that we live in a Spiderman world where smart engineering students graduate to go work in an industrial lab developing new robotics.

We need to make the Spiderman world possible: make it so that young engineers can start a small company building prototypical exoskeletons that help weaker people walk again. Do that, and you won’t have a shortage of young new engineers for long.

Science and Technology links (October 13th, 2018)

  1. Chronic exposure to canola oil results in a significant increase in body weight and impairments in working memory… in mice. (Source: Nature)
  2. Belly fat is not just fat in reserve: it signals your body through hormones. Some of these signals might be helpful, but some are not. Visceral fat is strongly associated with prostate cancer and breast cancer. Losing fat in your belly is a good way to reduce your cancer risks.
  3. Computerized Tomography (CT) scans are used to diagnose blood clots, tumours, and fractures. Algorithms based on “deep learning” can accurately identify CT scan abnormalities requiring urgent attention. (Source: Lancet)
  4. Does innovation bring prosperity or does prosperity bring innovation? You would think that it is the former, but it seems that there is a strong case for the latter. Indeed, if you come from a prosperous society, you will be more “future-oriented” as per life history theory. That is, if you are rich, you are more likely to be optimistic and to invest in the future, and thus to contribute new inventions. This seems a tad simplistic to me. While I don’t deny that life history theory might be valid, I think facing difficulties with the right frame of mind is what is most likely to push you to innovate.
  5. Open source is the idea that we can broadly collaborate on building new technology as if it were a public good. It has been repeatedly at odds with conventional industrial models where technology is assumed to be a private good. For example, Microsoft has long openly fought open source as if it were a threat to its business model. Bill Gates, one of Microsoft’s founders, famously compared people who share software freely to thieves. To this day, government agencies have trouble funding open source software while many research grants go toward funding private software projects.

    In the last 20 years, most corporations have changed their stance on open source and come to realize that it is not a threat to corporate profits. I assume governments are not far behind.

    Patents are another problem for open source: patents are granted by the state to keep an invention from being part of a public good. Thus patents (and thus governments) can hinder open source. The proper fix would be to change laws so that patents cannot harm open source projects… but that is difficult to achieve politically.

    Patents can be used in several different ways, but corporations like Microsoft mostly use patents in a defensive manner: when another entity seeks to sue Microsoft for patent infringement, Microsoft can threaten to sue back.

    Open source projects can be outgunned in such a fight. To bring some balance back, several companies have joined forces to create the Open Invention Network, which maintains a large defensive pool of patents. Should a company try to shut down an open source project on the basis of a patent, the Open Invention Network can intervene.

    This week Microsoft joined the Open Invention Network. Other members include Google, IBM, Sony, Toyota.

Smart bracelet: my experience with the Mi Band 3

I was an early adopter of fitness bracelets with heart-rate monitoring. I use them to track my heart rate at rest. Your resting heart rate should be around 65 if you are fit. If it goes much higher than 65, you might be getting unfit or you might have a medical condition. I also like to track the quality of my sleep.

I don’t really need this data right now in the sense that I am in good health. However, it is hard to keep good habits over the long term. I can easily monitor my weight by looking in the mirror, but it is easier to let my physical condition slide. If I ever saw bad numbers, it might nudge me into correcting my habits.

I have never tried the Apple Watch, which I hear is great. However, I am looking for something that I do not need to recharge every day. Also, I prefer to pay less if I can.

I have owned a couple of Fitbit watches. Fitbit bracelets and watches have been a major disappointment. They are expensive, the software is often buggy, the hardware tends to crash periodically, it is not durable, and so forth. Though on paper the Fitbit watches have many features like notifications, they are finicky to set up and often fail mysteriously. Fitbit has good social networking features and it makes good-looking products, but that is all.

When my last Fitbit watch broke apart, I decided that I would never buy a Fitbit product again. Too expensive, too buggy.

I should say that one of my sons as well as my wife swear by their Fitbit bracelets. They seem to have good luck with them. The only difference between me and them that I can see is that I sleep with my bracelets whereas they do not.

So I tried my luck with a Chinese product, the Mi Band. It is $35. It is almost an order of magnitude less expensive than equivalent Fitbit products.

I have been wearing it for about a month now. Here are my thoughts. Firstly, the negatives:

  • When I got it, it was in Chinese with a Chinese-only manual. I panicked slightly. Thankfully the mobile app was available in pretty much all languages. I set up the app and initially had trouble pairing my phone with the bracelet (which remained dark), until I realized that I needed to recharge the bracelet. Then it worked, but the bracelet kept its display in Chinese. I was annoyed. I tried many things until, mysteriously, the mobile app downloaded an update; after syncing with the bracelet, everything has been in English.
  • The Mi Band does not look as nice as Fitbit products. It is slightly ugly. However, I should say that I don’t find the Apple Watch particularly pretty.

    Note that this is my personal taste. Several people have commented that my Mi Band looks good.

  • It is entirely possible that the Communist government of China can access some of my data.

Secondly, the positives:

  • The mobile app has been rock solid. It looks good. It seems less buggy than the Fitbit app which I often needed to restart.
  • Notifications actually work! That is, if someone calls me or if I have a meeting, my bracelet will vibrate. In theory, Fitbit watches do that as well, but I never could get it to work reliably. It really improves my quality of life!
  • The bracelet works, it is responsive. I find it easy to use and configure.
  • Even though the bracelet continuously records my heart rate, the battery life is excellent.
  • Did I mention that it is only $35? I have the Mi Band 3. If the Mi Band 4 comes out next year, I will be eager to purchase it. I would gladly pay $35 a year for a smart bracelet.
  • For a tiny bit of extra money, you can purchase third-party products to change the look of the Mi Band. I bought a dark steel enclosure for $12. It is great.
  • I have not checked the accuracy of the device. However the heart-rate and sleep numbers are quite similar to what I was getting with the Fitbit watches.

Science and Technology links (October 6th, 2018)

  1. Ibuprofen extends lifespan in many species including, maybe, human beings.
  2. Donna Strickland is the third woman to win a Nobel prize in Physics. It appears that she won the prize for one 1985 article. Her scientific publications are available on Google Scholar. She is maybe one more example of the fact that it only takes one paper to make a remarkable contribution.
  3. People who suffer from dementia often had high blood sugar many years before their sickness.
  4. U.S. median household income is setting new records in both nominal and real terms.
  5. The Nobel prize in medicine was awarded to researchers who developed cancer immunotherapy. I am pleased that we are rewarding scientists who helped cure actual people.

Quickly parsing eight digits

In my previous post, I described how we can quickly determine whether eight characters are made of digits (e.g., ‘13223244’). It takes a bit over 2 cycles. Suppose that you want to turn this string into an integer value.

Most programmers would implement it as a simple loop:

uint32_t parse_eight_digits(const unsigned char *chars) {
  uint32_t x = chars[0] - '0';
  for (size_t j = 1; j < 8; j++)
    x = x * 10 + (chars[j] - '0');
  return x;
}

Can you do better?

If you are willing to use SIMD instructions through Intel intrinsics, then Muła has worked out a nice function that looks like this:

uint32_t parse_eight_digits_ssse3(const char *chars) {
  __m128i ascii0 = _mm_set1_epi8('0');
  __m128i mul_1_10 =
      _mm_setr_epi8(10, 1, 10, 1, 10, 1, 10, 1,
                    10, 1, 10, 1, 10, 1, 10, 1);
  __m128i mul_1_100 = _mm_setr_epi16(100, 1, 100, 1,
                                     100, 1, 100, 1);
  const __m128i mul_1_10000 =
      _mm_setr_epi16(10000, 1, 10000, 1, 10000, 1, 10000, 1);
  // convert the ASCII characters to digit values (0 to 9)
  __m128i input = _mm_sub_epi8(
      _mm_loadl_epi64((__m128i *)chars), ascii0);
  // combine pairs of digits: each 16-bit word holds a value in [0,99]
  __m128i t1 = _mm_maddubs_epi16(input, mul_1_10);
  // combine pairs of words: each 32-bit lane holds a value in [0,9999]
  __m128i t2 = _mm_madd_epi16(t1, mul_1_100);
  // pack the 32-bit lanes back into 16-bit words
  __m128i t3 = _mm_packus_epi32(t2, t2);
  // combine pairs of four-digit values into eight-digit values
  __m128i t4 = _mm_madd_epi16(t3, mul_1_10000);
  return _mm_cvtsi128_si32(t4);
}

I am not going to go through this function in detail. The important point to understand about Muła’s function is that it actually converts sixteen digits to integers. In other words, using Muła’s function to parse eight digits, I handicap it by a factor of two, forcing it to waste a lot of effort.

Let me go straight at the speed results. I report throughput: how long it takes to parse an integer from a stream on average.

GCC 8.1 (-O3)

| function | time per integer |
|---|---|
| conventional | 3.9 cycles |
| Muła | 3.0 cycles |

CLANG 6.0 (-O3)

| function | time per integer |
|---|---|
| conventional | 8.4 cycles |
| Muła | 3.2 cycles |

So despite the handicap that Muła’s function has, it is still considerably faster than the conventional code. If I could parse pairs of eight digits instead of individual sets of eight digits, I could go nearly twice as fast.
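
For illustration, here is a hedged sketch of what a sixteen-digit version might look like, reusing the full width of Muła’s technique (untested and unbenchmarked here; note that _mm_extract_epi32 and _mm_packus_epi32 require SSE4.1):

#include <stdint.h>
#include <x86intrin.h>

// same computation as Muła's function, but loading sixteen characters;
// after the final madd, lane 0 holds the first eight digits and
// lane 1 holds the next eight
uint64_t parse_sixteen_digits_sse41(const char *chars) {
  __m128i ascii0 = _mm_set1_epi8('0');
  __m128i mul_1_10 = _mm_setr_epi8(10, 1, 10, 1, 10, 1, 10, 1,
                                   10, 1, 10, 1, 10, 1, 10, 1);
  __m128i mul_1_100 = _mm_setr_epi16(100, 1, 100, 1, 100, 1, 100, 1);
  __m128i mul_1_10000 =
      _mm_setr_epi16(10000, 1, 10000, 1, 10000, 1, 10000, 1);
  __m128i input = _mm_sub_epi8(
      _mm_loadu_si128((const __m128i *)chars), ascii0);
  __m128i t1 = _mm_maddubs_epi16(input, mul_1_10);
  __m128i t2 = _mm_madd_epi16(t1, mul_1_100);
  __m128i t3 = _mm_packus_epi32(t2, t2);
  __m128i t4 = _mm_madd_epi16(t3, mul_1_10000);
  uint64_t firsteight = (uint32_t)_mm_cvtsi128_si32(t4);
  uint64_t lasteight = (uint32_t)_mm_extract_epi32(t4, 1);
  return firsteight * 100000000ULL + lasteight;
}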

I do not benchmark the latency. The latency story could be less positive for Muła’s function, which relies on instructions that have high throughput but also high latency.

Some of you might be thinking that there is surely a way to go faster, or, at least, to get rid of the need for Intel intrinsic functions. I tried to pull the same SIMD-within-a-register (SWAR) trick as in my previous post, but the best I could do was only barely faster than the conventional approach. To get it down to Muła’s speed, I would need to shave another two cycles from it. I think it is possible. Here is the best I have so far using the SWAR approach:

uint32_t parse_eight_digits_swar(const char *chars) {
  uint64_t val;
  memcpy(&val, chars, 8);
  // convert the ASCII characters to digit values (0 to 9)
  val = val - 0x3030303030303030;
  // combine pairs of digits: each 16-bit word now holds a value in [0,99]
  uint64_t byte10plus   = ((val
      * (1 + (0xa  <<  8))) >>  8)
      & 0x00FF00FF00FF00FF;
  // combine pairs of two-digit values: the 32-bit lanes hold values in [0,9999]
  uint64_t short100plus = ((byte10plus
       * (1 + (0x64 << 16))) >> 16)
       & 0x0000FFFF0000FFFF;
  // combine the two four-digit values; the answer lands in the top 32 bits
  short100plus *= (1 + (10000ULL << 32));
  return short100plus >> 32;
}

In a nutshell, I repeat a sequence (multiplication, shift, mask) three times to aggregate the eight digits. To process 2^N digits, I would need N such steps. That is, I have a logarithmic-time algorithm that is limited by the size of my registers. It is good, but the fact that each step depends on the completion of the previous one is less good: it means that the latency is relatively high.

My code is available.

Credit: My SWAR function is an extension to eight digits of related functions by Muła on shorter sequences of digits. It was improved by Travis Downs who pointed out that we do not need to swap the bytes.

Quickly identifying a sequence of digits in a string of characters

Suppose that you want to quickly determine whether a sequence of eight characters is made of digits (e.g., ‘94343241’). How fast can you go?

In software, characters are mapped to integer values called code points. The ASCII and UTF-8 code points for the digits 0, 1, …, 9 are the consecutive integers 0x30, 0x31, …, 0x39 in hexadecimal notation.

Thus you can check whether a character is a digit by comparing with 0x30 and 0x39: ((c < 0x30) || (c > 0x39)). It is even cheaper than it looks because optimizing compilers simply take the code point, subtract 0x30 from it and compare the result with 9 as an unsigned value. So there is a single comparison!
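
Written out explicitly, the transformation that the compiler applies looks like this:

#include <stdbool.h>

// subtract 0x30 ('0'); if c was below 0x30, the subtraction wraps
// around to a large value, so a single unsigned comparison suffices
bool is_digit(unsigned char c) {
  return (unsigned char)(c - 0x30) <= 9;
}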

Given a stream of characters, the conventional approach in C or C-like languages is to loop over the sequence of characters and check that each one is a digit.

bool is_made_of_eight_digits(const unsigned char *chars) {
  for (size_t j = 0; j < 8; j++) {
    if ((chars[j] < 0x30) || (chars[j] > 0x39)) {
      return false;
    }
  }
  return true;
}

Can we do better? Instead of doing eight comparisons (one per character), we would like to do only one or two. For this, we can use SIMD within a register (SWAR): load the eight characters into a 64-bit integer and do some operations on the resulting integer.

Here is a simple “branchless” approach first… it does a single comparison:

bool is_made_of_eight_digits_branchless(const unsigned char *chars) {
  uint64_t val;
  memcpy(&val, chars, 8);
  return (val & (val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0)
         == 0x3030303030303030;
}

(Credit: Travis Downs simplified and improved the version I had.)

In some instances, it might be faster to do two comparisons instead of one, especially if the results are predictable.

bool is_made_of_eight_digits_branchy(const unsigned char *chars) {
  uint64_t val;
  memcpy(&val, chars, 8);
  return ((val & 0xF0F0F0F0F0F0F0F0) == 0x3030303030303030)
      && (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0)
          == 0x3030303030303030);
}

How do they work? They first check that the most significant 4 bits of each character are the value 0x3. Once this is done, you know that the character values must range from 0x30 to 0x3F, but you want to exclude the values from 0x3a to 0x3f. If you add 0x06 to the integers from 0x30 to 0x39, you get the integers from 0x36 to 0x3f, but adding 0x06 to 0x3a gives you 0x40. So you can add 0x06 to each byte and check again that the most significant 4 bits of each character are the value 0x3.

It is crazily hard to benchmark such routines because their performance is highly sensitive to the input data. You really want to benchmark them on your actual data. And compilers matter a lot. Still, we can throw some synthetic data at them and see how well they fare (on a Skylake processor).

| compiler | conventional | SWAR (1 comparison) | SWAR (2 comparisons) |
|---|---|---|---|
| gcc 8 (-O2) | 11.4 cycles | 3.1 cycles | 2.5 cycles |
| gcc 8 (-O3) | 5.2 cycles | 3.1 cycles | 3 cycles |
| clang 6 (-O2) | 5.3 cycles | 2.4 cycles | 2.2 cycles |
| clang 6 (-O3) | 5.3 cycles | 2.4 cycles | 2.1 cycles |

The table reports the average time (throughput) to check that a sequence of eight characters is made of digits. In my tests, the branchless approach is not the fastest. Yet it might be a good bet in practice because it is going to run at the same speed, irrespective of the data.

Let us now consider less regular data where the processor cannot easily guess the results of the comparisons:

| compiler | conventional | SWAR (1 comparison) | SWAR (2 comparisons) |
|---|---|---|---|
| gcc 8 (-O2) | 12.1 cycles | 3.1 cycles | 5.2 cycles |
| gcc 8 (-O3) | 7.2 cycles | 3.1 cycles | 5.1 cycles |
| clang 6 (-O2) | 7.3 cycles | 2.4 cycles | 4.3 cycles |
| clang 6 (-O3) | 7.1 cycles | 2.4 cycles | 4.5 cycles |

My source code is available.

Further reading. After working on this problem a bit, and finding a workable approach, I went on the Internet to check whether someone had done better, and I found that my friend Wojciech Muła has an article on the exact same problem. It is a small world. His approach is similar, although he has no equivalent to my single-comparison function.