Validating UTF-8 bytes (Java edition)

Strings are just made of bytes. We send and receive bytes over the network all the time. If you know that the bytes you are receiving form a string, then chances are good that it is encoded as UTF-8. Sadly not all streams of bytes can be valid UTF-8 strings. Thus you should check that your bytes can be safely parsed as strings.

In earlier work, I showed that you needed as little as 0.7 cycles per byte to do just that validation. That was in C using fancy instructions.

What about Java?

Designing a good benchmark is difficult. I keep things simple. I generate 1002-byte UTF-8 string made of random (non-ASCII) characters. Then I try to check how quickly different functions validate it.

Here are the contenders:

  1. You can use the standard Java API to entirely decode the bytes, and report false when there is an error:

    CharsetDecoder decoder = 
      StandardCharsets.UTF_8.newDecoder();
    try {
      decoder.decode(
           ByteBuffer.wrap(mydata));		           
    } catch (CharacterCodingException ex) {		        
          return false;
    } 
    return true;
    
  2. You can try a branchless finite-state-machine approach:

    boolean isUTF8(byte[] b) {
        int length = b.length;
        int s = 0;
        for (int i = 0; i < length; i++) {
          s = utf8d[256 + s 
               + (utf8d[b[i] & 0xFF] & 0xFF) ];
        }
        return s == 0;
    }
    

    … where utf8d is some array where the finite-state machine is coded.

  3. Finally, you can use the isWellFormed from the Guava library. It simply tries to find the first non-ASCII character and then it engages into what is a straight-forward series of if/then/else.
  4. Here are the timings in nanoseconds per 1002-byte strings. I estimate that my processor runs at about 3.4 GHz on average during the test (verified with perf).

    Java API2000 ns6.7 cycles per byte
    branchless2200 ns7.5 cycles per byte
    Guava’s if/then/else750 ns2.6 cycles per byte

    My code is available.

    The most obvious limitation in my benchmark is that Guava’s if/then/else approach is sensitive to branch mispredictions while my benchmark might not be rich enough to trigger difficult-to-predict branches.

Nobel-prize winner Romer on innovation and higher education

Romer was one of the the winners of the Nobel prize in economics this year (2018). He wrote about higher education and innovation. One of his proposals is the introduction of more generous scholarships for students in science and engineering.

His starting point is that to innovate and get richer, we need more and more people doing R&D. Investing more in R&D is not sufficient. You might think that, in the long run, it could be. If firms spend more on R&D, then the wages of scientists and engineers will go up, and this will attract more scientists. However, students lack this information.

One the one hand, giving money to undergraduate students who want to pursue science and engineering is a good way to signal to these students that society wants more scientists and engineers. At the graduate level, though the current funding might be considered adequate, it is very much in the end of (older) professors who can be quite directive. The general intuition, I believe, is that Romer wants to give back to young and bright students some freedom and power in deciding what they want to do. In particular, I think Romer believes that some of these students might be interested in doing work that is less “conventional” (i.e., maybe harder to publish in conventional venues) but more “useful”. In effect, he wants to fight against stagnation in higher education.

In his own words:

Unfortunately, in the last 20 years, innovation policy in the United States has almost entirely ignored the structure of our institutions of higher education. As a result, government programs that were intended to speed up the rate of technological progress may in fact have had little positive effect.

This pattern of outcomes, increased numbers of Ph.D. recipients and steadily worsening academic job prospects, can be explained by increased subsidies for Ph.D. training.

The picture that emerges from this evidence is one dominated by undergraduate institutions that are a critical bottleneck in the training of scientists and engineers, and by graduate schools that produce people trained only for employment in academic institutions (…)

I am not exactly sure why all student unions are not pushing his ideas to governments. It sure sounds attractive: give engineering students more money and get better growth.

I love universities. I have spent almost all my life in them. But, like Romer, I fear that there is a bit too much stagnation. There is insufficient pressure to innovate and too many incentives favouring excess conservatism. Giving back power, not to younger professors, but to actual students, is probably a wise move.

Yet I am somewhat skeptical of Romer’s overall view that increasing the supply of engineers is key. Great many students trained in engineering do nothing even vaguely resembling R&D, and when they do, they do not do it for long. Simply put, graduates go where jobs and money is. It is not the case that we live in a Spiderman world where smart engineering students graduate to go work in an industrial lab developing new robotics.

We need to make the spiderman world possible: make it so that young engineers can start a small company building prototypical exoskeletons for the weaker people to walk again. Do that, and you won’t have a shortage of young new engineers for long.

Science and Technology links (October 13th, 2018)

  1. Chronic exposure to canola oil results in a significant increase in body weight and impairments in working memory… in mice. (Source: Nature)
  2. Belly fat is not just fat in reserve, it signals your body through hormones. Some of these signals might be helpful, but some are not. Visceral fat is strongly associated with prostate cancer and breast cancer. Losing fat in your belly is a good way to reduce your cancer risks.
  3. Computerized Tomography (CT) scans are used to diagnose blot clots, tumours, fractures. Algorithms based on “deep learning” can accurately identify CT scan abnormalities requiring urgent attention. (Source: Lancet)
  4. Does innovation bring prosperity or does prosperity bring innovation? You would think that it is the former, but it seems that there is a strong case for the latter. Indeed, if you come from a prosperous society, you will be more “future-oriented” as per the life history theory. That is, if you are rich, you are more likely to be optimistic and to invest in the future, thus to contribute in new inventions.This seems a tad simplistic to me. While I don’t deny that the life history theory might be valid, I think facing difficulties with the right frame of mind is what is most likely to push you to innovate.
  5. Open source is the idea that we can broadly collaborate on building new technology as if it were a public good. It has been repeatedly at odds with other conventional industrial models where technology is assumed to be a private good. For example, Microsoft has long openly fought open source as if it were a threat to its business model. Bill Gates, one of Microsoft’s founders, famously compared people who share software freely to thieves.To this day, government agencies have trouble funding open source software while many research grants go toward funding private software projects.

    In the last 20 years, most corporations have changed their stance on open source and come to realize that it is not a threat to corporate profits. I assume governments are not far behind.

    Patents are another problem for open source: patents are granted by the state to keep an invention from being part of a public good. Thus patents (and thus governments) can hinder open source. The proper fix would be to change laws so that patents cannot harm open source projects… but that is difficult to achieve politically.

    Patents can be used is several different ways but corporations like Microsoft mostly use patents in a defensive manner: when another entity seeks to sue Microsoft for patent infringement, Microsoft can threaten to sue back.

    Open source projects can be outgunned as part of such a fight. To bring some balance back several companies have joined forces to create the Open Invention Network which creates a large defensive pool of patents. Should a company try to shut down an open source project on the basis of a patent, the Open Invention Network can intervene.

    This week Microsoft joined the Open Invention Network. Other members include Google, IBM, Sony, Toyota.

Smart bracelet: my experience with the Mi Band 3

I was an early adopter of fitness bracelets with heart-rate monitoring. I use them to track my heart rate at rest. You heart rate should be around 65 if you are fit. If it goes much higher than 65, you might be getting unfit or you might have a medical condition. I also like to track the quality of my sleep.

I don’t really need this data right now in the sense that I am in good health. However, it is hard to keep good habits on the long term. I can easily monitor my weight by looking in the mirror, but it is easier to let my physical condition slide. I have ever saw bad numbers, it might nudge me into correcting my habits.

I have never tried the Apple Watch, which I hear is great. However, I am looking for something that I do not need to recharge every day. Also, I prefer to pay less if I can.

I have owned a couple of Fitbit watches. Fitbit bracelets and watches have been a major disappointment. They are expensive, the software is often buggy, the hardware tends to crash periodically, it is not durable, and so forth. Though on paper the Fitbit watches have many features like notifications and so on, they are finicky to setup and often fail mysterious. Fitbit has good social networking features and it makes good looking products, but that is all.

When my last Fitbit watch broke apart, I decided that I would never buy a Fitbit product anymore. Too expensive, too buggy.

I should say that one of my sons as well as my wife swear by their Fitbit bracelets. They seem to have good luck with them. The only difference between me and them that I can see is that I sleep with my bracelets whereas they do not.

So I tried my luck with a Chinese product, the Mi Band. It is $35. It is almost an order of magnitude less expensive than equivalent Fitbit products.

I have been wearing it for about a month now. Here are my thoughts. Firstly, the negatives:

  • When I got it, it was in Chinese with a Chinese-only manual. I panicked slightly. Thankfully the mobile app was available in pretty much all languages. I setup the app and had trouble initially pairing my phone with the bracelet (which remained dark), until I realized that I needed to recharge the bracelet. Then it worked but the bracelet kept its display in Chinese. I was annoyed. I tried many things, until, mysteriously, the mobile app downloaded an update and after syncing with the bracelet, everything has been in English.
  • The Mi Band does not look as nice as Fitbit products. It is slightly ugly. However, I should say that I don’t find the Apple Watch particularly pretty.

    Note that this is my personal taste. Several people have commented that my Mi Band looks good.

  • It is entirely possible that the Communist government of China can access some of my data.

Secondly the positives:

  • The mobile app has been rock solid. It looks good. It seems less buggy than the Fitbit app which I often needed to restart.
  • Notifications actually work! That is, if someone calls me or if I have a meeting, my bracelet will vibrate. In theory, Fitbit watches do that as well, but I never could get it to work reliably. It really improves my quality of life!
  • The bracelet works, it is responsive. I find it easy to use and configure.
  • Even though the bracelet continuously record my heart rate, the battery life is excellent.
  • Did I mention that it is only 35$? I have the Mi Band 3. If the Mi Band 4 comes out next year, I will be eager to purchase. I would gladly pay 35$ a year for a smart bracelet.
  • For a tiny bit of extra money, you can purchase third-party products to change the look of the Mi Band. I bought a dark steel enclosure for 12$. It is great.
  • I have not checked the accuracy of the device. However the heart-rate and sleep numbers are quite similar to what I was getting with the Fitbit watches.

Science and Technology links (October 6th, 2018)

  1. Ibuprofen extends lifespan in many species including, maybe, human beings.
  2. Donna Strickland is the third women to win a Nobel prize in Physics. It appears that she won the prize for one 1985 article. Her scientific publications are available on Google Scholar. She is maybe one more example of the fact that it only takes one paper to make a remarkable contribution.
  3. People who suffer from dementia often had high blood sugar many years before their sickness.
  4. U.S. median household income is setting new records in both nominal and real terms.
  5. The Nobel prize in medicine was awarded to researchers who developed cancer immunotherapy. I am pleased that we are rewarding scientists who helped cure actual people.

Quickly parsing eight digits

In my previous post, I described how we can quickly determine whether eight characters are made of digits (e.g., ‘13223244’). It takes a bit over 2 cycles. Suppose that you want to turn this string into an integer value.

Most programmers would implement it as a simple loop:

uint32_t parse_eight_digits(const unsigned char *chars) {
  uint32_t x = chars[0] - '0';
  for (size_t j = 1; j < 8; j++)
    x = x * 10 + (chars[j] - '0');
  return x;
}

Can you do better?

If you are willing to use SIMD instructions through Intel intrinsics, then Muła has worked out as nice function that looks like this:

uint32_t parse_eight_digits_ssse3(const char *chars) {
  __m128i ascii0 = _mm_set1_epi8('0');
  __m128i mul_1_10 =
      _mm_setr_epi8(10, 1, 10, 1, 10, 1, 10, 1,
                        10, 1, 10, 1, 10, 1, 10, 1);
  __m128i mul_1_100 = _mm_setr_epi16(100, 1, 100, 1,
                       100, 1, 100, 1);
  const __m128i mul_1_10000 =
      _mm_setr_epi16(10000, 1, 10000, 1, 10000, 1, 10000, 1);
  __m128i input = _mm_sub_epi8(
               _mm_loadl_epi64((__m128i *)chars), ascii0);
  __m128i t1 = _mm_maddubs_epi16(input, mul_1_10);
  __m128i t2 = _mm_madd_epi16(t1, mul_1_100);
  __m128i t3 = _mm_packus_epi32(t2, t2);
  __m128i t4 = _mm_madd_epi16(t3, mul_1_10000);
  return _mm_cvtsi128_si32(t4); 
}

I am not going to go through what this function does. The important point to understand about Muła’s function is that it actually converts sixteen digits to integers. In other words, using Muła’s function to parse eight digits, I handicap it by a factor of two, forcing it to waste a lot of effort.

Let me go straight at the speed results. I report throughput: how long it takes to parse an integer from a stream on average.

GCC 8.1 (-O3)

conventional3.9 cycles
Muła3.0 cycles

CLANG 6.0 (-O3)

conventional8.4 cycles
Muła3.2 cycles

So despite the handicap that Muła’s function has, it is still considerably faster than the conventional code. If I could parse pairs of eight digits instead of individual sets of eight digits, I could go nearly twice as fast.

I do not benchmark the latency. The latency story could be less positive for the Muła’s function which relies on instructions that have high throughput but also high latency.

Some of you might be thinking that there is surely a way to go faster, or, at least, to get rid of the need for Intel intrinsic functions. I tried to pull the same SIMD-within-a-register (SWAR) trick as in my previous post, but the best I could do was only barely faster than the conventional approach. To get it down to Muła’s speed, I would need shave another two cycles from it. I think it is possible. Here is the best I have so far using the SWAR approach:

uint32_t parse_eight_digits_swar(const char *chars) {
  uint64_t val;
  memcpy(&val, chars, 8);
  val = val - 0x3030303030303030;
  uint64_t byte10plus   = ((val        
      * (1 + (0xa  <<  8))) >>  8)
      & 0x00FF00FF00FF00FF;
  uint64_t short100plus = ((byte10plus 
       * (1 + (0x64 << 16))) >> 16) 
       & 0x0000FFFF0000FFFF;
  short100plus *= (1 + (10000ULL << 32));
  return short100plus >> 32;
}

At a glance, I repeat the sequence (multiplication, shift, mask) three times to aggregate eight digits. To go to 2N digits, I would need N such sequences. That is, I have a logarithmic-time algorithm that is limited by the size of my registers. It is good, but the fact that each step depends on the completion of the previous step is less good: it means that the latency is relatively high.

My code is available.

Credit: My SWAR function is an extension to eight digits of related functions by Muła on shorter sequences of digits. It was improved by Travis Downs who pointed out that we do not need to swap the bytes.

Quickly identifying a sequence of digits in a string of characters

Suppose that you want to quickly determine a sequence of eight characters are made of digits (e.g., ‘9434324134’). How fast can you go?

In software, characters are mapped to integer values called the code points. The ASCII and UTF-8 code points for the digits 0, 1,…, 9 are the consecutive integers 0x30, 0x31, …, 0x39 in hexadecimal notation.

Thus you can check whether a character is a digit by comparing with 0x30 and 0x39: ((c < 0x30) || (c > 0x39)). It is even cheaper than it looks because optimizing compilers simply take the code point, subtract 0x30 from it and compare the result with 9. So there is a single comparison!

Given a stream of characters, the conventional approach in C or C-like language is to loop over the sequence of characters and check that each one is a digit.

bool is_made_of_eight_digits(unsigned char * chars) {
    for(size_t j = 0; j < 8; j++) {
          if((chars[j] < 0x30) || (chars[j] > 0x39)) {
              return false;
          }
    }
    return true;
}

Can we do better? Instead of doing eight comparisons (one per character), we would like to do only one or two. For this, we can use SIMD within a register (SWAR): load the eight characters into a 64-bit integer and do some operations on the result integers.

Here is a simple “branchless” approach first… it does a single comparison:

bool is_made_of_eight_digits_branchless(const unsigned char  * chars) {
    uint64_t val;
    memcpy(&val, chars, 8);
    return ( val & (val + 0x0606060606060606) &0xF0F0F0F0F0F0F0F0) 
             == 0x3030303030303030;
}

(Credit: Travis Downs simplified and improved the version I had.)

In some instances, it might be faster to do two comparisons instead of one, especially if the results are predictible.

bool is_made_of_eight_digits_branchy(unsigned char  * chars) {
    uint64_t val;
    memcpy(&val, chars, 8);
    return (( val & 0xF0F0F0F0F0F0F0F0 ) == 0x3030303030303030)
      && (( (val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0 )
      == 0x3030303030303030); 
}

How do they work? One the one hand, they check that the most significant 4 bits of each character is the value 0x3. Once this is done, you know that the characters value must range in 0x30 to 0x3F, but you want to exclude the values from 0x3a to 0x3f. If you add 0x06 to the integers from 0x30 to 0x39 you get the integers 0x36 to 0x3f, but adding 0x06 to 03a gives you 0x40. So you can add 0x06 to each byte and check again that the most significant 4 bits of each character is the value 0x3.

It is crazily hard to benchmark such routines because their performance is highly sensitive to the data inputs. You really want to benchmark them on your actual data. And compilers matter a lot. Still, we can throw some synthetic data at it and see how well they fare (on a skylake processor).

compilerconventionalSWAR 1 comparisonSWAR 2 comparisons
gcc 8 (-O2)11.4 cycles3.1 cycles2.5 cycles
gcc 8 (-O3)5.2 cycles3.1 cycles3 cycles
clang 6 (-O2)5.3 cycles2.4 cycles2.2 cycles
clang 6 (-O3)5.3 cycles2.4 cycles2.1 cycles

The table reports the average time (throughput) to check that a sequence of eight characters is made of digits. In my tests, the branchless approach is not the fastest. Yet it might be a good bet in practice because it is going to run at the same speed, irrespective of the data.

Let us some consider less regular data where the processors cannot easily guess the result of the comparisons:

compilerconventionalSWAR 1 comparisonSWAR 2 comparisons
gcc 8 (-O2)12.1 cycles3.1 cycles5.2 cycles
gcc 8 (-O3)7.2 cycles3.1 cycles5.1 cycles
clang 6 (-O2)7.3 cycles2.4 cycles4.3 cycles
clang 6 (-O3)7.1 cycles2.4 cycles4.5 cycles

My source code is available.

Further reading. After working on this problem a bit, and finding a workable approach, I went on the Internet to check whether someone had done better, and I found an article by my friend Wojciech Muła who has an article on the exact same problem. It is a small world. His approach is similar although he has no equivalent to my single-comparison function.

Science and Technology links (September 30th, 2018)

  1. Oculus is launching a new standalone virtual-reality headset next year, the Oculus Quest. It is the price of a console like the Nintendo Switch, and might have comparable computational power. It does not need to be tethered to a PC (it is wireless). It should have top-notch virtual reality with six degrees of freedom, meaning that you can walk around. Thanks to inside-out tracking, you don’t need to install sensors in the room.

    At a glance, the produce looks comparable to the HTC Vive Focus which is already available (in China).

    I think that these standalone headsets are interesting and could make virtual reality mainstream… something that tethered headset could never hope to do.

    These small headsets cannot hope to present rich and photorealistic environments. But my experience with virtual reality so far is that simpler is often better.

  2. A single dose of a drug called Trodusquemine completely reverses arthrosclerosis (the buildup of fat in our arteries)… in mice.
  3. Questionable research practices are common in ecology and evolution research: 64 percent of researchers admitted to at least one instance of selective reporting. The fact that these researchers almost never share their data does not help.
  4. You can treat inflammated appendixes with antibiotics instead of using surgery.

Science and Technology links (September 22nd, 2018)

  1. Apple benefits from the chip-making technology of a company called TSMC. This company has surpassed Intel in transistor density. Thus, in some sense, the microprocessors in Apple’s latest iPhone are more advanced than the microprocessors you find in brand-new PCs.
  2. Here is a provocative opinion piece: For decades, the medical community has ignored mountains of evidence to wage a cruel and futile war on fat people, poisoning public perception and ruining millions of lives.
  3. Narcolepsy might be an autoimmune condition (source: Nature).
  4. A major insurer wants to include the use of fitness trackers (like the Apple Watch) as part of its policies (source: BBC).
  5. Economics might go against our deeply held instincts (source: New Scientist).
  6. Medical researchers in universities receive grants to conduct clinical trials, but they do not register the results contrary to what the regulations stipulate:

    (…) compliance is particularly lacking at universities conducting drug trials. These also include German universities such as Berlin (Charité), Heidelberg and Cologne (0%).

  7. The number of ruminant animals in the United States is roughly the same today as it was 200 years ago (Source: Nutrition Today).
  8. Open-source software is an international phenomenon that enables free collaboration from people all over the world. Code from Japanese and Canadians is more likely to be accepted into projects whereas code from Germany and Brazil is less likely to be accepted, with Americans being in the middle.
  9. Rangel found in her thesis that, at least some of the time, people rate software code more highly when they are told it is from a female developer:

    Respondents were asked to score source code written by a fictive male or female developer, (…) participants were randomly assigned one of four code examples (…) the fictive female author was scored higher than the fictive male author. These unexpected results support the need for further understanding of the complexities of gender related to software engineering, and should not provide a foundation for complacency in regard to improving female participation in software engineering.

    (Source: Northcentral University)

  10. People who consume one diet drink a day ‘three times more likely to suffer stroke or dementia’ (Source: The Independent)
  11. Wheat gluten intake increases weight gain according to an article in Nature. (Source: Nature)
  12. Treating fever may lead to a higher mortality rate. (Source: Surg. Infect.)
  13. Physical activity does not influence obesity risk. (Source: International Journal of Epidemiology)
  14. Despite claims by the World Health Organization (WHO) that eating processed meat causes colon cancer and red meat probably causes cancer, the observational data used to support the claims are weak, confounded by multiple unmeasured factors, and not supported by other types of research needed for such a conclusion. (Source: Animal Frontiers)
  15. We might be able to prevent cognitive decline and Alzheimer’s by clearing the brain from senescent cells, using new drugs called senolytics. (Source: Nature)

On the state of virtual-reality gaming

For nearly two years, I have been trying a wide range of video games in a virtual reality setting. Our lab. in Montreal has some permanent space dedicated to the HTC Vive, so I was also able to test out games with a wide range of people. I must have tried several dozen different games so far.

Gaming in virtual-reality is a disappointment. I am surprised that Sony sold millions of virtual-reality headsets. To my knowledge, there are no big studio betting on virtual reality. It is mostly owned by independent developers making small bets.

To be clear, I am not disappointed at virtual reality per se. However, it seems clear that two years ago, I greatly underestimated how much work we collectively need to do to get “virtual reality right”.

What works? A few games are quite good. I have two favorite games.

One of them is Superhot VR. In Superhot, you are an assassin moving from one minimalist sandbox to another, killing people best you can (with a knife, your fist, a bottle, a gun, …). It would be quite bland if not for the trick that time flows only as fast as you move. As long as you remain immobile, time remains still. The game is a “port” to virtual reality of a conventional game, but virtual reality makes it shine.

My other favorite game is Beat Saber. As the name suggests, you use (light) sabers to cut coloured boxes coming at you (not unlike a Star Wars Jedi) at the rhythm of some music. It is probably my favorite virtual-reality game so far.

Both of them are so good that they provide an unforgettable experience. However, they are both modest games.

What might we say about virtual-reality gaming?

  1. Both of these games are highly immersive. Once you are in the game, you feel as if you were teleported elsewhere and you forget where your body is. Yet they are not, in any way, realistic. That is, you are teleported in an artificial world that looks nothing like our everyday world.

    A few years ago, many people assumed that photorealism was required for immersion. That is entirely false.

  2. As a related, but distinct point, neither of these games was particularly expensive to build, or technically challenging. I could probably write cheap clones of these games in a few months, and I am not a video-game programmer. That is, of course, a consequence of the fact that there are seemingly no major investments.
  3. These games require “six degrees of freedom” and handheld commands. That is, they work because you can really move in the environment (forward, backward) while looking in all directions, and using your hands freely.

    However, they only require you to move within a small space. This last step is important since your actual body is still limited to a relatively small space.

    Many games allow you to travel vast distances through various tricks such as teleports, or by moving from within a vehicle. For example, you can point to a far location and click a button to appear there. Even though teleports “work” technically, they are disappointing. I almost invariably get frustrated at such games.

    Other games offer only restricted degrees of freedom. Some games only require you to look around, without having to move. I find these games disappointing as well.

  4. My impression is that simply carrying over existing video games is almost always going to be a futile exercise.

What might come around the corner?

  1. Multiplayer virtual-reality gaming might be great. There are games like Rec Room that offer decent experiences already, with a lot of frustration thrown in. However, we will need better hardware with features like eye tracking. It is almost here.
  2. I still haven’t seen any “long-form” game. That is, playing for hours in a deep and involved game is not possible, right now. What is worse: I cannot even imagine what such a game might look like.