Science and Technology links (November 14th 2020)

  1. COVID-19 forced enterprises to move to remote work. There have been decades of research showing that allowing workers to work remotely improves job satisfaction and productivity. It improves work-family balance. It reduces sick leave. Not absolutely everything is positive, but much of it is. So why are employers reluctant to allow remote work? According to some researchers, it has to do with worker selection. That is, everything else being equal, if you recruit people to work from home, you will tend to disproportionately attract people who are lazy or incompetent. (I am not sure how broadly applicable this idea is.)
  2. There is increasing evidence that Alzheimer’s begins in the gut.
  3. The claim that more people are alive today than have ever died appears to be wrong.
  4. Schools adopt face recognition technology.
  5. Increasing your protein consumption is likely to make your body more muscular: slightly increasing current protein intake for several months by 0.1 g/kg/d in a dose-dependent manner over a range of doses from 0.5 to 3.5 g/kg/d may increase or maintain lean body mass.
  6. Is social science free from political biases? Despite what they assume, social scientists are probably not free from such biases and the consequences are probably quite bad, say Honeycutt and Jussim. For example, papers finding biases against women receive far more citations than papers failing to find such biases, despite the fact that the papers finding biases might be far weaker methodologically.
  7. Measured intelligence in human beings varies by ethnic origin. Lasker et al. attempt to relate this effect to both skin color and European ancestry. They find that skin color is not a significant variable while European ancestry appears to correlate well with measured intelligence. The whole topic is often considered to be outside of the Overton window and most social scientists would consider such inquiries to be unacceptable. I personally object to the current state of intelligence research on other grounds: as a computer scientist, I find that psychologists play with the concept of intelligence without ever defining it properly. That is, while you might be measuring something, you should make sure that you really understand what you are measuring. Someone’s height is a well defined attribute but “intelligence” is not a comparably well defined attribute. That you can quantify “something” does not imply that you know what you are measuring. I challenge psychologists to relate intelligence to the Church-Turing thesis.
  8. In mammals, babies can often repair injuries without scars, but this ability is quickly lost and adults accumulate scars over time. There is a protein found in the skin of baby mice that is usually not present in adult mice. When this protein is applied to the skin of adult mice, the adult skin regains the baby-skin ability to regenerate without scars. In effect, this single protein rejuvenates the adult skin.
  9. According to Carlsmith, we might be within range of being able to match the human brain using maybe tens of thousands of powerful processors. Using current technology, it would be costly, though not prohibitively so for corporations like Google. In fact, the cost is sufficiently low that the work could be done in secret, if Carlsmith is right.
  10. 1% of the world’s population emits 50% of CO2 from commercial aviation.
  11. Apple has released a processor/system for their laptops called M1. It powers both the recently released MacBook Air and the smaller MacBook Pro. It has 16 billion transistors. Unsurprisingly, maybe, that is more than the number of transistors that you can find in the latest iPhone, which has about 12 billion transistors. But the iPhone 7 had about 3.3 billion transistors. The iPhone 5s had about a billion transistors. If you look at long-term charts of the number of transistors inside our systems, we appear to be maintaining an exponential growth in the number of transistors. Interpreted as exponential growth in the number of transistors in commonly available processors, Moore’s law is very much alive even though we keep hearing that the end is in sight. In turn, this unavoidably leads to higher and higher performance as our chips are able to do more per unit of time. Interestingly, the power usage itself also tends to fall. The early Pentium 4 mobile processors at the beginning of the century consumed 35 Watts for the processor alone: you can probably charge your whole iPhone for a day of use with 35 Watts applied for 15 minutes. For comparison, your brain consumes about 20 Watts.
  12. Though we do not have an AIDS (HIV) vaccine yet, we might finally have a drug that reliably protects us (at least women) from getting infected.

Xbox Series X and PlayStation 5: early impressions

This week, my family got a copy of each new major game console: the Microsoft Xbox Series X and the Sony PlayStation 5. I haven’t yet had time to try them out well, but I know enough to give my first impressions.

They are both very similar machines on the inside: the same kind of processor, the same kind of memory. Reportedly, the Xbox Series X has a few more cores, and it might be the faster of the two, but it is only fair to say that they are close. They sell at comparable prices.

Both of these consoles look at first glance like an incremental upgrade over the previous generation. Though the PlayStation 5 is much taller than a PlayStation 4, it is basically functionally the same. I just removed the PlayStation 4 and put the PlayStation 5 in its place. In another room where we tried it, it would not work, but that had to do with a bad HDMI cable: using the Sony PlayStation 5 with the provided HDMI cable solved the problem. Upgrading to the PlayStation 5, we were able to bring back all our games, and they appear to work well.

The PlayStation 5 controller is like nothing I have ever experienced before. It is not that the Xbox Series X has a bad controller: they appear to be much the same. But the PlayStation 5’s “haptic feedback” feels like a form of virtual reality. You can feel textures, and water, and so forth. It is also much slicker looking than the PlayStation 4 controller. It remains to be seen whether game makers will take advantage of the new controller.

Both machines offer a qualitatively different experience: they are both very fast. So much faster than the previous generation that you get a real leap. Everything is snappier.

The PlayStation 5 has a fast but tiny disk. Our disk is already almost full. There is no way to expand it right now. You can connect an external drive, but it will only help you with legacy (e.g., PlayStation 4) games. This is going to be a big problem, and quickly. The Xbox Series X has a more reasonable disk, but it will also fill quickly.

They are both quiet game consoles. They generate a fair amount of heat but they do so quietly.

The Sony PlayStation 5 appears to fully support Bluetooth components, while the Xbox Series X only supports hardware adopting Microsoft’s proprietary wireless technology.

The Xbox Series X has a legacy USB port that can be used to recharge your controller… and not much else. You cannot hook a microphone or speakers to it. To connect a speaker or a microphone to your Xbox Series X without going through the television, you have to hook it up to the controller through a dongle. The Sony PlayStation 5 has modern and seemingly fully functional USB ports.

The Xbox Series X has a wide range of games available through Microsoft’s Game Pass, and the price is attractive. There are few new-generation titles, but the Xbox Series X makes it up in volume. The fact that there are relatively few (if any) games exclusive to the Xbox Series X probably makes it a less exciting console if you already have an Xbox.

The Sony PlayStation 5 can play your PlayStation 4 games, and it has a few interesting titles coming out. I am looking forward to receiving and trying Spider-Man Miles Morales.

Overall, it is a great time to be a gamer, especially if you can afford these consoles. If not, you might rejoice in the fact that used Xbox and PlayStation consoles just got cheaper.

Benchmarking theorem provers for programming tasks: yices vs. z3

One neat family of tools that most programmers should know about is “theorem provers”. If you went to college in computer science, you may have been exposed to them… but you may not think of using them when programming.

Though I am sure that they can be used to prove theorems, I have never used them for such a purpose. They are useful for quickly checking some assumptions and finding useful constants. Let me give a simple example.

Odd unsigned integers in software have multiplicative inverses (the multiplication wraps around). That is, if you are given the number 3, you can find another number such that when you multiply it by 3, you get 1. There are efficient algorithms to find such multiplicative inverses, but a theorem prover can do it without any fuss or domain knowledge. You can write the following Python program:

from z3 import *  # pip install z3-solver
s = Solver()
a = BitVec('a', 64)
s.add(a * 3 == 1)  # 64-bit arithmetic wraps around
if s.check() == sat:
    print(s.model()[a])  # prints 12297829382473034411

It will return 12297829382473034411. As 64-bit unsigned integers, if you multiply 12297829382473034411 with 3, you get back 1. If there were no possible solution, the theorem prover would tell us as well. So it can find useful constants, or prove that no constant can be found.
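
You can check the result for yourself with a couple of lines of C++ (a minimal sketch; 64-bit unsigned multiplication wraps around modulo 2^64, which is exactly what we need):

#include <cstdint>
#include <cstdio>

int main() {
  uint64_t inverse = 12297829382473034411ULL;
  uint64_t product = inverse * 3; // wraps around modulo 2^64
  printf("%llu\n", (unsigned long long)product); // prints 1
  return 0;
}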

For some related tasks, I have been using the popular z3 theorem prover and it has served me well. But it can be slow at times. So I asked Geoff Langdale for advice and he recommended yices, another theorem prover that might be faster for the kind of work that programmers do, e.g., using fixed-bit integer values.

Though I trust Geoff, I wanted to take some measurements. So I built the following benchmark. For all integers between 0 and 1000, I try to find a multiplicative inverse. It will not always work (even numbers do not have inverses), but the theorem prover is left to figure that out.

What are the results?

z3      15 s
yices    1 s

So, at least in this one test, yices is 15 times faster than z3. My Python scripts are available. You can install z3 and yices by using the standard pip tool. Be mindful that the yices library itself must be present on your system, but the authors provide easy instructions.

I found the Python interface of yices to be quite painful compared to z3. So if performance is not a concern, z3 might serve you well.

But why care about performance? Go back to the numbers above. To solve 1000 inverse problems in 15 s is really quite slow on a per-number basis. It is on the order of 60 million CPU cycles per number. And it is an easy problem. As you start asking more complicated questions, a theorem prover can quickly slow down to the point of becoming unusable. Being able to go just 10x faster can make a large difference in practice.

Caveat: It is just one test and it does not, in any way, establish the superiority (in general) of yices over z3.

How will the pandemic impact software programming jobs?

Software programming is not for everyone, but among the careers that are mostly unregulated, and thus mostly free from rents, it has consistently been one of the best choices. You can earn more money if you embrace a regulated profession (e.g., medicine), but if you are a recent immigrant, or someone who could not afford a college education, programming is a decent and accessible choice.

I expect that what makes it a good avenue is a mix of different unique features:

  • It is relatively easy to tell a good programmer from a bad one. It is hard to produce correct and efficient software “by accident”. Thus even if you lack the best credentials, you can still “prove” that you are good, quickly.
  • It is one of the few industries that has been consistently innovating. Thus there are always new jobs created. Once we are done putting businesses online, mobile applications appear, and so forth.

So what happens when a pandemic happens and remote work becomes the norm all of a sudden? It is impossible to predict the future, but I like to put my views in concrete terms, with a time stamp on them.

I have been programming for decades and my impression is that you do not learn to program by taking classes. Not really. You can learn the basics that way, but nothing close to what you need to be a productive member of the industry. In this respect, programming is not unique. I do not think you can take Japanese classes and expect to show up in Tokyo and be a functional member of the city. Simply put, there is much that is not formalized.

In programming, there is also the additional problem that the best programmers are often doing something else besides teaching. It is entirely possible that the very best historians are also teaching, but the very best programmers are programming not teaching. You do not become a computer science professor based on your programming skills. In fact, most computer science professors have never released generally useful software.

Thankfully, you can learn to program on your own. My youngest son just finished a complete video game, written in C# using Unity. It should appear on Steam soon. I never taught my son any programming. Not really. He did take a few classes for fun, but he is almost entirely self-taught.

Yet, human beings are social creatures. If you want to “up your game”, you need to see what the very best people are doing, you need to be challenged by them. It is possible online.

My best advice to people who wanted to become good programmers was to go and work with a master. If you work with someone who is a very good programmer, you will learn. You will learn faster than you ever could on your own. I, myself, have learned a lot from the wide range of superb programmers I have had the pleasure of working with.

Of course, it is still possible for a junior programmer to work with an experienced master despite the pandemic. However, my impression is that it has become harder. I can only base it on my limited view of the world, but I am much less interested in taking in new graduate students and research assistants today.

I had a “lab”: a room filled with graduate students and a few research assistants. These people would come to work, I would come in at random times during the day, we would chat, we would look at code on the giant white board. Sometimes, on Fridays, we would play games. There are even rumours that beer was available at times. The room is still there. I am no longer showing up. The white board is probably blank (I don’t know). I use Zoom, extensively, but I cannot believe that it has the same effect. The camaraderie is gone.

My experience might be unique, but if it is at all representative of what is happening, I bet that many junior folks are getting much less personal training and coaching than they otherwise would. If that is correct…

I predict that there will be fewer new hires. I expect that inexperienced programmers will be less appealing than ever. Any challenge making training and coaching harder is bound to reduce their number.

Meanwhile, people who know what they are doing and can be relied upon to work well from home are going to be more in demand than ever. Since that describes the very best programmers earning the very best salaries, what this suggests is that the salary distribution will spread even more. A few top programmers will receive the salaries that would have otherwise gone to the younger programmers.

It may also lead to some industry concentration. If it is harder to find “fresh blood”, then it becomes harder to start a new company. Many of the local tech talks had less to do with the speakers and more to do with meeting new faces and discussing employment.

We have been told for years how the secret to Silicon Valley was in the impromptu meetings by the local burger joint… What happens when people work from home? If the narrative about Silicon Valley was at all true, then you would expect fewer new companies.

Longer term, I do not believe that this should impact the innovation rate in the software industry. People will adjust. However, I think that short-term job prospects for the younger programmers are going to be difficult.

Credit: This blog post is motivated by an exchange on Twitter with Richard Startin and Ben Adams.

Science and Technology links (October 31st 2020)

  1. Amazon has 1 million employees.
  2. “The iPhone 12 contains a Lidar. The first 3D Lidar was released a decade ago and cost $75,000.” (Calum Chace)
  3. There is water on the Moon, possibly enough to make fuel.
  4. Good-looking people have greater social networks and may receive favorable treatment from others, but it is a mixed blessing. They are better supported, but might also be enticed to party more and invest more in sex, which takes time away from work.
  5. It looks like the regular use of skin creams could reduce inflammation in your whole body and thus, possibly, keep you healthier. (speculative)
  6. You can predict someone’s height within a few centimeters from their genes.
  7. We found new salivary glands hidden under our skull’s base.
  8. People are driving forklifts remotely from an office.
  9. Toronto (the Canadian city) is going to try out automated shuttles.
  10. Genes may predict mathematical abilities and related brain volume.
  11. Bees have five eyes.
  12. In vitro (in laboratory), we have been able to regenerate cartilage. This will not help you in the near future if you have joint pains, but people in the future may fare better.
  13. As we age, we accumulate senescent cells and they are believed to cause trouble. Senolytics are mildly toxic compounds that target senescent cells and destroy them. Researchers found that a particular senolytic proved capable of improving frailty and cognitive functions in old mice. There are ongoing clinical trials regarding senolytic drugs in human beings, but we still have some time to go.
  14. In A global decline in research productivity? Evidence from China and Germany, the authors verify recent results from the United States suggesting that while the number of researchers is steadily increasing, high-value outputs do not seem to increase at a similar rate. One possible implication of these results is that, keeping everything else equal, increasing the number of researchers is wasteful. In fact, it may suggest that we are overinvesting in the production of new researchers (i.e., we might be training too many PhDs). My own take is that we are insufficiently preoccupied with research productivity. We encourage researchers to write grant applications, publish papers, acquire rents (i.e., patents), but innovation is based on a “throw over the wall” model from the researcher’s point of view. A typical researcher believes that it is not his or her purpose to enhance products, cure diseases and so forth. The simplistic approach of “getting more researchers” may therefore not translate into new innovative products and cancer cures. To get to Mars, we may need more people like Elon Musk and Jeff Bezos, more Moon projects, and fewer new PhDs. Even if you disagree with this last assertion, the fact is that it becomes harder and harder to justify training more PhDs in the hope of getting more prosperity.

What the heck is the value of “-n % n” in programming languages?

When coding efficient algorithms having to do with hashing, random number generation or even cryptography, a common construction is the expression “-n % n”. My experience has been that it confuses many programmers, so let us examine it further.

To illustrate, let us look at the implementation of std::uniform_int_distribution found in the GNU C++ library (Linux) and clean up the line in question:

threshold = -range % range;

The percent sign (%) in this expression refers to the modulo operation. It returns the remainder of the integer division. To simplify the discussion, let us assume that range is strictly positive since dividing by zero causes problems.

We should pay attention to the leading minus sign (-). It is the unary operator that negates a value, and not the subtraction sign. There is a difference between “-range % range” and “0 - range % range”. They are not at all equivalent. They will actually give you different values; the latter expression is always zero. And that is because of operator precedence. The negation operator has precedence over the modulo operator, which has precedence over the subtraction operator. Thus you can rewrite “-range % range” as “(-range) % range”. And you can write “0 - range % range” as “0 - (range % range)”.

When the variable range is a signed integer, then the expression -range % range is zero. In a programming language with only signed integers, like Java, this expression is always zero.

So let us assume that the variable range is an unsigned type, as it is meant to be. In such cases, the expression is generally non-zero.

We need to compute -range. What does it mean to negate an unsigned value?

When the variable range is an unsigned type, Visual Studio is likely to be unhappy at the expression -range. A recent Visual Studio returns the following warning:

warning C4146: unary minus operator applied to unsigned type, result still unsigned

Nevertheless, I believe that it is a well defined operation in C++, Go and many other programming languages. Jonathan Adamczewski has a whole blog post on the topic which suggests that the Visual Studio warning is best explained by historical deviations from the C++ standard by the Microsoft Visual Studio team. (Note that the current Visual Studio team seems committed to the standards going forward.)

My favorite definition is that -range is defined by range + (-range) = 0. That is, it is the value such that when you add it to range, you get zero. Mathematicians would say that it is the “additive inverse”. In programming languages (like Go and C++) where unsigned integers wrap around, there is always one, and only one, additive inverse to every integer value.

You can define this additive inverse without the unary negation: if max is the maximum value that you can represent, then you can replace -range by max - range + 1. Or, maybe more simply, as (0 - range). And indeed, in the Swift programming language, this particular line was represented as follows:

      let threshold = (0 &- range) % range

The Swift language has two subtraction operations, one that is not allowed to overflow (the usual ‘-’), and one that is allowed to overflow (‘&-’). It is somewhat inconvenient that Swift forces us to write so much code, but we must admit that the result is probably less likely to confuse a good programmer.

In C#, the system will not let you negate an unsigned integer and will instead cast it as a signed integer, so you have to go the long way around if you want to remain in unsigned mode, like so…

threshold = (uint.MaxValue - scale + 1) % scale

This expression is unfortunately type specific (here uint).
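
Putting it all together, here is a small C++ demonstration of the value the expression takes. With a 64-bit unsigned type, -range wraps around to 2^64 - range, so -range % range amounts to 2^64 mod range; with range equal to 3, that is 1:

#include <cstdint>
#include <cstdio>

int main() {
  uint64_t range = 3;
  // -range wraps to 2^64 - range, so this computes 2^64 mod range
  uint64_t threshold = -range % range;
  printf("%llu\n", (unsigned long long)threshold); // prints 1
  return 0;
}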

To conclude: you can learn a lot just by examining one line of code. To put it another way, programming is a much deeper and complex practice than it seems at first. As I was telling a student of mine yesterday: you are not supposed to read new code and understand it right away all of the time.

Ridiculously fast unicode (UTF-8) validation

One of the most common “data types” in programming is the text string. When programmers think of a string, they imagine that they are dealing with a list or an array of characters. It is often a “good enough” approximation, but reality is more complex.

The characters must be encoded into bits in some way. Most strings on the Internet, including this blog post, are encoded using a standard called UTF-8. The UTF-8 format represents “characters” using 1, 2, 3 or 4 bytes. It is a generalization of the ASCII standard which uses just one byte per character. That is, an ASCII string is also a UTF-8 string.

It is slightly more complicated because, technically, what UTF-8 describes are code points, and a visible character, like an emoji, can be made of several code points… but it is a pedantic distinction for most programmers.

There are other standards. Some older programming languages like C# and Java rely on UTF-16. In UTF-16, you use two or four bytes per character. It seemed like a good idea at the time, but I believe that the consensus is increasingly moving toward using UTF-8 all the time, everywhere.

What most character encodings have in common is that they are subject to constraints and that these constraints must be enforced. To put it another way, not just any random sequence of bits is valid UTF-8. Thus you must validate that the strings you receive are valid UTF-8.

Does it matter? It does. For example, Microsoft’s web server had a security vulnerability whereby one could send URIs that would appear to the security checks as being valid and safe, but once interpreted by the server, would allow an attacker to navigate the disk of the server. Even if security is not a concern, you almost surely want to reject invalid strings before you store them in your database as they are a form of corruption.

So your programming languages, your web servers, your browsers, your database engines, all validate UTF-8 all of the time.

If your strings are mostly just ASCII strings, then checks are quite fast and UTF-8 validation is no issue. However, the days when all of your strings were reliably ASCII strings are gone. We live in the world of emojis and international characters.

Back in 2018, I started wondering… How fast can you validate UTF-8 strings? The answer I got back then is a few CPU cycles per character. That may seem satisfying, but I was not happy.

It took years, but I believe we have now arrived at what might be close to the best one can do: the lookup algorithm. It can be more than ten times faster than common fast alternatives. We wrote a research paper about it: Validating UTF-8 In Less Than One Instruction Per Byte (to appear in Software: Practice and Experience). We have also published our benchmarking software.

Because we have a whole research paper to explain it, I will not go into the details, but the core insight is quite neat. Most of the UTF-8 validation can be done by looking at pairs of successive bytes. Once you have identified all violations that you can detect by looking at all pairs of successive bytes, there is relatively little left to do (per byte).

Our processors all have fast SIMD instructions. They are instructions that operate on wide registers (128 bits, 256 bits, and so forth). Most of them have a “vectorized lookup” instruction that can take, say, 16 byte values (in the range 0 to 15) and look them up in a 16-byte table. Intel and AMD processors have the pshufb instruction that matches this description. A value in the range 0 to 15 is sometimes called a nibble; it spans 4 bits. A byte is made of two nibbles (the low and high nibble).
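
If you have never used such an instruction, here is a scalar model of what a single lane does (a sketch; the actual pshufb instruction performs 16 such lookups at once, one per byte lane):

#include <cstdint>

// One lane of a vectorized 16-entry table lookup: a 4-bit nibble
// selects one of 16 table bytes. pshufb does this for 16 lanes in parallel.
uint8_t lookup_16(const uint8_t table[16], uint8_t nibble) {
  return table[nibble & 0x0F];
}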

In the lookup algorithm, we call a vectorized lookup instruction three times: once on the low nibble, once on the high nibble and once on the high nibble of the next byte. We have three corresponding 16-byte lookup tables. By choosing them just right, the bitwise AND of the three lookups will allow us to spot any error.

Refer to the paper for more details, but the net result is that you can validate almost entirely a UTF-8 string using roughly 5 lines of fast C++ code without any branching… and these 5 lines validate blocks as large as 32 bytes at a time…

simd8<uint8_t> classify(simd8<uint8_t> input, simd8<uint8_t> previous_input) {
  auto prev1 = input.prev<1>(previous_input);
  auto byte_1_high = prev1.shift_right<4>().lookup_16(table1);
  auto byte_1_low = (prev1 & 0x0F).lookup_16(table2);
  auto byte_2_high = input.shift_right<4>().lookup_16(table3);
  return (byte_1_high & byte_1_low & byte_2_high);
}

It is not immediately obvious that this would be sufficient and 100% safe. But it is. You only need a few inexpensive additional technical steps.

The net result is that on recent Intel/AMD processors, you need just under an instruction per byte to validate even the worst random inputs, and because of how streamlined the code is, you can retire more than three such instructions per cycle. That is, we use a small fraction of a CPU cycle (less than 1/3) per input byte in the worst case on a recent CPU. Thus we consistently achieve speeds of over 12 GB/s.

The lesson is that while lookup tables are useful, vectorized lookup tables are fundamental building blocks for high-speed algorithms.

If you need to use the fast lookup UTF-8 validation function in a production setting, we recommend that you go through the simdjson library (version 0.5 or better). It is well tested and has features to make your life easier like runtime dispatching. Though the simdjson library is motivated by JSON parsing, you can use it to just validate UTF-8 even when there is no JSON in sight. The simdjson library supports 64-bit ARM and x64 processors, with fallback functions for other systems. We package it as a single header file along with a single source file; so you can pretty much just drop it into your C++ project.

Credit: Muła popularized more than anyone the vectorized classification technique that is key to the lookup algorithm. To my knowledge, Keiser first came up with the three-lookup strategy. To my knowledge, the first practical (non hacked) SIMD-based UTF-8 validation algorithm was crafted by K. Willets. Several people, including Z. Wegner showed that you could do better. Travis Downs also provided clever insights on how to accelerate conventional algorithms.

Further reading: If you like this work, you may like Base64 encoding and decoding at almost the speed of a memory copy (Software: Practice and Experience 50 (2), 2020) and Parsing Gigabytes of JSON per Second (VLDB Journal 28 (6), 2019).

Science and Technology links (October 17th 2020)

  1. Computer vision (i.e., artificial intelligence) and cameras are used in London to monitor citizens with respect to social distancing.
  2. A fecal transplant from old mice to young mice appears to “age the young mice“. It appears that the reverse might also work: fecal transplants from the young to the old could “rejuvenate the old”. (Speculative.)
  3. Using high levels of vitamin D supplements appear safe and has benefits on diseases like asthma and psoriasis.
  4. Obesity and type 2 diabetes are associated with low bone turnover along with an increased fracture risk, but we do not understand why.
  5. Men have greater variability in brain structure, and that is true at all ages.

Why is 0.1 + 0.2 not equal to 0.3?

In most programming languages, the value 0.1 + 0.2 differs from 0.3. Let us try it out in Node (JavaScript):

> 0.1 + 0.2 == 0.3
false

Yet 1 + 2 is equal to 3.

Why is that? Let us look at it a bit more closely. In most instances, your computer will represent numbers like 0.1 or 0.2 using binary64. In this format, numbers are represented using a 53-bit mantissa (a number between 2^52 and 2^53) multiplied by a power of two.

When you type 0.1 or 0.2, the computer does not represent 0.1 or 0.2 exactly.

Instead, it tries to find the closest possible value. For the number 0.1, the best match is 7205759403792794 times 2^-56. It is just slightly larger than 0.1, about 0.10000000000000000555. Importantly, this is a bit larger than 0.1. The computer could have used 7205759403792793 times 2^-56 or 0.099999999999999991 instead, but it is a slightly worse approximation.

For 0.2, the computer will use 7205759403792794 times 2^-55 or about 0.2000000000000000111. Again, this is just slightly larger than 0.2.

What about 0.3? The computer will use 5404319552844595 times 2^-54, or approximately 0.29999999999999998889776975, so just under 0.3.

When the computer adds 0.1 and 0.2, it no longer has any idea what the original numbers are. It only has 0.10000000000000000555 and 0.2000000000000000111. When it adds them together, it seeks the best approximation to the sum of these two numbers. It finds, unsurprisingly, that a value just above 0.3 is the best match: 5404319552844596 times 2^-54, or approximately 0.30000000000000004440.
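
You can see these approximations for yourself by printing the values with 17 significant digits, which is enough to tell any two binary64 values apart. A quick illustration (in C++ here, but any language exposing binary64 values will show the same digits):

#include <cstdio>

int main() {
  printf("%.17g\n", 0.1);       // 0.10000000000000001
  printf("%.17g\n", 0.2);       // 0.20000000000000001
  printf("%.17g\n", 0.3);       // 0.29999999999999999
  printf("%.17g\n", 0.1 + 0.2); // 0.30000000000000004
  return 0;
}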

And that is why 0.1 + 0.2 is not equal to 0.3 in software. When you combine different sequences of approximations, even if the exact values would be equal, there is no reason to expect that your approximations will match.

If you are working a lot with decimals, you can try to rely on another number type, the decimal. It is much slower, but it would not have this exact problem since it is designed specifically for decimal values:

>>> from decimal import Decimal
>>> Decimal(1)/Decimal(10) + Decimal(2)/Decimal(10)
Decimal('0.3')

However, decimals have other problems:

>>> Decimal(1)/Decimal(3)*Decimal(3) == Decimal(1)
False

What is going on? Why can’t computers support numbers the way human beings do?

Computers can do computations the way human beings do. For example, WolframAlpha has none of the problems above. Effectively, it gives the impression that it processes values a bit like human beings do. But it is slow.

You may think that, with computers being so fast, there is really no reason to sacrifice convenience for the sake of speed. And that may well be true, but many software projects that start out believing that performance is irrelevant end up being asked to optimize later. And it can be really difficult to engineer speed back into a system that sacrificed performance at every step.

Speed matters.

AMD recently released its latest processors (zen3). They are expected to be 20% faster than their previous family of processors (zen2). This 20% performance boost is viewed as a remarkable achievement. Going only 20% faster is worth billions of dollars to AMD.

Science and Technology links (October 3rd 2020)

  1. The mortality rate for kids under five has fallen by 60% since 1990.
  2. Samsung’s new storage drives are both affordable and really fast (up to 7 GB/s).
  3. Alzheimer’s disease may be driven by overactivation of cerebral fructose metabolism. The researchers write: we propose that Alzheimer’s disease is a modern disease driven by changes in dietary lifestyle in which fructose can disrupt cerebral metabolism and neuronal function.
  4. Belly fat increases your mortality risk: A nearly J shaped association was found between waist circumference and waist-to-height ratio and the risk of all cause mortality in men and women. It does not appear that being fat in other ways is harmful: larger hip circumference and thigh circumference were associated with a lower risk. So instead of weighing yourself, you ought to watch your waist circumference.
  5. Elderly people in Finland are living longer while being also fitter and healthier.
  6. Investing in innovation pays: innovation efforts produce social benefits that are many multiples of the investment costs.
  7. Gender difference in occupational preferences is largely independent of individual, parental, and regional controls:

    We document that female apprentices tend to choose occupations that are oriented towards working with people, while male apprentices tend to favor occupations that involve working with things. In fact, our analysis suggests that this variable is by any statistical measure among the most important proximate predictors of occupational gender segregation.

How expensive is integer-overflow trapping in C++?

Integers in programming languages have a valid range but arithmetic operations can result in values that exceed such ranges. For example, adding two large integers can result in an integer that cannot be represented in the integer type. We often refer to such error conditions as overflows.

In a programming language like Swift, an overflow will result in the program aborting its execution. The rationale is that once an arithmetic operation has failed, everything else the program might be doing is suspect and you are better off aborting the program. Most other programming languages are not so cautious. For example, a Rust program compiled in release mode will not abort by default.

In C++, most compilers will simply ignore the overflow. However, popular compilers give you choices. When using GCC and clang, you can specify that integer overflows should result in a program crash (abort) using the -ftrapv flag.
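
To make this concrete, here is a minimal sketch of the kind of code affected by the flag (an illustration, not my actual benchmark):

#include <cstdint>
#include <cstdio>
#include <vector>

// Compile with "g++ -O2 sum.cpp" and overflows are ignored; compile with
// "g++ -O2 -ftrapv sum.cpp" and any signed overflow aborts the program.
int64_t sum(const std::vector<int64_t> &data) {
  int64_t total = 0;
  for (int64_t x : data) {
    total += x; // this addition is checked under -ftrapv
  }
  return total;
}

int main() {
  std::vector<int64_t> data(1000000, 1); // placeholder data
  printf("%lld\n", (long long)sum(data));
  return 0;
}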

I was curious about the performance implications so I wrote a small program that simply adds all of the values in a large array. The answer I sought turns out to depend critically on the choice of compiler:

             GCC 9        clang 9
no trapping  0.17 ns/int  0.11 ns/int
trapping     2.1 ns/int   0.32 ns/int
slowdown     12 x         3 x

With no trapping, the clang compiler beats GCC (0.11 vs. 0.17) by a 50% margin but this should not preoccupy us too much: it is a single microbenchmark.

What is a lot more significant is that enabling overflow trapping in GCC incurs an order-of-magnitude slowdown. Though it is only one microbenchmark, the size of the result suggests that we should be concerned. Looking at the assembly, I find that the clang compiler generates sensible code on x64 processors, with simple jumps added when an overflow is detected. Meanwhile, GCC seems to call poorly optimized runtime library functions.

Overall this one test does establish that checking for overflows can be expensive.

Credit: This blog post was motivated by an email by Stefan Kanthak.

Science and Technology links (September 19th 2020)

  1. A large city dating back 4,300 years has been discovered in China. It predates the Chinese civilization. At its center was a wide pyramid supporting a 20-acre palace. Little is known about the people living there other than the fact that they had relatively advanced technology.
  2. Genetic testing of dead warriors from 3000 years ago in Germany reveals that they could not drink milk. It is unclear when Europeans began to drink milk.
  3. Video games might be especially beneficial to boys as they improve their literacy.
  4. Anti‐inflammatory treatment rescues age-related memory deficits in mice.
  5. People today live several years longer than thirty years ago. The core of the improvement is due to a reduction in deaths due to heart disease. Meanwhile drug overdoses are more lethal than they were.

Parsing floats in C++: benchmarking strtod vs. from_chars

Programmers often need to convert a string into a floating-point number. For example, you might get the string “3.1416” and you would like to get the resulting value as a floating-point type.

In C/C++, the standard way is the strtod function:

const char * string = "3.1416";
char * string_end = NULL;
double x = strtod(string, &string_end);
if (string_end == string) {
  // you have an error!
}

Unfortunately, the strtod function is locale-sensitive like many of these string-processing functions. It means that it may behave differently depending on the system where you run the code. However, all runtime libraries appear to have locale-specific functions like strtod_l or _strtod_l (Windows). You can use these locale-specific functions to specify the default locale and thus get functions that behave the same on all systems.

One nice feature of the strtod function is that it gives you back a pointer at the end of the parsed number. In this manner, you can efficiently parse a sequence of comma-separated numbers, for example.
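
For example, here is a quick sketch (without robust error handling) of how you might parse a comma-separated list of numbers with strtod:

#include <cstdio>
#include <cstdlib>

int main() {
  const char *p = "3.1416,2.7182,1.6180";
  char *end = NULL;
  for (;;) {
    double x = strtod(p, &end);
    if (end == p) { break; }    // no number parsed: stop
    printf("parsed %f\n", x);
    if (*end != ',') { break; } // no comma: we reached the end
    p = end + 1;                // skip the comma and continue
  }
  return 0;
}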

You can also use C++ streams but they are typically not geared toward performance.

In C++17, we got a nicer option: the from_chars function. It works similarly:

#include <charconv> // for std::from_chars

std::string st = "3.1416";
double x;
auto [p, ec] = std::from_chars(st.data(), st.data() + st.size(), x);
if (p == st.data()) {
  // you have an error!
}

The great thing about from_chars is that it is, by definition, locale-independent. Furthermore, it is standardized, so it should be available everywhere. Unfortunately, to my knowledge, the only C++ compiler to come with a fully implemented from_chars function is Visual Studio 2019. And you need to have a recent release.

Indeed, though C++17 has been out for some time, and many compilers claim to fully support it, the runtime libraries need to catch up! You can get independent from_chars implementations, of course, but it would be nice if runtime libraries supported C++17 out of the box.

Still, we should celebrate that Microsoft is doing the right thing and eagerly supporting the latest standards.

I heard good things about the Visual Studio from_chars. And among other things, that it is very fast.

I wanted to know how much faster it was. So I wrote a benchmark! I generate a large number of random values in the [0,1] interval and I record how long it takes to parse them back.

strtod (C)           120 MB/s
from_chars (C++17)   140 MB/s

You do get a nice 20% performance boost, at no cost whatsoever. Brilliant.

Good students find questions, not answers

It is often believed that learning is a simple matter of collecting answers and replies.  I suspect that “learn mechanistically how to answer the questions” is a great way for weak students to pass courses, and for smart students to ace courses. However, I believe that if you really want to learn the material, you have to learn to ask good questions and stop focusing on answers.

If you ask experienced professors, they will probably admit that some students who do very well on tests (A+ all the way) make uninspiring graduate students, while less stellar undergraduate students can end up being superb graduate students.

Part of the difference is that in graduate school, you have to ask questions first. A researcher’s job is to come up with new questions, not necessarily new answers. That is because once you have a good question, the answer can often be derived mechanistically.

It is somewhat counterintuitive if you have been trained in conventional schools. For example, you see mathematics as a set of “obvious questions” followed by “non-obvious answers”. And you think that, surely, the hard part was coming up with the answers.

This leads you to think that the way forward is to work really hard on really complicated answers to simple questions. But where will the questions come from?

What is a good question?

  1. A question should be at the frontier of the domain of knowledge as you understand it. To really master the material, you have to go beyond the surface and understand its limits.
  2. It should trigger interest on the part of the recipient. Your question should be interesting, not just to you, but to the person receiving the question. You may wonder why it should matter: it does because it sets a psychological bar, ruling out tempting but shallow questions.

A good question is usually prepared, not improvised. Most people need a few minutes to come up with a good question. But there are some quick recipes that often deliver good questions…

  • Questions about definitions… It is common that a speaker will use terms without defining them clearly. If you are unsure about how you would define a term, it might make for a great question.
  • “How do you know” questions are often great. If the expert tells you that programming in a given manner reduces memory usage, you may ask “How would I know?”

The real reason for asking good questions is that the process itself will force you to learn. That is, if you are interested in learning, in really learning, you will not focus on coming up with answers, rather you will focus on coming up with questions.

You can see it play out if you pay attention. Go to an event where “expert learners” are trying to up their game. For example, a good workshop with people at the top of their game. If the event is well managed, you will observe that people are asking each other good questions.

Experienced professors do it effortlessly. The best engineers are great at it.

As an experiment, next time you encounter an expert in a field that you do not yet master, focus on trying to come up with good, non-obvious questions. I bet that you will learn a lot.

Science and Technology links (September 5th 2020)

  1. Single cells are able to navigate complex mazes. E.g., it works with mouse pancreatic cancer cells.
  2. Body builders and athletes sometimes take a supplement called AKG. It was found to prolong life in worms. Researchers have now found that AKG massively delays frailty in mice: old mice that took AKG look better and are stronger. The mice also live a bit longer.
  3. Hair dyes applied at home are not associated with greater cancer risks.
  4. Plaque in the arteries is correlated with cardiovascular diseases (heart attacks) and tends to accumulate with age. Researchers have determined that a daily supplement of something called IPE could reverse plaque accumulation in a relatively small clinical trial.
  5. A company called NuScale got a new nuclear reactor design approved in the US. It took over two million pages of documentation. These are new, small and very safe reactors. They should become commercially available in a few years.
  6. People who suffer from diabetes are at greater risk of death from diseases like COVID-19. Researchers found that those taking metformin (a cheap and popular anti-diabetes drug) are protected from COVID-19.
  7. In mice, both caloric restriction (eating less) and a drug called rapamycin prolong female fertility into old age.

Sentinels can be faster

A standard trick in programming is to use “sentinel values”. These are special values that represent metadata efficiently.

The C language represents strings as sequences of characters terminated with the null character. The null character is a sentinel that indicates the string length. E.g., my name (“Lemire”) would be represented as the string “Lemire\0” in memory when using the C language, where I represent the null character as ‘\0’. The length of my name (6 characters) is never stored explicitly.

If you are a long-time C programmer, you probably think that the null sentinel is an “obvious” concept but newcomers to programming often find it counterintuitive. It is a non-trivial idea.

The null sentinel is “space efficient” in C. It is almost surely the case that other programming languages have more memory overhead per string.

Yet I remember reading that null-terminated strings were a design flaw in the C language from the point of view of performance. Indeed, you frequently need to know the length of a string and when you do, you need to scan all of the string to find the null sentinel and compute the string length. Other ancient programming languages like Pascal had string values with length prefixes. Today, I expect most programming languages to have steered clear of null-terminated strings.

But the use of sentinels can also lead to computationally efficient code, though you do not need null characters per se. Let me illustrate.

Suppose that you are given a string and that you want to skip all of the leading spaces. If you are only given the starting point of the string and either its length or its ending point, then you might be stuck with code like so:

const char * skip_leading_spaces(const char * start, const char * end) {
  while ((start != end) && (is_space(*start))) {
    start++;
  }
  return start;
}

For each character, you have to check that you are still within the string, and you have to check whether it is a space. You end up doing two comparisons! The C++ programming language has a strong bias in favour of such constructions. The popular functional programming idioms are often built on top of similar code.

Of course, you could get smarter. For example, you could branch on the string size to try to avoid one of the checks at each character. But if the string length is hard to predict, you may end up with small benefits and complicated code.

What if you know that the string is either null-terminated, or that it contains at least one non-space character that can serve as a de facto sentinel? Then you can use simpler code where you only have to load a new character and check that it is a space:

const char * skip_leading_spaces(const char * start) {
  while (is_space(*start)) {
    start++;
  }
  return start;
}

You end up with half the number of comparisons!

Is it faster?

It can be a lot faster. I wrote a little benchmark where I generate random strings having a random number of leading spaces. The sentinel approach is 40% faster in my tests.

Of course, the results depend critically on your data and on the exact problem you are trying to solve. Yet I think it is worth keeping sentinels in mind when you need to write high performance code, and you are dealing with irregular data.

Science and Technology links (August 29th 2020)

  1. In children, higher video game time is positively associated with cognition (i.e., kids who play more video games are smarter). Note that it does not follow that playing video games makes you smarter; it could be that smarter kids are more interested in video games.
  2. I always assumed that the introduction of antibiotics after World War 2 would be visible in the longevity statistics. But neither the introduction of vaccines nor the widespread introduction of antibiotics is visible in the longevity curves as a sharp upward discontinuity. Instead, we see a boring monotonic curve: people live longer and longer without any specific boost due to the introduction of a given technology. In fact, researchers tell us that antibiotics did not have a dramatic effect on the mortality of infectious diseases. I have never encountered a good explanation for our ever-increasing longevity.
  3. Loeb discusses the effect of ever-increasing longevity. He points out that, for the very old, there is no cake large enough to fit one candle per year on a birthday cake. Instead, he suggests, we should use a number of candles proportional to the logarithm of the age. So at year 2, you would get one candle, you might get 2 at year 4, 3 at year 8, 4 at year 16, 5 at year 32, 6 at year 64 and 7 at year 128. It scales much better.
  4. A single injection of a human gene believed to boost muscles and strength in old mice did just that: the old mice got stronger. Evidently, it suggests that the same could be done in elderly human beings.
  5. Computer speed has historically increased due to Moore’s law: processors acquired more and more transistors and got faster clock speeds. Some people have been pessimistic about the future. In There’s plenty of room at the Top, Leiserson et al. point out that we can do much to keep improving the performance of our computers. They conjecture that it will require more specific techniques, such as a better integration of software and hardware. If true, this would be in contrast with historical trends where we have increasingly abstracted out the hardware within the software, and tended to rely on generic hardware designs. It may help vertically integrated vendors like Apple while hurting horizontal players like Intel.

Science and Technology links (August 9th 2020)

  1. The BBC reports that diversity and anti-bias training is of little use and may even be counterproductive if the goal is to reduce biases:

    “The effect of bias training is very weak if you look at the long run,” says Kalev. “A company is better off doing nothing than mandatory diversity training.”

    Some research warns that such training may spur more racism by enticing people to think in terms of “races”.

  2. Our ancestors frequently died of smallpox, and it did not affect just kids or the poor:

    [Louis XIV] was a man who almost died of smallpox when he was 9 years old and lost nearly all of his legitimate heirs — his son, a grandson and a great-grandson — along with his younger brother, another grandson and a great-grandson, to smallpox. Eventually, he was succeeded by his second great-grandson, who became Louis XV and died (you guessed it) of smallpox.

    In America, smallpox is usually associated with the decimation of Native Americans, but Europeans were not immune to the disease. As late as the 18th century, for example, smallpox killed about 400,000 Europeans annually. The overall mortality rate was 20% to 60%. Among infants, it was more than 80% and was one of the reasons for the low overall life expectancy of 20 to 30 years. The disease was eradicated in 1980. Today, we don’t think of smallpox any more than we think of the bubonic plague, which, in five short years, killed almost one-third of all Europeans in the 14th century.

  3. Eating well is associated with high greenhouse gas emissions:

    After adjustment for energy intake, high-nutritional-quality diets had significantly higher greenhouse gas emissions (+9% and +22% for men and women, respectively) than did low-nutritional-quality diets.

  4. It seems that the extended ice age our ancestors survived was caused by several volcanic eruptions. This was only a few thousand years ago.
  5. In the UK, coal use has fallen to levels so low that it compares with pre-industrial levels.
  6. Though the population in the Western world is aging quickly, the number of people affected by dementia (e.g., Alzheimer’s) is falling. We do not know why.
  7. We are reportedly renaming genes to avoid the limitations of Microsoft Excel. (This should be an object of shame for Microsoft.)
  8. Mammals like human beings have a limited ability to recover from brain damage. In particular, we are not very good at producing new neurons. However, recent work shows that other cells in the brain can become neuron stem cells (neurons able to produce other neurons) in response to injury.

Performance tip: constructing many non-trivial objects is slow

I started programming professionally when Java came out and right about when C++ was the “hot new thing”. Following the then-current fashion, I looked down at C and Pascal programming.

I fully embraced object-oriented programming. Everything had to be represented as an object. It was the right thing to do. Like many young (and not so young) programmers, I suffered from cargo cult programming. There was a new fashion and its advocates were convincing. I became a staunch advocate myself.

When you program in C, allocating and de-allocating dynamic resources is hard work. Thus you tend to give much consideration to how and when you are going to allocate resources. More “modern” languages like Java and C++ have freed us from such considerations. Largely, the trend has continued with a few isolated exceptions.

Over time, I came to recognize one vicious performance anti-pattern that comes with object-oriented programming: the widespread use of small but non-trivial objects. Objects that do a non-trivial amount of work at the beginning or end of their life are “non-trivial”. For example, arrays and strings with sizes determined at runtime are non-trivial.

The net result is that in many instances, with a few lines of code, you can generate thousands or millions of tiny non-trivial objects. For example, in JavaScript, the JSON.parse function takes an input JSON string (a single string), and maps it into lots and lots of tiny objects. Another example is the programmer who loads a file, line by line, storing each line into a new string instance, and then splits the line into dynamically allocated substrings. These programming strategies are fine as long as they are not used in performance sensitive code.

Let us run a little experiment using a fast programming language (C++). Starting from a long random string, I am going to randomly generate 100,000 small substrings, having a random length of less than 16 bytes, using non-trivial objects (std::string). For comparison, I am going to create 100,000 identical small strings using trivial objects (C++17 std::string_view). A std::string_view is basically just a pointer to a memory region otherwise managed.

How quickly can I copy these strings? I published the source code of a benchmark.
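
In simplified form, the comparison looks like this (a sketch, not the actual benchmark code):

#include <string>
#include <string_view>
#include <vector>

// Non-trivial objects: copying runs std::string's copy constructor
// for every element.
std::vector<std::string> copy_all(const std::vector<std::string> &v) {
  return v;
}

// Trivial objects: a std::string_view is just a pointer and a length,
// so copying the vector is essentially a memcpy.
std::vector<std::string_view> copy_all(const std::vector<std::string_view> &v) {
  return v;
}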

I am expressing the speed in volume per second (GB/s) where the volume is given by the sum of all of the small strings. Your results will vary, but here is what I get under GNU GCC 9 (Linux) with an AMD Rome processor:

non-trivial objects (std::string)    0.8 GB/s
trivial objects (std::string_view)   15 GB/s

The std::string implementation is likely optimized for handling small strings. If you repeat the same experiment while storing your data in a std::vector instance, you may get catastrophic performance.

Though 0.8 GB/s may sound fast, keep in mind that it is the speed to do something trivial (copying the strings). If even the trivial parts of your software are slow, there is no hope that your whole system is going to be fast.

Furthermore, do not expect higher-level languages like Python or JavaScript to do better. The C++ performance is likely a sensible upper bound on how well the approach works.

Recommendation: In performance-sensitive code, you should avoid creating or copying thousands or millions of non-trivial objects.

Science and Technology links (August 1st 2020)

  1. In Japan, a large dam is being constructed almost entirely by robots.
  2. Naked mole rats are mammals that do not age in the sense that their fitness and mortality follow a flat curve, and not a Gompertz curve. Senescent cells are large dysfunctional cells that we tend to accumulate as we age. Researchers report that naked mole rats have few senescent cells. It may be because their senescent cells are less resilient than those of mice.
  3. Mice consuming a little bit of alcohol regularly live longer and are less likely to become obese.
  4. Cable TV is dying and the COVID-19 pandemic does appear to be helping. AT&T lost 900,000 cable subscribers in the first quarter of 2020 while Comcast lost 477,000 cable subscribers in the second quarter.
  5. Yeast cells age in one of two distinct, mutually exclusive ways.
  6. Smallpox was a killer:

    In America, smallpox is usually associated with the decimation of Native Americans, but Europeans were not immune to the disease. As late as the 18th century, for example, smallpox killed about 400,000 Europeans annually. The overall mortality rate was 20% to 60%. Among infants, it was more than 80% and was one of the reasons for the low overall life expectancy of 20 to 30 years.