Science and Technology links (April 20th 2019)

  1. Early-career setbacks cause a performance improvement among those who persevere. This is related to the observation that immigrants are four times more likely to become millionaires. In biology, that is called hormesis: by challenging your muscles, you get stronger; by exposing yourself to some radiation or starving a little, you live longer. So you should seek out challenges and expose your kids to difficulties.
  2. We have a significant bias in favor of tall men. Tall men get promoted more often and earn more money; they are also much more successful with women. However, heightism, like ageism, is considered an acceptable form of discrimination. It is fine to mock a man because he is small or old. It is not fine to mock a man for being gay, transgender or black.
  3. Canadians who finish high school, get a full time job and only have children within marriage have less than a one percent chance of being poor.
  4. Currently, a sizeable fraction of men go bald with age and there is relatively little that can be done. There is some good surgery, but it is expensive. Products like minoxidil work, but only somewhat. A new product (clascoterone) has passed a Phase II clinical trial with good results. It seems quite safe and probably more effective than current drugs.
  5. A stem-cell therapy for knee arthritis got solid results during a clinical trial.
  6. It seems that you can bring back some brain function hours after death (in pigs).
  7. We are making progress against the “bubble boy” syndrome.
  8. For every 100 women who earn a bachelor’s degree from US colleges and universities there are 74 men.

Parsing short hexadecimal strings efficiently

It is common to represent binary data or numbers using the hexadecimal notation. Effectively, we use a base-16 representation where the first 10 digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and where the following digits are A, B, C, D, E, F, with the added complexity that we can use either lower or upper case (A or a).

We sometimes want to convert strings of hexadecimal characters into a numerical value. For simplicity, let us assume that we have sequences of four characters. Each character is represented as a byte value using its corresponding ASCII code point. So ‘0’ becomes 48, ‘1’ is 49, ‘A’ is 65 and so forth.

The most efficient approach I have found is to simply rely on memoization. Build a 256-byte array where 48 (or ‘0’) is mapped to 0, 65 (or ‘A’) is mapped to 10 and so forth. As an extra feature, map all disallowed values to -1 so we can detect them. Then just look up the four values and combine them.

// digittoval maps an ASCII code to its hexadecimal value (-1 if disallowed)
uint32_t hex_to_u32_lookup(const uint8_t *src) {
  uint32_t v1 = digittoval[src[0]];
  uint32_t v2 = digittoval[src[1]];
  uint32_t v3 = digittoval[src[2]];
  uint32_t v4 = digittoval[src[3]];
  // most significant digit first
  return v1 << 12 | v2 << 8 | v3 << 4 | v4;
}
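
The table itself is not shown above. As a minimal sketch, assuming the table holds signed bytes, it might be built as follows. The names `init_digittoval` and `hex4_checked` are hypothetical, not taken from the benchmark code.

```c
#include <stdint.h>
#include <string.h>

// Assumed layout: digittoval[c] is the value of hex digit c, or -1 otherwise.
static int8_t digittoval[256];

// Fill the table once before parsing (hypothetical helper).
static void init_digittoval(void) {
  memset(digittoval, -1, sizeof(digittoval)); // every byte becomes -1
  for (int i = 0; i < 10; i++) {
    digittoval['0' + i] = (int8_t)i;
  }
  for (int i = 0; i < 6; i++) {
    digittoval['A' + i] = (int8_t)(10 + i);
    digittoval['a' + i] = (int8_t)(10 + i);
  }
}

// Same combination step as in the text. A -1 entry converts to 0xFFFFFFFF,
// so any disallowed character leaves bits set above the low 16 bits.
static uint32_t hex4_checked(const uint8_t *src, int *ok) {
  uint32_t v = (uint32_t)digittoval[src[0]] << 12 |
               (uint32_t)digittoval[src[1]] << 8 |
               (uint32_t)digittoval[src[2]] << 4 |
               (uint32_t)digittoval[src[3]];
  *ok = (v <= 0xFFFF);
  return v;
}
```

With this layout, a single comparison after the four lookups is enough to detect a bad character, so validation adds little work to the lookup approach.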

What else could you do?

You could replace the table lookup with a fancy mathematical function:

// '0'..'9' have (c >> 6) == 0; letters have (c >> 6) == 1 and low nibble 1..6
uint32_t convertone(uint8_t c) {
  return (c & 0xF) + 9 * (c >> 6);
}
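
For comparison with the lookup version, the four calls could be combined as follows. This is only a sketch: `hex_to_u32_math` is a hypothetical name. Unlike the table approach, it does not flag disallowed characters (for instance, 'G' silently maps to 16).

```c
#include <stdint.h>

// The arithmetic conversion from the text, repeated so the sketch compiles alone.
static uint32_t convertone(uint8_t c) {
  return (c & 0xF) + 9 * (c >> 6);
}

// Combine four converted digits, most significant first (hypothetical helper).
static uint32_t hex_to_u32_math(const uint8_t *src) {
  return convertone(src[0]) << 12 | convertone(src[1]) << 8 |
         convertone(src[2]) << 4 | convertone(src[3]);
}
```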

How do they compare? I implemented both of these and I find that the table lookup approach is more than twice as fast when the function is called frequently. I report the number of instructions and the number of cycles to parse 4-character sequences on a Skylake processor (code compiled with GNU GCC 8).

          Instruction count   Cycle count
lookup    18                  4.3
math      38                  9.6

I am still frustrated by the cost of this operation. Using 4 cycles to convert 4 characters to a number feels like too much of an expense.

My source code is available (run it under Linux).

Further reading: Fast hex number string to int by Johnny Lee; Using PEXT to convert from hexadecimal ASCII to number by Mula.

Science and Technology links (April 13th 2019)

  1. There is little evidence that digital screens are harmful to teenagers’ mental health. If there is an effect, it is small.
  2. Cotton bags must be reused thousands of times before they match the environmental performance of plastic bags. Organic cotton bags are much worse than regular ones, requiring 20,000 reuses instead of only 7,000, due to the lower yield of organic farming. Cotton bags cannot be recycled. Paper bags must be reused dozens of times to have the same environmental impact as single-use plastic bags. For extra points, compute how many years you need to use an organic cotton bag, at a rate of two uses a week, to use it 20,000 times. Why are we outlawing plastic bags, and not reusable organic cotton bags?
  3. I never understood the appeal of artificial-intelligence systems that take very little input from human beings (self-taught software). Rich Sutton makes a powerful case for them:

    The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.

    To put it another way, our most powerful weapon for ‘smarter’ software is to design systems that get better as we add more computational power, and then to add the computational power.

    The net trend is to build software that looks more and more like ‘brute force’ at a high level, but with increasing sophistication in the computational substrate to provide the necessary brute force.

  4. Goldstein, Qvist and Pinker make a powerful case for nuclear power in the New York Times. Nuclear power is safe, clean, relatively inexpensive and environmentally friendly. Renewable energies are not the solution despite all the propaganda at the moment:

    Where will this gargantuan amount of carbon-free energy come from? The popular answer is renewables alone, but this is a fantasy. Wind and solar power are becoming cheaper, but they are not available around the clock, rain or shine, and batteries that could power entire cities for days or weeks show no sign of materializing any time soon. Today, renewables work only with fossil-fuel backup. Germany, which went all-in for renewables, has seen little reduction in carbon emissions.

  5. Human beings have better color perception than most other mammals.

    Humans, some primates, and some marsupials see an extended range of colors, but only by comparison with other mammals. Most non-mammalian vertebrate species distinguish different colors at least as well as humans, and many species of birds, fish, reptiles and amphibians, and some invertebrates, have more than three cone types and probably superior color vision to humans.

    So why would human beings have superior color vision compared to other mammals?

    A recent evolutionary account posits that trichromacy facilitates detecting subtle skin color changes to better distinguish important social states related to proceptivity, health, and emotion in others.

  6. As you age, your working memory degrades. A Nature article reports on how this can be reversed with electric brain stimulation.
  7. Genetically modified plants (GMOs) have reduced pesticide use by 37% while improving yields by 22%. Though no new technology is free from risk, neither lower yields nor higher pesticide use are free from risk.
  8. The poverty rate in China went from 34.5% of the population to 0.7% of the population between 2001 and 2015.

Why are unrolled loops faster?

A common optimization in software is to “unroll loops”. It is best explained with an example. Suppose that you want to compute the scalar product between two arrays:

  sum = 0;
  for (i = 0; i < length; i++)
    sum += x[i] * y[i];

An unrolled loop might look as follows:

  sum = 0;
  i = 0;
  if (length > 3)
    for (; i < length - 3; i += 4)
      sum += x[i] * y[i] + x[i + 1] * y[i + 1] +
             x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3];
  for (; i < length; i++)
    sum += x[i] * y[i];

Mathematically, both pieces of code are equivalent. However, the unrolled version is often faster. In fact, many compilers will happily (and silently) unroll loops for you (though not always).
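
To make the comparison concrete, here is a self-contained sketch of both loops as functions. The names `dot_rolled` and `dot_unrolled4` and the choice of `uint32_t` elements are assumptions made for illustration. With unsigned integers both versions return exactly the same value; with floating-point arrays, the different order of additions could change the rounding slightly.

```c
#include <stddef.h>
#include <stdint.h>

// Simple (rolled) scalar product.
static uint32_t dot_rolled(const uint32_t *x, const uint32_t *y, size_t length) {
  uint32_t sum = 0;
  for (size_t i = 0; i < length; i++) {
    sum += x[i] * y[i];
  }
  return sum;
}

// Scalar product unrolled by a factor of four, with a tail loop.
static uint32_t dot_unrolled4(const uint32_t *x, const uint32_t *y, size_t length) {
  uint32_t sum = 0;
  size_t i = 0;
  if (length > 3) { // guard: length - 3 would wrap around for small lengths
    for (; i < length - 3; i += 4) {
      sum += x[i] * y[i] + x[i + 1] * y[i + 1] +
             x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3];
    }
  }
  for (; i < length; i++) { // leftover elements
    sum += x[i] * y[i];
  }
  return sum;
}
```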

Unrolled loops are not always faster. They generate larger binaries. They require more instruction decoding. They use more memory and instruction cache. Many processors have optimizations specific to small tight loops: manual loop unrolling that generates dozens of instructions within the loop tends to defeat these optimizations.

But why would unrolled loops be faster in the first place? One reason for their increased performance is that they lead to fewer instructions being executed.

Let us estimate the number of instructions that need to be executed with each iteration of the simple (rolled) loop. We need to load two values into registers. We need to execute a multiplication. And then we need to add the product to the sum. That is a total of four instructions. Unless you are cheating (e.g., by using SIMD instructions), you cannot do better than four instructions.

How many instructions do we measure per iteration of the loop? Using a state-of-the-art compiler (GNU GCC 8), I get 7 instructions. Where do these 3 extra instructions come from? We have a loop counter which needs to be incremented. Then this loop counter must be compared with the end-of-loop condition, and finally there is a branch instruction. These three instructions are “inexpensive”. There is probably some instruction fusion happening and other clever optimizations. Nevertheless, these instructions are not free.

Let us grab the numbers on an Intel (Skylake) processor:

amount of unrolling   instructions per pair   cycles per pair
1                     7                       1.6
2                     5.5                     1.6
4                     5                       1.3
8                     4.5                     1.4
16                    4.25                    1.6

My source code is available.

The number of instructions executed diminishes progressively (going toward 4) as the overhead of the loop becomes smaller and smaller due to unrolling. However, the speed, as measured in number of cycles, does not keep on improving: the sweet spot is an unrolling factor of about 4 or 8. In this instance, unrolling is mostly beneficial because of the reduced instruction overhead of the loop… but too much unrolling will eventually harm the processing.
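
One rough way to read the instruction counts is to assume the 4 instructions of useful work per pair and the 3 instructions of loop overhead counted earlier, with the overhead shared across the unrolled pairs. That gives about 4 + 3/k instructions per pair for an unrolling factor k. The function name below is hypothetical and the model is only a sketch: it matches the measurements for factors 1 and 2, while the measured counts for 4, 8 and 16 (5, 4.5 and 4.25) sit slightly above it.

```c
// Back-of-the-envelope model: 4 instructions of work per pair plus
// 3 instructions of loop overhead amortized over `unroll` pairs.
static double modeled_instructions_per_pair(unsigned unroll) {
  return 4.0 + 3.0 / (double)unroll;
}
```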

There are other potential benefits of loop unrolling in more complicated instances. For example, some loaded values can be carried between loop iterations, thus saving load instructions. If there are branches within the loop, it may help or harm branch prediction to unroll. However, I find that a reduced number of instructions is often in the cards.
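
As an illustration of carrying loaded values between iterations, consider summing the products of adjacent elements. This is a sketch with hypothetical function names, not code from the benchmark. Written naively, each iteration reads two elements; the unrolled version keeps the last element in a local variable so that each element is read only once. An optimizing compiler may well perform this transformation on its own.

```c
#include <stddef.h>
#include <stdint.h>

// Naive version: reads x[i] and x[i + 1] in every iteration.
static uint32_t adjacent_rolled(const uint32_t *x, size_t n) {
  uint32_t sum = 0;
  for (size_t i = 0; i + 1 < n; i++) {
    sum += x[i] * x[i + 1];
  }
  return sum;
}

// Unrolled by two, carrying the last loaded value in `prev`.
static uint32_t adjacent_carried(const uint32_t *x, size_t n) {
  if (n < 2) {
    return 0;
  }
  uint32_t sum = 0;
  uint32_t prev = x[0];
  size_t i = 1;
  for (; i + 1 < n; i += 2) {
    uint32_t a = x[i];
    uint32_t b = x[i + 1];
    sum += prev * a + a * b; // two products, two new loads
    prev = b;                // carried into the next iteration
  }
  for (; i < n; i++) { // at most one leftover element
    uint32_t cur = x[i];
    sum += prev * cur;
    prev = cur;
  }
  return sum;
}
```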

Science and Technology links (April 6th 2019)

  1. In a randomized trial where people reduced their caloric intake by 15% for two years, it was found that reducing calories slowed aging. This is well documented in animals, going all the way to worms and insects, but we now have some evidence that it applies to human beings as well. Personally I do not engage in either caloric restriction or fasting, but I am convinced it would be good for me to do so.
  2. What is the likely economic impact of climate change over the coming century? We do not know for sure. However, all estimates point to a modest impact, always significantly less than 10% of the size of the economy over a century while the world’s economy grows at about 3% a year.

    Clearly, 27 estimates are a thin basis for drawing definitive conclusions about the total welfare impacts of climate change. (…) it is unclear whether climate change will lead to a net welfare gain or loss. At the same time, however, despite the variety of methods used to estimate welfare impacts, researchers agree on the order of magnitude, with the welfare change caused by climate change being equivalent to the welfare change caused by an income change of a few percent. That is, these estimates suggest that a century of climate change is about as good/bad for welfare as a year of economic growth.

  3. Is the scientific establishment biased against women or not? Miller reports on new research showing that men tend to reject evidence of bias whereas women tend to reject contrary evidence.
  4. Technology greatly improved the productivity of farming. We are often told that the reason we did not see famines on a massive scale despite earlier predictions to that effect (e.g., by the Club of Rome) is the so-called Green Revolution. It seems that this is not well founded on facts:

    We argue a political myth of the Green Revolution focused on averted famine is not well grounded in evidence and thus has potential to mislead to the extent it guides thinking and action related to technological innovation. We recommend an alternative narrative: The Green Evolution, in which sustainable improvements in agricultural productivity did not necessarily avert a global famine, but nonetheless profoundly shaped the modern world.

  5. Sugar does not give your mood a boost. We do not feel more energetic after eating sugar.
  6. Though e-cigarettes are probably incomparably safer than actual cigarettes, people have been banning them on the ground that e-cigarettes might be a gateway toward cigarettes. They are likely wrong. If anything, e-cigarettes are probably a solution for people who have not managed to stop smoking by other means. They have been found to be a highly effective way to stop smoking. Thus e-cigarettes are likely saving lives; people who ban e-cigarettes despite the evidence should have to answer for the consequences of their choices.
  7. People who think that little boys are more physically aggressive than little girls because of how they are raised are likely wrong.
  8. I am impressed with the courage of these researchers: Oral sex is associated with reduced incidence of recurrent miscarriage (Journal of Reproductive Immunology, 2019).

Science and Technology links (March 30th 2019)

  1. As we age, we accumulate old and useless (senescent) cells. These cells should die, but they do not. Palmer et al. removed senescent cells in obese mice. They found that these mice were less diabetic and just generally healthier. That is, it appears that many of the health problems due to obesity might have to do with the accumulation of senescent cells.
  2. Europe is changing its copyright laws to force websites to be legally responsible for the content that users upload. In my opinion, copyright laws tend to restrict innovation. I also think that Europe is generally not interested in innovating: where is Europe’s Google or Europe’s Samsung?
  3. China is cloning police dogs.
  4. Do we create new neurons throughout life, or not? It remains a controversial question, but a recent article in Nature seems to indicate that neurogenesis in adult human beings is tangible:

    By combining human brain samples obtained under tightly controlled conditions and state-of-the-art tissue processing methods, we identified thousands of immature neurons in (…) neurologically healthy human subjects up to the ninth decade of life. These neurons exhibited variable degrees of maturation (…) In sharp contrast, the number and maturation of these neurons progressively declined as Alzheimer’s Disease advanced.

  5. Generally speaking, the overall evidence is that fit and healthy people tend to be smarter. It is a myth unsupported by science that the gym rat is dumb whereas the pale out-of-shape guy is smart. If you want to be smart, you had better stay fit and healthy. Evidently, this suggests that as you age, you may lose some of your intellectual sharpness. Cornelis et al. processed a large dataset of cognitive tests and they conclude that you are not losing your intelligence very much, at least until you reach a typical retirement age:

    declines in cognitive abilities between the end of the fourth decade and age 65 are small.

    In their experiments, fluid intelligence (basically our reasoning ability) did not change very much and sometimes increased over time. This apparently contradicts other studies based on smaller samples, and the authors discuss this apparent contradiction. Reaction time increased with age: older people are slower, everything else being equal.

Java is not a safe language

The prime directive in programming is to write correct code. Some programming languages make it easy to achieve this objective. We can qualify these languages as ‘safe’.

If you write in C++ without good tools, you are definitely in the ‘unsafe’ camp. The people working on the Rust programming language are trying to build a ‘safe language’.

Where does Java lie?

Back when Java was still emerging, I had been tasked with building a new image compression library. I designed a dual Java/C++ library. My client was a company providing medical services, but they had no use for the Java code. To this day, I think that they only use the C++ code.

When I tried to sell them a license to the Java code, I stressed that Java was safer, had automatic memory management and the like. Their top engineer looked at my Java code and spotted a potential memory leak. Yes, Java has memory leaks. You may have been told that it does not happen, but it happens all the time in real systems. We had a beer and a good laugh about it. Meanwhile, he had been able to prove that my C++ code was safe and did not have memory leaks.

In any case, most people would agree that Java is ‘safer’ than C++, but as my story illustrates, it is more of a statistical statement than a black-and-white one.

Is Java a safe language in 2019?

It is a time-dependent culturally-loaded question, but I do not think of Java as a safe language today. If ‘safety’ is your primary concern, then you have better options.

Let me review some examples:

  1. Java does not trap overflows. That is, if you are trying to count how many human beings there are on Earth using a Java ‘int’, incrementing the counter by one each time, the counter will overflow silently and give you a nonsensical result. Languages like Rust and Swift catch overflow. The Java standard library has some functions to guard against overflows, but they are not part of the language. As a related issue, Java promotes and converts types silently and implicitly. Can you guess what the following code will print out?
    short x = Short.MAX_VALUE;
    short y = 2;
    System.out.println(x + y);
    int ix = Integer.MAX_VALUE;
    int iy = 2;
    System.out.println(ix + iy);

    This type of behaviour leads to hard-to-catch bugs.

  2. Java allows data races, that is, it is possible in Java to have several threads accessing the same object in memory at the same ‘time’ with one thread writing to the memory location. Languages like Rust do not allow data races. Almost anyone who has programmed non-trivial Java programs has caused or had to debug a data race. It is a real problem.
  3. Java lacks null safety. When a function receives an object, this object might be null. That is, if you see ‘String s’ in your code, you often have no way of knowing whether ‘s’ contains an actual String unless you check at runtime. Can you guess whether programmers always check? They do not, of course. In practice, mission-critical software does crash without warning due to null values. We have two decades of examples. In Swift or Kotlin, you have safe calls or optionals as part of the language. Starting with Java 8, you have Optional objects in the standard library, but they are an afterthought.
  4. Java lacks named arguments. Given a function that takes two integer values, you have to write ‘f(1,2)’. But is it instead ‘f(2,1)’? How do you know that you got the parameters in the right order? Getting confused in the argument order is a cause of hard-to-debug problems. Many modern programming languages have named arguments.

Ultimately, I believe that while some programming languages make it easier to produce correct code than others, much of it comes down to good engineering practices. I would never go as far as saying that programming languages do not matter, but I bet that ‘who’ writes the software is a lot more important.

Hasty comparison: Skylark (ARM) versus Skylake (Intel)

In a previous post, I ran a benchmark on an ARM server and again on an Intel-based server.  My purpose was to indicate that if one function is faster, even much faster, on one processor, you are not allowed to assume that it will also be faster on a vastly different processor. It wasn’t meant to be a deep statement, but even simple facts need illustration. Nevertheless, it was interpreted as an ARM versus Intel comparison.

In the initial numbers that I offered, the ARM Skylark processor that I am using did very poorly compared to the Intel Skylake processor. Eric Wallace explained away the result:  The default compiler on my Linux CentOS machine appears to be unaware of my processor architecture (ARM Aarch64) and, incredibly enough, compiles the code down to 32-bit ARM instructions.

So let us get serious and use a recent compiler (GNU GCC 8) from now on.

And while we are at it, let us do a Skylark versus Skylake, ARM versus Intel, benchmark. I am going to pick three existing C programs from the Computer Language Benchmark Games:

  1. Binarytree is a memory access benchmark. The code constructs binary trees and must traverse them.
  2. Mandelbrot is a number crunching benchmark.
  3. Fasta is a randomized string generation benchmark.

The Skylark processor is from a 32-core box, the reported maximum frequency is 3.3GHz. The Skylake processor is from a 4-core box with a maximal frequency of 4GHz. Here are the numbers I get.

             Skylark (ARM)   Skylake (Intel)
Binarytree   80 s            16 s
Mandelbrot   15 s            24 s
Fasta        2.0 s           0.8 s

My benchmark is available.

What can we conclude from these numbers? Nothing except maybe that the Skylark box struggles with Binarytree. That benchmark is dominated by the cost of memory allocation/deallocation.

Let me try another benchmark, this time from the cbitset library:

               Skylark (ARM)   Skylake (Intel)
create         23 ms           4.0 ms
bitset_count   3.2 ms          4.4 ms
iterate        5.0 ms          4.0 ms

The “create” benchmark is basically a memory-intensive test, whereas the two other tests are computational. Again, it seems that the ARM server struggles with memory allocations.

Is that something that has to do with the processor or the memory subsystem? Or is it a matter of compiler and standard libraries?

Update: Though the ARM server runs a relatively recent CentOS distribution, it comes with an older C library. Early testing seems to suggest that this software difference accounts for a sizeable fraction (though not all) of the performance gap between Skylake and Skylark.

Update 2: Using ‘jemalloc’, the ‘Binarytree’ timing goes from 80 s to 44 s while the ‘create’ timing goes from 23 ms to 13 ms. This gives me confidence that some of the performance gap reported above between Skylake and Skylark is due to software differences.

Technological aging

We are all familiar with biological aging. Roughly speaking, it is the loss of fitness that most animals undergo with time. At the present time, there is simply not much you can do against biological aging. You are just not going to win any gold medals in the Olympics at age 65.

However, not all “aging” in human beings is biological.

There is what I would call “chronological aging”: the trivial fact that, with each passing day, you have been alive one more day. While biological aging might be reversed one day, it is a logical certainty that no amount of technology, except maybe time travel, can reverse chronological aging. Interestingly enough, technology could affect (but not reverse) chronological aging: if you go for a trip at close to the speed of light, your chronological aging will be slowed compared to the people you leave behind.

However, much of “aging” is actually social. For example, my children make fun of me since I cannot skateboard. That is true, I never learned to skateboard. However, I am quite convinced that I could learn. I might even like it. However, I am concerned about what people might think if I show up to a skate park with my skateboard.

More interesting to me is “technological aging”. It is the idea that with chronological age, people tend to fail to adopt new technologies up until the point where it becomes too hard for them to catch up.

It goes like this:

  1. You are up-to-date technologically in your teens and twenties.
  2. Some new technology is developed when you are in your thirties or beyond. Maybe it’s ebooks, ecommerce or Facebook.
  3. At first, this new technology is not very good or it is simply reasonable to consider it with suspicion. So you give it a pass. You choose not to adopt it. In any case, you are doing well with the technology you know.
  4. More new technologies come along, some of which build on the technology you did not adopt. It becomes increasingly tempting to give them a pass. Not only are you missing some of the foundations, but they are new and can be viewed with suspicion.
  5. Finally, after a few decades, you are disconnected technologically, incapable of keeping up.

Observe that technological aging is somewhat independent from biological aging. That is, imagine a society where we can rejuvenate anyone. So you reach the biological age of 30, and you are stuck there. You never have grey hair. Your skin remains youthful. You would still be able to tell how old someone is just by which technologies they choose to use.

Technological aging is not unique. There is a related concept which I call “cultural aging”. For example, we tend to prefer the music that came out when we were in our teens. I believe that the same effects are at play. New music or new styles come along, but you don’t embrace them because you already have your favorite music. Over time, you become increasingly disconnected.

In any case, the great thing about technological aging is that, unlike biological aging, I believe that it is largely reversible. You can adopt ebooks even if you are in your 60s. You can drop cable TV in favour of the Internet. You can stop defending the lecture as a mode of instruction and embrace YouTube and podcasts. However, it takes deliberate effort.

Science and Technology links (March 23rd 2019)

  1. Half of American households subscribe to “Amazon Prime”, a “club membership” for Amazon customers with monthly fees. And about half of these subscribers buy something from Amazon every week. If you are counting, this seems to imply that at least a quarter of all American households order something from Amazon every week.
  2. How do the preprints that researchers post online freely differ from genuine published articles that underwent peer review? Maybe less than you’d expect:

    our results show that quality of reporting in preprints in the life sciences is within a similar range as that of peer-reviewed articles

  3. Very low meat consumption might increase the long-term risk of dementia and Alzheimer’s.
  4. We appear to be no closer to finding a cure for Alzheimer’s despite billions being spent each year in research and clinical trials. Lower writes:

    Something is wrong with the way we’re thinking about Alzheimer’s (…) It’s been wrong for a long time and that’s been clear for a long time. Do something else.

  5. Many researchers use “p values” (a statistical measure) to prove that their results are “significant”. Ioannidis argues that most research should not rely on p values.
  6. Eating nuts improves cognition (nuts make you smart).
  7. As we age, we become more prone to diabetes. According to an article in Nature, senescent cells in the immune system may lead to diabetes. Senescent cells are cells that should be dead due to damage or too many divisions, but they refuse to die.
  8. Hospitalizations for heart attacks have declined by 38% in the last 20 years and mortality is at an all-time low. Though clinicians and health professionals take the credit, I am not convinced we understand the source of this progress.
  9. In stories, females identify more strongly with their own gender whereas males identify equally with either gender.
  10. Theranos was a large company that pretended to be able to do better blood tests. The company was backed by several granted patents. Yet we know that Theranos technology did not work. The problem we are facing now is that Theranos patents, granted on false pretenses and vague claims, remain valid and will hurt genuine inventors in the future. If we are to have patents at all, they should only be granted for inventions that work. Nazer argues that the patent system is broken.
  11. Smaller groups tend to create more innovative work, and larger groups less so.
  12. The bones of older people become fragile. A leading cause of this problem is the fact that stem cells in our bones become less active. It appears that this is caused by excessive inflammation. We can create the problem in young mice by exposing them to the blood serum of old mice. We can also reverse it in old mice by using an anti-inflammatory drug (akin to aspirin).
  13. Gene therapy helped mice regain sight lost due to retinal degeneration. It could work in human beings too.
  14. Based on ecological models, scientists predicted over ten years ago that polar bear populations would soon collapse. That has not happened: there may be several times more polar bears than decades ago. It is true that ice coverage is lower than it has been historically due to climate change, but it is apparently incorrect to assume that polar bears need thick ice; they may in fact thrive when the ice is thin and the summers are long. Crockford, a zoologist and professor at the University of Victoria, tells the tale in her book The Polar Bear Catastrophe That Never Happened.