Science and Technology links (July 21st, 2017)

Want proof that you live in the future? Ok. There is this “cryptocurrency” called Ethereum and it is causing a shortage of graphics processors:

Demand from Ethereum miners has created temporary shortages of some of the graphics cards, according to analysts, who cite sold-out products at online retailers. Estimates of additional sales from this demand run as high as $875 million, according to RBC Capital Markets analyst Mitch Steves. That would roughly equal AMD’s total sales from graphics chips last year, or half of Nvidia’s quarterly sales of those components.

This is all very strange.

It is not exactly known why hair turns gray as we age, but it is largely reported as an irreversible process linked to cells dying out. Yet time and time again, there are anecdotes of graying reversal. The latest one was published in a reputable journal (JAMA Dermatology), which even tweeted a picture. In that report, 14 cancer patients saw their hair pigmentation come back. That offers a powerful hint that we could reverse gray hair with the right therapy. Obviously, there are cheap and obvious ways to turn your hair any color you like at any age… but such reports remind us that there is much we do not yet understand.

Intel’s latest chip, the Core i9 X-series, can produce one teraflop of computing performance for about $2000. If you have 2 billion dollars, you can theoretically buy a million of these chips and assemble the first exascale supercomputer. Of course, you’ll also cause a massive power shortage in your neighborhood if you ever turn the thing on.

Jeff Bezos, the founder of Amazon, is 53, and so he was in his early 30s when he started his business. A picture of him offering a side-by-side comparison, 20 years ago and today, has been widely distributed. I would not contradict the current Jeff Bezos: he looks like he could break me in half.

Our brains are poor at repairing themselves. There is such a thing as in vivo neuroregeneration, but it is not widespread in the human body. Researchers have found that, by using the right electrical field, they could entice stem cells to relocate where repairs are needed and then differentiate appropriately.

We all know that we inherit our genes from our parents. Our cells then turn those genes on or off through a set of poorly understood mechanisms called epigenetics. This is necessary if only for cell differentiation: the cells from your brain have the same genes as the cells from your toes, but they express different genes. The older version of you has the same genes as the younger version, but the older you expresses more genes. It is believed that lifestyle can affect gene expression. If you starve all your life or exercise intensively, you will express different genes. But can this program of gene expression be passed on to your children? It seems that it can, at least in some specific ways. A recent article in Science makes a case for it:

Parents provide genetic information that guides the development of the offspring. Zenk et al. show that epigenetic information, in the form of the repressive mark H3K27me3, is also propagated to the offspring and regulates proper gene expression in the embryo. Preventing the propagation of maternally inherited H3K27me3 led to precocious gene activation and, ultimately, embryo lethality.

In the early days of the 19th century, there was debate as to how species evolved. How did giraffes get long necks? The commonly accepted view is that of Darwin: giraffes with longer necks tended to survive longer and to have more offspring, so that over time, giraffes acquired longer and longer necks, one generation at a time. There were theories that predate Darwinism, one of them by Jean-Baptiste Lamarck. Lamarck believed in soft inheritance: for example, he would have believed that if your parents are bodybuilders, you would inherit larger muscles. Lamarck’s view is now discredited, but if epigenetic markers can be passed on to offspring, then we would be forced to conclude that he was partly right. If you follow the logic of the Science article, it is conceivable that in a society of bodybuilders, kids could receive epigenetic markers that enhance muscle growth. I should point out that even if epigenetic markers are passed on, this does not call Darwinism into question: at most, it would show that Darwinism is an incomplete theory.

What is “modern” programming?

As a young teenager, I dabbled with BASIC and some assembly. Things got serious when I learned Turbo Pascal. “Now we are talking”, I thought. Turbo Pascal offered one of the earliest examples of an integrated development environment (IDE). In effect, an IDE is a program that lets you conveniently write, compile, debug and run code, all within a friendly environment. Turbo Pascal did not have much in the way of graphics (it was text-based), but it had menus and windows. You could enter a debugging mode, track the value of variables, and so forth.

Then I moved on to Delphi (a graphical Turbo Pascal), which had a superb IDE that would still look good today. I played with Visual Basic, designing a “talking clock” that I published on Bulletin Board Systems at the time (under Windows 3.1). Then I found out about Visual Studio… For years, my reference for C++ programming was Visual Studio. So it was all IDEs, all the time.

Smalltalk famously had powerful graphical IDEs back in the early 1980s.

My point is that using an IDE is not “modern”. The present is very much like the past. What we program has changed, but, in many instances, how we program has not changed. I have the latest Visual Studio on my Dell laptop. The man I was 20 years ago would be perfectly at ease with it. Debugging, code completion, code execution… it is much like it was. In fact, Visual Studio was never very different from Turbo Pascal. And I find this deeply depressing. I think we should make much faster progress than we are.

I submit to you that modern programming has little to do with the look of your desktop. Graphical user interfaces are only skin deep. Modern programming techniques are all about processes and actual tools, not the skin on top of them. I don’t care whether you are using Eclipse or Emacs… this tells me nothing about how modern you are.

So what is “modern”?

  • Coding is social. Twenty years ago, it was sensible to require everyone in your organization to use the exact same IDE and to rely solely on your IDE to build, test and deploy code… But there are lots of smart people outside your organization and they often do not use your IDE. And today you can reach them. This means that you must be wise regarding the tools and processes you adopt.

    If you mock people who program using the Atom text editor, Visual Studio or Emacs, you are not being social. You need to be as inclusive as possible, or pay the price.

  • The Go language comes with its own formatting tool. I don’t care whether you reformat your code automagically as you save, whether you click a button, or whether you type go fmt; it is all the same… and it is definitely a great, modern idea. It is progress. All programming languages should impose a single code format on their users. No more bikeshedding.

    So we are clear: Java had guidelines, but guidelines are insufficient. We need a tool that takes the code as input and generates a uniquely defined output where everything is dealt with, from line length to spaces.

    The goals are that there is never any possible argument as to how the code should be formatted and that the correct format is produced without effort. I cannot tell you how important that is.

  • Programming languages like Rust, Go, Swift… come with their own package management system. So, in Swift, for example, I can create a small text file called Package.swift and put it at the root of my project, where I declare my dependencies…
    import PackageDescription
    
    let package = Package(
        name: "SwiftBitsetBenchmark",
        dependencies: [
       .Package(url: "https://github.com/lemire/SwiftBitset.git",  
              majorVersion: 0),
       .Package(url: "https://github.com/lemire/Swimsuit.git",  
              majorVersion: 0)
        ]
    )
    

    (Source example.)

    Then I can type swift build and the software will automatically grab the dependencies and build my program. And this works everywhere Swift runs. It does not matter which text editor or IDE you are using.

    You don’t want to use a text editor, and you prefer to use a graphical interface? Fine. It makes no difference.

    Why is that modern? Because automatically resolving dependencies with so little effort would have looked like magic to me 20 years ago. And it is immensely important to resolve dependencies automatically and systematically. I do not want to ever have to manually install and deploy a dependency. I want other people to be able to add my library to their project in seconds, not minutes or hours.

    Yes, you can add it to existing languages (e.g., as Maven or IDEs do with Java), but there needs to be a unique approach that just works.

  • Programming languages like Go, Swift and Rust support unit testing from the start. In Go, for example, create a file myproject_test.go and add functions like func TestMyStuff(t *testing.T), then type go test and that is all. Twenty years ago, hardly anyone tested their code; today it is an absolute requirement, and it needs to be done in a uniform manner so that you can move from project to project and always know how the tests are run.

    If I cannot spot sane unit tests in your code right away, I will assume that your code is badly broken.

  • Continuous integration: as code changes, you want a remote tool to grab the new code and test it… so that a regression can be caught early. It is not enough that people can run tests on your code; they also need to see the results of automated testing and examine any failures for themselves.

    Continuous integration is part of a larger scheme: you must automate like crazy when you program. Manual labor should be minimized. And sometimes that means that you really ought to only click on a button, but what it should never mean is that you repeatedly need to follow a complicated sequence of commands, whether through a graphical user interface or through a command shell.

  • Version control. Twenty years ago, it made sense to write your code on your desktop and send the new code (as patches) by email. But this only makes sense when the pace of collaboration is slow. Today, this would be insane. Anything less than Git is backward. Note that even Microsoft builds Windows using Git today.

So what happens when you work with smart students who never learned about modern programming? They look at a command like go get and they only see the skin (a command line). They think it is backward. Where are the flashy graphics?

They work within a nice-looking IDE like Visual Studio or Eclipse and they are convinced that they are “modern”, totally oblivious to the fact that IDEs go back decades. And then instead of using the IDE for its strengths, such as better affordance and faster operations, and adopting modern programming techniques elsewhere… they stick with old-school programming:

  • No tests. At least, no automated and systematic tests.
  • Hard dependencies on a specific setup.
  • No automation. No continuous integration. No automated deployment.

They are programming just like I did when I started out decades ago with Turbo Pascal. It is very much old school, despite the graphical user interfaces.

Science and Technology links (July 14th, 2017)

PC shipments are at the lowest level of the last 10 years, and they have been declining for the last two years.

Using smartphone data, researchers are able, for the first time in history, to measure objectively how active people are. Lots of prior research relied on questionnaires, but self-reported numbers are often much less objective than direct measures. A recent paper in Nature reports on their findings based on this better methodology. I find it interesting that people in Asia are often very active, followed by Europe… while people in America are much less active. The researchers also report that there is a lot of inequality within countries as to how active people are, especially among the female population.

We need some perspective: I think that activity data is only the start. In the near future, we might be able to monitor the health of millions of people, and we might arrive at major breakthroughs by analyzing the resulting data.

Affordable, consumer-grade virtual reality (VR) started out about a year ago, in 2016. The initial hardware releases were better than I was hoping. However, the software is lacking, so the overall uptake is much weaker than what I was hoping for. I think we are in a chicken-or-the-egg dilemma, with software developers unwilling to invest massively due to the lack of hardware penetration, and consumers unwilling to invest in hardware due to the lack of software.

It is not hopeless, however. The hardware is good, but it is complicated and still relatively expensive. This week, Facebook dropped the price of the Oculus Rift and its controller to US$400. You still need a gamer PC, which limits the market somewhat despite the price drop. Regarding the software, I am told that a huge market for the likes of the Oculus Rift is porn, and it is a booming market. Sadly, I don’t have nearly as much experience with VR porn as I should. Still, if games can’t drive VR, I am sure porn can.

Bloomberg predicts that Facebook might release a $200 wireless headset in 2018, and we know that HTC will release a wireless Vive headset of its own. However, we don’t just need “wireless”, we also need “standalone”. It would be fine for high-quality headsets to come with a base of some sort, but the need to purchase, maintain and upgrade a Windows PC is too much. Not everyone is a gamer or a porn consumer.

Another key point that I missed in the first year of the current VR revolution is the resolution issue. The current high-end headsets (HTC Vive and Oculus Rift) are quite good at giving you the illusion that you are in a virtual world… but because of the relatively low resolution of the headsets, you are embodied in a character that has weak eyesight. This is apparent if you try to use VR as a virtual office: reading a document in VR is a painful experience… you have to bring the document right up to your face to be able to read comfortably. On the PlayStation VR headset, you can play non-VR games, but the experience is disappointing because of the low resolution. The resolution must become much higher than it is currently. You should be able to look at a TV screen in VR and get a 1080p experience; this means that the headset itself must have a resolution much, much higher than 1080p. Or we need some kind of technological trick.

The American FDA is about to approve what amounts to an anti-cancer gene therapy.

Apparently, President Trump has been complimenting the French first lady’s body. She is 64.

There is no computer that comes close to reaching one exaflop. It is often believed that if we had such a computer, we could run a full human brain simulation and, possibly, achieve human-like intelligence in a digital computer. Currently, the most powerful computers are all in China. The US government is apparently trying to stir things up by throwing money at the cause.

The difficulty in reaching one exaflop with our current technology is power usage. That is, we could probably build a one-exaflop computer right now, but we could not continuously power it up. To get some idea… a medium-size power station might generate something like 500 MegaWatts. Currently, you can buy graphics cards that produce a teraflop for 500 Watts. An exaflop is a million teraflops, so you would need a million of these powerful cards, or 500 MegaWatts. I might be off by an order of magnitude, but the point is that you would need a power station just to power your exaflop computer. Hardware is only as useful as the software that runs on it… and if you need a whole power station to keep the hardware running, chances are that you are not going to be given much freedom to run software experiments. So a one-exaflop computer that costs a fortune to run is probably nearly useless. For all the money invested in massive computers, I cannot recall any discovery or breakthrough that followed.
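
For the record, the arithmetic goes as follows: a million cards at 500 Watts each is 10^6 × 500 W = 5 × 10^8 W = 500 MW, which is the output of our hypothetical medium-size power station.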

Pruning spaces faster on ARM processors with Vector Table Lookups

Last week, I asked how fast one could remove spaces from a string using ARM processors. On a particular benchmark, I got 2.4 cycles per byte using regular (scalar) code and as little as 1.8 cycles per byte using ARM NEON instructions. These are “vectorized instructions” that you find in virtually all ARM processors. Vectorized instructions operate over wide registers (spanning at least 16 bytes), often executing the same operation (such as addition or multiplication) over several values at once. However, my trick using ARM NEON instructions relied on the fact that my input stream would contain few spaces. So it was not a very positive blog post for ARM processors.

But then I got feedback from several experts such as Martins Mozeiko, Cyril Lashkevich and Derek Ledbetter. This feedback made me realize that I had grossly underestimated the power of ARM NEON instructions. One reason for my mistake is that I had been looking at older ARM NEON instructions instead of the current AArch64 instructions, which are much more powerful.

To recap, on an x64 processor, you can remove spaces from strings very quickly using vectorized instructions in the following manner (a code sketch follows the list):

  • Compare 16 bytes of input characters with white space characters to determine where (if anywhere) there are white space characters.
  • The result of the comparison is itself a 16-byte register, where matching characters have the byte value 255 whereas non-matching characters have the byte value 0. Turn this vector register into a 16-bit integer value by “downsampling” the bits. This can be achieved by a “movemask” instruction, present in all x64 processors since the introduction of the Pentium 4 a long time ago.
  • From this mask, compute the number of white space characters by counting the 1s. This can be done with the popcnt instruction.
  • From this mask also, load up a “shuffling register” that tells you how to reorder the bytes so that white space characters are omitted. Then use what Intel and AMD call a “shuffling instruction” (pshufb introduced with the SSSE3 instruction set many years ago) to quickly reorder the bytes.
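
Putting these four steps together, here is a minimal C sketch of the approach (the despace_shufmask table is a hypothetical precomputed lookup table mapping each 16-bit mask to a shuffle control; my actual x64 code, available on GitHub, differs in the details):

#include <tmmintrin.h> // SSSE3 intrinsics
#include <stdint.h>
#include <stddef.h>

// hypothetical table: for each 16-bit mask, a 16-byte shuffle control that
// moves the non-white-space bytes to the front of the register
extern const uint8_t despace_shufmask[65536][16];

size_t sse_despace(char *bytes, size_t howmany) {
  size_t pos = 0;
  for (size_t i = 0; i + 16 <= howmany; i += 16) {
    __m128i x = _mm_loadu_si128((const __m128i *)(bytes + i));
    // a byte is white space when max(x, 32) == 32, i.e., when x <= 32 (unsigned)
    __m128i white =
        _mm_cmpeq_epi8(_mm_max_epu8(x, _mm_set1_epi8(32)), _mm_set1_epi8(32));
    int mask = _mm_movemask_epi8(white);  // 16-bit mask, one bit per white-space byte
    int count = __builtin_popcount(mask); // number of white-space bytes
    __m128i shuf = _mm_loadu_si128((const __m128i *)despace_shufmask[mask]);
    _mm_storeu_si128((__m128i *)(bytes + pos),
                     _mm_shuffle_epi8(x, shuf)); // compact the block in one instruction
    pos += 16 - count;
  }
  return pos; // length of the pruned prefix (remaining tail bytes go to scalar code)
}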

I thought that the same could not be done with ARM NEON, but I was wrong. If you have access to recent ARM processors (supporting AArch64), then you can closely mimic the x64 processors and get good performance.

Let us review the various components.

To start, we can compare 16 byte values against the value 33 (that is, ‘ ’ + 1) to quickly identify common white space characters such as the space, the line feed, the carriage return and so forth.

// returns 0xFF for bytes greater than 32, that is, for non-white-space bytes
uint8x16_t is_nonwhite(uint8x16_t data) {
  return vcgeq_u8(data, vdupq_n_u8(' ' + 1));
}

ARM NEON has convenient “reduce” instructions, so I can sum up the values of a vector. I can put this to good use to quickly compute how many matching characters I have:

// keep only the most significant bit of each byte, then sum across the vector:
// this counts the bytes whose value is 0xFF
uint8_t bytepopcount(uint8x16_t v) {
  return vaddvq_u8(vshrq_n_u8(v, 7));
}

To compute a 16-bit mask, I also use such a reduce function after computing the bitwise AND of my comparison with some convenient vector (which allows me to distinguish which characters match)…

// emulate the x64 movemask: AND each lane with a distinct bit pattern, then
// sum across the vector so that each comparison byte contributes its own bit
uint16_t neonmovemask_addv(uint8x16_t input8) {
  uint16x8_t input = vreinterpretq_u16_u8(input8);
  const uint16x8_t bitmask = { 0x0101, 0x0202, 0x0404, 0x0808,
                               0x1010, 0x2020, 0x4040, 0x8080 };
  uint16x8_t minput = vandq_u16(input, bitmask);
  return vaddvq_u16(minput);
}

Finally, I call a Vector Table Lookup instruction which is pretty much equivalent to Intel/AMD’s shuffle instruction:

uint8x16_t w = is_nonwhite(data);                        // 0xFF marks the bytes we keep
int mask16bits = neonmovemask_addv(w);                   // one bit per input byte
uint8x16_t shuf = vld1q_u8(shufmask + 16 * mask16bits);  // shufmask: precomputed table
uint8x16_t reshuf = vqtbl1q_u8(data, shuf);              // gather the kept bytes in one instruction
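
Roughly, the remaining step stores the compacted bytes and advances the output position by the number of bytes kept (a sketch; see the full source code for the exact details):

vst1q_u8((uint8_t *)bytes + pos, reshuf); // write the compacted block in-place
pos += bytepopcount(w);                   // advance by the number of kept bytes
i += 16;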

Of course, I am not explaining everything in detail. My full source code is available. All you need is access to a recent ARM processor with Linux running on it, and you are all set to run it.

It turns out that we can nearly double my previous best score:

scalar                        1.40 ns
NEON (old code)               0.92 ns
NEON (Vector Table Lookup)    0.52 ns

Better still, my new code is effectively branchless: its performance is not very sensitive to the input data.

Using the fact that I know the clock speed of my processor, I can make a quick comparison in terms of CPU cycles per input byte…

                                      ARM (A57)    recent x64
scalar                                2.4 cycles   1.2 cycles
vectorized (NEON AArch64 and SSSE3)   0.88 cycles  0.25 cycles

(The source code for x64 processors is available on GitHub.)

What is interesting is that we are getting under one cycle per input byte, which is the kind of performance that is difficult to achieve with scalar code that writes byte values one by one. It is still the case that the ARM NEON code is over three times slower than the equivalent on x64 processors, but I am using a relatively weak core (an A57 on a SoftIron Overdrive 1000) and my code might be subject to further optimization.

Science and Technology links (July 7th, 2017)

People magazine recently named Julia Roberts, who is 49, as the World’s Most Beautiful Woman.

Volvo plans to commercialize self-driving cars by 2020, and to go all-electric by 2019. France will ban petrol cars by 2040.

The Fermi paradox is the idea that we ought to see intelligent life in the universe, since it is so vast… yet we have no evidence for it. Sandberg et al. claim that there is no paradox because the probability of life is simply too small: we might be a unique or nearly unique case.

According to an article in Nature, caffeine helps to fight obesity in mice.

The New York Times has an article about how tech companies have successfully lobbied schools to include computer science in their curriculum. I have mixed feelings about this entire story. I think we should resist the temptation to think that because learning to program can be highly beneficial for some, then many people should learn it. It is just not true that we will have millions of programmers in 20 years. Programming is and will remain a specialized activity.

There is a lot of talk about cancer vaccines, where your immune system is geared up to fight the specific kind of cancer you have. It is worth repeating that we are nowhere near a cure for cancer.

Scientists claim to have cured Alzheimer’s (in mice): “The drug completely erased evidence of Alzheimer’s synapse damage and memory loss in mouse models of the disease.”

Concerned with the poor quality of modern-day science, Vazire writes that “the drive for eminence is inherently at odds with scientific values”.

Are your strings immutable?

A value is immutable if it cannot change.

Immutability is a distinct notion from that of a constant. The speed of light in a vacuum is believed to be a universal constant, for example. Constants are immutable in the sense that they cannot change. However, immutability refers to values, not to the assignment of values. For example, the number 3 is immutable. However, if I say that your rank is 3, this rank could change. That’s because your rank is a variable, and variables may change their values even if these values are immutable.

That is, a variable may change its value to point to a different immutable value. That’s a somewhat confusing point for non-programmers. For example, my name is “Daniel”. To say that strings are immutable is to say that I cannot change the string “Daniel”. However, I can certainly go see the government and have my first name changed so that it is “Jack”. Yet this change does not modify the string “Daniel”. If I could change the string “Daniel” then, possibly, all individuals named “Daniel” would see their name changed.

So in the world around us, values are typically immutable. And that’s largely why it is believed that immutable values are safer and easier.

Working with mutable values requires more experience and more care. For example, not only does changing the string “Daniel” affect all people named “Daniel”, but what if two people try to change the string at the same time?

So integer values are always immutable, not only in real life but also in software. There is no programming language where you can redefine the value “3” to be equal to “5”.

Yet I believe that most programming languages in widespread use have mutable arrays. That is, once you have created an array of values, you can always change any one of the entries. Why is that? Because immutability could get costly as any change to an immutable array would need to be implemented as a copy.

Arguably, the most important non-numeric type in software is the string. A string can be viewed as an array of characters so it would not be unreasonable to make it mutable, but strings are also viewed as primitive values (e.g., we don’t think of “Daniel” as an array of 6 characters). Consequently, some languages have immutable strings, others have mutable strings. Do you know whether the strings in your favorite language are mutable?

  • In Java, C#, JavaScript, Python and Go, strings are immutable. Furthermore, Java, C#, JavaScript and Go have the notion of a constant: a “variable” that cannot be reassigned. (I am unsure how well constants are implemented and supported in JavaScript, however.)
  • In Ruby and PHP, strings are mutable.
  • The C language does not really have string objects per se. However, we commonly represent strings as a pointer char *. In general, C strings are mutable. The C++ language has its own string class. It is mutable.

    In both C and C++, string constants (declared with the const qualifier) are immutable, but you can easily “cast away” the const qualifier, so the immutability is weakly enforced. (A small example follows this list.)

  • In Swift, strings are mutable.

    However, if you declare a string to be a constant (keyword let), then it is immutable.
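
To make the C case concrete, here is a minimal sketch (the variable names are hypothetical):

#include <stdio.h>

int main(void) {
  char name[] = "Daniel";       // a mutable array holding a copy of the string
  name[0] = 'J';                // legal: the string is now "Janiel"
  printf("%s\n", name);

  const char *fixed = "Daniel"; // points to a string literal
  // ((char *)fixed)[0] = 'J';  // casting away const compiles, but writing
                                // to a string literal is undefined behavior
  return 0;
}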

Pruning spaces from strings quickly on ARM processors

Suppose that I give you a relatively long string and you want to remove all spaces from it. In ASCII, we can define spaces as the space character (‘ ’) and the line-ending characters (‘\r’ and ‘\n’). I am mostly interested in algorithmic and performance issues, so we can simplify the problem by removing all byte values less than or equal to 32.

In a previous post where I asked how quickly we could prune spaces, the best answer involved vectorization using 128-bit registers (SSSE3). It ends up being between 5 and 10 times faster than the naive approach.

Conveniently enough, ARM processors all have 128-bit vector registers, just like x64 processors. So can we make ARM processors go as fast as x64 processors?

Let us first consider a fast scalar implementation:

size_t i = 0, pos = 0;
while (i < howmany) {
    char c = bytes[i++];
    bytes[pos] = c;           // always write the byte back...
    pos += (c > 32 ? 1 : 0);  // ...but only advance when it is not white space
}

This prunes all character values less than or equal to 32, writing the data back in-place. It is very fast.

Can we do better with vector instructions? Vector instructions are instructions supported by virtually all modern processors that operate over wide registers (16 bytes or more).

On x64 processors, the winning strategy is to grab 16 bytes of data, quickly compare against white space characters, then extract a mask (or bitset) value made of 16 bits, one bit per character, where each bit indicates whether the value found is a white space. The construction of such a bitset is cheap on an x64 processor, as there is a dedicated instruction (movemask). There is no such instruction on ARM processors. You can emulate movemask using several instructions.

So we cannot proceed as we did on x64 processors. What can we do?

Just like with SSSE3, we can quickly check whether byte values are less than or equal to 32, thus identifying white space characters:

// returns 0xFF for bytes that are white space (values <= 32), 0 otherwise
static inline uint8x16_t is_white(uint8x16_t data) {
  const uint8x16_t wchar = vdupq_n_u8(' ');
  uint8x16_t isw = vcleq_u8(data, wchar);
  return isw;
}

Next we can quickly check whether any of the 16 characters is a white space, by using about two instructions:

// narrow the 16-byte comparison result and return it as one 64-bit value:
// the result is non-zero if and only if at least one byte matched
static inline uint64_t is_not_zero(uint8x16_t v) {
  uint64x2_t v64 = vreinterpretq_u64_u8(v);
  uint32x2_t v32 = vqmovn_u64(v64);
  uint64x1_t result = vreinterpret_u64_u32(v32);
  return result[0];
}

This suggests a useful strategy. Instead of comparing characters one by one, compare 16 characters at once. If none of them is a white space character, just copy the 16 characters back to the input and move on. Otherwise, we fall back on the slow scalar approach, with the added benefit that we do not need to repeat the comparison:

const uint8x16_t justone = vdupq_n_u8(1);
uint8x16_t vecbytes = vld1q_u8((uint8_t *)bytes + i);
uint8x16_t w = is_white(vecbytes);
uint64_t haswhite = is_not_zero(w);
// white-space bytes are 0xFF in w; adding 1 maps them to 0 and other bytes to 1
uint8x16_t w0 = vaddq_u8(justone, w);
if (!haswhite) {
  // no white space in this block: copy the 16 bytes and move on
  vst1q_u8((uint8_t *)bytes + pos, vecbytes);
  pos += 16;
  i += 16;
} else {
  // fall back on the scalar approach, reusing the comparison results
  for (int k = 0; k < 16; k++) {
    bytes[pos] = bytes[i++];
    pos += w0[k];
  }
}

Most of the benefit from this approach comes when you can expect 16-byte blocks to rarely contain white space characters. This seems like a good guess in many applications.

I wrote a benchmark where I try to estimate how long it takes to prune spaces, on a per character basis, using input data where there are few white space characters, placed at random. My source code is available, but you need an ARM processor to run it. I ran the benchmark on a 64-bit ARM processor (made of A57 cores). John Regehr has a few more benchmarks on this same machine. I think these are the same cores that you find in the Nintendo Switch.

scalar    1.40 ns
NEON      0.92 ns

The technical specification is sparse. However, the processor runs at 1.7 GHz as one can verify by using perf stat. Here is the number of cycles per character we need…

                              ARM (A57)    recent x64
scalar                        2.4 cycles   1.2 cycles
vectorized (NEON and SSSE3)   1.6 cycles   0.25 cycles

(The source code for x64 is available on GitHub.)

In comparison, on an x64 processor, the scalar version uses something like 1.2 cycles per character, which would put the ARM machine at half the performance of a recent x64 processor on a per cycle basis. That is to be expected, as the A57 cores are hardly meant to compete with recent x64 processors on a cycle per cycle basis. However, with SSSE3 on an x64 machine, I manage to use as little as 0.25 cycles per character, which is more than 5 times better than what I can do with ARM NEON.

This large difference comes from an algorithmic difference. On x64 processors, we are relying on the movemask/pshufb combo and we end up with a branchless algorithm involving very few instructions. Our ARM NEON version is much less powerful.

There is a lot to like about ARM processors. The assembly code is much more elegant than the equivalent with x86/x64 processors. Even the ARM NEON instructions feel cleaner than the SSE/AVX instructions. However, for many problems, the total lack of a movemask instruction might limit the scope of what is possible with ARM NEON.

But maybe I underestimate ARM NEON… can you do better than I did?

Note: The post has been edited: it is possible on 64-bit ARM processors to reshuffle 16 bits in one instruction as one of the commenters observed.

Note: I get better performance in a follow-up blog post.

Science and Technology links (July 1st, 2017)

Canada is 150 years old today.

The iPhone is 10 years old this year. We can safely say that the iPhone 7 is over a hundred times faster, in almost every way, than the original iPhone. Very few things get 100 times better over 10 years: you have to improve the performance by about 60% each and every year.
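
(Checking the arithmetic: a hundredfold improvement over ten years requires a yearly factor of 100^(1/10) ≈ 1.585, which is indeed about 60% per year.)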

Though mammals like us can heal injuries, there is often scarring. Scarring should be viewed as imperfect healing. It is not just a matter of looks: scars make your tissues less functional. As far as skin healing is concerned, scientists have found a way to cause skin to heal without scarring, at least in mice.

Essentially, we can manipulate wound healing so that it leads to skin regeneration rather than scarring, (…) the secret is to regenerate hair follicles first. After that, the fat will regenerate in response to the signals from those follicles. (…) regenerating fat cells in skin can be beneficial for conditions beyond scarring. The process could potentially become a new anti-aging treatment, as the formation of deep wrinkles is thought to result from permanent loss of skin fat.

It seems that fasting (going without food) could be a key to regenerating your immune system:

The study has major implications for healthier aging, in which immune system decline contributes to increased susceptibility to disease as people age. By outlining how prolonged fasting cycles — periods of no food for two to four days at a time over the course of six months — kill older and damaged immune cells and generate new ones, the research also has implications for chemotherapy tolerance and for those with a wide range of immune system deficiencies, including autoimmunity disorders.

Chimpanzees are not that much stronger than we are:

But now a research team reports that contrary to this belief, chimp muscles’ maximum dynamic force and power output is just about 1.35 times higher than human muscle of similar size, a difference they call “modest” compared with historical, popular accounts of chimp “super strength,” being many times stronger than humans.

Human beings are optimized for high endurance:

The flip side is that humans, with a high percentage of slow-twitch fibers, are adapted for endurance, such as long-distance travel, at the expense of dynamic strength and power. When we compared chimps and humans to muscle fiber type data for other species we found that humans are the outlier, suggesting that selection for long distance, over-ground travel may have been important early in the evolution of our musculoskeletal system

So how do you fight a chimpanzee? I would guess that getting the fight to last as long as possible is your best bet as a human being. The chimpanzee will get exhausted first. So I would probably either keep the chimpanzee at bay or run away. If the chimpanzee pursues, I would just wear him down.

A few weeks ago, there was an article in Nature claiming that human lifespan is limited to 115 years. Very few of us can hope to ever reach 115 years of age at the present time, but the question is whether that will change. Some people believe that 115 years of age is a hard limit that cannot be exceeded. Several scientists have now issued counterpoints. Siegfried Hekimi from McGill University (Montreal) says that…

You can show the data are compatible with many different trajectories and not at all an ongoing plateau (…) by extending trend lines, we can show that maximum and average lifespans could continue to increase far into the foreseeable future. (…) If this trend continues and our life expectancy of the average person becomes 100, the longest person might make it to 150 (…)

Jim Vaupel from the Max Planck Institute writes:

The evidence points towards no looming limit. At present the balance of the evidence suggests that if there is a limit it is above 120, perhaps much above – and perhaps there is not a limit at all.

Maarten Rozing from the University of Copenhagen writes about a biological clock limiting our lifespan:

We now know not only that the idea of such a clock is highly implausible, but also that ageing is proving to be more amenable to change than used to be supposed

The rebuttals can be found in Nature.

Of course, the real answer at this point is that we do not know how long human beings could live. This being said, Yuval Noah Harari makes a compelling case in his book Homo Deus: A Brief History of Tomorrow that Homo sapiens has reached the end of the line. Very solid arguments can be made that, say, in 100 years, there won’t be any Homo sapiens left on the planet. So it is entirely possible that we will never find out how long Homo sapiens could live.

Video game review… Nier: Automata

Single-player RPGs are having a tough time. Last year I reviewed Deus Ex: Mankind Divided. Though I felt it was an excellent game, it was not a commercial success and it seems that there will not be a follow-up game in the series in the foreseeable future. More recently, I reviewed Mass Effect: Andromeda. I felt that it was a very solid game, but occasional poor writing and some botched graphical models opened up the game to harsh criticism. Again, it looks like Mass Effect might come to an end because of poor sales.

I am currently playing another single-player RPG, this time from Japan, Nier: Automata. Sales-wise, it looks to be one of the top-10 games of all time on the PlayStation 4, so it is doing quite well.

The game mechanic itself is very much that of an old-school game. In fact, a fair amount of time is spent playing the game as if it were a two-dimensional shooter. Otherwise, the game plays quite a bit like a typical and classical action RPG “à la Zelda”.

The game looks good, but it is quite simple, even simplistic. There are only so many different enemy types. Most of the map looks the same. The 3D models are crude at times though always effective. The layouts are simplistic. I get the impression that the game engine must be simple. This gives the game an old-school look and feel. I also suspect that this means that the game is a massive success financially for its producers. A game like Mass Effect: Andromeda has a sophisticated design, with finely tuned non-trivial combat mechanics and lots of massive unique environments, so it has to be far more expensive to develop.

You play as an android that has two modes of attack that can be used simultaneously. Firstly, there is a drone that follows you around, and you can order this drone to shoot continuously at enemies. Given that most enemies, including bosses, have a hard time damaging you if you stay far away, this drone almost trivializes the game. There are entire boss fights that you can win by jumping up a ledge and just having your drone shoot the enemy down. It helps that you have infinite ammunition. Secondly, you can use melee weapons like swords. That’s where the game gets interesting: though your melee weapons can cause a lot of damage, they also open you up to receiving a lot of damage. There is real skill involved in fighting powerful enemies up close.

Because you are an android, you can reprogram yourself by acquiring new programs. For example, you can make it so that whenever your health falls under a threshold, you automatically heal yourself using one of your “healing potions”. You can also make it so that after receiving some damage, you become invincible for a second or two. Combining these two programs is enough to make you, for most purposes, invincible… as long as you have enough “healing potions”… but these are cheap and widely available in stores.

When I first started playing, I paid little to no attention to these programs, nor did I pay much attention to my choice of weapon. However, it ends up making a critical difference, at least at the default difficulty level.

There are no automatic save points, so you can die and have to restart the game from the beginning. You have to think about saving. If you die, your body will remain where you died, along with some of your gear. You can retrieve it by playing again and getting back to your body.

Playing the game requires some skill, but at the default difficulty level, I only ever had trouble with one part of the game… there is a crazy boss at some point, “the Opera boss”, a giant lady with an armored dress. And I suspect that I had so much trouble because I did not understand the game very well.

Not everything is absolutely smooth. Several times I was left wondering where I was supposed to go and what I was supposed to do, but I never got stuck long enough to be annoyed.

I have done an entire first playthrough, but the game has this weird mechanic whereby you are supposed to beat the game several times, and each time you do so, you get to see a different side of the story. Looking at the Wikipedia entry for the game, it seems that I will need to play through the game at least two more times to really see the bulk of the story.

The music of the game really adds a lot to the experience. To be honest, I suspect that I play just to be immersed in the music and aesthetic of the game. I find it relaxing.

Though I have not played through the entire game, I know enough to appreciate the story and the theme. The game is set in our far future. It is supposedly very, very far in our future but, oddly, city structures are still standing, more or less intact. There is no human being anywhere, though you are told that they reside on the Moon, unseen. You are an android that looks like a young human being, but there are cruder robots all over the surface of the Earth. The crude robots are your enemies, sometimes. Supposedly, there is a war going on between the crude robots and the androids, but looks can be deceiving.

It is probably most accurate to describe the story as being about the post-human era on Earth. Human beings are gone, but intelligent machines remain behind. It is very reminiscent of Stross’ Saturn’s Children. Though everybody around is a machine, you get to care for them, very much so.

That’s maybe the surest sign that the game is a success: you care for the characters, even if they are machines that can be rebooted at will. It is saying a lot, because I don’t normally empathize easily with Japanese characters, as I find Japanese culture a bit too strange. So while the game is simple, it is skillfully made.

If you ever liked playing Zelda, and you don’t mind something a bit more serious where Zelda could die, this is a game for you.

Science and Technology links (June 23rd, 2017)

Elon Musk, Making Humans a Multi-Planetary Species, New Space. June 2017, 5(2): 46-61.

Reportedly, Ikea is working on augmented reality software that would allow you to see the furniture in your home before buying it.

Current virtual reality headsets provide a good experience, but if you have ever tried to read text while wearing one of these headsets, you may have noticed that it is quite hard. It seems that this is because the image resolution is too low. When using prototypes with very high resolution, you can “read text across the room”.

You would think that a tree that is over 200 years old would have accumulated a lot of (random) genetic mutations. However, it seems that this is not the case: as trees grow old, even very old, they preserve their genes intact. We do not know why or how.