How quickly can you remove spaces from a string?

Sometimes programmers want to prune out characters from a string of characters. For example, maybe you want to remove all line-ending characters from a piece of text.

Let me consider the problem where I want to remove all spaces (‘ ‘) and linefeed characters (‘\n’ and ‘\r’).

How would you do it efficiently?

size_t despace(char * bytes, size_t howmany) {
  size_t pos = 0;
  for(size_t i = 0; i < howmany; i++) {
      char c = bytes[i];
      if (c == '\r' || c == '\n' || c == ' ') {
      bytes[pos++] = c;
  return pos;

This code will work on all UTF-8 encoded strings… which is the bulk of the strings found on the Internet if you consider that UTF-8 is a superset of ASCII.

That’s simple and should be fast… I had a blast looking at how various compilers process this code. It ends up being a handful of instructions per processed byte.

But we are processing bytes one by one while our processors have a 64-bit architecture. Can we process the data by units of 64-bit words?

There is a somewhat mysterious bit-twiddling expression that returns true whenever your word contains a zero byte:

(((v)-UINT64_C(0x0101010101010101)) & ~(v)&UINT64_C(0x8080808080808080))

All we need to know is that it works. With this tool, we can write a faster function…

uint64_t mask1 = ~UINT64_C(0) / 255 * (uint64_t)('\r');
uint64_t mask2 = ~UINT64_C(0) / 255 * (uint64_t)('\n');
uint64_t mask3 = ~UINT64_C(0) / 255 * (uint64_t)(' ');

for (; i + 7 < howmany; i += 8) {
    memcpy(&word, bytes + i, sizeof(word));
    uint64_t xor1 = word ^ mask1;
    uint64_t xor2 = word ^ mask2;
    uint64_t xor3 = word ^ mask3;

    if (haszero(xor1) ^ haszero(xor2) ^ haszero(xor3)) {
      // check each of the eight bytes by hand?
    } else {
      memmove(bytes + pos, bytes + i, sizeof(word));
      pos += 8;

It is going to be faster as long as most blocks of eight characters do not contain any white space. When this occurs, we are basically copying 64-bit words one after the other, along with a moderately expensive check that our superscalar processors can do quickly.

As it turns out, we can better without using 64-bit words.

// table with zeros at indexes corresponding  to white spaces
int jump_table[256] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  0, 1, 1, 0, 1,  ...};

size_t faster_despace( char* bytes, size_t howmany ) {
  size_t i = 0, pos = 0;
  while( i < howmany )
    bytes[pos] = bytes[i++];
    pos += jump_table[ (unsigned char)bytes[pos] ];
  return pos;

This approach proposed by Robin Leffmann is very clever, and it is fast because it avoids penalties due to mispredicted branches.

Can we do even better? Sure! Ever since the Pentium 4 (in 2001), we have had 128-bit (SIMD) instructions.

Let us solve the same problem with these nifty 128-bit SSE instructions, using the (ugly?) intel intrinsics…

__m128i spaces = _mm_set1_epi8(' ');
__m128i newline = _mm_set1_epi8('\n');
__m128i carriage = _mm_set1_epi8('\r');
size_t i = 0;
for (; i + 15 < howmany; i += 16) {
    __m128i x = _mm_loadu_si128((const __m128i *)(bytes + i));
    __m128i xspaces = _mm_cmpeq_epi8(x, spaces);
    __m128i xnewline = _mm_cmpeq_epi8(x, newline);
    __m128i xcarriage = _mm_cmpeq_epi8(x, carriage);
    __m128i anywhite = _mm_or_si128(_mm_or_si128(xspaces, 
                xnewline), xcarriage);
    int mask16 = _mm_movemask_epi8(anywhite);
    x = _mm_shuffle_epi8(
          x, _mm_loadu_si128(despace_mask16 + mask16));
    _mm_storeu_si128((__m128i *)(bytes + pos), x);
    pos += 16 - _mm_popcnt_u32(mask16);

The code is fairly straight-forward if you are familiar with SIMD instructions on Intel processors. I have made no effort to optimize it… so it is possible, even likely, that we could make it run faster. My original SIMD code had a branch but Nathan Kurz realized that it was best to simplify the code and remove it.

Let us see how fast it runs!

I designed a benchmark using a recent (Skylake) Intel processor over text entries where only a few characters are white space.

regular code5.5 cycles / byte
using 64-bit words 2.56 cycles/byte
with jump table 1.7 cycles/byte
SIMD (128 bits) code0.39 cycles / byte
memcpy0.08 cycles / byte

So the vectorized code is nearly 14 times faster than the regular code. That’s pretty good.

Yet pruning a few spaces is 5 times slower than copying the data with memcpy. So maybe we can go even faster. How fast could we be?

One hint: Our Intel processors can actually process 256-bit registers (with AVX/AVX2 instructions), so it is possible we could go twice as fast. Sadly, 256-bit SIMD instructions on x64 processors work on two 128-bit independent lanes which make the algorithmic design more painful.

Leffmann’s approach is not as fast as SIMD instructions, but it is more general and portable… and it is still three times faster than the regular code!

My C code is available.

Best programming language for high performance (January 2017)?

I keep hoping that the field of programming language will evolve. I am a bit tired to program in Java and C… I’d like better languages.

I am particularly interested in what I generally call “high-performance programming”. I want to pick languages where I can get the most out of my hardware. It is fine to program in JavaScript and Python, but there comes a time where you tackle large problems on hardware that is barely sufficient.

  • For my purposes, I am going to narrow it down to popular “system programming languages” which means Swift, Rust, Go, C++, C. These are languages that well-suited to build more than applications, but also servers, database engines and so forth. They all offer ahead-of-time compilation, which makes the performance easier to reason about.

    I think there will be no disagreement that C and C++ and system programming languages. Whether Go and Swift qualify is somewhat subjective. Would you write an operating system in Go? I would not. But Wikipedia seems to be happy to consider them all as system programming languages.

  • I am going to ignore assembly because I can’t imagine using it all the time. It is simply not practical unless you are working on a tiny code base (maybe on embedded software).
  • I am going to ignore Rust, D and Nim as they have too few users. Programming is a social activity: if nobody can use your code, it does not matter that it runs fast. I think that they are very interesting languages but until many more people start coding in these languages, I am going to ignore them. Fortran is out for similar reasons: it still has important users, but they are specialized.
  • I am not including non-system languages such as JavaScript, PHP, Matlab, Ruby, R, Python because though they can be fast, they are simply not generally well suited for high performance. They are typically “fast enough” for specific purposes, but they also have hard limitations. For example, JavaScript and Python appear unable to support natively multithreaded execution. (Spawning new processes does not count.)
  • Java and C# are interesting cases. They have been both described and used as “system programming languages”. I think that the evidence is overwhelming that Java can be used to build significant systems that perform very well. However, I think that running in a virtual machine (JVM, CLR) is a hard limitation: you can never be “bare metal”. I lump Scala with Java.
  • There is a countless number of niche languages that I have not mentioned. Most of them will never pick up any amount of users.

Let me run through a few desirable features for high-performance programming along with the current languages that fare well.

  • Operating systems are built in C or C++. Moreover, many important libraries have C interfaces. This means that being able to call C functions with minimal overhead is important in some instance. While Go allows you to call C functions, there is considerable overhead in terms of performance. All other programming languages can call C with minimal overhead.
    • Obvious winners: C, C++
    • Good players: Swift
    • Loser: Go
  • Though this is controversial, I think that a language with manual memory management may have the upper hand. In modern systems, managing memory is often a major overhead. C and C++ have manual memory management. Swift has reference counting which is a form of automated memory management, but with the advantage of being purely deterministic: when your data goes out of scope, the memory is reclaimed. However, reference counting does have an overhead, especially in a multithreaded context. Go has generational garbage collection (like Java or C#) which means that “it stops the world” at some point to reclaim memory, but the Go engineers have optimized for short pauses. That’s a good thing too because “stopping the world” is not desirable. I don’t know which approach I prefer: Swift or Go.
    • Winners: C, C++
    • Losers: Go, Swift
  • Our computers have many cores. Parallel programming is important. We would like our languages to support elegantly multicore programming. I think that Go has clearly the upper hand. Of course, for example, it is possible to do everything Go does from C as far as multicore programming, but not as easily. As far as I can tell, Swift has one standard way to use multiple cores: Grand Central Dispatch. It also has Operation/OperationQueue for higher level multi-threading.  It might be fine, but it does not look on par with what I find in other languages.
    • Special mention: Go
    • Winners : C++, C
    • Loser: Swift
  • For high performance, a good programming language should have “easy to reason about performance”. This means that when looking at a piece of code, you should have a good appreciation for how fast it will run. Honestly, this has gotten very hard over time, forcing us to rely more and more on benchmarking. However, simpler languages like C and Go makes it easier to reason about performance. Simply put, the more expansive the language, the harder it gets to compile it in your head.
    • Winners: C, Go
    • Losers: Swift, C++
  • Our processors have fancy new instructions, like SIMD instructions (AVX2, Neon…). These should be readily available from the programming language. No programming language is easy to program with something like SIMD instructions, but C and C++ clearly come on top. In Go, you can write functions in assembly and call them from the language, but it comes with a discouraging performance overhead. Swift is barely better (there is a simd package, but it only works under macOS).
    • Winners: C, C++
    • Losers: Go, Swift
  • It has been decades and there is still no universal way to manage dependencies in C. There is also no standard build mechanism. Sure, your favorite IDE can make it easier to add a dependency and build your program, but having something built into the language makes life much easier. In Go and Swift, there is a standard, cross-platform way to build programs, run tests and add dependencies.
    • Winners: Go, Swift (special mention for Go)
    • Losers: C, C++

Let me summarize my analysis with a table:

C C++ GoSwift
Call C Yes Yes SlowFair
Bare metal memory Yes Yes No No
Cores Yes Yes Great Maybe (GCD)
Simple syntax Yes No Yes No
Safety Minimal Minimal Yes Yes
SIMD Yes Yes Slow Maybe (macOS only)
Tools None None GreatGood

If I were hired to build high-performance programming, I’d be happy to be “forced” into any one of these languages. They are all fine choices. They are all improving at a fast pace. I think that Go, for example, has improved tremendously since I first learned it.

Predicting the future job market: the librarians

People spend a lot of time worrying that robots and computers are going to wipe out all jobs. My belief is that the job market is a lot more complex and simplistic reasonings (“better robots means fewer jobs”) are likely misleading.

I recently observed that despite the widespread introduction of automatic bank tellers in the 1970s, there are more human bank tellers in the US today than ever before. People argue that it has to do with regulations… but that’s not a valid counterargument: more and more jobs are opened due to regulations. If you have a model of the job market of the future and it ignores government regulations, politics, and demography… then your model is probably leading you astray.

I thought I’d pick another example. Librarians working for public libraries. Back in 1990, if you wanted to know something, you often had to go to a public library and search through the catalogs for a reference book. Librarians would work hard to index each and every book in every library. Even so, looking up information in a library was not easy so librarians were often on call to answer questions.

Large organizations often had librarians. Certainly, all major corporations and large government organizations had librarians. They were highly valued, well-paid workers.

Then something happened. The Internet and the Web became ubiquitous. I don’t know how many people still go to the library to look up information, but I’d guess that the percentage is small. For my own research, I no longer go to a library. I still refer my students to the University library, but I don’t think many of my students ever talk to a librarian.

So how did librarians fare?

In 1990, there were 21,000 accreditated librarians in the US working for public libraries (ALA-MLS) and there were 32,000 in 2011. A 50% increase…

What is the US government predicting for the next ten years?

(…) the increased availability of electronic information is expected to increase the demand for librarians in research and special libraries, where patrons will need help sorting through the large amount of digital information.

So it seems like the Web is not wiping out librarians.

It seems to me that we might still have as many librarians as we do today in 50 years. Why not? When I look at colleges around me, none of them are closing their libraries. I have not heard any plans regarding the closing of libraries. My local town just built a brand new library…

I’d argue that librarians have largely lost the purpose they had in 1990. They are no longer the gatekeepers of scholarly information… but jobs are tricky things. They often exist for a variety of reasons, and the primary stated purpose of a job might not be so essential.

If your model of the future job market cannot account for the increases in the number of librarians and bank tellers… then what is it good for?

Betting against techno-unemployment

There are millions of truck drivers in the US today. In particular, there are about 1.7 million tractor-trailer (human) drivers. There are many more professional truck drivers in the US, but tractor-trailer drivers are the “archetypal truck drivers”.

In 2016, we had several demonstrations of self-driving tractor-trailer trucks. A self-driving tractor-trailer truck delivered beer for Uber in October.

Putting two and two together, many people are worried that we are soon going to face a wave of technology-related unemployment. You can’t, realistically, retrain over a million tractor-trailer drivers into Facebook engineers.

However, when jobs disappear slowly, it does not always translate into unemployment. For example, people tend to leave naturally an industry even when they still have a job. This occurs through retirements and deaths, but also in various other ways. Maybe someone gets tired of the long hours involved when driving trucks, and they go for an office job. Some people decide to start a small business. And so forth. Moreover, when the job prospects are poor, fewer people tend to enter an industry. There may be no need to fire anyone.

This kind of attrition is likely happening to journalism today. In the past, every town had its independent newspaper. Newspapers were our primary information source about the world. Today, fewer and fewer of us read newspapers and those who do tend to be older. The bulk of the journalism jobs were never at the New York Times, they were always in a myriad of small papers. These small papers are mostly obsolete. Many prospered through classified ads, but newspapers classified ads are a relic of the past. The few small newspapers that remain can be run with a much smaller crew. So how did the workforce fare? We went from half a million (in 1990) to less than half that (180,000) in 2016. Of course, over the same period of time, about 200,000 new jobs were created in Internet publishing, a closely related industry. So while there are fewer journalists today, it took 16 years to see the disappearance of 2/3 of all jobs… and the Internet created many related new jobs. All in all, it was not a good time to be a journalist, but it did not create a national emergency.

Journalism has been something of an extreme case. The Internet really disrupted journalism. If we look at other industries, like the automotive industry, we see much more modest job losses (like 20% over 15 years). Such modest job losses are still enough to create some pain, but most employees did fine.

There are many more truck drivers than there ever were journalists. It is almost an order of magnitude in difference. And while journalists can find new jobs in the vast fields of communications and entertainment… I don’t think anyone would know what to make of a million unemployed truck drivers.

Yet, even if, eventually, all tractor-trailer drivers are computers, it does not follow that we will experience techno-unemployment in the sense that those in the profession will see their careers destroyed. We need a sudden fall in the number of jobs.

Calum Chace wrote an excellent book “The Economic Singularity: Artificial intelligence and the death of capitalism“. As the title of his book suggests, Calum predicts some major disruption.

In particular, Calum predicted the following in an email to me:

By 2030, there will be fewer than 250,000 tractor-trailing drivers in the US, according to official government statistics.

The year 2030 is 13 years away… To fall down to 250,000 drivers from 1.7 million in 13 years, we would need to experience a 13% decline every year until then… It is the equivalent of the disappearance of 85% of all jobs… something much more severe than what journalism underwent. Something much, much more severe than what the automotive industry underwent.

I have decided to place a bet against Calum. He wins if during any year, up till the year 2030 (inclusively), the number of drivers fall under 250,000. I win otherwise. We each put $100 on the bet, with the money going to a charity chosen by the winner.

Though a nice theory, techno-unemployment is not currently occuring, despite all the technological progress we see. It is an idea, but not necessarily something that we can observe in the real world.

I think that there are a few good reasons for why that is.

  • There is a wide gap between what is technically possible and what we can achieve in the field. For example, we could pretty much automate at a large scale how medical data in handled. Yet very few countries did it, and few countries are in the process of doing it. Let us recall the fiasco where the US federal government deployed a fairly simple web application to help people find affordable insurance.

    So even if we can build and sell perfectly safe self-driving tractor-trailer trucks today… one should not underestimate the difficulties that we might have to deploy this technology. In fact, I can imagine a scenario where 2030 come and pass, we still do not see self-driving tractor-trailer trucks on the roads as a matter of course.

    Did you know that New Jersey and Oregon, two otherwise reasonable American states, ban self-service gas stations? Yet self-service gas stations were aggressively introduced back in the 1970s. How long do you think it will be before all American states allow self-driving tractor-trailer trucks?

    As far as I can tell, subways worldwide could easily be automated. It is simply not that hard to automate a subway train. Yet, according to Wikipedia, there are only two automated train networks in Canada.

  • Even when good automation technology does get deployed… if the industry grows sufficiently fast, it leaves ample room for professionals to earn a good living. For example, there are more human bank tellers employed in the US right now than there ever were. Part of it is due to regulations, but it would be impossible without a massive expansion of the banking industry.

    It is entirely possible that automation will be used mostly to grow the industry. There could still be a million truck drivers by 2030, but maybe they will focus on the less frequent routes while the most intense routes are automated.

Generally speaking, my view is that we are not going to be able to automate fast enough. In most high-income countries, we have aging populations. We are going to run out of nurses willing to work for wages people can afford. I am hoping that health care will improve at an accelerated rate so that we can start getting the diseases of aging under control in 20 years… but even if we do, we simply won’t have an ample supply of new eager employees.

My thesis is that techno-unemployment should be the last of our worries. Soon enough, we will live in countries where a quarter of the population is at an age where we expect people to be retired. It is already the case in Japan.

On the one hand, this should please people who fear overpopulation, as it means decreasing population (people over the age of 65 tends to have few children). On the other hand, this means few young new employees. We will need all the robots we can get.

We need self-driving trucks because we are going to run out of people to drive them.

Can your C compiler vectorize a scalar product?

If you have spent any time at all on college-level mathematics, you have probably heard of the scalar product:

float scalarproduct(float * array1, float * array2, size_t length) {
  float sum = 0.0f;
  for (size_t i = 0; i < length; ++i) {
    sum += array1[i] * array2[i];
  return sum;

Most people who need to do fancy things like dot products or matrix multiplications use hand-tuned libraries… but there are very reasonable reasons for the average programmer to come up with code that looks like a scalar product.

Vendors like Intel know that scalar products are important, so they have dedicated instructions to speed-up the computation of the scalar product. The approach taken by Intel involves SIMD instructions. SIMD instructions work over vectors of several words (such as 8 floats) at once. Using intrinsics, you can write your own hand-tuned scalar product…

float avx_scalarproduct(float * array1, float * array2, 
          size_t length) {
  __m256 vsum = _mm256_setzero_ps();
  size_t i = 0;
  for (; i + 7 < length; i += 8) { // could unroll further
    vsum = _mm256_fmadd_ps(_mm256_loadu_ps(array1 + i),
                       _mm256_loadu_ps(array2 + i),vsum);
  float buffer[8];
  float sum = buffer[0] + buffer[1] + 
                      buffer[2] + buffer[3] + 
                      buffer[4] + buffer[5] + 
                      buffer[6] + buffer[7];
  for (; i < length; ++i) {
      sum += array1[i] * array2[i];
  return sum;

Wow. Who wants to deal with code that looks so messy? With more effort, I can improve the performance further, but simply unrolling, but the code looks even more bizarre.

So is it worth the effort?

No. The GNU GCC compiler is able to compile my simple pure-C function to code that is as good as my hand-tuned function. (One caveat: you have to compile the code with the -ffast-math flag.) And the code produced by LLVM’s clang is more than twice as fast as my “hand-tuned” code (it corresponds to a fully unrolled version).

The source code of my full benchmark is available.

We can examine the assembly directly and verify that clang makes generous use of the vfmadd231ps instruction.

The fully unrolled scalar product code (see my GitHub repo) runs at a speed of about 0.2 cycles per pair of input floats. That’s what the clang compiler can produce and it is an order of magnitude faster than the regular code.

Why does any of this matters?

The larger question is: how do we scale up our software so that it can benefit from advanced parallelism? We know that multicore processing only helps so much: adding new cores to a processor adds a lot of complexity. It is really hard to test multicore software. Vectorization (through SIMD instructions) is conceptually much simpler. There are some minor downsides to wide vector instructions, such as the fact that we end up more register space and more expensive context switching, but it is simply not comparable to the difficulties inherent to multicore parallelism.

The fact that optimizing compilers are pretty good with SIMD instructions makes these instructions very compelling. We have ease of programming and performance in one nice bundle. In 2017, Intel should ship commodity processors with AVX-512 support, meaning that single instructions can process 512-bit registers. Soon enough, your compiler will take your good old code and multiply its speed, without any input on your part.

Note: This blog post was motivated by a Twitter exchange with Maxime Chevalier. My original claim was that the Swift compiler could probably automatically vectorize matrix multiplications, if I write it carefully. It is certainly true when I use a C compiler, but I don’t know how to get the same result in the Swift language given that I don’t know whether there is an equivalent to the -ffast-math flag in Swift. Thankfully, Swift code can call C without overhead, so you can probably simply rewrite your performance-sensitive floating-point code in C and all it from Swift. Or maybe the Swift engineers could give us -ffast-math in Swift?

The threat of technological unemployment

There is a widely reported threat to our economy: robots are going to replace human workers. It is nothing new… In 1930, Keynes, the famous economist introduced the term “technological unemployment”…

We are being afflicted with a new disease of which some readers may not yet have heard the name, but of which they will hear a great deal in the years to come—namely, technological unemployment. This means unemployment due to our discovery of means of economising the use of labour outrunning the pace at which we can find new uses for labour.

There is a long Wikipedia article on technological unemployment, where you can learn that people have been concerned about it for centuries… long before we could even imagine robots and computers. It is a recurrent theme of western thought.

Personally, I am unconcerned by technological unemployment as a threat. On the long run, I think that we can create jobs out of thin air if we need to (something I call “transemployment”). I strongly disagree with people like Tyler Cowen who think that regular folks are doomed.

We should first recognize the thesis for what it is. Currently, most people get most of their income through jobs. But jobs provide a lot more than just income. They also provide meaning and a social status. In some countries, like the USA, people often get access to critical health services through their employment. Unemployed people are more likely to be sick or depressed, they are less likely to be seen as viable mates… and so forth. We live in a society where even billionaires have “jobs”. If human labor were to become wholly unnecessary, we are going to substitute for it through transemployment. That is, we will create work that is not strictly needed. I think that this process is well under way. But how will we pay for it all? Well. Consider that if human labor becomes unnecessary, it follows that the wealth is being created by the robots, so it is still there. Some authors fear that technology will result in a radical concentration of wealth, the like of which we have never seen… while a few people will be super wealthy, all of us will slowly starve. Except that fewer people than ever in history are starving! It is true that people without a high school education in the US earn much less than the more educated. And there is an income gap building up. However, in the US, the less educated also enjoy a much higher amount of leisure time. Some of us envy the CEOs, but let us not forget that many of them work 60 hours or more. Do we really want all to focus our lives almost entirely on “jobs” forever and ever? Can’t we imagine a world where jobs are relatively secondary aspects of our lives? Maybe in the future, when we speak with teenagers, we will never mention employment as something they need to be concerned with.

I think that we can’t automate fast enough. I work for a university where too many people spend too much time on routine work, filling out the same forms day after day. We can’t seem to outrun bureaucracy with automation. Hospitals can’t find enough qualified nurses willing to work at modest wages: we need to more automation in health care, and we need it now. China is running out of young factory workers, they can’t get the robots in place soon enough.

I really do wish that we were much further along regarding technological unemployment. My bias as a techno-optimist is to be bullish on technology… but let us look at the signs… is technological unemployment something that we can observe around us?

  • Though the economy goes up and down, the unemployment rate worldwide is low. In is under 5% in the US, and even lower in Japan. I don’t think that people advocating imminent technological unemployment have a good explanation for this data point. Yes, some people get discouraged and drop out of the job market, thus artificially lowering the unemployment rate, but that has always been true.
  • In the USA, the participation rate, that is the fraction of people who work in the entire population is lower than it was 10 years ago by about 3%. In the last 10 years, the participation rate has been flat in France and in Japan, declining in the US and Canada, while it has been rising in Germany. The participation rate varies greatly from country to country. It is 56% in France, 60.5% in Japan and Germany, 63% in the US and 66% in Canada. It seems difficult to reason about it in absolute terms.

    In the US, a lot of people in their “prime age” who are not working are sick in some way….

    Participation in the labor force has been declining for prime age men for decades, and about half of prime-age men who are not in the labor force (NLF) may have a serious health condition that is a barrier to work. Nearly half of prime age NLF men take pain medication on a daily basis, and in nearly two-thirds of cases, they take prescription pain medication. The labor force participation rate has stopped rising for cohorts of women born after 1960. Over the past decade, retirements have increased by about the same amount as aggregate labor force participation has declined. Continued population aging is expected to reduce the labor force participation rate by 0.2 to 0.3 percentage point per year over the next decade. (Krueger, Where Have All the Workers Gone? 2016)

    But sickness and disability affect disproportionally older people.

    Young people outside of the workforce are sometimes “discouraged”, meaning that they do not seek employment because they see no hope of getting a good job. The number of discouraged people in the US has been slowly going down since its latest peak in 2010.

  • Firm investments in high-tech equipment and software is falling, relatively speaking. That’s hardly what you’d expect in an economy where robots are taking over? Of course, this is partially explained by the rapidly falling cost of some technologies.
  • Worker productivity has been increasing more or less monotonically for the last few decades but it has recently been declining. A technological boost akin to what might lead to technological unemployment should be accompanied by a rise in worker productivity.
  • When I was born, everyone dealt with a living and breathing bank teller. These days, I go see a bank teller maybe once a decade. They have been “replaced” by machines. But did you know that there are more bank tellers in the US than ever before? And I’d argue that bank tellers today have more interesting work, doing a lot less in terms of routine work.

Overall, there is scant evidence that we are undergoing a technological-unemployment crisis, if only because unemployment rates are low.

Don’t let the experts define science!

We know more than we can tell. We all know what a democracy is… We know that France, Canada, the USA, Japan… are democracies… Russia and China are not democracies. However, I have never seen a definition of “democracy” that fits the actual use of the term. One formal definition is “government by the people, exercised either directly or through elected representatives”. But, of course, both China and Russia have elected representatives. And they will certainly claim to be governments “by the people and for the people”.

Formal definitions are less useful than you think.

Science is another term that defies formal definitions.

The philosopher Karl Popper offered us the concept of falsifiability to distinguish scientific statements. Something is falsifiable if it could be proven false. For example, I can say that the Moon was constructed the god Zeus. Could I test this statement and prove it wrong? Probably not. Thus it is not a scientific statement.

Notice that falsifiability is entirely general and has nothing with the natural sciences. For example, I can tell you that the unemployment rate in the US will skyrocket to 20% in 2018. That’s a falsifiable prediction. What is not falsifiable is to say that “because we elected Donald Trump to the presidency, the unemployment rate will skyrocket…”.

So falsifiability is a useful ingredient of science… but it does not define science, not anymore that elections define democracy. For example, there are highly paid financial analysists whose job is to predict economic indicators for the coming years. Their predictions are thoroughly falsifiable, but we do not consider them to be scientists.

People often say that science relies on the so-called scientific method. You start with a hypothesis and then you try to falsify it. There is no denying that it does happen in science… but that’s not what scientists “do”. What most scientists do is to try things out, without a clearly stated hypothesis… then they look at the results… if they find something that they can’t explain… they design more experiments to try to deepen their understanding.

The major fault of the “scientific method” is that it assumes too much on the part of the scientists. A clearly stated hypothesis is often the end result, not the starting point.

Let me take an example. Maybe I think that coffee causes cancer. You’d think that it might serve as my hypothesis, but as stated it is not falsifiable. What do I mean by “cause cancer”? I certainly do not mean that drinking coffee will lead you to have a stomach cancer the next day. So I will start to collect statistics about coffee and cancer. Maybe I will setup experiments with mice that somehow drink coffee. Eventually, after years, I might end up with a hypothesis that others can try to falsify. Maybe.

So though the concept of falsifiability is central to the sciences, the scientific method is not.

Another relevant scholar was Thomas Kuhn. Kuhn believed that scientific truth is not an objective fact. We have a collection of beliefs that are proving useful and that are robust enough to fit the facts as we see them. So far so good… but then Kuhn added a dichotomy… there is “normal” science where people do boring stuff under the existing belief system… and then there is “revolutionary” science that seeks to replace the existing belief system.

So you had silly Newtonian physicists… and Einstein appeared, threw everything overboard and forced people to rethink their long-held assumptions. Kuhn’s view is compatible with the Heroic Theory of Scientific Development. Scientists collectively stumble in the dark until a hero comes in to save the day.

Kuhn’s revolutionary science makes a good story… but I don’t think that’s how science works. In retrospect, we can tell stories about what happened and how it happened… but that’s not nearly as useful as it sounds.

If really science followed two distinct models, then you’d be able to tell, as it happens, whether this scientist pursues “normal science” whereas this other scientist pursues “revolutionary science”. But life isn’t quite so simple.

You see… the reason the Heroic Theory of Scientific Development is false is that “ideas have people” (as opposed to people having ideas)… that is, ideas evolve and spread on their own… As long as the fertile ground is available, strong ideas thrive… and it is only in retrospect, in an illusionary manner, that we pin down when exactly an idea came to be.

Why did most scientific progress come from Europe and North America? It is not because white men are smarter. It is because of the European culture, maybe through Judeo-Christianity, created fertile ground for strong new ideas.

That Kuhn was wrong matters because… because if he were right then you could turn around science funding and simply identify the “revolutionary science” and support that instead of wasting time with “normal science”. But, of course, if there was some way to tell apart “normal science” from “revolutionary science”, we would know.

Let us take a recent computer science “breakthrough”: deep learning. Companies like Google use deep learning to translate documents, recognize the content of pictures, speech recognition and so forth. It is the fastest area of growth right now in computer science (or it must be). We have gone back in time and granted great scientists like Yann Lecun the label of “science revolutionary” because deep learning took hold of them some time ago. (And, to be clear, people like Lecun should be celebrated.) But what actually happened?

  • We have had the idea of neural networks since… nearly forever. Though I was far from anything having to do with machine learning, I kept ending up in conferences covering neural networks in the last few decades. The field has grown and there have been lots of interesting theoretical developments… But it can easily be qualified as a whole lot of “normal science”.
  • Our computing hardware and software are a lot better, a lot faster than they were 20 years ago. What would have required a supercomputer worth billions in the 1990s can fit on the desk of any software engineer today. How did we get there? Through a whole lot of “normal science” and “normal engineering”.
  • Companies like Google have a lot more data than any company ever had. It is hard to think clearly about it, but we are talking about orders of magnitude in difference. How did we get there? Again, lots of patient, “normal work”.

So the dichotomy between normal and revolutionary science is about as scientific as the study of history… which is to say, not at all.

If the subversion of beliefs is to be qualified as “revolutionary”, then we may as well give up on science. Science needs the freedom to put in question our beliefs at every step. If you are not constantly rethinking what you think you know, you are not doing science.

There is, however, a useful dichotomy between non-scientific and scientific work. Lots of government-supported scientists, including some celebrated ones, do work that should never be qualified as “science”. Also, most scientists are guilty, at one point or another, of being lacking in ambition. That is, they pursue work that has little chance of being proven useful. But I would not call it “normal science”, I would call it “boring science”.

Most science out there is boring. The interesting parts are done by a small minority of researchers. But ambitious science is not the same as Kuhn’s revolutionary science. There is no need to work on a paradigm shift. For example, you can be a cancer scientist. On paper, your goal is to cure cancer. You can take the easy path and study one of the millions of mechanisms that have some relevance to cancer. You can then publish an endless stream of papers. Or you can set yourself as a goal to actually cure cancers. And then you can aim for clinical trials and so forth.

Thankfully, if you know where to look, science is still super exciting. But it tends to happen only on a fertile ground… of which we have a limited supply.

What is fertile ground for science? Feynman described science as the belief in the ignorance of experts. What does it mean? It means that science can only thrive where experts can be questioned. Prior to the emergence of science (in Europe), truth was mostly whatever your ancestors and the highest figure of authority said it was. And that’s not a bad way to go about it. There is a lot of collected wisdom in your culture. But relying on your culture to determine truth is not science.

That’s why people who say that something is true because most scientists think it is true (e.g., with respect to climate change) have little understanding of what science really is. Scientific truth is not established by a vote among scientists. It does not matter how nice and popular your idea is… if it does not fit the facts on the ground, then it must be rejected. For example, biologists taught us that human beings have 24 chromosomes, up until the 1950s, even though it was visible that there were only 23.

If you live in a culture where questioning the established truth will get you in trouble, then you are not on a scientific fertile ground. So you need freedom (including freedom of speech).

Freedom of speech is a delicate thing, but an integral component of science. When we silence someone, it often only harms the few at first while it can greatly benefit the rich and the powerful. But if we could not stand up and speak up to contradict the authorities, we could not pursue science for long.

To sum it up: I don’t think we can define formally what science is. It is a useful cultural construct that emerged out of Europe. It entails falsifiable ideas, and people working in a culture where it is possible to question and subvert widely held beliefs.

Performance overhead when calling assembly from Go

The Go language allows you to call C functions and to rewrite entire functions in assembly. As I have previously documented, calling C functions from Go comes with a significant overhead. It still makes sense, but only for sizeable functions, or when performance is irrelevant.

What about functions written in assembly? To illustrate the performance constraints, I am going to use an example designed by Jason Aten.

Recent Intel processors have an instruction (tzcnt) that counts the “number of trailing zeroes” of an integer. That is, given a non-zero unsigned integer, you count the number of consecutive zeros starting from the least significant bits. For example, all odd integers have no trailing zero, all integers divisible by 4 have at least two trailing zeros and so forth. You might reasonably wonder why anyone would care about such an operation… it is often used to iterate over the 1-bit in a word. It is useful in data compression, indexing and cryptography.

A fast way to compute the number of trailing zeros without special instructions is as follows… (see Leiserson and Prokop, Using de Bruijn Sequences to Index a 1 in a Computer Word, 1998)


where table is a short array of bytes that fits in a cache line…

var table = []byte{
	0, 1, 56, 2, 57, 49, 28, 3, 61,
        58, 42, 50, 38, 29, 17, 4, 62, 
        47, 59, 36, 45, 43, 51, 22, 53,
        39, 33, 30, 24, 18, 12, 5, 63, 
        55, 48, 27, 60, 41, 37, 16, 46,
        35, 44, 21, 52, 32, 23, 11,54,
        26, 40, 15, 34, 20, 31, 10, 25,
        14, 19, 9, 13, 8, 7, 6,

Such a function is going to be fast, using only a handful of machine instructions and running in a handful of cycles. Still, Intel’s tzcnt instruction is superior, as it is a single instruction of a cost comparable to a single multiplication. Roughly speaking, we could expect tzcnt to be twice as fast.

Go can call tzcnt through a function written in assembly. So it seems that Go can easily make use of such instructions. Sadly no. Based on Aten’s code, I designed a benchmark. I am using a test server with a Skylake processor running at a flat 3.4 GHz, and my benchmark measures the instruction throughput. I think that the results speak for themselves:

pure Go (de Bruijn)3.55 cycles/call
assembly11.5 cycles/call

In this instance, the function that calls tzcnt (and does little else) runs at nearly half the speed of the pure Go function. Evidently, Go does not take the assembly and inline it.

Programmers have asked the Go authors to inline assembly calls, but there seems to be little support from the core Go team for such an approach.

My point is not that you can’t accelerate Go code using functions written in assembly but rather that if the function is tiny, the function-call overhead will make the performance worse. So you will be better off using pure Go.

The source code is available.

What is a useful theory?

I was an adept, as a teenager and a young adult, of thinkism. Thinkism is the idea that intelligence alone can solve problems. I thought I was smart so that I could just sit down and solve important problems. One after the other. Whatever contributions I ended up making had little to do with sitting down and thinking hard… and much more to do with getting dirty.

When you learn mathematics, you are given theorems that all seem to descend from the minds of brilliant men (and, in some cases, women). So you think that society makes progress when people sit down and think really hard about problems.

Then you realize that it is not quite how the world advances. There is often a myriad of minor conceptual advances… solar panels don’t go from being overly expensive and inefficient to being cheaper than oil due to one breakthrough… the progress follows some kind of random walk. You come to realize that most of the things we think we know are not quite right. Even the simplest ideas are often misleading. We generally move forward, but there are steps backward as well. And, more critically, useful theories from the past can suddenly become anchors.

Progress is Darwinian, not Cartesian.

You can recognize good theoretical ideas because they make you smarter. Algebra is one such idea. For example, when I try to explain mathematical ideas to kids who have not yet taken algebra, I struggle.

Before you learn any new theory, you ought to figure out what kind of problems will become easier.

One concept I have criticized is “probability”. That’s a controversial stance because so much of software, engineering, and machine learning appears to rely on probabilities. My own research makes abundant use of probability theory. However, I struggle to find real-world problems that become easier once you introduce probability theory. It is not that there is any shortage of hard problems worth solving… it is that too few of these problems are made easier when applying probabilities.

I remember being on the examination committee of a Ph.D. student a few years back. She was making use of probabilities and Bayesian networks throughout her presentation. At some point, to test her out, I asked what should have been a trivial hypothetical question… all she needed to do is really know what independent probabilities are… Not only did she failed to answer properly, I suspect that many of the other members of the examining board failed to realize she was confused.

I cannot blame her because I get probabilities wrong all the time. Let us consider the Monty Hall problem, as stated on Wikipedia:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

I have been confused by this problem… that’s ok, maybe I am just not that smart… but I have watched several people with a science Ph.D. being unable to even think about the problem, even when they took all the probability theory that they need to answer this question.

In this case, the problem is not hard if you stop short of using probability theory. You can just enumerate the possibilities:

  1. Door 1: Goat, Door 2: Car, Door 3: Goat
  2. Door 1: Car, Door 2: Goat, Door 3: Goat
  3. Door 1: Goat, Door 2: Goat, Door 3: Car

So you have picked Door 1. In the first scenario, the host will pick Door 2, and switching is the right move. In the second scenario, switching is the bad move. In the last scenario, switching would again be the right move. So in 2 out of 3 scenarios, switching is the right move.

If I explain it in this manner, I am convinced that even 10-year-olds can understand.

I have argued in more details in my blog post Probabilities are unnecessary mathematical artifacts that we should use different tools that make it easier for us to get the right answer.

My friend Antonia Badia wrote to me…

(…) it’s impossible to do Artificial Intelligence today without using probabilities.

I suspect he is right. However, the fact that we use painful tools today can be interpreted as a hint that there are massive conceptual breakthroughs to be made in the near future.

We should not assume that ideas are universally useful. We live on a moving landscape.

Calculus, as it is currently taught, was made exceedingly abstract so that pre-computing scientists could do their work. But in the computing era, it makes less and less sense. Zeilberger correctly argued, in my mind, that we should just do away with the concept of infinity…

So I deny even the existence of the Peano axiom that every integer has a successor. Eventually, we would get an overflow error in the big computer in the sky, and the sum and product of any two integers is well-defined only if the result is less than p, or if one wishes, one can compute them modulo p. Since p is so large, this is not a practical problem, since the overflow in our earthly computers comes so much sooner than the overflow errors in the big computer in the sky. (Zeilberger in “Real” Analysis is a Degenerate Case of Discrete Analysis)

Even if you disagree with Zeilberger, I don’t think that there can be much disagreement on the fact that school-taught mathematics pays little attention to modern-day problems. The long division is about as useful as Latin.

So, yes, a useful theory is one that makes you smarter. We are surrounded by theories that fail to pass this basic test. It is a problem and an opportunity.

Further reading
: Kuhn’s “Structure of Scientific Revolutions”, Pragmatism (Wikipedia)

Science and technology: what happened in 2016

This year, you are able to buy CRISPR-based gene editing toolkits for $150 on the Internet as well as autonomous drones, and you can ask your Amazon Echo to play your favorite music or give you a traffic report. You can buy a fully functional Android tablet for $40 on Amazon. If you have made it to a Walmart near you lately, you know that kids are going to receive dirt cheap remote-controlled flying drones for Christmas this year. Amazon now delivers packages by drone with its Prime Air service. There are 2.6 billion smartphones in the world.

So what else happened in 2016?


Cancer rates have dropped by 23% since 1991. And, no, it is not explained away by the fact that fewer people smoke.

Though most insects are short-lived, we have learned that some ants never age.


In the UK, scientists have been allowed to modify genetically human embryos.

As you age, your body accumulates “senescent cells”. These are non-functional cells that are harmless in small quantities but can cause trouble when they accumulate. Researchers have found that by removing them (using drugs called senolytics), they could extend the life of mice. A company funded by’s CEO Jeff Bezos, Unity Biotechnology, is working to bring this technology to human beings. The CEO of this new company has given a talk which you can watch on YouTube.

Google reports that it can determine the location of almost any picture with superhuman ability. Take a picture of your favorite landscape and Google will tell you where you are just by looking at the picture.

Researchers are attempting to regrow knee cartilage in human beings using stem cells.


Google defeated the world champion at the board game of Go. With the defeat of Kasparov at the hands of IBM’s Deep Blue 20 years ago, there is no longer any board game where human beings are superior to computers.

Google supplemented its Google Photo service with advanced artificial intelligence. If you are looking for pictures of your dog, Google can find them for you, without anyone ever entering metadata.


Europe has authorized a gene therapy to help cure children. Soon enough, we will routinely use genetic engineering to cure diseases.

We have entered the era of fully functional consumer-grade virtual-reality gear. Many of us have experienced high-quality virtual experiences for the first time in 2016. According to some estimates, over 3 million VR units were sold in 2016: 750k PlayStation VR, 261k Google DayDream, 2.3 million Gear VR, 450k HTC Vive and 355k Oculus Rift.

Dementia rates in human beings are falling. We do not know why.


Foxconn, the company that makes the iPhone for Apple, has replaced 60,000 employees with robots.


Pokemon Go is the first massively popular augmented-reality game.

China holds the first clinical trials of CRISPR-based anti-cancer therapies.


Netflix has more subscribers than any other TV cable company.

There are ongoing clinical trials where blood plasma from young people is given to older people, in the hope of rejuvenating them. Blood plasma is abundant and routinely discarded in practice, so if it were to work, it might be quite practical. (Further research that appeared later this year allows us to be somewhat pessimistic regarding the results of these trials in the sense that rejuvenation might require the removal of aging agents rather than the addition of youthful factors.)

Singapore is the first city in the world to have self-driving taxis.

We know that some invertebrates, like lobsters, do not age (like we do). For example, we have found 140-year-old lobsters in the wild recently and they aren’t the oldest. The hydra has constant mortality throughout its life. There is definitively a lot of variation in how different species age. Lots of sea creatures like sharks are both long-lived and vertebrates. We don’t know how long sharks can live, but we found a shark that is at least 272 years old. This suggests that even vertebrates, and maybe mammals, could be effectively ageless.


What would happen if you took stem cells and put them in a brain? Would they turn into functional neurons and integrate into your brain… potentially replacing lost neurons? Researchers have shown that it works in mice.

Japan had 153 centenarians in 1963. There are now over 60,000 centenarians in Japan. According to researchers (Christensen et al., 2009), a child born today in a developed country has a median life expectancy of over 100 years.

The iPhone 7 is just as fast as a 2013 MacBook Pro. That is, our best smartphones from 2016 can run some software just as fast as the best laptop from three years ago.

Software can now read mammograms more accurately than any doctor. Software is also faster and cheaper.

Google upgraded its “Google Translate” service using a new engine leveraging recent progress in “deep learning” algorithms. The quality of the translations has been much improved. What is really impressive is that it works at “Google scale” meaning that all of us can use it at the same time.


The most popular game console of this generation, the PlayStation 4, released a well-reviewed virtual-reality headset.

Between 2011 and 2016, traditional TV viewing by 18-24-year-olds dropped by more than 9 hours per week.

Mammal heart muscles do not regenerate. So if your heart runs out of oxygen and is damaged, you may never recover the loss function. However, scientists have showed (in monkeys) that we could induce regeneration with stem cells.

Microsoft claims to be able to match human beings in conversational speech recognition.

Tesla, the car maker, has enabled self-driving on all its cars. Even General Motors is producing self-driving cars in Michigan. You should still check whether it is legal for you to drive without holding the wheel.


Crazy researchers gave young human blood to old mice. The old mice were rejuvenated. Though we do not know yet what it means exactly, it is strong evidence that your age is inscribed in your blood and that by changing your blood, we can age or rejuvenate you, to a point.

On the same theme, the Conboy lab at Berkeley, with funding from Google (via Calico) has shown, using a fancy blood transfer technique, that there are “aging agents” in old mice blood. If you take these agents and put them into a young mouse, it is aged. Presumably, if we were to identify these agents, we could remove them with drugs or dialysis and rejuvenate old mice. In time, we could maybe rejuvenate human beings as well.

Facebook’s CEO, Mark Zuckerberg, said that it will be soon normal for human beings to live over 100 years. This is supported by the work of demographers such as James W. Vaupel.


It was generally believed that women eventually ran out of eggs and became irreversibly sterile. Researchers have found that a cancer drug can be used to spur the generation of new eggs, thus potentially reversing age-related infertility. In the future, menopause could be reversed.

Bones grown in a lab were successfully transplanted in human beings.

We don’t know exactly how long seabirds could live. It is hard work to study long-lived animals and it takes a long time (obviously). Biologists have encountered unexpected problems such as the fact that the rings that they use to tag the birds wear out faster than the birds do. In 1956, Chandler Robbins tagged an albatross. The 66-year-old albatross laid an egg this year.

Old people (in their 30s and 40s) can have young babies. Evidently, biological cells have a way to reset their age. Ten years ago (in 2006), Shinya Yamanaka showed how to do just that in the lab by activating only four genes. So no matter how old you are, I can take some of your cells and “reset them” so that they are “young again”. So far, however, nobody has known how to apply this to multicellular organism. The New York Times reports that researchers from the Salk Institute did just that. (They have a video on YouTube.) By activating the four genes, they showed that they could rejuvenate human skin tissue in vitro, then they showed that they could extend the lifespan of mice.