The surprising cleverness of modern compilers

I wanted to know how a modern C compiler like clang would process the following C code:

#include <stdint.h>
int count(uint64_t x) {
  int v = 0;
  while(x != 0) {
    x &= x - 1;
  return v;

Can you guess?

popcntq	%rdi, %rax

That is right. A fairly sophisticated C function, one that might puzzle many naive programmers compiles down to a single instruction. (Tested with clang 3.8 using -O3 -march=native on a recent x64 processor.)

What does that mean? It means that C is a high-level language. It is not “down to the metal”. It might have been back when compilers were happy to just translate C into correct binary code… but these days are gone. One consequence of the cleverness of our compilers is that it gets hard to benchmark “algorithms”.

In any case, it is another example of externalized intelligence. Most people, most psychologists, assume that intelligence is what happens in our brain. We test people’s intelligence in room, disconnected from the Internet, with only a pencil. But my tools should get as much or even more credit than my brain for most of my achievements. Left alone in a room with a pencil, I’d be a mediocre programmer, a mediocre scientist. I’d be no programmer at all. And this is good news. It is hard to expand or repair the brain, but we have a knack for building better tools.

Daniel Lemire, "The surprising cleverness of modern compilers," in Daniel Lemire's blog, May 23, 2016.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

18 thoughts on “The surprising cleverness of modern compilers”

  1. Note, however, that this optimization in LLVM is highly-specialized, and it pattern-matches against the exact implementation you gave and turns it into a popcnt. There is no magic going on, and any number of semantics-preserving rewrites of the C code will cause LLVM to miss the optimization, because the pattern isn’t recognized any longer.

  2. I probably wouldn’t understand an in-depth explanation, but I would still like to know – at least on superficial level – how clang does this. It seems to me that an extremely sophisticated algorithm would be needed to achieve this.

    What if you change the code slightly? Like change x != 0 to x != 1 or something.

    1. It does that by transforming the AST in a normal form and then performing some pattern matching on that, trying to find common subsection of codes it can optimize. It can work wonderfully but all these patterns are handcrafted and to be really useful they often require the programmer to know how to write the code the make it easy for the compiler to recognize them.

    1. Current versions of clang and gcc will also optimize this (the K&R implementation for counting bits) to popcount. I tested with the Godbolt compiler explorer):

      static inline size_t _count_bits(uint64_t n) {
      size_t count;
      for (count = 0; n; count++) n &= n - 1;
      return count;

      Another example is this code, which performs a bitwise left shift with wrap (unlike the << operator), and is optimized to either a rotl or rotr instruction depending on the shift size.

      static inline uint64_t rotl(const uint64_t x, int k) {
      return (x << k) | (x >> (64 - k));

      I really wish there were a solid cross-platform library for bitwise operations like these which resolve to native instructions where possible, and fall back to decent software implementations on architectures where those instructions aren’t present. Maybe there is and I don’t know what it’s called, or there is a bigger picture I’m missing.

  3. That’s not magic when you have a basic understanding on hamming weight.

    C is a general-purpose language. You can use it as you like.

    Disable all compiler optimization and than look at the asm code.

  4. And how many CPU cycles does it take for popcnt to complete? Compiler might save on program size, but you’re ultimately dependent on microcode for non-trivial instructions. That’s not magic either.

  5. > What does that mean? It means that C is a high-level language.

    No. This means that C has high-level compilers, if anything.

    They happen to recognize the pattern (or patterns) in the code outlined as special cases for optimization, and finally manage to optimize it into a single assembly instruction.

    C is still as low level as it was.

Leave a Reply

Your email address will not be published.

You may subscribe to this blog by email.