Signed integer division by a power of two can be expensive!

Remember when you learned the long division algorithm in school? It was painful, right?

It turns out that even on modern processors, divisions are expensive. So optimizing compilers try to avoid them whenever possible. An easy case is the division by a power of two. If a compiler sees x / 2 when x is an unsigned integer then it knows that it can simply “shift” the bits of the variable x by 1 because data is represented in binary. A shift is a relatively inexpensive operation: it completes in a single CPU cycle on most processors…

Some programmers cannot resist and they will write x >> 1 instead of x / 2. That’s mostly wasted time, however. Optimizing compilers can be counted to be smart enough to figure it out.

What if x is a signed integer? Sadly, a simple shift is no longer sufficient and several instructions must be used. Most compilers seem to generate about 4 to 5 instructions. Some versions of the Intel compiler generate three separate shifts.

This means that, some of the time, if we use x >> 1 (or x >>> 1 in Java) instead of x / 2, we might get a different performance even if the actual value stored in x is positive.

We might think that it is no big deal. It is still going to be super fast, right?

Let us consider a generally useful algorithm: the binary search. It finds the location of a value in a sorted array. At the core of the algorithm is the need to divide a length by two repeatedly. Looking at Java’s source code, we find how the Java engineers implemented the binary search in the standard library:

public static int BinarySearch(int[] array, int ikey) {
    int low = 0;
    int high = array.length - 1;
    while (low <= high) {
        final int middleIndex = (low + high) >>> 1;
        final int middleValue = array[middleIndex];

        if (middleValue < ikey)
            low = middleIndex + 1;
        else if (middleValue > ikey)
            high = middleIndex - 1;
        else
            return middleIndex;
    }
    return -(low + 1);
}

Notice the shift? Let us rewrite the function with an integer division instead:

public static int BinarySearch(int[] array, int ikey) {
    int low = 0;
    int high = array.length - 1;
    while (low <= high) {
        final int middleIndex = (low + high) / 2;
        final int middleValue = array[middleIndex];

        if (middleValue < ikey)
            low = middleIndex + 1;
        else if (middleValue > ikey)
            high = middleIndex - 1;
        else
            return middleIndex;
    }
    return -(low + 1);
}

It is nearly the same function. You may expect that it will have nearly the same performance.

Maybe surprisingly, that’s not true at all.

I wrote a little Java benchmark. It shows that the version with integer division runs at 2/3 the speed of the version with a shift. That’s a severe performance penalty.

The result is not specific to Java, it also holds in C.

The lesson?

When working with signed integer, do not assume that the compiler will turn divisions by powers of twos into code that nearly as efficiently as a single shift.

Credit: The observation is based on work by Owen Kaser.

6 thoughts on “Signed integer division by a power of two can be expensive!”

  1. Keep in mind that signed right shift are implementation dependent in C! (And signed left shifts are in general undefined). You should put in place compile time checks when using them.

  2. My benchmark result:
    # JMH version: 1.19
    # VM version: JDK 1.8.0_144, VM 25.144-b01
    # VM invoker: C:\work\JDK8\jre\bin\java.exe
    # VM options:

    Benchmark Mode Cnt Score Error Units
    IntBinarySearch.SequentialSearch thrpt 5 49816.892 ± 579.960 ops/s
    IntBinarySearch.branchlessBinarySearch thrpt 5 587342.727 ± 21659.254 ops/s
    IntBinarySearch.branchyBinarySearch thrpt 5 1000197.755 ± 40677.488 ops/s
    IntBinarySearch.branchyBinarySearchWithDivision thrpt 5 856292.173 ± 56034.160 ops/s
    IntBinarySearch.standardBinarySearch thrpt 5 1019794.676 ± 19766.743 ops/s

Leave a Reply

Your email address will not be published. Required fields are marked *