Remember when you learned the long division algorithm in school? It was painful, right?

It turns out that even on modern processors, divisions are expensive. So optimizing compilers try to avoid them whenever possible. An easy case is the division by a power of two. If a compiler sees `x / 2` when `x` is an unsigned integer then it knows that it can simply “shift” the bits of the variable `x` by 1 because data is represented in binary. A shift is a relatively inexpensive operation: it completes in a single CPU cycle on most processors…

Some programmers cannot resist and they will write `x >> 1` instead of `x / 2`. That’s mostly wasted time, however. Optimizing compilers can be counted to be smart enough to figure it out.

What if `x` is a signed integer? Sadly, a simple shift is no longer sufficient and several instructions must be used. Most compilers seem to generate about 4 to 5 instructions. Some versions of the Intel compiler generate three separate shifts.

This means that, some of the time, if we use `x >> 1` (or `x >>> 1` in Java) instead of `x / 2`, we might get a different performance even if the actual value stored in `x` is positive.

We might think that it is no big deal. It is still going to be super fast, right?

Let us consider a generally useful algorithm: the binary search. It finds the location of a value in a sorted array. At the core of the algorithm is the need to divide a length by two repeatedly. Looking at Java’s source code, we find how the Java engineers implemented the binary search in the standard library:

public static int BinarySearch(int[] array, int ikey) { int low = 0; int high = array.length - 1; while (low <= high) { final int middleIndex = (low + high) >>> 1; final int middleValue = array[middleIndex]; if (middleValue < ikey) low = middleIndex + 1; else if (middleValue > ikey) high = middleIndex - 1; else return middleIndex; } return -(low + 1); }

Notice the shift? Let us rewrite the function with an integer division instead:

public static int BinarySearch(int[] array, int ikey) { int low = 0; int high = array.length - 1; while (low <= high) { final int middleIndex = (low + high) / 2; final int middleValue = array[middleIndex]; if (middleValue < ikey) low = middleIndex + 1; else if (middleValue > ikey) high = middleIndex - 1; else return middleIndex; } return -(low + 1); }

It is nearly the same function. You may expect that it will have nearly the same performance.

Maybe surprisingly, that’s not true at all.

I wrote a little Java benchmark. It shows that the version with integer division runs at 2/3 the speed of the version with a shift. That’s a severe performance penalty.

The result is not specific to Java, it also holds in C.

The lesson?

When working with signed integer, do not assume that the compiler will turn divisions by powers of twos into code that nearly as efficiently as a single shift.

**Credit**: The observation is based on work by Owen Kaser.

Performance aside, the way you calculated middleIndex (both versions) is buggy (https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html).

Sure, though if you have arrays spanning 4GB of memory, you probably have other problems to worry about.

That is a nice pointer, but the >>> should be right, actually!

http://stackoverflow.com/a/19058871/2127435

The page you are linking even says that >>> 1 is one of methods to fix it.

Keep in mind that signed right shift are implementation dependent in C! (And signed left shifts are in general undefined). You should put in place compile time checks when using them.