# Bit packing is fast, but integer logarithm is slow

In How fast is bit packing?, we saw how to store non-negative integers smaller than 2^{N} using *N* bits per integer by a technique called bit packing. A careful C++ bit packing implementation is fast: e.g., over 1 billion integers per second.

However, before you pack the integers, you might need to scan them to determine the number of bits needed (*N*). Unfortunately, it is a relatively expensive process.

Given a positive integer *x*, we seek the smallest integer *N* such that the integer *x* is less than 2^{N}. The value *N* is often called the *integer logarithm* of *x*.

There are several clever techniques to compute the integer logarithm using portable C code. Yet you can do better using processor-specific instructions. The GNU GCC compiler makes this easy with a special function that counts the number of leading zeros for 32-bit integers (`__builtin_clz`). Even so, it is relatively slow.

Thankfully, you can avoid computing the integer logarithm of each integer by a simple test involving a right shift:

if((x>>b) !=0)

b = integer_logarithm(x);

With proper loop unrolling, this is nearly as fast as bit packing.

**Update**: Preston Bannister correctly points out that you can do much better. Simply compute the logical or between all integers and then compute the integer logarithm of the result. It is much, much faster.

To experiment with this problem, I wrote a small program which finds the maximum integer logarithm of a large array of random integers. It then packs the integers using this logarithm.

- I find that I can pack between 1 billion and 2 billions integers per second.
- I compute the maximum integer logarithm at a rate of 3 billion integers per second.

When plotting the speeds as functions of the actual maximum integer logarithm, we see that the computation of the logarithm is not sensitive to the value of the actual logarithm, except for the approach based on the `__builtin_clz` function which is slower when the logarithm is less than 8.

In my tests, I used the GNU GCC 4.6.2 compiler on an Intel core i7 processor. My code is freely available.

**Conclusion** When packing an array of integers, finding the maximum logarithm can take anywhere from 1/4 to 1/3 of the running time. However, brute force techniques that compute the integer logarithm of every integer are much slower.

First, OR together all the integers (very fast), then right shift the result until you get all zeros (count the shifts).

Or am I missing something?

Comment by Preston L. Bannister — 5/4/2012 @ 17:17

Thanks for the blog and the code! I like your blog.

You should get a github/bitbucket account so we can browse your past code easier.

Thanks!

Comment by Luke — 5/4/2012 @ 18:58

@Luke

Thanks you are right.

Yes, I should create a repository on github for this code, you are right.

Comment by Daniel Lemire — 5/4/2012 @ 19:10

@Preston

As usual, you make me look like an idiot.

I’m updating my blog post.

Comment by Daniel Lemire — 5/4/2012 @ 20:15