Bit Hacking (with Go code)

At a fundamental level, a programmer needs to manipulate bits. Modern processors operate over data loaded into registers, not over individual bits. Thus a programmer must know how to manipulate the bits within a register. Generally, we do so while programming with 8-bit, 16-bit, 32-bit and 64-bit integers. For example, suppose that I want to set an individual bit to the value 1. Let us pick the bit at index 12 in a 64-bit word. The word with just the bit at index 12 set is 1<<12: the number 1 shifted to the left 12 times, or 4096. In Go, we format numbers using the fmt.Printf function: we use a string with formatting instructions followed by the values we want to print. We begin a formatting sequence with the letter % which has a special meaning (if one wants to print %, one must use the string %%). It can be followed by the letter b which stands for binary, the letter d (for decimal) or x (for hexadecimal). Sometimes we want to specify the minimal length (in characters) of the output, and we do so with a leading number: e.g., fmt.Printf("%100d", 4096) prints a 100-character string that ends with 4096 and begins with spaces. We can specify zero as the padding character rather than the space by adding it as a prefix (e.g., "%0100d"). In Go, we may thus print the individual bits in a word as in the following example:

package main

import "fmt"

func main() {
    var x uint64 = 1 << 12
    fmt.Printf("%064b", x)
}

Running this program we get a binary string representing 1<<12:

0000000000000000000000000000000000000000000000000001000000000000

The general convention when printing numbers is that the most significant digits are printed first followed by the least significant digits: e.g., we write 1234 when we mean 1000 + 200 + 30 + 4. Similarly, Go prints the most significant bits first, and so the number 1<<12 has 64-13=51 leading zeros followed by a 1 with 12 trailing zeros.

We might find it interesting to revisit how Go represents negative integers. Let us take the 64-bit integer -2. Using two’s complement notation, the number is represented as the unsigned number (1<<64)-2, a word made entirely of ones except for the least significant bit, which is zero. We can use the fact that a cast operation in Go (e.g., uint64(x)) preserves the binary representation:

package main

import "fmt"

func main() {
    var x int64 = -2
    fmt.Printf("%064b", uint64(x))
}

This program will print 1111111111111111111111111111111111111111111111111111111111111110 as expected.

Go has some relevant binary operators that we often use to manipulate bits:

&    bitwise AND
|    bitwise OR
^    bitwise XOR
&^   bitwise AND NOT

Furthermore, the symbol ^ is also used to flip all bits of a word when used as a unary operation: a ^ b computes the bitwise XOR of a and b whereas ^a flips all bits of a. We can verify that we have a|b == (a^b) | (a&b) == (a^b) + (a&b).

We have other useful identities. For example, given two integers a and b, we have that a+b == (a^b) + 2*(a&b). In this identity, 2*(a&b) represents the carries whereas a^b represents the addition without the carries. Consider for example 0b1001 + 0b10001. We have that 0b1 + 0b1 == 0b10 and this is the 2*(a&b) component, whereas 0b1000 + 0b10000 == 0b11000 is captured by a^b. Because 2*(a|b) == 2*(a&b) + 2*(a^b), the identity a+b == (a^b) + 2*(a&b) becomes a+b == 2*(a|b) - (a^b). These relationships are valid whether we consider unsigned or signed integers, since the operations (bitwise logical, addition and subtraction) are identical at the bit level.
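We can verify both identities with a short program; the values 0b1001 and 0b10001 are those of the worked example above:

```go
package main

import "fmt"

func main() {
	a, b := uint64(0b1001), uint64(0b10001)
	// a^b is the carry-free sum, 2*(a&b) contributes the carries
	fmt.Println(a+b == (a^b)+2*(a&b)) // true
	// the equivalent form based on the bitwise OR
	fmt.Println(a+b == 2*(a|b)-(a^b)) // true
}
```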

Setting, clearing and flipping bits

We know how to create a 64-bit word with just one bit set to 1 (e.g., 1<<12). Conversely, we can also create a word that is made of 1s except for a 0 at bit index 12 by flipping all bits: ^uint64(1<<12). Before flipping all bits of an expression, it is sometimes useful to specify its type (taking uint64 or uint32) so that the result is unambiguous.

We can then use these words to affect an existing word:

  1. If we want to set the 12th bit of word w to one: w |= 1<<12.
  2. If we want to clear (set to zero) the 12th bit of word w: w &^= 1<<12 (which is equivalent to w = w & ^uint64(1<<12)).
  3. If we just want to flip (send zeros to ones and ones to zeros) the 12th bit: w ^= 1<<12.

We may also affect a range of bits. For example, the word (1<<12)-1 has its 12 least significant bits set to ones, and all other bits set to zeros.

  1. If we want to set the 12 least significant bits of the word w to ones: w |= (1<<12)-1.
  2. If we want to clear (set to zero) the 12 least significant bits of word w: w &^= (1<<12)-1.
  3. If we want to flip the 12 least significant bits: w ^= (1<<12)-1.
    The expression (1<<12)-1 is general in the sense that if we want to select the 60 least significant bits, we might do (1<<60)-1. It even works with 64 bits: (1<<64)-1 has all bits set to 1.

We can also generate a word that has an arbitrary range of bits set: the word ((1<<13)-1) ^ ((1<<2)-1) has the bits from index 2 to index 12 (inclusively) set to 1, other bits are set to 0. With such a construction, we can set, clear or flip an arbitrary range of bits within a word, efficiently.

We can set any bit we like in a word. But what about querying bit values? We can check whether the 12th bit is set in the word w by checking whether w & (1<<12) is non-zero. Indeed, the expression w & (1<<12) has value 1<<12 if the 12th bit is set in w and, otherwise, it has value zero. We can extend such a check: we can verify whether any of the bits from index 2 to index 12 (inclusively) is set to 1 by computing w & (((1<<13)-1) ^ ((1<<2)-1)). The result is zero if and only if no bit in the specified range is set to one.
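The following sketch exercises these operations on a 64-bit word, setting, clearing, flipping and querying bits as described:

```go
package main

import "fmt"

func main() {
	var w uint64
	w |= 1 << 12                // set the bit at index 12
	fmt.Println(w&(1<<12) != 0) // true: the bit is set
	w &^= 1 << 12               // clear it again
	fmt.Println(w == 0)         // true
	w ^= (1 << 12) - 1          // flip the 12 least significant bits
	fmt.Printf("%b\n", w)       // 111111111111
	// is any bit from index 2 to index 12 (inclusively) set?
	mask := uint64(((1 << 13) - 1) ^ ((1 << 2) - 1))
	fmt.Println(w&mask != 0) // true
}
```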

Efficient and safe operations over integers

By thinking about values in terms of their bit representation, we can write more efficient code or, equivalently, have a better appreciation for what optimized binary code might look like. Consider the problem of checking if two numbers have the same sign: we want to know whether they are both smaller than zero, or both greater than or equal to zero. A naive implementation might look as follows:

func SlowSameSign(x, y int64) bool {
    return ((x < 0) && (y < 0)) || ((x >= 0) && (y >= 0))
}

However, let us think about what distinguishes negative integers from other integers: they have their last bit set. That is, their most significant bit as an unsigned value is one. If we take the exclusive or (xor) of two integers, then the result will have its last bit set to zero if their sign is the same. That is, the result is positive (or zero) if and only if the signs agree. We may therefore prefer the following function to determine if two integers have the same sign:

func SameSign(x, y int64) bool {
    return (x ^ y) >= 0
}

Suppose that we want to check whether x and y differ by at most 1. Maybe x is smaller than y, but it could be larger.
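Such a check can be done without branches: since subtraction wraps around, x-y is -1, 0 or 1 exactly when uint64(x-y+1) <= 2. The following is a sketch under the assumption that the difference x-y does not overflow:

```go
package main

import "fmt"

// DifferByAtMostOne returns true when x and y differ by at most 1,
// assuming that the difference x-y does not overflow.
func DifferByAtMostOne(x, y int64) bool {
	// x-y+1 is 0, 1 or 2 exactly when x-y is -1, 0 or 1;
	// any other difference casts to a large unsigned value
	return uint64(x-y+1) <= 2
}

func main() {
	fmt.Println(DifferByAtMostOne(5, 6)) // true
	fmt.Println(DifferByAtMostOne(6, 5)) // true
	fmt.Println(DifferByAtMostOne(5, 5)) // true
	fmt.Println(DifferByAtMostOne(5, 7)) // false
}
```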

Let us consider the problem of computing the average of two integers. We have the following correct function:

func Average(x, y uint16) uint16 {
    if y > x {
        return (y-x)/2 + x
    } else {
        return (x-y)/2 + y
    }
}

With a better knowledge of the integer representation, we can do better.

We have another relevant identity x == 2*(x>>1) + (x&1). It means that x/2 is within [(x>>1), (x>>1)+1). That is, x>>1 is the greatest integer no larger than x/2. Conversely, we have that (x+(x&1))>>1 is the smallest integer no smaller than x/2.
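A quick illustration of the floor and ceiling expressions, on one even and one odd value:

```go
package main

import "fmt"

func main() {
	for _, x := range []uint64{6, 7} {
		// x>>1 is the floor of x/2, (x+(x&1))>>1 is the ceiling
		fmt.Println(x>>1, (x+(x&1))>>1)
	}
	// prints: 3 3
	//         3 4
}
```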

We have that x+y == (x^y) + 2*(x&y). Hence we have that (x+y)>>1 == ((x^y)>>1) + (x&y) (ignoring overflows in x+y). Hence, ((x^y)>>1) + (x&y) is the greatest integer no larger than (x+y)/2. We also have that x+y == 2*(x|y) - (x^y), so x+y + ((x^y)&1) == 2*(x|y) - (x^y) + ((x^y)&1), and so (x+y+((x^y)&1))>>1 == (x|y) - ((x^y)>>1) (ignoring overflows in x+y+((x^y)&1)). It follows that (x|y) - ((x^y)>>1) is the smallest integer no smaller than (x+y)/2. The difference between (x|y) - ((x^y)>>1) and ((x^y)>>1) + (x&y) is (x^y)&1. Hence, we have the following two fast functions:

func FastAverage1(x, y uint16) uint16 {
    return (x|y) - ((x^y)>>1)
}
func FastAverage2(x, y uint16) uint16 {
    return ((x^y)>>1) + (x&y)
}

Though we use the type uint16, it works irrespective of the integer size (uint8, uint16, uint32, uint64) and it also applies to signed integers (int8, int16, int32, int64).
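To see the benefit, we can compare against the naive sum on made-up inputs whose sum overflows the 16-bit range:

```go
package main

import "fmt"

func main() {
	x, y := uint16(40000), uint16(50000)
	fmt.Println((x | y) - ((x ^ y) >> 1)) // 45000: exact, no overflow
	fmt.Println(((x ^ y) >> 1) + (x & y)) // 45000: exact, no overflow
	fmt.Println((x + y) / 2)              // 12232: the naive sum wraps around
}
```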

Efficient Unicode processing

In UTF-16, we may have surrogate pairs: the first word in the pair is in the range 0xd800 to 0xdbff whereas the second word is in the range from 0xdc00 to 0xdfff. How may we efficiently detect surrogate pairs? If the values are stored using a uint16 type, then it would seem that we could detect a value that is part of a surrogate pair with two comparisons: (x>=0xd800) && (x<=0xdfff). However, it may prove more efficient to use the fact that subtractions naturally wrap around: 0-0xd800==0x2800. Thus x-0xd800 ranges between 0 and 0xdfff-0xd800 inclusively whenever x is part of a surrogate pair, whereas any other value yields a result larger than 0xdfff-0xd800==0x7ff. Thus a single comparison is needed: (x-0xd800)<=0x7ff.
Once we have determined that we have a value that might correspond to a surrogate pair, we may check that the first value x1 is valid (in the range 0xd800 to 0xdbff) with the condition (x1-0xd800)<=0x3ff, and similarly for the second value x2: (x2-0xdc00)<=0x3ff. We may then reconstruct the code point as (1<<16) + ((x1-0xd800)<<10) + (x2-0xdc00). In practice, you may not need to concern yourself with such an optimization since your compiler might do it for you. Nevertheless, it is important to keep in mind that what might seem like multiple comparisons could actually be implemented as a single one.
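The following sketch decodes one surrogate pair; the pair 0xd83d, 0xde00 encodes the code point U+1F600:

```go
package main

import "fmt"

func main() {
	x1, x2 := uint16(0xd83d), uint16(0xde00)
	if x1-0xd800 <= 0x3ff && x2-0xdc00 <= 0x3ff {
		// the code point is 0x10000 plus the two 10-bit halves
		c := uint32(1<<16) + (uint32(x1-0xd800) << 10) + uint32(x2-0xdc00)
		fmt.Printf("U+%X\n", c) // U+1F600
	}
}
```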

Basic SWAR

Modern processors have specialized instructions capable of operating over multiple units of data with a single instruction (called SIMD for Single Instruction Multiple Data). We can achieve a similar effect with one (or a few) regular instructions using a technique called SWAR (SIMD within a register) (Lamport, 1975). Typically, we are given a 64-bit word w (uint64) and we want to treat it as a vector of eight 8-bit words (uint8).

Given a byte value (uint8) x, we can have it replicated over all bytes of a word with a single multiplication: x * uint64(0x0101010101010101). For example, we have 0x12 * uint64(0x0101010101010101) == 0x1212121212121212. This approach can be generalized in various ways. For example, we have that 0x7 * uint64(0x1101011101110101) == 0x7707077707770707.

For convenience, let us define b80 to be the uint64 equal to 0x8080808080808080 and b01 to be the uint64 equal to 0x0101010101010101. We can check whether all bytes are smaller than 128. We first replicate the byte value with all but the most significant bit set to zero (0x80 * b01 or b80), compute the bitwise AND with our 64-bit word, and check whether the result is zero: (w & b80) == 0. It might compile to two or three instructions on a processor.

We can check whether any byte is zero, assuming that we have checked that they are all smaller than 128, with an expression such as ((w - b01) & b80) == 0: the result is zero if and only if no byte is zero. Without the assumption, we can use the standard exact test: w has a zero byte if and only if ((w - b01) &^ w & b80) != 0. Checking whether a byte is zero allows us to check whether two words, w1 and w2, have a matching byte value since, when this happens, w1^w2 has a zero byte value.

We can also design more complicated operations if we assume that all byte values are smaller than 128. For example, we may check that all byte values are smaller than a 7-bit value (t) with the following expression: ((w + (0x80 - t) * b01) & b80) == 0. If the value t is a constant, then the multiplication is evaluated at compile time and the check is barely more expensive than checking whether all bytes are smaller than 128. In Go, we check that all byte values are smaller than 77, assuming that they are all smaller than 128, by verifying that b80 & (w+(128-77) * b01) is zero. Similarly, we can check that all byte values are larger than a 7-bit value t, assuming that they are also all smaller than 128: (((b80 - w) + t * b01) & b80) == 0. We can generalize further. Suppose we want to check that all bytes are at least as large as the 7-bit value a and smaller than the 7-bit value b. It suffices to check that ((w + b80 - a * b01) ^ (w + b80 - b * b01)) & b80 == b80.
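A sketch exercising these SWAR comparisons on made-up byte values, all smaller than 128 as the checks require (note that the comparisons against a threshold t are strict):

```go
package main

import "fmt"

func main() {
	const b01 = uint64(0x0101010101010101)
	const b80 = uint64(0x8080808080808080)
	w := uint64(0x0a2c4d133001172d) // bytes 0x0a,0x2c,0x4d,0x13,0x30,0x01,0x17,0x2d
	fmt.Println(w&b80 == 0)                   // true: all bytes smaller than 128
	fmt.Println((w-b01)&b80 == 0)             // true: no zero byte
	fmt.Println((w+(0x80-0x60)*b01)&b80 == 0) // true: all bytes smaller than 0x60
	fmt.Println((w+(0x80-0x20)*b01)&b80 == 0) // false: some bytes are 0x20 or more
}
```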

Rotating and Reversing Bits

Given a word, we say that we rotate the bits if we shift the bits left or right, while reinserting the bits that fall off at the other end. To illustrate the concept, suppose that we are given the 8-bit integer 0b11110000 and we want to rotate it left by 3 bits. The Go language provides a function for this purpose (bits.RotateLeft8 from the math/bits package): we get 0b10000111. In Go, there is no distinct rotate-right function. However, rotating left by 3 bits is the same as rotating right by 5 bits when processing 8-bit integers. Go provides rotation functions for 8-bit, 16-bit, 32-bit and 64-bit integers.

Suppose that you would like to know if two 64-bit words (w1 and w2) have matching byte values, irrespective of the ordering. We know how to check efficiently whether they have a matching byte value at the same position: (((w1^w2) - b01) &^ (w1^w2) & b80) != 0. To compare all bytes with all other bytes, we can repeat the same operation as many times as there are bytes in a word (eight times for 64-bit integers): each time, we rotate one of the words by 8 bits:

(((w1^w2) - b01) &^ (w1^w2) & b80) != 0
w1 = bits.RotateLeft64(w1, 8)
(((w1^w2) - b01) &^ (w1^w2) & b80) != 0
w1 = bits.RotateLeft64(w1, 8)
...
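Putting it together, here is a sketch of an unordered byte-matching function. It uses the standard exact zero-byte test (v - b01) &^ v & b80 != 0, which makes no assumption on the byte values; the example words are made up:

```go
package main

import (
	"fmt"
	"math/bits"
)

const b01 = uint64(0x0101010101010101)
const b80 = uint64(0x8080808080808080)

// hasZeroByte returns true if any byte of v is zero.
func hasZeroByte(v uint64) bool {
	return (v-b01)&^v&b80 != 0
}

// HasMatchingByte returns true if some byte value occurs in both w1
// and w2, at any positions: we try all eight cyclic alignments.
func HasMatchingByte(w1, w2 uint64) bool {
	for i := 0; i < 8; i++ {
		if hasZeroByte(w1 ^ w2) {
			return true
		}
		w1 = bits.RotateLeft64(w1, 8)
	}
	return false
}

func main() {
	fmt.Println(HasMatchingByte(0x0102030405060708, 0x1112131415160817)) // true: both contain 0x08
	fmt.Println(HasMatchingByte(0x0102030405060708, 0x1112131415161718)) // false
}
```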

We recall that words can be interpreted as little-endian or big-endian depending on whether the first bytes are the least significant or the most significant. Go allows you to reverse the order of the bytes in a 64-bit word with the function bits.ReverseBytes64 from the math/bits package. There are similar functions for 16-bit and 32-bit words. We have that bits.ReverseBytes16(0xcc00) == 0x00cc. Reversing the bytes in a 16-bit word, and rotating by 8 bits, are equivalent operations.

We can also reverse bits. We have that bits.Reverse16(0b1111001101010101) == 0b1010101011001111. Go has functions to reverse bits for 8-bit, 16-bit, 32-bit and 64-bit words. Many processors have fast instructions to reverse the bit orders, and it can be a fast operation.

Fast Bit Counting

It can be useful to count the number of bits set to 1 in a word. Go has fast functions for this purpose in the math/bits package for words having 8 bits, 16 bits, 32 bits and 64 bits. Thus we have that bits.OnesCount16(0b1111001101010101) == 10.

Similarly, we sometimes want to count the number of trailing or leading zeros. The number of trailing zeros is the number of consecutive zero bits appearing in the least significant positions. For example, the word 0b1 has no trailing zero, whereas the word 0b100 has two trailing zeros. When the input is a power of two, the number of trailing zeros is its logarithm in base two. We can use the Go functions bits.TrailingZeros8, bits.TrailingZeros16 and so forth to compute the number of trailing zeros. The number of leading zeros is similar, but we start from the most significant positions. Thus the 8-bit integer 0b10000000 has no leading zeros, while the integer 0b00100000 has two leading zeros. We can use the functions bits.LeadingZeros8, bits.LeadingZeros16 and so forth.

While the number of trailing zeros directly gives the logarithm of powers of two, we can use the number of leading zeros to compute the logarithm of any integer, rounded down to the nearest integer. For 32-bit integers, the following function provides the correct result:

func Log2Up(x uint32) int {
    return 31 - bits.LeadingZeros32(x|1)
}

We can also compute other logarithms. Intuitively, this ought to be possible because if log_b is the logarithm in base b, then log_b(x) = log_2(x)/log_2(b). In other words, all logarithms are the same up to a constant factor (e.g., 1/log_2(b)).

For example, we might be interested in the number of decimal digits necessary to represent an integer (e.g., the integer 100 requires three digits). The general formula is ceil(log(x+1)) where the logarithm should be taken in base 10. We can show that the following function (designed by an engineer called Kendall Willets) computes the desired number of digits for 32-bit integers:

func DigitCount(x uint32) uint32 {
    var table = []uint64{
        4294967296, 8589934582, 8589934582,
        8589934582, 12884901788, 12884901788,
        12884901788, 17179868184, 17179868184,
        17179868184, 21474826480, 21474826480,
        21474826480, 21474826480, 25769703776,
        25769703776, 25769703776, 30063771072,
        30063771072, 30063771072, 34349738368,
        34349738368, 34349738368, 34349738368,
        38554705664, 38554705664, 38554705664,
        41949672960, 41949672960, 41949672960,
        42949672960, 42949672960}
    return uint32((uint64(x) + table[Log2Up(x)]) >> 32)
}

Though the function is a bit mysterious, its computation mostly involves computing the number of leading zeros, and using the result to look up a value in a table. It translates into only a few CPU instructions and is efficient.

Indexing Bits

Given a word, it is sometimes useful to compute the position of the set bits (bits set to 1). For example, given the word 0b11000111, we would like to have the indexes 0, 1, 2, 6, 7 corresponding to the 5 bits with value 1. We can determine efficiently how many indexes we need to produce thanks to the bits.OnesCount functions. The bits.TrailingZeros functions can serve to identify the position of a bit. We may also use the fact that x & (x-1) sets to zero the least significant 1-bit of x. The following Go function generates an array of indexes:

func Indexes(x uint64) []int {
    var ind = make([]int, bits.OnesCount64(x))
    pos := 0
    for x != 0 {
        ind[pos] = bits.TrailingZeros64(x)
        x &= x - 1
        pos += 1
    }
    return ind
}

Given 0b11000111, it produces the array 0, 1, 2, 6, 7:

var x = uint64(0b11000111)
for _, v := range Indexes(x) {
    fmt.Println(v)
}

If we want to produce the indexes in reverse order (7, 6, 2, 1, 0), we can do so with a bit-reversal function, like so:

for _, v := range Indexes(bits.Reverse64(x)) {
    fmt.Println(63 - v)
}

Conclusion

As a programmer, you may access, set, copy, or move individual bit values efficiently. With some care, you can avoid arithmetic overflows without much of a performance penalty. With SWAR, you can use a single word as if it were made of several subwords. Though most of these operations are only rarely needed, it is important to know that they are available.

Daniel Lemire, "Bit Hacking (with Go code)," in Daniel Lemire's blog, February 7, 2023.

Published by Daniel Lemire, a computer science professor at the University of Quebec (TELUQ).

12 thoughts on “Bit Hacking (with Go code)”

  1. Interesting article, skimmed through it, lots of known techniques that are worth publicizing more!

    The discussion of the knowledge of the internal representation of values made me think back to something I stumbled on recently in one Intel library.
    It was a way to compute leading binary zeroes for architectures that don’t have CLZ (I assume). Roughly this (simplified):

    typedef union {
        int64_t ui64;
        double dbl;
    } doubleint;

    int clz(int64_t x) {
        doubleint tmp;
        if (x >= 0x0020000000000000ull) { // x >= 2^53
            // split the 64-bit value in two 32-bit halves to avoid rounding errors
            tmp.dbl = (double) (x >> 32); // exact conversion
            return 31 - ((((unsigned int) (tmp.ui64 >> 52)) & 0x7ff) - 0x3ff);
        } else { // if x < 2^53
            tmp.dbl = (double) x; // exact conversion
            return 63 - ((((unsigned int) (tmp.ui64 >> 52)) & 0x7ff) - 0x3ff);
        }
    }

    What happens here is that the author exploits the fact that double normalizes the mantissa, and then extracts the number of leading zeroes from the double's exponent.

    Fast? Yes, if we don’t have CLZ. So no 🙂

    Beautiful? For me, yes! 🙂

  2. lots of known techniques that are worth publicizing more!

    I think you can spend all your life being a very productive programmer without knowing any of this well. But I think you should know that it exists if you want to do serious programming.

  3. “Once we have determined that we have a value that might correspond to a surrogate pair, we may check that the first value x1 is valid (in the range 0xd800 to 0xdbff) with the condition (x-0xd800)<=0x3ff, and similarly for the second value x2: (x-0xdc00)<=0x3ff. ”

    An initial (separate) test ((x-0xd800) <= 0x7ff) of the high and low code units is superfluous; ((x-0xd800)<=0x3ff) && ((x-0xdc00)<=0x3ff) does the job.
    What you failed to mention is: the latter can be simplified to ((x-0xd800)|(x-0xdc00))<=0x3ff, eventually saving the conditional branch for &&

  4. “We may then reconstruct the code point as ((x-0xd800)<<10) + x-0xdc00.”

    Nope! The code point is (1<<16) + ((x-0xd800)<<10) + (x-0xdc00) or the equivalent (1<<16) + (((x-0xd800)<<10) | (x-0xdc00)).

  5. Consider for example 0b1001 + 0b01001. We have that 0b1 + 0b1 == 0b10 and this is the 2*(a&b) component, whereas 0b1000 + 0b01000 == 0b11000 is captured by a^b.

    Hello, there is an error here, 0b1000 + 0b01000 == 0b11000 is incorrect, the second term should be 0b10_000, not 0b01_000.
