Most generic compression techniques are based on variations on run-length encoding (RLE) and Lempel-Ziv compression. Compared to these techniques and on the right data set, frame-of-reference and delta coding can be faster for a comparable compression rate.

Mathematically, frame-of-reference and delta coding use the same principle: we apply an invertible transformation that maps a set of (relatively) large integers to mostly smaller integers. (This is a common pattern when compressing data).

Suppose that you wish to compress a sequence of (non-negative) integers. Consider the following sequence:

`107,108,110,115,120,125,132,132,131,135.`

We could store these 10 numbers as 8-bit integers using 80 bits in total. For example, we have that 135 is 10000111 in binary notation.

The frame-of-reference approach begins by computing the range and minimum of the array. We see that the numbers range from 107 and 135. Thus, instead of coding the original sequence, we can subtract 107 from each value and code this difference instead:

`0, 1, 3, 8, 13, 18, 25, 25, 24, 28.`

We can code each offset value using no more than 5 bits. For example, 28 is 11100 in binary notation. Of course, we still need to store the minimum value (107) using 8 bits, and we need at least 3 bits to record the fact that only 5 bits per value are used. Nevertheless, the total (8+3+9*5=45) is much less than the original 80 bits. In actual compression software, you would decompose the data into blocks that are maybe larger than 10 values (say 16, 128 or 2048 values). The overhead of storing the minimal value would be small. Moreover, there are computational side benefits to this format: if we seek the value 1000, we know it cannot be in the block if its minimum is 107 and we use only 5 bits to store the offset from 107.

Frame-of-reference works when the range of values in each block is relatively small. We can sometimes get better compression if the difference between the values is small. In this case, it is useful to look at the differences between successive values (e.g., 108-107=1, 110-108=2, 115-110=5):

`1,2,5,5,5,7,0,-1,4.`

Given this set of differences and the initial value (107), we can reconstruct the original sequence. *Delta coding* is the compression strategy where we store these differences instead of the original values. Some people like to think of delta coding as a predictive scheme: you constantly predict that the next value will be like the previous one, and you just code the difference between your prediction and the observed value.

In binary, the values 1,2,5,7 and 4 can be written as 001, 010, 101, 111, 100. If we did not have a negative value (-1), we could store these differences using only 3 bits per value. The negative value comes from the fact that our values are not entirely sorted (just locally so). However, as we shall see, this single negative value will cause us some trouble. How do we code the -1?

- The original values are 8-bit values. This means that -1 and 256-1 are the same numbers (modulo 256). That is 25+255 modulo 256 is 24. In effect, we compute differences in an integer ring. The differences become
`1,2,5,5,5,7,0,255,4.`

Computing the modulo with a power of two is fast because computers use the binary format natively. - If you know the value that was predicted (25 in our case). You know that the range of differences goes from -25 to 230. Thus for differences
*x*between -25 and 25, we store them as 2*x*if it is positive and as -2*x*-1 if it is negative. Otherwise, we store it as*x*+25. One problem with this approach is that it may require much branching: the processor has to constantly check conditions before proceeding further. There may be a substantial penalty to pay when using modern superscalar processors.

Thankfully, you can use a trick called zig-zag encoding to avoid the branching. In effect, you map`x`to`(x << 1) ^ (x >> 31)`(in C or Java). This transformation can then be reversed by

mapping the result`y`to`((y >>> 1) ^ ((y << 31) >> 31)`where`>>>`is the unsigned right shift. - We can replace subtractions by bitwise exclusive or (xor) operations. It bypasses the issue entirely because xoring integers never generates negative values. The successive xor values are
`7,2,29,11,5,249,0,7,4.`

A benefit of the xor operation is that it is symmetric: x xor y is y xor x. This means that inverting the order of the original list, we would simply invert the order of the list of differences. Obviously, computing the xor is quite fast.

Once we have the list of differences as non-negative numbers, we can then try to store them by using as few bits as possible. Unfortunately, in our case, we could to the conclusion that we need 8 bits to store the differences. We remarked however that for all but one value, 3 bits per difference would suffice.

So a sensible solution is to code the first 3 bits of each differences: `001, 010, 101, 101, 101, 111, 000, 111, 100.`

And then we add a pointer to the second last difference to indicate that we are missing 5 bits (11111). The cost of coding this exception is about 13 bits. So the total storage cost would be (8+3+9*3+13=51). In this case, frame-of-reference is preferable to delta coding, but both are preferable to the original 8-bit coding which used 80 bits.

There are many possible variations. For example, you can also use exception technique with the frame-of-reference approach when almost all values fit in a range of values, except for a few.

**Further reading**: the document SZIP Compression in HDF Products and the corresponding CCSDS 120.0-G-2 data compression standard describe the application of delta coding for scientific data. Michael Dipperstein’s page provides a nice overview of generic compression techniques. The specific exception technique I described is from the NewPFD scheme first described in:

H. Yan, S. Ding, T. Suel, Inverted index compression and query processing with optimized document ordering, in: WWW ā09, 2009.

See also my blog post How fast is bit packing?

I would also recommend a a link to http://www.springerlink.com/content/j66851228120170t/

Delta and Golomb codes are kinda slow to be used in a high-throughput search engine (though they definitely make sense in other apps). Byte and word-aligned codes work much faster in this case.

@Itman

Golomb-Rice is definitively a bit slow for high-throughput applications. I am not sure that it is true for Yan et al.’s NewPFD. According to Delbru et al., it is just as fast, and even faster, than word-aligned codes (S9-64):

R. Delbru, S. Campinas, K. Samp Giovanni Tummarello, Adaptive frame of reference for compressing inverted lists, DERI Technical Report 2010-12-16, 2010.

Delbru also posted a comparison on the Lucene mailing list.

Cool, I am certainly going to check it out. With respect to the Golomb and Delta codes, by slow I mean really slow. Even the not-so-efficient VBC is twice is fast as Golomb.

@Itman

I agree regarding Golomb-Rice coding, and I believe that’s because it introduces a lot of branching. But the Delbru reference hints that delta coding can be quite fast when properly implemented. Moreover, szip which uses delta coding can be quite a bit faster than gzip during compression.

Why do you think that delta coding is necessarily slow? For example, if I notice that within blocks successive xors are small integers, and I pack them, this ought to be very fast (no branching, and only very fast operations). In fact, I am pretty sure I can implement it using no more than 1 CPU cycle per value on average with pipelining, and maybe less. Of course, it may compress poorly, but that’s another story.

From my experience: just re-ran a test on my laptop. Well, of course, I cannot guarantee that this holds true for all implementations and platforms. New architectures have amazing surprises like

1) Unfolding loops does not help

2) Aligned reading is as good as unaligned

Anyways, for my implementations the differences between VBC and Delta is not even two-fold, it is 10+-fold. Perhaps, I should rexamine my code some day.

@Itman

Would you share your code with me? I’d like to examine it. (I don’t need a license to it, I just would like to see exactly what you are comparing.)

Sure, no problem. I will it do it next week, because I want to do a quick check myself.

Just ran into this page after googling “delta compression xor”.

It seems to me the hoops to jump through in the unsorted case for delta encoding (negative deltas) go away if you encode your deltas using base -2 . Then, all you need is for the absolute value of the differences to be small. There are then tricks to compute using the -2 encoded numbers directly.

Can you elaborate?

What I mean is, related to the problem with coding up the -1,

You offered three solutions: xor rather than delta which is good except in cases where the hamming distance is large (even though the delta is small), a second solution that I didn’t fully follow and one where “we add a pointer to the second last difference to indicate that we are missing 5 bits (11111).”.

Another approach is the following: Using base -2 to encode the deltas. Hacker’s delight explains base -2 better than I can. But roughly: a number b_k, … , b1 b0 corresponds to b0 – 2 (b1) + 4 (b2) – 8 (b3) …. + (-2)^k (b_k). In practice, this means 0 maps to ‘0’, 1 maps to ‘1’, -1 maps to ’11’, 2 maps to ‘110’, -2 maps to ’10’, 3 maps to ‘111’. The property I care about in this case is that numbers with small absolute values will have small numbers non-zero bits. A positive number may be penalized in this case since it requires slightly more bits.

I haven’t done a full analysis of when this will be a win over the other methods (how many bits does it actually take for your sequence, how much does encoding/ decoding cost), but there are tricks to operate add the numbers without fully decoding them. It was just exciting because it looks like a possible application of this encoding š

I have updated the blog post with zig-zag encoding. I think it is closely related to your concept of base -2.