Current practical database compression techniques stress speed over compression:
- Vectorwise is using Super-scalar RAM-CPU cache compression which includes a carefully implemented dictionary coder.
- C-store—and presumably Vertica—is using similar compression techniques as well as simple run-length encoding of projection indexes.
- Oracle is compressing its bitmaps using Byte-aligned bitmap compression—a variation on run-length encoding.
- Wu et al.’s Fastbit as well as my very own Lemur Bitmap Index C++ Library use less aggressive compression techniques, for faster results. In fact, one my recent empirical results is that on a two-CPU dualcore machine, using 64-bit words instead of 32-bit words in word-aligned compression—which nearly halves the compression—can make the processing faster.
- LucidDB similarly compresses its bitmap indexes with a simple variation on run-length encoding.
Given that we have more and more CPU cores—and generally more powerful CPUs—shouldn’t we compress the data more aggressively?
As the CPUs get more powerful, shouldn’t all database become I/O bound? That is, the main difficulty is to bring enough data into the CPUs?
- As we have more CPU cores, we also have more bandwidth to bring data to the the cores. Otherwise, CPU cores would be constantly data-starved in most multimedia and business applications.
- Multicore CPUs are not the only game in town: RAM and storage have also been revolutionized. In 2009, it is not unpractical to run entirely database applications from RAM. After all, many business databases fit in 16 GB for RAM. And even when we do not have enough RAM, we have SSDs.
- Lightweight compression techniques often favor vectorization which is also getting more and more important and powerful.
Hence, for most database applications fast decompression remains preferable to aggressive compression.