Compressed bitset libraries in C and C++

The bitset data structure is a clever way to represent efficiently sets of integers. It supports fast set operations such as union, difference, intersection. For better scalability, we compress bitsets. Bitsets are not always the right data structure, but when they are applicable, they work well.

Here are some open-source libraries implementing (compressed) bitsets in C and C++:

  • CRoaring: It implements the Roaring compressed format in C, with a C++ wrapper. Works with GCC, clang, and Visual Studio. Hosted on GitHub.
  • EWAHBoolArray: It implements the EWAH compressed format in C++. A C version of this library is included, in part, within Git, a tool familiar to many programmers. Similar to WAH and Concise (see below), but faster. Works with GCC, clang, and Visual Studio. Hosted on GitHub.
  • cbitset: It implements an uncompressed bitset in C. Works with GCC and clang. Hosted on GitHub.
  • Concise: This C++ library implements both the WAH and CONCISE compressed formats. Works with GCC and clang. Hosted on GitHub.
  • BitMagic: This C++ library implements its own compressed format. Somewhat similar to Roaring, but can use more memory. Works with GCC, clang, Visual Studio. Hosted on sourceforge.

I can vouch for all of these libraries: I would use them in production. They are all available under liberal licenses. I should add that I am involved with all of them, except BitMagic.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

9 thoughts on “Compressed bitset libraries in C and C++”

  1. Hi Daniel,

    Nice post! I’ve peeked around a few of these myself before in a search for a good dynamically-sized bitset for C++. Do you have any thoughts on if it might make sense to add rank and select operations to any of these representations? Usually, when I’m using a compressed (or even uncompressed) bit-vector, rank, select and access are the 3 operations I’m most in need of.

  2. How would you compare the bitset library in stl bundled in standard c++ with the libraries you listed. Is there any use case where it is clearly advantageous to use third party bitset libraries as opposed to the standard library? Thanks!

  3. Hello and thank you for the links.
    Are those libraries suitable for building a Bloom filter?
    I understand they are overkill, but do their storage and read access performances make the a good choice?

  4. None of these libraries expose functions to shift (as in bit shift) bits around, I guess these operations don’t play well with compression. Are there any compressed bitset algorithms that could potentially support these operations with reasonable efficiency?

Leave a Reply to Daniel Lemire Cancel reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.