The bitset data structure is a clever way to represent efficiently sets of integers. It supports fast set operations such as union, difference, intersection. For better scalability, we compress bitsets. Bitsets are not always the right data structure, but when they are applicable, they work well.
Here are some open-source libraries implementing (compressed) bitsets in C and C++:
- CRoaring: It implements the Roaring compressed format in C, with a C++ wrapper. Works with GCC, clang, and Visual Studio. Hosted on GitHub.
- EWAHBoolArray: It implements the EWAH compressed format in C++. A C version of this library is included, in part, within Git, a tool familiar to many programmers. Similar to WAH and Concise (see below), but faster. Works with GCC, clang, and Visual Studio. Hosted on GitHub.
- cbitset: It implements an uncompressed bitset in C. Works with GCC and clang. Hosted on GitHub.
- Concise: This C++ library implements both the WAH and CONCISE compressed formats. Works with GCC and clang. Hosted on GitHub.
- BitMagic: This C++ library implements its own compressed format. Somewhat similar to Roaring, but can use more memory. Works with GCC, clang, Visual Studio. Hosted on sourceforge.
I can vouch for all of these libraries: I would use them in production. They are all available under liberal licenses. I should add that I am involved with all of them, except BitMagic.
Hi Daniel,
Nice post! I’ve peeked around a few of these myself before in a search for a good dynamically-sized bitset for C++. Do you have any thoughts on if it might make sense to add rank and select operations to any of these representations? Usually, when I’m using a compressed (or even uncompressed) bit-vector, rank, select and access are the 3 operations I’m most in need of.
CRoaring supports rank and select in both the C and C++ interfaces (and in Java and Go), https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/roaring.h
How would you compare the bitset library in stl bundled in standard c++ with the libraries you listed. Is there any use case where it is clearly advantageous to use third party bitset libraries as opposed to the standard library? Thanks!
I present a comparison with STL data structures in this recent talk reproduced on YouTube: https://youtu.be/ubykHUyNi_0
I have also covered the topic in various ways on my blog over the years… http://lemire.me/blog/2012/11/13/fast-sets-of-integers/
http://lemire.me/blog/2017/01/27/how-expensive-are-the-union-and-intersection-of-two-unordered_set-in-c/
And I’ll write more.
Hello and thank you for the links.
Are those libraries suitable for building a Bloom filter?
I understand they are overkill, but do their storage and read access performances make the a good choice?
None of these libraries expose functions to shift (as in bit shift) bits around, I guess these operations don’t play well with compression. Are there any compressed bitset algorithms that could potentially support these operations with reasonable efficiency?
The javaewah has a shift function, see http://www.javadoc.io/doc/com.googlecode.javaewah/JavaEWAH/1.1.6
It has not been implemented in the C/C++ libraries simply because there has been no interest shown for the feature.
Of course, a shift operation in generally linear time… so shifting bitsets arbitrarily and frequently is maybe unwise.
There is also FastBit ( https://sdm.lbl.gov/fastbit/ ).
Are those libraries reasonable for building a Bloom channel?
Probably not.