Real data sets for bitmap testing
packaged by D. Lemire on April 3rd 2014
As of May 2016, the data is no longer generally distributed as a zip file. Instead, it is more conveniently available on GitHub in at least two locations:
- As part of the CRoaring library as a collection of text files organized in directories.
- As part of the RoaringBitmap library as a collection of zip files.
The original file (real-roaring-datasets.3april2014.zip, 48 MB) is still available by SFTP, for those who insist on having it, at [email protected] and [email protected] (password: sftpuser).
Essentially, each file represents a set of integer values. You can create bitmaps out of these files.
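As a rough illustration of turning one such file into a bitmap, here is a minimal Python sketch. It assumes that a file stores its integers as decimal text separated by commas or whitespace; that layout is an assumption and should be checked against the actual files. The function names (parse_integers, to_bitmap, contains, cardinality, load_bitmap) are hypothetical and belong to no library. The bitmap here is a plain uncompressed bit array, not a Roaring bitmap.

```python
import re


def parse_integers(text):
    """Return the sorted distinct non-negative integers found in text.

    Assumption: values are decimal digits separated by commas or whitespace.
    """
    return sorted({int(token) for token in re.findall(r"\d+", text)})


def to_bitmap(values):
    """Build an uncompressed bitmap: bit v is set for every value v."""
    values = list(values)
    if not values:
        return bytearray()
    bitmap = bytearray(max(values) // 8 + 1)
    for v in values:
        bitmap[v >> 3] |= 1 << (v & 7)
    return bitmap


def contains(bitmap, v):
    """Report whether value v is present in the bitmap."""
    byte_index = v >> 3
    return byte_index < len(bitmap) and bool(bitmap[byte_index] & (1 << (v & 7)))


def cardinality(bitmap):
    """Count the set bits, i.e. the number of distinct values stored."""
    return sum(bin(byte).count("1") for byte in bitmap)


def load_bitmap(path):
    """Read one data file (hypothetical path) and return its bitmap."""
    with open(path, encoding="ascii") as f:
        return to_bitmap(parse_integers(f.read()))
```

Each file then yields one bitmap, for example through load_bitmap on a path of your choosing. For real measurements you would feed the same integers to a compressed bitmap library rather than to this uncompressed array.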
The data sets are described in our papers:
- Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience, 2016. (arXiv:1603.06549)
- Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), 2016. (arXiv:1402.6407)
- Owen Kaser and Daniel Lemire, Compressed bitmap indexes: beyond unions and intersections, Software: Practice and Experience 46 (2), 2016. (arXiv:1402.4466)