Document identifier data set

Packaged by D. Lemire on April 3rd 2014

Based on data sets prepared by L. Boytsov using software available at https://github.com/searchivarius/IndexTextCollect

Update: As of May 2023, I no longer make the file available. I recommend that you consider Real data sets for bitmap testing and RealisticTabularDataSets instead.

File name: IntegerCompression2014.3april2014.zip

File size: 70GB

You can get the file by sftp through [email protected] or [email protected] (password sftpuser). You can find free sftp clients online. If you have a Mac or Linux box, sftp is installed by default.

Please report any problem you have with downloading the file.

The archive contains a README file describing its contents. The data is further described in our papers; see for example:

This data can be used with the SIMDCompressionAndIntersection software library.