Software

I take the production of software seriously. You can find most of my software on GitHub.

Some noteworthy contributions:

  • Roaring bitmaps have been widely adopted: Apache Lucene, Solr, Elasticsearch, Metamarkets’ Druid, Apache Spark, Apache Hive, Apache Tez, Netflix Atlas, LinkedIn Pinot, Pilosa, Microsoft Visual Studio Team Services (VSTS), eBay’s Apache Kylin, and so forth.
  • JavaFastPFOR and FastPFor have been included in Terrier, Apache Parquet, Apache Lucene, and Apache NiFi.
  • EWAHBoolArray and JavaEWAH have been included in Git (i.e., GitHub), jGit, Apache Hive, and so forth. JavaEWAH is part of standard Linux distributions like Ubuntu and RedHat.

Recent Publications

You can find my work on arXiv, on Google Scholar, on DBLP, on the ACM Portal, on R Libre and elsewhere.

  • Faster Population Counts Using AVX2 Instructions
    Computer Journal (to appear)

    Details PDF (arXiv) Code

  • On Desirable Semantics of Functional Dependencies over Databases with Incomplete Information
    Fundamenta Informaticae (to appear)

    Details PDF (arXiv)

  • Roaring Bitmaps: Implementation of an Optimized Software Library
    under review

    Details PDF (arXiv) Code

  • Stream VByte: Faster Byte-Oriented Integer Compression
    Information Processing Letters 130, 2018

    Details PDF (arXiv) Code

  • Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions
    Information Systems 66, 2017

    Details PDF (arXiv) Code

  • Regular and almost universal hashing: an efficient implementation
    Software: Practice and Experience 47 (10), 2017

    Details PDF (arXiv) Code

  • Better bitmap performance with Roaring bitmaps
    Software: Practice and Experience 46 (5), 2016

    Details PDF (arXiv) Slides Code Project

  • Compressed bitmap indexes: beyond unions and intersections
    Software: Practice and Experience 46 (2), 2016

    Details PDF (arXiv) Code

  • Consistently faster and smaller compressed bitmaps with Roaring
    Software: Practice and Experience 46 (11), 2016

    Details PDF (arXiv) Slides Code Project

  • Faster 64-bit universal hashing using carry-less multiplications
    Journal of Cryptographic Engineering 6(3), 2016

    Details PDF (arXiv) Code

  • SIMD Compression and the Intersection of Sorted Integers
    Software: Practice and Experience 46 (6), 2016

    Details PDF (arXiv) Slides Code

  • A General SIMD-based Approach to Accelerating Compression Algorithms
    ACM Transactions on Information Systems 33 (3), 2015

    Details PDF (arXiv)

  • Decoding billions of integers per second through vectorization
    Software: Practice & Experience 45 (1), 2015

    Details PDF (arXiv) Slides Code

  • Functional dependencies with null markers
    Computer Journal 58 (5), 2015

    Details PDF (arXiv)

  • Measuring academic influence: Not all citations are equal
    Journal of the Association for Information Science and Technology 66 (2), 2015

    Details PDF (arXiv) Dataset

  • Multidimensional Bloom Filters
    Information Systems (54), 2015

    Details PDF (arXiv) Code

  • Vectorized VByte Decoding
    International Symposium on Web Algorithms 2015, 2015

    Details PDF (arXiv) Slides Code

  • Strongly universal string hashing is fast
    Computer Journal 57 (11), 2014

    Details PDF (arXiv) Code

  • Diamond Dicing
    Data & Knowledge Engineering 86, 2013

    Details PDF (arXiv)

  • Reordering Rows for Better Compression: Beyond the Lexicographic Order
    ACM Transactions on Database Systems 37 (3), 2012

    Details PDF (arXiv) Slides Code Code Code

  • The universality of iterated hashing over variable-length strings
    Discrete Applied Mathematics 160 (4-5), 2012

    Details PDF (arXiv)

  • Time Series Classification by Class-Specific Mahalanobis Distances
    Advances in Data Analysis and Classification 6 (3), 2012

    Details PDF (arXiv)

  • A Call to Arms: Revisiting Database Design
    SIGMOD Record 40 (3), 2011

    Details PDF (arXiv)

  • Reordering Columns for Smaller Indexes
    Information Sciences 181 (12), 2011

    Details PDF (arXiv)

  • Recursive n-gram hashing is pairwise independent, at best
    Computer Speech & Language 24 (4), pages 698-710, 2010

    Details PDF (arXiv) Code

  • Sorting improves word-aligned bitmap indexes
    Data & Knowledge Engineering 69 (1), 2010

    Details PDF (arXiv) Code

  • An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation
    International Journal of Computer Mathematics 86 (7), 2009

    Details PDF (arXiv) Code

  • Faster retrieval with a two-pass dynamic-time-warping lower bound
    Pattern recognition 42 (9), 2009

    Details PDF (arXiv) Code

  • Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays
    ACM Transactions on Algorithms 4(1): 14 (2008)

    Details PDF (arXiv) Code

  • Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes
    DOLAP 2008

    Details PDF (arXiv) Code

  • Pruning Attribute Values From Data Cubes with Diamond Dicing
    IDEAS 2008

    Details PDF (arXiv)

  • A Better Alternative to Piecewise Linear Time Series Segmentation
    SIAM Data Mining 2007

    Details PDF (arXiv) Code

  • A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP
    DOLAP 2007, pp. 17-24, 2007

    Details PDF (arXiv) Slides Code

  • Removing Manually-Generated Boilerplate from Electronic Texts: Experiments with Project Gutenberg e-Books
    CASCON 2007

    Details PDF (arXiv) Code

  • Tag-Cloud Drawing: Algorithms for Cloud Visualization
    Tagging and Metadata for Social Information Organization (WWW 2007)

    Details PDF (arXiv) Slides Code Dataset

  • Attribute Value Reordering For Efficient Hybrid OLAP
    Information Sciences 176 (16) 2006

    Details PDF (arXiv)

  • Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element
    Nordic Journal of Computing 13 (4), pages 328-339, 2006

    Details PDF (arXiv) Code Code

  • Collaborative filtering and inference rules for context‐aware learning object recommendation
    Interactive Technology and Smart Education 2 (3), 2005

    Details PDF

  • Scale and Translation Invariant Collaborative Filtering Systems
    Information Retrieval 8 (1), 2005

    Details PDF

  • Slope One Predictors for Online Rating-Based Collaborative Filtering
    In SIAM Data Mining (SDM 2005), Newport Beach, California, April 21-23, 2005

    Details PDF (arXiv)

  • Fourier analysis of 2-point Hermite interpolatory subdivision schemes
    Journal of Fourier Analysis and Applications 7 (5), 2001

    Details PDF

  • Wavelet time entropy, T wave morphology and myocardial ischemia
    IEEE Transactions on Biomedical Engineering 47 (7), 2000

    Details PDF

  • Une famille d'ondelettes biorthogonales sur l'intervalle obtenue par un schéma d'interpolation itérative
    Annales des Sciences Mathématiques du Québec 23 (1), 1999

    Details PDF

Talks

Projects

Roaring Bitmaps

Fast compressed bitmaps, widely used. (picture: Edge Earth)

Students

I supervise graduate students at the University of Quebec (TÉLUQ and UQAM). I also co-supervise students the University of New Brunswick, at the École Polytechnique and at Concordia University.

Recent graduates:

  • Shany Carle (M.Sc., 2017);
  • Joseph Tumusenge (M.Sc., 2016);
  • Samy Chambi (Ph.D., 2016, with Robert Godin);
  • Jing Li (Ph.D., 2016, with Yuhong Yan);
  • Claude Corriolan (M.Sc., 2016);
  • Hazel Webb (Ph.D., 2010, with Owen Kaser).

I’m recruiting students and postdoctoral fellows for my lab. If you love writing crazily fast software and want to come to Montreal, drop me a line. Link to an impressive GitHub profile is an asset. Speaking French is an asset if you want to pursue an academic program with me. Some of my best students have been women.

Services

I have served in the program committee of several international conferences such as

  • ACM Conference on Information and Knowledge Management (ACM CIKM)
  • ACM Conference on Web Search and Data Mining (ACM WSDM)
  • ACM Conference on Information Retrieval (ACM SIGIR)
  • ACM Conference on Recommender Systems (ACM RecSys)
  • ACM/IEEE Joint Conference on Digital Libraries (JCDL)

I recently served on the following program committees:

  • SPIRE 2017 - 24th International Symposium on String Processing and Information Retrieval (September 26-29, 2017; Palermo, Italy)
  • WWW 2018 - Twenty-seventh International WWW Conference (April 23-27 2018; Lyon, France)
  • DOLAP 2018 - Nineteenth International Workshop On Design, Optimization, Languages and Analytical Processing of Big Data (March 26–29, 2018; Vienna, Austria)
  • CIKM 2017 - Twenty-Sixth ACM International Conference on Information and Knowledge Management (November 6-10, 2017; Singapore)

I was an external referee for the following Ph.D. students:

  • Mohammed Shaaban at Université Pierre et Marie Curie (2017) - supervised by Patrick Garda.
  • Mehdi Boukhechba at UQAC (2016) - supervised by Abdenour Bouzouane and Charles Gouin-Vallerand.
  • Hicham Assoudi at UQAM (2016) - supervised by Hakim Lounis.
  • Khaled Dehdouh at Lyon 2 (2015) - supervised by Omar Boussaid.
  • Martin Leginus at Aalborg University (2015) - supervised by Peter Dolog.
  • Ahmad Taleb at Université Concordia (2011) - supervised by Todd Eavis.

I served as a committee member for several funding bodies:

  • FQRNT: review committee 03F (theoretical computer science) since 2007.
  • FQRNT: review committee 309 (team projects in computer science) since 2006.
  • NSERC: Research Tools and Instruments Grants Program (2012-2015)
  • NSERC: Computer Science Evaluation Group (EG 1507) for the Discovery Grants Program (2018-2020)

Contact

  • [email protected]
  • University of Quebec (TELUQ), 5800 Saint-Denis, Office 1105, Montreal (Quebec) H2S 3L5 Canada
  • email for appointment