Software

I take the production of software seriously. You can find most of my software on GitHub.

Some noteworthy contributions:

  • Roaring bitmaps have been widely adopted: Apache Lucene, Solr, Elasticsearch, Metamarkets’ Druid, Apache Spark, Apache Hive, Apache Tez, Apache CarbonData, Netflix Atlas, LinkedIn Pinot, Pilosa, Microsoft Visual Studio Team Services (VSTS), eBay’s Apache Kylin, and so forth.
  • JavaFastPFOR and FastPFor have been included in Terrier, Apache Parquet, Apache Lucene, and Apache NiFi.
  • EWAHBoolArray and JavaEWAH have been included in Git (i.e., GitHub), jGit, Apache Hive, and so forth. JavaEWAH is part of standard Linux distributions like Ubuntu and RedHat.

Recent Publications

You can find my work on arXiv, on Google Scholar, on DBLP, on the ACM Portal, on R Libre and elsewhere.

  • Fast Random Integer Generation in an Interval
    ACM Transactions on Modeling and Computer Simulation (to appear)

    Details PDF (arXiv) Code

  • Faster Base64 Encoding and Decoding using AVX2 Instructions
    ACM Transactions on the Web (to appear)

    Details PDF (arXiv) Code

  • Roaring Bitmaps: Implementation of an Optimized Software Library
    Software: Practice and Experience 48 (4), 2018

    Details PDF (arXiv) Code

  • Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
    SIGMOD’18, 2018

    Details PDF (arXiv) Code

  • Faster Population Counts Using AVX2 Instructions
    Computer Journal 61 (1), 2018

    Details PDF (arXiv) Code

  • On Desirable Semantics of Functional Dependencies over Databases with Incomplete Information
    Fundamenta Informaticae 158 (4), 2018

    Details PDF (arXiv)

  • Stream VByte: Faster Byte-Oriented Integer Compression
    Information Processing Letters 130, 2018

    Details PDF (arXiv) Code

  • Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions
    Information Systems 66, 2017

    Details PDF (arXiv) Code

  • Regular and almost universal hashing: an efficient implementation
    Software: Practice and Experience 47 (10), 2017

    Details PDF (arXiv) Code

  • Better bitmap performance with Roaring bitmaps
    Software: Practice and Experience 46 (5), 2016

    Details PDF (arXiv) Slides Code Project

  • Compressed bitmap indexes: beyond unions and intersections
    Software: Practice and Experience 46 (2), 2016

    Details PDF (arXiv) Code

  • Consistently faster and smaller compressed bitmaps with Roaring
    Software: Practice and Experience 46 (11), 2016

    Details PDF (arXiv) Slides Code Project

  • Faster 64-bit universal hashing using carry-less multiplications
    Journal of Cryptographic Engineering 6(3), 2016

    Details PDF (arXiv) Code

  • SIMD Compression and the Intersection of Sorted Integers
    Software: Practice and Experience 46 (6), 2016

    Details PDF (arXiv) Slides Code

  • A General SIMD-based Approach to Accelerating Compression Algorithms
    ACM Transactions on Information Systems 33 (3), 2015

    Details PDF (arXiv)

  • Decoding billions of integers per second through vectorization
    Software: Practice & Experience 45 (1), 2015

    Details PDF (arXiv) Slides Code

  • Functional dependencies with null markers
    Computer Journal 58 (5), 2015

    Details PDF (arXiv)

  • Measuring academic influence: Not all citations are equal
    Journal of the Association for Information Science and Technology 66 (2), 2015

    Details PDF (arXiv) Dataset

  • Multidimensional Bloom Filters
    Information Systems (54), 2015

    Details PDF (arXiv) Code

  • Vectorized VByte Decoding
    International Symposium on Web Algorithms 2015, 2015

    Details PDF (arXiv) Slides Code

  • Strongly universal string hashing is fast
    Computer Journal 57 (11), 2014

    Details PDF (arXiv) Code

  • Diamond Dicing
    Data & Knowledge Engineering 86, 2013

    Details PDF (arXiv)

  • Reordering Rows for Better Compression: Beyond the Lexicographic Order
    ACM Transactions on Database Systems 37 (3), 2012

    Details PDF (arXiv) Slides Code Code Code

  • The universality of iterated hashing over variable-length strings
    Discrete Applied Mathematics 160 (4-5), 2012

    Details PDF (arXiv)

  • Time Series Classification by Class-Specific Mahalanobis Distances
    Advances in Data Analysis and Classification 6 (3), 2012

    Details PDF (arXiv)

  • A Call to Arms: Revisiting Database Design
    SIGMOD Record 40 (3), 2011

    Details PDF (arXiv)

  • Extracting, Transforming and Archiving Scientific Data
    In VLDL 2011, Berlin, Germany, 2011

    Details PDF (arXiv)

  • Reordering Columns for Smaller Indexes
    Information Sciences 181 (12), 2011

    Details PDF (arXiv)

  • Recursive n-gram hashing is pairwise independent, at best
    Computer Speech & Language 24 (4), pages 698-710, 2010

    Details PDF (arXiv) Code

  • Sorting improves word-aligned bitmap indexes
    Data & Knowledge Engineering 69 (1), 2010

    Details PDF (arXiv) Code

  • An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation
    International Journal of Computer Mathematics 86 (7), 2009

    Details PDF (arXiv) Code

  • Faster retrieval with a two-pass dynamic-time-warping lower bound
    Pattern recognition 42 (9), 2009

    Details PDF (arXiv) Code

  • Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays
    ACM Transactions on Algorithms 4(1): 14 (2008)

    Details PDF (arXiv) Code

  • Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes
    DOLAP 2008

    Details PDF (arXiv) Code

  • Pruning Attribute Values From Data Cubes with Diamond Dicing
    IDEAS 2008

    Details PDF (arXiv)

  • A Better Alternative to Piecewise Linear Time Series Segmentation
    SIAM Data Mining 2007

    Details PDF (arXiv) Code

  • A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP
    DOLAP 2007, pp. 17-24, 2007

    Details PDF (arXiv) Slides Code

  • Monotonicity Analysis over Chains and Curves
    In Curves and Surfaces 2006, Saint-Malo, France, 2007

    Details PDF (arXiv)

  • Removing Manually-Generated Boilerplate from Electronic Texts: Experiments with Project Gutenberg e-Books
    CASCON 2007

    Details PDF (arXiv) Code

  • Tag-Cloud Drawing: Algorithms for Cloud Visualization
    Tagging and Metadata for Social Information Organization (WWW 2007)

    Details PDF (arXiv) Slides Code Dataset

  • Attribute Value Reordering For Efficient Hybrid OLAP
    Information Sciences 176 (16) 2006

    Details PDF (arXiv)

  • Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element
    Nordic Journal of Computing 13 (4), pages 328-339, 2006

    Details PDF (arXiv) Code Code

  • Collaborative filtering and inference rules for context‐aware learning object recommendation
    Interactive Technology and Smart Education 2 (3), 2005

    Details PDF

  • Scale and Translation Invariant Collaborative Filtering Systems
    Information Retrieval 8 (1), 2005

    Details PDF

  • Slope One Predictors for Online Rating-Based Collaborative Filtering
    In SIAM Data Mining (SDM 2005), Newport Beach, California, April 21-23, 2005

    Details PDF (arXiv)

  • A family of 4-point dyadic high resolution subdivision schemes
    In Curves and Surfaces 2002, Saint-Malo, France, 2003

    Details PDF Code

  • Fourier analysis of 2-point Hermite interpolatory subdivision schemes
    Journal of Fourier Analysis and Applications 7 (5), 2001

    Details PDF

  • Wavelet time entropy, T wave morphology and myocardial ischemia
    IEEE Transactions on Biomedical Engineering 47 (7), 2000

    Details PDF

  • Une famille d'ondelettes biorthogonales sur l'intervalle obtenue par un schéma d'interpolation itérative
    Annales des Sciences Mathématiques du Québec 23 (1), 1999

    Details PDF

Talks

Projects

GapminderVR

GapminderVR: Data Visualization in Virtual Reality

MaskedVByte

MaskedVByte: SIMD-accelerated VByte

Roaring Bitmaps

Fast compressed bitmaps, widely used. (picture: Edge Earth)

VRSpaceTrader

VRSpaceTrader: Rearrange your environment for better cognition

Laboratory

We are lucky to have a fully equipped laboratory with a dedicated technician. We have a server farm that has been used worldwide for experiments in software performance (e.g., by researchers such as Agner Fog). Some of our machines have the following specifications:

  • Haswell microarchitecture: Intel® Core™ i7-4770 CPU @ 3.40GHz
  • Knights Landing microarchitecture: Intel® Xeon Phi™ CPU 7210 @ 1.30GHz (64 cores)
  • Skylake microarchitecture: Intel® Core™ i7-6700 CPU @ 3.40GHz
  • Skylake-X microarchitecture: Intel® Xeon® W-2104 CPU @ 3.20GHz
  • AMD OPTERON A1100 SOC (Cortex A57)

We also have several workstations and space in our laboratory to explore virtual reality as a tool in data science.

Students

I’m recruiting students and postdoctoral fellows for my lab. If you love writing crazily fast software and want to come to Montreal, drop me a line. Link to an impressive GitHub profile is an asset. Speaking French is an asset if you want to pursue an academic program with me. Some of my best students are women.

If you are a Canadian undergraduate student with at least a B average, you might be interested in coming to work with me under an NSERC Undergraduate Student Research Awards. The awards help pay for a full-time research project in our Montreal labs. The application deadlines are:

  • March 1st for the Summer term;
  • July 1st for the Fall term;
  • November 1st for the Winter term.

It is an ongoing competition: I can receive applications for every term. Please allow at least a week to put together an application with my help. Email me if you are interested.

If you are interested in pursuing a master in information technology full-time under my supervision in Montreal and you know some French, I take applications for NSERC Graduate Scholarships. You need to have a strong academic profile to apply. You should be a Canadian citizen or permanent resident of Canada. The deadline is December first of each year. You must plan ahead. I take applications every year. Please get in touch by email if you are interested.

If you are interested in pursuing a Ph.D. in cognitive computing full-time under my supervision in Montreal and you know some French, I take applications for NSERC graduate scholarships. You need to have a strong academic profile to apply. You should be a Canadian citizen or permanent resident of Canada. The deadline is November 1st of each year. You must plan ahead. I take applications every year. Please get in touch by email if you are interested.

I supervise graduate students at the University of Quebec (TÉLUQ and UQAM). I also co-supervise students the University of New Brunswick, at the École Polytechnique and at Concordia University.

Recent graduates:

  • Erick Aokou Koffi (Ph.D., 2018);
  • Badis Merdaoui (Ph.D., 2017);
  • Maxime Boisvert (M.Sc., 2017);
  • Shany Carle (M.Sc., 2017);
  • Joseph Tumusenge (M.Sc., 2016);
  • Samy Chambi (Ph.D., 2016, with Robert Godin);
  • Jing Li (Ph.D., 2016, with Yuhong Yan);
  • Claude Corriolan (M.Sc., 2016);
  • Hazel Webb (Ph.D., 2010, with Owen Kaser).

Current Ph.D. students:

Current M.Sc. students:

Recent research assistants (undergraduate):

Mentoring

  • Homma Kazutaka, Google Summer of Code, Summer 2018 (co-mentor with Harlan Haskins)

News

Erick Aokou Koffi successfully defended his Ph.D. thesis.

CONTINUE READING

Peter A. Boncz, Goetz Graefe, Bingsheng He, and Kai-Uwe Sattler organized a seminar on Database Architectures for Modern Hardware in Dagstuhl (Germany).

CONTINUE READING

Kareem El Gebaly successfully defended his Ph.D. thesis.

CONTINUE READING

Badis Merdaoui successfully defended his Ph.D. thesis.

CONTINUE READING

Services

I organize in Montreal ongoing series of workshops open to the public such as the technolab workshops and the tribalab workshops.

I have served in the program committee of several international conferences such as

  • ACM Conference on Information and Knowledge Management (ACM CIKM)
  • ACM Conference on Web Search and Data Mining (ACM WSDM)
  • ACM Conference on Information Retrieval (ACM SIGIR)
  • ACM Conference on Recommender Systems (ACM RecSys)
  • ACM/IEEE Joint Conference on Digital Libraries (JCDL)

In June 2018, I participated to the Dagstuhl Seminar 18251: Database Architectures for Modern Hardware.

I recently served on the following program committees:

  • DOLAP 2019 - 21st International Workshop On Design, Optimization, Languages and Analytical Processing of Big Data (March 26, 2019; Lisbon, Portugal)
  • CIKM 2018 - Twenty-Seventh ACM International Conference on Information and Knowledge Management (October 22-26, 2018; Turing, Italy)
  • ASD 2018 - 12th edition of the Conference on Advances of Decisional Systems : Big data & Applications (May 2018; Marrakech, Morocco)
  • WABiD* 2018 - 1st International Workshop on Advances on Big Data Management, Analytics and Security (September 2018; Budapest, Hungary)
  • RecSys 2018 - 12th ACM Recommender Systems Conference (October 2018; Vancouver, Canada)
  • WWW 2018 - Twenty-seventh International WWW Conference (April 23-27 2018; Lyon, France)
  • DOLAP 2018 - 20th International Workshop On Design, Optimization, Languages and Analytical Processing of Big Data (March 26–29, 2018; Vienna, Austria)
  • CIKM 2017 - Twenty-Sixth ACM International Conference on Information and Knowledge Management (November 6-10, 2017; Singapore)
  • SPIRE 2017 - 24th International Symposium on String Processing and Information Retrieval (September 26-29, 2017; Palermo, Italy)

I was an external referee for the following Ph.D. students:

  • Mohammed Shaaban at Université Pierre et Marie Curie (2017) - supervised by Patrick Garda.
  • Mehdi Boukhechba at UQAC (2016) - supervised by Abdenour Bouzouane and Charles Gouin-Vallerand.
  • Hicham Assoudi at UQAM (2016) - supervised by Hakim Lounis.
  • Khaled Dehdouh at Lyon 2 (2015) - supervised by Omar Boussaid.
  • Martin Leginus at Aalborg University (2015) - supervised by Peter Dolog.
  • Ahmad Taleb at Université Concordia (2011) - supervised by Todd Eavis.

I served as a committee member for several funding bodies:

  • FQRNT: review committee 03F (theoretical computer science) since 2007.
  • FQRNT: review committee 309 (team projects in computer science) since 2006.
  • NSERC: Research Tools and Instruments Grants Program (2012-2015)
  • NSERC: Computer Science Evaluation Group (EG 1507) for the Discovery Grants Program, comité 1507 (2018-2020)

Contact

  • [email protected]
  • University of Quebec (TELUQ), 5800 Saint-Denis, Office 1105, Montreal (Quebec) H2S 3L5 Canada
  • email for appointment