Perfect Hashing

Suppose you could build a collision-free hash table, how fast would it be? It would be extremely fast, almost as fast as looking up data in an array.

As it turns out, collision-free hash tables have been possible for quite something and that’s called perfect hashing. See for example GNU gperf, for an implementation.

One way to building a collision-free hash table is to use two hashes: r mod q and r mod p. It is important that p and q be coprimes and that q be large enough to store all of your data. Then, these two hashes are used to create two layers of hash tables (h1 and h2): given the key a, you retrieve its value h(a) by computing h1(h2(a)). In other words, h1 stores the values and h2 uses your keys. Using two hash tables buys you the freedom of choosing (through heuristics) the values of h2, or equivalently, the keys of h1.

The downsides are that construction time might be slower and that the data structure cannot be easily updated. The upsides is that you can use very little storage and have tremendously fast look-ups.

Reference:
S. Lefebvre, H. Hoppe, Perfect spatial hashing, ACM SIGGRAPH 2006.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax