Should you cache hash values even for trivial classes?

Hash tables are a fundamental data structure in computing, used to implement maps and sets. In software, we use hash values to determine where objects are located within hash tables.

In my previous blog post, I showed that the bulk of the running time when checking values in a hash table could be taken up by the computation of the hash values themselves. I got some pushback because I implemented an array of three integers as a Java ArrayList<Integer>. This seems fair to me but there are obviously more efficient ways to represent three integers in Java.

So I decided to do as suggested, and use a trivial class:

class Triple {
  int x;
  int y;
  int z;

  public Triple(int mx, int my, int mz) {
    x = mx;
    y = my;
    z = my;
  }

  public int hashCode() {
    return x + 31 * y + 31 * 31 * z;
  }
}

I also implemented a version with precomputed hash values:

class BufferedTriple {
  int x;
  int y;
  int z;
  int hashcode;

  public BufferedTriple(int mx, int my, int mz) {
    x = mx;
    y = my;
    z = my;
    hashcode =  x + 31 * y + 31 * 31 * z;
  }

  public int hashCode() {
    return hashcode;
  }
}

Given these two classes, I create large hash tables (10 million entries) and I check 100 different target keys.

So is the class with the cached hash value faster? Yes, it is about 25% faster:

Triple BufferedTriple
0.4 us 0.3 us

My code is available.

I should add that Java’s hash tables re-hash the hash values that we provide, so the benefits of the cached hash value are less than they could be.

Moreover, Strings have cached hash values by default in Java. I’m definitively not the first to notice that caching hash values can be valuable.

I should add that this was not the point that I wanted to make in my original blog post. I do not particularly care whether you cache your hash values. The point I wanted to make is that even something as cheap as the computation of a hash value can be a limiting factor, even when you have lots of data in RAM.

Further reading: For greater speed, try batching your out-of-cache data accesses

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

7 thoughts on “Should you cache hash values even for trivial classes?”

  1. Daniel,

    Java Objects are bloated. For most efficient storage of triples, one needs to use type[3], e.g., int[3].

      1. Sorry, I forgot about this. BTW, I tried to keep integers as 3-int array INSIDE an object, but this only slowed things down. In a hindsight, this apparently only creates an extra “lookup layer”.

        1. Sorry, I forgot about this.

          Honestly, I looked it up while preparing this blog post. I could not believe that arrays did not hash properly (as values) in Java. Mind you, I still cannot believe that Java lacks value types.

    1. for me it’s not the computation in itself, but accessing data for computing the hash. Zven in L1, accessing data will be slower than performing add and mul operations

      This can be tested. Lay out the objects in L1 cache and check how many instructions you get per cycle while computing hash values. If you are correct, then you will get far fewer than one instruction per cycle (the CPU will spin empty).

      I believe you are not correct but I’d be interested in seeing the numbers.

  2. Mmm, there must be some big trade off somewhere. To begin, an increase of memory footprint and a decrease in the cache utilization. Mutability will have cost as well. Did you try it in the context of C++?

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.