Modern software spends a lot of time hashing objects. There are many fancy hash functions that are super fast. However, without getting fancy, we can easily double the speed of commonly used hash functions.
Java conveniently provides fast hash functions in its Arrays class. The Java engineers like to use a simple polynomial hash function:
for (int i = 0; i < len; i++) { h = 31 * h + val[i]; }
That function is very fast. Unfortunately, as it is written, it is not optimally fast for long arrays. The problem comes from the multiplication. To hash n elements, we need to execute n multiplications, and each multiplication relies on the result of the previous iteration: the loop carries a data dependency. If your processor takes 3 cycles to complete a multiplication, then it might be idle half the time. (The compiler might use a shift followed by an addition to simulate the multiplication, but the idea is the same.) To compensate for the latency problem, you can unroll the loop as follows:
int h = 0;
int i = 0;
for (; i + 3 < len; i += 4) {
    h = 31 * 31 * 31 * 31 * h
        + 31 * 31 * 31 * val[i]
        + 31 * 31 * val[i + 1]
        + 31 * val[i + 2]
        + val[i + 3];
}
for (; i < len; i++) { // finish the remaining values
    h = 31 * h + val[i];
}
This new function breaks the data dependency: the four multiplications in the first loop are independent of one another and can be executed together. In the worst case, your processor issues them one after the other, but without waiting for the previous one to complete. Even better, it can start the next loop iteration before all the multiplications have finished, and begin new multiplications that do not depend on the variable h. For a stronger effect, you can extend this process to blocks of 8 values instead of blocks of 4.
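For illustration, here is a sketch of the 8-value version (the chained 31 * 31 * … products are constant expressions folded at compile time; this mirrors the 4-value loop above rather than reproducing my exact benchmarked code):

// Sketch of an 8-way unrolled polynomial hash over a char array.
static int hashUnrolled8(char[] val) {
    int h = 0;
    int i = 0;
    int len = val.length;
    for (; i + 7 < len; i += 8) {
        // eight multiplications per iteration; only the first depends on h
        h = 31 * 31 * 31 * 31 * 31 * 31 * 31 * 31 * h
            + 31 * 31 * 31 * 31 * 31 * 31 * 31 * val[i]
            + 31 * 31 * 31 * 31 * 31 * 31 * val[i + 1]
            + 31 * 31 * 31 * 31 * 31 * val[i + 2]
            + 31 * 31 * 31 * 31 * val[i + 3]
            + 31 * 31 * 31 * val[i + 4]
            + 31 * 31 * val[i + 5]
            + 31 * val[i + 6]
            + val[i + 7];
    }
    for (; i < len; i++) { // handle the remaining values
        h = 31 * h + val[i];
    }
    return h;
}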
So how much faster is the result? To hash a 64-byte char array on my machine…
- the standard Java function takes 54 nanoseconds,
- the version processing blocks of 4 values takes 36 nanoseconds,
- and the version processing blocks of 8 values takes 32 nanoseconds.
So a little bit of easy unrolling almost doubles the execution speed for moderately long strings compared to the standard Java implementation.
You can check my source code.
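If you want to measure something along these lines yourself, a minimal JMH-style harness might look like the sketch below (the class and method names are hypothetical and this is not my actual benchmark code; results will depend on your JVM and hardware):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class HashBenchmark {
    // 32 chars = 64 bytes, since a Java char is 2 bytes
    char[] data = new char[32];

    @Setup
    public void fill() {
        for (int i = 0; i < data.length; i++) {
            data[i] = (char) i;
        }
    }

    @Benchmark
    public int standard() {
        int h = 0;
        for (int i = 0; i < data.length; i++) {
            h = 31 * h + data[i];
        }
        return h; // return the value so the JIT cannot discard the work
    }

    @Benchmark
    public int unrolled4() {
        int h = 0;
        int i = 0;
        for (; i + 3 < data.length; i += 4) {
            h = 31 * 31 * 31 * 31 * h + 31 * 31 * 31 * data[i]
                + 31 * 31 * data[i + 1] + 31 * data[i + 2] + data[i + 3];
        }
        for (; i < data.length; i++) {
            h = 31 * h + data[i];
        }
        return h;
    }
}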
Further reading:
- Faster 64-bit universal hashing using carry-less multiplications, Journal of Cryptographic Engineering, 2015.
- Strongly universal string hashing is fast, Computer Journal 57(11), 2014.
See also Duff’s device for an entertaining and slightly related hack.
Looks like an excellent candidate for vectorization.
The cost of a hash function isn't always O(1), and sometimes computing the hash costs more than the actual hash-table operations it supports.
It’s weird that the security of a hash almost directly corresponds to the number of data dependencies.
@Leonid
Indeed, hashing can be vectorized; see our recent paper: http://arxiv.org/abs/1503.03465. For long strings, you can hash nearly 8 bytes per CPU cycle!
Daniel, you might want to check out the MetroHash algorithm family. The fastest variants can do >8 bytes/cycle on long strings without vectorization (slow variants are no worse than 5 bytes/cycle) and are among the fastest functions for short keys as well. More importantly, the randomness is extremely robust, similar to SHA-1.
While vectorized variants of MetroHash are on my TODO list, a subtle design element of the MetroHash functions is that they are engineered to saturate multiple ALU ports on modern super-scalar architectures, which gives vector-like performance without vectorization. Main repository is here:
https://github.com/jandrewrogers/MetroHash
The algorithm is surprisingly simple, a minimal sequence of multiplications and rotates, but the anomalously strong statistical properties are the product of extensive computer analysis.
Do you have some documentation besides the source code? E.g., regarding the statistical properties?
I do have quite a bit of data on these functions, I will see if I can dig some up. They have been subjected to much more than the standard SMHasher diagnostics and analysis.
The backstory is that I designed software that in theory could learn how to design and optimize hash functions within some simple constraints. An enormous amount of compute time later, the software was churning out hundreds of algorithms that were far better than the existing art. More interesting to me, a small subset of the functions appear at least as robustly random across a very wide range of tests as cryptographic functions, which I can’t quite explain yet.
@Daniel, very nice. You could have mentioned this in the post. 🙂
You are right. I might write a more involved blog post later on fast “advanced” techniques to compute hash functions.
Can you get a speedup for byte[] or ByteBuffer hashing with a similar technique? (My first attempt was not fruitful)
Correction: for a byte[] the speedup is there, but for a ByteBuffer it's not (unless you get the backing byte[], of course), probably due to some overhead of calling buf.get(i).
Unfortunately, a ByteBuffer is an expensive abstraction in Java. Arrays have been optimized in Java to the point where they have basically the same speed as in C++, most of the time. Not so with ByteBuffer.
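For what it is worth, the same unrolling carries over directly to a plain byte[]; here is a sketch of the pattern (not benchmarked code, just the 4-value idea from the post applied to bytes):

// Sketch: 4-way unrolled polynomial hash over a byte array,
// mirroring the char[] version above.
static int hashBytesUnrolled4(byte[] val) {
    int h = 0;
    int i = 0;
    for (; i + 3 < val.length; i += 4) {
        h = 31 * 31 * 31 * 31 * h
            + 31 * 31 * 31 * val[i]
            + 31 * 31 * val[i + 1]
            + 31 * val[i + 2]
            + val[i + 3];
    }
    for (; i < val.length; i++) {
        h = 31 * h + val[i];
    }
    return h;
}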