Accelerating PHP hashing by “unoptimizing” it

Hashing is a software trick that can map strings to fixed-length integers, such as 32-bit integers. It is ubiquitous in modern software.

Languages like Java and PHP have the ability to store strings with their corresponding hash values. Still, the hash value must be computed at least once.

How much of a burden can this be? Suppose that we use 10 cycles per byte to hash a string. For a long 100-kilobyte string, that would be about a million CPU cycles. If your CPU runs at 2 GHz, you have 2 billion cycles per second. Hence, hashing your string should take no more than half a millisecond. Put another way, you can hash 2000 such strings per second.

Simon Hardy-Francis pointed out to me that this can still represent a performance bottleneck if your PHP application needs to repeatedly load large new strings.

So what does PHP use as a hash function? It uses fundamentally the Java hash function, a simple polynomial hash with an odd multiplier… (coprime with 2)

for (int i = 0; i < len; i++) {
  hash = 33 * hash + str[i];

(Java multiplies by 31 instead of 33 but it is the same idea.)

A polynomial hash function with an odd multiplier is found everywhere and has a long history. It is the hash function used by the Karp-Rabin string search algorithm.

As I have pointed out in another post, for better performance, you want to unroll this function like so…

for (; i + 3 < len; i += 4) {
   h = 33 * 33 * 33 * 33 * h 
       + 33 * 33 * 33 * str[i] 
       + 33 * 33 * str[i + 1] 
       + 33 * str[i + 2] 
       + str[i + 3];
for (; i < len; i++) {
   h = 33 * h + str[i];

The reason this might help might be that it breaks the data dependency: instead of having to wait for the previous multiplication to finish before another one can be issued, you can issue one new multiplication per cycle for up to four cycles in a row. Unrolling more might accelerating the code further.

The PHP developers implement the hash function with an extra optimization, however. Crediting Bernstein for the idea, they point out that…

the multiply operation can be replaced by a faster operation based on just one shift plus either a single addition or subtraction operation

It is true that a shift followed by an addition might be slightly cheaper than a multiplication, but modern compilers are quite good at working this out on their own. They can transform your multiplications by a constant as they see fit.

In any case, so the PHP implementation is an optimized version of the following…

for (int i = 0; i < len; i++) {
  hash = ((hash << 5) + hash) + str[i];

The code is actually quite a bit more complicated because it is heavily unrolled, but it is algorithmically equivalent. Their code strongly discourages the compiler from ever using a multiplication.

So are the PHP developers correct? Should we work hard to avoid multiplications in C using Bernstein’s trick? Let us put this theory to the test on a recent x64 processor. As usual, my code is available.

Polynomial hashing (cycles per byte) on Intel Skylake
PHP (no multiplier)PHP (with multiplier)

The multiplication-free PHP approach is 33% slower! Gregory Pakosz pointed out that you can do even better by unrolling the version with multiplier further, reaching 1.5 cycles per byte.

Embedded processors with slow multiplications might give different outcomes. But then, where do you expect PHP processes to run? Overwhelmingly, they run on Intel processors produced in the last ten years… and these processors have fast multipliers.

So I think that the PHP developers are leaving performance on the table. They could easily optimize the computation of the hash function without changing the result of the function. What is more, the code would be more readable if they left the multiplications! If you need to multiply by 33, just do it the simplest possible manner! If it is cheaper to do a shift, the compiler can probably figure it out before you do. If you do not trust your compiler, then, at least, run benchmarks!

Let us look at the larger issue. How fast are 1.5 or 1.75 cycles per byte? Not very fast. Google’s CityHash uses about 0.25 cycles per byte whereas the state-of-the-art CLHash uses about 0.1 cycles per byte on recent Intel processors. So with a more modern algorithm, PHP developers could multiply the speed of their hash functions… but that’s for another day.

4 thoughts on “Accelerating PHP hashing by “unoptimizing” it”

  1. I raced the DJB hash, the DJB hash with multiplier, and murmurhash3 on a Haswell CPU with ~ 8000 real keys which get hashed when ‘php -v’ in invoked.

    I found that they all run at about the same speed on the Haswell. mmh3 never gets into its stride because although it doesn’t hash inefficiently byte by byte, it does do for the tail block of up to 15 bytes, and all of the real keys are under 39 bytes long.

    So I raced again using the real set of keys which had been padded to the next 16 byte boundary. This means mmh3 never had to run its slow byte by byte tail code. The result is that mmh3 is nearly 50% faster than the DJB hash.

    This makes me wonder how easy it would be to replace the DJB hash in PHP with mmh3, and change PHP strings so that they are always stored padded to the next 16 byte boundary. It would be interesting to see if any larger PHP scripts end up running faster?

Leave a Reply

Your email address will not be published. Required fields are marked *