In my latest post, I explained how you could accelerate 32-bit integer divisions by transforming them into 64-bit floating-point divisions. Indeed, on most processors, 64-bit floating-point numbers can represent all 32-bit integers exactly.
It is a strange result: Intel processors seem to do a lot better with floating-point divisions than integer divisions.
Recall the numbers that I got for the throughput of division operations:
| division                                      | throughput |
|-----------------------------------------------|------------|
| 64-bit integer division                       | 25 cycles  |
| 32-bit integer division (compile-time constant) | 2+ cycles |
| 32-bit integer division                       | 8 cycles   |
| 32-bit integer division via 64-bit float      | 4 cycles   |
I decided to run the same test on a 64-bit ARM processor (AMD A1100):
| division                                      | throughput |
|-----------------------------------------------|------------|
| 64-bit integer division                       | 7 ns       |
| 32-bit integer division (compile-time constant) | 2 ns     |
| 32-bit integer division                       | 6 ns       |
| 32-bit integer division via 64-bit float      | 18 ns      |
These numbers are rough: my benchmark is naive (see code). Still, on this particular ARM processor, 64-bit floating-point divisions are not faster (in throughput) than 32-bit integer divisions. So ARM processors differ from Intel x64 processors quite a bit in this respect.