In my blog post A fast alternative to the modulo reduction, I described how one might map 64-bit values to an interval of integers (say from 0 to N) with minimal bias and without using an expensive division. All one needs to do is to compute x * N ÷ 264 where ‘÷’ is the integer division. A division by a power of two is just a shift. Such an approach is simple and efficient.
Let us consider the opposite problem.
Suppose that I give you integers in the range from 0 (inclusively) to some value N (exclusively). We want to map these values to integers spanning the whole 64-bit range. Obviously, since we only have N distinct values to start with, we cannot expect our function to cover all possible 64-bit values. Nevertheless, we would like to be “as fair as possible”. Let us use only integers.
Let 264 ÷ N be the integer division of 264 by N. Let 264 % N be the remainder of the integer division. Because N is a constant, these are constants as well.
As a first attempt, I might try to map integer x using the function (264 ÷ N) * x. It works beautifully when N is a power of two, but, in general, it is a tad crude. In particular, the result of this multiplication can never exceed N – (264 % N) whereas it starts at value 0 when x is 0 so it is biased toward smaller values when (264 % N) is non-zero (which is always true when N is not a power of two).
Let 264 % N be the remainder of the integer division. To compensate, we need to add a value that can up to 264 % N. Mapping integers in the interval from 0 to N to integers in the interval from 0 to 264 % N cannot be done with a simple multiplication. We do not want to use divisions in general because they are expensive. However, we can use a shift, that is, a division by a power of two. So let us look for a map of the form (264 ÷ N) * x + (u * x)÷264 for some unknown value u. We know that x can never exceed N, but that when x reaches N, the value of (u * x)÷264 should be close to 264 % N. So we set (u * N)÷264 to 264 % N and solve for u, which gives us that u must be at least (264 % N) * 264 ÷ N.
The values (264 ÷ N) and (264 % N) * 264 ÷ N need to computed just once: let us call them m and n respectively. We finally get the function m * x + (n * x)÷264. In Python, the code might look as follows…
def high(x,y): return (x*y >> 64) % 2**64 def inv(x): n = ((2**64) % N)*2**64 // N m = 2**64 // N return m*(x+1) + high(x,n)
How good is this formula? To test it out, I can test how well it can invert the approach that goes in the other direction. That is, if I plug x * N ÷ 264, I would hope to get back x, or something very close for suitable values of N. That is, I want m * x * N ÷ 264 + (n * x * N ÷ 264)÷264 to be almost x. In general, I cannot hope to find x exactly because some information was lost in the process. Yet I expect to have a lower bound on the error of ceil(264/ N). I benchmark my little routine using a probabilistic test and the results are promising. In the worst case, I can be orders of magnitude larger than the lower bound, but most of the time, my estimated error is lower than the lower bound. This suggests that though my approach is not quite suitable to get back x, with a little bit of extra mathematics and a few more instructions, I could probably make it work exactly within a strict error bound.