Computing the inverse of odd integers

Given x, its (multiplicative) inverse is another value y such that x*y = y*x = 1. We all know that the multiplicative inverse of x is 1/x and it exists as long as x is non-zero. That’s for real numbers, or at least, rational numbers.

But the idea of a multiplicative inverse is more general.

It certainly fails integers in general. I.e., there is no integer x such that 2 x is 1. But, maybe surprisingly, all odd integers have an inverse if you use normal computer arithmetic. Indeed, when your processor computes x*y, then it actually outputs x*y modulo 232 or x*y modulo 264 depending on whether you use 32-bit or 64-bit instructions. (The value x*y modulo 264 can be defined as the remainder of the division of x*y by 264.)

Let me briefly explain why there must be an inverse. You can skip this part if you want. Take any odd integer x. Because x is odd, then it is not divisible by 2. In fact, that’s what it means to be odd. But this also means that powers of x will also be odd. E.g., xk is also odd for any integer k. Ok. So xk modulo 264 is never going to be zero. The only way it could be zero is if xk were divisible by 264, but that’s impossible because it is an odd value. At the same time, we only have a finite number of distinct odd values smaller than 264. So it must be the case that xk modulo 264 = xk' modulo 264 for some pair of powers k and k'. Assume without loss of generality that k is larger than k'. Then we have that xk-k' modulo 264 = 1 modulo 264 (I am not proving this last step, but you can figure it out from the previous one). And thus it follows that xk-k'-1 is the inverse of x. If you did not follow this sketch of a proof, don’t worry.

So how do you find this inverse? You can brute force it, which works well for 32-bit values, but not so well for 64-bit values.

Wikipedia has a page on this, entitled modular multiplicative inverses. It points to an Euclidean algorithm that appears to rely on repeated divisions.

Thankfully, you can solve for the inverse efficiently using very little code.

One approach is based on “Newton’s method”. That is, we start with a guess and from the guess, we get a better one, and so forth, until we naturally converge to the right value. So we need some formula f(y), so that we can repeatedly call y = f(y) until y converges.

A useful recurrence formula is f(y) = y (2 - y x) modulo 264. You can verify that if y is the 64-bit inverse of x, then this will output y. So the formula passes a basic sanity test. But would calling y = f(y) repeatedly converge to the inverse?

Suppose that y is not quite the inverse, suppose that x*y = 1 + z * 2N for some z and some N that is smaller than 64. So y is the inverse “for the first N bits” (where “first” means “least significant”). That is, y modulo 2N = 1.

It is easy to find such a y for N greater than zero. Indeed, let y = 1, then y = 1 + z 21.

Ok, so substituting x*y = 1 + z 2N in y*(2 - y x ) modulo 264, we get (2 - ( 1 + z*2N) ) modulo 264 or ( 1 - z 2N ) modulo 264. So I set y' = f(y) = y*(1 - z*2N ) modulo 264. What is y'? It is ( 1 + z 2N ) (1 - z 2N ) modulo 264 or ( 1 - z2 * 22 N ) modulo 264.

That is, if y was the inverse “for the first N bits”, then y' = f(y) is the inverse “for the first 2 N bits”. In some sense, I double my precision each time I call the recurrence formula. This is great! This means that I will quickly converge to the inverse.

Can we do better, as an initial guess, than y = 1? Yes. We can start with a very interesting observation: if we use 3-bit words, instead of 32-bit or 64-bit words, then every number is its own inverse. E.g., you can check that 3*3 modulo 8 = 1. Marc Reynolds points that you can get 4 bits of accuracy by starting out with x * x + x - 1. Brian Kessler points out that you can do even better: ( 3 * x ) XOR 2 provides 5 bits of accuracy.

So a good initial guess is y = ( 3 * x ) XOR 2, and that already buys us 5 bits. The first call to the recurrence formula gives me 10 bits, then 20 bits for the second call, then 40 bits, then 80 bits. So, we need to call our recurrence formula 2 times for 16-bit values, 3 times for 32-bit values and 4 times for 64-bit values. I could actually go to 128-bit values by calling the recurrence formula 5 times.

Here is the code to compute the inverse of a 64-bit integer:

uint64_t f64(uint64_t x, uint64_t y) {
  return y * (2 - y * x); 
}

uint64_t findInverse64(uint64_t x) {
  uint64_t y = (3 * x) ^ 2; // 5 bits
  y = f64(x, y); // 10 bits
  y = f64(x, y); // 20 bits
  y = f64(x, y); // 40 bits
  y = f64(x, y); // 80 bits
  return y;
}


I wrote a complete command-line program that can invert any odd number quickly.

Each call to the recurrence formula should consume about 5 CPU cycles so that the whole function should take no more than 25 cycles or no more than the cost of a single integer division. Actually, it might be cheaper than a single integer division.

Because of the way we construct the inverse, if you somehow knew the 32-bit inverse, you could call the recurrence formula just once to get the 64-bit inverse.

How did we arrive at this formula ( y ( 2 - y x ) )? It is actually a straight-forward application of Newton’s method as one would apply it to finding the zero of g(y) = 1/y - x. So there is no magic involved.

My code seems to assume that I am working with unsigned integers, but the same algorithm works with signed integers, and in binary form, it will provide the same results.

Reference and further reading:

Credit: Marc Reynolds asked on Twitter for an informal reference on computing the multiplicative inverse modulo a power of two. It motivated me to write this blog post. He finally wrote a decent article on the subject with many interesting remarks.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

8 thoughts on “Computing the inverse of odd integers”

  1. Interesting article!
    Those using C++14 or newer can use the constexpr specifier to evaluate the functions at compile time:

    constexpr uint64_t f64(uint64_t x, uint64_t y) {
    return y * ( 2 – y * x );
    }

    constexpr uint64_t findInverse64(uint64_t x) {
    uint64_t y = x;
    y = f64(x,y);
    y = f64(x,y);
    y = f64(x,y);
    y = f64(x,y);
    y = f64(x,y);
    return y;
    }

  2. In terms of applications, this came up for me recently when looking at an algorithm for exact division (remainder known to be zero).

    You can actually do a little better and get 5 correct starting bits with y = (3 * x) ^ 2 and in that case you also only need 4 iterations for the 64 bit inverse.

    That’s according to “personal communication from Montgomery” from this paper: https://arxiv.org/pdf/1303.0328.pdf

  3. For older machine, this may be faster to calculate 64-bits inverse:

    static uint32_t inverse32(uint32_t a)
    {
    uint32_t x = (3*a) ^ 2;
    x *= 2 - a * x;
    x *= 2 - a * x;
    x *= 2 - a * x;
    return x;
    }

    static uint64_t inverse64(uint64_t a) // (a x) mod 2^64 = 1
    {
    uint32_t terms, a0 = a; // a = a1<<32 | a0
    uint32_t x0 = inverse32(a0); // a0 x0 = 1
    terms = (uint32_t) (a >> 32) * x0; // term a1 x0
    terms += ((uint64_t) a0 * x0) >> 32; // term a0 x0 / 2^32
    uint32_t x1 = x0 * -terms; // a0 x1 + terms = 0
    return (uint64_t) x1 << 32 | x0; // x = a^-1 mod 2^64
    }

  4. I might be late to the party, but here’s a solution to remove some more cycles by hacking an 8bit approximation with a small table (especially efficient if you have the bextr instruction).

    uint64_t inverse64(uint64_t a) {
    uint8_t t[] = {
    2, 174, 210, 190, 66, 174, 210, 254,
    2, 46, 82, 190, 66, 46, 82, 254,
    };
    uint64_t x;
    x = t[(a>>1)&15] - a; // 8bit precision
    x *= 2 - a * x; // 16
    x *= 2 - a * x; // 32
    x *= 2 - a * x; // 64
    return x;
    }

    1. Here’s a solution to remove some more cycles by hacking an 8bit approximation with a small table

      It is interesting, but I doubt that it will be much faster.

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.