Suppose that you want to visit all values in an array exactly once in “random order”. You could do it by shuffling your array but it requires some extra storage.

You want your code to use just a tiny bit of memory, and you want the code to be super fast. You do not want to assume that your array size is a power of two.

One way to do it is to use the fact that `(a x + b) modulo n` will visit all integer values in `[0,n)` exactly once as `x` iterates through the integers in `[0, n)`, as long as `a` is coprime with `n`. Being coprime just means that the greatest common divisor between `a` and `n` is 1. There are fast functions to compute the greatest common divisor between `a` and `n`.

A trivial coprime number would be `a = 1`, but that’s bad for obvious reasons. So we pick a coprime number in `[n/2,n)` instead. There is always at least one no matter what `n` is.

Enumerating all coprime numbers in `[n/2,n)` could get tiresome when `n` is very large, so maybe we just look at up to 100,000 of them. There is no need to actually store them in memory, we can just select one at random, so it requires very little memory.

To see why the mathematics work, suppose that `( a x + b ) modulo n = ( a x' + b ) modulo n`, then `a (x - x') modulo n = 0` which only happens when `(x - x')` is a multiple of `n` because `a` and `n` are coprime. Thus if you map a consecutive range of `n` values `x`, you will get `n` distinct values `( a x + b ) modulo n`. The choice of the parameter `a` is critical however: if you set `a` to 1 or 2, even if it is coprime with `n`, the result will not look random.

The running code is ridiculously simple:

public int getCurrentValue() { return ( (long) index * prime + offset ) % ( maxrange); } public boolean hasNext() { return index < maxrange; } public int next() { int answer = getCurrentValue(); index ++; return answer; }

You can optimize this code by avoiding multiplications and remainder computations:

public int next() { runningvalue += prime; if(runningvalue >= maxrange) runningvalue -= maxrange; index ++; // runningvalue == getCurrentValue()) return runningvalue; }

Of course, it is not really random in the sense that no (good) statistician should accept the result as a fair shuffle of the indexes. Still, it might be “good enough” to fool your colleagues into thinking that it is random.

While my implementation assumes that you are visiting the values in order, you can go back in time, or jump forward and backward arbitrarily.

I make my Java code available. It can be made more elegant, but it should work just fine in your projects.

If you can find ways to make the result “look” more random without significantly making it slower and without increasing memory usage, please let us know.

You can find ready-made solutions to visit all values in an array with a power of two number of elements. And by restricting your traversal to the subset of elements in `[0,n)` from a larger virtual array having a power of two size, you will have an alternative to the approach I describe, with the caveat that your main code will require branching. The computational complexity of a call to “next” becomes `O(n)` whereas I use a small, finite, number of instructions.

**Follow-up**: Benchmarking algorithms to visit all values in an array in random order

For a “more random” solution, why not using exponentiation? x \mapsto x^h is injective modulo every prime p for which gcd(p-1, h) = 1. For a small enough h, for instance h = 3, it is reasonably fast to compute.

I don’t want to assume that the size of the interval is a prime. You can, of course, patch things up, but I also want the code to be simple and easy to predict.

Another neat trick in the same direction is the following: for each r>2, the map k -> (5^k mod 2^r) runs exactly once through all numbers whose binary expansion ends in 01 before looping. So you can visit all values in an array of size 2^(r-2) with the function ((5^k mod 2^r) – 1) / 4. It should be reasonably fast, because it’s just a multiplication by 5 and a few shifts.

I don’t want to assume that the interval is a power of two.

If you have range of ‘n’ with a configurable power-of-two permutation then you can generate on ceil(log_2(n)) and reject elements >= n. So worst case rejection is ~1/2 or ~2 elements per draw on average. If I’m doing the math right then average rejection rate (assuming ‘n’ is uniform) is ~.18. Of course only worth considering if you need to speed up the sampling of elements.

Yes, you are right… with the possible upside that you can get well known statistical properties… but with the downsides that…

1) you cannot easily randomly access the indexes… e.g, in my implementation, I can ask what was the kth index,

2) in your proposal, the running time of accessing the next integer is O(n), not constant time…

3) my implementation is branchless… so no branch misprediction…

If performance is a priority, I think that my approach is going to be much faster in realistic scenarios, albeit, we need to benchmark it.

Also, in your code, why don’t you update your index with answer = (old_answer + prime) % maxrange, instead of doing the multiplication? Or do you need to have non-sequential access to your array as well? If so, then my tricks with the multiplicative order don’t work. 🙂

Yes, thanks. You don’t need the multiplication, you are right. But it is probably the least of your concerns here. The modulo reduction is probably want you want to optimize away first.

I did point out in my post that this code can be further optimized.

Even though it has limitations, I’m very fond of the shuffle algorithm that appears in Knuth. It is essentially selection sort using a random comparison: for each i, choose a random element at position i or higher, emit it, swap it with the element at position i.

For this problem, the shuffled array is not the goal, but is merely a by-product; we need to move elements out of the way so that we do not consider them more than once.

So, we can implement the algorithm with no extra storage, and better randomness than your algorithm, if: (a) we allow the array to be permuted while the algorithm is running, (b) we have a random number generator that can run backwards, (c) 2x running time is acceptable. The idea is to reverse the permutation at the end of the algorithm, running the random number generator backwards to put each element back into its original position.

Of course, this is a significant extra cost over your algorithm. But acceptable, I think, for applications that require good randomness.

I don’t know whether there exist “good” random number generators that are reversible, but I assume that there are, since a random number generator that loses information is not a very good random number generator.

Good point.

This suggests a follow-up blog post.

Also a few weeks ago a post on the same problem appeared: http://fabiensanglard.net/fizzlefade/index.php mentioning Linear Feedback Shift Registers

Thanks. I have not read in detail the post you offer, but I suspect it is a different problem.

Some years ago I was experimenting with one of the murmur hash algorithms and discovered that commenting out part of the algorithm resulted in randomly visiting each array element once if the array is a power of two large. I’ve used it a couple of times over the years. Very handy.

The bit mixing function executed at the end of murmur hash is invertible so it is definitively the case that it will visit all values exactly once.

This is actually not the case. The mixing function is:

int mixing(int h, int len) {

h ^= len;

h ^= h >> 16;

h *= 0x85ebca6b;

h ^= h >> 13;

h *= 0xc2b2ae35;

h ^= h >> 16;

return h % len;

}

This will *not* generate all values exactly once, as a simple experiment with 16 values shows: 12 8 7 4 1 5 6 12 15 10 13 8 0 10 13 11

If you remove the xor len and mod len then the remaining part is a bijection (or invertible/permutation) function. To make it a sequence of all integers you need to feed it the output of some full-period sequence. Just 0,1,2…. will result in pretty respectable results.

Yes. Marc is correct. What I refer to as “the bit mixing function” is equivalent to function you use (at least for 32-bit integers), without the “len” parameter. You will find it in widespread use.

There are other possibilities, see section 7 of this paper : https://arxiv.org/pdf/1609.09840.pdf

A Feistel Network? http://antirez.com/news/113

Again, as other proposals above, it assumes that the size of the interval is a power of two.

I haven’t looked into it, but the author of that blog says it is possible to generalize the Feistel network to any radix (see the comment by antirez in the link).

the author of that blog says it is possible to generalize the Feistel network to any radixWell, we do know one way to do it, it has been described in the comments here, and it involves branching and O(n) complexity for the “next” calls.

It is possible that he has something more clever in mind… but if it is described in the blog post in question, then I missed it.

If c = (x – ax)%m then x maps to itself. However I believe sufficient conditions are given in theorem A here: http://www.math.cornell.edu/~mec/Winter2009/Luo/Linear%20Congruential%20Generator/linear%20congruential%20gen1.html , or you can stay with c=0 or a=0 (the other coefficient being relatively prime).

If c = (x – ax)%m then x maps to itself.My proposed technique

is not a recurrence formula. So while what you write is correct (albeit I think I use ‘b’ and not ‘c’), it does not matter.You are correct, I misremembered it as iterating x = bx+c. Iteration may be another way to do it then.

Well it’s additive recurrence…just in closed-form for the ‘i’th element. >;)

I have added a benchmark against LCG: https://lemire.me/blog/2017/09/26/benchmarking-algorithms-to-visit-all-values-in-an-array-in-random-order/

Any solution that works for all powers of two trivially works for any size. Choose the smallest power of two greater than or equal to the size you want. If the algorithm gives an index too big just skip that one and go on to the next “random” index. This could possibly be faster than a solution that natively handles arbitrary sizes, because modulo instruction is quite slow on most hardware.

As pointed out earlier, you do not need the modulo instruction.

I have added benchmarks in a separate post: https://lemire.me/blog/2017/09/26/benchmarking-algorithms-to-visit-all-values-in-an-array-in-random-order/

You can also use an RNG to choose a permutation-index [0, n!). Each index represents a specific ordering of all of the elements. For example, 0 might represent the (original) order 0, 1, …, n – 2, n – 1; 1 might represent the order 0, 1, …, n – 1, n – 2; n!-1 might represent the order n – 1, n – 2, …, 1, 0; etc.

The interesting part is defining a specific algorithm that satisfies the above properties in an efficient way. It’s similar to algorithms that generate all possible permutations… but in this case, we want to jump to a specific one. I’ve coded something like this up before.

One such algorithm could be related to an efficient solution to Project Euler: https://projecteuler.net/index.php?section=problems&id=024. https://r.prevos.net/euler-problem-24/

I think you refer to Lehmer codes.

Did you post your code?

Ah, I think Lehmer code is on point. Although I’m not sure I knew that it was called that. I don’t have the specific code I wrote… think that was about 10 years ago. I’ve whipped something up though that is roughly equivalent (although not specifically Lehmer) https://gist.github.com/devinrsmith/0320756f5fc9b38fbd23c464dd516de9

This is very similar in spirit the solution that Julian Hyde proposed.

Hi Daniel,

I tried you C implementation of shuffle() / shuffleReallyFast()

from here :

https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2017/09/25

and result of shuffling of array initialized with items

0, 1, 2, 3, .. 99

is this:

20, 71, 22, 73, 24, 75, 26, 77, 28, 79, ..

Is this expected behaviour ? Output is just two interleaved growing

sequences with stride of 2 (result of shuffleLCG() looks reasonably random).

Thanks !

The C code is not randomized, see the remark in the first line of

visit.c:https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2017/09/25/visit.c#L1

I only wrote the C code to benchmark speed.

In any case, as I make clear in my blog post, this fast approach is crude and will not fool anyone who is looking for true randomness.

Thanks a lot for explanation Daniel !