Ranged random-number generation is slow in Python…

A colleague has been running simulations using a library written in Python. She has having serious performance problems… Her application is parallelizable, but Python does not make parallelization easy. She could switch to another language, but that’s expensive.

Further investigations reveals that her simulation relies heavily on random-number generation. Every little step involves a random number. So how good is Python at generating random numbers?

Python has a nice framework to quickly benchmark functions: the timeit module.

How fast is the Python random-number generator?

$ python -m timeit -s 'import random' 'random.random()'
10000000 loops, best of 3: 0.0363 usec per loop

So over 100 CPU cycles to generate one random floating point numbers. However, timeit includes an overhead of about 30 cycles or so to every operation, related to the function-call overhead. It is not unreasonable.

What if you want to generate an integer in a range [0,1000]? It gets ugly.

$ python -m timeit -s 'import random' 'random.randint(0,1000)'
1000000 loops, best of 3: 0.847 usec per loop

Wow! We are now taking over 2000 CPU cycles per random integer. This can easily becoming a limiting factor when writing simulation code. I tried to read Python’s source code for random.randint, but I could not figure out quickly what it is doing.

If we accept a very small (negligible) bias, we can do it by multiplication instead…

$ python -m timeit -s 'import random' 'int(random.random() * 1001)'
1000000 loops, best of 3: 0.206 usec per loop

We are down to 400 CPU cycles per integer. It is still a lot… but it is four times faster to avoid Python’s default API (random.randint).

The nice thing with Python is that it is easy to write a C function and access it from Python. Of course, it comes with some significant overhead. I do not hope to use far fewer than 100 cycles per random value by calling a C function. However, the ranged random-number generators are expensive enough that a C function might help. So I took a simple function in C that generates a good-quality (unbiased) ranged random number and made it available to Python:

$python -m timeit -s 'import fastrand' 'fastrand.pcg32bounded(1001)'
10000000 loops, best of 3: 0.0693 usec per loop

That is about 10 times faster than Python’s native random.randint.

The lesson is that random.randint should probably not be used in performance-sensitive code.

My source code is available (Python and C).

Update: Marcel Ball reports in the comments that this performance problem does not affect PyPy, only the regular Python. David Andersen points out that using the numpy library via python -m timeit -s 'import numpy' 'numpy.random.randint(0, 1000)' is much faster.

Credit: This blog post benefited from an exchange with Nathan Kurz.

8 thoughts on “Ranged random-number generation is slow in Python…”

  1. Depending on the libraries they are using for Python I strongly suggest looking into PyPy (pretty much 100% compatible with anything pure python; still a bit hit-and-miss for things that go out to native code – they are working on a PyPy compatible NumPy version which is one of the big ones for scientific computing).

    Just ran this for a comparison on my computer:

    PyPy 4.0.1
    $ pypy -m timeit -s ‘import random’ ‘random.randint(0, 1000)’
    100000000 loops, best of 3: 0.0117 usec per loop

    Python 2.7.10
    $ python -m timeit -s ‘import random’ ‘random.randint(0, 1000)’
    1000000 loops, best of 3: 0.874 usec per loop

    1. Thanks. I have updated my blog post. Interestingly, you can also apparently simply switch to the randint function provided by numpy.

      As far as my colleague is concerned, they are relying on numpy, though not for random-number generation. So PyPy is probably not the solution for them.

      1. Yeah – PyPy, and the NumPy version for PyPy, have come a long way. Still parts of NumPy that need to be completed yet – but they’ve been making a lot of progress.

  2. I’ve always found that the Numpy random number generators are very good if you’re able to generate the numbers in advance and then draw from the sample. It is however much slower for generating single numbers.

    I’ve just tested on my own machine, and the main limiting factor seems to be the interfacing above the underlying C code, as it generates 10^10 integers in the range (0, 1000] in about 2 microseconds, and 10^4 in roughly the same time.

  3. Jikes, it gets worse with a newer python:

    # python 2.7
    λ python -m timeit -s “import random” “random.randint(0, 1000)”
    1000000 loops, best of 3: 1.64 usec per loop

    # python 3.5
    [dev35] λ python -m timeit -s “import random” “random.randint(0, 1000)”
    100000 loops, best of 3: 2.2 usec per loop

    Or use numpy if the problem is vectorizeable (one float -> numpy array of floats), but even a single one seems to be faster:

    [dev35] λ python -m timeit -s “import numpy” “numpy.random.randint(0, 1000)”
    1000000 loops, best of 3: 0.45 usec per loop

    # array of 1000 ints:
    [dev35] λ python -m timeit -s “import numpy” “numpy.random.randint(0, 1000, 1000)”
    100000 loops, best of 3: 10.8 usec per loop

Leave a Reply

Your email address will not be published. Required fields are marked *