Speed up Python without effort using generator expressions

Here are two ways to count the numbers from 1 to 1000000 in Python. First, the classical way:

sum([i for i in xrange(1000000)])

Runs 0.8s on an old Linux box. It uses quite a bit of memory.

Second, the better way:

sum((i for i in xrange(1000000)))

Runs 0.2s on the same box. It uses a tiny amount of memory.

That is right. The second option is 4 times faster! According to my tests, the second option remains better even if you only sum the numbers between 1 and 1000, though the benefits are then tiny.

Note: Yes, I am aware that I could simply do 1000000*(1000000-1)/2.

Source: Parand.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

3 thoughts on “Speed up Python without effort using generator expressions”

  1. Not only that, but expression 2 doesn’t leak a variable!

    $ python
    Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
    [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
    Type “help”, “copyright”, “credits” or “license” for more information.
    >>> i
    Traceback (most recent call last):
    File “”, line 1, in
    NameError: name ‘i’ is not defined
    >>> sum([i for i in xrange(100000)])
    4999950000L
    >>> i
    99999
    >>> sum(j for j in xrange(100000))
    4999950000L
    >>> j
    Traceback (most recent call last):
    File “”, line 1, in
    NameError: name ‘j’ is not defined
    >>>

  2. Running code through cProfile is not fair since it adds an overhead for each function call.

    Run these commands instead:

    echo “sum([x for x in xrange(1000000)])” > t1.py

    echo “sum((x for x in xrange(1000000)))” > t2.py

    python t1.py

    python t2.py

  3. Hi Daniel!
    Reading this post I try myself to do it. The results were not the same as you, I post it here. I work on python and I think that is correct that this happens, because generators in some operations are more expensive that simple list. when you have to read a big file generators are the solution, but in other cases lists are the solution!

    cProfile.run(‘sum((x for x in xrange(1000000)))’)
    1000004 function calls in 1.305 CPU seconds

    Ordered by: standard name

    ncalls tottime percall cumtime percall filename:lineno(function)
    1000001 0.565 0.000 0.565 0.000 :1()
    1 0.000 0.000 1.305 1.305 :1()
    1 0.000 0.000 0.000 0.000 {method ‘disable’ of ‘_lsprof.Profiler’ objects}
    1 0.740 0.740 1.305 1.305 {sum}

    cProfile.run(‘sum([x for x in xrange(1000000)])’)
    3 function calls in 0.540 CPU seconds

    Ordered by: standard name

    ncalls tottime percall cumtime percall filename:lineno(function)
    1 0.357 0.357 0.540 0.540 :1()
    1 0.000 0.000 0.000 0.000 {method ‘disable’ of ‘_lsprof.Profiler’ objects}
    1 0.182 0.182 0.182 0.182 {sum}

Leave a Reply

Your email address will not be published. The comment form expects plain text. If you need to format your text, you can use HTML elements such strong, blockquote, cite, code and em. For formatting code as HTML automatically, I recommend tohtml.com.

You may subscribe to this blog by email.