Here are two ways to count the numbers from 1 to 1000000 in Python. First, the classical way:

sum([i for i in xrange(1000000)])

Runs 0.8s on an old Linux box. It uses quite a bit of memory.

Second, the better way:

sum((i for i in xrange(1000000)))

Runs 0.2s on the same box. It uses a tiny amount of memory.

That is right. The second option is 4 times faster! According to my tests, the second option remains better even if you only sum the numbers between 1 and 1000, though the benefits are then tiny.

Disclaimer: Yes, I am aware that I could simply do 1000000*(1000000-1)/2.

Source: Parand.

3 Comments »

  1. Not only that, but expression 2 doesn’t leak a variable!

    $ python
    Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
    [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
    Type “help”, “copyright”, “credits” or “license” for more information.
    >>> i
    Traceback (most recent call last):
    File “”, line 1, in
    NameError: name ‘i’ is not defined
    >>> sum([i for i in xrange(100000)])
    4999950000L
    >>> i
    99999
    >>> sum(j for j in xrange(100000))
    4999950000L
    >>> j
    Traceback (most recent call last):
    File “”, line 1, in
    NameError: name ‘j’ is not defined
    >>>

    Comment by Carlos — 5/9/2008 @ 11:30

  2. Hi Daniel!
    Reading this post I try myself to do it. The results were not the same as you, I post it here. I work on python and I think that is correct that this happens, because generators in some operations are more expensive that simple list. when you have to read a big file generators are the solution, but in other cases lists are the solution!

    cProfile.run(‘sum((x for x in xrange(1000000)))’)
    1000004 function calls in 1.305 CPU seconds

    Ordered by: standard name

    ncalls tottime percall cumtime percall filename:lineno(function)
    1000001 0.565 0.000 0.565 0.000 :1()
    1 0.000 0.000 1.305 1.305 :1()
    1 0.000 0.000 0.000 0.000 {method ‘disable’ of ‘_lsprof.Profiler’ objects}
    1 0.740 0.740 1.305 1.305 {sum}

    cProfile.run(‘sum([x for x in xrange(1000000)])’)
    3 function calls in 0.540 CPU seconds

    Ordered by: standard name

    ncalls tottime percall cumtime percall filename:lineno(function)
    1 0.357 0.357 0.540 0.540 :1()
    1 0.000 0.000 0.000 0.000 {method ‘disable’ of ‘_lsprof.Profiler’ objects}
    1 0.182 0.182 0.182 0.182 {sum}

    Comment by Andres — 19/9/2008 @ 9:47

  3. Running code through cProfile is not fair since it adds an overhead for each function call.

    Run these commands instead:

    echo “sum([x for x in xrange(1000000)])” > t1.py

    echo “sum((x for x in xrange(1000000)))” > t2.py

    python t1.py

    python t2.py

    Comment by Daniel Lemire — 19/9/2008 @ 10:53

Leave a comment

Warning: When entering a long comment, please ensure that you make copy of your text prior to submitting it. If the server should fail or if you hit a bug, you might lose your work. I am not responsible for your lost effort.

To spammers: I carefully review every single post and make sure that spam gets deleted. You are wasting your time if you are manually entering spam using this form. Read my terms of use to see what I consider to be abusive.

Example: duo plus septem is '9'. The numbers are expressed in latin numerals but you should give your answers using ordinary digits.

 

« Blog's main page

Powered by WordPress