Most Computer Science textbooks assume that algorithms are written directly in machine language for an idealized machine with a Von Neumann architecture. Alas, at best, these idealized models provide “ballpark” guidance for writing high-performance software. They steer you away from quadratic-time algorithms when linear-time algorithms are available. Yet there can be orders of magnitude of difference between two linear-time implementations.
Consider the following example. Sometimes you want to find the location of the maximum in an array. This operation is often called arg max. The Python language lacks a builtin arg max operation. In one of the embarrassing moments of this blog, I proposed a fancy way to compute arg max, using a single pass through the data and a constant amount of memory. It is what Computer Scientists would consider a good solution. The inelegant and inefficient alternative is to first compute the maximum, and then scan the array again to find a matching location. To my surprise, this simplistic solution was much more efficient than what I proposed. Today, I repeated the benchmark under Python 3:
array.index(max(array))
def maxarg(arr):
    counter = 0
    arg = 0
    m = arr[0]
    for x in arr:
        if x > m:
            m = x
            arg = counter
        counter += 1
    return arg
max([array[i], i] for i in range(len(array)))[1]
max((array[i], i) for i in range(len(array)))[1]
(Numbers and benchmark updated as of June 15, 2011.)
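If you want to reproduce the comparison yourself, the timeit module makes it easy. The sketch below (the array size and repetition count are arbitrary choices of mine, not the original benchmark settings) times the two-pass approach against the one-pass loop:

```python
import random
import timeit

def maxarg(arr):
    # one-pass arg max: track the best value and its position
    counter = 0
    arg = 0
    m = arr[0]
    for x in arr:
        if x > m:
            m = x
            arg = counter
        counter += 1
    return arg

array = [random.random() for _ in range(100000)]

# two-pass: compute the maximum, then scan again for its location
t_two_pass = timeit.timeit(lambda: array.index(max(array)), number=100)
# one-pass pure-Python loop
t_one_pass = timeit.timeit(lambda: maxarg(array), number=100)

print("two-pass:", t_two_pass, "one-pass:", t_one_pass)
```

Both functions return the position of the first occurrence of the maximum, so they agree on ties.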
The simplistic solution (array.index(max(array))) scans the data twice, whereas all the other solutions go through the data only once. Yet the two-pass solution is nearly twice as fast as any alternative.
It used to be that to write high performance software, you needed to know a lot about how the hardware worked. This era is coming to an end. The language interpreters are the new machines.
Challenge: Can you beat the silly arg max implementation (array.index(max(array))) using pure Python?
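One family of candidates pairs each value with its index using builtins, so that the scan itself runs inside the interpreter's C code. This is only a sketch of an attempt, not a claimed winner; whether it actually beats array.index(max(array)) is exactly the challenge:

```python
from operator import itemgetter

def argmax_enumerate(array):
    # enumerate yields (index, value) pairs; max() scans them in C,
    # comparing on the value component via itemgetter(1), which is
    # cheaper than a Python-level lambda but still called per element
    return max(enumerate(array), key=itemgetter(1))[0]
```

Note that max() keeps the first of equal elements, so this variant, like array.index(max(array)), returns the position of the first occurrence of the maximum.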
Appendix: In C++, the situation is reversed: the one-pass arg max is about twice as fast as the two-pass version.
Code: Source code posted on my blog is available from a github repository.