Entropy-efficient Computing
Microprocessors and storage devices are subject to the second law of thermodynamics: using them turn usable energy (oil, hydrogen) into unusable energy (heat). Data centers are already limited by their power usage and heat production. Moreover, many new devices need to operate for a long time with little power: (1) smart phones (2) powerful computing devices inserted into our bodies (3) robots shipped in space.
Our approach to entropic efficiency remains crude. We improve power supply. We shut down disks and CPUs when they are idle. Deeper questions arise however:
- Except maybe for embarrassingly parallel problems (such as serving web pages), parallel computing trades entropic efficiency for short running times. If entropic efficiency is your goal, will you stay away from non-trivial parallelism?
- We analyze algorithms by considering their running time. For example, shuffling an array takes time O(n) whereas sorting it takes time O(n log n). Yet, unlike sorting, I expect that shuffling an array should be possible without any entropy cost (heat generation)!
Suppose I give you a problem, and I ask you to solve it using as little entropy as possible. How would you go about it?
Further reading:
- M. P. Frank, Physical Limits of Computing, Computing in Science and Engineering, vol. 4 (3), May/June 2002, p. 16-25.
- R. Landauer, Irreversibility and Heat Generation in the Computing Process, IBM. Journal of Research and Development, vol. 5, no 3, 1961.
- (Updated) C. H. Bennett, Logical reversibility of computation, IBM journal of Research and Development, 1973.
There are several measures of the entropy, mostly known in CS is Shennon’s entropy. I have attempted to apply this kind entropy analysis to algorithms, but have not progress too far – basically it is possible to calculate entropy of the function output using histograms and dictionaries as in image processing. But how will you choose between 110 functions for same problem?
I am not sure you would like to calculate Boltzman’s entropy for the algorithm as algorithms can be implemented not on common hardware and thus thermodynamics entropy will be different.
But if I am going to minimize Boltzman’s entropy I would start from vectorisation and minimization of O number, then moving into hardware level optimization such as SSE2 instruction and so on.
Comment by Alex Mikhalev — 17/12/2009 @ 10:20
Daniel,
I’m not sure I follow your reasoning on shuffling without entropy cost. I assume that “entropy cost” would be defined as the generation of thermodynamic entropy, as that is the result of converting usable energy to unusable.
We must make a distinction between thermodynamic entropy and information entropy in this case. I would assume that any process carried by a computer that involves the conversion of electrical power to heat will generate entropy. Even if shuffling would be performed without energy use, the information entropy of the shuffled array is higher than the original.
JN
Comment by JN — 17/12/2009 @ 10:35
@Mikhalev I think it is entirely possible to address this problem from first principles, independently of specific hardware. Please see the two references I have added to my post and in particular Frank 2002.
More recently, there seems to be claims that quantum computation is reversible, thus does not produce any entropy (heat).
Comment by Daniel Lemire — 17/12/2009 @ 10:36
@JN Shannon’s entropy is related to Boltzman’s entropy. If you lower the entropy of an array in memory, then some other entropy must increase, thus heat must be generated.
If you raise the entropy of some array, then I expect there is no need (in principle) for much heat dissipation.
Comment by Daniel Lemire — 17/12/2009 @ 10:40
I don’t understand the distinction between shuffling and sorting. Didn’t Charlie Bennett show that any computation can be effected reversibly in principle? If this is correct, then the theoretical limit on entropy production for any computation is zero.
Regarding quantum computation, it is no more inherently reversible or irreversible than classical computation. Early on in QC research there were claims that QC *had* to be reversible due to the unitarity of the dynamics, unlike classical computing, which could be done either reversibly or irreversibly. However, this ignored the possibility of using measurements within the computation. There are both reversible and irreversible models of quantum computation, just as there are in the classical case. For example, the cluster state model is definitely irreversible, whereas the circuit model is reversible. The claims you are referring to are probably being made in the context of the quantum circuit model, since this is the most common model that is used. However, in this case, it should be absolutely no surprise that it is reversible because it is a simple generalization of the *classical* reversible circuit model.
Comment by Matt Leifer — 17/12/2009 @ 11:08
@Leifer
Given a fixed-memory computer, I don’t see how computation can be reversible. Any reference to this demonstration by Charlie Bennett?
As for the sorted array versus the shuffled array, I am working from a probabilistic model. Suppose the orderings of an array were picked at random. What is the probability that it will be sorted?
Comment by Daniel Lemire — 17/12/2009 @ 11:17
C. H. Bennett, “Logical reversibility of computation”, IBM J. Res. Dev., vol 17, pp 525-532 (1973).
It’s a classic. The trade-off between memory and reversibility has subsequently been studied at some length; I’ve only skimmed that literature, though, and don’t know the best references.
Lecerf also proved results about reversibility 10 years or so before Bennett. The paper is in French, though, which means I’ve never read it. I wish a translation were available.
Comment by Michael Nielsen — 17/12/2009 @ 12:25
@Nielsen
Thanks for the reference.
Fair enough. Reversible computations can buy high entropic efficiency. However, we don’t work with such machines. My hard drive generates heat when I write to it.
So, my question still stand.
Comment by Daniel Lemire — 17/12/2009 @ 13:02
OK, that makes sense now. I didn’t get the fixed memory bit before.
Does anyone know to what extent fixed memory quantum computing has been studied? I mean, I know of a few results about memory and time tradeoffs for particular algorithms, but they all assume that memory scales with input size. Can we do any quantum computation with constant memory? Given that it impacts reversibility, it seems like you might need to use some clever tricks to get it to work.
Comment by Matt Leifer — 17/12/2009 @ 13:09
shuffling an array takes time O(n2) whereas sorting it takes time O(n log n)
Depends…
For a given key size sorting an array is only O(n), the bucket sort!
This seemingly counterintuitive fact is because the sorting time is actually proportional to the total keys volume (key size x number of keys).
Usually the keys are way more larger than the minimum needed value 2**n bits so both statements are “true” sorting is O(n log n) in the worst case and O(n) in most practical cases.
Comment by Kevembuangga — 17/12/2009 @ 13:44
@Kevembuangga True. Presumably, you can also shuffle an array quite fast if there are few distinct values.
Comment by Daniel Lemire — 17/12/2009 @ 13:54
Oops! I made it upside down the minimum key size is log2(n) bits not 2**n.
BTW, a bucket sort analysis is here.
Comment by Kevembuangga — 17/12/2009 @ 14:05
rst I wanted to mention reversible computing… But then it stroke me as something we, computer scientists, do very often: refer back to theoretical results.
While these are great to get a strong insight on a problem, limiting our vision to theoretical results can sometime stop us from investigating problems on which we can make progress and have significant impact on the world. Last year I listened to a talk by UBC’s Kevin Leyton-Brown on empirical algorithms. To quote from his webpage: “Despite the fact that algorithms are human artifacts, I believe that their theoretical analysis should be supplemented using the same approaches that are used to study natural phenomena.” (http://people.cs.ubc.ca/~kevinlb/research.html)
I encourage you to read the rest. He makes the point that very complex algorithms have been devised to solve computationally hard problems. For these algorithms, empirical experiments can often give us a better insight into which variations are better, in practice, for the kind of problems we’re facing.
The computers we use — together with their array of peripherals — are also complex and ever-changing beasts. A principled and empirical study of their energy efficiency under different algorithms and different design decisions strikes me as an interesting research project. Is quicksort significantly less energy efficient than mergesort? Is there a way to design software as to minimize energy consuption? Are there “best practices” we can put in place so that both execution time and energy efficiency can be taken into account in a design decision? After all, the goal is to minimize money… And money is not just about time anymore.
Comment by Philippe Beaudoin — 17/12/2009 @ 14:22
Missing from the beginning of my comment:
At first [...]
Comment by Philippe Beaudoin — 17/12/2009 @ 14:23
Thank you Philippe.
Comment by Daniel Lemire — 17/12/2009 @ 14:49
Such awesome folks in this comment thread! I envy you Daniel. Just sayin’.
Comment by Seb — 17/12/2009 @ 15:08
If we want to minimize heat generation, then just do as little work as possible. So it goes reduces to time complexity in theory?
In practise those crude tricks of improving hardware are likely to be best. Also, teaching programmers to use tricks such as doing work in bursts so that hardware is not on all the time.
Comment by J Kujala — 18/12/2009 @ 3:07
Daniel thank you very much for this discussion. The connection between Boltzman entropy and Shannon entropy is fairly interesting and important for me personally and is interesting scientific project. I will check references and will try to made it practical, as I need the new scientific ways to choose between several algorithms for one problem.
Comment by Alex Mikhalev — 18/12/2009 @ 5:03
@Kujala
(1) Suppose an algorithm runs faster, but keeps writing to disk large data structures, should you prefer it to an algorithm that avoids writing to disk as much as possible, but runs slower? (For example, the second algorithm might be using compression.)
(2) If you can use 4 CPUs to cut the running time in half, is that a good deal entropy-wise?
Comment by Daniel Lemire — 18/12/2009 @ 7:59
Daniel,
If I understand it correctly, a random shuffling takes only n swaps and n calls to rand() function. How can it be O(n^2)?
Comment by Itman — 18/12/2009 @ 11:17
@Itman
That’s true. Shuffling an array can be done in linear time. I fixed my post (after 2000 people saw my mistake… there is a lesson somewhere…)
No sure what I was thinking about…
Comment by Daniel Lemire — 18/12/2009 @ 11:22
@Daniel. No worries. We all make mistakes.
PS: I also wonder why would you expect energy-less shuffling?
Comment by Itman — 18/12/2009 @ 11:36
@Itman Because the shuffled array has less or as much entropy as the initial array (Boltzman’s entropy ). It does not take work to shuffle the molecules of a gas, for example. It takes work to organize these molecules into a crystal though.
Comment by Daniel Lemire — 18/12/2009 @ 11:48
To add a bit of pedantry to this otherwise interesting post, the lower bound on sorting is n.log(n) if you only use comparisons to sort.
Daniel, I have such faith in your knowledge and command of efficient algorithms that when I saw you claimed shuffling an array was O(n^2) I immediately assumed you meant a 2-dimensional, n x n array…
PS: the spam protection bug on Safari is annoying.
Comment by Cyril — 18/12/2009 @ 12:39
@Cyril
Given that I cannot get away by saying that sorting is in O(n log n), I would say that you are right to have faith in this blog. There is intense peer review!
I think that the Safari bug went away with the more recent versions of the browser.
Comment by Daniel Lemire — 18/12/2009 @ 13:40
You note that parallel programming can be used to do the same amount of work in the same time with less energy for embarrassingly parallel tasks. Maybe this isn’t restricted to embarrassingly parallel stuff like serving web pages though. Maybe parallelism can be used to reduce energy consumption.
Comment by beza1e1 — 21/12/2009 @ 17:57
Entropy is one of Physics Foibles. Murphys Law is the layman’s Entropy.
Comment by melvin gldstein — 29/9/2011 @ 12:59