Who is keeping an eye on the tech companies?

Spotify was founded only a few years ago by a 23-year-old, and it now plays a key role in the music industry. YouTube has become the dominant channel for music delivery. Had I predicted these things ten years ago, you would have thought me odd.

The inventor Ray Kurzweil popularized the notion of a technological singularity: a moment in our history when things move so fast that we can no longer comprehend the world or make even short-term predictions. Are you game to predict the next ten years of the music industry?

Let us think about the present.

There is an explosion of diversity because diversity has become affordable. Spotify can offer millions of songs, something that was unthinkable in the past. The net result is that more musicians than ever can be found by more people than ever.

Chris Anderson in 2004 promised us a world where more of us could make a living on the “long tail” (the long stream of less popular products and services) without needing a big break. Is that what happened? I think it is still up for debate. Recently, a study of the Google Play store found that it is a superstar market where the bulk of the money ends up in the pockets of a few, with most participants hardly making a penny. Should we blame the bulk of the players for their bad luck?

Being stored in the index is not enough to be found: being visible is not the same thing as being found. When you use Google to search for something, it may return 20,000 relevant pages, but you are unlikely to look at more than three or four of them. Tools like Spotify are no different. They use recommender systems, in other words AI, to help us find what we are looking for. They think for us.

People don’t always realize that tools such as Google know a lot about them and provide personalized results. These tools are mirrors, necessarily imperfect ones. They are not neutral.

What do I mean by neutral? When using polls to predict election results, we measure what people think, but we also change what people think by publishing the results. A top-10 list in music is not just a mirror, it is also an active agent that changes what people listen to.

The rules of engagement also matter. Our democratic systems are often set up to favor the emergence of two dominant parties. That is the intended effect. It couldn’t be more obvious in the USA. Yet who is setting up the rules in the digital world? Do you know?

The big players like Google or Spotify have access to a remarkable wealth of data. They know who you are, where you are and what you do. They know your age, your race, your sexual orientation. No business could have dreamed of having so much data 30 years ago.

In itself, this is not harmful.

But any complex and powerful system can have unexpected and undesirable effects. For example, “there’s a general tendency for automated decisions to favor those who belong to the statistically dominant groups.”

At a minimum, I think we should study what systems like Google and Spotify do so we can discuss it openly.

Problems don’t necessarily arise because the big players are malicious. Datta et al. (2015) found that online ads for highly paid jobs tended to be shown more often to men than women. It seems that the reason is that younger women are a prized demographic so they get ads from advertisers with deeper pockets instead.

What could we do in concrete terms? We could use volunteers who agree to have some of their Internet interactions monitored. We could also proceed with automated sousveillance, where we create fake accounts to keep an eye on the big players.

We already keep track of journalists to check whether they are being fair. I think we should spend a bit more time tracking Google, Apple, Amazon, Spotify… This would seem only prudent in such a fast-changing world.

Update to my VR bet with Greg Linden

I have an ongoing bet with Greg Linden stating that we are going to sell 10 million virtual-reality (VR) units per year by 2019.

I have been paying close attention to VR technology and its impact.

What have we learned in the last few months?

  • The technology is fantastic. VR headsets work.
  • The software is currently the weak point. Besides games and “experiences”, we simply have no compelling application. And while there are some good games, none of them is good enough to motivate millions into purchasing a VR headset.
  • A nice surprise: Sony has managed to make VR work with the relatively underpowered PlayStation 4. The initial reports are very positive. I think that’s important: it shows that relatively weak and inexpensive processors are sufficient for VR.

A key indicator is how many VR headsets Sony manages to sell over Christmas.

Intel will add deep-learning instructions to its processors

Some of the latest Intel processors support the AVX-512 family of vector instructions. These instructions operate on blocks of 512 bits (or 64 bytes). The benefit of such wide instructions is that even without increasing the processor clock speed, systems can still process a lot more data. Most code today operates on 64-bit words (8 bytes). In theory, keeping everything else constant, you could go 8 times faster by using AVX-512 instructions instead.

Of course, not all code can make use of vector instructions… but that’s not relevant. What matters is whether your “hot code” (where the processor spends much of its time) can benefit from them. In many systems, the hot code is made of tight loops that need to run billions of times. Just the kind of code that can benefit from vectorization!
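To give a concrete picture, here is a small sketch of what such vector code can look like, written with plain AVX-512F compiler intrinsics (this is my own illustration, not the upcoming deep-learning instructions; it assumes an AVX-512 capable processor and a compiler flag such as -mavx512f with GCC or Clang):

#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// adds b to a, element by element; assumes n is a multiple of 16
void add_vectors(uint32_t *a, const uint32_t *b, size_t n) {
  for (size_t i = 0; i < n; i += 16) {
    __m512i va = _mm512_loadu_si512((const void *)(a + i)); // load 64 bytes of a
    __m512i vb = _mm512_loadu_si512((const void *)(b + i)); // load 64 bytes of b
    __m512i sum = _mm512_add_epi32(va, vb);   // sixteen 32-bit additions in one instruction
    _mm512_storeu_si512((void *)(a + i), sum);              // write 64 bytes back
  }
}

Each iteration of this loop processes sixteen 32-bit integers at once, which is where the theoretical 8x factor over 64-bit words comes from.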

The hottest trend in software right now is “deep learning”. It can be used to classify pictures, recognize speech or play the game of Go. Some say that the quickest “get rich quick” scheme right now is to launch a deep-learning venture, and get bought by one of the big players (Facebook, Google, Apple, Microsoft, Amazon). It is made easier by the fact that companies like Google have open sourced their code, such as TensorFlow.

Sadly for Intel, it has been mostly left out of the game. Nvidia graphics processors are the standard off-the-shelf approach to running deep-learning code. That’s not to say that Intel lacks good technology. But for the kind of brute-force algebra that’s required by deep learning, Nvidia graphics processors are simply a better fit.

However, Intel is apparently preparing a counter-attack, of sorts. In September of this year, Intel discreetly revealed that its future processors will support dedicated deep-learning instructions. Intel’s AVX-512 family of instructions is decomposed into sub-families. There will be two new sub-families for deep learning: AVX512_4VNNIW and AVX512_4FMAPS.

A case study in the performance cost of abstraction (C++’s std::shuffle)

Statisticians and machine-learning experts sometimes need to shuffle data quickly and repeatedly. There is one standard and simple algorithm to shuffle an array, the so-called Fisher-Yates shuffle:

for (i=size; i>1; i--) {
  nextpos = random_numbers_in_range(0,i);
  swap(storage[i-1], storage[nextpos]);
}

Not very difficult, is it? The C++ programming language, like many others, has a standard function to solve this problem (std::shuffle).

How does it fare against a very basic Fisher-Yates shuffle without any optimization whatsoever? To make sure that the comparison is fair, let us work with the same data (an array of strings) and use the same random number generator (I chose PCG). To avoid caching issues, let us use a small array that fits in cache.

Here is my unoptimized C++ code:

template <class T>
void shuffle(T *storage, uint32_t size) {
    for (uint32_t i = size; i > 1; i--) {
        uint32_t nextpos = pcg32_random_bounded(i);
        std::swap(storage[i - 1], storage[nextpos]);
    }
}

I claim that this is the “textbook” implementation… meaning that it is the closest thing you can get to taking a textbook and coding up the pseudo-code in C++. (Yes I know that people copy code from Wikipedia or StackOverflow, not textbooks, but humor me.)

The pcg32_random_bounded function I call is implemented in a standard (but suboptimal) way: it picks a random number in a range using two division operations. You can do it with a single multiplication instead, but let us ignore such optimizations.
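To make this concrete, here is a sketch of both approaches; pcg32_random() is assumed to be a function returning uniformly distributed 32-bit values (as PCG provides), and the function names are my own:

#include <cstdint>

uint32_t pcg32_random(); // assumed: returns a uniformly distributed 32-bit value

// standard, unbiased approach: up to two division (modulo) operations per call
uint32_t pcg32_random_bounded(uint32_t range) {
  uint32_t threshold = (0 - range) % range;  // first division
  for (;;) {
    uint32_t candidate = pcg32_random();
    if (candidate >= threshold)
      return candidate % range;              // second division
  }
}

// multiplication-based alternative: no division, but slightly biased
// unless a rejection step is added
uint32_t pcg32_random_bounded_fast(uint32_t range) {
  return (uint32_t)(((uint64_t)pcg32_random() * range) >> 32);
}

For the benchmark below, only the two-division version matters.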

Here are my results, expressed in CPU clock cycles per value… (Skylake processor, clang++ 4.0)

technique        clock cycles per value
textbook code    29

So the textbook code is twice as fast as the standard C++ function.

Why is that?

It is often the case that default random number generators are slow due to concurrency issues. But we provide our own random number generators, so it should not be an issue.

A cursory analysis reveals that the most likely reason for the slowdown is that the standard C++ library tends to use 64-bit arithmetic throughout (on 64-bit systems). I implicitly assume, in my textbook implementation, that you are not going to randomly shuffle arrays containing more than 4 billion elements. I don’t think I am being unreasonable: an array of 4 billion std::string values would use at least 128 GB of RAM. If you need to shuffle that much data, you probably want to parallelize the problem. But, from the point of view of the engineers working on the standard library, they have to work with the requirements set forth by the specification. So 64-bit arithmetic it is! And that’s how I can beat them without any effort.

The absolute numbers might also surprise you. The Fisher-Yates shuffle is very efficient. We do a single pass over the data. We do not really need to look at the data, just move it. We use a fast random number generator (PCG). How can we end up with 30 or 70 cycles per array element?

Part of the problem is our use of std::string. On my system, a single (empty) std::string uses 32 bytes whereas a pointer (char *) uses only 8 bytes. If we fall back on C strings (char *), we can accelerate the processing simply because there is less data to move. Without going overboard with optimizations, I can bring the computational cost to about 7 cycles per element by avoiding divisions and using C strings instead of std::string objects. That’s an order of magnitude faster than the standard shuffle function.

So while std::shuffle is very general, being able to shuffle just about any array using just about any random-number generator… this generality has a cost in terms of performance.

My code is available.

Variable-length strings can be expensive

Much of our software deals with variable-length strings. For example, my name “Daniel” uses six characters whereas my neighbor’s name (“Philippe”) uses eight characters.

“Old” software often shied away from variable-length strings. The Pascal programming language supported fixed-length strings. Most relational databases support fixed-length strings. Flat files with fixed-length lines and fixed-length records were common in the old days.

It seems that people today never worry about variable-length strings. All the recent programming languages I know ignore the concept of fixed-length strings.

Yet fixed-length strings have clear engineering benefits. For example, if I were to store “tweets” (from Twitter) in memory, using a flat 140 characters per tweet, I would be able to support fast direct access without having to go through pointers. Memory allocation would be trivial.
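As a rough sketch of what I have in mind (the type and the numbers are purely illustrative), such a flat store might look like this:

#include <cstddef>
#include <cstring>
#include <vector>

const size_t tweet_slot = 140;   // fixed budget per tweet

// One allocation for the whole collection; reaching the i-th tweet is a single
// address computation instead of a pointer dereference.
struct TweetStore {
  std::vector<char> data;

  explicit TweetStore(size_t count) : data(count * tweet_slot, 0) {}

  char *get(size_t i) { return data.data() + i * tweet_slot; }

  void set(size_t i, const char *text) {
    // copies at most tweet_slot bytes and pads the remainder with zeros
    std::strncpy(get(i), text, tweet_slot);
  }
};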

However, maybe processors have gotten so smart, and string operations so optimized, that there are no clear performance benefits to fixed-length strings in today’s software.

I don’t think so.

Even today, it is often possible to accelerate software by replacing variable-length strings by fixed-length ones, when you know ahead of time that all your strings are going to be reasonably short.

To examine this idea, I have created lists of strings made simply of the numbers (as strings) from 0 to 1024. Such strings have lengths ranging from 1 to 4 characters. In C, strings are null-terminated, so the actual length is between 2 and 5 bytes. In C++, we have standard strings (std::string), with significant overhead: on my system, a single std::string uses 32 bytes, not counting the string content itself.

Instead of using variable-length strings, I can “pad” my strings so that they have a fixed length (8 characters), adding null characters as needed.
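Here is a sketch of the padded representation I have in mind (my own illustration, not necessarily the exact code behind the numbers below):

#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

const size_t width = 8;   // enough for my short test strings plus null padding

struct PaddedString {
  char bytes[width];
};

PaddedString pad(const std::string &s) {
  PaddedString p = {};    // zero-filled, so unused bytes are null characters
  std::memcpy(p.bytes, s.data(), std::min(width, s.size()));
  return p;
}

bool operator<(const PaddedString &a, const PaddedString &b) {
  return std::memcmp(a.bytes, b.bytes, width) < 0;   // fixed-size comparison
}

void sort_padded(std::vector<PaddedString> &values) {
  std::sort(values.begin(), values.end());   // moves 8-byte values, no pointer chasing
}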

How long does it take, in CPU cycles per string, to sort a short array of strings (1024 strings)?

string type          CPU cycles per element
C++ std::string      520
standard C string    300
padded C string      140

So for this experiment, replacing variable-length strings by fixed-length strings more than doubles the performance! And my code isn’t even optimized. For example, to keep the comparison “fair”, I sorted pointers to strings… but my padded C strings fit in a machine word and do not require a pointer. So, in fact, fixed-length strings could be nearly three times faster with enough work.

To summarize: Variable-length strings are a convenient abstraction. You may hear that string operations are very cheap, unlikely to be a bottleneck and so forth… That might be so…

But I submit to you that the omnipresence of variable-length strings as a universal default can make us blind to very sweet optimization opportunities.

My source code is available.

Can Swift code call C code without overhead?

Swift is the latest hot new language from Apple. It is becoming the standard programming language on Apple systems.

I complained in a previous post that Swift 3.0 has only about half of Java’s speed in tests that I care about. That’s not great for high-performance programming.

But we do have a language that produces very fast code: the C language.

Many languages like Objective-C, C++, Python and Go allow you to call C code with relative ease. C++ and Objective-C can call C code with no overhead. Go makes it very easy, but the performance overhead is huge. So it is almost never a good idea to call C from Go for performance. Python also suffers from a significant overhead when calling C code, but since native Python is not so fast, it is often a practical idea to rewrite performance-sensitive code in C and call it from Python. Java makes it hard to call C code, so it is usually not even considered.

What about Swift? We know, as per Apple’s requirements, that Swift must interact constantly with legacy Objective-C code. So we know that it must be good. How good is it?

To put it to the test, I decided to call a simple Fibonacci recurrence function from Swift:

void fibo(int * x, int * y) {
  int c = *y;
  *y = *x + *y;
  *x = c;
}

(Note: this function can overflow and that is undefined behavior in C.)

How does it fare against pure Swift code?

let c = j;
j = i &+ j;
i = c;

To be clear, this is a really extreme case. You should never rewrite such a tiny piece of code in C for performance. I am intentionally pushing the limits.

I wrote a test that calls these functions 3.2 billion times. The pure Swift version takes 9.6 seconds on a Haswell processor… or about 3 nanoseconds per call. The C function takes a bit over 13 seconds, or about 4 nanoseconds per call. Ok. But what if I rewrote the whole thing into one C function, called only once? Then it runs in 11 seconds (still slower than the pure Swift code).

The numbers I have suggest that calling C from Swift is effectively free.

In these tests, I did not pass any optimization flag to Swift. The way you build a Swift program is by typing “swift build”, which is nice and elegant. To optimize the binary, you can type “swift build --configuration release”. Nice! But benchmark code is part of your tests and, sadly, Swift seems to insist on testing only “debug” code. Typing “swift test --configuration release” fails since the test option does not have a configuration flag. (Calling swift test -Xswiftc -O gives me linking errors.)

I rewrote the code using a pure C program, without any Swift. Sure enough, the program runs in about 11 seconds without any optimization flag. This confirms my theory that Swift is testing the code with all optimizations turned off. What if I turn on all C optimizations? Then I go down to 1.7 seconds (or about half a nanosecond per iteration).
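For reference, a stand-alone C program along these lines might look like the following sketch (my own reconstruction for illustration, not the actual purec.c from my repository):

/* Reconstruction of the experiment: run the same recurrence 3.2 billion times
   and time it. As noted above, the additions eventually overflow, which is
   undefined behavior in C; this is a benchmark sketch, not production code. */
#include <stdio.h>
#include <time.h>

void fibo(int *x, int *y) {
  int c = *y;
  *y = *x + *y;
  *x = c;
}

int main(void) {
  int i = 1, j = 1;
  long long k;
  clock_t start = clock();
  for (k = 0; k < 3200000000LL; k++) {
    fibo(&i, &j);
  }
  /* print the result so the compiler cannot discard the loop entirely */
  printf("%d in %f s\n", i + j, (double)(clock() - start) / CLOCKS_PER_SEC);
  return 0;
}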

So while calling C from Swift is very cheap, ensuring that Swift properly optimizes the code might be trickier.

It seems odd that, by default, Swift runs benchmarks in debug mode. It is not helping programmers who care about performance.

Anyhow, a good way around this problem is to simply build binaries in release mode and measure how long it takes them to run. It is crude, but it gets the job done in this case:

$ swift build --configuration release
$ time ./.build/release/LittleSwiftTest

real       0m2.030s
user       0m2.028s
sys        0m0.000s
$ time ./.build/release/LittleCOverheadTest

real       0m1.778s
user       0m1.776s
sys        0m0.000s

$ clang -Ofast -o purec  code/purec.c
$ time ./purec

real       0m1.747s
user       0m1.744s
sys        0m0.000s

So there is no difference between a straight C program and a Swift program that calls a C function billions of times. They are both just as fast.

The pure Swift program is slightly slower in this case, however. It suggests that using C for performance-sensitive code could be beneficial in a Swift project.

So I have solid evidence that calling C functions from Swift is very cheap. That is very good news. It means that if for whatever reason, Swift is not fast enough for your needs, you stand a very good chance of being able to rely on C instead.

My Swift source code is available (works under Linux and Mac).

Credit: Thanks to Stephen Canon for helping me realize that I could lower the call overhead by calling directly the C function instead of wrapping it first in a tiny Swift function.

Sorting already sorted arrays is much faster?

If you are reading a random textbook on computer science, it is probably going to tell you all about how good sorting algorithms take linearithmic time. To arrive at this result, they count the number of operations. That’s a good model to teach computer science, but working programmers need more sophisticated models of software performance.

On modern superscalar processors, we expect in-memory sorting to be limited by how far ahead the processor can predict where the data will go. Though moving the data in memory is not free, it is a small cost if it can be done predictably.

We know that sorting “already sorted data” can be done in an easy-to-predict manner (just do nothing). So it should be fast. But how much faster is it than sorting randomly shuffled data?

I decided to run an experiment.

I use arrays containing one million distinct 32-bit integers, and I report the time in CPU cycles per value on a Haswell processor. I wrote my code in C++.
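Here is a minimal sketch of such an experiment (it times whole sorts in seconds with std::chrono rather than reporting cycles per value, but the comparison is the same):

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

double time_sort(std::vector<uint32_t> data) {  // take a copy on purpose
  auto start = std::chrono::high_resolution_clock::now();
  std::sort(data.begin(), data.end());
  std::chrono::duration<double> elapsed =
      std::chrono::high_resolution_clock::now() - start;
  return elapsed.count();
}

int main() {
  const size_t n = 1000000;
  std::vector<uint32_t> sorted(n);
  std::iota(sorted.begin(), sorted.end(), (uint32_t)0);        // 0, 1, 2, ...
  std::vector<uint32_t> reversed(sorted.rbegin(), sorted.rend());
  std::vector<uint32_t> shuffled = sorted;
  std::mt19937 gen(1234);
  std::shuffle(shuffled.begin(), shuffled.end(), gen);

  printf("sorted:   %f s\n", time_sort(sorted));
  printf("shuffled: %f s\n", time_sort(shuffled));
  printf("reverse:  %f s\n", time_sort(reversed));
  return 0;
}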

function       sorted data       shuffled data       sorted in reverse

For comparison, it takes roughly n log(n) comparisons to sort an array of size n in the worst case with a good algorithm. In my experiment, log(n) is about 20.

The numbers bear out our analysis. Sorting an already-sorted array takes a fraction of the time needed to sort a shuffled array. One could object that the reason sorting already-sorted arrays is fast is because we do not have to move the data so much. So I also included initial arrays that were sorted in reverse. Interestingly, std::sort is even faster with reversed arrays! This is clear evidence for our thesis.

(The C++ source code is available. My software includes timsort results if you are interested.)

Swift versus Java: the bitset performance test

I claimed online that the performance of Apple’s Swift was not yet on par with Java. People asked me to back my claim with numbers.

I decided to construct one test based on bitsets. A bitset is a fast data structure to implement sets of integers.

Java comes with its own bitset class called java.util.BitSet. I wrote three tests for it: the time it takes to add a million integers in sequence to a bitset, the time it takes to count how many integers are present in the bitset, and the time it takes to iterate through the integers.
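To make these three operations concrete, here is a minimal bitset sketch (written in C++ for brevity and using GCC/Clang builtins for bit counting; it is an illustration of the data structure, not the code of either library):

#include <cstdint>
#include <vector>

// A minimal bitset: bit x of the structure is set when the integer x is present.
struct Bitset {
  std::vector<uint64_t> words;

  void add(uint32_t x) {                        // insert the integer x
    size_t w = x / 64;
    if (w >= words.size()) words.resize(w + 1, 0);
    words[w] |= uint64_t(1) << (x % 64);
  }

  size_t count() const {                        // how many integers are present
    size_t total = 0;
    for (uint64_t w : words) total += (size_t)__builtin_popcountll(w);
    return total;
  }

  template <class F>
  void for_each(F visit) const {                // iterate over the stored integers
    for (size_t i = 0; i < words.size(); i++) {
      uint64_t w = words[i];
      while (w != 0) {
        visit((uint32_t)(i * 64 + (size_t)__builtin_ctzll(w)));
        w &= w - 1;                             // clear the lowest set bit
      }
    }
  }
};

The benchmarks then time a million calls to add in sequence, one call to count, and one full iteration.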

Here are the results in Java:

git clone https://github.com/lemire/microbenchmarks.git
cd microbenchmarks
mvn clean install
java -cp target/microbenchmarks-0.0.1-jar-with-dependencies.jar me.lemire.microbenchmarks.bitset.Bitset

Benchmark Mode Samples Score Error Units
m.l.m.b.Bitset.construct avgt 5 0.008 ± 0.002 s/op
m.l.m.b.Bitset.count avgt 5 0.001 ± 0.000 s/op
m.l.m.b.Bitset.iterate avgt 5 0.005 ± 0.001 s/op

So all tests take at most a few milliseconds. Good.

Next I did my best to reproduce the same tests in Swift:

git clone https://github.com/lemire/SwiftBitset.git
cd SwiftBitset

swift test -Xswiftc -Ounchecked -s BitsetTests.BitsetTests/testAddPerformance
Test Case '-[BitsetTests.BitsetTests testAddPerformance]' measured [Time, seconds] average: 0.019

swift test  -Xswiftc -Ounchecked -s BitsetTests.BitsetTests/testCountPerformance
Test Case '-[BitsetTests.BitsetTests testCountPerformance]' measured [Time, seconds] average: 0.004

swift test  -Xswiftc -Ounchecked  -s BitsetTests.BitsetTests/testIteratorPerformance
Test Case '-[BitsetTests.BitsetTests testIteratorPerformance]' measured [Time, seconds] average: 0.010

These tests are rough. I got these numbers on my laptop, without even trying to keep various noise factors in check. Notice however that I disable bounds checking in Swift, but not in Java, thus giving something of an unfair advantage to Swift.

But as is evident, SwiftBitset can be 2 times slower than Java’s BitSet. Not a small difference.

SwiftBitset is brand new whereas Java’s BitSet underwent thorough review over the years. It is likely that SwiftBitset leaves at least some performance on the table. So we could probably close some of the performance gap with better code.

Nevertheless, it does not make me hopeful regarding Swift’s performance compared to Java, at least for the type of work I care about. But Swift hopefully uses less memory which might be important on mobile devices.

Some people seem to claim that Swift gives iOS an advantage over Android and its use of Java. I’d like to see the numbers.

Update. I decided to add a Go benchmark and a C benchmark. Here are my results:

                   construct   count   iterate
Java’s BitSet      8 ms        1 ms    5 ms
C’s cbitset        11 ms       0 ms    4 ms
SwiftBitset        19 ms       4 ms    10 ms
Go’s bitset        18 ms       3 ms    13 ms

It might seem surprising to some, but Java can be a fast language. (The C implementation suffers from my Mac’s poor “malloc” performance. Results would be vastly different under Linux.)

My thoughts on Swift

Swift is a new programming language produced by Apple for its iOS devices (primarily the iPhone). It first appeared two years ago and it has been gaining popularity quickly.

Before Swift, Apple programmers were “stuck” with Objective-C. Objective-C is old and hardly ever used outside the Apple ecosystem.

Swift, at least the core language, is fully available on Linux. There are rumours that it should soon become available for Windows.

If you want to build mobile apps, then learning Swift is probably wise. But setting this context aside, how does Swift stand on its own?

To find out, I wrote and published a small Bitset library in Swift 3.0.

  • Like most recent languages (e.g., Rust, Go), Swift 3.0 comes with standard and universal tools to test, build and manage dependencies. In contrast, languages like C, C++ or Java depend on additional tools that are not integrated in the language per se. There is no reason, in 2016, to not include unit testing, benchmarking and dependency management as part of a programming language itself. Swift shines in this respect.
  • Swift feels a lot like Java. It should be easy for Java programmers to learn the language. Swift passes classes by reference and everything else by value, though there is an inout parameter annotation to override the default. The ability to take what would have been a pass-by-reference class in Java and make it a pass-by-value struct opens up optimization opportunities in Swift.
  • In Java, all strings are immutable. In Swift, strings can be either immutable or mutable. I suspect that this may give Swift a performance advantage in some cases.
  • Swift supports automatic type inference which is meant to give the syntax a “high-level” look compared to Java. I am not entirely convinced that it is actual progress in practice.
  • Swift uses automatic reference counting instead of a more Java-like garbage collection. Presumably, this means fewer long pauses which might be advantageous when latency is a problem (as is the case in some user-facing applications). Hopefully, it should also translate into lower memory usage in most cases. For the programmer, it appears to be more or less transparent.
  • Swift has operator overloading like C++. It might even be more powerful than C++ in the sense that you can create your own operators on the fly.
  • By default, Swift “crashes” when an operation overflows (like casting, multiplication, addition…). The intention is noble, but I am not sure crashing applications in production is a good default, especially if it comes with a performance penalty. Swift also “crashes” when you try to allocate too much memory, with apparently no way for the application to recover sanely. Again, I am not sure that this is a good default, though maybe it is.
  • It looks like it is easy to link against C libraries. I built a simple example. Unlike Go, I suspect that the performance will be good.
  • Available benchmarks so far indicate that Swift is slower than languages like Go and Java, which are themselves slower than C. So Swift might not be a wise choice for high-performance programming.

My verdict? Swift compares favourably with Java. I’d be happy to program in Swift if I needed to build an iOS app.

Would I use Swift for other tasks? There is a lot of talk about using Swift to build web applications. I am not sure.

The rise of dark circuits

The latest iPhone 7 from Apple has more peak computing power than most laptops. Apple pulled this off using a technology called ARM big.LITTLE, where half of the processor is used only when high performance is needed; otherwise it remains idle.

That’s hardly the sole example of a processor with parts that remain idle most of the time. For example, all recent desktop Intel processors come with “Intel processor graphics” that can process video, replace a graphics card and so forth. It uses roughly half the silicon of your processor, but in many PCs, where there is either no display or a discrete graphics card, most of this silicon sits unused most of the time.

If you stop to think about it, it is somewhat remarkable. Silicon processors have gotten so cheap that we can afford to leave much of the silicon unused.

In contrast, much of the progress in computing has to do with miniaturization. Smaller transistors use less power, are cheaper to mass-produce and can enable processors running at a higher frequency. Yet the transistors in the CPU of your computer are already only dozens of atoms in diameter. Intel has thousands of smart engineers, but none of them can make a silicon-based transistor out of less than one atom. So we are about to hit a wall… a physical wall. Some would argue that this wall is already upon us. We can create wider processors, processors with fancier instructions, processors with more cores… specialized processors… but we have a really hard time squeezing more conventional performance out of single cores.

You can expect companies like Intel to provide us with more efficient processors in the conventional manner (by miniaturizing silicon transistors) until 2020, and maybe, at the extreme limit, until 2025… but then it is game over. We may buy a few extra years by going beyond silicon… but nobody is talking yet about subatomic computing.

I should caution you against excessive pessimism. Currently, for $15, you can buy a Raspberry Pi 3 computer which is probably closer than you imagine to the power of your laptop. In five years, the successor of the Raspberry Pi might still sell for $15 but be just as fast as the iPhone 7… and be faster than most laptops sold today. This means that a $30 light bulb might have the computational power of a small server in today’s terms. So we are not about to run out of computing power… not yet…

Still… where is the next frontier?

We can build 3D processors, to squeeze more transistors into a smaller area… But this only helps you so much if each transistor still uses the same power. We can’t pump more and more power into processors.

You might argue that we can cool chips better or use more powerful batteries… but none of this helps us if we have to grow the energy usage exponentially. Granted, we might be able to heat our homes with computers, at least those of us living in cold regions… but who wants an iPhone that burns through your skin?

How does our brain work despite these limitations? Our neurons are large and we have many of them… many more than we have transistors in any computer. The total computing power of our brain far exceeds that of the most powerful silicon processor ever made… How do we not burst into flame? The secret is that our neurons are not all firing at the same time billions of times per second.

You might have heard that we only use 10% of our brain. Then you have been told that this is a myth. There is even a Wikipedia page about this “myth”. But it is not a myth. At any one time, you are probably using less than 1% of your brain:

The cost of a single spike is high, and this severely limits, possibly to fewer than 1%, the number of neurons that can be substantially active concurrently. The high cost of spikes requires the brain not only to use representational codes that rely on very few active neurons, but also to allocate its energy resources flexibly among cortical regions according to task demand. (Lennie, 2003)

So, the truth is that you are not even using 10% of your brain… more like 1%… Your brain is in constant power-saving mode.

This, I should add, can make us optimistic about intelligence enhancement technologies. It seems entirely possible to force the brain into a higher level of activity, with the trade-off that it might use more energy and generate more heat. For our ancestors, energy was scarce and the weather could be torrid. We can afford to control our temperature, and we overeat.

But, even so, there is no way you could get half of your neurons firing simultaneously. Our biology could not sustain it. We would go into shock.

It stands to reason that our computers must follow the same pattern. We can build ever larger chips, with densely packed transistors… but most of these circuits must remain inactive most of the time… that’s what they call “dark silicon”. “Dark silicon” assumes that our technology has to be “silicon-based”, clearly something that may change in the near future, so let us use the term “dark circuits” instead.

Pause to consider: it means that in the near future, you will buy a computer made of circuits that remain inactive most of the time. In fact, we might imagine a law of this sort…

The percentage of dark circuits will double every two years in commodity computers.

That sounds a bit crazy. This means that one day, we might use only 1% of the circuits in your processors at any one time—not unlike our brain. Though it sounds crazy, we will see our first effect of this “law” with the rise of non-volatile memory. Your current computer relies on volatile memory made of transistors that must be constantly “charged” to remain active. As the transistors stop shrinking, this means that the energy usage of RAM per byte will plateau. Hence, the energy usage due to memory will start growing exponentially, assuming that the amount of memory in systems grows exponentially. Exponentially growing energy usage is not good. So we will switch, in part or in full, to non-volatile memory, and that’s an example of “dark circuits”. It is often called “dark memory”.

You may assume that memory systems in a computer do not use much energy, but by several estimates, they often account for half of the energy usage because moving data is expensive. If we are to have computers with gigantic memory capacities, we cannot keep moving most of the data most of the time.

In this hypothetical future, what might programming look like? You have lots and lots of fast memory. You have lots and lots of efficient circuits capable of various computations. But we must increasingly “budget” our memory transfers and accesses. Moving data takes energy and creates heat. Moreover, though you might have gigantic computational power, you cannot afford to keep it on for long, because you will either run out of energy or overheat your systems.

Programming might start to sound a lot like biology.

Credit: This blog post benefited from an email exchange with Nathan Kurz.