Don’t assume that safety comes for free: a Swift case study

Most modern languages try to be “safer” by checking runtime values, in the hope of producing more secure and less buggy software. Sadly, these checks make it harder to reason about the performance of the code. And, sometimes, the cost can be far from negligible.

Let me illustrate this through a story using Swift (Apple’s new programming language).

How do you sum the values in an array? If you are an idiot like me, you might use a for loop:

var s = 0
for i in array {
    s += i
}

According to my benchmark, this runs at a speed of 0.4 ns per array element. My Skylake processor runs at a flat 3.4 GHz, so we are using a bit over one CPU cycle per array element (1.36 to be precise). Not bad, but we know we could do better. How could Swift get something so simple “wrong”? We shall come back to this issue.

We can get fancier and compute the sum using “functional programming” with a call to reduce. It is slightly shorter and more likely to get you a job interview:

array.reduce(0, { $0 + $1 })

For extra obfuscation, and even better luck during interviews, you can use an equivalent but shorter syntax:

array.reduce(0, +)
Though it is more scientific-sounding, this function is actually 50% slower (clocking at 0.6 ns per element). Why is it slower? We are evidently paying a price for the higher level of abstraction.

Can we do better?

Swift’s integers (Int) are 64-bit integers on 64-bit platforms. What happens if the sum of the values cannot be represented using a 64-bit integer (because it is too large)? Swift helpfully checks for an overflow and crashes if it occurs (instead of silently continuing as Java, C or C++ would). You can disable this behavior by prefixing your operators with the ampersand as in this code…

var s = 0
for i in array {
    s &+= i
}

or this one…

array.reduce(0, { $0 &+ $1 })

These are “unsafe” functions in the sense that they might silently overflow.

On my Skylake processor, the silent-overflow (“unsafe”) approach can be twice as fast:

technique                      clock cycles per value
simple loop                    1.36
"unsafe" simple loop (&+)      0.65
"unsafe" reduce (&+)           1.87

So between the “safe” reduce and the “unsafe” simple loop, there is a factor of 3. I consider such a difference quite significant. The “unsafe” simple loop has the same speed as a C function: that’s what we want, ideally.

All these functions achieve nearly the same best performance if you disable safety checks at build time (swift build --configuration release -Xswiftc -Ounchecked). This means, in particular, that we get the fancy abstraction (the reduce approach) for free, as long as we disable safety checks. That makes performance a lot easier to understand.

So why the large performance difference? With regular checked arithmetic in a for loop, Swift generates add instructions (addq), each followed by a conditional jump on overflow (jo). Without checked arithmetic, it is able to vectorize the sum (using paddq instructions). (Thanks to John Regehr for putting me down on this path of analysis.) To paraphrase Steve Canon, “overflow checking itself isn’t very expensive, but it inhibits other optimizations.”

You might notice that I am using quotes around the terms “safe” and “unsafe”. That’s because I do not think it is universally a good thing that software should crash when an overflow occurs. Think about the software that runs in your brain. Sometimes you experience bugs. For example, an optical illusion is a fault in your vision software. Would you want to fall dead whenever you encounter an optical illusion? That does not sound entirely reasonable, does it? Moreover, would you want this “fall dead” switch to make all of your brain run at half its best speed? In software terms, this means that a mission-critical application could crash because an unimportant function in a secondary routine overflows. This is not a merely hypothetical scenario: an overflow in an unnecessary software component halted the software of the Ariane 5 rocket leading to a catastrophe.

My Swift and C code is freely available.

Further reading: We Need Hardware Traps for Integer Overflow and Understanding Integer Overflow in C/C++

Appendix: Overflow checks are not unique to Swift. You can compile your C and C++ with runtime overflow checks using LLVM clang (-fsanitize=signed-integer-overflow) or GNU GCC (-ftrapv). The Rust language checks for overflow in debug mode (but not in release mode).

Update: A HackerNews user commented on the performance issues:

Looking at the generated code, we have a couple issues that prevent full optimization. One of them I already knew about — as a debugging aid, we guard every trap we emit with the equivalent of an `asm volatile(“”)` barrier, because LLVM will otherwise happily fold all the traps together into one trap. We really want every trap to have a distinct address so that the debugger can map that trap back to the source code that emitted it, but the asm barrier prevents other optimizations as well. As for `reduce`, it looks like the compiler does inline away the closure and turn the inner loop into what you would expect, but for some reason there’s more pre-matter validation of the array than there is with a for loop. That’s a bug; by all means, reduce is intended to optimize the same.

Update 2: Nathan Kurz ran additional tests using arrays of different sizes…

$ swift build --configuration release
$ cset proc -s nohz -e .build/release/reduce

# count  (basic, reduce, unsafe basic, unsafe reduce)
1000      (0.546, 0.661, 0.197, 0.576)
10000     (0.403, 0.598, 0.169, 0.544)
100000    (0.391, 0.595, 0.194, 0.542)
1000000   (0.477, 0.663, 0.294, 0.582)
10000000  (0.507, 0.655, 0.337, 0.608)
100000000 (0.509, 0.655, 0.339, 0.608)
1000000000(0.511, 0.656, 0.345, 0.611)

$ swift build --configuration release  -Xswiftc -Ounchecked
$ cset proc -s nohz -e .build/release/reduce

# count  (basic, reduce, unsafe basic, unsafe reduce)
1000      (0.309, 0.253, 0.180, 0.226)
10000     (0.195, 0.170, 0.168, 0.170)
100000    (0.217, 0.203, 0.196, 0.201)
1000000   (0.292, 0.326, 0.299, 0.252)
10000000  (0.334, 0.337, 0.333, 0.337)
100000000 (0.339, 0.339, 0.340, 0.339)
1000000000(0.344, 0.344, 0.344, 0.344)

Getting a job in the software industry

I am routinely asked about how to break into the software industry as a programmer. It is true that there is high demand for qualified programmers, but not all jobs are equal. There are good and bad tracks. And most jobs aren’t that good.

How do you get a good job as a programmer? Here is my somewhat educated answer:

  • Make sure you really want to be a programmer. It is hard work. Can you spend hours reading or writing technical documentation? Can you spend hours reviewing someone’s code? Don’t get into programming just because you like video games or because you like the salaries. It is not going to end well.
  • It is entirely possible to get a good job without any formal training as a programmer. Good programmers can recognize one of their own without checking a resume. However, the overwhelming majority of programmers with great jobs seem to have some college education. At a minimum, you need to be able to communicate clearly (in English) and know basic mathematics. You can do a lot of useful programming without knowing about logarithms, probabilities, standard deviations, functions, set theory… but you are just going to be a lot less convincing in some contexts. So if you have no college education whatsoever, it might be easier to get some. At the other end of the spectrum, having many advanced degrees might not help as much as you’d expect. The prestige of your college probably does not matter a whole lot, and you can get good mileage out of things like Coursera.
  • You need to learn to recognize good jobs. A good tell is the people interviewing you… if there is nobody in charge of recruiting that has a strong background in programming, run away. I’d make an exception if it is a start-up and you are the first programmer being hired… but then you should probably be getting some equity in the company.
  • It is tempting to look at job ads and focus on getting all the right keywords on your resume. But without an in-depth knowledge of the domain, it can be a misleading activity because (1) most jobs being advertised are not good jobs and (2) not all keywords have equal weight.
  • Right now, it probably pays to get yourself a GitHub profile and to contribute usefully to some projects. Show that you can be a team player, show that you can follow through and produce useful code. Getting a StackOverflow profile and contributing to the community is probably wise. There might be other online communities that are just as good, and these things tend to evolve over time… it is possible that in 5 years, GitHub will be replaced entirely by something better. You need to keep up.
  • Get involved with local programming-oriented meetups. Show up, ask questions, give talks. You can get to meet some of the best engineers from some of the best companies, and they might be quite happy to discuss job prospects.
  • You may need a little bit of shallow knowledge about all sorts of things, especially the latest trends. It probably pays to follow news sites catering to programmers: Hacker News, Slashdot, Reddit, blogs, Twitter…
  • Technical interviews can be quite difficult. I am not exactly sure why the interviews are so difficult, but they can be a lot harder than your average college exam. It probably pays to train with the help of a book or two.

Software evolves by natural selection

Software evolves by natural selection, not by intelligent design. It is a massive trial-and-error process. There are many thousands of programmers working every day to build new things in the hope of replacing the old…

From time to time, you will hear about a new fantastic piece of computer science. For example, right now deep learning is the hot new thing. Some years ago, people were very excited about MapReduce.

As an ecosystem changes, some tools become less likely to be useful while others gain dominance in common use cases. So, because of a myriad of other changes, deep learning is suddenly a lot more useful than it would have been 10 years ago. Deep learning is great to classify pieces of data like images, given that you have the kind of computing power a modern GPU offers. Decades ago, we had much less data, and nothing like a modern GPU, so perceptrons (the ancestors of deep learning) were useless.

What does it mean for the programmer? If software advances by trial and error, then the more quickly you can move and adapt, the better are your chances. It also becomes important to anticipate changes and to try many things. Trial-and-error is an underrated approach in software.

You will no doubt object that you are using the latest stuff… maybe it is deep learning running on the sexiest big-data framework (e.g., Apache Spark) with the fanciest programming language. So you are set, right? Maybe, but be concerned that even a brand new bus turns slowly. Don’t be a bus.

So what is the software of the future? Where should you place your bet?

I have been making a boring prediction for more than a decade, and it has served me well: we will have more data. We should think about what could become possible if we were given 10x as much data and 10x as much processing power. What kind of software would we need then?

It is a safe bet that current PC architecture cannot cope. So do not get too attached to it.

On metadata

I remember a time, before the Web, when you would look for relevant academic papers by reading large books with tiny fonts that would list all relevant work in a given area published in a given year. Of course, you could have just gone to the shelves and checked the research articles themselves but, for a slow human being, this would have been just too time consuming. These large volumes contained nothing but “metadata”: lists of article titles, authors, keywords… They were tremendously valuable to the researchers.

One of the earliest applications of computers was to help manage document collections. In this sense, the Web and Google are very natural applications for computers. In the early days, computers could not have access to the books themselves, but human experts (librarians) could enter the precious metadata into the computers to make them useful.

This was at a time when many computer scientists were worried that computers would be starved for data. Indeed, there are only so many highly paid experts who will sit quietly at a computer entering valuable data.

It was an era when people dreamed of intelligent computers. So they imagined a world where human experts would enter data and rules into computer systems, and these rules would allow the computers to match or even outmatch human intelligence.

History proved these early artificial-intelligence researchers wrong. This “classical” approach toward machine intelligence did not bear fruit. It is not, as you might imagine, that these dreamers lacked support. They got billions of dollars in funding from just about every government agency you can think of, including the military.

More recently, this classical artificial-intelligence approach has led to the semantic web (1998) and its idea of “linked data”. The thesis of this approach is to supplement the existing web with a thick layer of “RDF” that describes the resources and enables advanced machine reasoning.

There are a few problems that make classical artificial intelligence a bad approach:

Collecting, curating and interpreting a large volume of predicates in an open world is a fundamentally intractable problem. If you must track the list of authors, the publication date and the topic of a set of books, and you have enough highly trained librarians, it will work. But the system fails to scale up.

If you have enough time and money, you can build things up, to a point. Walmart, the world’s largest private employer, has an extensive supply chain that needs to track precise and accurate metadata about a vast range of products. It works. And it made Walmart very rich and powerful.

But we should not underestimate the difficulty. When the American retailer Target tried to enter the Canadian market, they ended up with empty shelves. These empty shelves led to poor business and to a massive failure that will be a subject of study for years to come. Why were the shelves empty? Because Target tried to build a new inventory management system and failed. What does an inventory management system do? Essentially, it is a metadata system. To build it up quickly, they had their employees work at collecting and entering the data, day after day… And then they faced reality: when the metadata was there at all, it was often badly wrong. This made it difficult if not impossible to keep the shelves filled up.

Walmart has a worldwide distribution and supply-chain system, with well-oiled computing systems. But getting it to work is not a secondary or trivial activity. It is Walmart’s core business. It takes enormous effort for everyone involved to keep it going. We know because it is hard for other people to reproduce it, even when they have tremendous resources.

Even the simplest things can become headaches. Creative Commons allows you to specify a license for your work. A popular choice is to allow “noncommercial” use. But what does that mean? Reasonable people can disagree. Is a class given at a public college a “commercial” or “noncommercial” activity? Dublin Core aims at specifying simple things like the author and title of a piece of work, but when you examine the generated metadata, there is widespread disagreement about what the various attributes mean.

In the early days of the web, Yahoo! was the standard search engine. It worked through a taxonomy of websites. You looked for a taxidermist in your town by drilling down to the right category. But despite billions in revenues, Yahoo! could not keep this system going. It collapsed under its own weight.

This should not be underestimated. Most metadata is unreliable. Maintaining high-quality data is simply hard work. And even when people have good intentions, it takes more mental acuity than you might think. For example, in the Target Canada case, product dimensions were sometimes entered in inches, sometimes in centimeters. You might dismiss such errors as the result of hiring poorly trained employees… but the NASA Mars Climate Orbiter crashed because highly paid engineers got confused and used English units instead of metric units.

One of the problems with metadata in the real world is that you are in an adversarial setting. So assuming that the people entering the metadata are motivated and highly trained, you still have to worry that they are going to lie to you. That’s what happened with Yahoo!, and what happens with a lot of metadata. Suppose that I am a book editor and I am asked to classify the book… and I know that customers will browse the content according to my classification… my incentive is strong to fit my product in the most popular categories.

To make matters worse, metadata systems are not objectively neutral entities on their own. They were designed by people with biases. There is no such thing as an objective oracle in the real world. If you have ever had to reconcile different databases, data stored in different formats, you know how painful it can be. There are simply different and equally valid ways to describe the world.

You would think that errors can be objectively assessed… and certainly a box measures about 12 inches or it does not. However, the level and type of determinism in natural languages has evolved and reached a sweet spot. We are neither fully pedantic nor entirely vague in our use of a natural language. There is plenty of room for reasonable people to disagree about the meaning of the words, about how to categorize things. Is Pluto a planet? Well. It is a dwarf planet. Is that like a small planet or not a planet at all?

We silently make some assumptions. One of them is that the world can be understood in a top-down fashion… we start from abstract concepts and work our way to concrete cases. Peter Turney argues that though we think that “red” and “ball” are the fundamental logical atoms, the opposite might be true. We first get a “red ball” as the core logical atom, and the most abstract notions, such as “red” or “ball” are derived from the more concrete ones.

So far, I have been addressing the case of simple metadata. The kind that can help you determine whether a book was published in June or July of this year. But if you want to build “intelligent” software, you need a lot more than that. You need to represent your knowledge about the world in a formal form.

How hard is that? Let us turn to mathematicians. Surely representing mathematics formally is the easiest thing. If you can’t formalize mathematics, what chances do you have in the real world? So for many decades, a collection of super smart mathematicians (under the pseudonym Bourbaki) attempted to give mathematics a formal foundation. It was interesting and challenging, but, ultimately, it just proved too difficult to pursue.

But wait! Isn’t Google able to go through your pictures and find all pictures of a cat? Isn’t Google able to take a picture of your house and find out your street address?

Yes. But the muscle behind these feats has little to do with the prowess of metadata. You see, just as classical artificial-intelligence folks pursued their dream of cataloging all knowledge and achieving intelligence in this manner… other people thought that intelligence was more like a probabilistic engine. It is more about statistics than reasoning. We usually refer to this approach as “machine learning”, meaning that the knowledge enters the machine through “learning”, and not through the wisdom of human experts.

The opposition between the two points of view has been called “neats versus scruffies”. The metadata and formal-reasoning people are in the “neat” camp, while the others are scruffies.

Google is the child of the scruffies. Google does not hire large sets of experts to catalog knowledge; the overwhelming majority of their effort is invested in machine learning. Of course, given a web page, Google has metadata… but the bulk of this metadata was derived by the machine.

Suppose you needed to build a representation of the world… would you recruit thousands of people who would fill out forms? How would you build software that can parse textbooks, Wikipedia, reference manuals…? The latter approach has to deal directly with ambiguity and contradictions, but it is also the only way to scale up to billions of facts.

So, at this point, if anyone asks you to invest a lot of time entering metadata in a system… you really ought to be skeptical. That’s not the approach that has taken over the world. It is the approach that has failed, again and again, to deliver the promised goods.

Let me summarize. Representing knowledge formally is hard. Harder than intuition dictates. It is really expensive. It does not fare well once adversarial interests come in. You usually end up with faulty, barely usable data, either because people did not make the effort, because they gamed the system or because they made honest mistakes. And that’s for simple data such as maintaining an address book of your customers. Intelligent applications need finer representations, and then it gets really hard and expensive to get right. Meanwhile, approaches based on machine learning are getting better and better every year. They can defeat the best human players at most games, they can classify your pictures automatically, they can translate books for you… We have to embrace machine intelligence. The last thing we want our highly paid experts to do is to be filling out forms.

Further study: Everything is Miscellaneous by David Weinberger.

Credit: Thanks to Andre Vellino and Antonio Badia for their valuable feedback and contributions.

Framed in the past

We can’t predict the future. However, I still read futurologists like Calum Chace, Alvin Toffler, Bruce Sterling, Vernor Vinge, Joël de Rosnay and so forth. A good futurologist won’t even attempt to predict the future for real, but he will offer a new way to think about the present. Indeed, I think that to really understand the present, it is useful to first put ourselves into an imagined future, and then look back. A fish can only see the pond if he first jumps out of it.

The default in our culture works in reverse. Instead of imagining the future and thinking back, we put ourselves in the past. Instead of thinking as futurologists, we think as historians.

People buy phones. In 2016. Of course, they actually mean portable computers with a touch screen. Nobody in the 1970s would recognize an iPhone as a phone. It is actually far superior to the Star Trek communicators that people in the 1970s saw on TV. That should not be surprising given that an iPhone is more powerful than any computer that existed in the 1970s. But we still frame these portable computers as things (phone) from the past.

We talk about shows that appear (sometimes exclusively) on Netflix as “TV shows”, whereas a “TV” is entirely optional to watch the shows in question.

By framing radical innovations in the past, we are effectively dismissing the genuine novelty. That is the very purpose of this framing. It is hard to sell something new. So marketers have to relate it to what you know. Everyone knows about phones. They are useful. Companies pay for them. So let us sell phones, even if they are no longer phones.

I was invited to talk at a major music industry event in Montreal recently. People complained repeatedly that they had a harder time selling discs. They were angry. Yes, it is not a typo. In Montreal, people in the music industry still talk about selling discs. Of course, they mean, I suppose, getting royalties from people listening to their music online. Or maybe that’s not what they mean. I am not sure. What is sure is that they frame their problem in the past.

A colleague conducted a set of interviews with education professors specializing in distance learning. They were asked whether technology (the web, basically) changed distance learning. Their answer was unequivocal. It did not. The change is superficial, they say. I am sure that they are being truthful: they view contemporary distance learning as it was 20 years ago.

You will find exactly the same thing in a computer-science department. Did the technology of the past 20 years disrupt software? You will have no trouble finding professors who will carefully explain that nothing at all is new.

Past-tense framing is useful in that it gets people to accept radical change. But that’s not entirely a good thing: it also prevents people (including smart people) from thinking clearly about the present. If the present is like the past, then whatever worked in the past will work in the present.

Historians are fond of quoting Santayana: “Those who cannot remember the past are condemned to repeat it.” I prefer to say that those who can’t see different futures won’t have any say in the present.

As you organize your upcoming industry event, you should probably recruit speakers who can tell you about the future of your industry. Of course, they are going to be wrong, but you can be wrong in interesting and useful ways.

My review of Deus Ex: Mankind Divided (video game)

The Deus Ex series is set in a dystopian futuristic universe. (It reminds me a bit of Japan’s Ghost in the Shell.) The latest game in the series (Deus Ex: Mankind Divided) is set less than 15 years in the future (2029). In this future, many people have artificial organs (eyes, legs, hands) called “augmentations” because they are superior to biological organs. You play as Adam Jensen, a man who barely survived a terrible attack… but was given superior augmentations by the corporation he works for. So you are something of a super-hero. As the name suggests (“mankind divided”), people with augmentations are oppressed and put in ghettos. Oddly enough, you are free to go and generally given a lot of respect. There is a lot of terrorism, organized crime, high-level conspiracies, perverted media…

I am not exactly sure whether the authors believe that they are offering a credible depiction of the future. I don’t think that the game will age well in this respect. In the game, laptops are omnipresent. If we have the technology to replace your lungs with better artificial organs… why would we carry around bulky 2010-era laptops? There is a single virtual-reality setup, and it takes a whole room. I would think that by 2029, virtual-reality gear would be as convenient as a pair of regular glasses. There are many oppressed workers, lots of poverty… but few robots and no talk of technological unemployment. There are cults embracing the technological singularity, but we see no evidence that technology is changing at an accelerated pace.

Ok. It is a video game, not a futurology essay.

So how is the game?

The fun part in Deus Ex is that there are many different ways to do your missions. The game invites you to be creative and to have fun. And you need to be a bit clever: Adam Jensen is hardly invincible, he is just one man. You are just not going to destroy gangsters with machine guns in a frontal assault, as you would in other games. You need to look around for an alternate path. You get to hack robots, turrets… knock quite a few heads… steal secret documents… sneak into people’s apartments…

It is the first game in the Deus Ex series that I can recommend. If you have played the previous ones, this one will look familiar… but everything is smoother. The level design was thoroughly tested and finely tuned. I never got stuck. Also, it helps that this game was designed for today’s beefier hardware.

The major downside is that the scenario feels like gibberish. The theme and the mood are well done, but the authors chose to play the “layer-upon-layer mystery” card. There are bad guys pulling the strings, but it all feels overengineered in the end. The final boss is evil alright, but his motivations seem all too complicated. There are too many characters pulling the strings. I chose to ignore all of it while playing.

Other recent games that I have really liked: Uncharted 4 and The Last of Us. The Last of Us is my favorite game of all time.

Who is keeping an eye on the tech companies?

A corporation such as Spotify was founded a few years ago by a 23-year-old man, and it now plays a key role in the music industry. YouTube is now the big-boss of music delivery. Had I predicted these things ten years ago, you would have thought me odd.

Kurzweil, the inventor, popularized the notion of a technological singularity. It is a moment in our history where things move so fast, we can no longer comprehend the world nor make even short-term predictions. Are you game to predict the next ten years of the music industry?

Let us think about the present.

There is an explosion of diversity because it becomes affordable. Spotify can offer millions of songs. This was unthinkable in the past. The net result is that more musicians than ever can be found by more people than ever.

Chris Anderson in 2004 promised us a world where more of us could make a living on the “long tail” (the long stream of less popular products and services) without needing a big break. Is that what happened? I think we are still debating it. Recently, a study on the Google Play store found that it is a super-star market where the bulk of the money ends up in the pockets of the few, with most people hardly making a penny. Should we blame the bulk of the players for their bad luck?

Having your data stored in the index is not enough to be found. Being visible is not the same thing as being found. When you use Google to search for something, it can return 20,000 relevant pages, but you are unlikely to look at more than 3 or 4. Tools like Spotify are not different. They use recommender systems or, in other words, AI, to help us find what we are looking for. They think for us.

People don’t always realize that tools such as Google know about you and provide personalized results. These tools are mirrors, necessarily imperfect ones. They are not neutral.

What do I mean by neutral? When using polls to predict election results, we measure what people think, but we also change what people think by publishing the results. A top-10 list in music is not just a mirror, it is also an active agent that changes what people listen to.

The rules of engagement also matter. Our democratic systems are often set up to favor the emergence of two dominant parties. It is the intended effect. It couldn’t be more obvious in the USA. Yet who is setting up the rules in the digital world? Do you know?

The big players like Google or Spotify have access to a remarkable wealth of data. They know who you are, where you are and what you do. They know your age, your race, your sexual orientation. No business could have dreamed of having so much data 30 years ago.

In itself, this data collection is not harmful.

But any complex and powerful system can have unexpected and undesirable effects. For example, “there’s a general tendency for automated decisions to favor those who belong to the statistically dominant groups.”

At a minimum, I think we should study what systems like Google and Spotify do so we can discuss it openly.

Problems don’t necessarily arise because the big players are malicious. Datta et al. (2015) found that online ads for highly paid jobs tended to be shown more often to men than women. It seems that the reason is that younger women are a prized demographic so they get ads from advertisers with deeper pockets instead.

What could we do in concrete terms? We could use volunteers who agree to have some of their Internet interactions monitored. We could also proceed with automated sousveillance, where we create fake accounts to keep an eye on the big players.

We already keep track of journalists to check whether they are being fair; I think we should spend a bit more time tracking Google, Apple, Amazon, Spotify… This would seem only prudent in such a fast-changing world.

Update to my VR bet with Greg Linden

I have an ongoing bet with Greg Linden stating that we are going to sell 10 million virtual-reality (VR) units per year by 2019.

I have been paying close attention to VR technology and its impact.

What have we learned in the last few months?

  • The technology is fantastic. VR headsets work.
  • The software is currently the weak point. Besides games and “experiences”, we simply have no compelling application. And while there are some good games, none of them is good enough to motivate millions into purchasing a VR headset.
  • A nice surprise: Sony has managed to make VR work with the relatively underpowered PlayStation 4. The initial reports are very positive. I think that’s important: it shows that relatively weak and inexpensive processors are sufficient for VR.

A key indicator will be how many VR headsets Sony manages to sell over Christmas.

Intel will add deep-learning instructions to its processors

Some of the latest Intel processors support the AVX-512 family of vector instructions. These instructions operate on blocks of 512 bits (or 64 bytes). The benefit of such wide instructions is that even without increasing the processor clock speed, systems can still process a lot more data. Most code today operates on 64-bit words (8 bytes). In theory, keeping everything else constant, you could go 8 times faster by using AVX-512 instructions instead.

Of course, not all code can make use of vector instructions… but that’s not relevant. What matters is whether your “hot code” (where the processor spends much of its time) can benefit from them. In many systems, the hot code is made of tight loops that need to run billions of times. Just the kind of code that can benefit from vectorization!

The hottest trend in software right now is “deep learning”. It can be used to classify pictures, recognize speech or play the game of Go. Some say that the quickest “get rich quick” scheme right now is to launch a deep-learning venture, and get bought by one of the big players (Facebook, Google, Apple, Microsoft, Amazon). It is made easier by the fact that companies like Google have open sourced their code such as Tensorflow.

Sadly for Intel, it has been mostly left out of the game. Nvidia graphics processors are the standard off-the-shelf approach to running deep-learning code. That’s not to say that Intel lacks good technology. But for the kind of brute-force algebra that’s required by deep learning, Nvidia graphics processors are simply a better fit.

However, Intel is apparently preparing a counter-attack of sorts. In September of this year, it discreetly revealed that its future processors will support dedicated deep-learning instructions. Intel’s AVX-512 family of instructions is decomposed into sub-families. There will be two new sub-families for deep learning: AVX512_4VNNIW and AVX512_4FMAPS.

A case study in the performance cost of abstraction (C++’s std::shuffle)

Statisticians and machine-learning experts sometimes need to shuffle data quickly and repeatedly. There is one standard and simple algorithm to shuffle an array, the so-called Fisher-Yates shuffle:

for (i=size; i>1; i--) {
  nextpos = random_numbers_in_range(0,i);
  swap(storage[i-1], storage[nextpos]);
}
Not very difficult, is it? The C++ programming language, like many others, has a standard function to solve this problem (std::shuffle).

How does it fare against a very basic Fisher-Yates shuffle without any optimization whatsoever? To make sure that the comparison is fair, let us work with the same data (an array of strings) and use the same random number generator (I chose PCG). To avoid caching issues, let us use a small array that fits in cache.

Here is my unoptimized C++ code:

template <class T>
void  shuffle(T *storage, uint32_t size) {
    for (uint32_t i=size; i>1; i--) {
        uint32_t nextpos = pcg32_random_bounded(i);
        std::swap(storage[i-1], storage[nextpos]);
    }
}
I claim that this is the “textbook” implementation… meaning that it is the closest thing you can get to taking a textbook and coding up the pseudo-code in C++. (Yes I know that people copy code from Wikipedia or StackOverflow, not textbooks, but humor me.)

The pcg32_random_bounded function I call is implemented in a standard (but suboptimal) way to get a random number in a range, using two divisions. You can do it with a single multiplication instead, but let us ignore such optimizations.

Here are my results, expressed in CPU clock cycles per value… (Skylake processor, clang++ 4.0)

technique       clock cycles per value
textbook code   29

So the textbook code is twice as fast as the standard C++ function.

Why is that?

It is often the case that default random number generators are slow due to concurrency issues. But we provide our own random number generators, so it should not be an issue.

A cursory analysis reveals that the most likely reason for the slowdown is that the standard C++ library tends to use 64-bit arithmetic throughout (on 64-bit systems). In particular, the divisions involved in generating bounded random numbers are markedly more expensive on 64-bit integers than on 32-bit integers. I implicitly assume, in my textbook implementation, that you are not going to randomly shuffle arrays containing more than 4 billion elements. I don’t think I am being unreasonable: an array of 4 billion std::string values would use at least 128 GB of RAM. If you need to shuffle that much data, you probably want to parallelize the problem. But the engineers working on the standard library have to meet the requirements set forth by the specification. So 64-bit arithmetic it is! And that’s how I can beat them without any effort.

The absolute numbers might also surprise you. The Fisher-Yates shuffle is very efficient. We do a single pass over the data. We do not really need to look at the data, just move it. We use a fast random number generator (PCG). How can we end up with 30 or 70 cycles per array element?

Part of the problem is our use of std::string. On my system, a single (empty) std::string uses 32 bytes whereas a pointer (char *) uses only 8 bytes. If we fall back on C strings (char *), we can accelerate the processing simply because there is less data to move. Without going overboard with optimizations, I can bring the computational cost to about 7 cycles per element by avoiding divisions and using C strings instead of std::string objects. That’s an order of magnitude faster than the standard shuffle function.

So while std::shuffle is very general, being able to shuffle just about any array using just about any random-number generator… this generality has a cost in terms of performance.

My code is available.