Innovation as a Fringe Activity

What do these people have in common: Marconi, Alexander Graham Bell, and the Steves, Wozniak and Jobs? At least one commonality is that approximately nobody listened to them or cared about what they were doing until there was simply no way to ignore them anymore. In effect, they were living and working on the fringes of society, and the only people paying them heed were the relevant subcultures that sprang up around them. Without these particular individuals, we might still have had the various communications revolutions that came with telephones, radios (and their offspring, cell phones), and computers (and their offspring, smartphones). Even so, it is unfathomable that the world we know today would exist without innovators working on the fringes of society. From where we stand today, it’s easy to assume that almost every advance was an obvious and perhaps inevitable step. In fact, everything from automobiles to smartphones arose first in subcultures that were initially dismissed and often ridiculed.

Everybody wants innovation. Well, maybe not everybody, but I’m guessing that most people, and certainly most businesses, want new and better things and new and better ways of doing things. I think it’s safe to say that some people and businesses are interested in going beyond the creation of the merely new or improved product or service to the creation of new markets. Yet, oddly, most people, businesses, and governments continue to dismiss, ridicule, or legislate against the activities of innovative subcultures. The most recent such examples come from digital media. The music lovers in the computing subculture were quietly sharing music among themselves until Napster came along. Napster was proof positive that there was a large market for digital music distribution. How did the music industry respond? Not with their own services, low prices, and easy-to-use systems, but with legal challenges, lawsuits, and lobbying for new legislation and enforcement. There was literally nothing stopping the recording industry associations and companies from creating an iTunes-like product that would have made Napster (and maybe Apple!) irrelevant.

Did the movie and television industries learn anything from what happened with music? Of course not. We are still in the throes of a mighty battle to keep programming out of the hands of the consumer. The consumer is winning. The people in charge still haven’t figured out that the longer they fight the changes, the easier it will get for the consumer to get what they want for free. All they need to do is make the content available at a reasonable price. No geographical restrictions based on old distribution models. No anti-piracy messages or ads that tempt people into the pirate market, where the content comes unencumbered. A variety of quality levels available so that each consumer can get what’s right for their system without having to go to the pirate market for what they need. All of those things are possible today and have been possible for years, but the industry keeps fighting a battle that all before have lost.

The Internet has made it possible for subcultures to pop up almost overnight in response to some cool new thing (i.e., innovation). The Internet has also made it possible for subcultures to go mainstream almost as quickly as they appear. Those are key innovations that are rarely recognized as such. Anyone who does not acknowledge those innovations and the associated rise in subcultures is destined for the dust heap. That applies to individuals who fail to adjust their skills, to companies that fail to adopt new ways of doing things, and it will also apply to governments that fail to follow their constituents into the future.

What is innovation and why should we care? Depending on your objectives or how you are affected, innovation could be something as simple as letting people pump their own gas or as complex as designing and building a personal computer. We should care because all progress, all advance, all improvement comes from doing something new or at least doing something old in a new way. If you are a business owner, you need to be at least paying attention to innovative products, services, and processes so that you can judge how your business will be affected. If you are not running a business, you are probably working for one, and being unaware of innovations in your field means that you risk becoming obsolete along with the skills you now have. Between 1970 and 2010, a span of 40 years, computers, especially home computers, went from deep in fringedom to as common as televisions. Despite this unprecedented shift, home and small business computing were still fringe activities and generally ridiculed for about 20 years.

Here is my favorite story of how innovation has affected an extremely large market segment: mail order shopping. In Canada, Eaton’s was once the market leader in mail order shopping. The catalogue at the beginning of the 20th Century described products from socks and wedding dresses to tools and houses. Yes, houses. You could order a house, which would then be delivered as a kind of kit containing all the necessary lumber, doors, windows, etc., along with the plans. I don’t know how many people actually did the building themselves, but that’s not really the point. Think about it: you got the Eaton’s Catalogue delivered by mail, you picked out a house and maybe some socks, you sent that order to Eaton’s by mail, and some time later your socks and house were delivered.

Then Sears stepped in and added a distribution network. Roads and trucks were both much improved over the decades. In addition to the same kind of catalogue sales that Eaton’s was doing (maybe not houses), they set up little depots in almost every town and a great many villages. A depot might have been something as simple as a corner in an existing business, but there were catalogues, you could place your order without the cost of an envelope and stamp, and shipping was either free or greatly reduced in price. On top of that, there would be a few items on display so that you could actually inspect what you were ordering. As if that wasn’t enough, returns were as easy as if the depot was a ‘real’ store, making it possible to order a few items of clothing, try them on at home, and return what you didn’t want to the depot. Some depots even had fitting rooms to save the trip home. Needless to say, Sears eventually pushed Eaton’s out of the mail order business and Eaton’s ultimately closed their doors. The Sears depot fed by a fleet of trucks was not just an innovation; it was a disruptive innovation that ultimately forced their main competitor in the field out of business. Sears did not invent the road or the truck, but they did notice that the roads were getting better, road networks were expanding, and trucking was becoming a booming business.

That story is not over yet. The Internet came along. At least initially, farmers and villagers had the same level of Internet access as city dwellers because everyone was just using acoustic modems. Note that those farmers and villagers were very important, because they were the heart of the mail order business. Along with the rise of the Internet, people started conducting business online in what can only be called an updated mail order system where the catalogue and orders moved across wires instead of through the mail. What Sears did next was, in my opinion, nothing short of amazing. They ignored the Internet, and not just out of ignorance. There were any number of fringe players suggesting that they set up Internet-based catalogue shopping, and I’m willing to bet that some of those people were Sears employees. It was a deliberate dismissal of the technology and its potential. They could have had computers in their little depots to accommodate those not quite ready to join the Internet revolution. They already had the distribution network in place, something that Amazon is only just getting around to dealing with and which anyone smaller than Amazon can’t even dream of. I can see how Amazon may have still managed to take over books, but I can’t imagine that they would be the retail powerhouse they are now if Sears had bothered to look into what the Internet had to offer. I’m convinced that we’d all be shopping at Sears instead of Amazon and shipping costs would not be such a major factor in purchase decisions.

Contrast that with Glen-L and Lee Valley. They are both relatively small businesses that did a lot of mail order business. In the case of Glen-L, that was their only business. If you wanted to buy plans to build your own boat, Glen-L was one of the major players and you had no choice but to deal with them through mail or phone orders. Lee Valley had storefronts, but mostly as a way for people to handle the tools they were interested in and for people to get advice from relevant professionals. Both were quick to embrace the Internet and both are still counted among the leaders in their markets.

Step back a bit and think about that. Mail order used to be something that virtually everyone needed to do in order to get products unavailable locally. Then it was something used only by people outside major centres, and even that was dwindling along with the rise in cheap, fast personal transportation. Then along comes online shopping and now the modern equivalent to mail order threatens the very existence of regular stores.

Amazingly, businesses still generally ignore, often willfully, the innovations that are happening around them and suffer accordingly. From those stories, we should have learned that we don’t actually need to be innovators in the sense of creating completely new things like the Internet in order to benefit from innovation. If we pay attention, we might be able to completely disrupt our markets while not really changing much about how we do business. If, like Lee Valley, Sears had embraced the Internet as an alternative way to put out their catalogue and take orders, almost nothing about their actual business would have changed, but it’s likely that Amazon would not be the powerhouse they are today and Sears would not be closing stores, trying to reposition themselves in the market, and generally sliding off into irrelevance.

In short, and this is my real thesis, you don’t need to be the Steves doing whatever it was that led to Apple being the company we see today. No, you don’t need to be one of the Steves; you just need to pay attention to what the Steves are up to and see how that might affect you. Better yet, you should be talking to them to see if they have any idea how what they’re doing might affect you.

IBM had already transformed themselves from an office equipment manufacturer and supplier into one of the leading technology companies of the era, so they knew more than a little bit about innovation and how to foster it. They employed people like theoretical mathematician Benoit Mandelbrot, he of the Mandelbrot Set and formaliser of fractal geometry, to do whatever it was he was doing. It’s a safe bet that approximately nobody at IBM, including C-level executives, understood his work, but you can be sure that they were aware of what he was doing and trying to find ways to profit from it. They knew from experience that if they hired enough smart people and gave them the freedom to do strange and unusual things, enough of it would eventually more than pay for itself. When IBM saw what Apple and other upstarts were doing, they recognised an important new market. In very short order, they had their engineers on the case and created the IBM PC. PC stood for personal computer, and it was marketed to homes and small businesses. They didn’t invent the personal computer; they just paid attention to those who did.

Some large businesses already had computerised systems, probably just in accounting, but a few people were actually crazy enough to bring in their own PCs or spend their own money on PCs for their departments or their jobs. Most of that was shut down quickly and forcefully, but a few companies allowed it as long as the job still got done and a rare few actually supported those initiatives by allowing the use of department funds for the purchase of PCs.

Let me repeat that.

Some people were actually crazy enough to bring in their own PCs or spend their own money on PCs for their departments or their jobs. Most of that was shut down quickly and forcefully, but a few companies allowed it as long as the job still got done and a rare few actually supported those initiatives by allowing the use of department funds for the purchase of PCs. If that is not a fringe activity, nothing is. Those are probably the two most important sentences in this essay. Remember them, because I’ll be coming back to them.

Smartphones. A lot of people have them. What is the one place where smartphones are frowned upon and frequently prohibited? Work. Yes, some industries and some companies have been making limited use of them, but in general, bringing a smartphone to work is like bringing a football. It’s not really a problem, but it better stay in your locker during work hours. One of the most common complaints I hear from employers and managers is how difficult it is to find a young worker who isn’t glued to their phone.

Remember these sentences?

Some people were actually crazy enough to bring in their own PCs or spend their own money on PCs for their departments or their jobs. Most of that was shut down quickly and forcefully, but a few companies allowed it as long as the job still got done and a rare few actually supported those initiatives by allowing the use of department funds for the purchase of PCs. Instead of reading “personal computer” everywhere you see “PC”, try reading “pocket computer”. To a very good approximation, everyone under 30 is carrying an Internet-connected pocket computer everywhere they go, and most businesses so severely curtail their use that they might as well be left at home. How about this instead: everybody with a pocket computer (okay, smartphone) is allowed to use it as they see fit as long as two conditions are met:

  • The employee submits to productivity monitoring of some kind. Most successful businesses are already pretty good at monitoring productivity, so ‘submitting’ to it is already a given if you want a job at all.
  • The employee sits down once a month to explain how they are using the smartphone to improve efficiency, effectiveness, or simply to make the job more pleasant.

It’s time for a couple of thought experiments. First, imagine that you own a restaurant and that you choose to allow unrestricted use of smartphones under those conditions. If you are having trouble imagining the outcome, try answering the following questions:

How long will it take for the wait-staff to use their smartphones instead of the little order pads and pens so that they can text the orders to the kitchen? How will that affect costs? How will that affect efficiency, given that the only time they go to the kitchen is to check on and pick up orders?

How long until the kitchen staff starts texting the wait-staff for clarification and orders ready to be picked up? Again, how will that affect efficiency and effectiveness? How long until regular customers start texting their favourite wait-staff to book a reservation? How long until wait-staff start texting regulars when a favourite meal is selected as the daily special?

How long until the customers start placing their own orders instead of waiting for someone to come to their table? How does that merger of take-out and in-house orders affect your business? Does that even work at all? If it just causes pandemonium, can you find a way through the mess? How long until one of the staff or a friend creates a simple app to streamline the system that evolves?

What happens if you are the only restaurant in town doing that? Will you drive customers away? Will you attract new customers? Assuming it attracts more customers than it drives away, is it more profitable to keep it to yourself or to market the system to other restaurants, including competitors? What happens if everyone starts doing it? Can you create or maintain a successful restaurant business by catering to those who value the personal service of a waiter or waitress?

San Francisco was basically ground-zero for Google Glass. The general public either ridiculed or feared the technology. People wearing Google Glasses were called ‘glassholes’. They were routinely banned from restaurants, bars, and other private places where the public mingled. Critically, they were also banned by most employers. Imagine your restaurant in San Francisco. What would have happened to business if you had welcomed that subculture? What would have happened to business if you had gone beyond merely welcoming them, but actually did things that took advantage of the fact that your customers were wearing Google Glasses? What would happen if you gave your staff Google Glasses and then had them spend time with those early adopters learning how to apply the technology to their jobs?

Now imagine that you have a manufacturing plant with an assembly line. How long until the welders are sending the drafting department photos of a design problem? Or letting inventory know of a shortage? Or until a scheduled maintenance notice pops up in Google Glass when someone happens to look at a machine that is due for service? Or…

Everything I’ve said comes down to one simple thing. New things come from doing new things and those doing new things are usually part of one subculture or another. They are, in some way, on the fringes. As long as you keep doing things the same way, you will get the same results. You might think that’s just fine and the way things should be, but don’t be surprised if something comes along to destroy it. If you find a way to let people find new ways of doing things at any level, you will have a much better chance of surviving sweeping changes in whatever industry you are in.

For example, if you are a truck driver or own a trucking company, you had better be thinking about what happens when the truck can drive itself from the outskirts of one city to the outskirts of another. It will look like a slow start because it will be limited to restricted-access divided highways in places with no snow, but once it starts, it will transform the industry in a decade or two. Think about where computers were in 1984. Basically, governments, very large companies, and a few basements and schools had computers, and they were used only in limited ways for specific purposes. Importantly, those with a computer at home were routinely ridiculed or at least dismissed. A decade later, almost every shipping and receiving department had a computer, as did virtually every trucking company, big and small. The Internet was just starting to make itself known to the general public, again eliciting ridicule and dismissal. How about adding another decade? By 2004 even grandma was at least thinking about using email, and it was basically the end of the road for any employee unwilling to learn how to use a computer. As with computers, truckers will definitely not be the only ones affected by self-driving vehicles, so who in your company is thinking about those things and what are you doing to support them in helping your business survive?

If you’re a manufacturer, there is a pretty good chance that you have at least one employee who is part of a ‘maker’ club. Can you offer space or equipment? Can you pay the membership fees as a benefit to any employee who is interested in joining? If you got stuck wondering what a maker is, then I recommend a quick Internet search. In a nutshell, a maker is to manufacturing and robotics what hotrodders were to automobile repair and design. That’s where much of the real talent was. Whether you were an engineer with Ford, a mechanic at a local dealership, or just an accountant, if you had a passion for cars you were souping them up and making them over in ways that ultimately informed the whole industry. Just as automakers eventually learned to pay at least a little attention to what the hotrodders were up to, so should every manufacturer be paying attention to what the makers are up to.

What is the one thing that ties together all of these successes? Most of the innovation comes from the fringes. If you don’t know who Elon Musk is, you had better learn, because if there ever was a fringe actor, he is one. He helped transform online payments (PayPal, later bought by eBay) and is about to transform personal transportation (Tesla). The most important thing to realize about Elon Musk is that he is tapping into subcultures as a way to change the world. The electric car subculture is at least 30 years old, consisting primarily of people converting cars to electric and the few small businesses that they created to serve the other members. Musk is taking their success, refining it, and marketing it to other subcultures (environmentalists, performance driving enthusiasts, luxury driving enthusiasts, etc.), generally forcing everyone to sit up and take notice. It’s fair to say that, along with Google, he is one of the leaders in self-driving technology. Unlike Google, which is primarily doing research, Musk is putting systems into cars that people can buy today.

So go out there and find the people who are doing things you just don’t get. Talk to them. Find ways to support them. The current craze is Pokémon Go. Is there a way to use that to attract new business? I don’t know, but I bet the avid players know. What does the technology itself (augmented reality) tell you about what the future holds for your business? Given that it’s basically a database used to overlay relevant entries (in this case a cartoon character) onto the camera display, plus a way to interact with that data (character), I would guess there are a number of possibilities. Maybe pointing the camera at a machine in your shop will display the service history of that machine. Maybe someone will ultimately transform the factory floor by building the cameras and displays into glasses. (Oh, wait, someone already tried that with Google Glass, but the fringe group that experimented with them were just ‘glassholes’.)

Anyway, I’m not the person to ask. The right person to ask is the employee whose job performance is suffering because they’re so engrossed in the game. Yes, you need to address the performance issue, but maybe you do that by getting them to think about what the technology could mean to their job and, by extension, to your company, instead of just threatening them with dismissal or imposing an outright ban.

(Credit: This is a guest post by Ron Porter)

Faster dictionary decoding with SIMD instructions

A particularly fast and effective compression technique is dictionary coding. Intuitively, it works as follows. Suppose you are given a long document made of millions of words, but containing only 65536 distinct words. You can create a map from words to short integers or indexes (in [0,65536)). So the word “the” might be replaced by 0, the word “friend” by 1, and so forth. You then replace your document with an array of 16-bit integers, so you use only 16 bits per word.

In general, given a dictionary of size N, you only need ceil(log2(N+1)) bits to represent each word. Your dictionary can be implemented, simply, as an array of pointers (using 64 bits per pointer).

It may help reduce memory usage if words are often repeated. But it can also speed up processing. It is much faster for a processor to seek out a given integer in a flat array than it is to seek a given word.

You can also use nice tricks to pack and unpack integers very fast. That is, given arrays of 32-bit integers that fit in b bits, you can quickly pack and unpack them. You can easily process billions of such integers per second on a commodity processor.
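
For illustration, here is a rough scalar sketch of my own (not the optimized packing routines the post alludes to, which use SIMD instructions and specialized code per bit width) showing what packing and unpacking into b bits means:

#include <stdint.h>
#include <stddef.h>

// Pack 'count' values, each assumed to fit in 'b' bits (1 <= b <= 32),
// into a bit stream stored in 64-bit words. 'out' must be zero-initialized.
void pack(const uint32_t *in, size_t count, int b, uint64_t *out) {
  size_t bitpos = 0;
  for (size_t i = 0; i < count; i++) {
    size_t word = bitpos / 64, shift = bitpos % 64;
    out[word] |= (uint64_t)in[i] << shift;
    if (shift + (size_t)b > 64) // value straddles two 64-bit words
      out[word + 1] |= (uint64_t)in[i] >> (64 - shift);
    bitpos += (size_t)b;
  }
}

// Recover the i-th packed value.
uint32_t unpack_one(const uint64_t *packed, size_t i, int b) {
  size_t bitpos = i * (size_t)b;
  size_t word = bitpos / 64, shift = bitpos % 64;
  uint64_t v = packed[word] >> shift;
  if (shift + (size_t)b > 64)
    v |= packed[word + 1] << (64 - shift);
  return (uint32_t)(v & ((UINT64_C(1) << b) - 1));
}

The fast routines do the same job, but branch-free and several values at a time.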

In my example, I have used the notions of document and word, but dictionary coding is more often found in database systems to code columns or tuples. Systems like Oracle, Apache Kylin, and Apache Parquet use dictionary coding.

What if you want to reconstruct the data by looking it up in the dictionary?

Even if you can unpack the integers so that the processor can get the address in the dictionary, the look-up risks becoming a bottleneck. And there is a lot of data in motion… you have to unpack the indexes, then read them back, then access the dictionary. The code might look something like this…

unpack(compressed_data, tmpbuffer, array_length, b);
for(size_t i = 0; i < array_length; ++i) {
    out[i] = dictionary[tmpbuffer[i]];
}

Surely, there is no way around looking up the data in the dictionary, so you are stuck?

Except that recent Intel processors, and the upcoming AMD Zen processors, have gather instructions that can quickly look up several values at once. In C and C++, you can use the _mm_i32gather_epi64 intrinsic. It allows you to drastically reduce the number of instructions: you no longer need to write out the unpacked indexes and read them back.
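
As a sketch of my own (assuming AVX2 and a dictionary of 64-bit entries, and using the 256-bit variant _mm256_i32gather_epi64), here is what a gather-based lookup looks like. It still starts from already-unpacked 32-bit indexes; the full benefit comes from fusing the gather with the unpacking so the indexes never hit memory:

#include <immintrin.h> // compile with -mavx2
#include <stdint.h>
#include <stddef.h>

void gather_decode(const uint32_t *indexes, size_t length,
                   const int64_t *dictionary, int64_t *out) {
  size_t i = 0;
  for (; i + 4 <= length; i += 4) {
    // load four 32-bit indexes
    __m128i idx = _mm_loadu_si128((const __m128i *)(indexes + i));
    // gather four 64-bit dictionary entries in a single instruction
    __m256i vals =
        _mm256_i32gather_epi64((const long long *)dictionary, idx, 8);
    _mm256_storeu_si256((__m256i *)(out + i), vals);
  }
  for (; i < length; i++) // scalar tail
    out[i] = dictionary[indexes[i]];
}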

So how effective is it? The answer, unsurprisingly, depends on the size of the dictionary and your access pattern. In my example, I assumed that you had a dictionary made of 65536 words. Such a large dictionary requires half a megabyte; it won’t fit in fast CPU cache. Because dictionary coding only makes sense when the dictionary is smaller than the main data, such a dictionary would only make sense for very large data. If you have lots of data, a more practical approach might be to partition the problem so that you have many small dictionaries. A large dictionary might still make sense, but only if most of it is never used.

I have implemented dictionary decoding and run it on a recent Intel processor (Skylake). The speed-up from the SIMD/gather approach is comfortably a factor of two.

Number of CPU cycles per value decoded

dictionary size (# keys)    scalar    SIMD (gather)
512                         3.1       1.2
1024                        3.1       1.2
2048                        3.1       1.2
4096                        3.3       1.3
8192                        3.7       1.7

2x is a nice gain. But we are only getting started. My Skylake processor only supports 256-bit SIMD vectors. This means that I can only gather four 64-bit values from my dictionary at once. Soon, our processors will benefit from AVX-512 and be able to gather eight 64-bit values at once. My desktop does not yet live in that future, but I was able to put AVX-512 to the test on high-throughput Intel hardware (Knights Landing). Short story: you gain another factor of two, achieving a total speed-up of almost 4x over the basic code.

While the benefits are going to be even larger in the future, I should stress that they are likely much smaller on older processors (Haswell or before). For this kind of work, the technology is still evolving fast and there are large differences between recent and bleeding-edge processors.

What is optimally fast on today’s hardware might be slow on tomorrow’s hardware.


Credit: Work done with Eric Daniel from the parquet-cpp project.

How many reversible integer operations do you know?

Most operations on a computer are not reversible… meaning that once done, you can never go back. For example, if you divide integers by 2 to get a new integer, some information is lost (whether the original number was odd or even). With fixed-size arithmetic, multiplying by two is also irreversible because you lose the value of the most significant bit.

Let us consider fixed-size integers (say 32-bit integers). We want functions that take as input one fixed-size integer and output another fixed-size integer.

How many reversible operations do we know?

  1. Trivially, you can add or subtract by a fixed quantity. To reverse the operation, just flip the sign or switch from add to subtract.
  2. You can compute the exclusive or (XOR) with a fixed quantity. This operation is its own inverse.
  3. You can multiply by an odd integer. You’d think that reversing such a multiplication could be accomplished by a simple integer division, but that is not the case. Still, it is reversible (see the sketch after this list). By extension, the carryless (or polynomial) multiplication supported by modern processors can also be reversible.
  4. You can rotate the bits right or left using the ror or rol instructions on an Intel processor, or with a couple of shifts such as (x >>> -b) | (x << b) or (x << -b) | (x >>> b) in Java. To reverse, just rotate the other way. If you care about signed integers, there is an interesting variation that is also invertible: the "signed" rotate (defined as (x >> -b) | (x << b) in Java), which propagates the sign bit of the two's complement encoding.
  5. You can XOR the rotations of a value as long as you have an odd number of them.
  6. You can compute the addition of a value with its shifts (e.g., x + ( x << a) ). This is somewhat equivalent to multiplication by an odd integer.
  7. You can compute the XOR of a value with its shifts (e.g., x ^ ( x >> a) or x ^ ( x << a) ). This is somewhat equivalent to a carryless (or polynomial) multiplication.
  8. You can reverse the bytes of an integer (bswap on Intel processors). This function is its own inverse. You can also reverse the order of the bits (rbit on ARM processors).
  9. (New!) Jukka Suomela points out that you can do bit interleaving (e.g., interleave the least significant 16 bits with most significant 16 bits) with instructions such as pdep on Intel processors. You can also compute the lexicographically-next-bit permutation.
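
To make item 3 concrete, here is a small sketch of my own: the inverse of multiplying by an odd constant is multiplying by its modular inverse, which Newton's iteration finds in a handful of multiplications.

#include <stdint.h>
#include <stdio.h>

// Multiplicative inverse of an odd 32-bit integer modulo 2^32.
// Newton's iteration doubles the number of correct bits at each step:
// starting from 3 correct bits, four steps give at least 32.
static uint32_t modular_inverse(uint32_t a) {
  uint32_t x = a; // a*a = 1 (mod 8), so x is correct to 3 bits
  for (int i = 0; i < 4; i++)
    x *= 2 - a * x;
  return x;
}

int main(void) {
  uint32_t multiplier = 0x9E3779B1; // an arbitrary odd constant
  uint32_t inverse = modular_inverse(multiplier);
  uint32_t value = 123456789;
  uint32_t forward = value * multiplier; // the reversible operation
  uint32_t backward = forward * inverse; // undoes it exactly
  printf("%u -> %u -> %u\n", value, forward, backward);
  return 0;
}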

You can then compose these operations, generating new reversible operations.

Pedantic note: some of these operations are not reversible on some hardware and in some programming languages. For example, signed integer overflows are undefined in C and C++.

Let us talk about the Luddite problem…

This morning I woke up to an interview on the radio (yes, I still have a radio somehow) with pharmacists who try to fulfill prescriptions by mail. This is 2016. I can probably get almost all the chemical compounds necessary to create most drugs delivered to my home from China… but I need to go wait in line if I need penicillin. For the most part, we fulfill prescriptions the way we did in the 1970s.

Medicine in general is largely stuck in the 1970s. I cannot help but gasp at doctors who proudly display their stethoscope… straight out of the nineteenth century.

Sometimes, as is the case in schools, we put on a veneer of novelty… For example, many school programs proudly include a tablet these days… Oh! Come on! Strapping a tablet onto an obsolete system is not going to modernize it.

My own school, up to a few months ago, had the following “online enrollment system”. You clicked and it gave you a PDF which you printed and sent by mail. That was “online” to them because (gulp!) the form was available as a PDF online. I can afford to mock my employer because they have since fixed this silly issue… while many other schools have not.

So, evidently, we are openly refusing to move into the future… And I think this draws on public opinion. If most people thought it ridiculous that a stethoscope is the state of the art in 2016, most doctors would quickly upgrade, forced to do so by social pressure.

The reason doctors still have nineteenth-century stethoscopes is the same reason we keep lecturing… even if we know that a lecture is a highly inefficient pedagogical approach (barely better than nothing).

It is what people want, it is what people expect. Many people fear what is different, new… even if they won’t admit their fear. Or rather, they prefer to paint their fear as “caution”. (Because having doctors use anything but nineteenth-century stethoscopes would be incautious!)

Recently, we have made progress regarding gene editing. It is, of course, a potential cure for tragic genetic diseases. It is a potent tool against cancer… we can take immune cells, tweak them and reinject them… to fight off currently incurable cancers. It could also be part of a cure for the diseases of aging… tweak some of your cells and instead of being a weak 75-year-old who walks with a cane and can barely carry grocery bags… you can be a 75-year-old who is as strong as a 30-year-old and just as unlikely to fall…

Yet 68% of Americans are worried about gene editing. They are not excited, they are worried.

So what is going on?

There is widespread ignorance about science and technology. For example, we have people with computer chips in their heads right now to alleviate Parkinson’s symptoms. Many of us have elderly family members with cochlear implants, but most people don’t understand that these are computers hooked up to people’s brains… Sex change is a common thing nowadays; some surgery and hormone therapy does the trick. You do not need a vagina or even a penis to procreate… we have had in vitro fertilization since the 1970s… it is quite common today. People don’t understand the evolution of agriculture, they know little about pesticides… They don’t bother learning about the issues, they just want reassuring labels.

But ignorance only explains so much of the fear…

There is also a very strong Luddite agenda backed by what I call “nature worship”.

There is a vast collection of religious nuts and nature worshipers who believe that human beings should not meddle in the order of things. That they have no right to do so… And if they do, bad things will happen.

It is surprising how many echoes of nature worship we find in the media. For example, in the latest Star Trek movie, one of the characters finds some biological technology that allows him to stop aging and make himself stronger. Well, he is a vampire. So we are to believe that centuries from now, human beings will cross the chasm between stars in starships… but we will be biologically identical to what we are now. We won’t be stronger. We won’t have better endurance. We will still age and need glasses. We won’t run faster or punch harder. Soldiers on the battlefield will be like us but with “phasers” instead of guns. And if someone tries to enhance himself… well… he must be some kind of evil vampire, right?

Here is the truth: Nature is only plentiful after we harness it. Almost everything we eat today was heavily modified by human beings… from the meat to every single vegetable you can think of. It is “unnatural” to drink cow milk. It is “unnatural” to have vast fields of genetically modified wheat that we turn into bread. (Hint: none of you has eaten the original wheat unmodified by human intervention.) It is natural to die of infection before the age of 5. If you want to know “what nature intended”, then maybe it is a raving bunch of starving and infected human beings…

Human technologies are only “unnatural” if you choose to define them that way. In truth, human beings emerged out of natural selection. From the very beginning, human beings rose above other species. We cook our food (that’s “unnatural”!), which allows us to gain a lot of calories from food we could barely subsist on in the past…

Still. It was a close one. Nature is brutal. We are the only surviving human species. All others were wiped out. We were nearly wiped out… we are all descendants of a few thousand survivors.

You can be quite certain that we only survived because we made the best out of our brain and technology.

I think that what is natural for human beings is to develop ever better technology to improve the condition of mankind. And, yes, this involves creating artificial blood, putting chips in people’s brains to keep them from shaking, editing their genes so that they don’t have to live in a bubble… finding ways to regrow limbs and reverse aging.

This is the very nature of human beings… spiders create silk… we create technology… When human beings decide to forgo technology, they are like birds who decide to forgo flight…

By forgoing technology, we forgo our very existence. Without technology we would not survive. It is part of us just as silk is part of the spider.

Even knowledgeable people who are not nature worshipers often oppose technology by default. These people adopt the precautionary principle. In effect, they say that new technologies expose us to untold danger… and that we should stick with the tested and true.

It sounds like a reasonable point of view… but it is also very dangerous, even if it sounds “prudent”. Back in the 1960s, we underwent the “green revolution”. Before this technological revolution, there were very serious concerns that millions would soon starve. We simply did not have the technology to feed 6 or 7 billion people. Then we exposed rice, corn, and wheat to radiation, creating mutant seeds… in effect, accelerating evolution. Today, all of us, including those who eat “organic food”, are fed from these mutant seeds. To go back, you would need to first get rid of billions of people. Now… that wasn’t the first time we used “risky” technology to save millions. At the turn of the twentieth century, we adopted chemical fertilizers… without this, millions more would have died.

So the precautionary principle would have led to the death of millions of us. Not so prudent, eh?

In the near future, there will be 10 billion people on Earth, if we develop the technology to feed them… or untold numbers will starve. Deciding that the current technology is good enough may very well condemn millions to death and starvation… Is that prudent?

Today, hundreds of thousands of people die every day of age-related diseases. Finding a cure for these diseases would be as important as the green revolution, if not more so. It involves tinkering with our fundamental metabolism. It might require gene editing, stem-cell technologies, having computer chips embedded in our bodies…

Each time we push forward, some people feel that we are getting “less natural”. They think that, surely, there is a limit… Of course, there are limits, but holding back technology has a tremendous cost. If you say “oh… let us pass on gene editing… it sounds dangerous…”, then you are condemning millions to die and suffer.

But what about this limit that we will reach, where the Earth would disintegrate, maybe because there are too many people on it…?

It helps to have knowledge. There are more forests today in Europe than there were centuries ago. In fact, as farming becomes more efficient, we are able to relinquish more land to the wild. By artificially maintaining our productivity low (as with “organic agriculture”), we are stuck having to use all of the available land.

If not for immigration from poorer countries, most of the advanced countries (Europe, North America, Japan) would have falling populations. If you are worried at all about overpopulation, then you need to look not at where technology is plentiful, but at where it is lacking: in rural Africa…

If we could, somehow, double the life expectancy of human beings, in time the population of the advanced countries would resume their decline… because it is fecundity and not longevity that drives population.

But should you even be worried about overpopulation? That’s doubtful. Some of the richest places on Earth are also the most densely populated. People are not better off, healthier or smarter in the middle of nowhere.

And, though it is currently unfeasible, it seems clear that it is only a matter of time before we start populating space and the oceans. Our bodies cannot sustain life in space nor can we swim unaided at the bottom of the ocean, but our descendants will be better, stronger…

Technology is natural for human beings. Luddites are spiders refusing to use their silk.

Combine smart people with crazily hard projects

Back in college, professors assigned crazily hard problems… and I was forced to talk with my peers to figure out how they fared… and eventually to team up with some of them. The same pattern has repeated itself again and again in my life: smart people and really hard problems. That’s how I am able to learn new things…

It is a powerful mix. Smart people challenge you while simultaneously allowing you to build up your confidence as you eventually match their expertise. Hard problems force you to fail, which is how you learn.

Neither of these things is particularly scarce today. There are billions of people connected to the Internet… many of them much smarter than you… yet all of them a few bits away from you… And we have more hard problems at our disposal than ever before in the history of humanity…

I think that smart people meeting hard problems are more than just a learning opportunity… I think that’s how civilization moves forward.

Sadly, however, I fear that this potent mix is too often absent:

  • Most office work is organized so as to push hard problems, and failure, to the margin. The only kind of failure that is routinely present is the “zero-sum failure”. Maybe the budget this year is fixed, and so only some projects will be funded… maybe your project will fail to secure funding. You will have failed, but not as a result of working on a difficult problem… it is “artificial failure”: you did not even try.
  • Smart people are often also pushed to the margins. Smart people are those who can make a dent in hard problems… but if the hard problems are pushed to the margin, then who needs them? When employers come to see me, a college professor, they never ask for “the smartest student I know”, they ask for “a good programmer”. They never say that they have hard and interesting problems to offer… they talk about being able to offer good salaries…

You might think that colleges would act as a safe haven… but I fear that they often do not. To succeed in academia, it helps to appear smart… but tackling hard problems is entirely optional. In fact, the secret for getting a research grant is… propose to do something you already know how to do… That’s not just a theoretical concern of mine. Just last week, I was in a meeting with a smart young professor. She explained to us, quite openly, that she intentionally did not include an important but difficult idea from her last research grant application… The implication was clearly that since it was a hard problem she did not know how to tackle, it would stack the odds against her to include it as a target. The net result is that grant applications are often conservative, if not openly boring. They must appear new and exciting… but the underlying problems must be easy if not already solved.

So where do you find smart people working on hard problems? I don’t think you find them in managed environments… I think you find them in the cracks…

And I think that it contributes to making the future really hard to predict. The future is often literally being built by friends in a dark garage… it does not get planned by Wall Street or the government.

Common sense in artificial intelligence… by 2026?

Lots of people want to judge machine intelligence based on human intelligence. It dates back to Turing who proposed his eponymous Turing test: can machines “pass” as human beings? Turing, being clever, was aware of how biased this test was:

If the man were to try and pretend to be the machine he would clearly make a very poor showing. He would be given away at once by slowness and inaccuracy in arithmetic. May not machines carry out something which ought to be described as thinking but which is very different from what a man does?

I expect that we will eventually outgrow our anthropocentrism and view what machines really offer: a new kind of intelligence.

In any case, from an economics perspective, it matters a great deal whether machines can do exactly what human beings can do. Calum Chace has published a new book on this topic: The Economic Singularity: Artificial Intelligence and the Death of Capitalism. Chace’s excellent book is the latest in a stream of books hinting that we may soon all be unemployable simply because machines are better than us at most jobs.

To replace human beings at most jobs, machines need to exhibit what we intuitively call “common sense”. For example, if someone just bought a toaster… you do not try to sell them another toaster (as so many online ad systems do today).

Common sense is basic knowledge about how the world of human beings works. It is not rule-based. It is not entirely logical. It is a set of heuristics almost all human beings quickly acquire. If computers could be granted a generous measure of common sense, many believe that they could make better employees than human beings. Whatever one might think about economics, there is an interesting objective question… can machines achieve “common sense” in the near future?

It seems that Geoff Hinton, a famous computer scientist, predicted that within a decade, we would build computers with common sense. These are not computers that are smarter than all of us at all tasks. These are not computers with a soul. They are merely computers with a working knowledge of the world of human beings… computers that know our conventions, they know that stoves are hot, that people don’t usually own twelve toasters and so forth.

Chace recently placed a bet with a famous economist, Robin Hanson, that Hinton is right at 50-to-1 odds. This means that Hanson is very confident that computers will be unable to achieve common sense in the near future.

Hanson is not exactly a Luddite who believes that technology will stall. In fact, Hanson also has an excellent book, The Age of Em, that describes a world where brains have been replaced with digital computers. Our entire civilization is made of software. I have covered some of the content of Hanson’s book on my blog before… for example, Hanson believes that software grows old and becomes senile.

I think that both Hanson and Chace are very well informed on the issues, but they have different biases.

What is my own take?

The challenge for people like Chace who allude to an economic singularity where machines take over the economy… is that we have little to no evidence that such a thing is coming. For all the talk about massive unemployment coming up… the unemployment rates are really not that high. Geoff Hinton thinks that machines will soon acquire common sense… but is it an easy problem? We have no clue right now how to go about solving it. It is hard to even define.

As for Hanson, the problem is that betting against what we can do 10 years in the future is very risky. Ten years ago, we did not have iPhones. Today’s iPhone is more powerful than a PC from ten years ago. People at the beginning of the twentieth century thought that it would take a million years to get a working aeroplane, whereas it took a mere ten years…

I must say that despite the challenge, I am with Chace. At 50-to-1 odds, I would bet for the software industry. The incentive to offer common sense is great. After all, you can’t drive a car, clean a house or serve burgers without some common sense. What the deep learning craze has taught us is that it is not necessary for us to understand how the software works for the software to be effective. With enough data, enough computing power and trial and error, there is no telling what we can find!

Let us be more precise… what could we expect from software having common sense? It is hard to define it because it is a collection of small pieces… all of which are easy to program individually. For example, if you are lying on the floor yelling “I’m hurt”, common sense dictates that we call emergency services… but it is possible that Apple’s Siri could already be able to do this.

We have the Winograd Schema Challenge but it seems to be tightly tied to natural language processing… I am not sure understanding language and common sense are the same thing. For example, many human beings are illiterate and yet they can be said to have common sense.

So I offer the following “test”. Every year, new original video games come out. Most of them come with no instructions whatsoever. You start playing and you figure it out as you go… using “common sense”. So I think that if some piece of software is able to pick up a decent game from Apple’s AppStore and figure out how to play competently within minutes… without playing thousands of games… then it will have an interesting form of common sense. It is not necessary for the software to play at “human level”. For example, it would be ok if it only played simple games at the level of a 5-year-old. The key in this test is diversity. There are a great many different games, and even when they share the same underlying mechanic, they can look quite different.

Is it fair to test software intelligence using games? I think so. Games are how we learn about the world. And, frankly, office work is not all that different from a (bad) video game.

Accelerating PHP hashing by “unoptimizing” it

Hashing is a software trick that can map strings to fixed-length integers, such as 32-bit integers. It is ubiquitous in modern software.

Languages like Java and PHP have the ability to store strings with their corresponding hash values. Still, the hash value must be computed at least once.

How much of a burden can this be? Suppose that we use 10 cycles per byte to hash a string. For a long 100-kilobyte string, that would be about a million CPU cycles. If your CPU runs at 2 GHz, you have 2 billion cycles per second. Hence, hashing your string should take no more than half a millisecond. Put another way, you can hash 2000 such strings per second.

Simon Hardy-Francis pointed out to me that this can still represent a performance bottleneck if your PHP application needs to repeatedly load large new strings.

So what does PHP use as a hash function? It uses, fundamentally, the Java hash function: a simple polynomial hash with an odd multiplier (odd, and thus coprime with 2)…

for (int i = 0; i < len; i++) {
  hash = 33 * hash + str[i];
}

(Java multiplies by 31 instead of 33 but it is the same idea.)

A polynomial hash function with an odd multiplier is found everywhere and has a long history. It is the hash function used by the Karp-Rabin string search algorithm.

As I have pointed out in another post, for better performance, you want to unroll this function like so…

for (; i + 3 < len; i += 4) {
   h = 33 * 33 * 33 * 33 * h 
       + 33 * 33 * 33 * str[i] 
       + 33 * 33 * str[i + 1] 
       + 33 * str[i + 2] 
       + str[i + 3];
}
for (; i < len; i++) {
   h = 33 * h + str[i];
}

The reason this might help is that it breaks the data dependency: instead of having to wait for the previous multiplication to finish before another one can be issued, you can issue one new multiplication per cycle for up to four cycles in a row. Unrolling more might accelerate the code further.

The PHP developers implement the hash function with an extra optimization, however. Crediting Bernstein for the idea, they point out that…

the multiply operation can be replaced by a faster operation based on just one shift plus either a single addition or subtraction operation

It is true that a shift followed by an addition might be slightly cheaper than a multiplication, but modern compilers are quite good at working this out on their own. They can transform your multiplications by a constant as they see fit.

In any case, the PHP implementation is an optimized version of the following…

for (int i = 0; i < len; i++) {
  hash = ((hash << 5) + hash) + str[i];
}

The code is actually quite a bit more complicated because it is heavily unrolled, but it is algorithmically equivalent. Their code strongly discourages the compiler from ever using a multiplication.

So are the PHP developers correct? Should we work hard to avoid multiplications in C using Bernstein’s trick? Let us put this theory to the test on a recent x64 processor. As usual, my code is available.

Polynomial hashing (cycles per byte) on Intel Skylake

PHP (no multiplier)    PHP (with multiplier)
2.35                   1.75

The multiplication-free PHP approach is 33% slower! Gregory Pakosz pointed out that you can do even better by unrolling the version with multiplier further, reaching 1.5 cycles per byte.

Embedded processors with slow multiplications might give different outcomes. But then, where do you expect PHP processes to run? Overwhelmingly, they run on Intel processors produced in the last ten years… and these processors have fast multipliers.

So I think that the PHP developers are leaving performance on the table. They could easily optimize the computation of the hash function without changing the result of the function. What is more, the code would be more readable if they left the multiplications! If you need to multiply by 33, just do it the simplest possible manner! If it is cheaper to do a shift, the compiler can probably figure it out before you do. If you do not trust your compiler, then, at least, run benchmarks!
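
If you do not want to take my word for it, a rough benchmark is easy to write. Here is a minimal sketch of my own (not the benchmark code used for the table above, and no substitute for careful cycle counting): it times both formulations of the loop, which a good optimizing compiler may well turn into the same machine code.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static uint32_t hash_mult(const unsigned char *str, size_t len) {
  uint32_t h = 5381; // conventional DJBX33A starting value
  for (size_t i = 0; i < len; i++)
    h = 33 * h + str[i];
  return h;
}

static uint32_t hash_shift(const unsigned char *str, size_t len) {
  uint32_t h = 5381;
  for (size_t i = 0; i < len; i++)
    h = ((h << 5) + h) + str[i];
  return h;
}

int main(void) {
  size_t len = 1 << 26; // 64 MB of input
  unsigned char *data = malloc(len);
  memset(data, 'a', len);
  clock_t t0 = clock();
  uint32_t h1 = hash_mult(data, len);
  clock_t t1 = clock();
  uint32_t h2 = hash_shift(data, len);
  clock_t t2 = clock();
  printf("multiply: %.3f s, shift-add: %.3f s (hashes: %u, %u)\n",
         (double)(t1 - t0) / CLOCKS_PER_SEC,
         (double)(t2 - t1) / CLOCKS_PER_SEC, h1, h2);
  free(data);
  return 0;
}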

Let us look at the larger issue. How fast are 1.5 or 1.75 cycles per byte? Not very fast. Google’s CityHash uses about 0.25 cycles per byte whereas the state-of-the-art CLHash uses about 0.1 cycles per byte on recent Intel processors. So with a more modern algorithm, PHP developers could multiply the speed of their hash functions… but that’s for another day.

Augmented reality becomes mainstream

My go-to reference lately about the near future has been the 2006 novel Rainbows End by Vernor Vinge. The novel is set in 2025 and the author depicts a world where augmented reality is ubiquitous. Kids still go to school, but instead of presenting the working of a turbine using a PowerPoint deck, you can make a working turbine appear out of thin air in the classroom.

Augmented reality is the addition of a layer to the world using computing. One powerful and ubiquitous tool provided by modern computing is GPS: many of your devices can tell where they are on the planet within a few meters (and sometimes better). It has been used for gaming for many years. For example, I have played geocaching, Munzee, Ingress… and yesterday I played Pokémon Go.

Pokémon Go differs from the previous GPS-based games because of its massive popularity. Though the game was released barely a week ago, journalists estimate that it has 10 million users worldwide.

Some will object that Pokémon Go is not “really” an augmented reality game. Indeed, though it projects a small animal onto the image from your smartphone camera to give you the illusion that the animal is right there… the underlying software is not based on computational vision. That would simply not be possible in 2016… but all that matters to the player is that it “works”; it is convincing.

In comparison, I have put on Microsoft’s Hololens headset… It is considered to be “true” augmented reality in the sense that it realistically projects objects on top of your normal view… you can tilt your head and the object stays put. But playing Pokémon Go with Microsoft’s Hololens would be a miserable experience. For one thing, nobody would want to walk around in the street with a bulky headset. And it is debatable whether the Hololens projections feel more real than a Pokémon in Pokémon Go.

I don’t know how long Pokémon Go will thrive. Will it still be around in a year? Who knows? What really matters is that millions of people have now experienced the taste of augmented reality. There is no turning back.

The race is on to produce more convincing augmented reality hardware and software.

And this puts us right on track for the future described by Rainbows End.

Why does it matter? In the long run, augmented reality represents another pathway to extend human abilities.

Virtual Reality: First impressions with the HTC Vive

I just got my hands on some virtual-reality (VR) goggles. Specifically, we have an “HTC Vive“. We are still in the early days of VR and, given that these goggles cost about $1,000, not everyone will get to try them outside a demo room. I thought I’d share my impressions.

  • There are indications that the HTC Vive needs a powerful PC. Because I am disorganized, I ended up with the HTC Vive goggles, but no corresponding powerful gaming PC. So I used what I had: my son’s gaming PC. A 5-year-old box. It fails to meet the “minimum” requirements set by HTC, but at no point did we ever encounter any performance problem. To be fair, I did not try to run any demanding game… simply because I have none to test… Still, it seems to me that the belief that VR requires very powerful machines might be overstated.
  • The HTC hardware is fantastic. It looks good and it is sturdy. I am sure that it will all look ridiculous in a few years, but it is quite usable today. It feels good. The HTC Vive comes with two great controllers.
  • Setting up the HTC Vive is a bit harder than just putting on the goggles. You need to set up a room with sensors at each end. Still, it is no big deal. The only annoying hardware issue we had was pairing the controllers with the system; it was a source of confusion, and the hardest part was finding out where to click to pair them.
  • … which brings me to the software. The software is a bit flaky, like most software tends to be. It looks good and it generally works, but once the server stopped responding and we had to kill it, and another time a demo insisted that we press the “system” key even though doing so never worked. Even so, the software is quite good already.
  • So how is the experience? Great. It simply works. Every demo I tried was convincing. It is just like I imagined it. Better than I imagined it in fact because my previous encounters (years ago) with VR were unconvincing.

So where do I see this going in the next couple of years?

  • The hardware is basically good enough. I am sure I will sound like a fool in five years when the current VR hardware looks obsolete, but I do not expect it to get a whole lot better, qualitatively speaking. What I do expect is that we will get cheaper versions that work nearly as well. Already, the upcoming Sony PlayStation VR is going to cost half as much as an HTC Vive.
  • Content is a problem right now. That is, you can get the goggles working, but you are left feeling that there ought to be something more interesting to do with them… What I hope we will see is an explosion of new applications and games.

What is next for me? I am getting a Sony PlayStation VR for my home. I was still slightly on the fence, but playing with the HTC Vive convinced me that the hardware was mature enough.

In time, I want to set up the HTC Vive so that I can program my own prototypes. As a scientist and engineer, I want to find out what else I can do with these goggles.

Fast random shuffling

In a random shuffle, you want to take the elements of a list and reorder them randomly. In a “fair” random shuffle, all possible permutations must be equally likely. It is surprisingly hard to come up with a fair algorithm. Thankfully, there is a fast and easy-to-implement algorithm: the Fisher-Yates shuffle. It is a rather intuitive algorithm and there are YouTube videos about it… so, I will just point you at a piece of C code:

for (size_t i = size; i > 1; i--) {
   uint32_t p = random_bounded(i); // number in [0,i)
   swap(array + i - 1, array + p); // swap the values at i-1 and p
}

What can we expect to limit the speed of this algorithm? Let me assume that we do not use fancy SIMD instructions or parallelization.

If the input array is not in the cache, and we cannot fetch it in time or it is just too large, then cache misses will dominate the running time. So let us assume that the array is in the CPU’s cache.

If we have N input words, we go through the loop N – 1 times. At each iteration of the loop, you need to read two values and write two other values. A recent x64 processor can only store one value to memory per cycle, so we cannot do better than two cycles per input word. In the very next iteration, you may need to read one of the recently written values. So, two cycles per input word is probably optimistic.

What else could be the problem? The generation of the random numbers could hurt us. Let us assume that we are given a random number generation routine that we cannot change. For this blog post, I will stick with PCG.

What remains? Notice how the Fisher-Yates shuffle requires numbers in a range. The typical techniques to generate random numbers in a range involve frequent divisions.

For example, you might want to look at how the Go language handles it:

func (r *Rand) Int31n(n int32) int32 {
	max := int32((1 << 31) - 1 - (1<<31)%uint32(n))
	v := r.Int31()
	for v > max {
		v = r.Int31()
	}
	return v % n
}

This function always involves two divisions. Java, the PCG library… all involve at least one division per function call, often many more than one. Sadly, divisions are many times more expensive than any other operation, even on recent processors.

In an earlier blog post, I showed how to (mostly) get around divisions.

In general, no map from all 32-bit integers to a range can be perfectly fair. In practice, the effect is quite small unless your range is close to the maximal value of an integer. Thus you can simply use the following function:

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit = random32(); // 32-bit random number
  uint64_t multiresult = random32bit * range;
  return (uint32_t)(multiresult >> 32);
}

Maybe you feel bad about introducing a slight bias. You probably should not since the random-number generation itself is unlikely to be perfect.

Still, we can correct the bias. Recall that some of the values are mapped ceil(4294967296/range) times whereas others are mapped floor(4294967296/range) times. By sometimes redrawing a new random value, we can avoid entirely the bias (this technique is called rejection sampling):

uint32_t random_bounded(uint32_t range) {
  uint64_t random32bit = random32(); // 32-bit random number
  uint64_t multiresult = random32bit * range;
  uint32_t leftover = (uint32_t) multiresult;
  if (leftover < range) {
      uint32_t threshold = -range % range; // 2^32 mod range
      while (leftover < threshold) {
            random32bit = random32();
            multiresult = random32bit * range;
            leftover = (uint32_t) multiresult;
      }
  }
  return (uint32_t)(multiresult >> 32);
}

This looks quite a bit worse, but the “if” clause containing divisions is very rarely taken. Your processor is likely to mostly ignore it, so the overhead of this new function is smaller than it appears.
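
To tie the pieces together, here is a self-contained sketch of my own: a bare-bones PCG-style generator (standing in for the PCG library used in the benchmarks), the nearly divisionless random_bounded above, and the Fisher-Yates loop.

#include <stdint.h>
#include <stdio.h>

// Minimal PCG32 generator (global state for brevity; the real library is richer).
static uint64_t pcg_state = 0x853c49e6748fea9bULL;
static const uint64_t pcg_inc = 0xda3e39cb94b95bdbULL; // must be odd

static uint32_t random32(void) {
  uint64_t oldstate = pcg_state;
  pcg_state = oldstate * 6364136223846793005ULL + pcg_inc;
  uint32_t xorshifted = (uint32_t)(((oldstate >> 18) ^ oldstate) >> 27);
  uint32_t rot = (uint32_t)(oldstate >> 59);
  return (xorshifted >> rot) | (xorshifted << ((32 - rot) & 31));
}

// Unbiased bounded generator from above (multiply-shift plus rejection).
static uint32_t random_bounded(uint32_t range) {
  uint64_t multiresult = (uint64_t)random32() * range;
  uint32_t leftover = (uint32_t)multiresult;
  if (leftover < range) {
    uint32_t threshold = -range % range; // 2^32 mod range
    while (leftover < threshold) {
      multiresult = (uint64_t)random32() * range;
      leftover = (uint32_t)multiresult;
    }
  }
  return (uint32_t)(multiresult >> 32);
}

// Fisher-Yates shuffle using the bounded generator.
static void shuffle(uint32_t *array, size_t size) {
  for (size_t i = size; i > 1; i--) {
    uint32_t p = random_bounded((uint32_t)i); // index in [0, i)
    uint32_t tmp = array[i - 1];
    array[i - 1] = array[p];
    array[p] = tmp;
  }
}

int main(void) {
  uint32_t data[10];
  for (uint32_t i = 0; i < 10; i++) data[i] = i;
  shuffle(data, 10);
  for (int i = 0; i < 10; i++) printf("%u ", data[i]);
  printf("\n");
  return 0;
}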

So how do we fare? I have implemented these functions in C, using them to compute a random shuffle. Before each shuffle, I ensure that the array is in the cache. I report the number of clock cycle used per input words, on a recent Intel processor (Skylake). As usual, my code is available.

Random shuffle timings, varying the range function

range function                    cycles per input word
PCG library                       18.0
Go-like                           20.1
Java-like                         12.1
no division, no bias              7
no division (with slight bias)    6

Avoiding divisions makes the random shuffle run twice as fast.

Could we go faster? Yes, if we use a cheaper/faster random number generator. However, keep in mind that without SIMD instructions or multi-core processing, we cannot realistically hope to reach the lower bound of 2 cycles per input word. That is, I claim that no function can be 3 times faster than the fastest function we considered.

You can save a little bit (half a cycle per input word) if you replace the 32-bit PCG calls by 64-bit calls, processing input words in pairs. Using SIMD instructions, we could go even faster, but I do not have access to a SIMD-accelerated PCG implementation… We could, of course, revisit the problem with different random-number generators.