Simplistic programming is underrated

I was a nerdy kid who liked to spend a lot of time reading. Back then, we did not have the Internet, we could not even imagine it… so I ended up reading the dictionary, reading my encyclopedia. I had a weird vocabulary. I could easily string together sentences that nobody around me could understand. Adults would reward me with special attention when I would use words that they did not know.

That was not good.

I also learned to program relatively young. Information was scarce but I quickly absorbed everything I could. When I learned about structured programming, I rejected anything “more primitive”. Then I learned about object-oriented programming, and it was the end of structured programming. Then I learned about functional programming and thought that I had once more reached another level. I learned about metaprogramming and became a fan. And so forth.

That was not good.

I still write using big words sometimes, but never intentionally. I try to write short and simple sentences.

It took me a long time to figure out that the same is true with programming. If you can write a program using nothing but the simplest syntax, it is a net win.

I should explain. It is absolutely true that if you deploy a larger vocabulary and use longer, more pompous sentences, many people will think you are smarter. The same is true with programming. If you can cram metaprogramming, pure functional programming, some assembly and a neural network into one program, many programmers will be impressed by your skills.

However, there are important downsides:

  • Your apparent mental prowess will fail to impress those who find it easy to do the same. I have met my share of college students and professors who excel at dropping the names of a few philosophers, at using words only 1% of the population knows about… but, at a certain level, it does nothing for them. People simply roll their eyes and move on… If you have a job interview with Jeff Bezos or Peter Thiel, quoting famous philosophers or using big words might very well backfire.

    Exactly the same argument works for programming. You might impress your peers with your fancy use of closures… but this no longer works so well on people who have known for a few decades what closures are. You simply won’t convince Linus Torvalds that you are hot because you use all of the features of the latest programming languages.

    If you are used to achieving success by appearing smart… you might hit a ceiling, and you won’t even understand what is happening. There is a difference between appearing to be smart, and being smart.

    Really smart people have no need to show off. If you are showing off, you are broadcasting that you aren’t really smart.

  • Big words and fancy programming techniques are asocial. They turn you into a jerk. Even the people who think that you are smart won’t enjoy working with you. We might be impressed by people who use big words, but we don’t want to hang out with them. It is annoying to collaborate with programmers who bring out the big guns at every little chance they get.
  • Complexity scales poorly. It is much easier to build on your previous work if it is simple. There is a reason we still teach Newton’s three laws. They are powerful because they can be expressed so simply. A simple piece of code that uses few features is easier to reuse.

I like the concept of “simplistic programming”, by which I mean “programming that is so simple that people will criticize you for it”. At first, that sounds strange… can we really get criticized for being “too simple”? Of course, we can.

Science and Technology links (December 1st, 2017)

  1. Chollet has a piece on the impossibility of intelligence explosion. He is responding to the theory that smart machines will build even smarter machines, and that soon, human beings will become obsolete. He convincingly tears apart this theory. Human brains might be obsolete one day, but it is not as simple as setting self-improving artificial intelligence in motion. He writes: we are our tools. He means that the model of intelligence as “brain in a jar” is hopelessly naive. I made many of the same points in a blog post published a week before Chollet’s essay: You are your tools. So to acquire more intelligence, we need to build more tools, and this takes time. Einstein would not have done so well on a deserted island. I do part company with Chollet when he writes:

    Yet, modern scientific progress is measurably linear. I wrote about this phenomenon at length in a 2012 essay titled “The Singularity is not coming”. We didn’t make greater progress in physics over the 1950–2000 period than we did over 1900–1950 — we did, arguably, about as well. Mathematics is not advancing significantly faster today than it did in 1920

    What Chollet fails to appreciate, I fear, is the nature of progress itself. If we have chairs in 1750, it is not the case that chairs will get exponentially more comfortable and cheaper over time so that a chair in 2017 should be expected to nourish all your muscles for free. Bacteria that existed long before there were multicellular organisms are still around. If you go see a play today, it is not “exponentially better” than a play staged by Molière. Antibiotics are probably cheaper than they were in 1945, but they are not necessarily even better. Progress is about enabling new things. That’s how the exponential comes about: the number of new things you get grows and grows over time… at an exponential rate. Chollet is a great programmer who has produced software tools that are then used by lots of other engineers to build better tools, and so forth. What is being built is not just “better software”… it is software that solves problems we could not solve before. Just as one way of doing things becomes optimal, new alternatives open up. That’s called open-ended evolution and it is definitely happening.

  2. The sales of virtual-reality headsets have exceeded 1 million in the third quarter of this year. Half of these headsets are PlayStation VR headsets. If you achieve 1 million units per quarter, then you are at four million units per year. I have a bet with Greg Linden that we will soon achieve 10 million units a year. We are almost half-way there, so I think I still stand a chance of winning if there are interesting developments (like great games) in the near future.
  3. CRISPR/Cas9, discovered in 2012, is a technique to edit genes. It is now relatively cheap and it is likely to be used within a few years to correct genetic defects. It works like a search-and-replace function. I wondered how fast it is:

    it takes as long as six hours for Cas9 to search a bacterium, that is, through four million base pairs. (…) The results show that the price Cas9 pays for its flexibility is time, (…) To find the target faster, more Cas9 molecules searching for the same DNA sequence are needed. (…) Most proteins that search DNA code can recognize one specific sequence merely by sensing the outside of the DNA double helix; Cas9 can search for an arbitrary code, but to determine whether it is in the right place, the molecule has to open the double DNA helix and compare the sequence with the programmed code. The incredible thing is that it can still search the entire genome without using any energy.

    I am no geneticist, but as a computer scientist, the thought that it takes more than an hour to process about a million base pairs has me concerned. Human beings have 3 billion base pairs per cell, so a single Cas9 molecule searching at that rate would need on the order of 4,500 hours (roughly six months) to scan one cell’s genome. Thankfully, the problem can be parallelized: you can use several Cas9 molecules.

  4. It looks like reducing the amount of iron in our brains could be useful in preventing cognitive decline. It is going to be re-tested in the context of Alzheimer’s. Much of our food has been supplemented with iron. Indeed, actual iron extracted from the ground is mixed with our food (e.g., bread, cereal). We tend to accumulate iron because the body has a hard time excreting iron. Though we need some iron to be healthy, too much iron in some tissues could be harmful.
  5. In Switzerland, a quarter of all clinical trials are never completed, according to a recent article. The article reviews all factors involved in such a high failure rate. Note that these are not mere failures to get scientific results… they are failures to even complete the scientific work. Some of the factors involved are intriguing. For example, if you want to run clinical trials in multiple cities, then you need ethics approvals in each city. Then there is the set of personal incentives which favor competition between researchers rather than broad collaboration.
  6. As men grow older, their testosterone levels fall and this leads, among other things, to muscle loss and sexual difficulties. Magnan observes that some men maintain high levels of testosterone well into old age. My instinct would be to go see these old men and find out how exactly they differ. Is there a genetic component involved?
  7. Kagan believes that we give pills too eagerly to school-age kids. He believes that labelling kids as having a deficit disorder, when their dopamine levels are perfectly fine, is bad for the kids. They are likely to internalize that there is something wrong with them. An interesting point he makes is that the increased number of prescriptions has a financial origin: insurance is likely to cover the cost, so it is free to the parents and to the school. I don’t know why we expect kids to sit at desks all day while listening to what an adult has to say. It does not sound like fun at all!
  8. According to Papadimitriou, many of us may not be getting enough vitamin D:

    Since 2006, type 1 diabetes in Finland has plateaued and then decreased after the authorities’ decision to fortify dietary milk products with cholecalciferol. The role of vitamin D in innate and adaptive immunity is critical. A statistical error in the estimation of the recommended dietary allowance (RDA) for vitamin D was recently discovered; in a correct analysis of the data used by the Institute of Medicine, it was found that 8895 IU/d was needed for 97.5% of individuals to achieve values ≥50 nmol/L. Another study confirmed that 6201 IU/d was needed to achieve 75 nmol/L and 9122 IU/d was needed to reach 100 nmol/L. The largest meta-analysis ever conducted of studies published between 1966 and 2013 showed that 25-hydroxyvitamin D levels <75 nmol/L may be too low for safety and associated with higher all-cause mortality (…)

  9. The ancient Greeks proved that the only regular polygons that tile the plane are triangles, squares and hexagons (as now seen on many a bathroom floor). My wife asked: “how did they prove it?” I don’t know how the Greeks proved it. I lazily looked up the answer online. Here is the gist of it (see the short derivation after this list). In a regular tiling, look at a vertex where several polygons meet: the sum of their inner angles must be 360 degrees. Hexagons have inner angles of 120 degrees, so three hexagons can meet at a vertex, and their inner angles sum up to 360 degrees. Polygons with more sides have larger inner angles… approaching a flat 180 degrees. Suppose, for example, that two copies of a polygon with more than six sides meet at a common vertex: their two inner angles sum to more than 240 degrees, so what remains is less than 120 degrees, smaller than the polygon’s own inner angle, and no third copy can fit. With such arguments, you can construct a formal proof. But that’s not how the Ancient Greeks did it, surely?
  10. In a clinical trial, it was found that consuming twice the recommended amount of proteins improved muscle mass. This suggests that we should eat more proteins.
  11. Doctors trust male surgeons more.
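
Here is the vertex-counting argument from item 9, written out as a small calculation. If k copies of a regular polygon with n sides meet at each vertex of the tiling, their inner angles must add up to 360 degrees:

    k × (n − 2) × 180° / n = 360°,   which simplifies to   (n − 2)(k − 2) = 4.

The only integer solutions with n ≥ 3 and k ≥ 3 are (n, k) = (3, 6), (4, 4) and (6, 3): triangles, squares and hexagons.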

Bit hacking versus memoization: a Stream VByte example

In compression techniques like Stream VByte or Google’s varint-GB, we use control bytes to indicate how blocks of data are compressed. Without getting into the details (see the paper), it is important to map these control bytes to the corresponding number of compressed bytes very quickly. The control bytes are made of four 2-bit numbers and we must add these four 2-bit numbers as quickly as possible.

There is a related Stack Overflow question from which I am going to steal an example: given the four 2-bits 11 10 01 00 we want to compute 3 + 2 + 1 + 0 = 6.

  • How do we solve this problem in our implementation? Using table look-ups. Basically, we precompute each of the 256 possible values and just look them up in a table (see the sketch after this list). This is often called memoization. It works fine and a lot of fast code relies on memoization, but I don’t find it elegant. It makes me sad that so much of the very fastest code ends up relying on memoization.
  • What is the simplest piece of code that would do it without table lookup? I think it might be
     (x & 0b11) + ((x>>2) & 0b11) + ((x>>4) & 0b11) + (x>>6). 
  • Can we get slightly more clever? Yes, aqrit and Kendall Willets came up with a fancier expression involving two multiplications:
    ((0x11011000 * ((x * 0x0401) & 0x00033033)) >> 28).

    The compiler might implement a product like x * 0x0401 as a shift and an addition. Nevertheless, it is not obvious that two multiplications (even with such optimizations) are faster than the naive approach, but it is really a nice piece of programming. I expect that most readers will struggle to figure out why this expression works, and that’s not necessarily a good thing. (John Regehr points out that this code has undefined behavior as I have written it. One needs to ensure that all computations are done using unsigned values.)

  • In Stream VByte, the control bytes are organized sequentially which means that you can use another fancy approach that processes four bytes at once:
    v = ((v >> 2) & 0x33333333) + (v & 0x33333333);
    v = ((v >> 4) & 0x0F0F0F0F) + (v & 0x0F0F0F0F);
    

    where the variable v represents a 32-bit integer. You could generalize to 64-bit integers for even better speed. It might be slightly puzzling at first, but it is not very difficult to work out what the expression is doing.

    It has the benefit of being likely to be faster than memoization, but at the expense of some added code complexity since we need to process control bytes in batches. There is also some concern that it could suffer from uneven latency, with the first length in a batch of four being delayed if we are not careful.

    We could modify this approach slightly to directly compute the sums of the lengths, which could be put to good use in the actual code… but it is fancy enough as it stands.
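
To make the table-lookup and batch bullets concrete, here is a minimal sketch of both approaches. It is not the code from the actual Stream VByte implementation, and the names (lengths, decode_length, decode_lengths_batch) are made up for illustration.

#include <stdint.h>

// Memoization: lengths[b] holds the sum of the four 2-bit fields of byte b.
static uint8_t lengths[256];

void init_lengths(void) {
  for (int b = 0; b < 256; b++) {
    lengths[b] = (uint8_t)((b & 3) + ((b >> 2) & 3) + ((b >> 4) & 3) + (b >> 6));
  }
}

// Sum of the four 2-bit fields of one control byte: a single table load.
static inline uint32_t decode_length(uint8_t control) {
  return lengths[control];
}

// Batch approach: v packs four consecutive control bytes; each byte of the
// result holds the sum of the four 2-bit fields of the matching control byte.
static inline uint32_t decode_lengths_batch(uint32_t v) {
  v = ((v >> 2) & 0x33333333u) + (v & 0x33333333u); // sums of adjacent 2-bit fields
  v = ((v >> 4) & 0x0F0F0F0Fu) + (v & 0x0F0F0F0Fu); // per-byte sums
  return v;
}

The batch routine is the same kind of tree reduction used in bit-counting tricks; widening it to 64-bit words processes eight control bytes at once.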

I could imagine quite a few more alternatives, including some that use SIMD instructions, but I have to stop somewhere.

So how fast are these techniques? I threw together a quick benchmark to measure the throughput. I am using a recent (Intel Skylake) processor.

memoization       1.7 cycles/byte
naive             2.6 cycles/byte
aqrit-Willets     3.1 cycles/byte
batch (32-bit)    1.4 cycles/byte

Sadly, the aqrit-Willets approach, despite its elegance, is not always faster than the naive approach. The batch approach is fastest.

Because the batch approach could be made even faster by using 64-bit words, it would be my best choice right now to replace memoization if speed were my concern. It illustrates how there are potential benefits in a data layout that allows batch processing.

This microbenchmark reinforces the view that memoization is fast, as it does well despite its simplicity. Unfortunately.

Update: On Twitter, Geoff Langdale described a fast vectorized approach using SIMD instructions. An approach similar to what he advocates is described in the paper Faster Population Counts Using AVX2 Instructions.

Science and Technology links (November 24th, 2017)

Women earned majority of doctoral degrees in 2016 for 8th straight year and outnumber men in grad school 135 to 100.

Materialists use Facebook more frequently, because they compare themselves to others, they objectify and instrumentalize others, and they accumulate friends.

The modern office chair, with wheels, was invented by Charles Darwin. Or so says a Wikipedia article.

The famous Harvard professor Clayton Christensen says that half of all colleges are bound for bankruptcy. (I am skeptical.)

We can rather easily multiply the lifespan of worms. We might ask though whether such actions actually slow down aging, or just prevent death. The former is true. Researchers found enhanced organ functionality in older, long-lived mutants.

I have always been fascinated by how poor synthesized speech sounds. For all the progress made so far, Siri still sounds like a machine. DeepMind had demoed better-sounding synthesized speech a few months ago, but it was impractical computationally. They have now announced that they have a computationally practical version. Thus you can safely predict that within a few short years, synthesized speech coming out of most of your devices will pass the Turing test: you won’t be able to differentiate the voice coming out of your smartphone from an actual human voice.

Echoing Tyler Cowen, Tim Harford reports on research that found that

Companies still invest heavily in innovation, but the focus is on practical applications rather than basic science, and research is often outsourced to smaller outfits whose intellectual property can easily be bought and sold.

If you think that the solution is more government research grants… please consider that the surest way to be denied a government research grant is to propose a project that has a good chance of failing. Failure must not be an option if you want your research grant to be a success. It is that simple: governments are risk averse. And probably rightly so… when governments take chances, things often end poorly.

The data is in: coffee is healthy.

I keep telling people that scholars routinely cite papers that they have never read. I often get incredulous looks. Well. A made-up article got almost 400 citations.

About half of all men (me included) will suffer from male-pattern baldness. We still don’t know exactly what causes it and we do not have a handy cure for it. We have now explained 38% of the risk using genetic analysis.

Professor Kambhampati is a pacifist who does not support the campaign to stop “killer robots”. I share many of his thoughts.

Stress is good: Mitochondrial stress enhances resilience, protects aging cells and delays risk for disease.

Bees can be left or right handed.

The sex of the mice handlers seems to make a difference in drug experiments. (I am skeptical.)

How often do superior alternatives fail to catch on?

Many of us rely on a Qwerty keyboard, at least when we are typing at a laptop. It is often said that the Qwerty keyboard is inferior to clearly better alternatives like the Dvorak keyboard. However, this appears to be largely a myth backed by dubious science.

There is the similarly often repeated story of VHS versus Betamax tapes, when people would record video on tapes. The often told story was that Betamax lost to VHS despite being technically superior. But VHS tapes could record whole 2-hour movies whereas Betamax could not: so VHS was indeed superior.

It is often said that birds have far better lungs than mammals. So mammals are failures compared to birds… However, bats (which are mammals) have better lungs than either terrestrial mammals or birds. This suggests that mammals can acquire better lungs when they need them.

I fear that many of the stories about us being stuck with inferior products due to market failures, or about animals being stuck with inferior organs due to evolutionary dead-ends, might actually be weak or false stories.

Credit: This post was inspired by an email exchange with Peter Turney.

You are your tools

I believe that there are no miracle people. When others get the same work done as you do, only much faster, they are almost surely using better tools.

Tools are not always physical objects. In fact, most tools are not physical per se. For example, mathematics is a great tool. Word processors are another tool. Google is also a tool.

Intellectuals have tools to help them be productive. They have books. They have computers. They have software. They also have models, frameworks, and theories.

For example, I studied Physics, so I learned about how physicists think… and it is not how most people think. They have these tricks which turn difficult problems into far easier problems. The main lesson I took away from Physics is that you can often take an impossibly hard problem and simply represent it differently. By doing so, you turn something that would take forever to solve into something that is accessible to smart teenagers.

To illustrate what I have in mind… most people who have studied mathematics seriously, even teenagers, can quickly sum up all the numbers in a sequence. For example, what is the sum of the numbers between 1 and 99? That sounds hard? So maybe you can look up a formula online. Maybe. But once you know the “trick”, you can do it in your head, quickly, without effort. There is no miracle involved. To sum up the numbers between 1 and 99, just pair up the numbers. You pair 1 with 99, 2 with 98… and so forth, up to 49 and 51. So you have 49 such pairs, and each pair sums up to 100 (99+1, 98+2,…). So you have 49 times 100, which is 4,900. Then you have to add the remaining number (50), so that the sum is 4,950.
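
The pairing trick is just the usual closed form for the sum of the first n integers:

    1 + 2 + … + n = n(n + 1)/2,   so   1 + 2 + … + 99 = (99 × 100)/2 = 4,950.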

We don’t know yet what intelligence is. It is not something as simple as how many neurons you host in your neocortex… Dolphins have more such neurons than you do. It is probable that, in time, we will see that what defines intelligence is our ability to build upon new tools.

For some reason, the smartest among us have access to better tools. And that’s ultimately why they can run circles around you and me.

They can’t easily transmit their tools to the rest of us; it takes work, but it tends to happen. A few hundred years ago, most people could not read and write. It was widely believed that most people could never learn to read and write. Until fairly recently (i.e., a handful of centuries), the ability to read was regarded as a very rare trait, a sure sign of high intelligence. We now expect even the dumbest kids in high school to read.

Summing up the numbers between 1 and 100 in your head was, no doubt, a great feat once upon a time. Today it is something that all kids in Singapore know how to do.

You should be constantly trying to expand the number of tools at your disposal. It is a particular version of the growth mindset: the belief that you should always seek to better yourself, by acquiring new tools.

You might reasonably ask… “I have whatever tool that I learned to use, and it is good enough for what I do usually. Why would I invest in learning something new if I don’t feel any urgent need to do so?”

My answer is that acquiring new tools is the surest way to get smarter.

Further reading: Stop Using Excel, Finance Chiefs Tell Staffs at the Wall Street Journal.

Relevant quotes:

We have become the tool of our tools. (Henry David Thoreau)

We shape our tools and, thereafter, our tools shape us. (John Culkin, also attributed to Marshall McLuhan)

Our tools are better than we are, and grow better faster than we do. They suffice to crack the atom, to command the tides, but they do not suffice for the oldest task in human history, to live on a piece of land without spoiling it. (Aldo Leopold)

Do relational databases evolve toward rigidity?

The Hanson law of computing states that:

Any software system, including advanced intelligence, is bound to decline over time. It becomes less flexible and more fragile.

I have argued at length that Hanson is wrong. My main argument is empirical: we build much of our civilization on old software, including a lot of open-source software.

We often build new software to do new things, but that’s on top of older software. Maybe you are looking at your smartphone and you think that you are using software built in the last 4 years. If you think so, you are quite wrong.

So it is not the case that old software becomes obviously less useful or somehow less flexible with time. Yet, to adapt to new conditions, old software often needs “rejuvenation” which we typically call “refactoring”. Old database systems like MySQL were designed before JSON and XML even existed. They have since been updated so that they can deal with these data types efficiently.

So old widely used software tends to get updated, refactored, reengineered…

Viewed at a global scale, software evolves by natural selection. Old software that cannot adapt tends to die off.

There has been a fair amount of work on software aging. However, much of this work is practically motivated: the authors want to provide guidance to engineers as to when they should engage in refactoring work (to rejuvenate their software). They are less interested in the less practical problem of determining how software evolves and dies.

Software often relies on database tables. These tables are defined by the attributes that make them up. In theory, we can change these attributes, add new ones, remove old ones. Because open-source software gives us access to these tables, we can see how they evolve. Vassiliadis and Zarras recently published an interesting empirical paper on this question.

Their core result is that tables with lots of attributes (wide tables) tend to survive a long time unchanged. Thinner tables (with fewer attributes) die young. Why is that? One reason might be that wide tables covering lots of attributes tend to have lots of code depending on them. Thus changing these tables is expensive: it might require a large refactoring effort. So these wide tables tend to stick around and they contribute to “software rigidity”. That is, old software will accumulate these wide tables that are too expensive to change.

I believe that this “evolution toward rigidity” is real. But it is less of a general feature of software, and more of a particular defect of the relational database model.

This defect, in my view, is as follows. The relational model makes the important recognition that some attributes depend on other attributes (sometimes called “functional dependencies”). So if you have the employee identifier, you can get his name and his rank. From his rank, you might get his salary. From this useful starting point, we get two problems:

  1. Instead of simply treating these dependencies between attributes as first-class citizens, the relational model does away with them, representing them instead as “tables” where, somehow, attributes need to be regrouped. So, incredibly, the SQL language has no notion of functional dependency. Instead, it has keys and tables. These are not the same ideas!

    Why did functional dependencies get mapped to keys and tables? Simply because this is a natural and convenient way to implement functional dependencies. So we somehow end up with “employee identifier, name, rank” aggregated together. This arbitrary glue leads to rigidity as more and more attributes get lumped together. You cannot reengineer just one dependency or one attribute without possibly affecting a lot of code.

  2. Functional dependencies are nice, but far more inflexible and limited than they seem at first. For example, some people have more than one name. People change names, actually quite often. Some information might be unknown or uncertain. To cope with uncertain or unknown data, the inventor of the relational model added “null” markers to his model, along with a kind of three-valued logic that is not even consistent. In a recent paper with Badia, I showed that it is not even possible, in principle, to extend functional dependencies to an open-world model (e.g., as represented by disjunctive tables).

So I would say that relational databases tend to favor rigidity over time.

There are some counterpoints that may help explain why the sky is not falling despite this very real problem:

  • Programmers have a pragmatic approach. In practice, people have never really taken the relational model at its word: SQL is not a faithful implementation of the relational model. Ultimately, you can use a relational database as a simple key-value store. Nobody is forcing you to adopt wide tables. So there is more flexibility than it appears.
  • There are many other instances of rigidity. Constitutions change incrementally because it is too hard to negotiate large changes. Biology is based on DNA and it is unlikely to change anytime soon. Mathematics is based on standard axioms, and we are not likely to revisit them anytime soon. So it is not surprising that we end up locked into patterns. And it is not necessarily dramatic. (But we should not underestimate the cost: mammals have lungs that are far less efficient than the lungs of birds. Yet there is no obvious way for mammals to evolve a different lung architecture.)
  • We limited the rigidity when we stopped relying universally on SQL as the standard interface to access data. In the web era, we create services that we typically access via HTTP requests. So the rigidity does not have to propagate to the whole of a large organization.

Credit: Thanks to Antonio Badia and Peter Turney for providing me with references and insights for this post.

Science and Technology links (November 17th, 2017)

Josiah Zayner, a biochemist who once worked for NASA, became the first person known to have edited his own genes (…) During a lecture about human genetic engineering that was streamed live on Facebook, Zayner whipped out a vial and a syringe, then injected himself. Now, following in his footsteps, other biohackers are getting ready to take the plunge and tinker with their own genes. Zayner’s experiment was intended to boost his strength by removing the gene for myostatin, which regulates muscle growth. A similar experiment in 2015 showed that this works in beagles whose genomes were edited at the embryo stage. He injected himself (…) to remove the gene. (Biohackers are using CRISPR on their DNA and we can’t stop it)

Human beings definitively do not have the largest brains:

We found that the long-finned pilot whale neocortex has approximately 37.2 × 10^9 neurons, which is almost twice as many as humans, and 127 × 10^9 glial cells. Thus, the absolute number of neurons in the human neocortex is not correlated with the superior cognitive abilities of humans (at least compared to cetaceans) as has previously been hypothesized.

We can make old mice smarter by tweaking just one gene according to an article published in Nature:

This study demonstrates for we believe the first time in vivo that 6 months after a single injection of s-KL into the central nervous system, long-lasting and quantifiable enhancement of learning and memory capabilities are found. More importantly, cognitive improvement is also observable in 18-month-old mice treated once, at 12 months of age.

I stumbled on an older post (2015) by Marcel Weiher about his views on where software is headed:

(…) for most performance critical tasks, predictability is more important than average speed (…) Alas the idea that writing high-level code without any concessions to performance (often justified by misinterpreting or simply just misquoting Knuth) and then letting a sufficiently smart compiler fix it lives on. I don’t think this approach to performance is viable, more predictability is needed and a language with a hybrid nature and the ability for the programmer to specify behavior-preserving transformations that alter the performance characteristics of code is probably the way to go for high-performance, high-productivity systems.

He is arguing that engineers have made it hard to reason about performance, and to design software for the needed performance. When we are stuck with unacceptable performance, we are often stuck… unable to know how to fix the problems.

We found a single gene that makes you live seven years longer on average than your peers.

Exercise increases the size of your brain.

Fast exact integer divisions using floating-point operations (ARM edition)

In my latest post, I explained how you could accelerate 32-bit integer divisions by transforming them into 64-bit floating-point divisions. Indeed, 64-bit floating-point numbers can represent accurately all 32-bit integers on most processors.

It is a strange result: Intel processors seem to do a lot better with floating-point divisions than integer divisions.

Recall the numbers that I got for the throughput of division operations:

64-bit integer division                           25 cycles
32-bit integer division (compile-time constant)   2+ cycles
32-bit integer division                           8 cycles
32-bit integer division via 64-bit float          4 cycles

I decided to run the same test on a 64-bit ARM processor (AMD A1100):

64-bit integer division                           7 ns
32-bit integer division (compile-time constant)   2 ns
32-bit integer division                           6 ns
32-bit integer division via 64-bit float          18 ns

These numbers are rough and my benchmark is naive (see the code). Still, on this particular ARM processor, 64-bit floating-point divisions are not faster (in throughput) than 32-bit integer divisions. So ARM processors differ from Intel x64 processors quite a bit in this respect.

Fast exact integer divisions using floating-point operations

On current processors, integer division is slow. If you need to compute many quotients or remainders, you can be in trouble. You potentially need divisions when implementing a circular buffer or a hash table, when generating random numbers, shuffling data randomly, sampling from a set, and so forth.

There are many tricks to avoid performance penalties:

  • You can avoid dividing by an arbitrary integer and, instead, divide by a known power of two.
  • You can use a divisor that is known to your compiler at compile-time. In these cases, most optimizing compilers will “optimize away” the division using magical algorithms that precompute a fast division routine (see the sketch after this list).
  • If you have a divisor that is not known at compile time, but that you reuse often, you can make use of a library like libdivide to precompute a fast division routine.
  • You can reengineer your code to avoid needing a division in the first place, see my post A fast alternative to the modulo reduction.
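
To give a taste of what compilers do with a compile-time constant divisor, here is a hand-rolled sketch for the divisor 9: the division becomes a 64-bit multiplication by a precomputed “magic” constant followed by a shift. The constant and shift below are one valid choice for unsigned 32-bit operands; compilers pick their own variants, and the function name is made up for illustration.

#include <stdint.h>

uint32_t divide_by_9(uint32_t x) {
  // 954437177 * 9 = 2^33 + 1, so multiplying by 954437177 and shifting
  // right by 33 bits computes x / 9 exactly for every 32-bit unsigned x.
  return (uint32_t)(((uint64_t)x * 954437177u) >> 33);
}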

But sometimes, you are really stuck and need those divisions. The divisor is not frequently reused, and you have lots of divisions to do.

If you have 64-bit integers, and you need those 64 bits, then you might be in a bit of trouble. Those long 64-bit integers have a terribly slow division on most processors, and there may not be a trivial way to avoid the price.

However, if you have 32-bit integers, you might have a way out. Modern 64-bit processors have 64-bit floating-point numbers following the IEEE standard. These 64-bit floating-point numbers can be used to represent exactly all integers in the interval [0, 2^53). That means that you can safely cast your 32-bit unsigned integers as 64-bit floating-point numbers.

Furthermore, common x64 processors have fast floating-point divisions. And the division operation over floating-point numbers is certain to result in the closest number that the standard can represent. The division of an integer in [0, 2^32) by an integer in [1, 2^32) is sure to be in [0, 2^32). This means that you can almost replace the 32-bit integer division by a 64-bit floating-point division:

uint32_t divide(uint32_t a, uint32_t b) {
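  // a and b are exactly representable as doubles, so the rounded quotient,
  // once truncated, matches the integer quotient (assumes b is nonzero)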
  double da = (double) a;
  double db = (double) b;
  double q = da/db;
  return (uint32_t) q;
}

Sadly, if you try to divide by zero, you will not get a runtime error, but rather some nonsensical result. Still, if you can be trusted to not divide by zero, this provides a fast and exact integer division routine.

How much faster is it? I wrote a small program to measure the throughput:

64-bit integer division                           25 cycles
32-bit integer division (compile-time constant)   2+ cycles
32-bit integer division                           8 cycles
32-bit integer division via 64-bit float          4 cycles

These numbers are rough, but we can estimate that we double the throughput.

I am not entirely sure why compilers fail to exploit this trick. Of course, they would need to handle the division by zero, but that does not seem like a significant barrier. There is also another downside to the floating-point approach: it generates many more instructions.

Signed integers work much the same way, but you need extra care. For example, most processors rely on two’s complement notation, which implies that there is one negative number that cannot be represented as a positive number. Thus implementing “x / (-1)” can cause some headaches. You probably do not want to divide signed integers anyhow.
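
Here is a minimal sketch of the signed version under those caveats; the function name is made up, and the caller must avoid b == 0 and the INT32_MIN / -1 case, which overflows a 32-bit signed integer.

#include <stdint.h>

// Signed 32-bit division via a 64-bit floating-point division.
// Every int32_t value is exactly representable as a double, and the cast
// back to int32_t truncates toward zero, matching C's signed division.
int32_t divide_signed(int32_t a, int32_t b) {
  double q = (double) a / (double) b;
  return (int32_t) q;
}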

I plan to come back to the scenario where you have lots of 64-bit integer divisions with a dynamic divisor.

This result is for current Intel x64 processors, what happens on ARM processors is quite different.