Faster intersections between sorted arrays with shotgun

A common problem within databases and search engines is to compute the intersection between two sorted array. Typically one array is much smaller than the other one.

The conventional strategy is the “galloping intersection”. In effect, you go through the values in the small arrays and then do a binary search in the large array. A binary search is a simple but effective algorithm to search through a sorted array. Given a target, you compare with with the midpoint value. If your target is smaller than the midpoint value, you search in the first half of the array, otherwise you search in the second half. You can recurse through the array in this manner, cutting the search space in half each time. Thus the search time is logarithm.

If the small array has M elements and the large array has N elements, then the complexity of a galloping search is O(M log N). In fact, you can be more precise: you never need more than M * log N + M comparisons.

Can you do better? You might.

Let me describe an improved strategy which I call “shotgun intersection”. It has been in production use for quite some time, through the CRoaring library, a C/C++ implementation of Roaring Bitmaps.

The idea is that galloping search implies multiple binary searches in sequence through basically the same array. Doing them consecutively might not be best. A binary search, when the large array is not in cache, is memory-bound: it waits for the memory subsystem to deliver the data. So you are constantly waiting. What if you tried to do something else while you wait. What about starting right away on the next binary search?

That is how a shotgun search works. You take, say, the first four values from the small array. You load the midpoint value from the large array, then you compare all of your four values against this midpoint value. If the target value is larger, you set a corresponding index so that the next search will hit the second half of the array. And so forth. In effect, shotgun search does many binary searches at once.

I make my Java code available, if you want a full implementation.

Does it help? It does. Sometimes it helps a lot. Let us intersect an array made of 32 integers with an array made of 100 million sorted integers. I use a cannonlake processor with Java 8.

1-way 1.3 microseconds
4-way 0.9 microseconds

Credit: Shotgun intersections are based on an idea and an initial implementation by Nathan Kurz. I’d like to thank Travis Downs for inspiring discussions.

Science and Technology links (January 12th, 2019)

  1. You can buy a 512GB memory card for $140 from Amazon.
  2. We have been told for decades to avoid saturated fats, the kind found in meat, cheese and butter. After an extensive review, Grasgruber et al. conclude:

    Our results do not support the association between cardiovascular diseases and saturated fat, which is still contained in official dietary guidelines. In the absence of any scientific evidence connecting saturated fat with cardiovascular diseases, these findings show that current dietary recommendations regarding cardiovascular diseases should be seriously reconsidered.

  3. Stoet and Geary propose a new measure of gender inequality and conclude…

    We found that low levels of human development are typically associated with disadvantages for girls and women, while medium and high levels of development are typically associated with disadvantages for boys and men. Countries with the highest levels of human development are closest to gender parity, albeit typically with a slight advantage for women.

  4. Budish argues that Bitcoin would be hacked if it became sufficiently important economically.
  5. We can use deep-learning software to reconstruct intelligible speech from the human auditory cortex.
  6. Nelson’s research indicates that there will be more calories available per capita in 2050 than today, in all income quintiles, and even in an extreme climate change scenario. However, both obesity and dietery shortages (e.g., vitamin A) will remain a challenge:

    We must shift our emphasis from food security to nutrition security.

Science and Technology links (January 5th, 2019)

  1. There are nearly 70,000 centenarians in Japan.
  2. China’s population fell by 1.27 million in 2018.
  3. Obesity is associated with increased senescent cell burden and neuropsychiatric disorders, including anxiety and depression. Clearing senescent cells could alleviate these symptons.
  4. Women who come from families where the mother is the breadwinner are less likely to pursue a science degree at university level.
  5. Stronger men live longer:

    Muscular strength is inversely and independently associated with death from all causes and cancer in men, even after adjusting for cardiorespiratory fitness and other potential confounders.

  6. Inflammation is a crude but effective part of our immune system that serves as a first line of defense. It is necessary but tends to get out of hand as we age. Arai et al. found that your level of inflammation was a potential predictor of how likely you were to reach extreme old age:

    Inflammation is the prime candidate amongst potential determinants of mortality, capability and cognition up to extreme old age

  7. An article published by Nature argues for rejuvenation strategies:

    Ageing is associated with the functional decline of all tissues and a striking increase in many diseases. Although ageing has long been considered a one-way street, strategies to delay and potentially even reverse the ageing process have recently been developed. Here, we review four emerging rejuvenation strategies—systemic factors, metabolic manipulations, senescent cell ablation and cellular reprogramming—and discuss their mechanisms of action, cellular targets, potential trade-offs and application to human ageing.

  8. Childhood leukaemia is a feature of developed societies and has to do with the fact that children are insufficiently exposed to bacteria and infections.
  9. China has landed a rover on the far side of the Moon.
  10. While climate change models predict that as the world warms, biomass will decompose more quickly, which would send a lot more carbon dioxide into the atmosphere, new research finds that it may work the other way around: warmer temperature may reduce decomposition and carbon emissions.
  11. You can defeat Google’s captchas (meant to prove you are a human being) by taking the audio test and feeding it back to Google’s services to get a transcription.

Memory-level parallelism: Intel Skylake versus Intel Cannonlake

All programmers know about multicore parallelism: your CPU is made of several nearly independent processors (called cores) that can run instructions in parallel. However, our processors are parallel in many different ways. I am interested in a particular form of parallelism called “memory-level parallelism” where the same processor can issue several memory requests. This is an important form of parallelism because current memory subsystems have high latency: it can take dozens of nanoseconds or more between the moment the processor asks for data and the time the data comes back from RAM. The general trend has not been a positive one in this respect: in many cases, the more advanced and expensive the processor, the higher the latency. To compensate for the high latency, we have parallelism: you can ask for many data elements from the memory subsystems at the same time.

In earlier work, we showed that current Intel processors (Skylake microarchitecture) are limited to about ten concurrent memory requests whereas Apple’s A12 processor scale to 40 or more memory requests.

Intel just released a more recent microarchitecture (cannonlake) and we have been putting it to the test. Is Intel improving?

It seems so. In a benchmark where you randomly access a large array, using a number of separate paths (which I call “lanes”), we find that the cannonlake processor appears to support twice as many concurrent memory requests as the skylake processors.

The Skylake processor has lower latency (70 ns/query) compared to the Cannonlake processor (110 ns/query). Nevertheless, the Cannonlake is eventually able to beat the Skylake processor in bandwidth by a wide margin (12 GB/s vs 9 GB/s).

The story is similar to the Apple A12 experiments.

This suggests that even though future processors may not have lower latency when accessing memory, we might be better able to hide this latency through more parallelism.

Even if you are writing single-threaded code, you ought to think more and more about parallelism.

Our code is available.

Credit: Though all the mistakes are mine, this is joint work with Travis Downs.

Further details: Processors access the memory through pages. By default, many Intel systems have “small” pages (4kB). When doing random accesses in large memory regions, you are likely to access too many pages, so that you incur expensive “page misses” that lead to “page walks”. It is possible to use large page sizes, even “huge pages”. But since memory is allocated in pages and you may end up with many under-utilized pages if they are too large. In practice, under-utilized pages (sometimes called “memory fragmentation) can be detrimental to performance. To get the good results above, I use huge pages. Because there is just one large memory allocation in my tests, memory fragmentation is not a concern. With small pages, the Cannonlake processor loses its edge over Skylake: they are both limited to about 9 concurrent requests. Thankfully, on Linux, programmers can request huge pages with a madvise call when they know it is a good idea.

Important science and technology findings in 2018

  1. The Gompertz-Makeham law predicts statistically the mortality rate of human beings. The key takeaway is that it is an exponential function. Every few years, the mortality rate of a human being doubles. It is not unique to human beings: most mammals and many other animals have an exponentially rising mortality rate over time. It does not affect all animals, however. Lobsters do not appear to age like we do. Many trees age in reverse, meaning that their mortality rate diminishes over time. In 2018, a scientist studying naked mole rates for decades published an analysis of over 3,000 rats and found that their mortality rate remains constant throughout their life. We do not know why naked mole rates age differently from most other mammals.
  2. Type 1 diabetes is when your pancreas is unable to supply insulin to your cells. Though it can be treated with expensive and inconvenient insulin shots, there is no cure. In 2018, we found that a heart-disease drug can partially reverse type 1 diabetes. This could make some diabetics less dependent on insulin.
  3. Though artificial intelligence has been making a lot of progress in tasks such as image classification and game playing, we are still a long way from being able to intelligently animate human-like body parts like hands. Simple tasks like folding laundry or turning a door knob are still a massive challenge. In 2018, researchers from OpenAI have trained a human-like robot hand to manipulate objects like we would.
  4. Older people tend to have a less efficient immune system. In 2018, we learned that we can at least partially reverse age-related immune-system decline using drugs. It substantially reduces infections in older people.
  5. The human genome project set forth in 1990 to map the human chromosomes. We thought at the time that human beings would have 100,000 genes, but they have only about 25,000 genes. The map was completed in 2003. Yet applications of the human genome project have been scarce. In 2018, the first gene-silencing drug was approved in the USA.
  6. Electrocardiograms (ECG) have been used since the 19th century to monitor human hearts. The first commercially available ECG machines were produced at the beginning of the 20th century but they remain specialized devices mostly just used within hospitals. They are also somewhat invasive. In 2018, Apple released a watch with government-approved ECG (heart monitoring) capabilities.
  7. CRISPR/Cas9 is technique developed in 2012 to edit the genes of living organisms. It is unclear whether it is safe to use it on human beings… Using this technique, a Chinese researcher helped produce the first genetically modified babies, they may be immune to HIV.

Science and Technology links (December 29th, 2018)

  1. Low-dose radiation from A-bombs elongated lifespan and reduced cancer mortality relative to un-irradiated individuals (Sutou, 2018):

    The US National Academy of Sciences (NAS) presented the linear no-threshold hypothesis (LNT) in 1956, which indicates that the lowest doses of ionizing radiation are hazardous in proportion to the dose. This spurious hypothesis was not based on solid data. (…) A-bomb survivors (…) showed longer than average lifespans. Average solid cancer death ratios of(…) A-bomb survivors (…) were lower than the average for Japanese people (…), essentially invalidating the LNT model. Nevertheless, LNT has served as the basis of radiation regulation policy. If it were not for LNT, tremendous human, social, and economic losses would not have occurred in the aftermath of the Fukushima Daiichi nuclear plant accident. For many reasons, LNT must be revised or abolished, with changes based not on policy but on science.

    This is important work. I am surprised at how few people know about hormesis. Many people assume that if you avoid stress, toxins and challenges, you will maximize your health and longevity. That is just flat out wrong.

  2. In climate talks, we use year 1850 as a reference: the implicit goal is to maintain Earth’s global temperature close to the global temperature that existed in year 1850 (say within 1.5 degrees). To my knowledge, nothing makes year 1850 special. In fact, in the absence of both ancient and recent carbon emissions from agriculture and industrialization, current global average temperatures would likely have been about 1.3 degrees lower than they were around 1850 (Vavrus et al., Nature 2018). We were headed down toward another glaciation and, instead, due to human beings, we are headed toward warmer and warmer temperatures. Left unchecked, neither direction is desirable. As argued by Deutsch in the Beginning of Infinity, we have no choice but to accept that there is no such thing as “sustainability” (which assumes an ideal steady state) and that we must learn to engineer our climate.
  3. Human beings do not originate from a single region in Africa: the story of our origin is complicated, involving a mix of different populations and cultures.
  4. McGuff and Litte (2009) write:

    there is no additional physiological advantage afforded to one’s body, including endurance or cardio benefits, by training that lasts more than six to nine minutes a week.

  5. The videogame Fortnite lead to 3 billion dollars in profit for its creators. It has 200 million players.
  6. In older mammals, the skin loses fat. Zhang et al. restored the ability of skin in older mice to store fat, thus making the skin more resistant to some infections.
  7. Students who make friends and study with them tend to do better. This is painfully obvious to anyone who has given serious thought to how schooling works.

Science and Technology links (December 22nd 2018)

  1. For equity reasons, many people advocate for double-blind peer review, meaning that the author does not know who the reviewer is, nor does the reviewer know who the author is. It is believed (despite little hard positive evidence and some contrary evidence) that this is surely benefificial to female authors and minorities. Cox and Montgomerie find that:

    These analyses suggest that double-blind review does not currently increase the incidence of female authorship in the journals studied here. We conclude, at least for these journals, that double-blind review does not benefit female authors and may, in the long run, be detrimental.

    Why do they say it may be detrimental?

    Firstly, making everything anonymous is hard work. And ressources are finite: we are already struggling to find time and people to review manuscripts… adding to the burden has a cost. Thus we must ensure that there are comparable benefits. You shouldn’t think for a minute that making science harder and more expensive is going to necessarily benefit women and minorities. Raising the costs usually works against inclusion.

    Secondly, they observe empirically that publications by women are less likely to appear in double-blind-review journals than in conventional journals. Why is that? If double-blind reviews are obviously beneficial to women, they would flock the double-blind journals… but they do not. Either women are misguided or else, more likely, double-blind reviews do not favor women.

    And, finally, there may be substantial benefits to the authors and the community in the reviewers knowing who they are. It is simply a fact that the identity of the authors is an important factor when assessing a piece of scientific work. Ultimately, we tend to reward sustained high quality work with more credibility. You want authors to have skin in the game: if their work is bad, then they should pay a price (more scrutiny of their work in the future) and when their work is consistently good, they should be rewarded (given more implicit trust).

  2. The price of lithium-ion batteries has fallen by 73% between 2010 and 2016. (Source: Bloomberg) However, it does not mean that these batteries are getting better at a similar rate: it seems that even though the price is falling, the physical quality of the batteries remains similar.
  3. Scientists have created viable human hair follicles from cultured human cells. (Source: Nature)
  4. There is interest in NAD supplements for anti-aging purposes. Xie et al. (2018) show cognitive benefits due to NAD supplements in old mice.
  5. Bernstein et al. (2018) point out that continuous and sustained social interactions reduce individual exploration. In other words, to be highly original, you do need to close your door and stop answering emails and phone calls for a time.

Fast Bounded Random Numbers on GPUs

We often use random numbers in software in applications such as simulations or machine learning. Fast random number generators tend to produce integers in [0,232) or [0,264). Yet many applications require integers in a given interval, say the interval {0,1,2,3,4,5,6,7}. Thus standard libraries frequently provide a way to request an integer within a range.

How does it work underneath? Typically, a 64-bit or 32-bit integer is produced, and then it is converted into an integer within the desired range. Unfortunately, as pointed out in details by Melissa O’Neill, doing so without introducing undue biases is slow. That is, it can take more time to convert the integer to the range than to produce the random integer in the first place!

In Fast Random Integer Generation in an Interval, we show how to drastically accelerate this computation compared to the standard libraries. The main trick is to avoid as much as possible the use of division instructions (since they are slower). Bernardt Duvenhage has a great talk on the application of this technique in Python. There is an illustrative benchmark on GitHub.

In the paper, there is a precautionary note about the applicability of this technique to GPUs. Indeed, there are substantial differences between general purposes 64-bit processors and common GPUs (32-bit). A reader, Norbert Juffa, reached out to me to point out that the note might be unwarranted. Juffa wrote a benchmark using the NVIDIA API (CUDA) to support his claim.

The fast function that avoids divisions as much as possible can be expressed using a few lines of C.

// returns value in [0,s)
uint64_t nearlydivisionless (uint64_t s)  {
    uint64_t x = random64 ();
   // compute ( x * s ) >> 64
    uint64_t h = __umul64hi (x, s); 
    uint64_t l = x * s;
    if (l < s) {
        uint64_t t = -s % s;
        while (l < t) {
            x = random64 ();
            h =__umul64hi (x, s);
            l = x * s;
    return h;

What Juffa does is to generate 10,000,000 integers in the interval [0,500,000] from Marsaglia’s KISS64 random number generator. On a Quadro P2000 Nvidia card, he shows that using an approach that minimizes the use of divisions is much faster.

OpenBSD-like 5 ms
Java-like 2.9 ms
Our approach 1.4 ms

I make Juffra’s code available.

Sorting strings properly is stupidly hard

Programming languages make it hard to sort arrays properly. Look at how JavaScript sorts arrays of integers:

> v = [1,3,2,10]
[ 1, 3, 2, 10 ]
> v.sort()
[ 1, 10, 2, 3 ]

You need a magical incantation to get the right result:

> v.sort((a,b)=>a-b)
[ 1, 2, 3, 10 ]

Though this bad default can create bugs, it is probably not the source of too many frustrations. However, sorting strings alphabetically is a real problem. Let us see how various languages do it.


> var v = ["e", "a", "é","f"]
> v.sort()
[ 'a', 'e', 'f', 'é' ]
> v = ["a","b","A","B"]
> v.sort()
[ 'A', 'B', 'a', 'b' ]


>>> x=["e","a","é","f"]
>>> x.sort()
>>> x
['a', 'e', 'f', 'é']
>>> x=["a","A","b","B"]
>>> x.sort()
>>> x
['A', 'B', 'a', 'b']


  1> var x = ["e","a","é","f"] 
  2> x.sorted()
$R0: [String] = 4 values {
  [0] = "a"
  [1] = "e"
  [2] = "f"
  [3] = "é"
  1>  var x = ["a","b","A","B"]
  2> x.sorted() 
$R1: [String] = 4 values {
  [0] = "A"
  [1] = "B"
  [2] = "a"
  [3] = "b"

As far as I can tell, by default, these languages apply crude code-point sorting. Human beings understand that the characters e, é, E, É, è, ê, and so forth, should be considered as the same letter (e) with accents. There are exceptions to this rule, but the default which consists in sorting accentuated characters after the letter ‘z’ is just not reasonable. The way case is handled is patently stupid. You might prefer A to come before a, or vice versa, but no human being would ever sort the letters as A,B,a,b or a,b,A,B.

There are standards for sorting strings, such as the Unicode Collation Algorithm.

To get a sensible default, programming languages force you to use complicated code. In JavaScript, it is burdensome but easy enough….

> v.sort(Intl.Collator().compare)
[ 'a', 'e', 'é', 'f' ]

However, I am not sure what the equivalent is in Python and Swift. It does not jump at me in the documentation of the respective standard librairies. I did not even look at it for other popular programming languages like Go, C++, and so forth.

It is unacceptably difficult to do the “right thing”. The net result is that many programmers do not sort strings properly. If you use a natural language with non-English characters, you see the effect in many applications. It looks bad.

Thankfully, most major software products get it right. Microsoft Office, Google Docs, Apple apps… all do the right thing. It creeps up in small budget applications. I have to use one in my daily life as an employee of a public university, and it annoys me.

We should do better.

Further reading:International Components for Unicode

Science and Technology links (December 15th 2018)

  1. Academic excellence is not a strong predictor of career excellence. There is weak correlation between grades and job performance. Grant reviews the evidence in details in his New York Times piece.When recruiting research assistants, I look at grades as the last indicator. I find that imagination, ambition, initiative, curiosity, drive, are far better predictors of someone who will do useful work with me. Of course, these characteristics are themselves correlated with high grades, but there is something to be said about a student who decides that a given course is a waste of time and that he works on a side project instead. Breakthroughs don’t happen in regular scheduled classes, they happen in side projects. We want people who complete the work they were assigned, but we also need people who can reflect critically on what is genuinely important. I don’t have any need for a smart automaton: I already have many computers.I have applied the same principle with my two sons: I do not overly stress the importance of good grades, encouraging them instead to pursue their own interests and to go beyond their classes.
  2. Our hearts do not regenerate. Thus a viable strategy might to transplant brand new hearts from pigs. This is much harder than it appears, however. But progress is being made. Researchers are now able to keep baboons alive for months with transplanted pig hearts. To achieve this good result, the scientists had to use an immunosuppressant drug to prevent unwanted growth in the pig’s heart. With some luck, some of us could benefit from transplanted heart pigs in the near future.
  3. Cataract is the most common cause of blindness. It can be “cured” by removing your natural lens and replacing them with artificial lenses called IOL (intraocular lenses). This therapy was invented in the 1940s, but it took 40 years before it became widespread in wealthy countries. It is still out of reach in many countries. Yet the cost of intraocular lenses is less than 10$ and the procedure is inexpensive (it costs less than 25$ in total in some countries). Even today, in many rich countries, access to this therapy is restricted. Finally, in 2017, a government agency in UK recommended that we stop rationing access to cataract surgery.
  4. Physically fit middle-age women are much less likely to develop dementia (e.g., Alzheimer’s).
  5. You might expect that research results published in more prestigious venues would also be more reliable. Brembs (2018) suggests it works the other way around:

    an accumulating body of evidence suggests the inverse: methodological quality and, consequently, reliability of published research works in several fields may be decreasing with increasing journal rank

    My own recommendation to colleagues and students has been that if peer-reviewed publications are warranted, then it is fine to target serious well-managed venues, irrespective of their “prestige”.

    It is hard enough to do solid research, if you also have to tune it so that it outcompetes other proposals in a competition for prestige, I fear that you may discourage good research practices. Scientists care too little about modesty, it is their downfall.

  6. Lomborg, a reknown economist, writes about climate change:

    Using the best individual and collectively peer-reviewed economic models, the total cost of Paris – through slower GDP growth from higher energy costs – will reach $1-2 trillion every year from 2030. (…) It’s so expensive because green energy isn’t ready to replace fossil fuels at scale. Nations are using expensive subsidies and other policies to force immature green technologies on consumers and businesses. We need to change course. The smart option, backed by economic science, is to adopt a technology-led policy. This means investing far more into green energy research and development. Rather than forcing the rollout of immature energy sources, we need to ensure that green energy can out-compete fossil fuels.

    I really like the term “technology-led policy”. If you want to change the world for the better, then making the good things cheap using technology and science is the golden path.

  7. About 60% of all scientists never lead a research project of their own which indicate that they always play a supporting role. In fields like astronomy, ecology and robotics, half of all researchers leave the field every five years, a consequence of the fact that there are many more aspiring scientists than there are good jobs. Though this sounds bad, but one must consider that the number of scientists doubles every 15 years. Thus even though the job prospects for scientists look poor in relative terms, we must consider that we never had so many gainfully employed scientists.
  8. The state of Louisiana is adopting digital drive licenses. Meanwhile, in Montreal, I still can’t take the subway without constantly recharging a stupid card.
  9. Lack of copper might lead to heart disease. Copper is found in shiitake mushrooms, oysters, dark chocolate, sesame seeds, cashew nuts, raw kale, beans and avocados.
  10. The diabetes drug Metformin is under study as an anti-aging drug. It is believed to be very safe yet Konopka et al. suggests that it may lower the benefits due to exercise.
  11. Over time, our bodies accumulate a small fraction of “senescent cells”. It is believed that these disfunctional cells contribute to the diseases of old age. For the last few years, researchers have been looking for senolytics, drugs that can kill senescent cells. It turns out that two antibiotics approved for medical use are potent senolytics.
  12. The first autonomous vehicule (the ancestor of the self-driving car) was built in 1961.
  13. Billions of dollars have been spent on clinical trials to try to cure Alzheimer’s, all in vain. Golde et al. propose that the problem might have to do with poor timing: we need to apply the therapy at the right time. Wadmam suggests that Alzheimer’s might spread like an infection.
  14. China is introducing far reaching penalties for researchers who commit scientific fraud:

    Chinese leaders have been increasingly focused on scientific misconduct, following ongoing reports of researchers there using fraudulent data, falsifying CVs and faking peer reviews. In May, the government announced sweeping reforms to improve research integrity. One of those was the creation of a national database of misconduct cases. Inclusion on the list could disqualify researchers from future funding or research positions, and might affect their ability to get jobs outside academia. (Source: Nature)

    We need to recognize that the scientific enterprise is fundamentally on an honor-based system. It is trivial to cheat in science. You can work hard to collect data, or make it up as you go. Except for the most extreme cases, the penalty for cheating is small because there is almost always plausible deniability.