Science and Technology links (March 14th 2020)

    1. Mothers, but not fathers, possess gender-related implicit biases about emotion expression in children.
    2. Chinese researchers used to be offered cash rewards for publishing research articles. The Chinese government has banned such rewards. Increasingly, we are collectively realizing that a single-minded focus on publication numbers ensures systemic waste and failure. This is not unique to the business of research. You do not want to assess programmers by counting the number of lines of code they write, or writers by how many books they have published. The naive application of simple metrics can lead to disastrous results.
    3. A patient’s lung was removed, cleaned of cancer and put back.
    4. Many major American universities have moved classes online for the rest of the term due to the ongoing pandemic.
    5. Medical doctors keep on practicing many inefficient or harmful therapies, against all scientific evidence. For example, stents are frequently put into people at risk of cardiovascular disease, even though they are virtually useless.

Fast float parsing in practice

In our work parsing JSON documents as quickly as possible, we found that one of the most challenging problems is to parse numbers. That is, you want to take the string “1.3553e142” and convert it quickly to a double-precision floating-point number. You can use the strtod function from the standard C/C++ library, but it is quite slow. People who write fast parsers tend to roll their own number parsers (e.g., RapidJSON, sajson), and so we did. However, we sacrifice some standard compliance. You see, the floating-point standard that we all rely on (IEEE 754) has some hard-to-implement features like “round to even”. Sacrificing such fine points means that you can be off by one bit when decoding a string. In practice, this almost never matters: double-precision numbers have more accuracy than any engineering project will ever need and a difference in the last bit is irrelevant. Nevertheless, it is mildly annoying.

A better alternative in C++ might be from_chars. Unfortunately, many standard libraries have not yet caught up with the standard and they fail to support from_chars properly. One can get around this problem by using the excellent abseil library. It tends to be much faster than the venerable strtod function.
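For illustration, here is a minimal sketch of the C++17 interface; it assumes a standard library that implements from_chars for floating-point types, which is not yet universal:

#include <charconv>
#include <cstdio>
#include <cstring>
#include <system_error>

int main() {
  const char *str = "1.3553e142";
  double value = 0;
  // parse the characters [str, str + length) into a double
  auto result = std::from_chars(str, str + std::strlen(str), value);
  if (result.ec == std::errc()) {
    printf("parsed %g\n", value);
  } else {
    printf("parse error\n");
  }
}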

Unfortunately, for our use cases, even abseil’s from_chars is much too slow. It can be two or three times slower than our fast-but-imperfect number parser.

I was going to leave it be. Yet Michael Eisel kept insisting that it should be possible to both follow the standard and achieve great speed. Michael gave me an outline. I was unconvinced. And then he gave me a code sample: it changed my mind. The full idea requires a whole blog post to explain, but the gist of it is that we can attempt to compute the answer, optimistically using a fast algorithm, and fall back on something else (like the standard library) as needed. It turns out that for the kind of numbers we find in JSON documents, we can parse 99% of them using a simple approach. All we have to do is correctly detect the error cases and bail out.
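To give a flavor of the idea, here is a minimal sketch of the fast-path-with-fallback pattern. It is my own illustration, not the actual fast_double_parser code: the hypothetical fast path only accepts short integers and bails out to strtod for everything else.

#include <cctype>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical fast path: it returns false whenever it cannot guarantee a
// correctly rounded answer, in which case the caller must fall back.
static bool fast_path_parse(const char *str, double *result) {
  uint64_t value = 0;
  const char *p = str;
  if (!isdigit((unsigned char)*p)) return false;
  int digits = 0;
  while (isdigit((unsigned char)*p)) {
    value = value * 10 + (uint64_t)(*p - '0');
    p++;
    if (++digits > 15) return false; // too many digits: bail out
  }
  if (*p != '\0') return false; // fractions, exponents, signs: bail out
  *result = (double)value;      // integers this small convert exactly
  return true;
}

// Try the fast path first, fall back on the standard library as needed.
static bool parse_double(const char *str, double *result) {
  if (fast_path_parse(str, result)) return true;
  char *end = nullptr;
  *result = strtod(str, &end);
  return end != str && *end == '\0';
}

int main() {
  double x;
  if (parse_double("1.3553e142", &x)) printf("%g\n", x);
  if (parse_double("12345", &x)) printf("%g\n", x);
}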

Your results will vary, but the next table gives the speed numbers from my home iMac (2017). The source code is available along with everything necessary to test it out (Linux and macOS).

parser                     speed
fast_double_parser (new)   660 MB/s
abseil, from_chars         330 MB/s
double_conversion          250 MB/s
strtod                      70 MB/s

Science and Technology links (March 7th 2020)

  1. The benefits of flu vaccines in healthy adults are modest. They do not reduce neonatal deaths, hospitalisations, or working days lost. They do not seem more helpful in older adults.
  2. While in 1970 only 1% of the American population was severely obese, the rate of severe obesity is now 9%.
  3. Older adults who volunteer and stay active maintain a larger brain.
  4. The election of Donald Trump in 2016 led to slightly fewer baby boys in Canada. Stress tends to favor female births.
  5. The plague is still killing people in Madagascar. The disease is carried by rats.
  6. The social cost of carbon has been estimated to be as high as $60 a tonne, meaning that if we spent $60 to remove the production of a tonne of carbon, we would break even. Some of the most recent estimates are much lower: between $0.60 and $3. These new estimates take into account the benefits of CO2 such as higher plant productivity.

Will calling “free” or “delete” in C/C++ release the memory to the system?

In the C programming language, we typically manage memory manually. A typical heap allocation is a call to malloc followed by a call to free. In C++, you have more options, but the same routines are used under the hood.

// allocate N kB
char *data = (char *)malloc(N * 1024);
// do something with the memory
// ...
// release the memory
free(data);

It stands to reason that if your program just started and the value of N is large, then the call to malloc will increase memory usage by about N kilobytes. And indeed, it is the case.

So what is the memory usage of your process after the call to “free”? Did the N kilobytes return to the system?

The answer is that, in general, it is not the case. I wrote a small program under Linux that allocates N kilobytes and then frees them. It then measures the RAM usage after the call to free. The exact results will depend on your system, standard library and so on, but I give my results as an illustration.
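Here is a minimal sketch of the kind of program involved; it is an illustration rather than my exact benchmark, and it assumes a Linux system where the resident set size (VmRSS) can be read from /proc/self/status:

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Read the resident memory usage (VmRSS, in kB) from /proc/self/status.
static long resident_kb() {
  FILE *f = fopen("/proc/self/status", "r");
  if (f == nullptr) return -1;
  char line[256];
  long kb = -1;
  while (fgets(line, sizeof(line), f) != nullptr) {
    if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
  }
  fclose(f);
  return kb;
}

int main(int argc, char **argv) {
  size_t N = (argc > 1) ? (size_t)strtoull(argv[1], nullptr, 10) : 1000; // in kB
  char *data = (char *)malloc(N * 1024);
  memset(data, 1, N * 1024); // touch the memory so that it is actually mapped
  printf("after malloc: %ld kB resident\n", resident_kb());
  free(data);
  printf("after free:   %ld kB resident\n", resident_kb());
}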

As you can observe in the table, the memory does sometimes get released, but only when it is a large block of over 30 MB in my tests. It is likely because in such cases a different code path is used (e.g., calling mmap, munmap). Otherwise, the process holds on to its memory, never again releasing it to the system.

memory requested    memory usage after a free
1 kB                630 kB
100 kB              630 kB
1,000 kB            2,000 kB
10,000 kB           11,000 kB
20,000 kB           21,000 kB
30,000 kB           31,000 kB
40,000 kB           1,200 kB
50,000 kB           1,300 kB
100,000 kB          1,300 kB

Of course, there are ways to force the memory to be released to the system (e.g., malloc_trim may help), but you should not expect that it will do so by default.
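For instance, on Linux systems with the GNU C library, you can explicitly ask the allocator to hand back its free memory; this is a glibc-specific call, shown here as a mere illustration:

#include <cstdlib>
#include <malloc.h> // for malloc_trim (glibc-specific)

int main() {
  char *data = (char *)malloc(1000 * 1024);
  free(data);
  malloc_trim(0); // ask glibc to release as much free heap memory as possible
  return 0;
}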

Though I use C/C++ as a reference, the exact same effect is likely to occur in a wide range of programming languages.

What are the implications?

  • You cannot easily measure the memory usage of your data structures from the amount of memory that the process uses.
  • It is easy for a process that does not presently hold any data to appear to be using a lot of memory.

Further reading: glibc malloc inefficiency

Science and Technology links (February 29th 2020)

  1. No one really understands how planes fly. This puts a dent in the model whereby inventions follow theory.
  2. Physician salaries and diagnostic tests account for 4% of GDP in the US.
  3. A supplement (NMN) available to human beings has been used in mice: it restores fertility in old female mice.
  4. The investment firm Andreessen Horowitz is critical of artificial intelligence start-ups. It finds that they have low margins due to the cost of training models using many computers in the cloud. They may fail to scale due to the difficulty of generalizing use cases. And they are subject to commodification. That is, “artificial intelligence” as a business proposition may be far less profitable than typical “software” ventures.
  5. The most profitable businesses tend to be started by older people who have lots of experience in a specific industry.
  6. Dogs can detect heat sources from as far as 1.6 meters away with their noses.
  7. Cancer patients received genetically edited immune cells (using CRISPR). It turned out well.

Fast divisionless computation of binomial coefficients

Suppose that I give you a set of n objects and I ask you to pick k distinct objects, thus forming a new subset. How many such subsets are there? If you have taken college mathematics, you have probably learned that the answer is given by binomial coefficients. They are defined as n!/(k! * (n-k)!) where the exclamation point refers to the factorial. For a programmer, it is easier to think of binomial coefficients as the result of this loop:

def binom(n, k):
    top = 1
    bottom = 1
    for i in range(1, k + 1):
        top *= n - i + 1
        bottom *= i
    return top // bottom

Though simple enough, this algorithmic definition is not practical if you are interested in performance. Both the numerator and the denominator grow large quickly. They will soon require several machine words. A programming language like Python will happily give you the correct answer, but slowly. In Java or C, you are likely to get the wrong result due to a silent overflow.

Of course, if you know that the binomial coefficient is too large to fit in a machine word (64 bits), then you may as well go to a big-integer library. But what if you know that the result fits in a machine word? Maybe you have a reasonable bound on the size of n.

Then instead of waiting at the very end before doing the division, you can divide at each iteration in the loop:

def binom(n, k):
    answer = 1
    for i in range(1, k + 1):
        answer = answer * (n - i + 1) // i
    return answer

This new approach may still overflow even if the binomial coefficient itself fits in a machine word because we multiply before dividing. You can get around this issue by first finding a common divisor to both the multiplier and the divisor, and factoring it out. Or else, you can further restrict the values of n and k. Let us choose this last path.

We still have a problem: we need k-1 multiplications and divisions. The multiplications are relatively cheap, but the divisions have longer latencies. We would prefer to avoid divisions entirely. If we assume that k is small, then we can use the fact that we can always replace a division by a known value with a shift and a multiplication. All that is needed is that we precompute the shift and the multiplier. If there are few possible values of k, we can precompute them with little effort.

Hence, if you know that, say, n is smaller than 100 and k smaller than 10, the following function will work…

uint64_t fastbinomial(int n, int k) {
  uint64_t np = n - k;
  uint64_t answer = np + 1;
  for (uint64_t z = 2; z <= (uint64_t)k; z++) {
    answer = answer * (np + z);
    // divide exactly by z without a division instruction: z was precomputed
    // as (odd factor) * 2^shift, so we shift out the power of two and then
    // multiply by the 64-bit modular inverse of the odd factor
    fastdiv_t f = precomputed[z];
    answer >>= f.shift;
    answer *= f.inverse;
  }
  return answer;
}

I provide a full portable implementation complete with some tests. Though I use C, it should work as-is in many other programming languages. It should only take tens of CPU cycles to run. It is going to be much faster than implementations relying on divisions.

Another trick that you can put to good use is that the binomial coefficient is symmetric: you can replace k by n-k and get the same value. Thus if you can handle small values of k, you can also handle values of k that are close to n. That is, the above function will also work for n smaller than 100 and k larger than 90, if you just replace k by n-k.

Is that the fastest approach? Not at all. Because n is smaller than 100 and k smaller than 10, we can precompute (memoize) all possible values. You only need an array of 1000 values. It should fit in 8kB without any attempt at compression. And I am sure you can make it fit in 4kB with a little bit of compression effort. Still, there are instances where relying on a precomputed table of several kilobytes and keeping them in cache is inconvenient. In such cases, the divisionless function would be a good choice.
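Here is a minimal sketch of this table-based alternative, filling the table once at startup using Pascal’s rule (my own illustration, not the code from my repository):

#include <cstdint>
#include <cstdio>

// Precomputed table of C(n, k) for n < 100 and k < 10: 1000 64-bit values (8 kB).
static uint64_t table[100][10];

static void init_table() {
  for (int n = 0; n < 100; n++) {
    for (int k = 0; k < 10; k++) {
      if (k == 0) {
        table[n][k] = 1; // C(n, 0) = 1
      } else if (k > n) {
        table[n][k] = 0; // cannot pick more objects than we have
      } else {
        // Pascal's rule: C(n, k) = C(n-1, k-1) + C(n-1, k)
        table[n][k] = table[n - 1][k - 1] + table[n - 1][k];
      }
    }
  }
}

int main() {
  init_table();
  printf("%llu\n", (unsigned long long)table[99][9]); // C(99, 9)
}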

Alternatively, if you are happy with approximations, you will find floating-point implementations.

Science and Technology links (February 22nd 2020)

    1. In a large cohort study, the highest probability of reaching 90 years of age was found for those drinking between 5 g and 15 g of alcohol per day. This does not mean that if you are not drinking, you should start.
    2. The Earth is getting greener thanks to CO2. In turn, a greener Earth will mitigate global warming. (Source: Nature)
    3. In 2019, the carbon emissions in the US fell by 2.9%. They fell by 5% in the European Union. They also fell in Japan. (Source: Bailey)
    4. Robots can take blood samples and apparently do a competent job, according to a clinical trial.
    5. We may soon get 80-terabyte disk drives.
    6. The age-adjusted cancer rate in the US is currently about at the same level as it was in 1930. We are not winning the war against cancer.
    7. You are better off cutting your food on wooden planks: they are more hygienic than plastic ones.
    8. Science is undergoing what some people call the “reproducibility crisis”: many important results reported in prestigious venues cannot be reproduced by other scientists, independently. Miyakawa suggests that the reproducibility crisis might be related to the fact that studies are frequently fabricated:

      (…) more than 97% of the 41 manuscripts did not present the raw data supporting their results when requested by an editor, suggesting a possibility that the raw data did not exist from the beginning.

      A few years ago, I was on the PhD committee of a student. I questioned the results. Ultimately, we asked for the software that produced that data. The student quickly reported that the software had been lost, deleted by the University. We declined to grant the PhD despite an extensive publication record (with articles in some of the best venues). I paid a political price for my choice to fail the student. The student eventually did get a PhD after an appeal. I would not be surprised to learn that this student became a professor. The lesson is that you should always doubt scientific studies. Ask that they be independently reproduced.

My thoughts on how research funding is attributed in Canada to Computer Science

In Canada, most computer science professors seek funding with NSERC, the main Canadian funding agency for science and engineering. It is more or less the equivalent of the American NSF. The core NSERC program is called “discovery” and it funds 5-year research programs. So, roughly speaking, currently funded professors apply for funding about once every 5 years. Once funded, you do not have to pursue the program you proposed: we recognize that you cannot be expected to stupidly stay on the same course for five years in a fast moving field.

Applicants get a rating regarding the excellence of the researcher, the merit of their proposal and on how well they are training students. It is an all-around evaluation. It is quite stressful.

Not all professors are funded or seek funding. However, it is common for computer science departments to expect their new faculty to seek funding. At many places, getting funding is a necessary condition to get tenure. I would expect it to be the case at all top departments. In effect, the NSERC discovery program acts as a Canada-wide peer review process.

The grants are modest: from about 20k$ a year to 100k$. Very few people get 100k$ a year; you basically have to be a Turing award recipient. So it is not a lot of money compared to what American professors get. In most cases, all the money goes to students. Professors cannot pay themselves. So getting a grant does not increase your salary.

In computer science, applications go to a committee of 40 to 50 people. Most are from Canadian universities, but there are some people from industry and from other countries. Each application is reviewed by a team of five committee members, supervised by a chair (who is also a member of the committee) as well as at least one program officer. There are also external reviews, which are taken seriously, but they are just one element among many. The applicants must provide samples of their research; the committee members browse and discuss these papers. And there is a 5-page proposal describing the science that the applicant wants to pursue.

I just finished a term as co-president of the committee. It is a lot of work. I could probably have written five more papers these last three years without this service responsibility. Let me add that it is unpaid.

Here are my takeaways from the experience:

  1. We often hear that science is all about publishing lots and lots of papers. That is definitively not true. Once you put a bunch of professional computer scientists in a room and you tell them to judge someone… they quickly dismiss sheer volume. They seek quality. They seek actual science. They also tend to go with proxies, as in “they published in the best venue”. Yet, even there, it is not so much the volume that matters as the fact that specific journals and conferences are especially selective. Committee members are eager for explanations as to why the research is great; it is often the applicants themselves who are not forthcoming about details. If you wrote about a breakthrough in an online note or presented it at a local workshop, your peers will be happy to consider it, though you have a bit more explaining to do than if it appeared in a prestigious venue. And it is definitively possible to get the highest ratings without a pursuit of prestigious venues.
  2. We take services to the community and to society very seriously. It is not all about papers.
  3. I don’t think bibliometrics like citations get discussed much at all: again, it is not about numbers. Being highly cited can be good, but it is not the end game.
  4. It is surprising how even the most experienced researchers sometimes cannot describe competently a research proposal. Sometimes there are goals, but no means to achieve them. Sometimes the objectives are all over the map and incoherent. People will mix and match ideas in the most absurd manner.
  5. The committee is quite open to new and bold ideas. In fact, it rewards bold ideas if they are presented in a competent manner.
  6. Year after year, I have seen “old ideas” being praised when put into a solid program. Not everything good has to be about bitcoins and deep learning. That is, it is not required that you work on fashionable ideas. The committee has a lot of tolerance for unfashionable ideas.
  7. People who try to “pad” their resume to look impressive take risks. Committee members get annoyed very quickly at people gaming the system.
  8. Everyone gets assessed on the same grid. It does not matter whether you are at the best or worst university. It does not matter if you are 20 years old or 70 years old. Evidently, it does not matter whether you are a man or not. So it is a system that is hard on younger, less experienced professors. It is also hard for people from small institutions. If you are both inexperienced and from a small institution, it is especially difficult. The counterpart is that if you do well while being at a small institution, you should be especially proud of yourself.
  9. Unfortunately, as with every year, many professors will get bad news. They will have failed to get funding. This may mean that they will soon lose their job. Some people are just bad at what they do or they have bad ideas: it is a service to tell them loud and clear. But most people who fail to receive funding are not bad. In many cases, the merit of their work was clear. This is not a perfect system, nor can it be. Some people simply have not yet had the means to reach the thresholds set by the grid. Some people do work that just does not fit well with the grid. This makes me sad and slightly depressed, but there is no easy fix. In my department, we specifically do not require that new professors get a discovery grant to receive tenure. We do our own assessment.
  10. It is definitively not all about politics. I have heard rumours about people from a given school of thought trying to sink people from another school, or people trying to advocate for their own speciality. I am sure it happens. However, when it is observed, it does get some attention. Nothing is ever perfect, but we don’t let politics take over.

All in all, I feel better about my peers and about computer science after this experience at NSERC. I am generally very concerned about quality in research. There is a lot of junk out there. Yet there are also a lot of good people trying to do good work. My expectation is that the Canadian system is probably one of the best in the world because it puts quality and good science first.

 

Further reading: Ebrahim Bagheri was a fellow committee member and he wrote about his experience on Twitter.

Science and Technology links (February 8th 2020)

  1. It is often believed that radiation is bad for you. To the contrary, David et al. report that life expectancy is approximately 2.5 years longer in people living in areas with elevated background radiation in the USA.
  2. Birth order, that is whether you are the oldest or youngest sibling, is independently associated with a number of health and performance metrics. The first born is usually more fit and stable. Lin et al. argue that the main explanation might be that younger siblings are more likely to have been unwanted.
  3. The University of California has completely cancelled its subscription to research papers by Elsevier. Elsevier is one of the most important scientific publishers. It is also a for-profit publisher.
  4. Low levels of Low-density lipoprotein cholesterol (LDL-C), often qualified as “bad cholesterol”, are strongly and independently associated with increased risk of cancer, cardiovascular diseases, and all-cause mortality according to a Korean study of hundreds of thousands of human subjects. This new study puts into question mainstream beliefs about “bad cholesterol”.
  5. Education is strongly associated with better health and better longevity. However, after controlling for income and living conditions, the relationship between health and education evaporates.
  6. Harvard’s George Church has created a startup that will use an anti-aging gene therapy to increase the longevity of dogs. It is based on previous work done on mice and reported in a paper entitled A single combination gene therapy treats multiple age-related diseases (source: PNAS).
  7. As we age, 90% of us will get gray hair. It is often believed to be an irreversible process. Researchers at the University of Alabama found strong evidence that it is not the case and they believe that hair graying can be reversed. They are launching a company to develop the therapy. Note that there is documented but anecdotal evidence for gray-hair reversal, e.g., in cancer patients.

Research should not stop with the research paper

The practice of academic research is based on the production of formal documents that undergo formal review by peers. We routinely evaluate academics for jobs and promotions based on their publication output. When asked about their contribution to science, many academics are happy to point at their papers. In some cases, they will also point at the grants that they have secured.

Back when I worked at NRC, a friend of mine, Martin Brooks, gave a controversial talk entitled “user-based research”. It has been nearly 20 years, and I still remember this talk. His ideas were so upsetting that people left while he was talking. I stayed and listened. It took me years to own the ideas he expressed that day.

Martin complained that researchers mostly “threw research over the wall” expecting that other people (maybe industry) would pick up the research from the other side of the wall. While it certainly can happen that others will read your paper and apply it, you should not count on it. Nobody has to read and understand your work.

When I was a bit younger, a senior professor pointed out that some idea we were discussing had been described and tested by his team in some prestigious venue decades ago but that nobody ever did anything with it. Without thinking, I replied: it was your job to do something with it, don’t blame others. The room fell silent and there was a long pause.

I am not saying that if you find a way to cure cancer in mice, it is your job to get the therapy cleared for use with human beings and to open a hospital where the therapy is delivered. A single individual can only do so much.

What I am saying, however, is that publishing a research paper is not the goal of research. It is not the final output. It is only one element in a chain. And it is not even a requirement. You need to communicate your idea, but the peer-reviewed journal article is just one such means.

Your actual goal is “transfer”. That is, someone, somewhere, must put your ideas in practice beyond the publication of your paper. It does not mean “industrial applications” though it can be that. If your idea is worthwhile and you let it end with a research paper, you have failed.

And it does not merely mean collecting “citations” or other bibliometrics. People routinely cite papers without reading them. Few citations are influential.

But the academic incentives almost conspire to prevent impactful research. There is one specific criterion that academics like to apply that is destructive: novelty. For some piece of academic work to be judged worthwhile, it must be novel. I will say it again and again: originality is overrated.

Of course, people entering a new field tend to “rediscover known facts”. You have to push them back and tell them to go study some more. But there is a difference between naiveté and lack of originality. You have to be aware of the history of your field; that is what scholarship is all about. But you also have to stick with ideas for the long run, until the fruits appear.

Instead of rewarding novelty, we should reward scholarship: we should ask people to show that they have studied and documented the past. We should never penalize someone who works on pushing a known idea by refining it, communicating it, validating it.

This idea that some venerable professor had 20 years ago that never went anywhere? Well, it might be entirely worthwhile to revisit it and make it go somewhere, even if it is not novel at all.

More: This blog post is the subject of a live interview with me on YouTube.

Further reading: Boote, D. N., & Beile, P. (2005). Scholars before researchers: On the centrality of the dissertation literature review in research preparation. Educational researcher, 34(6), 3-15.

Science and Technology links (February 1st 2020)

  1. Almost all climate predictions are based on the so-called “business as usual” model, yet this model is based on assumptions that are unrealistically pessimistic. (Source: Nature)
  2. African populations have Neanderthal ancestry. This might be indicative that people from Europe went back to Africa over the centuries.
  3. The oceans are not acidic; their pH is about 8.0. (Source: Forbes)
  4. As we age, we tend to accumulate “senescent cells”: these are cells that should be dead but somehow linger around. We know how to kill them selectively with some success. Recent research suggests that brain neurodegeneration might be caused by senescent brain cells. Neurons themselves do not become senescent, as far as I know, but supporting cells like astrocytes do. This suggests that we may improve brain health in older people with therapies to remove senescent cells. Similarly, senescent cells might cause osteoporosis, the phenomenon whereby the bones of older people become brittle.
  5. Mitochondria are small organelles that live within our cells and produce our energy. We have now learned that mitochondria also live inside the blood stream. It is unclear what purpose they serve, if any.
  6. How frequently a couple kisses is a strong predictor of relationship satisfaction.
  7. Researchers have transplanted stem cells as “cell sheets” onto the hearts of human patients. The hope is that these sheets contribute to repairing the patients’ hearts.
  8. Most of us at some point suffer from atherosclerosis, the accumulation of plaques in our arteries. Researchers developed a nanotherapy that helps reduce atherosclerosis in mice.

Cost of a thread in C++ under Linux

Almost all our computers are made of several processing cores. Thus it can be efficient to “parallelize” expensive processing in a multicore manner. That is, instead of using a single core to do all of the work, you divide the work among multiple cores. A standard way to approach this problem is to create threads.

A C++ thread object executes some function, possibly on a thread created by the operating system, and goes away. If you wanted to increment a counter using a C++ thread, you could do it in this manner:

std::atomic<int> counter{0};
auto mythread = std::thread([&counter] { counter++; });
mythread.join();

It is obviously silly code that you should never use as is; it is a mere illustration. Creating a new thread is not free. Exactly how expensive it might be depends on several parameters. But can we get some rough idea?

For this purpose, I wrote a small benchmark where I just create a thread, increment a counter and let the thread die. I measure the time elapsed while waiting for the thread to run its course. My program computes the mean and standard error of the time, as well as the minimum and maximum duration of the test. For simplicity, I am just going to report the means:

system                           time per thread
Ampere server (Linux, ARM)       200,000 ns
Skylake server (Linux, Intel)      9,000 ns
Rome server (Linux, AMD)          20,000 ns
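For reference, a bare-bones version of this kind of benchmark might look as follows; it illustrates the methodology rather than reproducing my exact benchmarking code:

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
  std::atomic<int> counter{0};
  const int repeats = 1000;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeats; i++) {
    std::thread t([&counter] { counter++; });
    t.join(); // wait for the thread to run its course
  }
  auto stop = std::chrono::steady_clock::now();
  auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
  printf("%f ns per thread\n", double(ns) / repeats);
  return counter == repeats ? 0 : 1;
}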

I am deliberately not going into the details of the compiler, system library, operating system, RAM and all that fun stuff. You should not look at my table and make far reaching conclusions.

What is clear, however, is that creating a thread may cost thousands of CPU cycles. If you have a cheap function that requires only hundreds of cycles, it is almost surely wasteful to create a thread to execute it. The overhead alone is going to set you back.

There are clever ways to amortize the cost of a thread. For example, you may avoid the constant creation of new threads as in my benchmark. Yet to amortize, you still need to have enough work to make it worthwhile.

Science and Technology links (January 25th 2020)

  1. Scientists found a way to increase the production of new neurons in the brains of mice, effectively rejuvenating the brains of old mice. (Source: Nature)
  2. How many people died during accidents at nuclear power plants? In North America, the most famous accident happened at Three Mile Island when a reactor overheated, shut down, and a small amount of harmless radioactive gas was released. Nobody was harmed.
    In the recent Fukushima accident in Japan, about 50 people died directly as a result of the event. Many more people died following the event, but they did not die due to radiations or exposure to radioactive material. Instead, they mostly died due to the evacuation process, and most of the deaths were elderly or sick people for whom the evacuation was traumatic. It is estimated that infants in the region will see their cancer risk raised by 1% over their lifetime. In comparison, the nuclear accident in Chernobyl was more tragic: thousands of people probably died as a result. Should you conclude that nuclear energy is risky? In the seventies, a dam collapsed and killed an estimated 230,000 people. Thus hydro-electric power is far more deadly than nuclear power, when just accounting for this one incident. And, of course, coal power plants are far more deadly than either hydro or nuclear. So we are irrational with respect to our fear of nuclear power. Speaking for myself, I would gladly live and work in a nuclear power plant, though maybe not one designed by the USSR. Furthermore, we now have the technology to build small nuclear reactors that would be orders of magnitude safer, if only we allow it.
  3. Academic researchers continue to fail to report the results of clinical trials in the USA, breaking the law in the process. This can be verified since clinical trials are public events, posted on the Web with the name of the researchers. It would be trivial to identify by name the academic researchers who fail to report the results of their clinical trial. These same clinical trials are typically funded by the government. The lack of compliance has been reported for years.
  4. Japan has 70,000 centenarians.
  5. Topical application of a drug commonly used as an immunodepressor (Rapamycin) appears to rejuvenate the skin of older people.
  6. Researchers were able to regenerate the optic nerve of mice after crushing it. It is believed to be an indication that we could reverse age-related vision loss due to glaucoma or other diseases.
  7. Exercise keeps you healthier, but how the benefits work is a bit unclear. Researchers found that by genetically removing a family of proteins (Sestrins) from flies and mice, they were able to negate the benefits of exercise. In contrast, increasing the production of Sestrins appears to emulate the benefits of exercise. (Source: Nature)
  8. Researchers claim to have found a new form of immunotherapy that would be effective against all cancers. However, it is at an early stage and we are not even considering clinical trials at this time.

Filling large arrays with zeroes quickly in C++

Travis Downs reports that some C++ compilers have trouble filling up arrays with values at high speed. Typically, to fill an array with some value, C++ programmers invoke std::fill. We might assume that for a task as simple as filling an array with zeroes, the C++ standard library would provide the absolute best performance. However, with GNU GCC compilers, that is not the case.

The following line of C++ code gets compiled as a loop that fills each individual byte with zeroes when applying the conventional -O2 optimization flag.

std::fill(p, p + n, 0);

When the array is large, it can be inefficient compared to a highly optimized implementation like the C function memset.

memset(p, 0, n);

Travis reports that the difference is a factor of thirty. He also explains how to fix the C++ so that it is as fast as the memset function.
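The gist of the fix, as I understand it, is to make the type of the fill value match the element type so that the library can dispatch to its memset-like specialization; a hedged sketch:

#include <algorithm>
#include <cstddef>

void fill_zeroes(char *p, size_t n) {
  // passing a char value (rather than the int literal 0) lets the
  // standard library recognize a byte-wise fill and use a memset-like path
  std::fill(p, p + n, char(0));
}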

What about very large arrays? What if you need to write zeroes all over a 2GB array?

memset 30 GB/s
std::fill 1.7 GB/s

Roughly speaking, the memset function is 15 times faster than std::fill in my test. This demonstrates that even if you are working with large volumes of data, you can be easily limited by your inefficient implementations. You are not automatically limited by your memory bandwidth. In fact, std::fill in this instance cannot even keep up with a good network adaptor or a fast disk. We routinely parse JSON data, and write out DOM trees, at speeds far higher than 1.7 GB/s.

Another lesson I would derive from this example is that writing your code in idiomatic C++ is not sufficient to guarantee good performance. There is a lot of engineering involved in making C++ fast, and though it often works out magically, it can fail too.

My source code is available.

Allocating large blocks of memory: bare-metal C++ speeds

In a previous post, I benchmarked the allocation of large blocks of memory using idiomatic C++. I got a depressing result: the speed could be lower than 2 GB/s. For comparison, the disk in my laptop has greater bandwidth.

Methodologically, I benchmarked the “new” operator in C++ with initialization, using the GNU GCC compiler with the -O2 optimization flag [1].

char *buf = new char[s]();

It turns out that you can do better while sticking with C++. We cannot simply call the new operator without initialization because, in general, it does not result in the memory being actually allocated. However, we can allocate the memory and then make sure that we touch every “page” of memory. On modern Intel systems, pages are effectively always at least as large as 4kB, so we can touch the memory every 4kB. We might call this approach “new and touch”.

char * buf = new char[size];
for (size_t i = 0; i < size; i += 4096) buf[i] = 0;
buf[size - 1] = 0;
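To measure the speed of the new-and-touch strategy, one can time it with std::chrono; the following is a simplified illustration rather than my actual benchmark:

#include <chrono>
#include <cstddef>
#include <cstdio>

int main() {
  const size_t size = 512 * 1024 * 1024; // 512 MB
  auto start = std::chrono::steady_clock::now();
  char *buf = new char[size];
  // touch one byte per 4 kB page so the pages are actually allocated
  for (size_t i = 0; i < size; i += 4096) buf[i] = 0;
  buf[size - 1] = 0;
  auto stop = std::chrono::steady_clock::now();
  double seconds = std::chrono::duration<double>(stop - start).count();
  printf("%.1f GB/s\n", (size / 1e9) / seconds);
  delete[] buf;
}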

Such a new-and-touch strategy should be close to “bare-metal” memory allocation speeds. So how fast is it? It depends on the page size. By default, most systems rely on small (4kB) pages. Allocating many small pages is expensive. Thankfully, Linux can be configured to use huge pages, transparently, via a feature called “transparent huge pages”. And it makes a huge difference!

                          Allocating 512MB    Setting 512MB to zero
regular pages (4kB)       3.9 GB/s            30 GB/s
transparent huge pages    20 GB/s             30 GB/s

I am using a recent Linux system (Ubuntu 16.04), a Skylake processor and GNU GCC 8.3 with the -O2 optimization flag. My source code is available.

It is still the case that allocating memory on most systems is a non-trivial cost since they rely on small 4kB pages. There are fast disks available on the market that have more than 4 GB/s of bandwidth.

Credit: Thanks to Travis Downs and others for their insights and comments.

 

[1] Downs found that we get far better performance out of the new operator with initialization under GNU GCC with the more aggressive -O3 optimization flag. Performance-wise, it should be close to the “new and touch” approach that I am describing.

How fast can you allocate a large block of memory in C++?

In C++, the most basic memory allocation code is just a call to the new operator:

char *buf = new char[s];

According to a textbook interpretation, we just allocated s bytes [1].

If you benchmark this line of code, you might find that it is almost entirely free on a per-byte basis for large values of s. But that is because we are cheating: the call to the new operator “virtually” allocates the memory, but you may not yet have actual memory that you can use. As you access the memory buffer, the system may then decide to allocate the memory pages (often in blocks of 4096 bytes). Thus the cost of memory allocation can be hidden. The great thing with a virtual allocation is that if you never access the memory, you may never pay a price for it.

If you actually want to measure the memory allocation in C++, then you need to ask the system to give you s bytes of allocated and initialized memory. You can achieve the desired result in C++ by adding parentheses after the call to the new operator:

char *buf = new char[s]();

Then the operating system actually needs to allocate and initialize the memory [2]. It may still cheat in the sense that it may recycle existing blocks of memory or otherwise delay allocation. And I expect that it might do so routinely if the value of s is small. But it gets harder for the system to cheat as s grows larger.

What happens if you allocate hundreds of megabytes in such a manner? The answer depends on the size of the pages. By default, your system probably uses small (4kB) pages. Under Linux, you can enable “transparent huge pages” which dynamically switches to large pages when large blocks of memory are needed. Using larger pages means having to allocate and access fewer pages, so it tends to be cheaper.

In both instances, I get around a couple of gigabytes per second on a recent Linux system (Ubuntu 16.04) running a conventional Skylake processor. For comparison, you can set memory to zero at tens of gigabytes per second and my disk can feed data to the system at more than 2 GB/s. Thus, at least on the system I am currently using, memory allocation is not cheap. My code is available; I use GNU GCC 8.3 with the -O2 optimization flag.

                          Allocating 512MB    Setting 512MB to zero
regular pages (4kB)       1.6 GB/s            30 GB/s
transparent huge pages    2.4 GB/s            30 GB/s

You can do better with different C++ code, see my follow-up post Allocating large blocks of memory: bare-metal C++ speeds.

Further remarks. Of course, you can reuse the allocated memory for greater speed. The memory allocator in my standard library could possibly do this already when I call the new operator followed by the delete operator in a loop. However, you still need to allocate the memory at some point, if only at the beginning of your program. If your program needs to allocate 32 GB of memory, and you can only do so at 1.4 GB/s, then your program will need to spend 23 seconds on memory allocation alone.

Footnotes:

  1. Several readers have asked why I am ignoring C functions like calloc, malloc and mmap. The reason is simple: this post is focusing on idiomatic C++.
  2. You might wonder why the memory needs to be initialized. The difficulty has to do with security. Though it is certainly possible for the operating system to give you direct access to memory previously used by another process, it cannot also pass along the data. Thus it needs to erase the data. To my knowledge, most systems achieve this result by zeroing the new memory.

Science and Technology links (January 11th 2020)

  1. The extra wealth that one acquires by attending college is now estimated to be indistinguishable from zero. The authors control for the parents’ education and the individual’s financial acumen.
  2. A drug approved for use in human beings (Rapamycin) appears to rejuvenate dental health in mice.
  3. Low-fat milk increases the obesity risk of children when compared with high-fat milk.
  4. It appears that type 2 diabetes might be caused by fat. Keeping your abdominal fat in check might be the best way to prevent diabetes (speculative?).

How I teach database design

Most software runs on top of databases. These databases are organized logically, with a schema, that is, a formal description. You have entities (your user), attributes (your user’s name) and relationships between them.

Typical textbook database design comes from an era when it was possible to believe that you knew, at the start of the project, what the data would be like. And you could believe that it would not change. You could certainly believe that your 200-page design document would be updated and maintained, and it would be compulsory reading for computer folks in your company for decades. When building an accounting application, you would query the databases using SQL. All of the data was accessed through the databases. The database was the interface. And the same set of centralized databases running on a single computer would serve the entire operation. And everyone would know what the field “name” meant, exactly, in the table “user”. There would not be mergers between companies, or corporate pivots taking you into a new direction. You could believe that your conceptual and logical principles were independent from physical implementation principles. There would be no need to ever duplicate data for performance.

Quite often, it is not how systems are built today, if they ever were built this way. You organize functions in your organization in terms of services. And the services communicate between themselves. People who do not have to work directly on your service do not need to know how the data is organized or even if the organization is publicly documented at all. You have to provide them an interface, and that is all that they should need. On your end, you cannot rely on the database engine alone to ensure that the data remains usable. “But my data follows a strict schema” is not an excuse for failure. Your users do not want to know about your data layout, they do not want to have to know.

When organizing your data, you will probably get it wrong. If not today, then next week or next year. Life is fast changing and your knowledge is always incomplete. The waterfall model is naive to the point of being actively harmful. So you want to isolate the data layout from its use as much as you can.

In textbooks, the core principle of database design is often “do not duplicate data”. The logic is sound: if you duplicate data, at some point these values may go out of sync and what do you do then? If I see a programmer duplicating data all over his code, I know he lacks experience. However, it is also the case that deliberate duplication is essential. Sometimes you need duplication for performance and performance matters. Sometimes you need duplication to avoid unwieldy complexity, and managing complexity matters. And you need duplication because your information system runs on more than one processor.

You want to store everything in one giant system so that there is no duplication and everything is consistent? This thinking does not scale. It does not scale with respect to complexity or performance. And it does not even achieve consistency in the real world.

The data you will be receiving and maintaining is not pristine. In fact, it is full of errors and unknowns. Marketing thinks that John Smith’s phone number is one thing whereas accounting thinks that it is something else? You know what? Maybe they are both right or maybe they are both wrong. And it is worse than you think because there are unknown unknowns: nothing in your database can be regarded without suspicion. And the minute you have incomplete or uncertain data, your dream of enforcing the relationships between entities may just become naive.

Does that mean that anything goes and that you don’t need to worry about database design? Quite the opposite: you ought to be agonizing over every decision and making sure that you never lock yourself into a corner. Because you will have duplication and multiple systems, you will have inconsistencies and you will need to deal with them, and a single strategy won’t work for all cases.

So how do I teach database design? I ask students to be critical about the textbook material, and I expose them as much as possible to real-world examples where things are not perfectly neat. Begin by looking under the hood of your favorite open-source project.

Further reading: On Desirable Semantics of Functional Dependencies over Databases with Incomplete Information, Fundamenta Informaticae 158 (2018); Functional dependencies with null markers, Computer Journal 58 (2015); A Call to Arms: Revisiting Database Design, SIGMOD Record 40 (2011).

My Science and Technology review for 2019

I like to end every year with my selection of the most significant science and technology events.

  1. In 2019, you could buy a computer from Apple with 1.5 terabytes of memory. And by memory, I mean fast internal memory (RAM). Of course, it would cost tens of thousands of dollars. You and I are unlikely to have that much memory on our computers in the near future. Yet we are all benefiting from faster disks, making the distinction between disk and memory harder. (Note that it has been possible to buy computers with a terabyte of memory and more from other vendors, but not typically in a mainstream desktop form.)
  2. A team led by Conboy showed that we could rejuvenate multiple organs in old mice merely by giving them Alk5i and oxytocin. Quoting from the paper:

    Alk5i plus OT quickly and robustly enhanced neurogenesis, reduced neuro-inflammation, improved cognitive performance, and rejuvenated livers and muscle in old mice.

    You can purchase oxytocin on the Internet and many people take it already to improve their sleep.

  3. We can tell how old someone is by looking at “epigenetic markers”. That is, from one of your cells, we can look at how your DNA is arranged and expressed, and it tells us your age. In some sense, it defines your “biological age” as opposed to your “chronological age”. As far as we could tell, there was no way to intervene and reverse this age. However, in a study reported in Nature, researchers found such age reversal in a small clinical trial. This age reversal came about accidentally when trying to regrow the “thymus” in older people. The thymus naturally disappears with age, leaving older people with weakened immune systems. The small clinical trial also showed that the thymus was regrown, something that is unexpected.
  4. Apple released a smart watch with an integrated ECG, to monitor your heart conditions.
  5. As we age, we accumulate “senescent cells”. These are old cells that create trouble and refuse to die. We now know how to wipe them out in mice, using Dasatinib and Quercetin. It turns out that the exact same approach works in human beings. At least in mice, clearing senescent cells leads to all sorts of health improvements.
  6. A drone attack wiped out 50% of Saudi Arabia’s oil supply for several days. This is a huge impact given that Saudi Arabia provides about 10% of the oil supply worldwide. It is a testimony of how good drone technology has become.
  7. Surgeons are putting patients in suspended animation (they stopped their biological functions) during surgery, replacing their blood with a cold saline solution.

Science and Technology links (December 21st 2019)

  1. The number of research papers with more than 1000 authors is increasing quickly and reaching many fields.
  2. Researchers at Facebook use neural networks to solve fancy algebraic problems that may be out of reach of computer algebra systems.
  3. At least in the short term, wind turbines may contribute more to global warming than equivalent fossil fuel technologies. The warming effect has been empirically measured.
  4. The Bengal tiger population is increasing quickly in India.