Science and Technology links (February 16th, 2019)

    1. In their new book Empty Planet, Bricker and Ibbitson argue that within the next 30 years, Earth’s population will start to rapidly decline. They believe that official population predictions overestimate future populations because they fail to fully take into account accelerating cultural changes.
    2. It is believed that senescent cells are a major driver of age-related conditions. Cells often become senescent after too many divisions (beyond the Hayflick limit). Our hearts age, but heart cells do not divide very much. That is a problem because it limits the heart’s ability to repair itself (by creating new cells), but it should also protect the heart from senescent cells… Yet Anderson et al. found that there are senescent cells in the heart: basically, cells can also become senescent due to damage. What is more exciting is that they found that by clearing these cells in old mice, they could effectively rejuvenate their hearts. There is a growing number of therapies for removing senescent cells, and there are ongoing (early) clinical trials to measure the effect of removing senescent cells in human beings. Initial results are encouraging:

      The doctors found that nine doses of the two pills over three weeks did seem to improve patients’ ability to walk a bit farther in the same amount of time, and several other measures of well-being.

      More trials will start this year.

    3. Goldacre et al. looked at how well the most prestigious journals handle the agreed-upon set of standards for reporting scientific trials:

      All five journals were listed as endorsing CONSORT, but all exhibited extensive breaches of this guidance, and most rejected correction letters documenting shortcomings. Readers are likely to be misled by this discrepancy.

      (Source: A. Badia)

    4. A new drug appears to reverse age-related memory loss, in mice.

My iPad Pro experiment: almost two years later

Soon after the first iPad Pro came out, I bought one and started using it daily. The experiment is ongoing and I thought it was time to reflect upon it further.

Before I begin, I need to clarify my goals. When I started this experiment, some people objected that I could get a hybrid (laptop/tablet). That is definitely true, and it would be more “practical”. However, it would not be much of an experiment. I am deliberately trying to push the envelope, to do something that few do. So I am not trying to be “practical”.

And, indeed, using an iPad Pro for work is still an oddity. Relying solely on an iPad Pro for serious work is even rarer. I am currently in Ottawa reviewing grant applications. There are a few dozen computer-science researchers (mostly professors) around me. The only other person with an iPad is a Facebook researcher, and he seems to use it only to read the applications; otherwise, he appears to work on a laptop.

In my department, other faculty members have iPad Pros, but I think only one of my colleagues uses one seriously. The others who have them do not appear to use them for work, though I am not sure.

  1. The main impact of relying mostly on a tablet is that I always focus on one or two applications at a time. I recall finding it really cool, back in the day, when a Unix box would let me have 50 windows open at once. I now think that having many windows open is akin to having many different physical files open on your desk. It is distracting. For example, on a laptop, I would write this blog post with an email window open and probably a couple of text editors holding code. Yes, you can work in full-screen mode on a laptop, and I try to do it, but I unavoidably revert to having dozens of applications on my screen. Laptops just make it too convenient to do multiple things at once. If you need to concentrate on one thing for a long time, you really want a single clean window, and a tablet is great at that. On this note, it is also why I prefer to program in a text editor that has as few distractions as possible. I can write code just fine in Eclipse or Visual Studio, and for some tasks it is really the best setup, but it often leaves me distracted compared to working in a single full-screen editor with just one file open.
  2. Though I cannot prove it, I feel that using a tablet makes me a better “reader”. Much of my work as a university professor and researcher involves reading and commenting on what other people are doing. The fact that I am enticed to concentrate on one document, one task, at a time makes me more thorough, I think.
  3. As far as I can tell, programming seriously on a tablet like an iPad Pro is still not practical. However, there are decent ssh clients (I use Shelly), so if you master Unix tools like vim, emacs, make, and the like, you can get some work done.
  4. I’d really like to push the experiment to the point where I no longer use a keyboard, but that’s not possible at this time. I like the keyboard that Apple sells for the 2018 iPad Pro. It has a major upside: the keys are entirely covered, so the keyboard is not going to stop working because you spilled some coffee on it.
  5. Generally, most web applications work on a tablet, as you would expect. However, it is quite obvious that some of them were not properly tested. For example, I write research papers using a tool called Overleaf, but I cannot make its keyboard shortcuts work. At the same time, it is really surprising how few problems I have. I think that the most common issues could be quickly fixed if web developers did a bit more testing on mobile devices. Evidently, things work better on laptops and desktops because that is what developers rely on.
  6. At least on Apple’s iOS, working with text is still unacceptably difficult at times. Pasting text without the accompanying formatting is a major challenge. Selecting large blocks of text is too hard.

My final point is that working with an iPad is more fun than working with a laptop. I cannot tell exactly why that is. I’d be really interested in exploring this “fun” angle further. Maybe it is simply because it is different, but it may not be so simple: my smartphone is “fun” even though it is old and familiar.

Science and Technology links (February 9th, 2019)

  1. Though deep learning has proven remarkably capable at tasks like image classification, it is possible that the problems it solves so well are just simpler than we think:

    At its core our work shows that [neural networks] use the many weak statistical regularities present in natural images for classification and don’t make the jump towards object-level integration of image parts like humans.

    This challenges the view that deep learning is going to bring us much closer to human-level intelligence in the near future.

  2. Though we age, it is unclear how our bodies keep track of the time (assuming they do). Researchers claim that our blood cells could act as time keepers. When you transplant organs from a donor, they typically behave according to the age of the recipient. However, blood cells are an exception: they keep the same age as the donor. What would happen if we were to replace all blood cells in your body with younger or older ones?
  3. A tenth of all coal is used to make steel. This suggests that it might be harder than people expect to close coal mines and do away with fossil fuels entirely in the short or medium term.
  4. Elite powerlifters have surprisingly low testosterone (male hormone) levels. This puts a dent in the theory that strong men have high testosterone levels.
  5. Chimpanzees learn to crack nuts faster than human beings. This challenges the model that human beings are cognitively superior.
  6. It seems that the male brain ages more rapidly than the female brain.
  7. Grant argues that vitamin D supplements reduce cancer rates, but that medicine is slow to accept it.
  8. Women prefer more masculine looking men in richer countries. I do not have any intuition as to why this might be.
  9. Geographers claim that the arrival of Europeans in America, and the subsequent reduction in population (due mostly to disease), led to a cooling of worldwide temperatures. It seems highly speculative to me that there was any measurable effect.
  10. The New York Times has a piece on a billionaire called Brutoco who says that “he spends much of his time flying around the world lecturing on climate change” and who lives in a gorgeous villa surrounded by a golf course. There is no mention of his personal carbon footprint.

Faster remainders when the divisor is a constant: beating compilers and libdivide

Not all instructions on modern processors cost the same. Additions and subtractions are cheaper than multiplications, which are themselves cheaper than divisions. For this reason, compilers frequently replace division instructions with multiplications. Roughly speaking, it works in this manner. Suppose that you want to divide a variable n by a constant d. You have that n/d = n * (2^N/d) / 2^N. The division by a power of two (/ 2^N) can be implemented as a right shift if we are working with unsigned integers, which compiles to a single instruction: that is possible because the underlying hardware works in base 2. Thus if 2^N/d has been precomputed, you can compute the division n/d as a multiplication and a shift. Of course, if d is not a power of two, 2^N/d cannot be represented as an integer. Yet for N large enough (see the footnote at the end of this post), we can approximate 2^N/d by an integer and still compute n/d exactly for all possible n within a range. I believe that all optimizing C/C++ compilers know how to pull this trick, and it is generally beneficial irrespective of the processor’s architecture.
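
To make this concrete, here is a small sketch of the trick for d = 3 and 32-bit unsigned integers, using the well-known constant 0xAAAAAAAB = ceil(2^33/3). Actual compiler output varies, but the idea is the same:

#include <stdint.h>

// divide a 32-bit unsigned integer by 3 with a multiplication and a shift:
// (n * ceil(2^33 / 3)) >> 33 equals n / 3 for every 32-bit n
uint32_t div3(uint32_t n) {
  return (uint32_t)(((uint64_t)n * UINT64_C(0xAAAAAAAB)) >> 33);
}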

The idea is not novel and goes back to at least 1973 (Jacobsohn). However, engineering matters, because computer registers have a finite number of bits and multiplications can overflow. I believe that, historically, this was first introduced into a major compiler (the GNU GCC compiler) by Granlund and Montgomery (1994). While GNU GCC and the Go compiler still rely on the approach developed by Granlund and Montgomery, other compilers like LLVM’s clang use a slightly improved version described by Warren in his book Hacker’s Delight.

What if d is a constant, but not known to the compiler? Then you can use a library like libdivide. In some instances, libdivide can even be more efficient than compilers because it uses an approach introduced by Robison (2005) where we not only use multiplications and shifts, but also an addition to avoid arithmetic overflows.

Can we do better? It turns out that in some instances, we can beat both the compilers and a library like libdivide.

Everything I have described so far has to do with the computation of the quotient (n/d), but quite often we are looking for the remainder (written n % d). How do compilers compute the remainder? They first compute the quotient n/d, then multiply it by the divisor, and subtract the result from the original value (using the identity n % d = n - (n/d) * d).
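
In code, this two-step path looks roughly like the following (reusing the d = 3 constant from the sketch above; again an illustration, not the exact code any particular compiler emits):

// remainder the compiler way: compute the quotient first, then multiply and subtract
uint32_t mod3(uint32_t n) {
  uint32_t quotient = (uint32_t)(((uint64_t)n * UINT64_C(0xAAAAAAAB)) >> 33); // n / 3
  return n - quotient * 3; // n % 3 = n - (n / 3) * 3
}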

Can we take a more direct route? We can.

Let us go back to the intuitive formula n/d = n * (2^N/d) / 2^N. Notice how we compute the multiplication and then drop the least significant N bits? It turns out that if, instead, we keep these least significant bits and multiply them by the divisor, we get the remainder directly, without first computing the quotient.

The intuition is as follows. To divide by four, you might choose to multiply by 0.25 instead. Take 5 * 0.25: you get 1.25. The integer part (1) gives you the quotient, and the decimal part (0.25) is indicative of the remainder: multiply 0.25 by 4 and you get 1, which is the remainder. Not only is this more direct and potentially useful in itself, it also gives us a quick way to check whether the remainder is zero. That is, it gives us a way to check that one integer is divisible by another: compute x * 0.25; the decimal part is less than 0.25 if and only if x is a multiple of 4.

This approach was known to Jacobsohn in 1973, but as far as I can tell, he did not derive the mathematics. Vowels in 1994 worked it out for the case where the divisor is 10, but (to my knowledge), nobody worked out the general case. It has now been worked out in a paper to appear in Software: Practice and Experience called Faster Remainder by Direct Computation.

In concrete terms, here is the C code to compute the remainder of the division by some fixed divisor d:

uint32_t d = ...; // your divisor, d > 0

// c is (roughly) 2^64 / d, rounded up
uint64_t c = UINT64_C(0xFFFFFFFFFFFFFFFF) / d + 1;

// fastmod computes (n mod d) given the precomputed c
uint32_t fastmod(uint32_t n) {
  uint64_t lowbits = c * n; // keep only the least significant 64 bits of c * n
  return ((__uint128_t)lowbits * d) >> 64;
}
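
As a quick sanity check (my own toy test, not one of the paper’s benchmarks), here is a self-contained program that compares fastmod with the % operator for the arbitrarily chosen divisor d = 7. It requires a compiler with __uint128_t support, such as GCC or clang:

#include <stdint.h>
#include <stdio.h>

static uint32_t d = 7; // arbitrary divisor for the test
static uint64_t c;     // precomputed constant, set in main

static uint32_t fastmod(uint32_t n) {
  uint64_t lowbits = c * n;
  return ((__uint128_t)lowbits * d) >> 64;
}

int main(void) {
  c = UINT64_C(0xFFFFFFFFFFFFFFFF) / d + 1;
  for (uint32_t n = 0; n < 100000; n++) {
    if (fastmod(n) != n % d) {
      printf("mismatch at n = %u\n", n);
      return 1;
    }
  }
  printf("fastmod agrees with %% for d = 7 on [0, 100000)\n");
  return 0;
}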

The divisibility test is similar…

uint64_t c = 1 + UINT64_C(0xffffffffffffffff) / d;


// given precomputed c, checks whether n % d == 0
bool is_divisible(uint32_t n) {
  return n * c <= c - 1; 
}

To test it out, we did many things, but in one particular test, we used a hashing function that depends on the computation of the remainder. We vary the divisor and compute many random values. In one instance, we make sure that the compiler cannot assume that the divisor is known (so that the division instruction is used); in another, we let the compiler do its work; and finally, we plug in our function. On a recent Intel processor (Skylake), we beat state-of-the-art compilers (e.g., LLVM’s clang, GNU GCC).

The computation of the remainder is nice, but I like the divisibility test even better. Compilers generally don’t optimize divisibility tests very well. A line of code like (n % d) == 0 is typically compiled to the computation of the remainder (n % d) followed by a test to see whether it is zero. Granlund and Montgomery have a better approach if d is known ahead of time; it involves computing the inverse of an odd integer using Newton’s method. Our approach is simpler and faster (on all tested platforms) in our tests. It is a multiplication by a constant followed by a comparison of the result with said constant: it does not get much cheaper than that. It seems that compilers could easily apply such an approach.

We packaged the functions as part of a header-only library which works with all major C/C++ compilers (GNU GCC, LLVM’s clang, Visual Studio). We also published our benchmarks for research purposes.

I feel that the paper is short and to the point. There is some mathematics, but we worked hard so that it is as easy to understand as possible. And don’t skip the introduction! It tells a nice story.

The paper contains carefully crafted benchmarks, but I came up with a fun one for this blog post which I call “fizzbuzz”. Let us go through all integers in sequence and count how many are divisible by 3 and how many are divisible by 5. There are far more efficient ways to do that, but here is the programming 101 approach in C:

  for (uint32_t i = 0; i < N; i++) {
    if ((i % 3) == 0)
      count3 += 1;
    if ((i % 5) == 0)
      count5 += 1;
  }

Here is the version with our approach:

static inline bool is_divisible(uint32_t n, uint64_t M) {
  return n * M <= M - 1;
}

...


  uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1;
  uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1;
  for (uint32_t i = 0; i < N; i++) {
    if (is_divisible(i, M3))
      count3 += 1;
    if (is_divisible(i, M5))
      count5 += 1;
  }

Here is the number of CPU cycles spent on each integer checked (average):

Compiler: 4.5 cycles per integer
Fast approach: 1.9 cycles per integer

I make my benchmarking code available. For this test, I am using an Intel (Skylake) processor and GCC 8.1.

Your results will vary. Our proposed approach may not always be faster. However, we can claim that, some of the time, it is advantageous.

Update: There is a Go library implementing this technique.

Further reading: Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries, Software: Practice and Experience (to appear)

Footnote: What is N? If both the numerator n and the divisor d are 32-bit unsigned integers, then you can pick N=64. This is not the smallest possible value. The smallest possible value is given by Algorithm 2 in our paper, and it involves a bit of mathematics (note: the notation in this blog post differs from the paper’s; N here corresponds to F there).

Science and Technology links (February 3rd, 2019)

  1. A Canadian startup built around electric taxis failed. One of their core findings is that electric cars must be recharged several times a day, especially during the winter months. Evidently, the need to constantly recharge the cars increases the costs. I think that this need to constantly recharge cars challenges the view that once we have electric self-driving cars, we can just send our cars roaming the streets, looking for new passengers, at least in the cold of winter in Canada.
  2. If you are going to train computers to emulate medical doctors, you need objective assessments of medical conditions. That may prove more difficult than it sounds. Human experts rarely agree on diagnoses and therapies. How can you assess an artificial intelligence under these conditions? Somewhat ironically, the first step might be to learn to better assess the medical doctors themselves. This may not prove popular.
  3. Researchers are making progress toward reconstructing speech from neural patterns (Nature paper). The accuracy is still low but it is getting credible. One day, we may be able to speak through a brain implant.
  4. The US spy agency (the NSA) is said to be the largest single employer of mathematicians in the world.
  5. Neanderthals are often believed to have vanished because they could not hunt as efficiently as Homo sapiens: unlike us, they needed to get close to their prey to kill it. However, it seems that Neanderthals could throw their spears a long way.
  6. A highly cited Canadian medical researcher (Sophie Jamal) has been banned from receiving further research funding in Canada because she fabricated data. When caught, she put the blame on her assistant. She is cited about 1,000 times a year and is the author of about 50 research articles. She lost her job as a professor at the University of Toronto as well as her medical license. She was the research director of the Centre for Osteoporosis & Bone Health.
    As I am fond of saying, it is almost trivial in research to fabricate results. Thus, while it is hard to know for sure how frequently results are just made up, it is probably more frequent than most people expect. And before you object that work is peer reviewed: when reviewing a manuscript, you are not going to redo the work to check that it works. Even if you wanted to check the work, it is often impossible to do it in an economical fashion. That’s why I argue that we have to take into account the reputation of the authors when reviewing a science paper. If you have found someone’s results to consistently be reliable in the past, it is reasonable to give them more credibility in the future. Reputation matters.
  7. According to an article in the Guardian, aspirin prevents cancers. (To my knowledge, this has not been robustly demonstrated yet.)
  8. Party balloons were invented by the scientist Michael Faraday.

New Web host

Following my recent blogging problems, the majority advice I received was to move to a different host.

My blog is now hosted by SiteGround. So far, moving my content over has been far easier than I anticipated.

I am no longer using Cloudflare to cache everything, so your queries should hit my blog engine directly. In particular, this means that comments should work properly (comment and then see your comment appear). I will still rely on Cloudflare for performance, but hopefully not to keep the blog alive.

You should also be able to comment on all posts, including older ones.

Please report any problems. Going straight to production without further testing would be insane if I were running a business, but this is a non-profit blog.

Several people offered to help out with the move. Given that it took me less than one hour, it made no sense to outsource that task. I spent the bulk of the time reverse engineering tiny settings. I also have several minor domains that I needed to move, and finding the content was the hard part.

However, if I have further performance problems, I will seek help (either paid or unpaid). Thanks everyone!

Web caching: what is the right time-to-live for cached pages?

I have been having performance problems with my blog and this forced me to spend time digging into the issue. Some friends of mine advocate that I should just “pay someone” and they are no doubt right that it would be the economical and strategic choice. Sadly, though I am eager to pay for experts, I am also a nerd and I like to find out why things fail.

My blog uses something called WordPress. It is a popular blog platform written in PHP. WordPress is fast enough for small sites, but the minute your site gets some serious traffic, it tends to struggle. To speed things up, I tried using a WordPress plugin called WP Super Cache. What the plugin does is to materialize pages as precomputed HTML. It should make sites super fast.

There is a caveat to such plugins: by the time your blog is under so much stress that PHP scripts can’t run without crashing, no plugin is likely to save you.

I also use an external service called Cloudflare. Cloudflare acts as a distinct cache, possibly serving pre-rendered pages to people worldwide. Cloudflare is what is keeping my blog alive right now.

After I reported that by default (without forceful rules) Cloudflare did very little caching, a Cloudflare engineer got in touch. He told me that my pages were served to Cloudflare with a time-to-live of 3 seconds. That is, my server instructs Cloudflare to throw away cached pages after three seconds.

I traced the problem back to an .htaccess file on my server:

<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresByType text/html A3
</IfModule>

The mysterious “A3” means “expires 3 seconds after access”.

How did this instruction get there? It is written by WP Super Cache: I checked.

I work on software performance, but I am not an expert on Web performance. However, this feels like a very short time-to-live.
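
For comparison, a one-hour time-to-live with mod_expires would look like the following (3600 seconds is just an illustrative value on my part, not a recommendation from the WP Super Cache authors):

<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresByType text/html A3600
</IfModule>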

I am puzzled by this decision by the authors of WP Super Cache.

My blog can’t keep up: 500 errors all over

My blog is a relatively minor enterprise. It is strictly non-profit (no ads). I have been posting one or two blog posts a week for about fifteen years. I have been using the same provider (csoft.net) all this time. They charge me about $50 a month. I also subscribe to Cloudflare services, which costs me some extra money.

I use WordPress. If I had to do things over, I would probably choose something else, but that is what I have today. I have thousands of posts, comments, pages, and lots of personalization: I’d rather not risk breaking or losing all this content.

I use PHP version 7.0. My host provides version 7.3, but 7.0 is the latest version they support with something called mbstring. Without mbstring (whatever that is), my blog simply won’t run.

I estimate that I get somewhere between 30,000 and 50,000 unique visitors a month. Despite my efforts, my blog keeps on failing under the load. It becomes unavailable for hours.

I have given up on writing new blog posts using the online editor. It is brittle. The old WordPress editor worked relatively well, but since upgrading WordPress, they have now pushed something called the Gutenberg editor. It tries to be clever, but half the time it just fails with a 500 error (meaning that the server failed). So I use a client called MarsEdit. It seems to work well enough. (Update: after the blog stopped throwing 500 errors constantly, I was able to switch back to the default editor, which works better for me.)

Several times a week, someone emails me to report that they tried to leave a comment and they got a 500 error.

I seem to get about one spam comment per minute. I have just now decided to close comments on posts older than 30 days, in the hope that it will reduce the load on my server.

My error logs are filled with “End of script output before headers” (a few every minute).

I used to rely on WP Super Cache, hoping that it made things better, but I have since disabled most plugins. I am hoping Cloudflare can do the work. (Update: I have since re-enabled WP Super Cache, now that I have fewer PHP failures, in the hope that it might help. I do not think that it can do its work if your PHP scripts can’t run to completion.)

I had the following line in the .htaccess file at the root of my blog:

Header set Cache-Control "max-age=600, public"

The intention was that it would entice Cloudflare to cache everything. I do not think it worked.

Because my error logs showed that wp-cron.php was failing every few minutes, I added the following in my wp-config.php file:

define('DISABLE_WP_CRON', true);

I set up a separate cron job to call wp-cron.php every hour.
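
For reference, the entry looks something like this (the URL is a placeholder for my blog’s address, and hourly is simply my choice):

# crontab entry: fetch wp-cron.php at the top of every hour
0 * * * * wget -q -O /dev/null https://example.com/wp-cron.php?doing_wp_cron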

I now use Cloudflare with the following settings: “Caching Level: Ignore query string”, “Respect Existing Headers” and “Cache: Everything”. I pay for Argo, whatever that is, in the hope that it might improve things. With these settings, I would expect Cloudflare to cache pretty much everything. It apparently does not. My blog gets hammered. Cloudflare reports 45,000 uncached requests for the day, and most of them are in the last couple of hours. (Update: I managed to get Cloudflare to cache everything by going to page rules, setting “cache everything”, and waiting a few hours. I had to make sure my rules were applied correctly.)

I have asked my host provider (csoft.net) to give me more memory, but they seem unwilling to do it transparently. Though csoft.net is neither cheap nor particularly modern, they have been professional. I have purchased a service with SiteGround, as I am considering moving there because it seems more popular than csoft. I am not super excited about tuning my PHP/WordPress setup, however. I fear that it is wrong-headed optimization.

What am I missing? How can I be in so much trouble in 2019 with such a relatively modest task?

Note 1: I am aware that there are centralized platforms like Medium. This blog is an independent blog on purpose.

Note 2: Many people suggest that I move away from WordPress to something like static generation (e.g., Hugo). I am sympathetic to this point of view, but it is a much easier choice to make when you are starting out and don’t have thousands of articles to carry over.

Credit: I am grateful to Travis Downs and Nathan Kurz for an email exchange regarding my problems.

Update: My blog is now hosted with SiteGround.

What is the space overhead of Base64 encoding?

Many Internet formats, from email (MIME) to the Web (HTML/CSS/JavaScript), are text-only. If you send an image or executable file by email, it often first gets encoded using base64. The trick behind base64 encoding is that it uses 64 different ASCII characters: all letters, upper and lower case, all ten digits, and two more characters.

Not all non-textual documents are shared online using base64 encoding. However, it is quite common. Load up google.com or bing.com and look at the HTML source code: you will find base64-encoded images. On my blog, I frequently embed figures using base64: it is convenient for me to have the blog post content be one blob of data.

Base64 appears wasteful because we use just 64 different values per byte, whereas a byte can represent 256 different values. That is, we use bytes (which are 8-bit words) as if they were 6-bit words. We waste 2 bits for every 8 bits of transmitted data. To send three bytes of information (3 times 8 is 24 bits), you need four base64 characters (4 times 6 is again 24 bits), that is, four bytes. Thus the base64 version of a file is 4/3 the size of the original: we use 33% more storage than we need.
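
If you want to estimate the overhead yourself, the arithmetic is simple. This little sketch assumes standard base64 with padding and ignores the line breaks that some encoders insert, which add slightly more:

#include <stdio.h>

// standard base64 with padding: every group of up to 3 input bytes
// becomes 4 output characters
size_t base64_size(size_t input_bytes) {
  return 4 * ((input_bytes + 2) / 3);
}

int main(void) {
  printf("%zu\n", base64_size(3000)); // prints 4000: a 33% expansion
  return 0;
}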

That sounds bad. How can engineers tolerate such wasteful formats?

It is common for web servers to provide content in compressed form. Compression partially offsets the wasteful nature of base64.

To assess the effect of base64 encoding, I picked a set of images used in a recent research paper. There are many compression formats, but an old and widely supported one is gzip. I encode the images using base64 and then compress them with gzip. I report the number of bytes. I make the files available.

File name           Size (bytes)   Base64 size   Base64 size after gzip
bing.png                    1355          1832                     1444
googlelogo.png              2357          3186                     2477
lena_color_512.jpg        105764        142876                   108531
mandril_color.jpg         247222        333970                   253868
peppers_color.jpg           9478         12807                     9798

As you can see, once compressed with gzip, the base64-encoded files are within about 7% of the original sizes, and for the larger files the overhead drops to roughly 3% or less.

Thus you can safely use base64 on the Web without too much fear.

In some instances, base64 encoding might even improve performance, because it avoids the need for distinct server requests. In other instances, base64 can make things worse, since it tends to defeat browser and server caching. Privacy-wise, base64 encoding can have benefits since it hides the content you access in larger encrypted bundles.

Further reading. Faster Base64 Encoding and Decoding using AVX2 Instructions, ACM Transactions on the Web 12 (3), 2018. See also Collaborative Compression by Richard Startin.

Data scientists need to learn about significant digits

Suppose that you classify people by income or gender. Your boss asks you about the precision of your model. Which answer do you give: whatever your software tells you (e.g., 87.14234%) or a number with a small, fixed number of significant digits (e.g., 87%)?

The latter is the right answer in almost all instances. And the difference matters:

  1. There is a general principle at play when communicating with human beings: you should give just the relevant information, nothing more. Most human beings are happy with a 1% error margin. There are, of course, exceptions. High-energy physicists might need the mass of a particle down to 6 significant digits. However, if you are doing data science or statistics, it is highly unlikely that people will care for more than two significant digits.
  2. Overly precise numbers are often misleading because your actual accuracy is much lower. Yes, you have 10,000 samples and properly classified 5,124 of them, so your mathematical precision is 0.5124. But if you stop there, you show that you have not given much thought to your error margin (see the sketch after this list). First of all, you are probably working from a sample. If someone else redid your work, they might have a different sample. Even if they used exactly the same algorithm you have been using, implementation matters. Small things like how your records are ordered can change results. Moreover, most software is not truly deterministic. Even if you were to run exactly the same software twice on the same data, you probably would not get the same answers. Software needs to break ties, and often does so arbitrarily or randomly. Some algorithms involve sampling or other randomization. Cross-validation is often randomized.
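
To see why the extra digits carry no information, here is a back-of-the-envelope sketch. It assumes a simple binomial model of the classification error, which is itself an idealization:

#include <math.h>
#include <stdio.h>

int main(void) {
  double n = 10000;          // number of test samples
  double correct = 5124;     // correctly classified samples
  double p = correct / n;    // observed accuracy: 0.5124
  double se = sqrt(p * (1 - p) / n); // standard error, about 0.005
  double margin = 1.96 * se;         // roughly a 95% margin: about +/- 0.01
  printf("accuracy: %.2f +/- %.2f\n", p, margin);
  return 0;
}

With a margin of error around one percentage point, reporting 0.51 already tells the reader everything the data can support.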

I am not advocating that you should go as far as reporting exact error margins for each and every measure you report. It gets cumbersome for both the reader and the author. It is also not the case that you should never use many significant digits. However, if you write a report or a research paper, and you report measures, like precision or timings, and you have not given any thought to significant digits, you are doing it wrong. You must choose the number of significant digits deliberately.

There are objections to my view:

  • “I have been using 6 significant digits for years and nobody ever objected.” That is true. There are entire communities that have never heard of the concept of significant digits. But that is not an excuse.
  • “It sounds more serious to offer more precision, this way people know that I did not make it up.” It may be true that some people are easily impressed by very precise answers, but serious people will not be so easily fooled, and non-specialists will be turned off by the excessive precision.