Computer scientists need to learn about significant digits

I probably spend too much time reviewing research papers. It makes me cranky.

Nevertheless, one thing that has become absolutely clear to me is that computer scientists do not know about significant digits.

When you write that the test took 304.03 s, you are telling me that the 0.03 s is somehow significant (otherwise, why tell me about it?). Yet it is almost certainly insignificant.

In computer science, you should almost never use more than two significant digits. So 304.03 s is indistinguishable from 300 s. And 33.14 MB is the same thing as 33 MB.
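Rounding to a fixed number of significant digits is easy to automate. Here is a minimal Python sketch. `round_sig` is a hypothetical helper name, and it defaults to the two digits recommended above:

```python
import math

def round_sig(x: float, digits: int = 2) -> float:
    """Round x to `digits` significant digits (hypothetical helper)."""
    if x == 0:
        return 0.0
    # Decimal exponent of the leading digit: 304.03 -> 2, 33.14 -> 1, 0.0123 -> -2.
    exponent = math.floor(math.log10(abs(x)))
    # round() with a negative second argument rounds to tens, hundreds, ...
    return round(x, digits - 1 - exponent)

print(round_sig(304.03))  # 300.0
print(round_sig(33.14))   # 33.0
```

Python's `.2g` format also rounds to two significant digits, but it switches to exponent notation for 304.03 (printing `3e+02`), so this sketch rounds the value numerically instead.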

Why does it matter?

  • Cutting down numbers to their significant digits simplifies the exposition. It is simpler to say that it took 300 s than to say that it took 304.03 s.
  • Numbers expressed with insignificant digits often lie. Running your program does not take 304.03 s. Maybe it did this one time, but if you run it again, you will get a different number.
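To see the run-to-run variation for yourself, time the same code several times. Here is a minimal Python sketch using the standard `timeit` module. The `workload` function is a hypothetical stand-in for whatever your experiment actually measures:

```python
import timeit

def workload() -> int:
    """Hypothetical stand-in for the code under test."""
    return sum(i * i for i in range(200_000))

# Time five separate calls; each entry of `times` is one run, in seconds.
times = timeit.repeat(workload, number=1, repeat=5)

for t in times:
    print(f"{t:.6f} s")  # the trailing digits typically differ from run to run
print(f"min {min(times):.6f} s, max {max(times):.6f} s")
```

Which digits agree across runs depends on the machine and its load. The digits that do not agree are the ones not worth reporting.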

Please learn to express your experimental results using as few digits as you can.
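One way to apply this is sketched in Python below, under stated assumptions. Summarize repeated runs by their mean and standard deviation (the square root of the variance, so it has the same unit as the mean). Keep one significant digit of the standard deviation, and round the mean to the same decimal place. The run times are made-up values for illustration. The one-digit rule is one common convention, not the only one, and `format_with_uncertainty` is a hypothetical helper:

```python
import math
import statistics

def format_with_uncertainty(mean: float, spread: float, unit: str = "s") -> str:
    """Round `mean` to the decimal place of the leading digit of `spread`."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    # Decimal exponent of the spread's leading digit (3.4 -> 0, 0.034 -> -2, 34 -> 1).
    exponent = math.floor(math.log10(spread))
    decimals = max(0, -exponent)
    mean_r = round(mean, -exponent)
    spread_r = round(spread, -exponent)
    return f"{mean_r:.{decimals}f} ± {spread_r:.{decimals}f} {unit}"

# Made-up run times in seconds, for illustration only.
runs = [304.03, 298.7, 301.2, 306.9, 299.5]
mean = statistics.mean(runs)
spread = statistics.stdev(runs)  # sample standard deviation
print(format_with_uncertainty(mean, spread))  # 302 ± 3 s
```

With a spread of about 3 s, the tenths and hundredths of the mean carry no information, so the sketch drops them.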


16 thoughts on “Computer scientists need to learn about significant digits”

  1. Of course, it is 56.137% harder to make up.

    Ok, now back to the serious things…

    Going one step further, I would suggest replacing the numbers with charts whenever possible.

  2. Saying the program’s runtime is 300 seconds rather than 304.03 seconds is only a slight improvement. Much better would be to say “the mean of k runs was x seconds with variance y”, for example.

  3. Yes, but: much of computer science is not a natural science, or anything much like one; it’s a branch of mathematics. How many significant digits does π have? If some algorithm, on some input, takes (say) 78234 tree-rebalancing operations, then that’s the number it takes. Not plus or minus anything. There’s no measurement error, there’s no experimental error. Should Vassilevska Williams state that her matrix multiplication algorithm has an asymptotic cost of O(n^(two and a bit))?

    Where there are sources of error or variation, for instance in time and space measurements of running systems, particularly multi-processing environments and systems connected to instruments or other external interfaces such as UI devices, then I quite agree: the error and variation should be quantified and numbers given to an appropriate number of significant digits, and compsci papers often fail to do this well enough.

    Having said that, the sources of variation may often be controllable, and with care the resulting precision may be greater than many physical scientists could normally achieve. I have personally worked on real running systems which have space measurements reproducible to six or more decimal places, and time measurements reproducible to five or more. If I have that many significant digits, should I state them? My habit was generally to give the full precision in tables but to truncate in running text, for rhetorical purposes.

  4. Sometimes when we say “33.14 MB”, the purpose is not to answer “is this significantly different from 30 MB?” but rather (or also) “is this identical to the other file over there?” To test identity, all digits are significant.

  5. 33.14 might be significant because it can help determine whether a source of error was the result of overflow or other oddities. Also, there is no error: it’s not like chemistry, where we don’t know. It’s more like math, where we objectively know. It would be like deriding mathematicians for not following the rules on significant digits.

  6. I totally agree…

    Sometimes one keeps digits just out of laziness, since they are the output of a program, copied and pasted into the paper.

  7. I don’t think you know what you are talking about.

    A significant digit is a digit that you actually measured. If you have in fact measured every byte of a file (and you should be able to), then you can report the size of the file to the nearest byte, regardless of whether it is a kilobyte file, a megabyte file, a gigabyte file or a terabyte file.

    Significant figures come into play when you report more precision than your instruments can actually measure. Say you are timing a process and your clock is accurate to the nearest second (like with a UNIX timestamp). If so, giving a mean with any digits after the decimal point is inaccurate.

  8. I was taught about significant figures in Grade 10, circa mid-1970s. The rules are about 90% well defined and objective, and about 10% less well defined and subjective.

    Context also matters. For example, if the project requires that a software module execute start to finish in not more than 304.00 seconds, then 304.03 seconds is probably a fail. But ideally one would measure it to a precision at least six to ten times finer than the margin. Most of the time, excess significant figures are nonsense, not context.

    A “100 km/h” speed limit might actually be ±10 km/h. The sign should read “100.0 km/h”. LOL.

  9. @Wells

    The number of significant digits you report is bounded by what you actually measured, but scientists typically report fewer digits, for reasons such as the ones I give in the post.

  10. Logically, using an insignificant digit as if it were significant is “the fallacy of misplaced precision”.

    Context is everything. If I ask you how many miles per gallon you get with your new hybrid car and you say “50.52841926 mpg”, every digit after “50.” is likely to be insignificant. The insignificant digits are not meaningfully informative.

  11. I made a long-term enemy by publicly telling a Caltech PhD that he could not publish a measured performance number to eight significant digits (when two digits was dubious). Also, the zero-intercept on the graph could not be at (0,0). First-year physics students are meant to learn the basics, but apparently not a Caltech PhD.
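Comment 7 makes the point that a timing cannot carry more digits than the clock resolves. In Python, this can be checked with the standard library's `time.get_clock_info`. The sketch below prints what each clock advertises. The reported values vary by operating system and Python build:

```python
import time

# Ask Python how fine-grained each clock claims to be (resolution in seconds).
for name in ("time", "monotonic", "perf_counter", "process_time"):
    info = time.get_clock_info(name)
    print(f"{name:>12}: resolution {info.resolution:.0e} s, implementation {info.implementation}")
```

The advertised resolution is only a lower bound on what a measurement can resolve. Run-to-run noise is often much larger, so it, not the clock, usually limits the significant digits.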
