In C as well as many other programming languages, we have 32-bit and 64-bit floating-point numbers. They are often referred to as float and double. Most systems today follow the IEEE 754 standard, which means that you can get consistent results across programming languages and operating systems. Hence, it does not matter very much if you implement your software in C++ under Linux whereas someone else implements it in C# under Windows: if you both have recent systems, you can expect identical numerical outcomes when doing basic arithmetic and square-root operations.
When you are reading these numbers from a string, there are distinct functions. In C, you have strtof and strtod. One parses a string to a float and the other function parses it to a double.
At a glance, it seems redundant. Why not just parse your string to a double value and cast it back to a float, if needed?
Of course, that would be slightly more expensive. But, importantly, it also gives incorrect results in the sense that it is not equivalent to parsing directly to a float. In other words, these functions are not equivalent:
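#include <stdlib.h>

// Parse as a double, then narrow to a float (rounds twice).
float parse1(const char * c) {
  char * end;
  return (float)strtod(c, &end);
}

// Parse directly to a float (rounds once).
float parse2(const char * c) {
  char * end;
  return strtof(c, &end);
}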
It is intuitive that if I first parse the number as a float and then cast it back to a double, I will have lost information in the process. Indeed, if I start with the string “3.14159265358979323846264338327950”, parsed as a float (32-bit), I get 3.1415927410125732421875. If I parse it as a double (64-bit), I get the more accurate result 3.141592653589793115997963468544185161590576171875. The difference is not so small, about 9e-08.
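To make the size of that gap concrete, here is a minimal sketch. The helper name float_minus_double is hypothetical, and the sketch assumes a C library whose strtof and strtod round correctly.

```c
#include <stdlib.h>

/* Hypothetical helper: parses the same string once as a float and once
   as a double, and returns how far the float lands from the double. */
double float_minus_double(const char *s) {
    char *end;
    float  f = strtof(s, &end);  /* nearest 32-bit value */
    double d = strtod(s, &end);  /* nearest 64-bit value */
    return (double)f - d;        /* widening f to double is exact */
}
```

For the string “3.14159265358979323846264338327950”, the returned gap should be close to the 9e-08 quoted above.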
In the other direction, first parsing to a double and then casting back to a float, I can also lose information, although only a little bit due to the double rounding effect. To illustrate, suppose that I have the number 1.48 and that I round it in one go to the nearest integer: I get 1. If I round it first to a single decimal (1.5) and then to the nearest integer, I might get 2 using the usual rounding conventions (either round up, or round to even). Rounding twice is lossy and not equivalent to a single rounding operation. Importantly, you lose a bit of precision in the sense that you may not get back the closest value.
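The decimal illustration can be sketched with exact integer arithmetic. The helpers round_once and round_twice are hypothetical names; they assume non-negative inputs stored as hundredths (so 148 stands for 1.48) and round ties up.

```c
/* Hypothetical helpers: values are held as integer hundredths
   (148 means 1.48), so every step below is exact.
   Assumes non-negative inputs; ties round up. */

/* Round hundredths to the nearest integer in a single step. */
int round_once(int hundredths) {
    return (hundredths + 50) / 100;
}

/* Round hundredths to tenths first, then tenths to the nearest integer. */
int round_twice(int hundredths) {
    int tenths = (hundredths + 5) / 10;  /* 148 -> 15, i.e. 1.5 */
    return (tenths + 5) / 10;            /* 15  -> 2 */
}
```

With 148 as input, round_once gives 1 while round_twice gives 2, which is the discrepancy described above.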
With floating-point numbers, I get this effect with the string “0.004221370676532388” (for example). You probably cannot tell unless you are a machine, but parsing directly to a float is 2e-7 % more accurate.
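A quick way to check whether a given string is affected is to compare the two parsing paths. The helper name below is hypothetical, and the sketch assumes the default round-to-nearest-even mode and a correctly rounding C library. As far as I can tell, this particular string sits just above the midpoint between two adjacent floats; the nearest double is that midpoint exactly, so the cast to float hits a tie and rounds to the even neighbor, which is the lower (farther) one.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 when parsing through a double yields
   a different float than parsing directly, and 0 otherwise. */
int double_rounding_changes_result(const char *s) {
    float direct     = strtof(s, NULL);         /* one rounding  */
    float via_double = (float)strtod(s, NULL);  /* two roundings */
    return direct != via_double;
}
```

Calling it on “0.004221370676532388” should report a mismatch, whereas most strings, such as “0.5”, should not.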
In most applications, such a small loss of accuracy is not relevant. However, if you ever find yourself having to compare results with another program, you may get inconsistent results. It can make debugging more difficult.
Further reading: Floating-Point Determinism (part of a series)
The conversion can be handy, though, if you want to output short strings in decimal format.
For instance, when writing an optimizer for SVG path strings, you might want to change absolute positions into relative positions and vice versa.
You could be at the absolute position 0.1 and want to add the relative position 0.2. With doubles you will end up at 0.30000000000000004, as 0.1 and 0.2 cannot be exactly represented in binary floating point.
If you parse as double and do all calculations in double precision, the accumulated error stays far below what single precision can resolve. If you then convert to single precision just before formatting, the formatting routine will look for the shortest decimal number that still identifies that single-precision value.
You usually end up with the same result as with lossless decimal math.
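Here is a minimal sketch of that trick in C. Standard C has no shortest-round-trip formatter, so the hypothetical helper format_shortest_float simply tries increasing precisions until the text parses back to the same float; add_and_format is likewise a made-up name, and both assume the caller provides a large enough buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes the shortest "%g" rendering of f that
   parses back to the same float. Nine significant digits are always
   enough for a finite float, so the loop stops there. */
char *format_shortest_float(float f, char *buf, size_t size) {
    for (int digits = 1; digits <= 9; digits++) {
        snprintf(buf, size, "%.*g", digits, f);
        if (strtof(buf, NULL) == f) {
            break;  /* round-trips: no need for more digits */
        }
    }
    return buf;
}

/* Hypothetical helper: adds a relative offset to an absolute position
   in double precision and narrows to float only for output. */
char *add_and_format(const char *absolute, const char *relative,
                     char *buf, size_t size) {
    double sum = strtod(absolute, NULL) + strtod(relative, NULL);
    return format_shortest_float((float)sum, buf, size);
}
```

With the inputs “0.1” and “0.2”, the buffer should end up holding 0.3 rather than 0.30000000000000004.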
It’s important to remember that this trick relies on your binary floating point numbers just being storage for decimal floating point numbers with low precision.