Many programming languages have two binary floating-point types: float (32-bit) and double (64-bit). This reflects the fact that most general-purpose processors support both data types natively.
Often we need to convert between the two types. Both ARM and x64 processors can do so in one inexpensive instruction. For example, ARM systems may use the fcvt instruction.
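To make this concrete, here is a minimal C++ sketch of both conversions. The function names (widen, narrow, widen_all) are made up for illustration. On x64, compilers typically turn these casts into cvtss2sd and cvtsd2ss, and on 64-bit ARM into fcvt, though the exact code generated depends on your compiler and flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: widen a float to a double.
// This is exact, since every float value is representable as a double.
double widen(float x) { return static_cast<double>(x); }

// Hypothetical helper: narrow a double to a float.
// The value is rounded to a nearby float (round-to-nearest on typical
// IEEE 754 hardware in its default mode), so precision may be lost.
float narrow(double x) { return static_cast<float>(x); }

// Hypothetical helper: convert a whole array, one element at a time.
// Each element is independent of the others, so the processor is free
// to overlap the conversions.
std::vector<double> widen_all(const std::vector<float>& input) {
  std::vector<double> output(input.size());
  for (std::size_t i = 0; i < input.size(); i++) {
    output[i] = static_cast<double>(input[i]);
  }
  return output;
}
```

Going from float to double and back returns the original float, but going from double to float and back generally does not return the original double.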
The details may differ, but most current processors can convert one number (from float to double, or from double to float) per CPU cycle. The latency is small (e.g., 3 or 4 cycles).
A typical processor might run at 3 GHz, which gives us 3 billion cycles per second, so we can convert 3 billion numbers per second. A 64-bit number uses 8 bytes, so that is a throughput of 24 gigabytes per second.
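The arithmetic can be written out as a small C++ sketch. The function names are hypothetical, and the inputs (3 GHz, one conversion per cycle, 8 bytes per value) are the assumptions from the text, not measurements.

```cpp
// Conversions per second, given the clock frequency in Hz and the number
// of conversions the processor can complete per cycle.
constexpr double conversions_per_second(double clock_hz, double per_cycle) {
  return clock_hz * per_cycle;
}

// Throughput in gigabytes per second (1 GB = 1e9 bytes), given the
// conversion rate and the size in bytes of each value.
constexpr double gigabytes_per_second(double conversions_per_s,
                                      double bytes_per_value) {
  return conversions_per_s * bytes_per_value / 1e9;
}

// 3 GHz, one conversion per cycle, 8-byte doubles: 24 GB/s.
static_assert(gigabytes_per_second(conversions_per_second(3e9, 1.0), 8.0) == 24.0,
              "3e9 conversions/s times 8 bytes is 24 GB/s");
```

If you count the 4-byte floats instead of the 8-byte doubles, the same rate corresponds to 12 gigabytes per second.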
It is therefore unlikely that type conversion will be a performance bottleneck in general. If you would like to measure the speed on your own system, I have written a small C++ benchmark.
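For readers who want a starting point, the following is a minimal sketch of how such a measurement could look. It is not the benchmark mentioned above: the function names are hypothetical, and it uses a plain wall-clock timer rather than hardware performance counters, so it reports an approximate rate, not cycles.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: converts each float in `input` to a double in
// `output` and returns the elapsed wall-clock time in nanoseconds.
// Assumes output.size() >= input.size().
double time_float_to_double_ns(const std::vector<float>& input,
                               std::vector<double>& output) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < input.size(); i++) {
    output[i] = static_cast<double>(input[i]);
  }
  const auto finish = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(finish - start).count();
}

// Hypothetical helper: returns the best rate observed over `trials` runs,
// in conversions per nanosecond (numerically, billions per second).
// Returns 0 if the clock could not resolve any of the runs.
double float_to_double_rate(std::size_t n, int trials) {
  std::vector<float> input(n);
  for (std::size_t i = 0; i < n; i++) {
    input[i] = static_cast<float>(i);  // increasing values
  }
  std::vector<double> output(n);
  double best_ns = std::numeric_limits<double>::infinity();
  for (int t = 0; t < trials; t++) {
    const double ns = time_float_to_double_ns(input, output);
    if (ns > 0 && ns < best_ns) { best_ns = ns; }
  }
  // Read one result through a volatile to discourage the compiler from
  // treating the converted values as unused.
  volatile double sink = n > 0 ? output[n - 1] : 0.0;
  (void)sink;
  if (best_ns == std::numeric_limits<double>::infinity()) { return 0.0; }
  return static_cast<double>(n) / best_ns;
}
```

Compile with optimizations enabled (for example, -O2) and call something like float_to_double_rate(1000000, 10). Multiplying the result by 8 gives an estimate in gigabytes per second of doubles produced. With large arrays, you may end up measuring memory bandwidth rather than the conversion itself.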
Hello.
Can you explain why you create a random number generator but don’t use it, and instead fill the array with increasing values?
It is dead code that I used at first but that I am no longer using. You don’t need the random generator because the conversion is not data dependent.
Hi Daniel – Why is the latency 3-4 cycles if one conversion takes one cycle? What is latency here? You’re using 1 cycle/conversion in your throughput estimates, so is the latency a one-time thing?
I did not write that the conversion took one cycle; I wrote that one number could be converted per cycle. Our processors are superscalar: they can execute many instructions at once.
In fact, the M1 processor in my Apple laptop can sustain 8 instructions per cycle.
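To illustrate the difference between latency and throughput, here is a toy model in C++. It assumes an idealized processor that can start one conversion per cycle, with each conversion completing a fixed number of cycles after it starts. The function names are hypothetical, and real processors are more complicated than this.

```cpp
#include <cstdint>

// Independent conversions (e.g., converting an array): a new conversion
// starts every cycle, so the latency is paid once, for the first result,
// and each further result arrives one cycle after the previous one.
constexpr std::uint64_t cycles_independent(std::uint64_t n,
                                           std::uint64_t latency) {
  return n == 0 ? 0 : latency + (n - 1);
}

// Dependent conversions (each one needs the previous result as its
// input): nothing can overlap, so the latency is paid every time.
constexpr std::uint64_t cycles_dependent(std::uint64_t n,
                                         std::uint64_t latency) {
  return n * latency;
}
```

With a latency of 4 cycles, this model gives 1003 cycles for 1000 independent conversions, or about one per cycle, against 4000 cycles for a chain of 1000 dependent conversions.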