Integers in programming languages have a limited valid range, but arithmetic operations can produce values that fall outside that range. For example, adding two large integers can yield a result that cannot be represented in the integer type. We refer to such error conditions as overflows.
In a programming language like Swift, an overflow causes the program to abort its execution. The rationale is that once an arithmetic operation has failed, everything else the program might be doing is suspect and you are better off aborting. Most other programming languages are not so cautious. For example, a Rust program compiled in release mode will not, by default, abort on overflow.
In C++, most compilers will simply ignore the overflow. However, popular compilers give you choices. When using GCC and clang, you can specify that integer overflows should result in a program crash (abort) using the -ftrapv flag.
I was curious about the performance implications so I wrote a small program that simply adds all of the values in a large array. The answer I sought turns out to depend critically on the choice of compiler:
| | GCC 9 | clang 9 |
|---|---|---|
| no trapping | 0.17 ns/int | 0.11 ns/int |
| trapping | 2.1 ns/int | 0.32 ns/int |
| slowdown | 12 x | 3 x |
With no trapping, the clang compiler beats GCC (0.11 vs. 0.17) by a 50% margin but this should not preoccupy us too much: it is a single microbenchmark.
What is far more significant is that enabling overflow trapping in GCC incurs an order-of-magnitude slowdown. Though it is only one microbenchmark, the size of the effect suggests that we should be concerned. Looking at the assembly, I find that clang generates sensible code on x64 processors, with simple jumps added to detect overflow. Meanwhile, GCC appears to call poorly optimized runtime library functions.
Overall this one test does establish that checking for overflows can be expensive.
Credit: This blog post was motivated by an email by Stefan Kanthak.
“Overall this one test does establish that checking for overflows can be expensive. ”
That’s a load-bearing “can” right there. So the GCC people chose not to optimize overflow checking to emit something sensible, therefore overflow checking is too slow, therefore we are stuck with this sad state of affairs.
In practice, we are likely to see tasks far more complex than adding an array of integers, so the actual overhead is likely to be in the single digit percents.
Rust chose to omit overflow checks in release mode because a) overflow is not memory-unsafe (though the results can still be dire depending on context), and b) they, in my opinion correctly, assumed that they needed to reach performance parity with C and C++, or else Rust would not be seen as an alternative.
At least Rust does bounds-checking by default (which can often be optimized out by using iterators instead of indexing, or by asserting the length by taking a slice).
The other side of the coin is that the first step in fixing a performance problem is to document it. By these numbers, GCC could multiply its trapping performance severalfold.
It is interesting to note that without -ftrapv clang uses SIMD instructions, while with -ftrapv it uses a scalar addition followed by a jo instruction. That probably causes most of the performance difference: the overflow check itself is not that expensive, but it prevents SIMD usage.
Your demo program computes the sum over an array, which allows for auto-vectorization. In programs where the compiler cannot vectorize computations the overhead is probably smaller. When I add -fno-vectorize to your demo program the runtime difference with and without -ftrapv is 43% on an AMD 1950X.
You can write the check yourself with __builtin_add_overflow.
-fsanitize=undefined is a better way to trap integer overflow nowadays.
Did you check __builtin_add_overflow or similar functions?
https://gcc.godbolt.org/z/bsTxPv
GCC implements -ftrapv by replacing ‘+’ with a call to the ‘addvsi3’ function: https://github.com/gcc-mirror/gcc/blob/41d6b10e96a1de98e90a7c0378437c3255814b16/libgcc/libgcc2.c#L87
Clang implements it using ‘setno’, the same as __builtin_add_overflow: https://gcc.godbolt.org/z/eqxYxc
That means clang crashes the program with a ‘UD2’ instruction, while GCC calls ‘abort’. Which behaviour is more desirable to you?
You should dare to take a look at the code of the addv?i3() functions shipped in libgcc.a: it’s outright HORRIBLE!
```
lea  rax, [rdi + rsi]
test rsi, rsi
js   .negative
cmp  rdi, rax
jg   .somewhere
...
.negative:
cmp  rdi, rax
jl   .elsewhere
...
```
JFTR: what LLVM ships in their compiler-rt is equally bad. It’s a real shame, for both of them!
It would be nice to be able to turn this on in specific variables, operations or blocks of code that are security sensitive, as well as adjust behavior based on what the variable does, including falling back on an alternate algorithm that won’t overflow in such cases.
The portable-snippets library has portable wrappers for many useful operations, including overflow-checking arithmetic. They use the relevant compiler intrinsics where possible.
https://github.com/nemequ/portable-snippets/tree/master/safe-math
For the real horror show perform multiplication instead of addition: on an AMD EPYC 7262, GCC shows a 14x slowdown, beaten by Clang with a whopping 537x slowdown!
This is just one of the many […] which LLVM brags about on their web pages.
You gotta love such blatant lies!