How expensive is integer-overflow trapping in C++?

Integers in programming languages have a valid range but arithmetic operations can result in values that exceed such ranges. For example, adding two large integers can result in an integer that cannot be represented in the integer type. We often refer to such error conditions as overflows.

In a programming languages like Swift, an overflow will result in the program aborting its execution. The rationale is that once an arithmetic operation has failed, everything else the program might be doing is suspect and you are better off aborting the program. Most other programming languages are not so cautious. For example, a Rust program compiled in release mode will not abort by default.

In C++, most compilers will simply ignore the overflow. However, popular compilers give you choices. When using GCC and clang, you can specify that integer overflows should result in a program crash (abort) using the -ftrapv flag.

I was curious about the performance implications so I wrote a small program that simply adds all of the values in a large array. The answer I sought turns out to depend critically on the choice of compiler:

GCC 9 clang 9
no trapping 0.17 ns/int 0.11 ns/int
trapping 2.1 ns/int 0.32 ns/int
slowdown 12 x 3 x

With no trapping, the clang compiler beats GCC (0.11 vs. 0.17) by a 50% margin but this should not preoccupy us too much: it is a single microbenchmark.

What is a lot more significant is that enabling overflow trapping in GCC incurs an order of magnitude slowdown. Though it is only one microbenchmark, the size of the result suggests that we should be concerned. Looking at the assembly, I find that the clang compiler generates sensible code on x64 processor, with simple jumps added when the overflow is detected. Meanwhile, GCC seems to call poorly optimized runtime library functions.

Overall this one test does establish that checking for overflows can be expensive.

Credit: This blog post was motivated by an email by Stefan Kanthak.

Daniel Lemire, "How expensive is integer-overflow trapping in C++?," in Daniel Lemire's blog, September 23, 2020.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

11 thoughts on “How expensive is integer-overflow trapping in C++?”

  1. “Overall this one test does establish that checking for overflows can be expensive. ”

    That’s a load-bearing “can” right here. So the gcc people chose not to optimize overflow checking to emit something sensible, therefore overflow checking is too slow, therefore we are stuck with the sad state of affairs.

    In practice, we are likely to see tasks far more complex than adding an array of integers, so the actual overhead is likely to be in the single digit percents.

    Rust chose to omit checks in release mode because a) it’s not memory unsafe to have overflows (though the results can also be dire depending on context), and they in my opinion correctly assumed that they need to reach perf parity with C and C++, or else they wouldn’t be seen as an alternative.

    At least Rust does bounds-checking by default (which can often be optimized out by using iterators instead of indexing, or by asserting the length by taking a slice).

    1. therefore overflow checking is too slow, therefore we are stuck with
      the sad state of affairs.

      The other side of the coin is that the first step in fixing a performance problem is to document it. GCC could multiply its performance.

  2. It is interesting to note that without -ftrapv clang uses SIMD instructions, while with -ftrapv clang uses a scalar addition plus jo instruction. Which probably causes most of the performance difference, the overflow check itself is not that expensive, but it prevents SIMD usage.

    Your demo program computes the sum over an array, which allows for auto-vectorization. In programs where the compiler cannot vectorize computations the overhead is probably smaller. When I add -fno-vectorize to your demo program the runtime difference with and without -ftrapv is 43% on an AMD 1950X.

    1. Better dare to take a look at the code of the addv?i3() functions shipped in libgcc.a: it’s outright HORRIBLE!

      lea rax, [rdi, rsi]
      test rsi, rsi
      js .negative
      cmp rdi, rax
      jg .somewhere
      cmp rdi, rax
      jl .elsewhere

      JFTR: what LLVM ships in their compiler-rt is equally bad. It’s a real shame, for both of them!

  3. It would be nice to be able to turn this on in specific variables, operations or blocks of code that are security sensitive, as well as adjust behavior based on what the variable does, including falling back on an alternate algorithm that won’t overflow in such cases.

  4. For the real horror show perform multiplication instead of addition: on an AMD EPYC 7262, GCC shows a 14x slowdown, beaten by Clang with a whopping 537x slowdown!
    This is just one of the many

    highly tuned implementations of the low-level code generator support routines

    which LLVM braggs about on their web pages.
    You gotta love such blatant lies!

Leave a Reply

Your email address will not be published.

You may subscribe to this blog by email.