Don’t assume that safety comes for free: a Swift case study

Most modern languages try to be “safer” by checking runtime values in the hope of producing more secure, less buggy software. Sadly, these checks make it harder to reason about the performance of the code. And, sometimes, the cost can be far from negligible.

Let me illustrate this through a story, using Swift (Apple’s new programming language).

How do you sum the values in an array? If you are an idiot like myself, you might use a for loop:

var s = 0
for i in array {
    s += i
}

According to my benchmark, this runs at a speed of 0.4 ns per array element. My Skylake processor runs at a flat speed of 3.4GHz, so that we are using a bit over one CPU cycle per array element (1.36 to be precise). Not bad, but we know we could do better. How could Swift get something so simple “wrong”? We shall come back to this issue.

We can get fancier and compute the sum using “functional programming” with a call to reduce. It is slightly shorter and more likely to get you a job interview:

array.reduce(0,{$0 + $1})

For extra obfuscation and even better luck during interviews, you can even use an equivalent and even shorter syntax:

array.reduce(0,+)

Though it is more scientific-sounding, this function is actually 50% slower (clocking at 0.6 ns per element). Why is it slower? We are evidently paying a price for the higher level of abstraction.
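For reference, numbers like these come from a microbenchmark. A minimal sketch of such a harness (the helper name, array size, and fill value are my own choices, not the original benchmark code) might look like this:

```swift
import Dispatch

// Hypothetical micro-benchmark sketch: time a summation over a large
// array and report nanoseconds per element. Not the original benchmark.
let array = [Int](repeating: 1, count: 1_000_000)

func measure(_ name: String, _ body: () -> Int) {
    let start = DispatchTime.now().uptimeNanoseconds
    let total = body()
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    print("\(name): \(Double(elapsed) / Double(array.count)) ns/element (sum = \(total))")
}

measure("simple loop") {
    var s = 0
    for i in array { s += i }
    return s
}

measure("reduce") {
    array.reduce(0, +)
}
```

A real benchmark would also warm up the cache and repeat each measurement many times; this sketch only shows the shape of the comparison.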

Can we do better?

Swift’s integers (Int) are 64-bit integers on 64-bit platforms. What happens if the sum of the values cannot be represented using a 64-bit integer (because it is too large)? Swift helpfully checks for an overflow and crashes if it occurs (instead of silently continuing as Java, C or C++ would). You can disable this behavior by prefixing your operators with the ampersand as in this code…

var s = 0
for i in array {
    s &+= i
}

or this one…

array.reduce(0,{$0 &+ $1})

These are “unsafe” functions in the sense that they might silently overflow.
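Concretely, “silently overflow” means that the wrapping operators follow two’s-complement arithmetic: the largest representable Int wraps around to the smallest. A small illustration using standard Swift operators:

```swift
// Checked arithmetic traps at runtime on overflow:
// let crash = Int.max + 1   // would crash the program

// Wrapping ("overflow") arithmetic silently wraps around,
// following two's-complement semantics:
let wrapped = Int.max &+ 1
print(wrapped == Int.min)  // true: 2^63 - 1 wraps to -2^63
```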

On my Skylake processor, the silent-overflow (“unsafe”) approach can be twice as fast:

technique                      clock cycles per value
simple loop                    1.36
reduce                         2
"unsafe" simple loop (&+)      0.65
"unsafe" reduce (&+)           1.87

So between the “safe” reduce and the “unsafe” simple loop, there is a factor of 3. I consider such a difference quite significant. The “unsafe” simple loop has the same speed as a C function: that’s what we want, ideally.

All these functions achieve nearly the same best performance if you disable safety checks at build time (swift build --configuration release -Xswiftc -Ounchecked). This means, in particular, that we get the fancy abstraction (the reduce approach) for free, as long as we disable safety checks. That makes performance a lot easier to understand.

So why the large performance difference? With regular checked arithmetic in a for loop, Swift generates add instructions (addq), each followed by a conditional jump (jo). Without checked arithmetic, it is able to vectorize the sum (using paddq instructions). (Thanks to John Regehr for putting me on this path of analysis.) To paraphrase Steve Canon, “overflow checking itself isn’t very expensive, but it inhibits other optimizations.”
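There is a middle ground between trapping and silent wrap-around: Swift’s standard library exposes the hardware overflow flag directly through `addingReportingOverflow`, which returns the wrapped value together with a Boolean. A sketch (the function name `checkedSum` and the nil-on-overflow policy are my own choices):

```swift
// addingReportingOverflow returns (partialValue, overflow):
// the wrapped sum plus a flag saying whether wrap-around occurred.
// This lets the caller decide how to handle the fault instead of crashing.
func checkedSum(_ array: [Int]) -> Int? {
    var s = 0
    for i in array {
        let (value, overflow) = s.addingReportingOverflow(i)
        if overflow { return nil }  // report failure instead of trapping
        s = value
    }
    return s
}

if let total = checkedSum([1, 2, 3]) {
    print("sum = \(total)")
} else {
    print("overflow")
}
```

Of course, the explicit branch in the loop is likely to inhibit vectorization just as the implicit checks do; the point is only that crashing is not the sole option.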

You might notice that I am using quotes around the terms “safe” and “unsafe”. That’s because I do not think it is universally a good thing that software should crash when an overflow occurs. Think about the software that runs in your brain. Sometimes you experience bugs. For example, an optical illusion is a fault in your vision software. Would you want to fall dead whenever you encounter an optical illusion? That does not sound entirely reasonable, does it? Moreover, would you want this “fall dead” switch to make all of your brain run at half its best speed? In software terms, this means that a mission-critical application could crash because an unimportant function in a secondary routine overflows. This is not a merely hypothetical scenario: an overflow in an unnecessary software component halted the software of the Ariane 5 rocket leading to a catastrophe.

My Swift and C code is freely available.

Further reading: We Need Hardware Traps for Integer Overflow and Understanding Integer Overflow in C/C++

Appendix: Overflow checks are not unique to Swift. You can compile your C and C++ with runtime overflow checks using LLVM clang (-fsanitize=undefined) or GNU GCC (-fsanitize=signed-integer-overflow). The Rust language checks for overflow in debug mode (but not in release mode).

Update: A HackerNews user commented on the performance issues:

Looking at the generated code, we have a couple issues that prevent full optimization. One of them I already knew about (https://bugs.swift.org/browse/SR-2926) — as a debugging aid, we guard every trap we emit with the equivalent of an `asm volatile("")` barrier, because LLVM will otherwise happily fold all the traps together into one trap. We really want every trap to have a distinct address so that the debugger can map that trap back to the source code that emitted it, but the asm barrier prevents other optimizations as well. As for `reduce`, it looks like the compiler does inline away the closure and turn the inner loop into what you would expect, but for some reason there’s more pre-matter validation of the array than there is with a for loop. That’s a bug; by all means, reduce is intended to optimize the same.

Update 2: Nathan Kurz ran additional tests using arrays of different sizes…

$ swift build --configuration release
$ cset proc -s nohz -e .build/release/reduce

# count  (basic, reduce, unsafe basic, unsafe reduce)
1000      (0.546, 0.661, 0.197, 0.576)
10000     (0.403, 0.598, 0.169, 0.544)
100000    (0.391, 0.595, 0.194, 0.542)
1000000   (0.477, 0.663, 0.294, 0.582)
10000000  (0.507, 0.655, 0.337, 0.608)
100000000 (0.509, 0.655, 0.339, 0.608)
1000000000(0.511, 0.656, 0.345, 0.611)

$ swift build --configuration release  -Xswiftc -Ounchecked
$ cset proc -s nohz -e .build/release/reduce

# count  (basic, reduce, unsafe basic, unsafe reduce)
1000      (0.309, 0.253, 0.180, 0.226)
10000     (0.195, 0.170, 0.168, 0.170)
100000    (0.217, 0.203, 0.196, 0.201)
1000000   (0.292, 0.326, 0.299, 0.252)
10000000  (0.334, 0.337, 0.333, 0.337)
100000000 (0.339, 0.339, 0.340, 0.339)
1000000000(0.344, 0.344, 0.344, 0.344)

19 thoughts on “Don’t assume that safety comes for free: a Swift case study”

  1. Exceptions on 64 bit overflows? I guess every crypto code written in Swift needs to use this “unsafe” notation (I wonder if the compiler throws countless warnings about it?). I’m from the old school of x86 assembly and I never like signed integers in HLL languages and even worse – lack of unsigned integers (Java), because to me and the hardware they are natural things, but this “invention” goes against the hardware CPU design, if it doesn’t trigger an exception in low level code why should it crash in high level? God damn future…

    1. Exceptions on 64 bit overflows?

      It is a runtime assert. The program crashes.

      I guess every crypto code written in Swift needs to use this “unsafe” notation (I wonder if the compiler throws countless warnings about it?).

      The compiler won’t complain; the check happens at runtime. You use “+” for checked additions, and the ampersand notation (“&+”) for unchecked additions that silently overflow.

      I’m from the old school of x86 assembly and I never like signed integers in HLL languages and even worse – lack of unsigned integers (Java), because to me and the hardware they are natural things, but this “invention” goes against the hardware CPU design, if it doesn’t trigger an exception in low level code why should it crash in high level? God damn future…

      You are right that there is a mismatch between hardware and software. See John Regehr’s blog post (link at the bottom).

  2. In my fantasy world, a decade or two from now it will be mainstream for programs to be written in the more abstract style and for compilers to prove that fancy optimizations are safe (in the cases when they are actually safe, of course). This has been some people’s fantasy for at least a generation or two, but it feels like we’re actually getting closer to having this technology (see, e.g., seL4 and CompCert).

    The difference between “safe” and safe is interesting. I would characterize it as the difference between avoiding certain implementation-level bad behaviors versus ensuring that an application behaves sensibly in terms of high-level behavior. The latter is much harder in general, but I still believe the former has value.

    Your analogy with optical illusions feels a little off to me. Natural neural systems have tons of checks and balances to increase the likelihood of reasonable behavior. If software goes out of its intended execution path (for example via numerical overflow), it is much less likely that the result will be acceptable.

    1. Your analogy with optical illusions feels a little off to me. Natural neural systems have tons of checks and balances to increase the likelihood of reasonable behavior.

      Good engineering is not about preventing faults at a high cost. It is about managing the damages when faults occur.

      Your software *will* fail. There is no getting around that. Thinking that “safe” software is software that does not fail is wrong.

      So what we do when problems occur matters a great deal. Is “falling dead” the right approach?

      1. > Good engineering is not about preventing faults at a high cost. It is about managing the damages when faults occur.

        I think you’ve stated this too strongly. Good engineering practice involves both avoiding and managing problems. Even some of the “move fast and break things” zealots acknowledge that it’s possible to take that motto too far.

        > So what we do when problems occur matters a great deal. Is “falling dead” the right approach?

        Of course not, at least from a complete system perspective. But languages like C and C++ by default allow suspicious things like numerical overflow and out-of-bounds accesses to go by unnoticed. It’s hard to take any compensatory action when the system doesn’t notice there’s a problem.

        I have very little experience with Swift, and I think immediately killing the process might be too severe. But even that approach can be managed by having another process monitoring the application and restarting it as needed.

        1. I think you’ve stated this too strongly.

          Millions of people die in car accidents. We could prevent all of it if we built cars like tanks. We don’t all drive military-grade tanks because costs matter. We accept that there will be accidents and failures. If you design things with the assumption that you can make the faults go away, either your costs will be out of this world or you’ll be fooling yourself, and when a fault does occur it will be terrible because it was unplanned for.

          The choice Swift makes is to indeed assume that there will be overflows, but then that when they occur, a crash is harmless enough.

          But languages like C and C++ by default allow suspicious things like numerical overflow and out-of-bounds accesses to go by unnoticed.

          With clang and GNU GCC, you can compile your C and C++ code with runtime checks:
          http://lemire.me/blog/2016/04/20/no-more-leaks-with-sanitize-flags-in-gcc-and-clang/

          I presume that most industrial-strength C and C++ compilers must have similar capabilities. Of course, it comes with a performance penalty.

    1. My blog post was about safety (overflow checks). I guess you refer to something else, which is the “cost” of abstraction (functional programming idioms in this case). Then, sure, I was surprised that the reduce approach was slower than the loop approach myself and I don’t think that the reason for the performance difference is all that clear.

      I would like you to consider the following points:

      1. Rust (in release mode) does not check for overflows unlike Swift. So it is not directly comparable with Swift.

      2. There are plenty of cases where Swift, Rust, JavaScript, C++, and Java, will have the same performance whether using loops or functional programming idioms… but this, by itself, does not prove that abstraction does not have a performance cost. There are definitively cases where functional programming idioms have a performance cost (as shown here).

  3. An Ariane V rocket was destroyed during takeoff because the guidance software (written in Ada) caused an overflow that caused the program to halt. The computation being performed wasn’t even needed for flight, but was running anyway.

  4. We looked at your `reduce` example, and in the latest compiler, it appears that the for loop and reduce get essentially equivalent performance now:

    Array_for: 0m1.675s user time

    Array_reduce: 0m1.638s user time

    Thanks for the test case!

      1. We do have a known issue that `isUniquelyReferenced` checks don’t get hoisted out of an `append` loop. That might account for part of the problem there. I’ll file a bug to make sure we investigate.

        1. Thanks. My argument is that this should never be the fastest way to do things…

          func extendArray(_ x: inout [Int], size: Int) {
             var answer = Array(repeating: 0, count: size)
             for i in 0..<x.count {
               answer[i] = x[i]
             }
             x = answer
          }
          

          If that’s the fast way, then performance is left on the table.

          1. By all means, that’s a deficiency in Swift’s optimizer. There is another way to initialize the array, using the initializer from another Sequence:

            x = Array((0..<size).lazy.map { $0 < x.count ? x[$0] : 0 })

            which will avoid the initial cost of bzero-ing the array buffer, but isn't always the clearest way to express the initialization.

            1. I think that this syntax x = Array((0..<size).lazy.map { $0 < x.count ? x[$0] : 0 }) in Swift is very elegant. I’ll remember it.

              Elegance aside, I am not sure it is necessarily very fast given the current state of Swift. Granted, in principle, it could be fast, but it would require lots of work from the compiler.

              This would make for a long exchange, but I am not sure that map/filter are well suited to high performance computing. They can be slower than simple old-fashioned procedural code because the latter is more transparent to the compiler… especially as you chain them.

              Anyhow.

              I find that your proposal is a tad slower than x += repeatElement(0,count:newcount), and that’s not the fastest way.

              They are both easily 3x slower than they should be.

              1. There’s no fundamental reason the `lazy` variations of `map` and `filter` should be slower than a loop. They’re fully inlinable and don’t produce temporary copies, only “views” over the underlying collection. It sounds like Xcode 8’s compiler had some problems optimizing away closure overhead, hence the perf difference you saw between `reduce` and the for loop, but those issues seem to be fixed in 8.3 beta 1. I just tried this, and consistently get ~17% better results with “B”:

                import Dispatch

                var x = Array(repeating: 1738, count: 1_000_000)
                var sink = x

                print("A:")
                print(dispatch_benchmark(1_000) {
                    var answer = Array(repeating: 0, count: 2_000_000)
                    for i in 0 ..< x.count {
                        answer[i] = x[i]
                    }
                    // Prevent `answer` from being DCE'd
                    sink = answer
                })

                print("B:")
                print(dispatch_benchmark(1_000) {
                    var answer = Array((0..<2_000_000).lazy.map { $0 < x.count ? x[$0] : 0 })
                    // Prevent `answer` from being DCE'd
                    sink = answer
                })

                Perhaps neither is still as fast as it should be yet, but we're working on it!
