Passing integers by reference can be expensive…

In languages like C++, you can pass values to functions in two ways. You can pass by value: the value is semantically “copied” before being passed to the function. Any change you made to the value within the function will be gone after the function terminates. The value is ephemeral. You can also pass by pointer or reference: semantically, we point the new function at the value we already have. Changes made within the function to the value will remain after the scope of the function.

As a rule of thumb, whenever a data element is large (e.g., it cannot fit in registers), it is probably faster to pass by reference or pointer. For tiny data elements that fit in registers, like integers, passing them by value ought to be best.

However, sometimes it is convenient to pass integer values by reference because you want to capture how the value changed during the scope of the function. C++ makes this trivially easy as you just need to add the ampersand (&) in front of the parameter. It looks “free”.

Let us consider the silly example of some function that passes a counter (the variable ‘i’) which gets incremented and added to multiple locations in an array:

void incrementr(uint64_t &i, uint64_t *x) {
  for (int k = 0; k < SIZE; k++) {
    x[k] += i++;
  }
}

We can write an almost entirely equivalent function without using a reference. We just take the counter by value, modify it, and then return it at the end of the function:

int increment(uint64_t i, uint64_t *x) {
  for (int k = 0; k < SIZE; k++) {
    x[k] += i++;
  }
  return i;
}

I expect these two types of functions to be largely equivalent semantically, as long as the counter is assumed not to reside in the array (x). What about performance? The function call itself is different, so there might be a couple of extra move instructions in total in the pass-by-reference case in the best of cases due to calling conventions. However, compilers, like GNU GCC, produce vastly different code that go far beyond a few extra move instructions at the start and end of the function.

GNU GCC 8.3 keeps the value of the passed-by-reference value in memory throughout instead of using a register, due to an aliasing issue. Thus each time you access the counter, you need to reload it. If you modify it, you need to store the new value again. The net result is far worse performance on my Skylake processor…

cycles per value instructions per value
reference 5.9 7.0
value 1.3 3.5

The effect is quite large as you can see. My source code is available.

You can avoid the penalty by copying the passed-by-reference variable to a local variable at the start, and copying it back at end of the function. Or, if your compiler supports it, you can add the __restrict qualifier on your reference.

Other languages like Swift are likely affected as well. In Swift, you can pass an integer as an inout variable which is semantically equivalent to a reference…

func fx(_ allints: inout [Int],  _ j : inout Int) {
      for k in allints.indices {
        allints[k] = j
        j &+= 1
      }
 }

You should avoid such code if you care for performance.

Of course, these considerations are likely to be irrelevant if the function gets inlined or if the function is very inexpensive.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

9 thoughts on “Passing integers by reference can be expensive…”

  1. This effect is due to possible aliasing. The compiler can’t be sure the array and the passed in value don’t overlap (i.e., that the referenced value isn’t one of the array elements). In the case of aliasing, the semantics of the two loops are not the same, since the value i will be modified by the assignment (once) in addition to the increment (maybe it’s even UB due to double-modifcation w/o sequence point). So the compiler emits this conservative code which reloads the value each time. In addition to slower code due to the in-memory increment, optimizations like vectorization are inhibited.

    1. Possible fixes:

      1) Copy i into a local before the loop. This way the compiler knows aliasing isn’t possible since i is an unescaped local. You’ll have to copy i back at the end to maintain the semantics of incrementing k times.

      2) Put the __restrict qualifier on eitheri or x, or both. Roughly speaking, this tells the compiler the values don’t overlap. It isn’t standard in C++, but is relatively widely supported, perhaps because it is standard in C.

      3) Use different types. If the type pointed to byx is different than the type of i, most compilers use type based aliasing analysis (TBAA) to prove aliasing doesn’t occur.

      The compiler could also save you, e.g., by checking whether aliasing occurs at runtime, eg checking whether the address of i falls in the range xx + k. I’m fact, both clang and gcc do this for slightly simpler cases, where i is not incremented: if the runtime check passes, the fast loop is used, otherwise it falls back to the conservative one.

      1. Use different types.

        In my opinion, this is trickier than it sounds. Naively, I would think that ‘counter’ in the example below is of a type distinct from ‘uint64_t’… but I see the same bad optimization effect, possibly because the standard does not allow the compiler to forbid aliasing of this sort. And this says nothing about real C++ code where it is not always immediately obvious what the actual type is (due to the tendency of C++ programmers to abstract away types).

        struct counter {
            uint64_t val;
            uint32_t garbage;    
        };
        
        void incrementj(counter & i, uint64_t * x){
            for(int k = 0; k < 1000000; k++) {
               x[k] += i.val++;
            }
        }
        
  2. Does using a local copy of i, and copying the final value back to the reference make a difference? So uint64_t j = i; for (; j < size; j++) ... i = j;?

  3. You mention that Swift is likely affected by this, but it’s not. Swift enforces at a language level that inout parameters don’t alias. Usually at compile time, but with runtime precondition checks when it can’t prove it statically.

    Swift code currently has other overhead that makes this function bigger than it’s C equivalent but those are separate issues. Largely due to the fact that an array in Swift is not just a pointer to data, but is in fact a copy-on-write ref-counted type that knows its own size and capacity and can change its size to fit its elements.

    1. I make no claim regarding aliasing. I make a claim regarding performance.

      My claim is that this is going to be slower…

      func fx(_ allints: inout [Int],  _ j : inout Int) {
            for k in allints.indices {
              allints[k] = j
              j &+= 1
            }
       }
      

      than the following…

       func f(_ allints: inout [Int],  _ i : inout Int) {
            var j = i
            for k in allints.indices {
              allints[k] = j
              j &+= 1
            }
            i = j
       }
      

      Now, admittedly, there is no fundamental reason for it when the array is large. Even in C++, the compiler could add a check for overlap at the start (knowing that the loop is quite long) and optimize it away. My point in this post is that current compilers (at least some popular ones) will let you down.

      I took a few seconds to write a Swift version of the benchmark

      $ swift -O reference.swift
      ref 1.892044 ns
      hard 0.554002 ns
      

      Basically, Swift exhibits exactly the same performance characteristics as C++ with GNU GCC 8. In fact, if you convert the nanoseconds into CPU cycles, you get very nearly the same performance down to a fraction of a cycle.

      I do not doubt that Swift “doesn’t alias”, as you explain… but it does not follow that the two functions have the same performance once compiled (with optimizations).

      1. That’s quite bizarre, as inout variables are defined to be semantically exactly the same as copying the value to a local variable and copying the new value back to the original variable at the end of the function definition, so according to the rules of the language these functions should generate the exact same code.

        Do you mind if I include your benchmark in a bug report on the Swift compiler?

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax