Appending to an std::string character-by-character: how does the capacity grow?

In C++, suppose that you append to a string one character at a time:

while(my_string.size() <= 10'000'000) {
  my_string += "a";
}

In theory, it might be possible for the C++ runtime library to implement this routine as the creation of a new string with each append: it could allocate a new memory region that contains just one extra character, and copy to the new region. It would be very slow in the worst case. Of course, the people designing the runtime libraries are aware of such potential problem. Instead of allocating memory and copying with each append, they will typically grow the memory usage in bulk. That is, every time new memory is needed, they double the memory usage (for example).

Empirically, we can measure the allocation. Starting with an empty string, we may add one character at a time. I find that GCC 12 uses capacities of size 15 × 2 k for every increasing integers k, so that the string capacities are 15, 30, 60, 120, 240, 480, 960, 1920, etc. Under macOS (LLVM 15), I get that clang doubles the capacity and add one, except for the initial doubling, so you get capacities of 22, 47, 95, 191, 383, 767, etc. So the string capacity grows by successively doubling in all cases.

If you omit the cost of writing the character, what is the cost of these allocations and copy for long strings? Assume that allocating N bytes costs you N units of work. Let us consider the GCC 12 model : they both lead to the same conclusion. To construct a string of size up to 15 × 2n, it costs you 15 + 15 × 21 + 15 × 22 + … + 15 × 2n which is a geometric series with value 15 × (2n + 1 – 1). Generally speaking, you find that this incremental doubling approach costs you no more than 2N units of work to construct a string of size N, after rounding N up to a power of two.  In computer science parlance, the complexity is linear. Insertion in a dynamic array with capacity that is expanded by a constant factor ensures that inserting an element is constant time (amortized). In common sense parlance, it scales well.

In the general case, where you replace 2 by another value (e.g., 1.5), you still get a geometric series but it is in a different basis, 1 +  b1 + b2 + … +  bn  which sums up to  (bn+1-1)/(b – 1). The ratio 2, becomes  b/(b – 1) asymptotically, so for a basis of 1.5, you get 3N units of work instead of 2N. So a smaller scaling factor b leads to more work, but it is still just a constant factor.

If I benchmark the following function for various values of ‘volume’, I get a practically constant time-per-value:

std::string my_string;
while (my_string.size() <= volume) {
  my_string += "a";
}

On my laptop, I get the following results. You can run my benchmark yourself.

volume time/entry (direct measure)
100 5.83 ns
1000 5.62 ns
10000 5.47 ns
100000 5.62 ns
1000000 5.68 ns
10000000 5.69 ns
100000000 5.80 ns

A consequence of how strings allocate memory is that you may find that many of your strings have excess capacity if you construct them by repeatedly appending characters. To save memory, you may call the method shrink_to_fit() to remove this excess capacity. If you are using a temporary string, it is not a concern since the memory is recovered immediately.

Daniel Lemire, "Appending to an std::string character-by-character: how does the capacity grow?," in Daniel Lemire's blog, October 23, 2023.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

10 thoughts on “Appending to an std::string character-by-character: how does the capacity grow?”

    1. I believe the standard doesn’t guarantee any particular behavior; methods of string generally don’t seem to specify complexity, even for push_back: https://cplusplus.com/reference/string/string/push_back/ (“generally amortized constant” suggests exponential growth but “generally” also says it’s not mandatory :D)

      FWIW conventional wisdom also says 1.5x is better than 2x because 2x growth allows no opportunity to reuse memory during reallocation, whereas 1.5x allows the peak memory consumption to be lower as subsequent reallocations can take the space of earlier smaller sizes under the right set of circumstances.

      1. I seem to remember reading somewhere once that any exponential growth less than the golden ratio (1.6180339…) has good reallocation properties. Does that sound right?

        1. Yeah I believe that’s correct. In practice 1.5x is great because not only does it avoid the risk of small over-allocation due to allocator padding going right over the limit implied by the golden ratio, but it also makes the size adjustments cheap to compute (which is a little relevant for small reallocations with a fast allocator).

          Maybe a final note is that one thing that most/all standard STL implementations get “wrong” is that they use the naive exponential formula that results in a very slow start: eg with 1.5x you get 1 => 2 => 3 => 4 => 6 => 9 => 13. This is not a big problem for strings as all implementations now use a short string optimization so the capacity starts from 16-24 and grows from there, but for std::vector (or maybe strings in languages that aren’t C++ and don’t use short string optimization like Rust?) you’ll probably want to change the curve in the beginning to be more aggressive, for example start with a small number of elements to fill a cacheline or so, or use 2x growth factor until 100-odd elements and then switch to 1.5x. This problem also goes away if the user code carefully reserves the container space, which would be a little helpful in the benchmark in this article as well.

      2. FWIW conventional wisdom also says 1.5x is better than 2x because 2x growth allows no opportunity to reuse memory during reallocation

        Isn’t that only true for very specific reallocation circumstances like the used allocator model, whether the reallocation can avoid the content move (then a higher factor is slightly better needing less reallocs) or if the allocator allows partial reusage of still held segments during realloc-moves? The latter is kinda mandatory to get to the “Golden Ratio” rule, without it you’ll always run short on older coalesced segments.

        Maybe the fact that some compilers stick with 2 here indicates, that the std allocators are not that well prepared to benefit from that.

        1. allocator allows partial reusage of still held segments during realloc-moves

          Note that C++ STL never uses realloc – all reallocations by all containers are using the new..delete interface, so they explicitly allocate a new segment, explicitly copy data into it and then free the old one. (which is unfortunate in certain circumstances of course)

          The latter is kinda mandatory to get to the “Golden Ratio” rule, without it you’ll always run short on older coalesced segments.

          I don’t think that’s true. With a naive first-fit allocation model and an allocator that immediately coalesces blocks with neighbors on free, with 1.5X growth you get the requests of progressively larger size but the sum of freed blocks grows faster than the new request, so pretty quickly a new request starts to fit into the gap in the beginning. With 1.7X growth this no longer happens, so you always have less memory in front of your current block than your next block is, so the blocks only ever move forward in memory.

          Here’s a simple Python program that simulates this:

          # container implementation details: we assume that the growth is exponential
          # and that short string buffer handles strings of smaller size
          ratio = 1.5
          block = 16

          # allocator implementation details: we assume that we're using first fit allocator
          # that magically stores block metadata externally and has no alignment restrictions
          # for simplicity we just keep track of a single live allocation
          allocoffset = 0
          allocsize = 0

          for i in range(20):
          block = int(block * ratio)
          if allocoffset >= block:
          print(f"round {i}: allocated range [{allocoffset}..{allocoffset+allocsize}), allocating {block} from the beginning")
          allocoffset = 0
          allocsize = block
          else:
          print(f"round {i}: allocated range [{allocoffset}..{allocoffset+allocsize}), allocating {block} by advancing range")
          allocoffset += allocsize
          allocsize = block

          If you run it with ratio 1.7 you’ll see that the new allocation can never fit into the space left by freeing the previous allocations, whereas with ratios smaller than golden ratio such as 1.6 and 1.5 we periodically are able to fit the allocation into the beginning – this reduces peak memory consumption in a series of allocations. Of course the practical implications will depend on the details of allocator behavior and may not match the simple model.

  1. I think something interesting to cover would be what the worst case looks like in those cases as well (is it nanoseconds, microseconds, or more?). In some applications like video games, I can imagine having a bad worst case performance could be problematic.

  2. It’s interesting to read this because I independently arrived at a “50%” technique in my own buffer module here:

    https://github.com/chkoreff/Fexl/blob/master/src/buf.c#L33

    Only difference is that I double the size up to a point, and thereafter increase it by 50%.

    I suppose I could use 50% every time, but I preferred going from 64 bytes to 128 bytes on the first growth, instead of 96 bytes. So I kept the doubling up to a point, which I set at 2^20 bytes.

  3. What would often be optimal, and is not usually done, would be to allocate (a^n – b), for some base (a) maybe 1.5, (n) increasing incrementally, and (b) a property of the allocator related to the size of the header the allocator reserves for housekeeping.

Leave a Reply

Your email address will not be published.

You may subscribe to this blog by email.