Move or copy your strings? Possible performance impacts

You sometimes want to add a string to an existing data structure. For example, the C++17 template ‘std::optional’ may be used to represent a possible string value. You may copy it there, as this code would often do…

    std::string mystring;
    std::optional<std::string> myoption;
    myoption = mystring;

Or you can move it:

    std::string mystring;
    std::optional<std::string> myoption;
    myoption = std::move(mystring);

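In C++, when you ‘move’ a string, the destination can take over the buffer of the source instead of allocating a new buffer and copying the characters into it. So a move is often cheaper. After the move, the source string is left in a valid but unspecified state, so you should not rely on its content.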
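To make the difference concrete, here is a minimal sketch that checks whether the destination ends up owning the very same character buffer as the source. The helper names `move_reuses_buffer` and `copy_reuses_buffer` are hypothetical and only serve the illustration. The check assumes a string long enough to live on the heap and the default allocator.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Returns true when moving `source` into an std::optional hands over the
// original character buffer instead of allocating and filling a new one.
// Only meaningful for strings too long for the short-string optimization.
bool move_reuses_buffer(std::string source) {
  const char *before = source.data(); // buffer address before the move
  std::optional<std::string> destination;
  destination = std::move(source); // move: the buffer changes owner
  return destination->data() == before;
}

// Same check for a copy: the destination must own a separate buffer,
// because `source` is still alive and still owns the original one.
bool copy_reuses_buffer(std::string source) {
  const char *before = source.data();
  std::optional<std::string> destination;
  destination = source; // copy: new buffer, characters duplicated
  return destination->data() == before;
}
```

Called with something like `std::string(1000, 'x')`, one would expect `move_reuses_buffer` to return true and `copy_reuses_buffer` to return false on mainstream standard libraries.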

I wrote a little benchmark to assess the performance difference. It is a single test, so it will not cover every case, but it should illustrate the point.
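A benchmark of this kind could look roughly like the following sketch. It is not the actual benchmark code: the names `copy_all`, `move_all` and `ns_per_string` are hypothetical, and the timing uses a single pass with `std::chrono::steady_clock`, whereas a careful benchmark would repeat the measurement many times.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using optional_strings = std::vector<std::optional<std::string>>;

// Copies every input string into the matching output slot.
void copy_all(const std::vector<std::string> &input, optional_strings &output) {
  output.resize(input.size());
  for (std::size_t i = 0; i < input.size(); i++) {
    output[i] = input[i];
  }
}

// Moves every input string into the matching output slot.
// The input strings are left in a valid but unspecified state.
void move_all(std::vector<std::string> &input, optional_strings &output) {
  output.resize(input.size());
  for (std::size_t i = 0; i < input.size(); i++) {
    output[i] = std::move(input[i]);
  }
}

// Times one call of `work` and returns the cost in nanoseconds per string.
template <class F> double ns_per_string(std::size_t count, F &&work) {
  auto start = std::chrono::steady_clock::now();
  work();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count() / count;
}
```

One would call it as `ns_per_string(input.size(), [&] { copy_all(input, output); })`. Whether `output` is fresh or already holds strings from a previous run can change the numbers, so it is worth keeping that consistent between the copy test and the move test.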

Firstly, for relatively long strings (a phrase or a sentence), the move is 5 times to 20 times faster.

| | copy | move |
|---|---|---|
| Apple LLVM 14, M2 processor | 24 ns/string | 1.2 ns/string |
| GCC 11, Intel Ice Lake | 19 ns/string | 4 ns/string |

Secondly, for short strings (a single word), the move is 1.5 times to 3 times faster but the absolute difference is small (as small as a fraction of a nanosecond). Your main concern should be with long strings.

| | copy | move |
|---|---|---|
| Apple LLVM 14, M2 processor | 2.0 ns/string | 1.2 ns/string |
| GCC 11, Intel Ice Lake | 7 ns/string | 2.6 ns/string |

My results illustrate that moving your sizeable data structures instead of copying them is beneficial.

But that’s not the fastest approach: the fastest approach is to just hold a pointer. Copying an address is unbeatably fast. A slightly less optimal approach is to use a lightweight object like an std::string_view: copying or creating an std::string_view is cheaper than doing the same with a C++ string.
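As a minimal sketch of these two alternatives, assuming the original string outlives whatever refers to it, one could store either its address or a view of it. The function names `hold_pointer` and `hold_view` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Stores only the address of the caller's string: nothing is allocated and
// no character is copied. The caller must keep `source` alive, at the same
// address, for as long as the result is used.
std::optional<const std::string *> hold_pointer(const std::string &source) {
  return &source;
}

// Stores a pointer to the characters plus a length. The caller must keep
// `source` alive and must not modify it in a way that reallocates or
// changes its buffer while the view is in use.
std::optional<std::string_view> hold_view(const std::string &source) {
  return std::string_view(source);
}
```

Neither function owns the characters, which is what makes them cheap. It is also what makes them fragile: if the original string is destroyed first, the stored pointer or view dangles.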

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

7 thoughts on “Move or copy your strings? Possible performance impacts”

  1. > But that’s not the fastest approach: the fastest approach is to just hold a pointer.

    If you’re only storing the string and not doing anything else with it, then perhaps. But if you frequently access the string at the same time as the rest of your existing data structure, then an additional pointer dereference might reduce cache efficiency and increase latency.

    So, if you’re preoccupied with the cost of that string, you should probably measure your actual use case.

  2. Referencing the string by raw pointer is efficient on the initial copy but realistically only has downsides after that e.g.
    * Lifetime guarantees/invariants of the original string
    * Null pointer checks in all accessor/calling code
    * Memory locality and cache behaviour (as already pointed out)

    …so a judgment call would need to be made on the use expectations as Antoine mentioned.

  3. First, almost a nit. The “outdata” is reused once, so perhaps not the same. I did see a change when block-scoping “outdata”.

    $ ./build/b0
    short strings:
    5.09394 **1.63144** 0.405696 0.60292
    long strings:
    13.0458 **2.9494** 0.384981 1.36331

    Run with “outdata” virgin for both tests.

    $ ./build/b0
    short strings:
    5.21159 **1.46637** 0.344897 0.573364
    long strings:
    13.6949 **1.62891** 0.351345 0.788565

    1. Your link is not working (private content?).

      One powerful trick that std::string relies upon is the short-string optimization, whereby short strings are stored directly in the string object itself, therefore avoiding any kind of heap allocation.
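      One way to observe this, as a rough sketch, is to test whether the character data lies within the bytes of the string object itself. The helper name `is_stored_inline` is hypothetical, and the result reflects how common standard libraries behave, not something the C++ standard promises.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true when the characters of `s` live inside the std::string
// object itself, which is what the short-string optimization does on
// common standard libraries. This observes an implementation detail.
bool is_stored_inline(const std::string &s) {
  const char *object_begin = reinterpret_cast<const char *>(&s);
  const char *object_end = object_begin + sizeof(std::string);
  std::less<const char *> before; // total order, even for unrelated pointers
  return !before(s.data(), object_begin) && before(s.data(), object_end);
}
```

      With mainstream implementations, one would expect a single word to be stored inline and a string of a few hundred characters to be stored on the heap.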
