You sometimes want to add a string to an existing data structure. For example, the C++17 class template ‘std::optional’ can represent a string value that may or may not be present. You can copy the string into it, as the following code does:
std::string mystring;
std::optional<std::string> myoption;
myoption = mystring;
Or you can move it:
std::string mystring;
std::optional<std::string> myoption;
myoption = std::move(mystring);
In C++, when ‘moving’ a value, the program does not need to create a whole new copy of the string: the destination can take over the buffer that the source already owns. So it is often cheaper.
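One way to see the difference is to check which buffer the optional ends up holding. The sketch below is only an illustration: the helper names `copy_shares_buffer` and `move_reuses_buffer` are made up for this example, and the claim that a moved long string keeps its buffer is an assumption about mainstream standard libraries, not something the C++ standard promises.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: copies `source` into a fresh optional.
// The optional gets its own buffer, so the two data() pointers differ.
bool copy_shares_buffer(const std::string& source) {
    std::optional<std::string> option;
    option = source;                 // copy: allocates and duplicates the characters
    return option->data() == source.data();
}

// Hypothetical helper: moves `source` into a fresh optional and reports
// whether the optional now owns the buffer that `source` had before.
// Assumption: for heap-allocated (long) strings, common implementations
// hand the buffer over instead of copying the characters.
bool move_reuses_buffer(std::string source) {
    const char* before = source.data();
    std::optional<std::string> option;
    option = std::move(source);      // move: typically just transfers the pointer
    return option->data() == before;
}
```

With a string of a couple hundred characters, you would expect the first helper to return false and the second to return true. With a very short string, the second may also return false, since short strings are usually stored inside the string object and have no heap buffer to hand over.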
I wrote a little benchmark to assess the performance difference. It is a single test, but it should illustrate the point.
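For readers who want to try something similar, here is a minimal sketch of how such a measurement could be set up. It is not the benchmark used for the numbers in this post: the function names `time_copy` and `time_move` are hypothetical, it uses a single pass with `std::chrono::steady_clock`, and a serious benchmark would repeat the measurement and guard against the compiler optimizing the work away.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using OptionalStrings = std::vector<std::optional<std::string>>;

// Hypothetical harness: average nanoseconds per string when copying.
double time_copy(const std::vector<std::string>& input, OptionalStrings& output) {
    if (input.empty()) { return 0.0; }
    output.assign(input.size(), std::nullopt);  // start from empty optionals
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = input[i];                   // copy assignment
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(input.size());
}

// Hypothetical harness: average nanoseconds per string when moving.
// `input` is taken by value so the caller's strings are left untouched;
// that copy happens before the timer starts.
double time_move(std::vector<std::string> input, OptionalStrings& output) {
    if (input.empty()) { return 0.0; }
    output.assign(input.size(), std::nullopt);  // start from empty optionals
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = std::move(input[i]);        // move assignment
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(input.size());
}
```

Resetting `output` to empty optionals before each run is a deliberate choice in this sketch: assigning into an optional that already holds a string can reuse its existing capacity, which would change what is being measured.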
Firstly, for relatively long strings (a phrase or a sentence), the move is 5 times to 20 times faster.
system | copy | move |
Apple LLVM 14, M2 processor | 24 ns/string | 1.2 ns/string |
GCC 11, Intel Ice Lake | 19 ns/string | 4 ns/string |
Secondly, for short strings (a single word), the move is 1.5 times to 3 times faster but the absolute difference is small (as small as a fraction of a nanosecond). Your main concern should be with long strings.
system | copy | move |
Apple LLVM 14, M2 processor | 2.0 ns/string | 1.2 ns/string |
GCC 11, Intel Ice Lake | 7 ns/string | 2.6 ns/string |
My results illustrate that moving your sizeable data structures instead of copying them is beneficial.
But that’s not the fastest approach: the fastest approach is to just hold a pointer. Copying an address is unbeatably fast. A slightly slower approach is to use a lightweight object like a std::string_view: copying or creating a std::string_view is cheaper than doing the same with a C++ string.
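As a rough illustration of the std::string_view variant, the sketch below stores a view instead of a string. The `Record` type and the `set_label` function are hypothetical names introduced for this example. The view is just a pointer and a length, so nothing is copied, but the original string has to outlive it.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical record type used only for illustration.
struct Record {
    std::optional<std::string_view> label;  // non-owning: pointer + length
};

// Stores a view of `text` in the record. No characters are copied, so the
// caller must keep `text` alive and unmodified while the view is in use.
void set_label(Record& record, const std::string& text) {
    record.label = std::string_view(text);
}

// Reject temporaries at compile time: a view of a temporary string
// would dangle as soon as the call returns.
void set_label(Record& record, std::string&& text) = delete;
```

Deleting the rvalue overload is a design choice of this sketch: it turns the most obvious lifetime mistake, passing a temporary, into a compile error rather than a dangling view.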
> But that’s not the fastest approach: the fastest approach is to just hold a pointer.
If you’re only storing the string and not doing anything else with it, then perhaps. But if you frequently access the string at the same time as the rest of your existing data structure, then an additional pointer dereference might reduce cache efficiency and increase latency.
So, if you’re preoccupied with the cost of that string, you should probably measure your actual use case.
Interesting point.
Referencing the string by raw pointer is efficient on the initial copy but realistically only has downsides after that, for example:
* Lifetime guarantees/invariants of the original string
* Null pointer checks in all accessor/calling code
* Memory locality and cache behaviour (as already pointed out)
…so a judgment call would need to be made based on the expected usage, as Antoine mentioned.
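To make those downsides concrete, here is a small sketch of what the raw-pointer variant tends to look like in calling code. The `Entry` type and the `name_length` function are hypothetical and only meant to show the shape of the problem.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record that refers to a string owned elsewhere.
struct Entry {
    const std::string* name = nullptr;  // may be null; does not own the string
};

// Every accessor has to handle the null case, and it has to trust that the
// pointed-to string is still alive: the compiler cannot check that invariant.
std::size_t name_length(const Entry& entry) {
    if (entry.name == nullptr) {
        return 0;
    }
    return entry.name->size();  // extra dereference to reach the string object
}
```

Storing the pointer is a single word copy, but each later access pays for the null check and the indirection, and correctness depends on the owner of the string outliving every `Entry` that points to it.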
First, almost a nit: “outdata” is reused between tests, so the tests perhaps do not start from the same state. I did see a change when block-scoping “outdata”.
$ ./build/b0
short strings:
5.09394 **1.63144** 0.405696 0.60292
long strings:
13.0458 **2.9494** 0.384981 1.36331
Run with a fresh “outdata” for both tests:
$ ./build/b0
short strings:
5.21159 **1.46637** 0.344897 0.573364
long strings:
13.6949 **1.62891** 0.351345 0.788565
Had some free time, so played around with benchmarking C-with-classes style strings.
C-with-classes string benchmark on GitHub
Seems that std::string improved dramatically at some point?
Your link is not working (private content?).
One powerful trick that std::string relies upon is the short-string optimization, whereby short strings are stored directly in the string object itself, therefore avoiding any kind of heap allocation.
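If you want to check this on your own system, a sketch like the following can tell whether a given string keeps its characters inside the object. The helper name `is_stored_inline` is made up for this example, and the inline capacity (and whether the optimization is used at all) depends on the standard library implementation.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: reports whether the characters of `s` live inside
// the std::string object itself rather than in a separate heap allocation.
bool is_stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(data, object_begin) && before(data, object_end);
}
```

On the common implementations, a single short word should report true and a full sentence should report false. It also hints at why moving a short string saves little: there is no heap buffer to hand over, so the bytes are copied either way.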
Yep, was private, and is now public.