Performance tip: constructing many non-trivial objects is slow

I started programming professionally when Java came out, right about when C++ was the “hot new thing”. Following the then-current fashion, I looked down on C and Pascal programming.

I fully embraced object-oriented programming. Everything had to be represented as an object. It was the right thing to do. Like many young (and not so young) programmers, I suffered from cargo cult programming. There was a new fashion and its advocates were convincing. I became a staunch advocate myself.

When you program in C, allocating and de-allocating dynamic resources is hard work. Thus you tend to give much consideration to how and when you are going to allocate resources. More “modern” languages like Java and C++ have freed us from such considerations. Largely, the trend has continued, with a few isolated exceptions.

Over time, I came to recognize one vicious performance anti-pattern that comes with object-oriented programming: the widespread use of small but non-trivial objects. Objects that do a non-trivial amount of work at the beginning or end of their life are “non-trivial”. For example, arrays and strings with sizes determined at runtime are non-trivial.
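In C++ terms, the distinction can be made concrete with a type trait. The sketch below is my own illustration (the `Point` type is a made-up example, not from the post): a trivially copyable object can be duplicated with a plain memory copy, while a std::string cannot, because it may have to allocate on construction and free on destruction.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// A trivial object: copying it is a plain memory copy, and its
// construction and destruction do no work.
struct Point {
  int x;
  int y;
};

// The compiler can verify triviality at compile time.
static_assert(std::is_trivially_copyable<Point>::value,
              "Point can be copied with memcpy");
// std::string may allocate on construction and free on destruction,
// so it cannot be treated as trivially copyable.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string does non-trivial work during its lifetime");
```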

The net result is that in many instances, with a few lines of code, you can generate thousands or millions of tiny non-trivial objects. For example, in JavaScript, the JSON.parse function takes an input JSON string (a single string), and maps it into lots and lots of tiny objects. Another example is the programmer who loads a file, line by line, storing each line into a new string instance, and then splits the line into dynamically allocated substrings. These programming strategies are fine as long as they are not used in performance sensitive code.

Let us run a little experiment using a fast programming language (C++). Starting from a long random string, I am going to randomly generate 100,000 small substrings, each with a random length of less than 16 bytes, using non-trivial objects (std::string). For comparison, I am going to create 100,000 identical small strings using trivial objects (C++17 std::string_view). A std::string_view is basically just a pointer into a memory region that is managed elsewhere.

How quickly can I copy these strings? I published the source code of a benchmark.

I am expressing the speed in volume per second (GB/s) where the volume is given by the sum of all of the small strings. Your results will vary, but here is what I get under GNU GCC 9 (Linux) with an AMD Rome processor:

non-trivial objects (std::string)     0.8 GB/s
trivial objects (std::string_view)    15 GB/s

The std::string implementation likely benefits from small-string optimization: strings this short can be stored inside the object itself, with no heap allocation. If you repeat the same experiment while storing your data in a std::vector instance, which lacks such an optimization and must allocate for any non-empty content, you may get catastrophic performance.

Though 0.8 GB/s may sound fast, keep in mind that it is the speed to do something trivial (copying the strings). If even the trivial parts of your software are slow, there is no hope that your whole system is going to be fast.

Furthermore, do not expect higher-level languages like Python or JavaScript to do better. The C++ performance is likely a sensible upper bound on how well the approach works.

Recommendation: In performance-sensitive code, you should avoid creating or copying thousands or millions of non-trivial objects.

Daniel Lemire, "Performance tip: constructing many non-trivial objects is slow," in Daniel Lemire's blog, August 8, 2020.

Published by Daniel Lemire, a computer science professor at the University of Quebec (TELUQ).

8 thoughts on “Performance tip: constructing many non-trivial objects is slow”

  1. Great reminder. I have also gotten a lot of speedup in parsing data strings by avoiding the use of streams. A bit of a downside is that string_view is C++17. Three years down the road, do you think C++17 is already widely adopted?

    1. Wide adoption can mean many things. In bioinformatics, I’ve found it useful to consider the default compilers installed in the field. If server lifespan is 5 years, it can take 5-6 years for a new C++ standard to be widely supported. C++14 reached that point recently, while C++17 is still some years away.

  2. This article states something obvious and basically lacks a point. just seems a cheap sneer at ‘high level’ languages. No one writes performance sensitive code in python or JavaScript.

    1. This article states something obvious

      I am sure it was obvious to you, but what makes you believe that it is obvious to everyone?

      and basically lacks a point.

      I disagree. I think that this blog post makes a clear point.

      just seems a cheap sneer at ‘high level’ languages.

      It is not. I make my point using C++. If I wanted to make Python look bad, I would have used another approach.

      No one writes performance sensitive code in python or JavaScript.

      I disagree. Python and R are the dominant languages in data science. It is entirely possible to have high-performance data processing in Python and R. Of course, the heavy lifting won’t be done in pure Python or in pure R.

  3. People just need to understand that memory allocations are costly, so if you want fast code, you want to minimize the number of “mallocs”.

    Each std::string object is creating such an alloc, that’s why if you create thousands, it gets very slow. std::string_view doesn’t allocate memory, so that’s why it’s faster to use it if you can.

    Same thing when you want to grow a std::string by appending, like += “x” multiple times: it will cause several memory reallocations. That’s why, if you know the size that you’ll end up with, it’s much faster to reserve the memory beforehand and then do the concatenations, because you’re allocating the memory only once instead of many times.
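    The commenter’s advice about reserving can be sketched in a few lines (my own illustration; the function name and sizes are arbitrary):

```cpp
#include <cassert>
#include <string>

// Build a string of n 'x' characters by repeated concatenation.
// Reserving the final size up front means at most one allocation,
// instead of a series of reallocations as the string grows.
std::string build(size_t n) {
  std::string result;
  result.reserve(n);   // one allocation, sized for the final result
  for (size_t i = 0; i < n; i++) {
    result += "x";     // appends never trigger a reallocation now
  }
  return result;
}
```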
