What is the memory usage of a small array in C++?

In an earlier blog post, I reported that the memory usage of a small byte array in Java (e.g., an array containing 4 bytes) was about 24 bytes. In other words: allocating small blocks of memory has substantial overhead.

What happens in C++?

To find out, I can try to allocate one million 4-byte arrays and look at the total memory usage of the process. Of course, the memory usage of the process will include some overhead unrelated to the 4-byte arrays, but we expect that such overhead will be relatively small.
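As a rough illustration of this kind of measurement (a minimal sketch, not the benchmark used for this post), one can compare the peak resident memory of the process before and after the allocations. The helper names peak_rss_bytes and measure_bytes_per_array are hypothetical, and the sketch assumes a POSIX system where getrusage is available.

```cpp
#include <cstddef>
#include <vector>
#include <sys/resource.h>  // getrusage (POSIX)

// Peak resident set size of this process, in bytes (0 if the call fails).
// ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
inline std::size_t peak_rss_bytes() {
    struct rusage usage {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return 0;
#if defined(__APPLE__)
    return static_cast<std::size_t>(usage.ru_maxrss);
#else
    return static_cast<std::size_t>(usage.ru_maxrss) * 1024;
#endif
}

// Hypothetical helper: allocates `count` arrays of `array_size` bytes
// (array_size must be at least 1) and returns the growth in peak memory
// usage divided by the number of arrays.
inline double measure_bytes_per_array(std::size_t count, std::size_t array_size) {
    // The pointer table is allocated and zero-filled before the baseline,
    // so that it is not counted against the small arrays.
    std::vector<char*> arrays(count, nullptr);
    const std::size_t before = peak_rss_bytes();
    for (std::size_t i = 0; i < count; ++i) {
        arrays[i] = new char[array_size];
        arrays[i][0] = static_cast<char>(i);  // touch the memory so it becomes resident
    }
    const std::size_t after = peak_rss_bytes();
    for (char* p : arrays) delete[] p;
    return static_cast<double>(after - before) / static_cast<double>(count);
}
```

Calling measure_bytes_per_array(1000000, 4) gives an estimate of the per-array cost. The number includes whatever else the process happened to allocate in the meantime, so it should be read as an approximation.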

From my benchmark, I get the following results…

system                    memory usage per array (in bytes)
GCC 8, Linux x86          32
LLVM 14, Apple aarch64    16

The results will vary depending on the configuration of your system, on your optimization level, and so forth.

But the lesson is that allocating four bytes (new char[4] or malloc(4)) does not use four bytes of memory… it will generally use much more.
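One way to see where the extra bytes can come from is a simplified model in which the allocator adds a per-block header, rounds up to its alignment, and enforces a minimum block size. The sketch below is only that model: the function name estimated_block_size is hypothetical, and the parameter values in the usage note are my assumptions, picked because they reproduce the two figures above, not values read out of the allocators themselves.

```cpp
#include <cstddef>

// Simplified model of a general-purpose allocator (an assumption, not any
// specific implementation): add a header, round up to a multiple of the
// alignment, then apply a minimum block size.
inline std::size_t estimated_block_size(std::size_t request,
                                        std::size_t header,
                                        std::size_t alignment,
                                        std::size_t minimum) {
    const std::size_t padded =
        (request + header + alignment - 1) / alignment * alignment;
    return padded < minimum ? minimum : padded;
}
```

With an assumed 8-byte header, 16-byte alignment and 32-byte minimum, estimated_block_size(4, 8, 16, 32) yields 32. With no header, 16-byte alignment and a 16-byte minimum, estimated_block_size(4, 0, 16, 16) yields 16. An allocator that merely aligns on 8 bytes would give estimated_block_size(4, 0, 8, 8), which is 8.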

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

7 thoughts on “What is the memory usage of a small array in C++?”

  1. Hi Daniel, I think the difference is not caused by the compilers, but by different malloc implementations, or by the same malloc on different CPU architectures/OSs.

  2. Thank you. On glibc it is even more than I would have expected. Surprising to see that Apple (NetBSD?) fares better.
    I guess with C++, a common improvement is to use Boost.Pool if you have many such tiny objects.
    https://www.boost.org/doc/libs/1_80_0/libs/pool/doc/html/boost_pool/pool/introduction.html
    Given that Java stores (and exposes) the length of the array (i.e., Java arrays are more like a struct { int32 length; byte[] data }) it does fairly well in memory overhead. I would have thought that for a pure byte[4] allocation, C can do with 4-8 bytes.

  3. It indeed depends on the implementation. I have an implementation (overriding new and delete) that aligns on 8 bytes, so one million arrays of 4 bytes would only require 8 MiB (plus a small fraction for some housekeeping).
    In theory it would be possible to write an implementation that requires even less (4 MiB), assuming that it’s ok to align 4 byte allocations on an address that’s a multiple of 4 bytes.

  4. You’re not measuring just the array, you’re measuring the array + malloc overhead + cache alignment overhead + whatever implementation overhead. Pretty common knowledge going back as far as I can remember.

    1. Daniel, you should not expect the idiomatic usage of C++, or C for that matter, to be terribly efficient.
      I find that the implementation drifts away from the standard one as the requirements grow.
      If your memory consumption and/or allocation latency hurts you, you quickly discover custom allocators, how to replace malloc with something thinner and faster, and some more things.
      I’d also recommend reading “What Every Programmer Should Know About Memory”
      😀
      Yakov

  5. The point to be made here is there is cost in size and time to small allocations. Maybe you already know this, but more than a few of our peers are foggy on the topic.

    Have you ever read code from others that contains:
    1. Heap allocation that could be static?
    2. Many small heap allocations in a much-repeated loop?
    3. One-at-a-time heap allocation of a large number of objects of a single type?

    To the performance-oriented folk – for the mental itch invoked by the above – you are welcome. 🙂

    Keep in mind that the average programmer is just that. This sort of reminder is not out of place.
