“Hello world” is slower in C++ than in C (Linux)

A simple C program might print ‘hello world’ on screen:

#include <stdio.h>
#include <stdlib.h>

int main() {
    printf("hello world\n");
    return EXIT_SUCCESS;
}

You can write the equivalent in C++:

#include <iostream>
#include <stdlib.h>

int main() {
    std::cout << "hello world" << std::endl;
    return EXIT_SUCCESS;
}

In the recently released C++20 standard, we could use std::format instead or wrap the stream in a basic_osyncstream for thread safety, but the above code is what you’d find in most textbooks today.

How fast do these programs run? You may not care about the performance of these ‘hello world’ programs per se, but many systems rely on small C/C++ programs running specific and small tasks. Sometimes you just want to run a small program to execute a computation, process a small file and so forth.

We can check the running time using a benchmarking tool such as hyperfine. Such tools handle various factors such as shell starting time and so forth.

I do not believe that printing ‘hello world’ itself should be slower or faster in C++ compared to C, at least not significantly. What we are testing by running these programs is the overhead due to the choice of programming language when launching the program. One might argue that in C++, you can use printf (the C function), and that’s correct. You can code in C++ as if you were in C all of the time. It is not unreasonable, but we are interested in the performance when relying on conventional/textbook C++ using the standard C++ library.

Under Linux, when using the standard C++ library (libstdc++), we can ask that the C++ library be statically linked into the executable (with GCC, for example, through the -static-libstdc++ option). The result is a much larger binary executable, but it may start faster.

Hyperfine tells me that the C++ program relying on the dynamically loaded C++ library takes almost 1 ms more time than the C program.

C                0.5 ms
C++ (dynamic)    1.4 ms
C++ (static)     0.8 ms

My source code and Makefile are available. I get these numbers on Ubuntu 22.04 LTS using an AWS node (Graviton 3).

If these numbers are to be believed, there may be a significant penalty due to textbook C++ code for tiny program executions, under Linux.

Half a millisecond or more of overhead, if it is indeed correct, is a huge penalty for a tiny program like ‘hello world’. And it only happens when I use dynamic loading of the C++ library: the cost is much less when using a statically linked C++ library.

It seems that loading the C++ library dynamically is adding a significant cost of up to 1 ms. We might check for a few additional confounding factors proposed by my readers.

  1. The C compiler might not call the printf function, and might call the simpler puts function instead: we can fool the compiler into calling printf with the syntax printf("hello %s\n", "world"): it makes no measurable difference in our tests.
  2. If we compile the C function using a C++ compiler, the problem disappears, as you would hope, and we match the speed of the C program.
  3. Replacing "hello world" << std::endl; with "hello world\n"; does not seem to affect the performance in these experiments. The C++ program remains much slower.
  4. Adding std::ios_base::sync_with_stdio(false); before using std::cout also appears to make no difference. The C++ program remains much slower.
C (non trivial printf)                 0.5 ms
C++ (using printf)                     0.5 ms
C++ (std::endl replaced by \n)         1.4 ms
C++ (sync_with_stdio set to false)     1.4 ms

Thus we have every indication that dynamically loading the C++ standard library takes a lot of time, certainly hundreds of extra microseconds. It may be a one-time cost but if your programs are small, it can dominate the running time. Statically linking the C++ library helps, but it also creates larger binaries. You may somewhat reduce the size overhead with appropriate link-time flags such as --gc-sections, but a significant overhead remains in my tests.

Note: This blog post has been edited to answer the multiple comments suggesting confounding factors, other than standard library loading, that the original blog post did not consider. I thank my readers for their proposals.

Appendix 1 We can measure precisely the loading time by prefixing the command that runs the program with LD_DEBUG=statistics (thanks to Grégory Pakosz for the hint). The C++ code requires more cycles. If we use LD_DEBUG=all (e.g., LD_DEBUG=all ./hellocpp), then we observe that the C++ version does much more work (more version checks, more relocations, many more initializers and finalizers). In the comments, Sam Mason blames dynamic linking: on his machine he gets the following result…

C code that dynamically links to libc takes ~240µs, which goes down to ~150µs when statically linked. A fully dynamic C++ build takes ~800µs, while a fully static C++ build is only ~190µs.

Appendix 2 We can try to use sampling-based profiling to find out where the program spends its time. Calling perf record/perf report is not terribly useful on my system. Some readers report that profiling in this manner points the finger at locale initialization. I get a much more useful profile with valgrind --tool=callgrind command && callgrind_annotate. The results are consistent with the theory that loading the C++ library dynamically is relatively expensive.

Appendix 3 It might get better with GCC 13 which reduces the overhead of the iostream header.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

82 thoughts on ““Hello world” is slower in C++ than in C (Linux)”

  1. I am certainly not an expert in C++. However, if I remember correctly, std::endl is a lot slower than using \n. Of course, you may need to use std::endl. I wonder how the benchmark changes when using \n?

  2. This isn’t exactly news. The C++ specific printing facilities are known to be less efficient than plain old printf(), and have been known to be slower for decades.

  3. Remove the std::endl and put the \n in the string like the C version, and it should go faster…

      1. Well that’s a false meme, associative thinking. `endl` just causes a call of `flush`. At some point before end of `main` the stream is flushed anyway, so, net win = one function call and check.

    1. No…I think that’s fake news… I’ve heard a lot of people say that std::endl is a new line with a flush, but that either isn’t exactly true or at least implementation defined.

  4. I try to stick with C and use macros if needed to enhance the language. There is a thing for sticking with simplicity. C++ is too complicated and bloated. OOP is OK but I much prefer functional programming using just functions.

    1. This code is a really bad comparison.

      This gives the idea that C++ is bloated and slower (it is not, actually it is faster in real code than C).

      And then you have people like this coding in the stone age justified with memes.

    2. When problems are large or complex, the OO C++ features simplify your code to a very large extent.

      Functions are fine, but associating them with the proper data is cumbersome in C, simple and scalable in C++, based on Classes, their extensions or generalization, their relationships, and their instances. Abstraction is the reason why C++ was created, and it delivers that, hence the power and simplicity of its code.

      Real “Functional Programming” isn’t supported by languages as basic as C. Consider exploring languages that are built for Functional Programming, they would give you more power in a world you already like.

  5. Isn’t this, to some extent, testing the streaming IO part of the STL in C++, instead of the language itself? For what it’s worth, std::cout and std::endl probably do more (like flushing the buffer) than printf under the hood, which could potentially account for the 1 ms increase in execution time.

    1. It is a well established fact that C++ does not provide a zero overhead abstraction unfortunately.
      Note that many features of C++ in fact do provide (+-) zero overhead abstractions.

    2. I think a fair comparison would be to do like so:

      #include <cstdlib>
      #include <iostream>

      int main() {
          std::ios_base::sync_with_stdio(false);
          std::cout << "hello world\n";
          return EXIT_SUCCESS;
      }

      Can you try the benchmark with this C++ implementation?

  6. I have a concern about your conclusion here — not that it’s necessarily wrong, but that this test is incomplete. Specifically, this test does nothing to differentiate between execution time and function call time.

    If we’re looking at 1 ms overhead every time you print to console, I’ll grant that’s significant. But if we’re looking at 1 ms per execution? I can’t rightfully agree with your conclusion that this is significant. Yes, granted, we’re talking about a 200% increase in the execution time for Hello World, but in 2022, I cannot think of a real-world situation where anyone would be executing hello-world equivalent software with such frequency that it creates a cpu bottleneck. Not even in the embedded space.

    I haven’t tested it yet (I might), but my guess is the performance difference you’re seeing takes place in loading the module, and if you were to print to console 10,000 or 100,000 times per execution, you’d still be looking at about a 1 ms difference per execution. I’m basing this guess on the fact that we’re seeing such a significant performance increase in the statically linked c++ version and the knowledge that in a Linux environment, there’s some decent chance that stdio.h is preloaded in memory while iostream is not.

    Obviously, my hunches are not data, and more testing is required before we draw any conclusions here.

    The other question I have is whether you’re running hyperfine with the -N flag. Without it, on processes this short, it’s kicking the following warning at me:

    Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

    Which seems potentially relevant.

    I might be back later with followup results.

  7. Try removing stdlib in both programs. Return 0 instead. Also use \n in the cpp program instead of endl. Would be interested in seeing the results of that

  8. There is a difference in your C++ code as opposed to C code, and that is the std::endl statement, which flushes stdout. There is no flushing in the C code. For the code to be equivalent, the C++ statement should be
    std::cout << "hello world\n";

  9. I’m not a professional C or C++ dev but I still remember a few basics from the time I studied physics at my local university (we had C/C++ lectures).

    Both endl and cout have side effects. You compare two pieces of code that don’t do the same thing. You should not expect them to run equally fast.

    There are ways to reduce the side effects like NOT using endl or using ios_base::sync_with_stdio(false).

    https://godbolt.org/ helps a lot if you want to know more details.

    1. In your updated C++ code (multi_hello.cpp), you should also replace std::endl with “\n” as previously suggested here. I suspect this may have a much larger impact on the results due to flushing after each print for 30000 iterations.
      Interested in seeing updated results!

  10. Hello Mr. Lemire,

    IMHO, the comparison of those two snippets isn’t very fair, as the C++ code does a bit more than the C code:

    Streaming std::endl does not only stream a ‘\n’, it also flushes the stream (https://en.cppreference.com/w/cpp/io/manip/endl).

    To make the two programs more comparable, you should either replace the C++ streaming with
    std::cout << "hello world\n";
    or add a
    fflush(stdout);
    to the C program.

    In my tests, both hellocppstatic and hellocppfullstatic were faster than helloc, with both of these changes, hellocpp was slower. However, as my machine wasn't completely idle, these results may be inaccurate.

    But let's go a step ahead:
    If you omit the printf / flush / cout streaming, just leaving the "return EXIT_SUCCESS" (and the includes), the C++ program will most probably be slower. This is because of the static initialization of std::ios_base (std::ios_base::Init::Init() gets called on program startup as soon as <iostream> gets included).
    It’d be interesting to see the results after removing this include, as the object code of the hello.c and hello.cpp should be totally equal.

    Best regards
    – Mark

    1. “This is because of the static initialization of std::ios_base (std::ios_base::Init::Init() gets called on program startup as soon as <iostream> gets included)”

      This. Static initialization and destruction happen as soon as the iostream header is included, even if it is never used. Using stdio.h and printf instead of iostream gives you exactly the same assembly in both languages. Latest GCC release output:

      .LC0:
      .string "Hello world"
      main:
      sub rsp, 8
      mov edi, OFFSET FLAT:.LC0
      xor eax, eax
      call printf
      xor eax, eax
      add rsp, 8
      ret

      But yeah, overall I think this is a good example that all the features in C++ over C are not here for free. You have to understand how your libraries, your code (of course ;) and sometimes even your compilers work, if optimizing CPU usage is your top priority.

  11. This is such a beautiful example of measuring something, and yet understanding almost nothing about what they mean, I shall be using this as an example for our new starters on the pitfalls of premature optimisation and the importance of meaningful test structures and data.

  12. This is based on biased info from decades ago.

    99.99999% of C++ programs used for professional applications in this world do not use standard out (or err) to convey runtime status.

    C++ apps are easier to develop than C, have more rich features, so I’m not sure what you are driving at.

    Oh, C++ apps are oftentimes deployed in embedded (or server) environments…. where there is definitely no I/O to a terminal.

    I suspect this article was written by a troll.

    1. The blog post is specifically about “hello world”.

      If you mean to refer to large programs, then I agree, but it happens often enough that we have to run small commands that only do a few microseconds of work.

  13. The C++ program is doing more work than the C program.

    You should avoid using `std::endl` unless you specifically intend to flush the buffers explicitly. There’s nothing wrong with using a simple newline character.

    But also, IO streams are known to be measurably slower than printf. Especially since it has hidden global constructors and destructors.

    std::format is the new modern way to write formatted strings.

    So, it’s not really that “hello world is slower in C++”, it’s that the methods that you’ve chosen to perform the task in C++ are by nature slower (but offer better type safety and internationalization capabilities).

    For the simple task of printing “hello world”, honestly you should just use puts.

  14. << std::endl inserts a newline AND flushes the stdout buffer, which I don't believe printf() does.
    It would be interesting to see the comparison without << std::endl, since flushing the buffer is a relatively costly operation, it should give you a better apples to apples comparison. I'm no expert though.

  15. This is not accurate: endl also includes a flush, which is no longer necessary in C++, and adds unnecessary time. You could have just as easily used “\n” in the C++ version the same way you did in the C version.

  16. iostreams are not a minor bit of infrastructure.

    If you want to compare program startup time, use printf in the C++ version as well.

    You should be able to look at the assembly output to make a good comparison. That’s a better view of what’s happening and why

  17. There is no difference between std::endl and ‘\n’ because std::cout is flushed at the end of the application.

  18. IMHO it is all about linking with libstdc++. In the first version of the code I only replaced the std::cout… line with the printf line from the C version (without changing includes or linking directives) and the results for C++ did not change on my computer.

    I ran a perf record/report on that version and unlike C, at least 30% time was being lost on locale functionality. My guess is not linking to libstdc++ removes underlying C++ locale functionality from printf.

    Measurements were on my 10 year old machine.

    I wonder what will change if we link with/to clang/libc++ though.

  19. Hello Lemire,

    in C++, operations are synchronized to the standard C streams after each input/output.

    According to cppreference (https://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio), sync_with_stdio may reduce the penalty:


    If the synchronization is turned off, the C++ standard streams are allowed to buffer their I/O independently, which may be considerably faster in some cases.


    std::ios::sync_with_stdio(false);
    std::cout << "hello world" << std::endl;

  20. I’m glad neither I, nor my children, attended the University of Quebec if this is how professors spend their time. You conclude that:

    “.. if these numbers are to be believed, there may a significant penalty due to textbook C++ code for tiny program executions, under Linux.”

    then, in a later comment response, state:

    “The blog post is specifically about “hello world”.”

    If it’s the latter, then the former conclusion is invalid. You cannot infer that tiny programs under Linux will perform slower, using C++ rather than C, on the basis of a one line example where the method used is different.

    There are multiple comments addressing the specifics of the differences, and reasons for them, but, if I were you, I’d take this blog post down as it makes you look foolish.

  21. I have a lot of respect for your work, so this blog post is quite baffling & saddening. What exactly are you getting at or aiming for?

    ”there may a significant penalty due to textbook C++ code for tiny program, under Linux.”

    -BS, & you’re comparing apples to oranges. Read up on what cout actually does. Is your printf thread safe? (You can turn off sync_with_stdio for the std streams if you want that monster to be faster.) std::printf is also maybe worth mentioning.

  22. This code doesn’t show C++ being slower than C.

    Rather, this is “iostream with stdio sync on printing two strings” being slower than “printf for the trivial case of a string”. No news here.

  23. Everyone else has already mentioned how flawed this is.

    But a better test would be to compare two computationally intensive algorithms or generics, written properly in each language.

    1. Thanks wqw! I was aware of sync_with_stdio but I’ve never seen tie before.

      It’s always a pleasure to learn something I could use someday 🙂

  24. If all you write is hello world then all you need is C.

    Only that we are not in 1992. C is quite useless for user mode apps nowadays and no one creates console apps except Linux freaks that have nothing else to write.

  25. This is a micro-benchmark that illustrates a simple point. I do not believe Daniel is going after any massive generalizations.

    Oh. And all the comments about flushing the I/O buffer … a moment of thought should have told you the examples were equivalent. While it has been a couple of decades since I dug into runtime libraries, I am pretty sure every runtime must flush buffers on program exit.

    Put differently…
    Did you see the output?
    Then the runtime library flushed output buffers on exit.

    Yes, loading dynamic libraries is more expensive. Often this does not matter, but sometimes it can be significant. There is or should be a savings in memory used (across multiple programs using the same libraries), and this can sometimes be significant.

    The savings from shared dynamic libraries was critical in the Win16 era, and for some time after. In present many-gigabyte machines, rather less so. (In this century, have tended to use static libraries more often than dynamic.)

    The C printf() and stdio library was honed decades ago on much leaner machines, and (as you might expect) is lean and efficient. If you dig back into the USENET archives, you can find a period (late-1980s / early 90s?) where there was a bit of a public competition to see who could come up with the leanest stdio library. That code ended up in compiler runtime libraries, and I strongly suspect survives to the present (and offers examples of hyper-optimization).

    The C++ standard streams library arrived on fatter machines, and never received such attention (in part as you can use C stdio).

    Daniel’s experiment matches well with history.

    1. “This is a micro-benchmark that illustrates a simple point. I do not believe Daniel is going after any massive generalizations.”

      With respect, the claim was made in the article that “.. if these numbers are to be believed, there may a significant penalty due to textbook C++ code for tiny program executions, under Linux.”
      Disregarding the strict meaning of ‘may’ – which would make the whole statement a semantic null – this is (IMO) quite a massive over generalization. It is a micro-benchmark, and a poorly considered and written one at that, and there is effectively no meaningful generalization at all possible from it – as has been pointed out by many in these comments. That the author subsequently states that this was specifically about “hello world” is not properly reflected in the main text, even now.

      It also seems to expose a deep lack of knowledge not only of what the programs are doing and what the objects and functions are designed for (and their benefits and deficits), but also of C++.
      I’ve seen renderers and whole micro-kernels constexpr’d – which is harder to do in C – and could result in enormous performance benefits, but that’s not the point, nor is it necessarily a reason to choose one language over the other. They were particular implementations, for particular purposes, but they do demonstrate aspects of a language that could be useful in many situations, and which should not be over generalized from. This is the most egregious issue for me, the apparent attempt to classify language suitability on a frankly meaningless code snippet which is hardly an example of any useful real-world program – this is something we try to stamp out from even the newest of starters, and from a professor of computer science seems quite ridiculous to me. YMMV obviously.

      1. Can you do a test to show that for small programs similar to “Hello World” but not necessarily the same C++ runs as fast as C if not faster? This would settle the issue. Wouldn’t it?

        1. At an extreme, you could try something like this
          https://onecompiler.com/cpp/3wdmzd9js
          (or try Googling for ‘constexpr fibonacci’)
          Re-working that as C should give an indication of what can be done, but – like the article – it’s really missing the point, and I could probably equally well make a C++ version that is far worse **.

          One of the main reasons that individual, micro-benchmarks like this aren’t useful for answering questions like “is language X faster than language Y ?” is that the question is completely meaningless.

          What we can do is ask, for my particular problem space *and* my typical data sets / operating conditions – what would a good choice of language and strategy be? If, for example, you were designing an ultra-high speed / low latency peer-to-peer message passing system, you probably wouldn’t choose Python. However if you wanted to implement a simple peer-to-peer client-server application then Python, with its interpreted nature and rich library support, would make such a thing relatively trivial. It’s exactly these sort of evaluations that should be driven into programmers, and first year computer science students in Quebec and elsewhere, from day one.

          Using massively simplified, and atypical, noddy code fragments -especially when naively implemented – is not really helpful or instructive, and mainly serves to teach people bad coding practice and poor performance analysis techniques IMO.

          ** These are poor Fibonacci number generators, so don’t use them in any performance sensitive regime, they are just an example 🙂

    2. The C printf() and stdio library was honed decades ago on much leaner machines,

      It is still being honed. Memchr for instance uses sse2 instructions on x86-64 machines. These instructions were available only long after both c and c++ gained widespread adoption. Memchr beats std::find
      https://gms.tf/stdfind-and-memchr-optimizations.html

      Glibc is much more optimized than libstdc++ simply because it is much smaller, and therefore developers can devote more time to optimization.

      The truth is that abstractions come at a cost of complexity and size which makes it harder to optimize. “Zero-cost abstractions” may be true in a few cases, but there will always be cases that are too hard or time-consuming to look into. It is a simple matter of tradeoffs.

  26. You didn’t provide what compiler you used. I assume it was gcc. In gcc “printf” is one of the built-in functions. This means there is no library involved at all (neither dynamic nor static). It’s practically part of the language and the #include is just for syntax reasons.
    I didn’t read the internals but I assume that gcc doesn’t call a classical printf at all but optimises it on the compiler level, e.g. do the formatting at compile-time and use the ‘write’ syscall directly.

  27. This program will run faster than any C program written using the standard qsort library function:

    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    using namespace ::std;

    // Integer power by repeated squaring, usable at compile time.
    constexpr int ipow(int base, int exp)
    {
        if (exp == 0) {
            return 1;
        } else if (exp < 0) {
            return 0;
        } else if (exp % 2) {
            return base * ipow(base, exp - 1);
        } else {
            return ipow(base * base, exp / 2);
        }
    }

    int main()
    {
        vector<int> foo(ipow(2,30));
        random_device rd; //Will be used to obtain a seed for the random number engine
        mt19937 gen(rd()); //Standard mersenne_twister_engine seeded with rd()
        uniform_int_distribution<int> dis;
        generate(foo.begin(), foo.end(),
                 [&gen, &dis]() {
                     return dis(gen);
                 });
        cerr << "Sorting.\n";
        sort(foo.begin(), foo.end());
        return is_sorted(foo.begin(), foo.end()) ? 0 : 1;
    }

    1. It is true that C++ has many advantages over C as far as algorithmic implementation goes.

      Your program allocates gigabytes of memory and sorts it. If you reduce the task to sorting 12 numbers, the answer might be different, and that’s the motivation of my blog post.

      1. It isn’t that the implementation of sort is better in C++, it’s that you can’t reasonably make a version of the qsort function in C that runs faster than sort in C++.

        And this is because the sort function in C++ is a template, and the compiler essentially writes you a custom one for the data structure and comparison function you’re using in which the comparison and swap functions are inlined into sort and then subjected to aggressive optimizations.

        Making this happen in C would require macro magic of the highest order, and even then would probably be a huge pain to use correctly.

        Your “Hello world” case reads like a general criticism of C++, when I strongly suspect that C++ is faster than C in most cases because of things like I just mentioned. So, it seems like a criticism that’s narrowly tailored to make a point that I don’t think is particularly accurate.

  28. After an extended play with this I would say that the library/flushing issues that most commenters raise aren’t anything to worry about. All the significant differences in timings seem to be due to the dynamic linker.

    C code that dynamically links to libc takes ~240µs, which goes down to ~150µs when statically linked. A fully dynamic C++ build takes ~800µs, while a fully static C++ build is only ~190µs. Across all of these, the difference between printing one “hello world” vs 1000 is only ~20µs.

    Getting good timings was the hardest thing here! Code/analysis are in:

    https://github.com/smason/lemire-hello

  29. Sorry to post more than one comment. I would go back and edit my original if I could….

    The reason that C++ is taking longer here is that the runtime environment of C++ is more complex. Not a LOT more complex, but, it is more complex. C++ has global constructors and destructors that need to be executed on program startup and shutdown. Additionally, the runtime needs to track which global destructors need to be called because (in the case, for example, of local ‘static’ variables inside functions) which ones need to be called can only be determined at runtime. This requires a global data structure that’s initialized on program startup and scanned on program shutdown.

    Additionally, there will be some overhead required to set up the exception handler of last resort.

    I have a hello world written in C++ that will execute faster than C, but it requires passing lots of compiler options to turn off the compiler’s setup of the C++ runtime environment. It would be possible to duplicate this program in C, but it would be challenging, especially with the quality of error handling it’s possible to achieve using my library:

    My library: https://osdn.net/projects/posixpp/scm/hg/posixpp
    (Github mirror): https://osdn.net/projects/posixpp/scm/hg/posixpp

    Link to hello world program written using my library: https://osdn.net/projects/posixpp/scm/hg/posixpp/blobs/tip/examples/helloworld.cpp

  30. I guess my response would be “duh”. With C you have very little object code and a single call to the printf function in a statically loaded library. With C++ you have substantial startup overhead loading the iostream library and all the modules it depends on. Address allocation takes time and it’s going to prepare for all the possible dynamic libraries you might load as well.

  31. Iostreams are known to have performance issues so this isn’t earth shattering news. I’m glad you mention std::format. libfmt which that evolved from has a function called fmt::print…I guarantee fmt::print(“hello world\n”); will not just be as fast as printf, but faster. Especially if there’s a lot of formatting to be done. This is because it can do some of the formatting work at compile time. And it’s typesafe, so no having to worry and remember the gazillion printf variations. It’s freaking amazing. The print function didn’t make it to c++20, but I believe it’s being pushed for in c++23.

  32. The title should be “C++ streams are slower than printf” which is a known fact as streams favor versatility over performance. Streams are significantly different to print functions since formatting is stored as state within the stream object and takes time to construct and destruct once for the program lifetime.

  33. How about a test where you simply output the exact same code compiled in c and c++ …. this will account for the apple to orange comparison…

    also store the time before and after in μSec or a GetTickCount() before and after the call….. and output this data at the end….. this will account for startup lib / runtime difference….

  34. I suggest you use libfmt instead. It is safer and faster. Also, note that the newer C++ standard has replaced iostreams with better alternatives. If you are micro-optimizing, you should consider these details.

  35. It would be more accurate if you printed several thousand lines in a loop. The execution time of printing a single line could easily be confused with loading and startup time. Then, there is also the flushing. You want to make sure that you are flushing the same number of times.
    Lastly, given the object oriented nature of C++, it would make sense to turn on optimisations.

      1. As long as you’re not trying to measure the performance the languages can offer under GCC, that might be adequate. If you’re wanting to try to replicate what typical release production code would do, then that’s probably not (partially depending on the functionality being used).

  36. You could throw in some dirty inline assembly lines involving kernel syscall to improve performance.

  37. Benchmarking such a small time can be tricky, if only because of the various caches.

    It is not a good test either, as a 1 ms difference (in 1 run?) for doing no processing at all with data has no significance, nor is it a reason to use language A rather than B.

    No one in the world would be interested in investing to save 1 ms for a program that is doing nothing, because it solves no problem (and actually creates one 😂).

    If you want a real comparison of some small routines (the one used for this test isn’t one, but it could be used) and you can’t profile them, the way to go is to look at the generated assembly code.

    Then, on that basis, one can do an analysis, test, benchmark and write some conclusions.

    Besides, focusing on a real problem could help to do this kind of C/C++ comparison better.

  38. One small improvement on this test could be to get the “net weight” by computing the “tare”:

    Run both programs with an empty main.

    Then run hello world, and subtract to get the net execution time of hello world.

    Here the tricky part is: should the include statement be present or not?

    Besides, the expectation should be that the two empty programs take the same time to run;
    otherwise it implies that hello world itself isn’t faster or slower, but that there is some bootstrap overhead.

    Anyway.
    I enjoyed the post.
    Thanks

  39. As the post has been significantly rewritten, the emphasis of the slowdown altered, and a number of the criticisms folded into the text as original text, it might be nice to address more of that in replies/updates to the comments and/or acknowledge it in the new article text as appropriate.
    This has been done in a couple of cases, but not all, and this puts the revised text at odds with the historical comments.

    The issue of relevance and suitability of the micro-benchmark as-is is not really dealt with either (e.g. if the absolute time was important you would profile, adjust, iterate – if it’s not important, it’s not important), but that’s another matter.

    1. Thanks. Unfortunately people repeatedly proposed alternative explanations, without running benchmarks themselves nor accounting for the critical point that my post makes: the speed dramatically increased after statically linking. I have added a paragraph to acknowledge these additions as you suggested. I do thank the various readers for their proposals but I am not going to answer point-by-point dozens of closely related comments.

      Regarding the relevance of the benchmark, I have explained and re-explained it at length. For long running processes, the issue has always been irrelevant, but if you have short running processes (executing in about a millisecond or less), then you may be spending most of your time loading the standard library. You may not care, of course… but it is useful to be aware. There are solutions such as static linking, but there are tradeoffs.

  40. What this shows (and really all this shows) is that the C++ library is a lot bigger than the C library. If you are not using the capability it provides, it is costing you performance.

    On the other hand, if you have significant work on a complex problem, you can have better performance because the library and the language provide facilities that would be difficult and expensive to write in C.
