C++ web app with Crow: early scalability results

Last year, I looked at writing small “hello world” web applications in various programming languages (Go, JavaScript, Nim…). Go, using nothing but the standard library, did well.

In these benchmarks, I am just programming an HTTP route that returns a small string (e.g., ‘hello world’). The queries come from the host itself. The intent behind such a benchmark is to measure how well a web application might scale in the best of cases. I call such a benchmark ‘simplistic’ because no real application only ever returns a short string, and you do not usually query the server from the host itself.

At the time, I wanted to compare with a C++ library, and I ended up trying the lithium framework, which scaled very well.

Jake Arkinstall pointed out that he uses Crow to build web applications in C++. So I decided to take Crow out for a spin.

My simplistic application has only a few lines:

#include "crow.h"
int main() {
  crow::SimpleApp app;
  app.loglevel(crow::LogLevel::Warning);
  CROW_ROUTE(app, "/simple")([](){
    return "Hello world";
  });
  app.port(18080).multithreaded().run();
}

This allows the server to use all available threads. You can limit the server to fewer threads by replacing multithreaded() with concurrency(32), which caps the server at (e.g.) 32 threads.
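As a minimal sketch, the last line of the program above would then become the following (32 is an arbitrary cap):

// cap the server at 32 worker threads instead of using every hardware thread
app.port(18080).concurrency(32).run();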

To build it, I use a standard CMakeLists.txt file:

cmake_minimum_required(VERSION 3.15)
project(funserver CXX)
find_package(Crow REQUIRED)
add_executable(server src/server.cpp)
target_link_libraries(server Crow::Crow)

And I use a conanfile.txt to specify the dependency:

[requires]
crowcpp-crow/1.1.0
[generators]
CMakeDeps
CMakeToolchain

That is all. Then, to build and run the server, I issue the following commands in a shell:

conan profile detect --force
conan install . --output-folder=build --build=missing
cmake -B build -DCMAKE_TOOLCHAIN_FILE=conan_toolchain.cmake -DCMAKE_BUILD_TYPE=Release
cmake --build build
./build/server

I assume that you have a C++ compiler, Conan and CMake installed, but these are standard tools.
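If Conan is missing, one common way to get it (an assumption about your setup, not the only option) is through pip:

pip install conan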

After issuing the build commands above, my server is running. I use bombardier to hammer the server with requests. On a Linux server with many processors (two Intel Xeon Gold 6338 CPUs, each made of 32 cores) and much memory, I try increasing the number of simultaneous requests and look for errors. As the number of simultaneous queries increases, the system has to sustain both a high number of requests as well as the processing done by the server. You can run my benchmark from my source code and instructions. Your numbers will differ.
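For concreteness, a single run at a given level of concurrency looks roughly like the following, where (if I recall bombardier’s flags correctly) -c sets the number of simultaneous connections and -d the duration:

bombardier -c 1000 -d 30s http://localhost:18080/simple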

simultaneous queries    requests/s    errors (%)
10                      260k          0%
100                     315k          0%
1,000                   380k          0%
10,000                  350k          0.002%

I filed an issue with the Crow project regarding the errors. They are very uncommon and only occur under intense stress. They may or may not be the result of a bug.

My performance numbers are comparable to those of a lithium server. Let us rerun the same tests with lithium, using 64 threads, to verify:

simultaneous queries    requests/s    errors (%)
10                      90k           0%
100                     245k          0%
1,000                   275k          0%
10,000                  240k          0%

Though lithium does not cause any errors even at a high number of queries, it has trouble shutting down after being stressed with 10,000 simultaneous queries.

So it seems that Crow offers state-of-the-art performance. Crow offers HTTP/1.1 and WebSocket support, but it currently has no support for more recent standards (such as HTTP/2). It has a nice website.

Appendix. Andrew Johnston invited me to compare these results with a similar server written using Bun, the fast JavaScript runtime. I use the following program.

Bun.serve({
  fetch(req) {
    const url = new URL(req.url);
    if (url.pathname === "/simple") return new Response("Hello world!");
    return new Response("Should return an error!");
  },
  port: Number(Bun.env.PORT || 3000),
  hostname: Bun.env.ADDRESS || "127.0.0.1"
});
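To start the server, assuming the program above is saved as server.js (the file name is my own choice), I run the following; the PORT and ADDRESS environment variables can be set to override the defaults:

bun server.js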

I deliberately do not use multiple processes or other tuning. Using the same hardware and the same test setup, I get the following results with Bun (results will vary):

simultaneous queries    requests/s    errors (%)
10                      48k           0%
100                     53k           0%
1,000                   54k           0%
10,000                  47k           0%

The command ps huH p processid | wc -l reveals that it uses about 10 threads, but according to the author of Bun, Jarred Sumner, the server is single-threaded. I made a video to illustrate the thread usage. The additional threads are used by the garbage collector to recover memory.
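If you want to reproduce this check on Linux, the thread count can also be read directly from /proc (processid again stands for the process identifier):

grep Threads /proc/processid/status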

In any case, Bun is between 6 and 8 times slower than a pure C++ server written with Crow, in my tests. Your results will vary.

I found it interesting that Bun never caused errors, even under stress. The reason is likely that the Bun server never uses many threads, so the system is less likely to be under stress.

Please note that I am not encouraging you to use Crow instead of Bun. I am definitely encouraging you to check Bun out.

Daniel Lemire, "C++ web app with Crow: early scalability results," in Daniel Lemire's blog, April 6, 2024.


8 thoughts on “C++ web app with Crow: early scalability results”

  1. hi daniel,

    sorry to be a pain but this is not correct.

    “I have verified that the bun server is multi-threaded. The command ps huH p processid |wc -l reveals that it uses about 10 threads.”

    if you check the ps command again after a few seconds you will see there is only one thread eventually. as i said on twitter, bun uses threads for various blocking operations like accessing the file system, so these are possibly threads that were spawned when bringing the process up. JavaScriptCore, the JS engine Bun uses, also has internal threads for doing GC and JIT off the main thread.

    bun absolutely does not run a network service on multiple threads – there is a single event loop which processes non blocking events. so a single bun instance running a network service can only ever hope to saturate a single core on any system. you can ask jarred to confirm, but i’m 99.99% sure i am right here.

    to be clear, i have no affiliation with bun or any interest in promoting it. i was just using it as a counter example to show that the Crow server is not optimal for something built with C++ and, in fact, it performs worse (for me) than Bun/JavaScript for this simple hello world benchmark if you use my spawn script to spin up multiple instances of bun. I’ve updated the readme with the commands to run to see this in action.

    https://gist.github.com/billywhizz/afbef853d07abed58d47cc257a65c586#file-spawn-js

    1. > Crow server is not optimal for something built with C++

      My benchmark is too limited to draw conclusions from, but can you offer an equivalent C++ alternative that provides better performance? I will gladly run a benchmark.

      What is your favourite C++ framework that can outdo Crow?

      > if you check the ps command again after a few seconds you will see there is only one thread eventually. as i said on twitter, bun uses threads for various blocking operations like accessing the file system, so these are possibly threads that were spawned when bringing the process up. JavaScriptCore, the JS engine Bun uses, also has internal threads for doing GC and JIT off the main thread.

      At rest, Bun uses 1 or 2 threads. When you trigger multiple HTTP requests, it goes up to about 10 threads in total on my system. After the requests stop, it goes back to 1 or 2 threads.

      Of course, we could use Web Workers and other JavaScript frameworks and so on, but this far exceeds the scope of this blog post.

        1. it seems strange to me that you won’t just use the spawn.js script i provided or even just run the bombardier test for longer and use htop to visually monitor cpu usage of the process.

          instead, you go to the trouble to make a video which doesn’t actually tell us anything new and double down on the wildly incorrect claim that bun is 6x-8x slower than Crow? why?

          the threads you are seeing in the video are most likely garbage collection and JIT threads that are active when the process is busy and creating GC/JIT work to do. they will only contribute ~1-2% over the single core the main process is fully utilizing.

          1. This blog post was about Crow. You insisted that I compare it with Bun, and so I did. You proposed a setup which I did not go for in the scope of this blog post because all my tests so far have been simple hello world tests. I do simple benchmarks with out-of-the-box configurations. It is simplistic by design. If I start tuning one approach, I have to tune others. It goes well outside the scope of the blog post.

            There are serious benchmarks out there with tuning and everything. That’s not the point of this blog post, clearly. Please check the title of the blog post.

            You suggested I consider Lithium. I did: the blog post was updated. I had already considered Lithium and verified that Crow offered similar performance; I have now made this explicit.

            I have verified that Bun (but also Node.js) might use several physical threads… it seems that 10 might be typical. As you can see in the video, none of these threads have much load in terms of CPU cycles (it is also true of Crow by the way). Node.js servers might also use several threads.

            So far, I have not seen evidence that Crow has poor performance. It might be true, I don’t have the data.

  2. > If, on your system, network IO happens on a single thread, I suggest you file a bug report with Bun, as it is unexpected.

    sorry daniel, but this just isn’t correct. bun networking – like all the mainstream JS runtimes out there – is single-threaded using non-blocking sockets and event loop. if you run a single bun server process you won’t ever be able to utilize more than a single core (or a little over 1 core due to GC/JIT running on separate threads).

    if you could run the spawn.js script from my gist you will see very clearly the results for bun are completely different to what you have published here.

    in terms of C++ frameworks, lithium is very good from benchmarking i have done, but it requires profiling in order to achieve the results you can see for plaintext on techempower here
    https://www.techempower.com/benchmarks/#section=test&runid=cdec9eaf-19ea-48d2-bfa4-df15afbe3236&hw=ph&test=plaintext

    the best i have seen in running my own micro-benches are the uWebSocket (C++)/uSocket (C) implementations, which bun is actually based on.

    https://github.com/uNetworking/uWebSockets

    but, i don’t write much C++. i write mostly C and JavaScript and i have benchmarked a lot of frameworks in different languages, especially when i was working on just-js framework and making it go faster than all of them, without any threading, using non blocking io on an event loop, just like bun does.

    https://www.techempower.com/benchmarks/#hw=ph&test=composite&section=data-r21

    1. Thanks for correcting me, Andrew.

      Note that my blog post does include a comparison with lithium and, in my tests, it is not faster. I offer my source code (see blog post).
