How fast can you pipe a large file to a C++ program?

Under many operating systems, you can send data from one process to another using ‘pipes’. The term ‘pipe’ is probably used by analogy with plumbing, and we often use the symbol ‘|’ to represent a pipe (it looks like a vertical pipe).

Thus, for example, you can sort a file and send it as input to another process:

sort file | dosomething

The operating system takes care of moving the data. It can be more convenient than writing the data to a file first. You can chain a long sequence of pipes, processing the data in many steps.

How efficient is it computationally?

The speed of a pipe depends on the program providing the data. So let us build a program that just outputs a lot of spaces very quickly:

  // Producer: repeatedly write a 16 kB buffer of spaces to standard output;
  // 'repeat' controls how much data is produced in total.
  constexpr size_t buflength = 16384;
  std::vector<char> buffer(buflength, ' ');
  for(size_t i = 0; i < repeat; i++) {
    std::cout.write(buffer.data(), buflength);
  }

For the receiving end, let us write a simple program that reads the data and does little else:

  // Consumer: read standard input in 16 kB chunks and count the bytes received.
  constexpr size_t cache_length = 16384;
  char cachebuffer[cache_length];
  size_t howmany = 0;
  while(std::cin) {
    std::cin.read(cachebuffer, cache_length);
    howmany += std::cin.gcount();
  }

You could play with the buffer sizes: I use relatively large buffers to minimize the pipe overhead.

I am sure you could write more efficient programs, but I believe that most software using pipes is going to be less efficient than these two programs.

I get speeds that are quite good under Linux, but rather depressing under macOS:

macOS (Big Sur, Apple M1): 0.04 GB/s
Linux (CentOS 7, AMD Rome): 2 GB/s to 6.5 GB/s

Your results will be different: please run my benchmark. It might be possible to go faster with larger inputs and larger buffers.
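
If you want to compute such throughput numbers yourself, here is a minimal sketch of how the receiving side can be timed. It mirrors the snippet above, but it is not necessarily identical to my benchmark code:

  #include <chrono>
  #include <cstddef>
  #include <iostream>

  int main() {
    constexpr size_t cache_length = 16384;
    char cachebuffer[cache_length];
    size_t howmany = 0;
    auto start = std::chrono::steady_clock::now();
    // Count how many bytes arrive on standard input.
    while (std::cin) {
      std::cin.read(cachebuffer, cache_length);
      howmany += std::cin.gcount();
    }
    auto finish = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(finish - start).count();
    // Report the throughput in (decimal) gigabytes per second.
    std::cerr << (howmany / seconds) / 1e9 << " GB/s" << std::endl;
  }

Compile both programs with optimizations (e.g., -O2) and connect them with a pipe.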

Even if the results are good under Linux, the bandwidth is not infinite. You will get better results passing data around within a single process, even if you need to copy it.

As observed by one of the readers of this blog, you can fix the performance problem under macOS by falling back on a C API:

  // Requires <unistd.h>; cachebuffer and cache_length are as in the previous program.
  size_t howmany = 0;
  ssize_t tr; // read() returns -1 on error, so use a signed type
  while((tr = read(0, cachebuffer, cache_length)) > 0) {
    howmany += tr;
  }

You lose portability, but you gain a lot of performance. I achieve a peak performance of 7 GB/s or above, which is much more comparable to the cost of copying the data within a process.
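
The writing side can be treated the same way. Below is a minimal sketch using the POSIX write() call instead of std::cout; it is not necessarily the code behind the numbers above, the repeat count is arbitrary, and it loops because write() may transfer fewer bytes than requested:

  #include <unistd.h>
  #include <vector>

  int main() {
    constexpr size_t buflength = 16384;
    constexpr size_t repeat = 1000000; // arbitrary: roughly 16 GB in total
    std::vector<char> buffer(buflength, ' ');
    for (size_t i = 0; i < repeat; i++) {
      // write() may send fewer bytes than requested, so keep going
      // until the whole buffer has been written (or an error occurs).
      size_t sent = 0;
      while (sent < buflength) {
        ssize_t ret = write(1, buffer.data() + sent, buflength - sent);
        if (ret <= 0) { return 1; }
        sent += size_t(ret);
      }
    }
    return 0;
  }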

It is not uncommon for standard C++ approaches to disappoint performance-wise.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

19 thoughts on “How fast can you pipe a large file to a C++ program?”

  1. Interestingly, if I switch from using std::cin to using fread(3) on stdin, I get speeds closer to 2.6 GB/s on my Intel MacBook Pro running Catalina. Using std::cin is extremely slow. Using read(2) instead of fread is a tad faster.

      1. I would also recommend using write to maximize write throughput in case that’s the new bottleneck (the overhead of iostream varies per platform but is almost always observably bad…)

  2. I ran your tests and was able to average ~3GBps using cpispeed, though only 0.02GBps using pipespeed. The previous poster’s comment seems appropriate.

    I threw together a quick test in Go (my language of choice) to see what kind of throughput I could get. With 4MB buffers I was seeing ~3.9GBps without cleaning my environment at all (Chrome running, etc.).

    Just for fun, I also put pv between the emitter and collectors in both your tests and mine. I chose pv because it’s a very common C-based tool that handles pipes. I saw a measurable but fairly slight drop in both benchmarks with pv in the middle. I guess that shows that pv is using one of the more efficient APIs rather than std::cin.

  3. Yes, using the system API is much faster! I did some experiments a while ago with JavaScript and you can achieve these same speeds too: https://just.billywhizz.io/blog/on-javascript-performance-02/. The problem here is that a lot of the time is being taken up by syscalls and the context switching into the kernel.

    I think it would be possible to go (much) faster if we could do something entirely in userspace with, for example, io_uring on Linux? https://unixism.net/loti/

  4. I don’t have a Mac with dev tools at hand to verify, but some versions of the C++ standard library generate very inefficient code in debug builds.
    I wonder if you would get better results by adding -O in there.

  5. At some point, I was using pipes to transfer a raw video stream from raspividyuv to my program, but the pipe throughput was too low to be processed in real time.
    So I tried replacing the pipe with a UNIX socket (replace pipe with socketpair) and the speedup was impressive: from 200 MB/s to 700 MB/s on a Raspberry Pi 3.

    Apart from the code creating the “pipe”, nothing was changed; in particular, the reading and writing code were exactly the same.

    This made me wonder: why is a socket faster than a pipe? (A sketch of the socketpair approach appears after the comments.)

  6. This is probably the C++ IO APIs showing their inadequacy. Even using Python you can probably achieve more than that (sorry, I don’t have a reproducer to submit :-)).

  7. Twenty years ago, when I was still in high school, I dabbled in competitive programming a bit. Back when g++ was still version 3.x, it was a common pitfall to use #include <iostream> for anything that involved heavy IO. Programs would literally run out of time just reading input.

    It seems that in some implementations of iostream the issue is still there. At any rate, there is so much “magic” in the C++ standard library that using fread (or, better yet, just the POSIX read()) would give much more accurate results if one is trying to measure the performance of OS pipes. (A related iostream tweak is sketched after the comments.)

  8. By the way, since you’re comparing with read(2) already, I notice that using vmsplice(2) on Linux immediately triples my results. (A vmsplice sketch appears after the comments.)

  9. This is a libc++ issue. On Ubuntu 21.04, when I compile with GCC 10.3, I get about 2.7-3.0 GB/s for both variants (cin and read). When I compile with Clang++ 11 using libstdc++, I get similar numbers. But when I compile with clang++ -stdlib=libc++ I get those 0.1 GB/s vs 2.5 GB/s numbers. So the problem is the quality of implementation (QoI) of libc++.
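
The comments above suggest several alternatives. The sketches below are illustrative only: they are neither the benchmark code from the post nor the commenters’ code, and the buffer sizes and repeat counts are arbitrary.

Regarding comment 5, replacing the pipe with a socketpair might look like this, assuming a fork-based producer and consumer (error checking and short-write handling are omitted for brevity):

  #include <sys/socket.h>
  #include <unistd.h>
  #include <vector>

  int main() {
    // Create a connected pair of UNIX-domain sockets.
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) { return 1; }
    constexpr size_t buflength = 16384;
    if (fork() == 0) {
      // Child: the producer writes spaces into one end of the pair.
      close(fds[0]);
      std::vector<char> buffer(buflength, ' ');
      for (size_t i = 0; i < 100000; i++) {
        write(fds[1], buffer.data(), buflength); // short writes ignored for brevity
      }
      close(fds[1]);
      return 0;
    }
    // Parent: the consumer reads from the other end and counts the bytes.
    close(fds[1]);
    std::vector<char> cache(buflength);
    size_t howmany = 0;
    ssize_t r;
    while ((r = read(fds[0], cache.data(), buflength)) > 0) {
      howmany += size_t(r);
    }
    close(fds[0]);
    return howmany > 0 ? 0 : 1;
  }

Regarding comments 7 and 9, a tweak that is often suggested when iostream input is slow, though it does not necessarily close the gap measured here, is to decouple the C++ streams from the C stdio layer:

  #include <iostream>

  int main() {
    // Stop synchronizing C++ iostreams with C stdio and untie cin from cout.
    // How much this helps depends on the standard-library implementation.
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);

    constexpr size_t cache_length = 16384;
    char cachebuffer[cache_length];
    size_t howmany = 0;
    while (std::cin) {
      std::cin.read(cachebuffer, cache_length);
      howmany += std::cin.gcount();
    }
    std::cerr << howmany << " bytes\n";
  }

Regarding comment 8, a producer loop built on vmsplice(2) could look like the following. This is Linux-specific, standard output must actually be a pipe, and a robust version would handle partial transfers:

  // vmsplice(2) requires _GNU_SOURCE, which g++ defines by default on Linux.
  #include <fcntl.h>
  #include <sys/uio.h>
  #include <vector>

  int main() {
    constexpr size_t buflength = 16384;
    std::vector<char> buffer(buflength, ' ');
    struct iovec iov { buffer.data(), buflength };
    for (size_t i = 0; i < 1000000; i++) { // arbitrary repeat count
      // vmsplice maps the buffer's pages into the pipe instead of copying them.
      ssize_t transferred = vmsplice(1, &iov, 1, 0);
      if (transferred < 0) { return 1; } // e.g., stdout is not a pipe
    }
    return 0;
  }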
