Computational overhead due to Docker under macOS

For my programming work, I tend to assume that I have a Linux environnement. That is true whether I am under Windows, under macOS or under a genuine Linux.

How do I emulate Linux wherever I go? I use docker containers. Effectively, the docker container gives me a small subsystem where everything is “as if” I was under Linux.

Containers are a trade-off. They offer a nice sandbox where your code can run, isolated from the rest of your system. However they also have lower performance. Disk and network access is slower. I expect that it is true wherever you run your containers.

However, part of your computing workload might be entirely computational. If you are training a model or filtering data, you may not be allocating memory, writing to disk or accessing the network. In such cases, my tests suggest that you have pretty much the same performance whether you are running your tasks inside a container, or outside of the container… as long as your host is Linux.

When running docker under Windows or macOS, docker must rely on a virtual machine. Under Windows, it may use VirtualBox or other solutions, depending on your configuration, whereas it appears to use Hyperkit under macOS. These virtual machines are highly efficient, but they still carry an overhead.

Let me benchmark a simple Go program that just repeatedly computes random numbers and compares them with the value 0. It prints out the result at the end.

package main

import (
        "fmt"
        "math/rand"
)

func main() {
        counter := 0
        for n := 0; n < 1000000000; n++ {
                if rand.Int63() == 0 {
                        counter += 1
                }
        }
        fmt.Printf("number of zeros: %d \n", counter)
}

It is deliberately simple. I am going to use Go 1.14 (always).

Under macOS, I get that my program takes 11.7 s to run.

$ go build -o myprogram
$ time ./myprogram
number of zeros: 0

real	0m11.911s
user	0m11.660s
sys	0m0.034s

I am ignoring the “sys” time since I only want the computational time (“user”).

Let me run the same program after starting a docker container (from an ubuntu 20.10 image):

$ go build -o mydockerprogram
$ time ./mydockerprogram
number of zeros: 0

real	0m12.038s
user	0m12.026s
sys	0m0.025s

So my program now takes 12 s, so 3% longer. Observe that my methodology is not fool-proof: I do not know that this 3% slowdown is due to the overhead incurred by docker. However, it bounds the overhead.

Let me do something fun. I am going to start the container and run my program in the container, and then shut it down.

$ time run 'bash -c "time ./mydockerprogram"'
number of zeros: 0

real	0m12.012s
user	0m12.003s
sys	0m0.008s

real	0m12.545s
user	0m0.086s
sys	0m0.041s

It now takes 0.5 s longer. That is the time it takes for me start a container, do nothing, and then shut it down. Doing it in this manner takes 8% longer than running it natively in macOS.

Of course, if you run many small jobs, the 0.5 s is going to hurt you. It may come to dominate the running time.

If you want to squeeze every ounce of computational performance out your machine, it is likely that you should avoid the docker overhead under macOS. A 3% overhead may prove to be unacceptable. However, for developing and benchmarking your code, it may well be an acceptable trade-off.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

8 thoughts on “Computational overhead due to Docker under macOS”

  1. It would be great if you’d include numbers for a Linux and a Windows host, too. And also measure overheads on IO.
    At 12 seconds run time, I’d also average multiple runs to get a better estimate and avoid cold-start effects.

  2. It would be great if you’d include numbers for a Linux and a Windows host, too.

    For Linux, there is no computational overhead as far as I know. I think I mention this in the post.

    For Windows, you have WSL as an option which complicates the analysis. And then you have WSL1 and WSL2.

    And also measure overheads on IO.

    IO is likely a much more complicated story because it comes in different forms and difference usage scenarios, and also because measuring IO is just flat out harder to do reliably, but we know the the overhead is going to be significant and, in some cases, large.

    At 12 seconds run time, I’d also average multiple runs to get a better estimate and avoid cold-start effects.

    The results are consistent and accurate (within a 1% margin of error). Note that I use a desktop (iMac). If you use a laptop, you are likely to get more noise in the measures. But on my iMac, the numbers are precisely reproducible, run to run.

  3. Docker on Windows 10 uses Hyper-V by default, though this might require Windows 10 Pro. Hyper-V is a very solid hypervisor, I would bet higher performance than VirtualBox.

    With the latest Windows 10 update, the 2004 release, Docker offers to use WSL2 as an alternative hypervisor, though I’m not sure what that means under the hood. Is it Docker on top of Linux on top of Hyper-V? I would be surprised if Microsoft built a completely different hypervisor just for WSL(2).

    I wonder how the container’s OS image impacts things. It probably doesn’t matter, but Ubuntu is enormous so I use Alpine.

Leave a Reply

Your email address will not be published. Required fields are marked *

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax