For my programming work, I tend to assume that I have a Linux environment. That is true whether I am under Windows, under macOS or under genuine Linux.
How do I emulate Linux wherever I go? I use docker containers. Effectively, a docker container gives me a small subsystem where everything behaves as if I were under Linux.
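For instance, assuming docker is installed, a single command like the following (the image tag is just an example) gets me a disposable Linux shell:

$ docker run -it --rm ubuntu:20.10 bash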
Containers are a trade-off. They offer a nice sandbox where your code can run, isolated from the rest of your system. However, they also come with a performance penalty: disk and network access is slower. I expect that to be true wherever you run your containers.
However, part of your computing workload might be entirely computational. If you are training a model or filtering data, you may not be allocating memory, writing to disk or accessing the network. In such cases, my tests suggest that you have pretty much the same performance whether you are running your tasks inside a container, or outside of the container… as long as your host is Linux.
When running docker under Windows or macOS, docker must rely on a virtual machine. Under Windows, it may use VirtualBox or other solutions, depending on your configuration, whereas it appears to use HyperKit under macOS. These virtual machines are highly efficient, but they still carry an overhead.
Let me benchmark a simple Go program that just repeatedly computes random numbers and compares them with the value 0. It prints out the result at the end.
package main

import (
    "fmt"
    "math/rand"
)

func main() {
    counter := 0
    for n := 0; n < 1000000000; n++ {
        if rand.Int63() == 0 {
            counter += 1
        }
    }
    fmt.Printf("number of zeros: %d \n", counter)
}
It is deliberately simple. Note that rand.Int63() returns a non-negative 63-bit integer, so the odds of drawing a zero even once in a billion attempts are negligible: the expected output is zero. I am going to use Go 1.14 throughout.
Under macOS, I get that my program takes 11.7 s to run.
$ go build -o myprogram
$ time ./myprogram
number of zeros: 0
real 0m11.911s
user 0m11.660s
sys 0m0.034s
I am ignoring the “sys” time since I only want the computational time (“user”).
Let me run the same program after starting a docker container (from an ubuntu 20.10 image):
$ go build -o mydockerprogram
$ time ./mydockerprogram
number of zeros: 0
real 0m12.038s
user 0m12.026s
sys 0m0.025s
My program now takes 12 s, about 3% longer. Observe that my methodology is not fool-proof: I do not know that this 3% slowdown is due to the overhead incurred by docker. However, it gives an upper bound on that overhead.
Let me do something fun. I am going to start the container and run my program in the container, and then shut it down.
$ time run 'bash -c "time ./mydockerprogram"'
number of zeros: 0

real 0m12.012s
user 0m12.003s
sys 0m0.008s

real 0m12.545s
user 0m0.086s
sys 0m0.041s
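The run command above is a small shell helper; I do not spell out its definition, but something along these lines behaves equivalently (the image name and mount path are illustrative assumptions):

run() {
  # start a fresh container with the working directory mounted,
  # execute the given command inside it, then remove the container
  docker run --rm -v "$(pwd)":/work -w /work ubuntu:20.10 sh -c "$1"
}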
It now takes 0.5 s longer. That is the time it takes for me to start a container, do nothing, and then shut it down. Doing it in this manner takes 8% longer than running the program natively under macOS.
Of course, if you run many small jobs, the 0.5 s is going to hurt you. It may come to dominate the running time.
If you want to squeeze every ounce of computational performance out of your machine, you should probably avoid the docker overhead under macOS. A 3% overhead may prove to be unacceptable. However, for developing and benchmarking your code, it may well be an acceptable trade-off.
It would be great if you’d include numbers for a Linux and a Windows host, too. And also measure the IO overhead.
At 12 seconds run time, I’d also average multiple runs to get a better estimate and avoid cold-start effects.
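Even a quick-and-dirty shell loop would give a feel for the variance (a tool like hyperfine automates this properly):

$ for i in 1 2 3 4 5; do time ./myprogram; done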
For Linux, there is no computational overhead as far as I know. I think I mention this in the post.
For Windows, you have WSL as an option which complicates the analysis. And then you have WSL1 and WSL2.
IO is likely a much more complicated story because it comes in different forms and different usage scenarios, and also because measuring IO is just flat out harder to do reliably, but we know that the overhead is going to be significant and, in some cases, large.
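If you want a crude data point anyway, you can compare something like the following naive sequential write inside and outside a container (GNU dd syntax), keeping in mind that page-cache effects make such timings unreliable:

$ time dd if=/dev/zero of=testfile bs=1M count=1024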
The results are consistent and accurate (within a 1% margin of error). Note that I use a desktop (iMac). If you use a laptop, you are likely to get more noise in the measurements. But on my iMac, the numbers are precisely reproducible, run to run.
Typo: s/once/ounce/
Also, you likely want to mention that you run that loop a billion times. It took me some time to count those zeros. 🙂
Docker on Windows 10 uses Hyper-V by default, though this might require Windows 10 Pro. Hyper-V is a very solid hypervisor; I would bet it offers higher performance than VirtualBox.
With the latest Windows 10 update, the 2004 release, Docker offers to use WSL2 as an alternative hypervisor, though I’m not sure what that means under the hood. Is it Docker on top of Linux on top of Hyper-V? I would be surprised if Microsoft built a completely different hypervisor just for WSL(2).
I wonder how the container’s OS image impacts things. It probably doesn’t matter, but Ubuntu is enormous, so I use Alpine.
Thanks for the post on macOS. On Windows, I see a higher container exit time, and I filed a bug on Moby that has not yet been addressed (https://github.com/moby/moby/issues/40832).
Hey, can you try to reproduce my micro-benchmark? My team is observing 40-80% overhead, not 3%.
I wrote a comment in your issue after checking it out.
On Windows you also have the option to use the Windows Subsystem for Linux (WSL) in version 2. You can run docker either inside the Linux instance or still use Docker Desktop for Windows if you need the GUI in the tray.
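For instance, assuming your distribution is named Ubuntu, converting it to WSL 2 and checking the result looks like this:

$ wsl --set-version Ubuntu 2
$ wsl -l -v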