Programming inside a container

I have a small collection of servers, laptops and desktops. My servers were purchased and configured at different times. By design, they have different hardware and software configurations. I have processors from AMD, Intel, Ampere and Rockchip. I have a wide range of Linux distributions, both old and new. I also mostly manage everything myself, with some help from our lab technician for the initial setup.

The net result is that I sometimes end up with very interesting systems that are saddled with old Linux distributions. Reinstalling Linux and keeping it safe and secure is hard. Furthermore, even if I update my Linux distributions carefully, I may end up with a different Linux distribution than my collaborators, with a different compiler and so forth. Installing multiple different compilers on the same Linux distribution is time consuming.

So what can you do instead?

You could run virtual machines. With something like VirtualBox, you can run Linux inside Windows or inside macOS. It is beautiful. But it is also slow and computationally expensive.

You can switch to containers, and to Docker specifically, which have much less overhead. Docker is a ubiquitous tool in cloud computing. As a gross characterization, Docker allows you to run Linux inside Linux. It is a sandbox, but a sandbox that runs almost directly on the host. Unlike virtual machines, Docker containers run, in my tests, at “native speed” (bare metal) on computationally intensive tasks. There are reports that system interaction is slower: network connections and disk access take a hit. For my purposes, it is fine.

If you must, you can also run Docker under macOS and Windows, though there will then be more overhead, I expect.

The idea of the container approach is to always start from a pristine state. You define the configuration that, say, your database server needs, and you launch it in this precise state each time. This makes your infrastructure predictable.
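
For instance, launching a database server from a fixed, versioned image might look as follows (the image, tag and password here are merely illustrative):

# launch a database server from a fixed, versioned image:
# every launch starts from the same pristine state
docker run -d --name mydb -e POSTGRES_PASSWORD=secret postgres:9.6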

It is not as perfect as it sounds. You still critically depend on the quality of the image you start from. Various hacks can be necessary if you need two applications with different requirements to run together in the same image.

Still, containers work well enough that they are basically sustaining our civilization: many cloud-based applications rely on containers one way or another.

Containers were built to deploy software into production. Programming inside containers is not directly supported: you will not find much documentation about it and there is simply no business model around it. What do I mean by “programming inside containers”? I mean that I’d like to start a C programming project, decide that I will use the Ubuntu 16.10 Linux distribution, and that I will compile and run my code under Ubuntu 16.10, even though my server might be running a totally different Linux distribution (or might be under macOS).
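
Concretely, the recipe for such an environment can be a small Dockerfile; here is a minimal sketch (the exact package list is up to you):

# Dockerfile: pin the distribution and install only the build tools;
# the source code itself stays on the host
FROM ubuntu:16.10
RUN apt-get update && apt-get install -y build-essential cmake git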

The first problem is that your disk and the disk of the container built from the image are distinct. A running container does not have free access to the underlying server (the host). Remember that it is a sandbox.

So you can do all of your work inside the image. However, remember that the point of container technology is to always start from a pristine state. If you load up an image, do some work, and leave… your work is gone. Images are immutable by design. It is a great thing: you cannot easily mess up an image by tinkering with it accidentally.

You can, after doing some work inside an image, take a snapshot of the new state, commit it and create a new image from which you would start again. It is complicated and not practical.
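
For reference, the snapshot route looks roughly like this (the container and image names are placeholders):

# save the current state of a container called "work" as a new image...
docker commit work mydev:snapshot
# ...and start from that snapshot image next time
docker run -it mydev:snapshot bash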

What you can do instead is keep the image stateless, as images are meant to be. The image will only contain the compiler and the build tools. There is no reason to change any of these tools. You will have all of your code in a directory, as you would normally do. To compile and run the code, you enter the container and run your commands. You can bind-mount the repository from the host disk into the container just as you enter it.
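
In its simplest form, that is a single command; something like the following, where mydev is a hypothetical image containing your toolchain:

# bind-mount the current directory into the container, at the same path, and work there
docker run -it -v $(pwd):$(pwd) -w $(pwd) mydev bash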

This works much better, but there are glitches if you are issuing your docker commands directly:

  1. Depending on how Docker is configured on your machine, you may find that, from within the container, you are unable to read or write the host directory bound to it. A quick fix is to run the container with privileged access, but that is normally frowned upon (and unnecessary).
  2. The files that you create or modify from within the Docker container will appear on the host disk, often with strange file permissions. For example, maybe all of the files are owned by the root user. I had a research assistant who had a good workaround: he ran Linux as root all the time. I do not recommend such a strategy.

These glitches come from the strange way in which Docker deals with permissions and security. Contrary to what you may have read, it is not simply a matter of setting user and group identifiers: that may be sufficient on some systems, but not on systems running Security-Enhanced Linux (SELinux), which require additional care.
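
On such systems, the command ends up looking more like this: the --user option maps you to your own user and group identifiers, and the :Z suffix asks Docker to relabel the bound volume for SELinux (whether you need one or both depends on the host):

docker run -it --user $(id -u):$(id -g) \
    -v $(pwd):$(pwd):Z -w $(pwd) mydev bash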

And finally, you need to remember lots of complicated commands. If you are anything like me, you would rather not have to think about Docker. You want to focus all of your attention on the code.

So the solution is to use a little script. In my case I use a bash script. You can find it on GitHub. It handles messy commands and file permissions for you.

For years, I tried to avoid having to rely on a script, but it is simply unavoidable if you want to work productively.

Basically, I copy two files at the root of the directory where I want to work (Dockerfile and run), and then I type:

./run bash

And that is all. I am now in a subshell, inside the host directory. I can compile and run programs. I have complete access to a recent Ubuntu distribution. This works even on the ARM-based servers that I have.

The run script can take other commands as well, so I can use it as part of other scripts.
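
My script does a bit more than this, but the core of such a wrapper fits in a few lines. Here is a simplified sketch (not the actual script from GitHub), again with mydev as a placeholder image name:

#!/bin/bash
# run: build the image described by the local Dockerfile (cheap once cached),
# then run the requested command inside it, as the current user,
# with the current directory bound at the same path
docker build -t mydev . &&
docker run -it --rm --user $(id -u):$(id -g) \
    -v $(pwd):$(pwd):Z -w $(pwd) mydev "$@"

With such a wrapper, ./run make or ./run ./mybinary works just as well as ./run bash.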

Published by Daniel Lemire, a computer science professor at the University of Quebec (TELUQ).

24 thoughts on “Programming inside a container”

  1. There is a hack for the second issue. When you launch the container, mount your current directory as the same directory in the container and set your working directory to the same.

    docker run -v $(pwd):$(pwd) -w $(pwd) ubuntu

    1. Nice trick. I will use it.

      As for your command, let us see what happens when we run it…

      $ docker run -it -v $(pwd):$(pwd) -w $(pwd) ubuntu bash 
      # ls 
      ls: cannot open directory '.': Permission denied 
      
      $ docker run -it -v $(pwd):$(pwd):Z -w $(pwd) ubuntu bash  
      # ls 
      CMakeLists.txt  Dockerfile  README.md  main.cpp  
      # touch x.txt 
      # exit 
      $ ls -al x.txt
      -rw-r--r--. 1 root root 0 May 22 16:36 x.txt
      

      So, first, if you are using Security-Enhanced Linux (and many of my servers are), the volume binding won’t work as is. That’s easily solved (hence the :Z suffix above).

      But then, more critically, as you can see, the file permission is messed up.

  2. I use LXD because it feels more like a VM but is a container.

    lxc launch ubuntu:Alias containerName
    lxc exec containerName -- Command
    lxc stop containerName
    lxc delete containerName

    “LXD is a next generation system container manager. It offers a user experience similar to virtual machines but using Linux containers instead.

    It’s image based with pre-made images available for a wide number of Linux distributions and is built around a very powerful, yet pretty simple, REST API.” from https://linuxcontainers.org/lxd/

  3. I started with this approach but also needed to cater for the services being deployed, which means you really need to include systemd as part of your development environment.

    Though wrapped up in a Makefile, it pretty much looks like:

    env TMPDIR=$(pwd) $(pwd)/packer build -on-error=ask -only docker packer.json
    docker run -it --rm \
    -e container=docker \
    -v $(pwd)/data:/opt/VENDOR/PROJECT/data:ro \
    -v $(pwd)/nginx:/opt/VENDOR/PROJECT/nginx:ro \
    -v $(pwd)/lua:/opt/VENDOR/PROJECT/lua:ro \
    -v $(pwd)/public:/opt/VENDOR/PROJECT/public:ro \
    -v $(pwd)/src:/opt/VENDOR/PROJECT/public/gpt/src:ro \
    --publish=127.0.0.1:8000:80 \
    --publish=127.0.0.1:63790:6379 \
    --tmpfs /run \
    --tmpfs /tmp \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    --cap-add SYS_ADMIN --cap-add NET_ADMIN --cap-add SYS_PTRACE \
    --stop-signal SIGPWR \
    VENDOR/PROJECT:latest

    Packer generates the Docker container but will also cook my production GCP, AWS, Azure, … images.

    packer.json includes a call out to a setup script that does the grunt work and installs systemd (Debian: systemd-sysv) and sets the entry point to /sbin/init; there are some other minor details (such as passwd -d root so you can do a console login and logging out is via typing halt in the container) but this is the gist of it.

    To interact with the deployment you work from the host side and then just reload your service to make it live. You do need to get your ducks in a row so that those bind mounts end up in the right place for your service to pick up on, but when you get it right it makes life a lot easier.

    As a note I continue to use a shell script over orchestration tools, as well as other container/VM environments, so everything remains accessible to others. The above seems to work on Windows and macOS too.

    At the end of the day this is about making it not just easier for myself, but for everyone else.

    1. …those bind mounts are read-only to act as a polite reminder/guard that you should not edit files directly on your ‘server’ but instead make all changes on the project’s host side, where they can be committed and re-deployed (hopefully involving just a service reload).

    1. I require privileged access when running the container because, as a programmer, I need low-level access (performance counters, and so forth). Otherwise, it would not be required.

  4. Have you looked at Singularity?
    It’s containers for HPC environments. The idea is that the scientist creates his environment (e.g., his laptop) as an image and runs on a cluster with this image. It should also work for programming, and it’s becoming a standard in HPC.
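    The basic workflow is roughly as follows (exact file names and options depend on the Singularity version):

    # convert a Docker image into a single-file Singularity image
    singularity pull docker://ubuntu:16.04
    # open a shell inside it; your home and working directories are mounted automatically
    singularity shell ubuntu-16.04.simg
    # or run a single command inside the image
    singularity exec ubuntu-16.04.simg cat /etc/os-release
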
    regards,
    Boris
    https://singularity.lbl.gov/

      1. I’m not a real expert, but the main ideas are:
        1. Security. Docker needs a daemon running as root and the containers run in a root context. Sysadmins don’t like that 😉
        Singularity is a program, running under your username.
        2. Convenience. Singularity automatically mounts your home directory and the work directory (not sure about the latter) into the container, so no extra copying back and forth.
        3. Portability. A Singularity container is just one big file. Docker containers are built from layers. You can easily convert them to Singularity. The advantage is, you can just copy this one big file to a new cluster and don’t have to rely on the proper versions of your libraries/programs being installed there.
        4. Reproducibility. If you publish a paper, you just have to preserve the Singularity container and your dataset and can reproduce the results years later. Docker containers get updated.
        Hope this explains the reasons for this development. It also makes it much easier for sysadmins, who do not have to install 10 versions of the same library for different projects. So win-win 🙂
        regards,
        Boris

        1. Singularity is a program, running under your username.

          So you do not get access to performance counters and the like? I need to be able to access these from time to time. So if privileged access is not feasible, that would make it impossible for me to use such an option.

          Singularity automatically mounts your home directory and the work directory (not sure about the latter) into the container, so no extra copying back and forth.

          I do the same with Docker.

          Portability. A Singularity container is just one big file. Docker containers are built from layers. You can easily convert them to Singularity. The advantage is, you can just copy this one big file to a new cluster and don’t have to rely on the proper versions of your libraries/programs being installed there.

          I am confused about this comment. The whole point of Docker is not to have to worry about the libraries or programs installed on the host.

          Reproducibility. If you publish a paper, you just have to preserve the Singularity container and your dataset and can reproduce the results years later. Docker containers get updated.

          I don’t think they do get updated. Not unless you want to get updates. That is an important feature of docker.

          Hope this explains the reasons for this development. It also makes it much easier for sysadmins, who do not have to install 10 versions of the same library for different projects. So win-win 🙂
          regards,

          I use docker for this very purpose, so that I do not have to install different versions.

  5. Ok,
    I think in a self-administered environment there might not be much difference. But Docker was meant to run microservices and has different goals. Sure, you can tweak Docker, but Singularity solves some of the problems without tweaking.
    I’m not sure about performance counters, but I doubt it, because it is a userland process.
    The main idea behind it is that you can test your program small-scale on your local machine (maybe in Docker) and, once you’re happy, convert your Docker image to Singularity and run the production on a cluster.
    The main point is that not a lot of clusters will implement Docker, because of the inherent security issues. And I think a lot of research needs as much computing power as possible, so you have to think about how to scale out.
    Anyhow, it was just a suggestion 🙂
    regards,
    Boris

        1. For me, the point of using docker is that I can have the same workflow no matter what machine I am on (Windows, macOS, any Linux distribution).

          Yes. I have VirtualBox, but it has major drawbacks for what I have to do. Running podman inside VirtualBox would be a terrible experience compared to just launching docker.

          I also don’t want to mess with the systems. I don’t want to hack /etc/apt/sources.list.d under Ubuntu if I don’t need to. Docker is easy to install and fully supported, pretty much everywhere.

          I realize that what I write may sound unfair, but I think that using Docker, at this point in time, makes a lot of sense.

          I did investigate podman. I am sure it is great for some people… but I don’t find it attractive. When I can just start podman containers with a command line under macOS, then maybe…
