I have a small collection of servers, laptops and desktops. My servers were purchased and configured at different times. By design, they have different hardware and software configurations. I have processors from AMD, Intel, Ampere and Rockchip. I have a wide range of Linux distributions, both old and new. I also mostly manage everything myself, with some help from our lab technician for the initial setup.
The net result is that I sometimes end up with very interesting systems that are saddled with old Linux distributions. Reinstalling Linux and keeping it safe and secure is hard. Furthermore, even if I update my Linux distributions carefully, I may end up with a different Linux distribution than my collaborators, with a different compiler and so forth. Installing multiple different compilers on the same Linux distribution is time consuming.
So what can you do instead?
I could run virtual machines. With something like VirtualBox, you can run Linux inside Windows or inside macOS. It is beautiful. But it is also slow and computationally expensive.
You can switch to containers, and Docker specifically, which have much less overhead. Docker is a ubiquitous tool in cloud computing. As a gross characterization, Docker allows you to run Linux inside Linux. It is a sandbox, but a sandbox that runs almost directly on the host. My tests show that, unlike virtual machines, Docker containers run computationally intensive tasks at “native speed” (bare metal). There are reports that system interaction is slower: network connections and disk access are slower. For my purposes, it is fine.
If you must, you can also run Docker under macOS and Windows, though there will then be more overhead, I expect.
The idea of a container approach is to always start from a pristine state. So you define the configuration that your database server needs to have, and you launch it, in this precise state each time. This makes your infrastructure predictable.
It is not as perfect as it sounds. You still critically depend on the quality of the container you start from. Various hacks can be necessary if you need two applications with different requirements to run together in the same image.
Still, containers work well enough that they are basically sustaining our civilization: many cloud-based applications rely on containers one way or another.
Containers were built to deploy software into production. Programming inside containers is not directly supported: you will not find much documentation about it and there is simply no business model around it. What do I mean by “programming inside containers”? I mean that I’d like to start a C programming project, decide that I will use the Linux Ubuntu 16.10 distribution and that I will compile and run my code under Linux Ubuntu 16.10, even though my server might be running a totally different Linux distribution (or might be under macOS).
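To make this concrete, a minimal Dockerfile along these lines pins the environment (a sketch: the tag and packages are only an illustration, and an end-of-life release such as 16.10 would also need its apt sources pointed at old-releases.ubuntu.com):

FROM ubuntu:22.04
# the toolchain lives in the image, not on the host
RUN apt-get update && apt-get install -y --no-install-recommends build-essential gdb
WORKDIR /work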
The first problem is that your disk and the disk of the container built from the image are distinct. A running image does not have free access to the underlying server (the host). Remember that it is a sandbox.
So you can do all of your work inside the image. However, remember that the point of container technology is to always start from a pristine state. If you load up an image, do some work, and leave… your work is gone. Images are immutable by design. It is a great thing: you cannot easily mess up an image by tinkering with it accidentally.
You can, after doing some work inside an image, take a snapshot of the new state, commit it and create a new image from which you would start again. It is complicated and not practical.
What else could you do? What you can do instead is keep the image stateless, as images are meant to be. The image will only contain the compiler and build tools. There is no reason to change any of these tools. You will have all of your code in a directory, as you would normally do. To compile and run code, you will enter the image and run your commands. You can bind the repository from the host disk to the image just as you enter it.
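Concretely, entering the image can be as simple as this (a sketch; mydevimage is a placeholder for whatever image your Dockerfile produces):

docker run --rm -it -v "$(pwd)":/work -w /work mydevimage bash
# inside the container, make/gcc/… now operate directly on the host's files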
This works much better, but there are glitches if you issue the docker command lines directly:
- Depending on how Docker is configured on your machine, you may find that you are unable to read from or write to the disk bound to the image, from within the image. A quick fix is to run the image with privileged access, but that is normally frowned upon (and unnecessary).
- The files that you create or modify from within the Docker image will appear on the host disk, often with strange file permissions. For example, maybe all of the files are owned by the root user. I had a research assistant who had a good workaround: he ran Linux as root all the time. I do not recommend such a strategy.
These glitches come from the strange way in which Docker deals with permissions and security. Contrary to what you may have read, it is not a simple matter of setting user and group identifiers: that may be sufficient on some systems, but not on systems with Security-Enhanced Linux (SELinux), which require additional care.
And finally, you need to remember lots of complicated commands. If you are anything like me, you would rather not have to think about Docker. You want to focus all of your attention on the code.
So the solution is to use a little script. In my case I use a bash script. You can find it on GitHub. It handles messy commands and file permissions for you.
For years, I tried to avoid having to rely on a script, but it is simply unavoidable if you want to work productively.
Basically, I copy two files to the root of the directory where I want to work (Dockerfile and run), and then I type:
./run bash
And that is all. I am now in a subshell, inside the host directory. I can compile and run programs. I have complete access to a recent Ubuntu distribution. This works even on the ARM-based servers that I have.
The run script can take other commands as well, so I can use it as part of other scripts.
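To give an idea of the mechanics, a bare-bones wrapper in the same spirit might look roughly like the sketch below; the actual script on GitHub does more, and does it more carefully (the image name localdev is arbitrary):

#!/bin/bash
# build an image from the local Dockerfile, then run the requested command in it
set -e
docker build -t localdev .
exec docker run --rm -it \
  --user "$(id -u):$(id -g)" \
  -v "$(pwd)":"$(pwd)":Z \
  -w "$(pwd)" \
  localdev "$@"

The --user flag keeps the files you create owned by your host user, and the :Z volume label covers the Security-Enhanced Linux case mentioned above; ./run bash then drops you into a shell over your own directory.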
There is a hack for the second issue. When you launch the container, mount your current directory as the same directory in the container and set your working directory to the same.
docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" ubuntu
Nice trick. I will use it.
As for your command, let us see what happens when we run it…
So, first, if you are using a secured Linux (and many of my servers run secured Linux), the volume binding won’t work. That’s easily solved.
But then, more critically, as you can see, the file permissions are messed up.
You can fix the second issue with uid mapping. I haven’t needed to use that feature myself, so my advice after this point is more speculative. That said, https://seravo.fi/2019/align-user-ids-inside-and-outside-docker-with-subuser-mapping looks like another good solution.
Yes. That’s what my script does.
I use LXD because it feels more like a VM but is a container.
lxc launch ubuntu:Alias containerName
lxc exec containerName -- Command
lxc stop containerName
lxc delete containerName
“LXD is a next generation system container manager. It offers a user experience similar to virtual machines but using Linux containers instead.
It’s image based with pre-made images available for a wide number of Linux distributions and is built around a very powerful, yet pretty simple, REST API.” from https://linuxcontainers.org/lxd/
I am not bound to Docker. It just happens to be everywhere I need to work. I must admit that my approach is a bit of a hack.
https://docs.fedoraproject.org/en-US/fedora-silverblue/toolbox/
I think in theory Fedora toolbox could be extended to support multiple Linux distros but probably LXD is your best bet for now.
Daniele, I suggest you try this tool to develop inside a container. I use it and it is great:
https://code.visualstudio.com/docs/remote/containers
It is native to VS Code.
I started with this approach but needed to also cater for the services that I deploy, which means you really need to include systemd as part of your development environment. Though wrapped up in a Makefile, it pretty much looks like:
env TMPDIR=$(pwd) $(pwd)/packer build -on-error=ask -only docker packer.json
docker run -it --rm \
-e container=docker \
-v $(pwd)/data:/opt/VENDOR/PROJECT/data:ro \
-v $(pwd)/nginx:/opt/VENDOR/PROJECT/nginx:ro \
-v $(pwd)/lua:/opt/VENDOR/PROJECT/lua:ro \
-v $(pwd)/public:/opt/VENDOR/PROJECT/public:ro \
-v $(pwd)/src:/opt/VENDOR/PROJECT/public/gpt/src:ro \
--publish=127.0.0.1:8000:80 \
--publish=127.0.0.1:63790:6379 \
--tmpfs /run \
--tmpfs /tmp \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
--cap-add SYS_ADMIN --cap-add NET_ADMIN --cap-add SYS_PTRACE \
--stop-signal SIGPWR \
VENDOR/PROJECT:latest
Packer generates the Docker container but will also cook my production GCP, AWS, Azure, … images too.
packer.json includes a call out to a setup script that does the grunt work: it installs systemd (on Debian: systemd-sysv) and sets the entry point to /sbin/init. There are some other minor details (such as passwd -d root so you can do a console login, and logging out is done by typing halt in the container), but this is the gist of it.
To interact with the deployment you work from the host side and then just reload your service to make it live. You do need to line up your ducks to get those bind mounts into the right place for your service to just pick up on, but when you get it right it makes life a lot easier.
As a note, I continue to use a shell script rather than orchestration tools or other container/VM environments, so everything remains accessible to others. The above seems to work on Windows and macOS too.
At the end of the day this is about making it not just easier for myself, but for everyone else.
…those bind mounts are read-only to act as a polite reminder/guard that you should not edit files directly on your ‘server’ but instead make all changes in the project on the host side, where they can be committed and re-deployed (hopefully involving just a service reload).
Vagga creates user space containers for developers
This approach has undoubtedly been reimplemented multiple times by many people. Two examples I am aware of:
https://github.com/opencomputeproject/OpenNetworkLinux/blob/master/docker/tools/onlbuilder
https://github.com/Azure/sonic-buildimage/blob/master/Makefile.work#L244
Yes. I do not claim to be original.
LOL, I set this up more than a year ago, after getting tired of having to set up my vim dev env on cluster nodes for debugging: https://github.com/vicaya/vimdev/blob/master/v which doesn’t require privileged mode.
Note, the repo also integrates with Docker Hub build automation, which is fairly convenient as well.
I require privileged access when running the container because, as a programmer, I need low-level access (performance counters, and so forth). Otherwise, it would not be required.
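For instance, a run with access to hardware performance counters might look like this (a sketch; mydevimage is a placeholder and needs perf installed):

docker run --rm -it --privileged -v "$(pwd)":"$(pwd)" -w "$(pwd)" mydevimage perf stat ./a.out
# --privileged lifts the default restrictions on perf_event_open; it is not needed for ordinary builds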
Have you looked at singularity?
It’s containers for HPC environments. The idea is that the scientist creates his environment (e.g., his laptop) as an image and runs on a cluster with this image. It should also work for programming, and it’s becoming a standard in HPC.
regards,
Boris
https://singularity.lbl.gov/
I am not exactly sure we need “containers for HPC”. What is the benefit over Docker?
I’m not a real expert, but the main ideas are:
1. Security. Docker needs a daemon running as root and the containers run in a root context. Sysadmins don’t like that 😉
Singularity is a program running under your username.
2. Convenience. Singularity automatically mounts your home directory and the work directory (not sure about the latter) into the container, so no extra copying back and forth.
3. Portability. A Singularity container is just one big file, whereas Docker containers are built from layers. You can easily convert a Docker image to Singularity (see the sketch after this list). The advantage is that you can just copy this one big file to a new cluster and do not have to rely on the proper versions of your libraries/programs being installed there.
4. Reproducibility. If you publish a paper, you just have to preserve the Singularity container and your dataset, and you can reproduce the results years later. Docker containers get updated.
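A rough sketch of that conversion and use (the file name is a placeholder):

singularity build project.sif docker://ubuntu:22.04   # pull a Docker image and convert it into a single .sif file
singularity exec project.sif cat /etc/os-release      # run a command; your home and current directories are mounted by default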
Hope this explains the reasons for this development. It also makes it much easier for sysadmins, who do not have to install 10 versions of the same library for different projects. So win-win 🙂
regards,
Boris
So you do not get access to performance counters and the like? I need to be able to access these from time to time. So if privileged access is not feasible, that would make it impossible for me to use such an option.
I do the same with Docker.
I am confused about this comment. The whole point of Docker is not to have to worry about the libraries or programs installed on the host.
I don’t think they get updated, not unless you want updates. That is an important feature of Docker.
I use docker for this very purpose, so that I do not have to install different versions.
OK, I think that in a self-administered environment there might not be much difference. But Docker was meant to run microservices and has different goals. Sure, you can tweak Docker, but Singularity solves some of the problems without tweaking.
I’m not sure about performance counters, but I doubt it, since it is a userland process.
The main idea behind it is that you can test your program at small scale on your local machine (maybe in Docker) and, once you’re happy, convert your Docker image to Singularity and run production on a cluster.
The main point is that not a lot of clusters will deploy Docker, because of the inherent security issues. And I think a lot of research needs as much computing power as possible, so you have to think about how to scale out.
Anyhow, it was just a suggestion 🙂
regards,
Boris
Use podman instead of Docker and most issues will be solved instantly.
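For example (a sketch; the image is arbitrary), rootless podman can keep your own user id inside the container, which sidesteps the file-ownership problem:

podman run --rm -it --userns=keep-id -v "$(pwd)":"$(pwd)":Z -w "$(pwd)" ubuntu bash
# files created inside belong to your host user; no chown needed afterwards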
Podman is Linux-only, right?
Yes, but we have VMs and WSL to take care of that.
For me, the point of using docker is that I can have the same workflow no matter what machine I am on (Windows, macOS, any Linux distribution).
Yes. I have VirtualBox, but it has major drawbacks for what I have to do. Running podman inside VirtualBox would be a terrible experience compared to just launching docker.
I also don’t want to mess with the systems. I don’t want to hack /etc/apt/sources.list.d under Ubuntu if I don’t need to. Docker is easy to install and fully supported, pretty much everywhere.
I realize that what I write may sound unfair, but I think that using Docker, at this point in time, makes a lot of sense.
I did investigate podman. I am sure it is great for some people… but I don’t find it attractive. When I can just start podman containers with a command line under macOS, then maybe…