The Docker (now Moby) project has done a lot to popularize containers in recent years. Along the way, though, it has generated concerns about its concentration of functionality into a single, monolithic system under the control of a single daemon running with root privileges: dockerd. Those concerns were reflected in a talk by Dan Walsh, head of the container team at Red Hat, at KubeCon + CloudNativeCon. Walsh spoke about the work the container team is doing to replace Docker with a set of smaller, interoperable components. His rallying cry is "no big fat daemons" as he finds them to be contrary to the venerated Unix philosophy.
As we saw in an earlier article, the
basic set of
container operations is not that complicated: you need to pull a
container image, create a container from the image, and start it. On
top of that, you need to be able to build images and push them to a
registry. Most people still use Docker for all of those steps but, as it
turns out, Docker isn't the only name in town anymore: an early
alternative was rkt, which led to the creation of various standards
like CRI (runtime), OCI (image), and CNI (networking) that allow
backends like CRI-O or Docker to
interoperate with, for example,
Kubernetes.
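In concrete terms, that whole lifecycle can be walked through with the Docker command-line client alone; the image and registry names below are arbitrary examples, and the build step assumes a Dockerfile in the current directory:
# fetch a container image from a registry
docker pull debian:stretch
# create a container from the image, then start it
docker create --name example debian:stretch sleep 3600
docker start example
# build a new image and push it to a registry
docker build -t registry.example.com/myapp:latest .
docker push registry.example.com/myapp:latest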
These standards led Red Hat to create a set of "core utils" like the CRI-O runtime that implements the parts of the standards that Kubernetes needs. But Red Hat's OpenShift project needs more than what Kubernetes provides. Developers will want to be able to build containers and push them to the registry. Those operations need a whole different bag of tricks.
It turns out that there are multiple tools to build containers right now. Apart from Docker itself, a session from Michael Ducy of Sysdig reviewed eight image builders, and that's probably not all of them. Ducy identified the ideal build tool as one that would create a minimal image in a reproducible way. A minimal image is one where there is no operating system, only the application and its essential dependencies. Ducy identified Distroless, Smith, and Source-to-Image as good tools to build minimal images, which he called "micro-containers".
A reproducible container is one that you can build multiple times and always get the same result. For that, Ducy said you have to use a "declarative" approach (as opposed to "imperative"), which is understandable given that he comes from the Chef configuration-management world. He gave the examples of Ansible Container, Habitat, nixos-container, and Smith (yes, again) as being good approaches, provided you were familiar with their domain-specific languages. He added that Habitat ships its own supervisor in its containers, which may be superfluous if you already have an external one, like systemd, Docker, or Kubernetes. To complete the list, we should mention the new BuildKit from Docker and Buildah, which is part of Red Hat's Project Atomic.
Buildah's name apparently comes from Walsh's colorful Boston accent;
the Boston
theme permeates the branding of the tool:
the logo, for example, is a Boston terrier dog. This
project takes a different approach from Ducy's decree: instead
of enforcing a declarative configuration-management approach to
containers, why not build simple tools that can be used by
your favorite configuration-management tool? If you want to use
regular command-line commands like cp (instead of Docker's custom
COPY directive, for example), you can. But you can also use Ansible
or Puppet, OS-specific or language-specific installers like APT or pip, or
whatever other system to provision the content of your
containers. This is what building a container looks like with regular
shell commands and simply using make to install a binary inside the
container:
# pull a base image, equivalent to a Dockerfile's FROM command
# ("buildah from" prints the name of the new working container)
ctr=$(buildah from redhat)
# mount the working container to get access to its filesystem
crt=$(buildah mount $ctr)
# provision the container with ordinary host tools
cp foo $crt
make install DESTDIR=$crt
# then make a snapshot
buildah commit $ctr
An interesting thing with this approach is that, since you reuse
normal build tools from the host environment, you can build really
minimal images because you don't need to install all the dependencies
in the image. Usually, when building a container image, the target
application build dependencies need to be installed within the
container. For example, building from source
usually requires a compiler toolchain in the container, because it is not
meant to access the host environment.
A lot of containers will also
ship basic Unix tools like ps or bash which are
not actually
necessary in a micro-container. Developers often forget to (or simply
can't) remove some dependencies from the built containers; that common
practice creates unnecessary overhead and attack surface.
The modular approach of Buildah means you can run at least parts of
the build as non-root: the mount command still needs the
CAP_SYS_ADMIN capability, but there is an issue open to resolve
this. However, Buildah shares the same limitation as
Docker in that it can't build containers inside containers. For Docker,
you need to run the container in "privileged" mode, which is not
possible in certain environments (like GitLab Continuous
Integration, for example) and, even when it is possible, the
configuration is
messy at best.
The manual commit step allows fine-grained control over when to create container snapshots. While every line in a Dockerfile creates a new snapshot, with Buildah the commit checkpoints are chosen explicitly, which reduces unnecessary snapshots and saves disk space. It is also useful to isolate sensitive material like private keys or passwords, which sometimes mistakenly end up in public images.
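As a rough sketch of that workflow (the image, key, and script names here are made up for illustration), a secret can be used during the build without ever being baked into a layer:
ctr=$(buildah from fedora)
mnt=$(buildah mount $ctr)
# copy a deploy key in temporarily and use it to fetch private sources
cp ~/.ssh/deploy_key $mnt/tmp/deploy_key
buildah run $ctr -- /tmp/fetch-private-sources.sh
# remove the key before taking the snapshot, so it never appears in any layer
rm $mnt/tmp/deploy_key
buildah commit $ctr myapp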
While Docker builds non-standard, Docker-specific images, Buildah
produces standard OCI images among other output formats. For
backward compatibility, it has a command called
build-using-dockerfile or buildah bud that parses normal
Dockerfiles. Buildah has an enter command to inspect images from
the inside directly and a run command to start containers on the
fly. It does all the work without any "fat daemon" running in the
background and uses standard tools like runc.
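Assuming an existing Dockerfile in the current directory, that compatibility path looks roughly like this (the image name is arbitrary):
# build an OCI image from a plain Dockerfile
buildah bud -t myapp .
# create a working container from the result and run a command in it on the fly
ctr=$(buildah from myapp)
buildah run $ctr -- ls /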
Ducy's criticism of Buildah was that it was not declarative, which made it less reproducible. When shell commands are allowed, anything can happen: a shell script might, for example, download arbitrary binaries without any way of later retracing where they came from, and the effects of shell commands may vary with the environment. In contrast, configuration-management systems like Puppet or Chef are designed to "converge" on a final configuration, which is more reliable, at least in theory: in practice, you can call shell commands from configuration-management systems too. Walsh's counter-argument was that existing configuration management can be used on top of Buildah, which simply doesn't force users down that path. This fits well with the classic "separation" principle of the Unix philosophy ("mechanism, not policy").
At this point, Buildah is in beta and Red Hat is working on integrating it into OpenShift. I tested Buildah while writing this article and, apart from some documentation issues, it generally works reliably. It could use some polishing in error handling, but it is definitely a great asset to add to your container toolbox.
Walsh continued his presentation by giving an overview of another project that Red Hat is working on, tentatively called libpod. The name derives from a "pod" in Kubernetes, which is a way to group containers inside a host, to share namespaces, for example.
Libpod includes the kpod command to inspect and manipulate container
storage directly. Walsh explained this can be useful if, for example,
dockerd hangs or if a Kubernetes cluster
crashes. kpod is basically
an independent re-implementation of the docker command-line
tool. There is a command to list running containers
(kpod ps) or images (kpod images). In
fact, there is a
translation
cheat sheet documenting all Docker commands with a
kpod equivalent.
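A few entries from that cheat sheet give a feel for how mechanical the translation is (a small, illustrative sample only):
# docker ps      ->  kpod ps
# docker images  ->  kpod images
# docker pull    ->  kpod pull
kpod ps
kpod images
kpod pull fedora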
One of the nice things with the modular approach is that when you run
a container with kpod run, the container is directly
started as a
subprocess of the current shell, instead of a subprocess of
dockerd. In theory, this allows running containers directly from
systemd, removing the duplicate work dockerd is doing. It enables
things like socket-activated
containers, which is something
that is not
straightforward to do with Docker, or even with
Kubernetes
right now. In my experiments, however, I have found that containers
started with kpod lack some fundamental functionality, namely
networking (!), although there is an issue in progress
to complete that implementation.
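A quick way to see that difference is to start a container and look at the process tree; this is only a sketch of the expected result (and, as noted, networking may not work yet):
# start a container directly; no daemon is involved
kpod run fedora sleep 1000 &
# the container runtime shows up as a direct child of the current shell
ps -f --ppid $$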
A final command we haven't covered is push. While the above
commands
provide a good process for working with local containers, they don't cover
remote registries, which allow developers to actively collaborate on
application packaging. Registries are also an essential part of a
continuous-deployment
framework. This is where the skopeo project comes
in. Skopeo is another Atomic project that "performs various
operations on container images and image repositories", according to
the README file. It was
originally designed to inspect the contents of container registries without
actually downloading the sometimes voluminous images as docker pull
does. Docker refused patches to support inspection, suggesting
the creation of a separate tool, which led to
Skopeo. After pull, push was the logical next step and Skopeo can
now do a bunch of other things like copying and converting images
between registries without having to store a copy locally. Because
this functionality was useful to other projects as well, a lot of the
Skopeo code now lives in a reusable library called
containers/image. That library is in turn used by Pivotal,
Google's container-diff, kpod push, and buildah push.
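For example, an image's metadata can be examined, or the image copied between registries, without ever involving a local daemon (the registry names here are only examples):
# look at an image's metadata without downloading its layers
skopeo inspect docker://docker.io/library/alpine:latest
# copy an image from one registry to another, without storing it locally
skopeo copy docker://docker.io/library/alpine:latest docker://registry.example.com/mirror/alpine:latest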
kpod is not directly tied to Kubernetes, so the name might change in
the future — especially since Red Hat legal has not cleared
the name yet. (In fact, just as this article was going to "press", the name
was changed to podman.) The team wants to implement more
"pod-level" commands
which would allow operations on multiple containers, a bit like what
docker-compose might do. But at that level, a better tool might
be Kompose, which can deploy Compose YAML files into a Kubernetes
cluster. Some Docker commands (like swarm) will never be implemented,
on purpose, as they are best left for Kubernetes itself to handle.
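Typical Kompose usage, by contrast, is a short translation step (a sketch, assuming a standard docker-compose.yml in the current directory):
# convert a Compose file into Kubernetes manifests
kompose convert -f docker-compose.yml
# or deploy it directly into the current cluster
kompose up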
It seems that the effort to modularize Docker that started a few years ago is finally bearing fruit. While, at this point, kpod is under heavy development and probably should not be used in production, the design of those different tools is certainly interesting; a lot of it is ready for development environments. Right now, the only way to install libpod is to compile it from source, but we should expect packages coming out for your favorite distribution eventually.
[We would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend KubeCon + CloudNativeCon.]
Containers without Docker at Red Hat
Posted Dec 21, 2017 19:05 UTC (Thu) by me@jasonclinton.com (subscriber, #52701) [Link]
Please keep the discussion constructive.
Containers without Docker at Red Hat
Posted Dec 21, 2017 19:09 UTC (Thu) by pizza (subscriber, #46) [Link]
If you're so against Red Hat "touching other people's projects" then you should just abandon Linux (including Android) and probably get out of this field altogether.
As an added bonus -- you won't ever have to hear about systemd ever again.
Containers without Docker at Red Hat
Posted Dec 21, 2017 19:17 UTC (Thu) by anarcat (subscriber, #66354) [Link]
We have had enough of that for a lifetime already, and you know, it's the holiday season and all... no sense in waging war in this time of year. ;)
Containers without Docker at Red Hat
Posted Dec 21, 2017 19:29 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]
It is also useful to think of the context and remember that systemd is an umbrella project with multiple software components, one of which is an init system. It is not a single monolithic daemon. Unlike an init system, containers don't need a daemon at all.
I for one am glad to see more experimentation in the container world to figure out a scalable workflow to manage them.
Containers without Docker at Red Hat
Posted Dec 21, 2017 23:15 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]
# fedora packaged binary names
* `systemd` (system init *and* gnome user session)
* `systemd-hostnamed`
* `systemd-importd`
* `systemd-journald`
* `systemd-localed`
* `systemd-logind`
* `systemd-machined`
* `systemd-networkd`
* `systemd-resolved`
* `systemd-socket-proxyd`
* `systemd-timedated`
* `systemd-timesyncd`
* `systemd-udevd`
AFAIK - all of these components make heavy use of common libraries and have large portions of their functionality factored out into shared libs.
E.g.
```
$ ldd /usr/lib/systemd/systemd
linux-vdso.so.1 (0x00007ffd22f9b000)
libsystemd-shared-233.so => /usr/lib/systemd/libsystemd-shared-233.so (0x00007fc2b8582000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fc2b835a000)
librt.so.1 => /lib64/librt.so.1 (0x00007fc2b8152000)
libseccomp.so.2 => /lib64/libseccomp.so.2 (0x00007fc2b7f10000)
libpam.so.0 => /lib64/libpam.so.0 (0x00007fc2b7d01000)
libaudit.so.1 => /lib64/libaudit.so.1 (0x00007fc2b7ad9000)
libkmod.so.2 => /lib64/libkmod.so.2 (0x00007fc2b78c3000)
libmount.so.1 => /lib64/libmount.so.1 (0x00007fc2b766f000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fc2b7458000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc2b7239000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc2b6e68000)
libcap.so.2 => /lib64/libcap.so.2 (0x00007fc2b6c63000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fc2b6a4a000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fc2b6824000)
liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fc2b660f000)
libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007fc2b6301000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc2b60fd000)
libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007fc2b5ee9000)
libacl.so.1 => /lib64/libacl.so.1 (0x00007fc2b5ce0000)
libidn.so.11 => /lib64/libidn.so.11 (0x00007fc2b5aac000)
libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fc2b5861000)
libcryptsetup.so.4 => /lib64/libcryptsetup.so.4 (0x00007fc2b5638000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc2b87ba000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fc2b53c5000)
libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007fc2b51c0000)
libz.so.1 => /lib64/libz.so.1 (0x00007fc2b4fa9000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fc2b4da4000)
libattr.so.1 => /lib64/libattr.so.1 (0x00007fc2b4b9f000)
libdevmapper.so.1.02 => /lib64/libdevmapper.so.1.02 (0x00007fc2b494b000)
libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007fc2b8805000)
libsepol.so.1 => /lib64/libsepol.so.1 (0x00007fc2b46b4000)
libudev.so.1 => /lib64/libudev.so.1 (0x00007fc2b87e2000)
libm.so.6 => /lib64/libm.so.6 (0x00007fc2b439e000)
```
Containers without Docker at Red Hat
Posted Jan 4, 2018 20:58 UTC (Thu) by HelloWorld (guest, #56129) [Link]
Oh wait, those aren't actually advantages!
Containers without Docker at Red Hat
Posted Dec 21, 2017 23:33 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]
When I want a docker image to be essentially a base image + one large layer, I've found `packer` to be a good model (it actually creates a "3rd" metadata-only layer too). However, sometimes it is still desirable to build up an image from multiple layers (to minimize layer churn for consumers and/or rebuild times) but without having a bunch of dead blocks from installing/uninstalling a tool used as part of the image build operation that does not need to be part of the final image.
A good example of this pattern is using `puppet` to handle app install and static configuration. When using `docker-build`, in order to avoid bloating a layer with a bunch of blocks from files that will be deleted, it requires either installing a shell script that is invoked via a `RUN` or chaining multiple shell commands together under a single `RUN` with `&&`, `;`, etc. About a week ago I was lamenting to co-workers that it would be nice if `docker-build`, `rocker`, etc. supported a transactional mode where a layer wasn't baked until some sort of `COMMIT` directive was given.
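The chained-command pattern in question looks roughly like this (a sketch; the package and manifest names are arbitrary):
```
# one layer: the tool is installed, used, and removed before the layer is baked
RUN apt-get update && \
    apt-get install -y puppet && \
    puppet apply /tmp/site.pp && \
    apt-get purge -y puppet && \
    rm -rf /var/lib/apt/lists/*
```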
Containers without Docker at Red Hat
Posted Dec 22, 2017 14:05 UTC (Fri) by anarcat (subscriber, #66354) [Link]
Now, it may be a fantastic tool, and in fact, looking at it, it could even become the default if it gets merged in the upstream lib: that would be great! It is very similar in design to buildah too. But frankly, the point is not that there's no logo, it's genuinely that I didn't (and couldn't) research *all* the build tools in existence. Ducy presented about 8 of those and I talked about just one, because it was specifically introduced in detail in a talk, and flowed well with the other articles (e.g. the runtimes one with CRI-O).
Buildah didn't need a logo to make it into my article: what it did was fulfill the three criteria above. I'm sorry to disappoint you, but there are at least 10 other projects than buildah that are in the same situation, and we can't please them all.
Containers without Docker at Red Hat
Posted Dec 23, 2017 22:50 UTC (Sat) by cyphar (subscriber, #110703) [Link]
I'm not annoyed at you at all, I'm just frustrated that people feel the need to reinvent others' tools and/or not provide credit. The developer community is incredibly hostile to one another's work for some reason that I can't understand -- we're all working on the same software at the end of the day.
And the logo thing was just a snide remark, I didn't mean it to be taken seriously.
Containers without Docker at Red Hat
Posted Dec 30, 2017 0:56 UTC (Sat) by anarcat (subscriber, #66354) [Link]
I do agree, however, it is best when people collaborate on such projects. It would be great if RH could comment on why exactly they felt the need to write yet another one of those, on top of the "fat daemons" rhetoric. :)
Containers without Docker at Red Hat
Posted Dec 24, 2017 15:06 UTC (Sun) by robert_s (subscriber, #42402) [Link]
Just use Nix. At its foundation it has a single simple & powerful abstraction which cuts through most of this and most of the belief that managing packages and managing containers are really different problems.
Containers without Docker at Red Hat
Posted Jan 4, 2018 21:05 UTC (Thu) by HelloWorld (guest, #56129) [Link]
That said I probably didn't invest as much time as I should have and lazily installed Fedora again since I already know how that works…
Containers without Docker at Red Hat
Posted Jan 6, 2018 20:53 UTC (Sat) by micah (subscriber, #20908) [Link]
"However, Buildah shares the same limitation as Docker in that it can't build containers inside containers. For Docker, you need to run the container in "privileged" mode, which is not possible in certain environments (like GitLab Continuous Integration, for example) and, even when it is possible, the configuration is messy at best."
I'm doing DIND (Docker in Docker) in GitLab's CI for building images and uploading them to their respective registries, and it wasn't messy at all. I wouldn't say you "can't build containers inside of containers" and that it is "not possible in ... GitLab Continuous Integration" -- because I'm doing that.
The linked article (from "messy") seems to have overly complicated their own problem. If you don't try to do something complicated, then it doesn't get messy. If you have trouble running BTRFS on top of BTRFS, I can't say I'm surprised: you are trying to do something complicated, and the result is... something complicated. Trying to do something messy and complicated with build caching, and then being surprised that it's messy and complicated, is why "it got worse", not because DIND is messy and complicated.
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds