
Containers without Docker at Red Hat


December 20, 2017

This article was contributed by Antoine Beaupré


KubeCon+CloudNativeCon NA

The Docker (now Moby) project has done a lot to popularize containers in recent years. Along the way, though, it has generated concerns about its concentration of functionality into a single, monolithic system under the control of a single daemon running with root privileges: dockerd. Those concerns were reflected in a talk by Dan Walsh, head of the container team at Red Hat, at KubeCon + CloudNativeCon. Walsh spoke about the work the container team is doing to replace Docker with a set of smaller, interoperable components. His rallying cry is "no big fat daemons" as he finds them to be contrary to the venerated Unix philosophy.

The quest to modularize Docker

As we saw in an earlier article, the basic set of container operations is not that complicated: you need to pull a container image, create a container from the image, and start it. On top of that, you need to be able to build images and push them to a registry. Most people still use Docker for all of those steps but, as it turns out, Docker isn't the only game in town anymore: an early alternative was rkt, which led to the creation of various standards like CRI (runtime), OCI (image), and CNI (networking) that allow backends like CRI-O or Docker to interoperate with, for example, Kubernetes.

These standards led Red Hat to create a set of "core utils" like the CRI-O runtime that implements the parts of the standards that Kubernetes needs. But Red Hat's OpenShift project needs more than what Kubernetes provides. Developers will want to be able to build containers and push them to the registry. Those operations need a whole different bag of tricks.

It turns out that there are multiple tools to build containers right now. Apart from Docker itself, a session from Michael Ducy of Sysdig reviewed eight image builders, and that's probably not all of them. Ducy identified the ideal build tool as one that would create a minimal image in a reproducible way. A minimal image is one where there is no operating system, only the application and its essential dependencies. Ducy identified Distroless, Smith, and Source-to-Image as good tools to build minimal images, which he called "micro-containers".

A reproducible container is one that you can build multiple times and always get the same result. For that, Ducy said you have to use a "declarative" approach (as opposed to "imperative"), which is understandable given that he comes from the Chef configuration-management world. He gave the examples of Ansible Container, Habitat, nixos-container, and Smith (yes, again) as being good approaches, provided you were familiar with their domain-specific languages. He added that Habitat ships its own supervisor in its containers, which may be superfluous if you already have an external one, like systemd, Docker, or Kubernetes. To complete the list, we should mention the new BuildKit from Docker and Buildah, which is part of Red Hat's Project Atomic.

Building containers with Buildah

[Buildah logo]

Buildah's name apparently comes from Walsh's colorful Boston accent; the Boston theme permeates the branding of the tool: the logo, for example, is a Boston terrier. This project takes a different approach from Ducy's prescription: instead of enforcing a declarative configuration-management approach to containers, why not build simple tools that can be used by your favorite configuration-management tool? If you want to use regular commands like cp (instead of Docker's custom COPY directive, for example), you can. But you can also use Ansible or Puppet, OS- or language-specific installers like APT or pip, or any other system to provision the contents of your containers. This is what building a container looks like with regular shell commands, simply using make to install a binary inside the container:

    # pull a base image and create a working container from it,
    # equivalent to a Dockerfile's FROM command
    ctr=$(buildah from redhat)

    # mount the container's root filesystem to work on it
    mnt=$(buildah mount $ctr)
    cp foo $mnt
    make install DESTDIR=$mnt

    # then make a snapshot
    buildah commit $ctr

An interesting thing with this approach is that, since you reuse normal build tools from the host environment, you can build really minimal images because you don't need to install all the dependencies in the image. Usually, when building a container image, the target application build dependencies need to be installed within the container. For example, building from source usually requires a compiler toolchain in the container, because it is not meant to access the host environment. A lot of containers will also ship basic Unix tools like ps or bash which are not actually necessary in a micro-container. Developers often forget to (or simply can't) remove some dependencies from the built containers; that common practice creates unnecessary overhead and attack surface.

The modular approach of Buildah means you can run at least parts of the build as non-root: the mount command still needs the CAP_SYS_ADMIN capability, but there is an issue open to resolve this. However, Buildah shares the same limitation as Docker in that it can't build containers inside containers. For Docker, you need to run the container in "privileged" mode, which is not possible in certain environments (like GitLab Continuous Integration, for example) and, even when it is possible, the configuration is messy at best.

The manual commit step allows fine-grained control over when to create container snapshots. While every line of a Dockerfile creates a new snapshot, with Buildah the commit points are chosen explicitly, which reduces unnecessary snapshots and saves disk space. It is also useful for isolating sensitive material like private keys or passwords, which sometimes mistakenly end up in public images.
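As a sketch of the idea (the base image, file names, and the helper script here are hypothetical), a secret used during the build never needs to appear in any committed layer:

```shell
# create a working container and mount its root filesystem
ctr=$(buildah from fedora)
mnt=$(buildah mount $ctr)

# use a deploy key during the build only (hypothetical helper script)
cp deploy_key $mnt/tmp/deploy_key
buildah run $ctr -- /tmp/fetch-private-assets.sh
rm $mnt/tmp/deploy_key

# a single snapshot is created only now, so the deleted key is not
# preserved in an intermediate layer, unlike with a Dockerfile RUN line
buildah commit $ctr myapp
```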

While Docker builds non-standard, Docker-specific images, Buildah produces standard OCI images, among other output formats. For backward compatibility, it has a build-using-dockerfile (or buildah bud) command that parses normal Dockerfiles. Buildah also has an enter command to inspect images from the inside directly and a run command to start containers on the fly. It does all this without any "fat daemon" running in the background, using standard tools like runc.
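As a sketch (the image tag and directory layout are made up), the Dockerfile-compatible path looks like this:

```shell
# build an image from the Dockerfile in the current directory;
# "bud" is short for build-using-dockerfile
buildah bud -t myapp .

# create a working container from that image and run a command in it
ctr=$(buildah from myapp)
buildah run $ctr -- ls /usr/local/bin
```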

Ducy's criticism of Buildah was that it is not declarative, which makes it less reproducible: when shell commands are allowed, anything can happen. For example, a shell script might download arbitrary binaries, without any way of subsequently retracing where they came from, and the effects of shell commands may vary with the environment. In contrast, configuration-management systems like Puppet or Chef are designed to "converge" on a final configuration, which is more reliable, at least in theory: in practice, you can call shell commands from configuration-management systems too. Walsh, however, argued that existing configuration management can be used on top of Buildah; it just doesn't force users down that path. This fits well with the classic separation principle of the Unix philosophy ("mechanism, not policy").

At this point, Buildah is in beta and Red Hat is working on integrating it into OpenShift. I have tested Buildah while writing this article and, short of some documentation issues, it generally works reliably. It could use some polishing in error handling, but it is definitely a great asset to add to your container toolbox.

Replacing the rest of the Docker command-line

Walsh continued his presentation by giving an overview of another project that Red Hat is working on, tentatively called libpod. The name derives from a "pod" in Kubernetes, which is a way to group containers inside a host, to share namespaces, for example.

Libpod includes the kpod command to inspect and manipulate container storage directly. Walsh explained this can be useful if, for example, dockerd hangs or if a Kubernetes cluster crashes. kpod is basically an independent re-implementation of the docker command-line tool. There is a command to list running containers (kpod ps) or images (kpod images). In fact, there is a translation cheat sheet documenting all Docker commands with a kpod equivalent.
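For the common cases, the commands are essentially a drop-in rename (the image name below is illustrative):

```shell
# equivalents of docker ps, docker images, and docker rm
kpod ps
kpod images
kpod rm mycontainer

# pull and run behave like their Docker counterparts
kpod pull registry.fedoraproject.org/fedora
kpod run registry.fedoraproject.org/fedora echo hello
```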

One of the nice things with the modular approach is that when you run a container with kpod run, the container is directly started as a subprocess of the current shell, instead of a subprocess of dockerd. In theory, this allows running containers directly from systemd, removing the duplicate work dockerd is doing. It enables things like socket-activated containers, which is something that is not straightforward to do with Docker, or even with Kubernetes right now. In my experiments, however, I have found that containers started with kpod lack some fundamental functionality, namely networking (!), although there is an issue in progress to complete that implementation.
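In principle, since there is no intermediary daemon, a container could then be supervised by systemd like any other service. The following is a hypothetical sketch, not a unit shipped with libpod:

```
# /etc/systemd/system/myapp.service (hypothetical)
[Unit]
Description=Run the myapp container without dockerd

[Service]
# kpod starts the container as a child of this service, so systemd
# supervises the container process directly
ExecStart=/usr/bin/kpod run --rm myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```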

A final command we haven't covered is push. While the above commands provide a good process for working with local containers, they don't cover remote registries, which allow developers to actively collaborate on application packaging. Registries are also an essential part of a continuous-deployment framework. This is where the skopeo project comes in. Skopeo is another Atomic project that "performs various operations on container images and image repositories", according to the README file. It was originally designed to inspect the contents of container registries without actually downloading the sometimes voluminous images as docker pull does. Docker refused patches to support inspection, suggesting the creation of a separate tool, which led to Skopeo. After pull, push was the logical next step and Skopeo can now do a bunch of other things like copying and converting images between registries without having to store a copy locally. Because this functionality was useful to other projects as well, a lot of the Skopeo code now lives in a reusable library called containers/image. That library is in turn used by Pivotal, Google's container-diff, kpod push, and buildah push.
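As an illustration (the registry and image names are examples), inspection and copying work without a daemon or a full local pull:

```shell
# fetch a remote image's metadata without downloading its layers
skopeo inspect docker://registry.fedoraproject.org/fedora

# copy an image from one registry to another without keeping a local copy
skopeo copy docker://docker.io/library/alpine:latest \
    docker://registry.example.com/mirror/alpine:latest
```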

kpod is not directly tied to Kubernetes, so the name might change in the future, especially since Red Hat legal has not yet cleared it. (In fact, just as this article was going to "press", the name was changed to podman.) The team wants to implement more "pod-level" commands that would allow operations on multiple containers, a bit like what docker-compose does. At that level, though, a better tool might be Kompose, which can deploy Compose YAML files to a Kubernetes cluster. Some Docker commands (like swarm) will deliberately never be implemented, as they are best left to Kubernetes itself.

It seems that the effort to modularize Docker that started a few years ago is finally bearing fruit. While, at this point, kpod is under heavy development and probably should not be used in production, the design of those different tools is certainly interesting; a lot of it is ready for development environments. Right now, the only way to install libpod is to compile it from source, but we should expect packages coming out for your favorite distribution eventually.

[We would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend KubeCon + CloudNativeCon.]



Containers without Docker at Red Hat

Posted Dec 21, 2017 8:24 UTC (Thu) by branden (guest, #7029) [Link]

Nobody must breathe a word of this to Lennart Poettering.

Containers without Docker at Red Hat

Posted Dec 21, 2017 10:08 UTC (Thu) by zuki (subscriber, #41808) [Link]

Classy.

Containers without Docker at Red Hat

Posted Dec 21, 2017 18:41 UTC (Thu) by johncktx (subscriber, #113610) [Link]

News release that systemd absorbs this project as well as ls and pwd in 3... 2... 1...

Containers without Docker at Red Hat

Posted Dec 21, 2017 19:05 UTC (Thu) by me@jasonclinton.com (subscriber, #52701) [Link]

Please keep the discussion constructive.

Containers without Docker at Red Hat

Posted Dec 21, 2017 18:45 UTC (Thu) by cyperpunks (subscriber, #39406) [Link]

RH should work on a systemd replacement with a "no big fat daemons" design before touching other people's projects.


Containers without Docker at Red Hat

Posted Dec 21, 2017 19:09 UTC (Thu) by pizza (subscriber, #46) [Link]

http://community.redhat.com/software/

If you're so against Red Hat "touching other people's projects" then you should just abandon Linux (including Android) and probably get out of this field altogether.

As an added bonus -- you won't ever have to hear about systemd ever again.

Containers without Docker at Red Hat

Posted Dec 21, 2017 19:17 UTC (Thu) by anarcat (subscriber, #66354) [Link]

So I am sure that the irony of that is not lost on any LWN reader here. But please consider that Red Hat is a large organization with multiple people involved. Furthermore, I would be very grateful if the comments on the article focused on the actual content rather than drifting off into yet another systemd flamewar.

We have had enough of that for a lifetime already, and you know, it's the holiday season and all... no sense in waging war in this time of year. ;)

Containers without Docker at Red Hat

Posted Dec 21, 2017 19:29 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

"So I am sure that the irony of that is not lost on any LWN reader here. But please consider that Red Hat is a large organization with multiple people involved"

It is also useful to think of the context and remember that systemd is an umbrella project with multiple software components, one of which is an init system. It is not a single monolithic daemon. Unlike an init system, containers don't need a daemon at all.

I for one am glad to see more experimentation in the container world to figure out a scalable workflow to manage them.

Containers without Docker at Red Hat

Posted Jan 1, 2018 8:51 UTC (Mon) by fuuuuuuc (guest, #120531) [Link]

There's certainly more in PID 1 than there used to be. For example, the DynamicUser stuff could have been moved out of it, but it still lives in PID 1. Calling it an execution engine rather than an init system would be more fitting.

Containers without Docker at Red Hat

Posted Dec 31, 2017 10:53 UTC (Sun) by jospoortvliet (subscriber, #33164) [Link]

One could just as well complain that the CNCF creates a monolithic container-management stack, as it is one project doing lots of separate, small tools working together, just like systemd. Heck, BSD and Unix itself are monolithic if systemd is... Unix is one "thing" that consists of lots of independent small moving parts developed to work as one. Just like systemd.

Containers without Docker at Red Hat

Posted Dec 21, 2017 23:15 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]

Which "big fat daemon" in the https://github.com/systemd/systemd repo are you referring to?

# fedora packaged binary names
* `systemd` (system init *and* gnome user session)
* `systemd-hostnamed`
* `systemd-importd`
* `systemd-journald`
* `systemd-localed`
* `systemd-logind`
* `systemd-machined`
* `systemd-networkd`
* `systemd-resolved`
* `systemd-socket-proxyd`
* `systemd-timedated`
* `systemd-timesyncd`
* `systemd-udevd`

AFAIK - all of these components make heavy use of common libraries and have large portions of their functionality factored out into shared libs.

Eg.

```
$ ldd /usr/lib/systemd/systemd
linux-vdso.so.1 (0x00007ffd22f9b000)
libsystemd-shared-233.so => /usr/lib/systemd/libsystemd-shared-233.so (0x00007fc2b8582000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fc2b835a000)
librt.so.1 => /lib64/librt.so.1 (0x00007fc2b8152000)
libseccomp.so.2 => /lib64/libseccomp.so.2 (0x00007fc2b7f10000)
libpam.so.0 => /lib64/libpam.so.0 (0x00007fc2b7d01000)
libaudit.so.1 => /lib64/libaudit.so.1 (0x00007fc2b7ad9000)
libkmod.so.2 => /lib64/libkmod.so.2 (0x00007fc2b78c3000)
libmount.so.1 => /lib64/libmount.so.1 (0x00007fc2b766f000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fc2b7458000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc2b7239000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc2b6e68000)
libcap.so.2 => /lib64/libcap.so.2 (0x00007fc2b6c63000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fc2b6a4a000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fc2b6824000)
liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fc2b660f000)
libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007fc2b6301000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fc2b60fd000)
libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007fc2b5ee9000)
libacl.so.1 => /lib64/libacl.so.1 (0x00007fc2b5ce0000)
libidn.so.11 => /lib64/libidn.so.11 (0x00007fc2b5aac000)
libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fc2b5861000)
libcryptsetup.so.4 => /lib64/libcryptsetup.so.4 (0x00007fc2b5638000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc2b87ba000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fc2b53c5000)
libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007fc2b51c0000)
libz.so.1 => /lib64/libz.so.1 (0x00007fc2b4fa9000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fc2b4da4000)
libattr.so.1 => /lib64/libattr.so.1 (0x00007fc2b4b9f000)
libdevmapper.so.1.02 => /lib64/libdevmapper.so.1.02 (0x00007fc2b494b000)
libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007fc2b8805000)
libsepol.so.1 => /lib64/libsepol.so.1 (0x00007fc2b46b4000)
libudev.so.1 => /lib64/libudev.so.1 (0x00007fc2b87e2000)
libm.so.6 => /lib64/libm.so.6 (0x00007fc2b439e000)
```

Containers without Docker at Red Hat

Posted Dec 29, 2017 22:04 UTC (Fri) by aleXXX (subscriber, #2742) [Link]

Why are you listing the output of ldd on systemd ?
Showing that it links against e.g. libz, liblzma, libacl and libpthread says what ?

Alex

Containers without Docker at Red Hat

Posted Dec 29, 2017 23:02 UTC (Fri) by jhoblitt (subscriber, #77733) [Link]

That it is not monolithic.

Containers without Docker at Red Hat

Posted Dec 31, 2017 10:56 UTC (Sun) by jospoortvliet (subscriber, #33164) [Link]

It is as monolithic as Unix itself: lots of small components working towards one purpose. The irony of people complaining it isn't "Unix like" is epic...

Containers without Docker at Red Hat

Posted Jan 4, 2018 20:58 UTC (Thu) by HelloWorld (guest, #56129) [Link]

But components are supposed to be processes that communicate with pipes! That's very useful because it's much slower than calling some library functions, and you get to enjoy all the bugs that inevitably creep in when you need to serialize everything into byte streams.

Oh wait, those aren't actually advantages!

Containers without Docker at Red Hat

Posted Jan 5, 2018 21:50 UTC (Fri) by Wol (guest, #4433) [Link]

While SysVInit is a spaghetti-collection of scripts that do lots of things, badly ... :-)

Cheers,
Wol

Containers without Docker at Red Hat

Posted Dec 21, 2017 23:33 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]

Is it possible with `buildah` (or any of the other tools) to produce an image with multiple layers? I realize much of the development focus in this area is on OCI, but most of my experience is with docker.

When I want a docker image to be essentially a base image + one large layer, I've found `packer` to be a good model (it actually creates a "3rd", metadata-only layer too). However, sometimes it is still desirable to build up an image from multiple layers (to minimize layer churn for consumers and/or rebuild times) but without having a bunch of dead blocks left over from installing/uninstalling a tool that is used as part of the image build but does not need to be part of the final image.

A good example of this pattern is using `puppet` to handle app install and static configuration. When using `docker-build`, in order to avoid bloating a layer with a bunch of blocks from files that will be deleted, it requires either installing a shell script that is invoked via a `RUN` or chaining multiple shell commands together under a single `RUN` with `&&`, `;`, etc. About a week ago I was lamenting to co-workers that it would be nice if `docker-build`, `rocker`, etc. supported a transactional mode where a layer wasn't baked until some sort of `COMMIT` directive was given.

Containers without Docker at Red Hat

Posted Dec 22, 2017 10:30 UTC (Fri) by cyphar (subscriber, #110703) [Link]

With umoci[1] you can tailor exactly at which point you "snapshot" the layer, if that helps. It also allows you to set multiple metadata flags in one go. It can even do these things without root. umoci only supports OCI images though.

[1]: https://umo.ci/

Containers without Docker at Red Hat

Posted Dec 22, 2017 10:27 UTC (Fri) by cyphar (subscriber, #110703) [Link]

I'm a little disappointed that umoci[1] wasn't mentioned alongside buildah. It was around for longer, has better support for the OCI image specification, is more widely used, supports rootless containers, doesn't use graphdrivers that were derived from Docker, and isn't tied to the Docker concept of images. I have a PoC[2] of how you can use umoci to build Dockerfiles without requiring Docker or root. But it doesn't have a logo, so I guess it doesn't matter.

[1]: https://github.com/openSUSE/umoci/
[2]: https://github.com/cyphar/orca-build/

Containers without Docker at Red Hat

Posted Dec 22, 2017 11:21 UTC (Fri) by SomewhatAmazing (guest, #120306) [Link]

Someone should write an article on umoci and submit it to LWN for publication.

Containers without Docker at Red Hat

Posted Dec 22, 2017 14:05 UTC (Fri) by anarcat (subscriber, #66354) [Link]

The reason why umoci wasn't mentioned in this article is that this is the first time I have ever heard of it. I didn't hear about it at the conference, it wasn't covered by Ducy in his talk, and I never encountered it in my work with fellow sysadmins. There were three ways for it to get on my radar: its developers giving a talk, convincing someone else to give one, or making it popular enough that my friends actually use it; none of the three happened.

Now, it may be a fantastic tool and, looking at it, it could even become the default if it gets merged into the upstream library: that would be great! It is very similar in design to buildah, too. But frankly, the point is not that there's no logo; it's genuinely that I didn't (and couldn't) research *all* the build tools in existence. Ducy presented about eight of those and I talked about just one, because it was introduced in detail in a talk and flowed well with the other articles (e.g. the runtimes one with CRI-O).

Buildah didn't need a logo to make it into my article: what it did was fulfill the three criteria above. I'm sorry to disappoint you, but there are at least ten other projects in the same situation as buildah, and we can't please them all.

Containers without Docker at Red Hat

Posted Dec 23, 2017 22:50 UTC (Sat) by cyphar (subscriber, #110703) [Link]

The reason for my annoyance is that the talk was given by Dan Walsh. The thing is, Dan Walsh is more than aware of the existence of umoci; I've talked to him and several other people on the containers team at Red Hat about it (a while before buildah existed -- and actually buildah was one use case where I said that umoci would be useful and asked whether they'd want to use it). But I assume he just elected not to mention it when prepping for his talk.

I'm not annoyed at you at all, I'm just frustrated that people feel the need to reinvent others' tools and/or not provide credit. The developer community is incredibly hostile to one another's work for some reason that I can't understand -- we're all working on the same software at the end of the day.

And the logo thing was just a snide remark, I didn't mean it to be taken seriously.

Containers without Docker at Red Hat

Posted Dec 30, 2017 0:56 UTC (Sat) by anarcat (subscriber, #66354) [Link]

Just to make sure things are clear here: I covered multiple talks at KubeCon. There was a talk by Ducy who covered multiple build systems, but unfortunately didn't mention yours. Walsh's talk *only* mentioned buildah (more or less), which is understandable since it's the product they are developing.

I do agree, however, it is best when people collaborate on such projects. It would be great if RH could comment on why exactly they felt the need to write yet another one of those, on top of the "fat daemons" rhetoric. :)

Containers without Docker at Red Hat

Posted Dec 24, 2017 15:06 UTC (Sun) by robert_s (subscriber, #42402) [Link]

All this flapping around trying to come up with new abstractions.

Just use Nix. At its foundation, it has a single, simple, and powerful abstraction which cuts through most of this, and through most of the belief that managing packages and managing containers are really different problems.

Containers without Docker at Red Hat

Posted Jan 4, 2018 21:05 UTC (Thu) by HelloWorld (guest, #56129) [Link]

I tried installing NixOS on my personal machine recently but couldn't get some of the desktop features I use to run properly. Its package repos also seem a little outdated, with Plasma 5.10 and Firefox 56. I also couldn't get NetworkManager running. I also tried to install some Plasma widgets using nix-env -i plasma-nm, only to find out that Plasma widgets installed this way don't work. Apparently they need to be installed globally, which sort of defeats the point of having a per-user nix env.

That said I probably didn't invest as much time as I should have and lazily installed Fedora again since I already know how that works…

Containers without Docker at Red Hat

Posted Jan 6, 2018 20:53 UTC (Sat) by micah (subscriber, #20908) [Link]

Just a point of clarification about this section:

"However, Buildah shares the same limitation as Docker in that it can't build containers inside containers. For Docker, you need to run the container in "privileged" mode, which is not possible in certain environments (like GitLab Continuous Integration, for example) and, even when it is possible, the configuration is messy at best."

I'm doing DinD (Docker-in-Docker) in GitLab's CI for building images and uploading them to their respective registries, and it wasn't messy at all. I wouldn't say you "can't build containers inside of containers" and that it is "not possible in ... GitLab Continuous Integration" -- because I'm doing exactly that.

The linked article (from "messy") seems to have overly complicated its own problem. If you don't try to do something complicated, then it doesn't get messy. If you have trouble running BTRFS on top of BTRFS, I can't say I'm surprised: you are trying to do something complicated, and the result is... something complicated. Trying to do something messy and complicated with build caching, and then being surprised that it's messy and complicated, is why "it got worse" -- not because DinD itself is messy and complicated.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds