Systemd and containers
Systemd development is also driven by the idea that the operating system running inside a container should be as similar as possible to that running outside. Applications should not care whether they are running inside a container or not; the operating system should hide the differences between those two settings. Lennart called this concept "integrated isolation." There is isolation in that the container should appear to be its own host, but there should be full integration with the real host; a system-log query, for example, should be able to return results from the host, from one or more containers, or all of the above.
The Docker system, Lennart said, wants to have a single tenant running in each container. Systemd takes a different approach. Daemons that run on the host, for example, should generally run within the container as well.
There are a number of other overriding principles that drive systemd container development. The core system should be minimal, without a lot of options for users to wade through, but the door should be open for alternatives when the need arises. The focus should be on getting the low-level parts right. "No hacks"; the implementation should be clean. That is why Btrfs is used with native systemd containers rather than adding the logical volume manager to the mix. New standards should not be created when standards already exist; thus, for example, systemd doesn't include its own packaging system since Docker and rkt already exist. Most systemd testing is done inside containers; that simply makes life easier, since it's no longer necessary to reboot the system for each testing cycle. Finally, systemd is not focused on cluster-wide orchestration; instead, it provides the building blocks that allow others to create solutions at that level.
The core of the systemd approach to containers is a daemon called systemd-machined; it serves as a registry for containers running on the system. Other tools that have container awareness integrate with systemd-machined to get their job done. Thus, the systemctl command, when given the -M option, will run a command in an arbitrary container. The -r option causes services running in containers to be listed as well. There is a list-machines subcommand to create a list of containers along with some health status. Other command-line tools, including loginctl, systemd-run, and journalctl, also understand the -M option. The machinectl command-line tool can be used to log into or stop a container. And so on.
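As a rough illustration of how uniform that interface is, the sketch below builds (but does not run) a handful of these command lines. The container name "web1" and the helper in_container() are invented for the example; the flags and subcommands are the ones named above, plus machinectl's login and poweroff verbs.

```python
import shlex

def in_container(tool, machine, *args):
    """Build an argv list pointing a machined-aware tool at one container via -M."""
    return [tool, "-M", machine, *args]

# "web1" is a made-up container name used only for illustration.
examples = [
    ["systemctl", "list-machines"],               # host and registered containers, with state
    ["systemctl", "-r", "list-units"],            # -r: also list units inside local containers
    in_container("systemctl", "web1", "status"),  # ask the service manager inside web1
    in_container("journalctl", "web1", "-b"),     # web1's log entries for its current boot
    ["machinectl", "login", "web1"],              # log into the container
    ["machinectl", "poweroff", "web1"],           # shut the container down
]

# Only print the commands; running them needs a systemd host with
# a container registered under that name.
for argv in examples:
    print(shlex.join(argv))
```

Each list could be handed to subprocess.run() on a suitable host; printing keeps the sketch safe to run anywhere.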
Work has been done to integrate with other system tools as well. There is an option to the ps command to see which container owns any given process. Gnome-system-monitor has been extended in similar ways. The name service switch (NSS) subsystem has been enhanced with a module to do hostname resolution for containers. The sd-bus IPC library is container-aware. Importantly, all of these mechanisms work with the various container-management systems available on Linux; they are not limited to systemd's containerization mechanism.
That mechanism is built around systemd-nspawn, a simple container manager; the rkt container system was built around systemd-nspawn at the lowest levels. Systemd-nspawn is a minimal system which, naturally, integrates with systemd-machined. All it needs is a directory full of operating-system files and it will be able to boot from there. It can also boot raw disk images, the same ones that work with KVM or on bare metal. So, for example, Fedora Cloud images work with systemd-nspawn.
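The two boot paths map onto two systemd-nspawn options: -D for a directory tree and -i for a disk image, with -b asking it to boot the init system it finds there. A minimal sketch follows, assuming (purely for illustration) that a ".raw" suffix marks a disk image; the helper name and both paths are hypothetical.

```python
from pathlib import PurePosixPath

def nspawn_boot_argv(source):
    """Return a systemd-nspawn command line that boots the OS found at `source`.

    Assumption made for this sketch: a path ending in ".raw" is a disk
    image (-i); anything else is treated as a directory tree (-D).
    """
    path = PurePosixPath(source)
    selector = "-i" if path.suffix == ".raw" else "-D"
    return ["systemd-nspawn", selector, str(path), "-b"]

# Hypothetical locations, chosen only for the example.
print(nspawn_boot_argv("/var/lib/machines/fedora-tree"))
print(nspawn_boot_argv("/var/lib/machines/fedora-cloud.raw"))
```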
Systemd-nspawn can also be set up to provide "containers as a service." Each container can be a normal systemd service, meaning that all of systemd's resource-management and process-management features are available.
One of the newer tools is systemd-networkd, added about one year ago. It adds a network-configuration mechanism that, Lennart said, is "minimal" but "really powerful." A systemd-networkd daemon running on the host will, when a virtual Ethernet tunnel appears for a container, pick an IP range for the container, run a DHCP server for it, set up IPv4LL, set up network address translation, etc. When running in the container, instead, it will launch the DHCP client and perform IPv4LL setup. The end result is that networking for containers just works with no additional configuration needed.
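The "pick an IP range" step can be pictured with a small sketch. This is not systemd-networkd's actual algorithm; the address pool, the /28 prefix length, and the function name are all assumptions made for illustration. It simply walks a private pool and returns the first subnet that does not collide with ranges already in use.

```python
import ipaddress

def pick_free_subnet(pool, in_use, prefixlen=28):
    """Return the first /prefixlen subnet of `pool` overlapping nothing in `in_use`."""
    pool_net = ipaddress.ip_network(pool)
    taken = [ipaddress.ip_network(n) for n in in_use]
    for candidate in pool_net.subnets(new_prefix=prefixlen):
        if not any(candidate.overlaps(t) for t in taken):
            return candidate
    return None  # pool exhausted

# Two ranges are assumed to be taken by earlier containers.
subnet = pick_free_subnet("192.168.0.0/24", ["192.168.0.0/28", "192.168.0.16/28"])
hosts = list(subnet.hosts())
print("container subnet:", subnet)                    # 192.168.0.32/28
print("host-side address:", hosts[0])                 # 192.168.0.33
print("DHCP lease range:", hosts[1], "-", hosts[-1])  # 192.168.0.34 - 192.168.0.46
```

In this picture the host side of the link takes the first usable address and the DHCP server hands out the rest to the container.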
Another recent addition is systemd-resolved, which does central hostname resolution. It is a centralized daemon, avoiding many of the hassles that come with NSS resolution; in particular, it maintains a single, global cache. There is no longer a need to run a resolver in each application which, among other things, should reduce the attack surface of the system. Lookups with LLMNR are set up by default.
The systemd-import tool can import and export container images. It understands simple formats like tarballs and raw disk images; it can also import from (but not export to) the Docker format. This tool is typically used via the machinectl pull-raw command to get container images off the net. These images can then be booted directly with systemd-nspawn.
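A download-then-boot sequence might look like the sketch below, which only assembles the two command lines. The URL and image name are placeholders rather than a real image location, and the helper is hypothetical; --verify= takes no, checksum, or signature, and systemd-nspawn -M looks up a downloaded image by its name.

```python
VERIFY_MODES = ("no", "checksum", "signature")

def pull_and_boot(url, name, verify="checksum"):
    """Return (pull, boot) argv lists for fetching a raw image and booting it."""
    if verify not in VERIFY_MODES:
        raise ValueError(f"verify must be one of {VERIFY_MODES}")
    pull = ["machinectl", "pull-raw", f"--verify={verify}", url, name]
    boot = ["systemd-nspawn", "-M", name, "-b"]
    return pull, boot

# Placeholder URL and name; substitute a real image location.
pull, boot = pull_and_boot("https://example.com/images/demo.raw.xz", "demo")
print(" ".join(pull))
print(" ".join(boot))
```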
Lennart closed the session with a brief discussion of the concept of stateless systems. These systems have everything they need to boot in /usr; other directories, including /etc and /var, are created on the first boot. With such systems, a factory reset is easily achieved by removing everything except /usr. To achieve a truly stateless system, one just has to always boot in this manner. Stateless systems are useful for containers, since it is possible to instantiate a lot of them and have them set themselves up on the first boot. A single /usr image can be used as a golden master image, making for easy centralized updating. The stateless systems idea is relatively new, but the systemd developers are hoping to push distributors in that direction.
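The factory-reset idea is simple enough to sketch as a dry run: list every top-level entry other than /usr as a candidate for removal. The function name is made up, nothing is actually deleted, and a real reset would need more care than this (mount points and the top-level symlinks into /usr, for a start).

```python
import os
import tempfile

def factory_reset_plan(root, keep=("usr",)):
    """List the top-level entries under `root` that a reset would delete.

    Dry run only: nothing is removed.
    """
    return sorted(
        os.path.join(root, entry)
        for entry in os.listdir(root)
        if entry not in keep
    )

# Demonstrate on a throwaway tree rather than a real root filesystem.
with tempfile.TemporaryDirectory() as root:
    for name in ("usr", "etc", "var"):
        os.mkdir(os.path.join(root, name))
    plan = [os.path.basename(p) for p in factory_reset_plan(root)]
    print(plan)  # ['etc', 'var']
```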
Lennart's slides [PDF] are available for the curious.
[Your editor would like to thank the Linux Foundation for funding his travel to LinuxCon Japan.]
| Index entries for this article | |
|---|---|
| Conference | LinuxCon Japan/2015 |
