Distributions

Systemd lightweight containers

By Jake Edge
February 6, 2013

Linux containers, which are implemented using kernel namespaces and control groups, allow processes to operate in an isolated manner, so that the interactions with other processes and kernel services are limited. That makes containers attractive for a variety of tasks, including many that might have once been done using chroot(). As namespace support in the kernel matures, tools to set up and use containers are becoming more prevalent—and easier to use. A feature proposed for Fedora 19 will make use of systemd to create and manage containers.

At first blush, systemd does not really seem like a container-management tool. In fact, detractors might see that as feature creep. But systemd already has infrastructure to spawn containers in the form of the systemd-nspawn command. In addition, creating a new process ID (PID) namespace means that an init program (i.e. PID 1) is needed, which is, of course, the role that systemd normally fills.

Beyond that, systemd is designed around the idea of "socket activation", so that services can be started when the first connection is made to them. That idea can be applied to containers, so that a new container gets started when a connection is made to a certain port. This "container activation" feature is reminiscent of a similar idea in the SELinux-based secure containers feature that was added to Fedora 18. Unlike the secure containers, though, those created with systemd-nspawn are not primarily intended for security. With proper care and feeding, however, they can provide another layer of "defense in depth".

One goal of the "systemd lightweight containers" feature is to make it easy to run an unmodified Fedora 19 inside the containers created by systemd-nspawn. But Fedora is not the only distribution that could run in those containers: Debian is another candidate, and others are possible too. By installing a minimal system into a directory (using yum or debootstrap, for example) and then pointing systemd-nspawn at it, a usable version of the distribution can be run. Users can log into it from the "console", set up a service or services to run inside of it, and so on. Rudimentary directions on setting that up are part of the feature proposal.
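As a rough illustration of the install-into-a-directory step, the sketch below prints the kind of commands involved rather than running them, since they need root privileges and network access. The yum and debootstrap invocations follow the examples in the systemd-nspawn manual page; the function name and the /srv/mycontainer and /srv/debian-tree paths are placeholders chosen here, not something taken from the feature proposal.

```shell
#!/bin/sh
# Sketch only: print (do not run) commands that install a minimal OS tree
# into a directory and then start it with systemd-nspawn.
# The function name and directory paths are illustrative placeholders.

# container_setup_cmds DISTRO DIR
#   DISTRO: "fedora" or "debian"
#   DIR:    directory that will hold the container's root filesystem
container_setup_cmds() {
    distro=$1
    dir=$2
    case "$distro" in
        fedora)
            # Minimal Fedora 19 tree installed with yum.
            echo "yum -y --releasever=19 --nogpg --installroot=$dir" \
                 "--disablerepo='*' --enablerepo=fedora" \
                 "install systemd passwd yum fedora-release vim-minimal"
            # -b looks for an init binary to boot; -D names the root directory.
            echo "systemd-nspawn -bD $dir"
            ;;
        debian)
            # Minimal Debian unstable tree installed with debootstrap.
            echo "debootstrap --arch=amd64 unstable $dir"
            # Without -b, systemd-nspawn runs a shell in the tree.
            echo "systemd-nspawn -D $dir"
            ;;
        *)
            echo "unknown distribution: $distro" >&2
            return 1
            ;;
    esac
}

container_setup_cmds fedora /srv/mycontainer
container_setup_cmds debian /srv/debian-tree
```

Running the printed commands as root, in order, would populate the directory and then start the container from it.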

By default, systemd-nspawn sets up separate PID, mount, IPC (inter-process communication), and UTS (host and domain name) namespaces, and executes the given command inside of them. If invoked with the -b option, it will search for an init binary to execute, and pass any arguments to that program. This command:

    systemd-nspawn -bD /srv/rawhide 3

would start a container with a root filesystem at /srv/rawhide, execute the init found there (which would be Rawhide's version of systemd) and pass the runlevel "3" to it. Note that due to a bug in Fedora's audit support (or the kernel, or systemd-nspawn, depending on who you talk to), auditing needs to be disabled in the kernel by booting with "audit=0". Even then, some systems will still experience problems unless they give the container extra capabilities using a command like:

    systemd-nspawn --capability=cap_audit_write,cap_audit_control -bD /srv/rawhide 3

Presumably, that particular problem will be shaken out before long, as giving those capabilities to the container allows it to control auditing in the host, which is just the kind of thing a container is meant to avoid.

With a simple unit file, the container can be turned into a service that can be started, stopped, and monitored with systemctl. Fans of the systemd journal can use the -j option of systemd-nspawn to effectively export the container's journal to the host. A "journalctl -m" command on the host will then show merged journal entries from the host and any containers.
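What such a unit file might look like can be sketched with a small shell function that writes one to standard output. This is a minimal sketch, not the unit shipped by Fedora: the function name, the description text, and the /srv/mycontainer path are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: emit a minimal service unit that runs a container under
# systemd-nspawn. The function name, Description text, and the
# /srv/mycontainer path are illustrative placeholders.

# container_unit DIR
#   Prints a unit file that boots the OS tree at DIR to runlevel 3.
container_unit() {
    cat <<EOF
[Unit]
Description=Container booted from $1

[Service]
# -j exports the container's journal to the host, -b boots the init
# found in the tree, -D names the root directory.
ExecStart=/usr/bin/systemd-nspawn -jbD $1 3
# On stop, signal only the main systemd-nspawn process.
KillMode=process
EOF
}

container_unit /srv/mycontainer
```

Saved under a hypothetical name such as /etc/systemd/system/mycontainer.service, the container could then be handled with "systemctl start mycontainer.service", "systemctl status mycontainer.service", and "systemctl stop mycontainer.service".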

Multiple containers can be started using the same directory and they won't be able to see each other. Changes to the filesystem will be immediately visible in any container using it, but processes in one container cannot interact with processes in another, nor with the processes on the host.

Using the techniques described in "systemd for Administrators, Part XX", these containers can easily be made socket activated. An incoming connection on a particular host port would spawn the container, whose unit files would recognize the incoming connection and start the right service on the inside. Users will likely also want to set up sshd inside the container to run on a different port (the host presumably already uses 22) for ease of accessing the container.
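A socket-activated setup of that sort needs two socket units: one on the host that listens on the port and triggers the container's service, and one inside the container that picks up the passed-in socket. The sketch below prints both. It only loosely follows the approach in "systemd for Administrators, Part XX"; the function names, description strings, and port 2222 are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: emit the two socket units involved in socket-activating a
# container. Function names, Description text, and the port number are
# illustrative placeholders.

# host_socket_unit PORT
#   Socket unit for the host. Given the same base name as the container's
#   service unit, it starts that service on the first connection to PORT.
host_socket_unit() {
    cat <<EOF
[Unit]
Description=Activation socket for a container (port $1)

[Socket]
ListenStream=$1
EOF
}

# container_socket_unit PORT
#   Socket unit to install inside the container, listening on the same PORT
#   so the container's systemd can match it to the socket it was handed.
container_socket_unit() {
    cat <<EOF
[Unit]
Description=SSH socket inside the container

[Socket]
ListenStream=$1
# Spawn one service instance per incoming connection.
Accept=yes
EOF
}

host_socket_unit 2222
container_socket_unit 2222
```

The container side would also need a matching per-connection service template (for sshd, a unit that runs it in inetd mode with its standard input connected to the socket); that part is left out here.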

There is also an option to run the container in a separate network namespace (--private-network), which essentially turns off networking for the container. Only the loopback interface is available to the container, so no network connections of any kind can be made, though it could still read and write using socket file descriptors that were passed to it. That would be a way to isolate an internet-facing service, for example.
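One way to see the effect is to list the network interfaces from inside such a container. The helper below is a sketch that only prints the command line; the function name is made up here, and /srv/rawhide is reused from the earlier example.

```shell
#!/bin/sh
# Sketch only: build (and print, not run) a systemd-nspawn command line
# that uses a private network namespace. The function name is an
# illustrative placeholder.

# nspawn_isolated_cmd DIR [COMMAND...]
#   DIR:     root directory of the container
#   COMMAND: optional command to run inside it
nspawn_isolated_cmd() {
    [ $# -ge 1 ] || { echo "usage: nspawn_isolated_cmd DIR [COMMAND...]" >&2; return 1; }
    dir=$1
    shift
    # Remaining arguments (if any) are the command to run in the container.
    set -- systemd-nspawn --private-network -D "$dir" "$@"
    echo "$*"
}

# Listing /proc/net/dev inside the container should show only "lo".
nspawn_isolated_cmd /srv/rawhide /bin/cat /proc/net/dev
```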

There are a number of different use cases for the feature, but it also looks like something that will be built upon in the future. Allowing for tightened security, possibly using user ID namespaces, would be one possibility. Adding support for network namespaces that have more than just the loopback interface could be interesting as well. Since FESCo approved the feature for Fedora 19 at its February 6 meeting, more users of the feature can be expected. That means that more use cases will be found, which seems likely to lead to expanded functionality, but it's a useful feature as it stands.

Brief items

Distribution quote of the week

I tend to think that when a project is hurting its users instead of helping them, even with good intentions, something is very wrong about that project.
-- Lionel Dricot

Anaconda didn’t just shed its skin
-- Ryan Lerch

the real solution to all these problems is openCDE, which I look forward to proposing as default in the F20 cycle
-- Jef Spaleta

Fedora 18 for ARM released

The wait for a Fedora 18 build for ARM systems is over. "The Fedora 18 for ARM release includes pre-built images for Versatile Express (QEMU), Trimslice (Tegra), Pandaboard (OMAP4), GuruPlug (Kirkwood), and Beagleboard (OMAP3) hardware platforms. Fedora 18 for ARM also includes an installation tree in the yum repository which may be used to PXE-boot a kickstart-based installation on systems that support this option, such as the Calxeda EnergyCore (HighBank)." See the release announcement for more information.

Linaro 13.01 released

Linaro 13.01 has been released. Linaro is a project that focuses on "consolidating and optimizing open source software for the ARM architecture". Linaro provides a common foundation of system software (kernel, etc.) and tools for various ARM distributions to use. Detailed information on 13.01 can be found in the release notes. "The Developer Platform Team has enabled 64bit HipHop VM development in OpenEmbedded, continued to merge ARMv8 support into the OpenEmbedded platform and upstream, engaged initial support for the Arndale board and released Linux Linaro 3.8-rc4 2013.01."

Distribution News

Red Hat Enterprise Linux

Red Hat Enterprise Linux 3 - 1-Year End Of Support Notice

Red Hat has issued an advisory that Red Hat Enterprise Linux 3 will reach the end of its Extended Lifecycle Support on January 30, 2014.

Newsletters and articles of interest

Distribution newsletters

5 Ubuntu alternatives worth checking out (ExtremeTech)

ExtremeTech takes a look at five Ubuntu derivatives: BackBox, Bio Linux, PinguyOS, Poseidon, and XBMCbuntu. "Although BackTrack Linux is generally-considered the de facto distribution for penetration testing, BackBox has emerged as a promising Ubuntu alternative. The latest release is BackBox Linux 3 and it features an Ubuntu base with Linux kernel 3.2, a customized XFCE 4.8 desktop, and a number of computer forensics tools. The project began as a small project led by Raffaele Forte approximately three years ago."

Page editor: Rebecca Sobol

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds