LSS: Secure Linux containers
While the Linux Security Summit (LSS) was held later in the week, it was logically part of the minisummits that accompanied the Kernel Summit—organizer James Morris made a forward-reference report on LSS as part of the minisummit reports. Day one was filled with talks on various topics of interest to the assembled security developers, while day two was mostly devoted to reports from the kernel security subsystems. We plan to write up much of LSS over the coming weeks; the first installment covers a talk given by SELinux developer Dan Walsh on secure Linux containers.
Walsh's opening slide had a picture of a "secure" Linux container (label seen at right)—a plastic "unix ware" storage container—but his talk was a tad more serious. Application sandboxes are becoming more common for isolating general-purpose applications from each other. There are a variety of Linux tools that can be used to create sandboxes, including seccomp, SELinux, the Java virtual machine, and virtualization. The idea behind sandboxing is the age-old concept of "defense in depth".
There is another mechanism that can be used to isolate applications: containers. When most people think of containers, they think of LXC, which is a command-line tool created by IBM. But the Linux kernel knows nothing about containers per se; LXC is built atop Linux namespaces. The secure containers project does not use LXC directly; instead, it uses libvirt-lxc.
Using namespaces, child processes can have an entirely different view of the system than the parent has. Namespaces are not all that new; RHEL 5 and Fedora 6 used the pam_namespace module to partition logins into "secret" vs. "top secret" views, for example. The SELinux sandbox also used namespaces and was available in RHEL 6 and Fedora 8. More recently, Fedora 17 uses systemd, which has PrivateTmp and PrivateNetwork directives for unit files that can be used to give services their own view of /tmp or of the network. There are 20-30 services in Fedora 17 that are running with their own /tmp, Walsh said.
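As a sketch of what those directives look like, a service unit might request its own /tmp and network as follows (a hypothetical unit written for illustration, not one shipped in Fedora):

```ini
# /etc/systemd/system/example.service -- illustrative only
[Unit]
Description=Example service with a private /tmp

[Service]
ExecStart=/usr/bin/example-daemon
# Give the service a mount namespace with private /tmp and /var/tmp
PrivateTmp=yes
# Give the service a network namespace containing only a loopback device
PrivateNetwork=yes

[Install]
WantedBy=multi-user.target
```

With PrivateTmp=yes, anything the daemon writes under /tmp is invisible to other services and is cleaned up when the service stops.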
In addition, Red Hat offers the OpenShift service which allows anyone to have their own Apache webserver for free on Red Hat servers. It is meant to remove the management aspect so that developers can concentrate on developing web applications that can eventually be deployed elsewhere. Since there are many different Apache instances running on the OpenShift servers, sandboxing is used to keep them from interfering with each other.
There are several different kinds of namespaces in Linux. The mount namespace gives processes their own view of the filesystem, while the PID namespace gives them their own set of process IDs. The IPC and Network namespaces allow for private views of those resources, and the UTS namespace allows the processes to have their own host and domain names. The UID namespace is another that is not yet available, and one that concerns Walsh because of its intrusiveness. It would give a private set of UIDs, such that UID 0 inside of the namespace is not the same as root outside.
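The namespaces a process belongs to can be inspected directly through /proc on recent kernels (3.0 and later), which makes the concept easy to see from a shell:

```shell
# Each namespace a process is a member of appears as a symbolic link
# under /proc/<pid>/ns/; two processes are in the same namespace
# exactly when their links point at the same object.
ls -l /proc/self/ns/

# The UTS namespace link, for instance, looks like "uts:[4026531838]";
# a process in a different UTS namespace would show a different number.
readlink /proc/self/ns/uts
```

On kernels of this era the directory lists ipc, mnt, net, pid, and uts entries; there is no user entry, matching Walsh's point that the UID namespace is not yet available.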
The secure Linux containers project uses libvirt-lxc to set up namespaces that effectively create containers to hold processes that are isolated from those in other containers. Libvirt-lxc has a C API, but also has bindings for several different higher-level languages. It can set up a container with a firewall, SELinux type enforcement (TE) and multi-category security (MCS) labels, bind mounts that pass through to the host filesystem, and so on. Once that is done, it can start an init process (systemd in this case) inside the container so that it appears to be almost a full Linux system inside the container. In addition, these containers can be managed using control groups (cgroups) so that no one container can monopolize resources like memory or CPU.
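A libvirt-lxc container of the sort described is defined by a domain XML document roughly along these lines (an illustrative sketch; the name and paths are made up):

```xml
<domain type='lxc'>
  <name>apache1</name>
  <memory>524288</memory>
  <os>
    <!-- "exe" containers simply run an init binary; here, systemd -->
    <type>exe</type>
    <init>/usr/lib/systemd/systemd</init>
  </os>
  <devices>
    <!-- Bind-mount a host directory as the container's root filesystem -->
    <filesystem type='mount'>
      <source dir='/var/lib/libvirt/filesystems/apache1'/>
      <target dir='/'/>
    </filesystem>
    <!-- Private network interface attached to a libvirt network -->
    <interface type='network'>
      <source network='default'/>
    </interface>
    <console type='pty'/>
  </devices>
  <!-- Have libvirt pick a unique SELinux (sVirt) label for the container -->
  <seclabel type='dynamic' model='selinux'/>
</domain>
```

The verbosity of documents like this is the complexity Walsh set out to hide behind a simpler tool.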
But, libvirt-lxc has a complex API that is XML-based. Walsh wanted something simpler, so he created libvirt-sandbox with a key-value based configuration. He intends to replace the SELinux sandbox using libvirt-sandbox, but it is not quite ready for that yet.
To make things even easier, Walsh created a Python script that makes it "dirt simple" for an administrator to build a container or set of containers. He said that Red Hat is famous for building "cool tools that no one uses" because they are too complicated, so he set out to make something very simple to use.
The tool can be used as follows:
virt-sandbox-service create -C -u httpd.service apache1
That call will do multiple things under the covers. It creates a systemd unit file for the container, which means that standard systemd commands can be used to manage it. In addition, if someone puts a GUI on systemd someday, administrators can use that to manage their containers, he said. It also creates the filesystems for the container. It does not use a full chroot(), Walsh said, because he wants to be able to share /usr between containers. For this use case (an Apache web server container), he wants the individual containers to pick up any updates that come from doing a yum update on the host.
It also clones the /var and /etc configuration files into its own copy. In a perfect world, the container would bind mount over /etc, but it can't do that, partly because /etc has so many needed configuration files ("/etc is a cesspool of garbage" was his colorful way of describing that). In addition, it allocates a unique SELinux MCS label that restricts the processes inside the container. "Containers are not for security", he said, because root inside the container can always escape, so the container gets wrapped in SELinux to restrict it.
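In libvirt's XML terms, that MCS wrapping amounts to giving each container a context whose category pair differs from every other container's. A static label would look something like this (the categories below are a made-up example; in practice libvirt allocates them dynamically):

```xml
<!-- Illustrative only: libvirt normally chooses the c###,c### pair
     itself so that no two containers share the same categories -->
<seclabel type='static' model='selinux'>
  <label>system_u:system_r:svirt_lxc_net_t:s0:c392,c404</label>
</seclabel>
```

Even if root escapes the namespaces, SELinux confines it to the svirt_lxc_net_t domain and to files carrying that container's categories.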
Once the container has been created, it can be started with:
virt-sandbox-service start apache1
Similarly, the stop command can terminate the container. One can also use the connect command to get a shell in the container.
virt-sandbox-service execute -C ifconfig apache1
will run a command in the container. For example, there is no separate cron running in each of the containers; instead, execute is used to do things like logrotate from the host's cron.
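A host-side cron entry for that kind of housekeeping might look like the following (a hypothetical example; the schedule and logrotate invocation are made up for illustration):

```
# /etc/cron.d/apache1-logrotate (illustrative): rotate the container's
# logs nightly from the host, instead of running cron in the container
30 3 * * * root virt-sandbox-service execute -C "logrotate /etc/logrotate.conf" apache1
```

One cron on the host can thus service any number of containers, one execute call per container.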
The systemd unit file that gets created can start and stop multiple container instances with a single command. Beyond that, using the ReloadPropagatedFrom directive in the unit file will allow an update of the host's apache package to restart all of the servers in the containers. So:
systemctl reload httpd.service
will trigger a reload in all container instances, while:
systemctl start httpd@.service
will start up all such services (which means all of the defined containers).
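Pieced together from the talk, the generated template unit presumably contains something along these lines (a hypothetical reconstruction, not the actual file virt-sandbox-service writes; the reload subcommand in particular is an assumption):

```ini
# /etc/systemd/system/httpd@.service (illustrative reconstruction)
[Unit]
Description=Apache container %i
# When the host's httpd.service reloads (e.g. after a package update),
# propagate the reload to every running instance of this template
ReloadPropagatedFrom=httpd.service

[Service]
ExecStart=/usr/bin/virt-sandbox-service start %i
ExecStop=/usr/bin/virt-sandbox-service stop %i
# Assumed subcommand for illustration of where the reload would go
ExecReload=/usr/bin/virt-sandbox-service reload %i

[Install]
WantedBy=multi-user.target
```

The %i specifier is replaced with the instance name (apache1, apache2, ...), which is how one template file manages all of the containers.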
This is all recent work, Walsh said. It works "relatively well", but still needs work. There are other use cases for these containers, beyond just the OpenShift-like example he used. For instance, the Fedora project uses Mock to build packages, and Mock runs as root. That means there are some 3000 Fedora packagers who could do "bad stuff" on the build systems, so putting Mock into a secure container would provide better security. Another possibility would be to run customer processes (e.g. Hadoop) on a GlusterFS node. Another service that Walsh has containerized is MySQL, and more are possible.
Walsh demonstrated virt-sandbox-service at the end of his talk. He demonstrated some of the differences inside and outside of the container, including a surprising answer to getenforce inside the container. It reports that SELinux is disabled, but that is a lie, he said, to stop various scripts from trying to do SELinux things within the container. In addition, he showed that the eth0 device inside the container did not even appear in the host's ifconfig output (nor, of course, did the host's wlan0 appear in the container).
A number of steps have been taken to try to prevent root from breaking out of the container, but there is more to be done. Both mount and mknod will fail inside the container, for example. These containers are not as secure as full virtualization, Walsh said, but they are much easier to manage than handling the multiple full operating systems that virtualization requires. For many use cases, secure containers may be the right fit.
| Index entries for this article | |
|---|---|
| Security | Containers |
| Security | Security Enhanced Linux (SELinux) |
| Conference | Linux Security Summit/2012 |
