Security
On the way to safe containers
Stéphane Graber and Tycho Andersen gave a presentation at the Linux Security Summit in Toronto to introduce the LXD container hypervisor. They also outlined some of the security concerns and other problems they have encountered while developing it. Graber is the LXD maintainer and project lead at Canonical, while Andersen works on the team, mostly on checkpoint/restore for containers.
LXD is a container-management tool that uses LXC containers, Graber said. It is designed to be simple to use, but comprehensive in what it covers. LXD is a root-privileged daemon, which gives it additional capabilities compared to the command-line-oriented LXC. It has a REST API that allows it to be easily scriptable, as well.
LXD is also considered "safe", though Graber did use air quotes when he said that. It uses every available kernel security feature "that we can get our hands on" for that, though user namespaces are one of the primary features it depends on. LXD also scales from a single container on a developer's laptop to a corporate data center with many hosts and containers to an OpenStack installation with thousands of compute nodes.
From his slides [PDF], he showed a diagram of how all the pieces fit together: multiple hosts, all running Linux (and all running the same kernel version for those interested in container migration, he cautioned), each with the LXD daemon (using the LXC library) atop the kernel. The LXD REST API can then be used from the LXD command-line tool, the nova-lxd OpenStack plugin, from scripts, or even using curl, he said.
So, that is what LXD is, Graber said, but there is a list of things that it is not as well. It is not a virtualization technology itself; it uses existing virtualization technologies to implement "system containers"—those running a full Linux distribution, rather than simply an application. It is not a fork of LXC; instead it uses the LXC API to manage its containers. It is also not another application container manager; it will work with Docker, rkt, and other application container systems.
As part of its security measures, LXD uses all of the different namespace types. Graber said that a lot of work had gone into the control groups (cgroups) namespace over the last year, since LXD/LXC needed support for the cgroups version 1 (v1) API, which was not part of the original cgroup namespace work. For Linux Security Modules (LSMs), LXC supports both AppArmor and SELinux, though LXD only supports AppArmor at this point.
As far as capabilities go, LXD does drop a few, but must keep most of them available to the container since the application(s) that will be running in the system container are unknown. Those capabilities that the container will not need (e.g. CAP_MAC_ADMIN to override mandatory access control or CAP_SYS_MODULE to allow loading and unloading kernel modules) are dropped.
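To make the capability discussion concrete, here is a minimal sketch that decodes a capability bitmask, such as the CapBnd field in /proc/<pid>/status, and reports whether a given capability is present. The bit numbers (16 for CAP_SYS_MODULE, 33 for CAP_MAC_ADMIN) are taken from the kernel's capability header; the helper names are hypothetical, and this is only a way to inspect the result, not how LXD itself drops capabilities.

```python
# Capability bit numbers as defined in the kernel's <linux/capability.h>.
CAP_SYS_MODULE = 16
CAP_MAC_ADMIN = 33


def has_capability(mask_hex: str, cap: int) -> bool:
    """Return True if bit `cap` is set in a hex mask like the CapBnd field."""
    return bool(int(mask_hex, 16) & (1 << cap))


def bounding_set_of(status_text: str) -> str:
    """Extract the CapBnd hex string from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return line.split()[1]
    raise ValueError("no CapBnd line found")
```

Feeding it the contents of /proc/self/status from inside a container would show which bits were removed from the bounding set.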
LXD also uses cgroups extensively and much of the talk would be about "why they're great and why they're really bad and hopefully what we can do to try and make them better", Graber said. He has spent some time over the last year trying to work out user-friendly ways to express kernel resource limits. For example, LXD uses the CPU cgroup controller to handle CPU limits for the containers, which can be expressed as a number of cores or as a limit based on CPU time. Those time limits can be configured as a percentage of the CPU or in terms of a time budget (e.g. 10ms out of every 150ms).
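As an illustration of the two ways of expressing a CPU time limit, the sketch below converts a percentage or a time budget into a quota/period pair in microseconds, the units used by the cgroup v1 CPU controller files cpu.cfs_quota_us and cpu.cfs_period_us. The string syntax, the function name, and the 100ms default period are assumptions made for the example; the talk did not say how LXD performs this mapping internally.

```python
def cpu_allowance_to_cfs(allowance: str, default_period_us: int = 100_000):
    """Translate '25%' or '10ms/150ms' into (quota_us, period_us).

    The input syntax is hypothetical, chosen to mirror the two forms
    described in the text.
    """
    allowance = allowance.strip()
    if allowance.endswith("%"):
        percent = float(allowance[:-1])
        if not 0 < percent <= 100:
            raise ValueError("percentage must be in (0, 100]")
        return int(default_period_us * percent / 100), default_period_us

    budget, period = allowance.split("/")
    if not (budget.endswith("ms") and period.endswith("ms")):
        raise ValueError("expected a form like '10ms/150ms'")
    quota_us = int(budget[:-2]) * 1000    # milliseconds to microseconds
    period_us = int(period[:-2]) * 1000
    if not 0 < quota_us <= period_us:
        raise ValueError("budget must be positive and no larger than the period")
    return quota_us, period_us
```

With these assumptions, the "10ms out of every 150ms" example becomes a quota of 10000 and a period of 150000.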
Similarly, memory limits can be set using a percentage or a fixed amount. LXD does not expose the kernel memory limits, since "no one knows how to set those correctly". Swapping can be enabled on a per-container basis. Disk quotas can be used if the underlying filesystem supports them in the right way for LXD; for now, that means Btrfs and ZFS. Network traffic can also be limited on a per-container basis. Containers can get an overall priority that will be applied to scheduling and other decisions based on the relative priorities of all of the containers on the system.
There are shared kernel resources that can cause problems when multiple containers are running, not necessarily because of malicious activity, but simply by accident. For example, inotify handles (used to track filesystem changes) are limited to 512 per user, which in practice means 512 shared between all of the containers. That is not enough when you are running systemd, which uses a lot of inotify handles and fails when it cannot get one rather than falling back to polling. There is no good reason to have this global limit, however, so tying the number of inotify handles to the user namespace is probably the right way to fix that.
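A rough sketch of why a per-user limit hurts when many containers share one host user: the code reads the inotify sysctls and divides a limit evenly among containers. The files under /proc/sys/fs/inotify are real kernel interfaces, but the talk did not say which of them the 512 figure refers to, and the even split is only an assumption for illustration, since the kernel hands out handles first come, first served.

```python
from pathlib import Path


def read_inotify_limits(base: str = "/proc/sys/fs/inotify") -> dict:
    """Read the per-user inotify limits exposed by the kernel."""
    limits = {}
    for name in ("max_user_instances", "max_user_watches"):
        limits[name] = int((Path(base) / name).read_text().strip())
    return limits


def even_share(per_user_limit: int, containers: int) -> int:
    """Handles each container would get if the limit were split evenly."""
    if containers <= 0:
        raise ValueError("need at least one container")
    return per_user_limit // containers
```

With a limit of 512 and 100 containers, an even split leaves only five handles per container, which gives a sense of how quickly systemd inside each container could run out.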
Another problem area is the shared network tables. For example, Graber runs a "capture the flag" competition annually that uses 15,000 or so containers to simulate the internet. That creates a routing table with 3.3 million entries, which is too large for the kernel limits. The way the network tables are shared in the system means that a container user could fill up these tables so that other containers or the host can no longer open sockets. There is a similar problem with pseudoterminal slave (PTS) devices, he said.
Ulimits pose a related problem. Unless each container has its own range of user and group IDs (UIDs and GIDs), which would need to include 64K IDs per container to be "vaguely POSIX-compliant", ulimits will apply across container boundaries. Tying them to some kind of namespace would be better, but it is not entirely clear which namespace would make sense, he said.
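To show what "its own range of user and group IDs" could look like, the following sketch hands each container a disjoint block of 65536 host IDs and renders it in the three-field format used by /proc/<pid>/uid_map (ID inside the namespace, ID outside, length). The starting host ID of 100000 and the function name are assumptions made for the example, not values from the talk.

```python
ID_RANGE = 65536  # the "64K IDs per container" mentioned in the talk


def allocate_id_maps(container_names, first_host_id: int = 100000) -> dict:
    """Assign each container a non-overlapping block of host UIDs/GIDs."""
    maps = {}
    for index, name in enumerate(container_names):
        host_start = first_host_id + index * ID_RANGE
        # uid_map/gid_map line: <ID inside namespace> <ID on host> <length>
        maps[name] = f"0 {host_start} {ID_RANGE}"
    return maps
```

Because no two containers share a host UID in this scheme, per-user limits such as ulimits would no longer be pooled across container boundaries.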
The main isolation used for LXD containers is a user namespace. In addition, an AppArmor profile is installed to prevent cross-container access to files and other resources, though it is largely just a safety net. Some system calls are blacklisted using seccomp, as well.
Container checkpoint/restore
At that point, Andersen took over the presentation to discuss checkpoint/restore for containers. He began with some oddities that he has come across—for sysctl handling, in particular. For example, sysctls for network namespaces change the value for the namespace that opened the /proc/sys file, while the IPC and UTS namespaces change the value for whichever namespace does the write() of the value. But the only user that can open() the IPC/UTS sysctl files is real root, which would imply that file descriptors for those files would be passed to unprivileged containers, but that won't work.
He then moved on to some other checkpoint/restore problem areas. In reality, checkpoint/restore is almost the antithesis of security, Andersen said. It requires looking inside the state of a process, which needs privileges, but there are some things that even root cannot do. Checkpoint/restore uses ptrace() to scrape a process's state, but there are security technologies that block some of that access.
For example, seccomp will kill a process if a blocked system call is made, so seccomp might need to be suspended while the checkpoint is being done. Similarly, LSMs can prevent some actions that a checkpoint or restore might need to do, so LSM access control might need to be paused during those operations. Andersen did note that when discussing this idea with Kees Cook it was not entirely well-received—in fact, Cook said the feature "gave him the creeps". Beyond those problems, there is also a need to handle the checkpoint of nested namespaces, he said.
Graber then gave a demo of LXD that included migrating running containers from one host to another. As with most demos, it was a bit hard to describe; those interested can check out the YouTube video of the talk. It did serve to show some of the capabilities of LXD, its command-line interface, and the ease of setting up, running, and managing containers using it. LXD is implemented in Go, while LXC is written in C.
As a recap, Graber summed up the LXD project and its wider implications. Unprivileged containers are safe by design, he said. LSMs can be used to provide a safety net to help ensure the security of those containers. It is still too easy to make a denial-of-service attack against the kernel, however, using PTSes, network tables, and other shared resources. Unprivileged APIs are regularly requested by container users, some of which are reasonable, though many others are not. Finally, checkpoint/restore for containers is working in some configurations, but there are lots of things still to be worked out.
[I would like to thank the Linux Foundation for travel support to attend the Linux Security Summit in Toronto.]
Brief items
Security quotes of the week
I knew all this because each person advertised their presence wirelessly, either over "classic" Bluetooth or the newer Bluetooth Low Energy (BTLE) protocol—and I was running an open source tool called Blue Hydra, a project from the team at Pwnie Express.
New vulnerabilities
autotrace: code execution
| Package(s): | autotrace | CVE #(s): | CVE-2016-7392 |
| Created: | September 15, 2016 | Updated: | September 28, 2016 |
| Description: | From the Debian-LTS advisory: |
Autotrace is a program for converting bitmaps to vector graphics. It had a bug that caused an out-of-bounds write. This was caused by not allocating sufficient memory to store the terminating NULL pointer in an array.
chromium-browser: multiple vulnerabilities
| Package(s): | chromium-browser | CVE #(s): | CVE-2016-5170 CVE-2016-5171 CVE-2016-5172 CVE-2016-5173 CVE-2016-5174 CVE-2016-5175 CVE-2016-7395 |
| Created: | September 15, 2016 | Updated: | September 21, 2016 |
| Description: | From the Debian advisory: |
CVE-2016-5170: A use-after-free issue was discovered in Blink/Webkit.
CVE-2016-5171: Another use-after-free issue was discovered in Blink/Webkit.
CVE-2016-5172: Choongwoo Han discovered an information leak in the v8 javascript library.
CVE-2016-5173: A resource bypass issue was discovered in extensions.
CVE-2016-5174: Andrey Kovalev discovered a way to bypass the popup blocker.
CVE-2016-5175: The chrome development team found and fixed various issues during internal auditing.
CVE-2016-7395: An uninitialized memory read issue was discovered in the skia library.
curl: code execution
| Package(s): | curl | CVE #(s): | CVE-2016-7167 |
| Created: | September 16, 2016 | Updated: | November 2, 2016 |
| Description: | From the Red Hat bugzilla entry: |
It was found that provided string length arguments in four libcurl functions curl_escape(), curl_easy_escape(), curl_unescape() and curl_easy_unescape() were not properly checked and due to arithmetic in the functions, passing in the length 0xffffffff (2^32-1 or UINT_MAX or even just -1) would end up causing an allocation of zero bytes of heap memory that curl would attempt to write gigabytes of data into.
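The wraparound described in the curl entry can be reproduced with a few lines of arithmetic. This sketch simulates 32-bit unsigned math and assumes the allocation size is computed as "length plus one" (room for a terminating NUL); that shape is an assumption chosen to match the advisory's description, not a copy of the libcurl source, and the function names are hypothetical.

```python
UINT32_MASK = 0xFFFFFFFF


def wrapped_alloc_size(length: int) -> int:
    """Compute length + 1 the way a 32-bit unsigned integer would."""
    return ((length & UINT32_MASK) + 1) & UINT32_MASK


def checked_alloc_size(length: int) -> int:
    """Same computation, but refuse lengths that would wrap around."""
    if length < 0 or length >= UINT32_MASK:
        raise ValueError("length out of range")
    return length + 1
```

Both 0xffffffff and -1 map to the same 32-bit pattern, so either one yields a zero-byte allocation in the unchecked version, while the checked version rejects them before any memory is requested.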
distribution-gpg-keys: privilege escalation
| Package(s): | distribution-gpg-keys mock | CVE #(s): | CVE-2016-6299 |
| Created: | September 19, 2016 | Updated: | September 21, 2016 |
| Description: | From the Red Hat bugzilla: |
It was found that mock's scm plug-in would parse a given spec file with root privileges. This could allow an attacker who is able to start a build of an rpm with a specially crafted spec file within mock's environment to elevate their privileges and escape the chroot.
graphicsmagick: multiple vulnerabilities
| Package(s): | GraphicsMagick | CVE #(s): | CVE-2016-7446 CVE-2016-7447 CVE-2016-7448 CVE-2016-7449 |
| Created: | September 15, 2016 | Updated: | September 28, 2016 |
| Description: | From the GraphicsMagick release notes: |
More information may be found in the CVE assignment email.
jackrabbit: cross-site request forgery
| Package(s): | jackrabbit | CVE #(s): | CVE-2016-6801 |
| Created: | September 19, 2016 | Updated: | September 27, 2016 |
| Description: | From the Debian LTS advisory: |
Lukas Reschke discovered that Apache Jackrabbit, a content repository implementation for Java, was vulnerable to Cross-Site-Request-Forgery in Jackrabbit's webdav module. The CSRF content-type check for POST requests did not handle missing Content-Type header fields, nor variations in field values with respect to upper/lower case or optional parameters. This could be exploited to create a resource via CSRF.
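The three gaps named in the Jackrabbit entry (missing header, letter case, optional parameters) are easy to see in a small content-type check. The sketch below is a generic illustration in Python rather than Jackrabbit's Java code; the function name is hypothetical, and the set of media types is the set that plain HTML forms can submit.

```python
# Media types a plain HTML form can send without any scripting.
FORM_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def looks_like_cross_site_form_post(content_type) -> bool:
    """Return True if a POST should be treated as a possible CSRF attempt."""
    # A missing or empty header is treated as suspicious, not as harmless.
    if content_type is None or not content_type.strip():
        return True
    # Drop optional parameters such as "; charset=UTF-8" and ignore case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in FORM_CONTENT_TYPES
```

A check that compared the raw header against the lowercase strings would let "Text/Plain; charset=UTF-8" through, which is the kind of variation the advisory describes.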
kernel: denial of service
| Package(s): | kernel | CVE #(s): | CVE-2016-3841 |
| Created: | September 20, 2016 | Updated: | November 10, 2016 |
| Description: | From the CVE entry: |
The IPv6 stack in the Linux kernel before 4.3.3 mishandles options data, which allows local users to gain privileges or cause a denial of service (use-after-free and system crash) via a crafted sendmsg system call.
mariadb: access restriction bypass
| Package(s): | mariadb | CVE #(s): | CVE-2016-6663 |
| Created: | September 15, 2016 | Updated: | September 21, 2016 |
| Description: | From the Arch Linux advisory: |
- CVE-2016-6663 (access restriction bypass): In the past mariadb used to read the main configuration file from three different locations. One of them (the datadir) is unsafe because it's writeable by the sql-server. This way a remote attacker who could gain access to the sql-server could deploy a maliciously crafted configuration file.
moin: cross-site scripting
| Package(s): | moin | CVE #(s): | CVE-2016-7146 CVE-2016-7148 CVE-2016-9119 |
| Created: | September 19, 2016 | Updated: | December 2, 2016 |
| Description: | From the Red Hat bugzilla: |
MoinMoin 1.9.8 is out, released 2014-10-17. See https://moinmo.in/MoinMoinDownload
Strongly recommended for all users and contains bug fixes and enhanced password functionality.
From the Debian advisory:
Several cross-site scripting vulnerabilities were discovered in moin, a Python clone of WikiWiki. A remote attacker can conduct cross-site scripting attacks via the GUI editor's attachment dialogue (CVE-2016-7146), the AttachFile view (CVE-2016-7148) and the GUI editor's link dialogue (CVE-2016-9119).
mozilla: multiple vulnerabilities
| Package(s): | firefox thunderbird seamonkey | CVE #(s): | CVE-2016-5257 CVE-2016-5270 CVE-2016-5272 CVE-2016-5274 CVE-2016-5276 CVE-2016-5277 CVE-2016-5278 CVE-2016-5280 CVE-2016-5281 CVE-2016-5284 |
| Created: | September 21, 2016 | Updated: | January 5, 2017 |
| Description: | From the Red Hat advisory: |
Multiple flaws were found in the processing of malformed web content. A web page containing malicious content could cause Firefox to crash or, potentially, execute arbitrary code with the privileges of the user running Firefox.
php: multiple vulnerabilities
| Package(s): | php | CVE #(s): | CVE-2016-7411 CVE-2016-7412 CVE-2016-7413 CVE-2016-7414 CVE-2016-7416 CVE-2016-7417 CVE-2016-7418 |
| Created: | September 19, 2016 | Updated: | October 14, 2016 |
| Description: | From the Arch Linux advisory: |
The package php before version 7.0.11-1 is vulnerable to multiple issues that can lead to arbitrary code execution and denial of service.
- CVE-2016-7411 (arbitrary code execution): A memory Corruption vulnerability was found in php's unserialize method. This happened during the deserialized-object Destruction.
- CVE-2016-7412 (arbitrary code execution): Php's mysqlnd extension assumes the `flags` returned for a BIT field necessarily contains UNSIGNED_FLAG; this might not be the case, with a rogue mysql server, or a MITM attack. A malicious mysql server or MITM can return field metadata for BIT fields that does not contain the UNSIGNED_FLAG, which leads to a heap overflow.
- CVE-2016-7413 (arbitrary code execution): When WDDX tries to deserialize "recordset" element, use after free happens if close tag for the field is not found. This happens only when field names are set.
- CVE-2016-7414 (arbitrary code execution): The entry.uncompressed_filesize* method does not properly verify the input parameters. An attacker can create a signature.bin with size less than 8, when this value is passed to phar_verify_signature as sig_len a heap buffer overflow occurs.
- CVE-2016-7416 (arbitrary code execution): Big locale string causes stack based overflow inside libicu.
- CVE-2016-7417 (insufficient validation): The return value of spl_array_get_hash_table is not properly checked and used on spl_array_get_dimension_ptr_ptr.
- CVE-2016-7418 (denial of service): An attacker can trigger an Out-Of-Bounds Read in php_wddx_push_element of wddx.c. A DoS (null pointer dereference) vulnerability can be triggered in the wddx_deserialize function by providing a maliciously crafted XML string.
php5: invalid free
| Package(s): | php5 | CVE #(s): | CVE-2016-4473 |
| Created: | September 19, 2016 | Updated: | September 21, 2016 |
| Description: | From the Debian LTS advisory: |
An invalid free may occur under certain conditions when processing phar-compatible archives.
php-adodb: cross-site scripting
| Package(s): | php-adodb | CVE #(s): | CVE-2016-4855 |
| Created: | September 19, 2016 | Updated: | September 21, 2016 |
| Description: | From the Red Hat bugzilla: |
A cross-site scripting flaw was found in one of ADOdb's test scripts.
tomcat: privilege escalation
| Package(s): | tomcat | CVE #(s): | CVE-2016-1240 |
| Created: | September 15, 2016 | Updated: | September 21, 2016 |
| Description: | From the Debian-LTS advisory: |
Dawid Golunski from legalhackers.com discovered that Debian's version of Tomcat 6 was vulnerable to a local privilege escalation. Local attackers who have gained access to the server in the context of the tomcat6 user through a vulnerability in a web application were able to replace the file with a symlink to an arbitrary file.
unadf: two vulnerabilities
| Package(s): | unadf | CVE #(s): | CVE-2016-1243 CVE-2016-1244 |
| Created: | September 21, 2016 | Updated: | September 26, 2016 |
| Description: | From the Debian LTS advisory: |
It was discovered that there were two vulnerabilities in unadf, a tool to extract files from an Amiga Disk File dump (.adf):
- CVE-2016-1243: stack buffer overflow caused by blindly trusting on pathname lengths of archived files. Stack allocated buffer sysbuf was filled with sprintf() without any bounds checking in extracTree() function.
- CVE-2016-1244: execution of unsanitized input. Shell command used for creating directory paths was constructed by concatenating names of archived files to the end of the command string.
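The second unadf flaw is a classic command-construction problem, which the following sketch reproduces in miniature. It only builds command strings and never executes them; the "mkdir -p" command and the function names are assumptions for illustration and are not taken from the unadf source, which is written in C.

```python
import shlex


def unsafe_mkdir_command(archived_name: str) -> str:
    """Vulnerable pattern: the archived name is pasted into a shell command."""
    return "mkdir -p " + archived_name


def quoted_mkdir_command(archived_name: str) -> str:
    """Safer pattern: quote the name so the shell sees a single argument."""
    return "mkdir -p -- " + shlex.quote(archived_name)
```

A file named "x; touch pwned" turns the first form into two shell commands, while the quoted form keeps it as one directory name. Avoiding the shell altogether, for example with os.makedirs(), sidesteps the problem entirely.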
virtualbox: unspecified vulnerability
| Package(s): | virtualbox | CVE #(s): | CVE-2016-3612 |
| Created: | September 15, 2016 | Updated: | September 21, 2016 |
| Description: | From the openSUSE advisory: |
CVE-2016-3612: An unspecified vulnerability in the Oracle VM VirtualBox component in Oracle Virtualization VirtualBox before 5.0.22 allowed remote attackers to affect confidentiality via vectors related to Core.
wireshark: multiple vulnerabilities
| Package(s): | wireshark | CVE #(s): | CVE-2016-7176 CVE-2016-7177 CVE-2016-7178 CVE-2016-7179 CVE-2016-7180 |
| Created: | September 21, 2016 | Updated: | September 27, 2016 |
| Description: | From the CVE entries: |
epan/dissectors/packet-h225.c in the H.225 dissector in Wireshark 2.x before 2.0.6 calls snprintf with one of its input buffers as the output buffer, which allows remote attackers to cause a denial of service (copy overlap and application crash) via a crafted packet. (CVE-2016-7176)
epan/dissectors/packet-catapult-dct2000.c in the Catapult DCT2000 dissector in Wireshark 2.x before 2.0.6 does not restrict the number of channels, which allows remote attackers to cause a denial of service (buffer over-read and application crash) via a crafted packet. (CVE-2016-7177)
epan/dissectors/packet-umts_fp.c in the UMTS FP dissector in Wireshark 2.x before 2.0.6 does not ensure that memory is allocated for certain data structures, which allows remote attackers to cause a denial of service (invalid write access and application crash) via a crafted packet. (CVE-2016-7178)
Stack-based buffer overflow in epan/dissectors/packet-catapult-dct2000.c in the Catapult DCT2000 dissector in Wireshark 2.x before 2.0.6 allows remote attackers to cause a denial of service (application crash) via a crafted packet. (CVE-2016-7179)
epan/dissectors/packet-ipmi-trace.c in the IPMI trace dissector in Wireshark 2.x before 2.0.6 does not properly consider whether a string is constant, which allows remote attackers to cause a denial of service (use-after-free and application crash) via a crafted packet. (CVE-2016-7180)
zookeeper: buffer overflow
| Package(s): | zookeeper | CVE #(s): | CVE-2016-5017 |
| Created: | September 19, 2016 | Updated: | January 2, 2017 |
| Description: | From the Debian LTS advisory: |
Lyon Yang discovered that the C client shells cli_st and cli_mt of Apache Zookeeper, a high-performance coordination service for distributed applications, were affected by a buffer overflow vulnerability associated with parsing of the input command when using the "cmd:" batch mode syntax. If the command string exceeds 1024 characters a buffer overflow will occur.
Page editor: Jake Edge
