Security
The security state of KVM
One of the benefits of virtualization is security; applications running in separate virtual machines are isolated from each other and, ideally, it is very hard for a compromised guest to damage other virtual machines running on the same host. The hypervisor itself is the place where most attacks on a virtualization system will be aimed. At the 2014 KVM Forum, Andrew Honig presented his analysis of which parts of KVM are more likely to have problems, and proposed ways to limit the attack surface.
An insecure hypervisor does not provide much security to the virtual machines it hosts. Luckily, hypervisors are typically small pieces of software, and a smaller size means a reduced attack surface and a higher feasibility of auditing the code. In the case of KVM, the hypervisor runs in the same address space as the rest of the Linux kernel, including device drivers and the network stack, but only a small amount of code deals with untrusted input from the virtual machine.
Therefore, the Linux kernel is substantially insulated from possible malicious behavior of the virtual machine. Device drivers in the virtual machine talk to a user-space process (typically QEMU), and this process talks to the kernel through the regular system call interface or through special devices such as /dev/tap. QEMU is exposed to all the evil that could come from a malicious virtual machine, but only limited and low-level interfaces can be used to attack it. This makes it hard to use QEMU as a vector to exploit kernel vulnerabilities in the host. And, since QEMU is a user-space program, Linux Security Modules (LSMs) such as SELinux or AppArmor can be used to substantially mitigate the effect of arbitrary code execution if QEMU itself is subverted.
This makes the hypervisor a much more interesting target than QEMU. So there was a great deal of interest in Honig's talk, "Security Hardening of KVM" (slides [PDF], video [YouTube]), at the KVM Forum, which was held in Düsseldorf, Germany in October. Honig has been working on hypervisor security for about ten years. He used to try to break VMware, and found six CVEs, but his attention has shifted to KVM since he switched employers. He now works at Google, where his team takes care of securing Google Compute Engine (GCE), a cloud platform that uses KVM as the hypervisor. Interestingly, the user-space part of GCE is not QEMU; Google wrote its own.
So far, the team has found nine vulnerabilities in KVM. That is not all that many compared to the effort that he and his team are putting into breaking it. In Honig's words, few other parts of Linux have probably had as many "engineer-hours per line of code" spent looking for security problems. Forty thousand lines of C code can certainly be expected to have bugs.
Vulnerability types
What kind of vulnerabilities can you encounter? Privilege escalation or denial of service (DoS) in the host can happen of course, since hypervisors expose a relatively rich ioctl() API to user space; this kind of vulnerability is not really specific to hypervisors. It is slightly more interesting to have a bug that lets an unprivileged program running in the guest crash the whole virtual machine. A bug of this kind was fixed recently.
Crashing the host is worse and mostly happens because of null pointer dereferences (with the panic_on_oops=1 setting); and in some rare cases, a hypervisor bug can facilitate privilege escalation for an unprivileged program running within a guest. Which of these is worse? For a cloud provider such as Google, crashing the host is worse; its customers, however, might value the integrity of their virtual machines.
Higher up in the rankings are vulnerabilities that let guests read data from other guests or from the hypervisor. The recently discovered Xen vulnerability, XSA-108, let guests read a few kilobytes of hypervisor memory. Despite being hard to exploit, and despite the existence of worse kinds of hypervisor vulnerabilities, it received considerable press and forced major cloud providers to reboot all of their hosts.
Of course, the worst bugs of all happen when the guest can write to hypervisor memory and, in all likelihood, execute arbitrary code in hypervisor context soon after. Of the fifteen CVEs that Honig mentioned, five were of this kind: two in KVM and three in VMware.
To find these bugs, Honig's team resorts to fuzzing and a lot of code review. The team has gained experience and, by now, knows what to look for, and where, every time it upgrades GCE to a newer hypervisor.
Most of the problems stem from either race conditions or buffer overflows, and some are downright embarrassing. In one case for KVM, the code used an ASSERT() macro to verify the validity of an index in an array:
    u32 redir_index = (ioapic->ioregsel - 0x10) >> 1;
    u64 redir_content;

    ASSERT(redir_index < IOAPIC_NUM_PINS);
    redir_content = ioapic->redirtbl[redir_index].bits;
Unfortunately, the bounds check is buried inside the ASSERT() call, which is compiled out by default. That means the guest can read arbitrary host memory. Or, if assertions are enabled, as is the case for debug builds, an assertion failure will crash the host; pick your poison.
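One way to avoid both outcomes is to validate the guest-controlled index with an ordinary branch that is present in every build. The following is a minimal sketch of that idea, not the upstream patch: the structure names (ioapic_model, redir_entry) and the function ioapic_read_redir() are hypothetical, pared-down stand-ins for the kernel's data structures, and returning all ones for an out-of-range selector is an assumption about reasonable behavior.

```c
#include <stdint.h>

/* 24 redirection entries, as on a typical IOAPIC; an assumption here. */
#define IOAPIC_NUM_PINS 24

/* Hypothetical stand-in for one redirection-table entry. */
union redir_entry {
    uint64_t bits;
};

/* Hypothetical stand-in for the emulated IOAPIC state. */
struct ioapic_model {
    uint32_t ioregsel;                            /* selector written by the guest */
    union redir_entry redirtbl[IOAPIC_NUM_PINS];
};

/*
 * Read the redirection-table entry selected by the guest.  The index is
 * checked with a plain branch, so the check cannot be compiled out, and an
 * out-of-range selector reads as all ones instead of touching memory
 * outside the table or crashing the host.
 */
static uint64_t ioapic_read_redir(const struct ioapic_model *ioapic)
{
    /* Unsigned arithmetic: selectors below 0x10 wrap to a huge index. */
    uint32_t redir_index = (ioapic->ioregsel - 0x10) >> 1;

    if (redir_index < IOAPIC_NUM_PINS)
        return ioapic->redirtbl[redir_index].bits;

    return ~0ULL;
}
```

In this sketch a selector of 0x16 picks entry 3, while a selector of 0x40 or of 0 falls outside the table and yields the all-ones value.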
The code above is part of the emulation of the IOAPIC, an interrupt controller device. It turns out that device emulation is the area where Google reported most bugs, but it is not the only one.
Improving KVM security
The main task of the hypervisor is to drive execution of the virtual CPUs. Some actions of the virtual CPUs, such as reads and writes to model-specific registers (MSRs) and I/O registers, cannot be done by the processor; the hypervisor will then either emulate the operation itself or ask a user-space process to complete it. MSRs right now are always handled in kernel space, and are one source of bugs. Performance-critical devices such as interrupt controllers and timers are also handled in kernel space; the IOAPIC is not really performance-critical anymore, but it was for the operating systems of 2007-2008, when KVM was being developed.
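The split described above can be pictured as a small routing decision made on every exit from the guest. The sketch below is only a toy model of that decision as the article describes it; every type and function name in it (exit_kind, device_kind, vm_exit, route_exit()) is hypothetical, and real KVM exit handling involves many more exit reasons and details.

```c
#include <stdbool.h>

/* Hypothetical exit classes; real KVM exit reasons are far more numerous. */
enum exit_kind {
    EXIT_MSR_ACCESS,    /* guest read or wrote a model-specific register */
    EXIT_IO_ACCESS      /* guest read or wrote an I/O register */
};

/* Hypothetical device classes for I/O register accesses. */
enum device_kind {
    DEV_NONE,           /* not applicable, e.g. for MSR accesses */
    DEV_LOCAL_APIC,
    DEV_IOAPIC,
    DEV_PIC_I8259,
    DEV_PIT_I8254,
    DEV_OTHER           /* anything emulated by the user-space process */
};

enum handler {
    HANDLE_IN_KERNEL,
    HANDLE_IN_USER_SPACE
};

struct vm_exit {
    enum exit_kind kind;
    enum device_kind device;
};

/* Interrupt controllers and timers are the devices emulated in the kernel. */
static bool device_is_in_kernel(enum device_kind device)
{
    switch (device) {
    case DEV_LOCAL_APIC:
    case DEV_IOAPIC:
    case DEV_PIC_I8259:
    case DEV_PIT_I8254:
        return true;
    default:
        return false;
    }
}

/* Decide who completes an operation that the processor could not perform. */
static enum handler route_exit(const struct vm_exit *vmexit)
{
    /* MSR accesses are always handled in kernel space in this model. */
    if (vmexit->kind == EXIT_MSR_ACCESS)
        return HANDLE_IN_KERNEL;

    /* I/O accesses stay in the kernel only for the in-kernel devices. */
    if (device_is_in_kernel(vmexit->device))
        return HANDLE_IN_KERNEL;

    return HANDLE_IN_USER_SPACE;
}
```

Every case that returns HANDLE_IN_KERNEL in this model is code that parses untrusted guest input inside the kernel, which is why shrinking that set shrinks the attack surface.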
In order to process loads and stores to I/O registers, KVM includes a small x86 instruction emulator. The emulator actually has a second purpose: it is needed to handle processor states that are not supported by older Intel processors, such as the so-called "big real mode" and hardware task switching. The good news is that this second purpose is becoming obsolete, as newer processors can do almost all of this in hardware. The bad news is that, unlike RISC architectures where only a handful of instructions have to be emulated, x86 has dozens of instructions that can access memory-mapped I/O registers, and KVM has to recognize and execute them all. Thus, the emulator consists of roughly 5,000 lines of code, and has its own share of bugs.
The more these parts can be moved to user space, the more the attack surface can be reduced, Honig said. As mentioned earlier, user space is naturally confined, and it offers a wealth of mitigation techniques that do not apply to the kernel.
As newer processors include more and more virtualization features, Google is targeting fairly new Intel processors only, and high-end ones at that. In particular, the Xeon E5 v2, also known as Ivy Bridge-E, supports big real mode virtualization and can also virtualize large parts of the local APIC inside the processor.
In a perfect world, everything else would then move to user space. In practice, parts of the local APIC support will almost certainly remain in the kernel. For example, inter-processor interrupts (IPIs) are performance-critical and, in general, not virtualized by the CPU. The only accelerated special case is "self-IPIs", that is, IPIs sent to the same processor that triggered them. This sounds weird but is used extensively by Windows.
Still, this means the emulator, the legacy i8259 interrupt controller, the legacy i8254 programmable timer, and the almost-legacy IOAPIC would no longer be part of the hypervisor's attack surface. Most MSR emulation could also move to user space. Honig stated a fairly ambitious goal: to reduce the attack surface by 50% (measured in lines of code and "number of pages of the Intel manual" emulated in the kernel) with at most 0.1% performance impact on macro-benchmarks.
The team's plan has been to start with everything in user space, and re-enable kernel acceleration as much as needed to satisfy their goal. This makes sense for a research project, but it is backwards compared to how this maintainer would like to see the work pushed upstream. As far as I am concerned, in fact, it would be preferable to receive many small series, each one moving a piece of KVM out of the kernel. In addition, since Google has not been using either QEMU or kvmtool for the user-space part of the work, the team has to develop patches for one of them before its improvements can be accepted upstream.
That said, this kind of hurdle should probably be expected, and it did not make the presentation any less interesting. Compared to containers, one of the strengths of virtualization is (or should be) the smaller attack surface. It is important that hypervisors live up to that promise, and Honig's ideas are definitely going in the right direction.
Brief items
Security quotes of the week
The depressing part is there is no real solution on the horizon - even the "fuse bit" solutions some folks are touting can be easily bypassed by reprogramming/deleting the entire devices. More depressing is that this problem seems so hard that while lots of people are working on attacks, the problem is frustrating enough that very few are working on solutions.
Most of these devices have 32k of firmware. Karsten's attack code including DHCP server and network stack was only a couple of K.... So, yes, your USB charger can attack your phone and your mouse can attack your laptop.
Ubuntu, ownCloud, and a hidden dark side of Linux software repositories (PC World)
Here's a PC World article on the old, insecure version of ownCloud shipped in Ubuntu 14.04 — and the difficulties in getting it updated or removed.
The writing is a little breathless, but there is a valid issue here; the software found in the more remote corners of distribution repositories may not be particularly well maintained.
New vulnerabilities
cinder: information disclosure
Package(s): cinder | CVE #(s): CVE-2014-7230
Created: November 12, 2014 | Updated: December 3, 2014
Description: From the CVE entry:
The processutils.execute function in OpenStack oslo-incubator, Cinder, Nova, and Trove before 2013.2.4 and 2014.1 before 2014.1.3 allows local users to obtain passwords from commands that cause a ProcessExecutionError by reading the log.
curl: information leak
Package(s): curl | CVE #(s): CVE-2014-3707
Created: November 7, 2014 | Updated: January 5, 2015
Description: From the Debian advisory:
Symeon Paraschoudis discovered that the curl_easy_duphandle() function in cURL, an URL transfer library, has a bug that can lead to libcurl eventually sending off sensitive data that was not intended for sending, while performing a HTTP POST operation. This bug requires CURLOPT_COPYPOSTFIELDS and curl_easy_duphandle() to be used in that order, and then the duplicate handle must be used to perform the HTTP POST. The curl command line tool is not affected by this problem as it does not use this sequence.
deluge: deluge-web is vulnerable to POODLE
Package(s): deluge | CVE #(s): (none)
Created: November 12, 2014 | Updated: November 12, 2014
Description: From the Red Hat bugzilla:
The web plugin creates an SSLv3 socket. The latest version of deluge, 1.3.10, updates the web plugin to use TLSv1.
gnutls28: code execution
Package(s): gnutls28 | CVE #(s): CVE-2014-8564
Created: November 11, 2014 | Updated: November 19, 2014
Description: From the Ubuntu advisory:
Sean Burford discovered that GnuTLS incorrectly handled printing certain elliptic curve parameters. A malicious remote server or client could use this issue to cause GnuTLS to crash, resulting in a denial of service, or possibly execute arbitrary code.
ImageMagick: multiple vulnerabilities
Package(s): ImageMagick | CVE #(s): CVE-2014-8354 CVE-2014-8355 CVE-2014-8562
Created: November 12, 2014 | Updated: April 13, 2015
Description: From the openSUSE advisory:
- Out-of-bounds memory access in PCX parser (CVE-2014-8355).
- Out-of-bounds memory access in resize code (CVE-2014-8354).
- Out-of-bounds memory error in DCM decode (CVE-2014-8562).
kde-workspace: privilege escalation
Package(s): kde-workspace | CVE #(s): CVE-2014-8651
Created: November 11, 2014 | Updated: December 31, 2015
Description: From the Ubuntu advisory:
David Edmundson discovered that the KDE Clock KCM policykit helper did not properly guard against untrusted input. Under certain circumstances, a process running under the user's session could exploit this to run programs as the administrator. See also this KDE advisory.
libreoffice: code execution
Package(s): libreoffice | CVE #(s): CVE-2014-3693
Created: November 6, 2014 | Updated: December 4, 2014
Description: From the Ubuntu advisory:
It was discovered that LibreOffice incorrectly handled the Impress remote control port. An attacker could possibly use this issue to cause Impress to crash, resulting in a denial of service, or possibly execute arbitrary code.
libvirt: information disclosure
Package(s): libvirt | CVE #(s): CVE-2014-7823
Created: November 11, 2014 | Updated: January 6, 2015
Description: From the Ubuntu advisory:
Eric Blake discovered that libvirt incorrectly handled permissions when processing the qemuDomainFormatXML command. An attacker with read-only privileges could possibly use this to gain access to certain information from the domain xml file.
php: code execution
Package(s): php | CVE #(s): CVE-2014-8626
Created: November 7, 2014 | Updated: November 12, 2014
Description: From the Red Hat advisory:
A stack-based buffer overflow flaw was found in the way the xmlrpc extension parsed dates in the ISO 8601 format. A specially crafted XML-RPC request or response could possibly cause a PHP application to crash or execute arbitrary code with the privileges of the user running that PHP application.
Pound: HTTP request smuggling
Package(s): Pound | CVE #(s): CVE-2005-2090
Created: November 7, 2014 | Updated: November 12, 2014
Description: From the CVE entry:
Jakarta Tomcat 5.0.19 (Coyote/1.1) and Tomcat 4.1.24 (Coyote/1.0) allows remote attackers to poison the web cache, bypass web application firewall protection, and conduct XSS attacks via an HTTP request with both a "Transfer-Encoding: chunked" header and a Content-Length header, which causes Tomcat to incorrectly handle and forward the body of the request in a way that causes the receiving server to process it as a separate HTTP request, aka "HTTP Request Smuggling."
qemu: multiple vulnerabilities
Package(s): qemu | CVE #(s): CVE-2014-3689 CVE-2014-7815
Created: November 7, 2014 | Updated: November 12, 2014
Description: From the Debian advisory:
CVE-2014-3689 – The Advanced Threat Research team at Intel Security reported that guest-provided parameters were insufficiently validated in rectangle functions in the vmware-vga driver. A privileged guest user could use this flaw to write into qemu address space on the host, potentially escalating their privileges to those of the qemu host process.
CVE-2014-7815 – James Spadaro of Cisco reported insufficiently sanitized bits_per_pixel from the client in the QEMU VNC display driver. An attacker having access to the guest's VNC console could use this flaw to crash the guest.
sssd: restriction bypass
Package(s): sssd | CVE #(s): CVE-2014-0249
Created: November 12, 2014 | Updated: October 27, 2016
Description: From the CVE entry:
The System Security Services Daemon (SSSD) 1.11.6 does not properly identify group membership when a non-POSIX group is in a group membership chain, which allows local users to bypass access restrictions via unspecified vectors.
tnftp: command execution
Package(s): tnftp | CVE #(s): CVE-2014-8517
Created: November 11, 2014 | Updated: November 15, 2016
Description: From the Red Hat bug report:
It was reported that tnftp, an FTP client from NetBSD, could be forced to run arbitrary commands if an output file is not specified.
wss4j: authentication spoofing
Package(s): wss4j | CVE #(s): CVE-2014-3623
Created: November 7, 2014 | Updated: December 29, 2014
Description: From the CVE entry:
Apache WSS4J before 1.6.17 and 2.x before 2.0.2, as used in Apache CXF 2.7.x before 2.7.13 and 3.0.x before 3.0.2, when using TransportBinding, does not properly enforce the SAML SubjectConfirmation method security semantics, which allows remote attackers to conduct spoofing attacks via unspecified vectors.
xml-security: denial of service
Package(s): xml-security | CVE #(s): CVE-2013-4517
Created: November 7, 2014 | Updated: December 31, 2014
Description: From the CVE entry:
Apache Santuario XML Security for Java before 1.5.6, when applying Transforms, allows remote attackers to cause a denial of service (memory consumption) via crafted Document Type Definitions (DTDs), related to signatures.
zarafa: multiple vulnerabilities
Package(s): zarafa | CVE #(s): (none)
Created: November 10, 2014 | Updated: November 12, 2014
Description: From the Fedora advisory:
This R1 release of the 7.1.11 final release addresses the WebAccess install problem on RPM-based systems and resolves the dependencies problems under Ubuntu 14.04.
Downstream changes
zeromq: man-in-the-middle attack
Package(s): zeromq | CVE #(s): CVE-2014-7202 CVE-2014-7203
Created: November 11, 2014 | Updated: November 25, 2014
Description: From the CVE entries:
stream_engine.cpp in libzmq (aka ZeroMQ/C++) 4.0.x before 4.0.5 allows man-in-the-middle attackers to conduct downgrade attacks via a crafted connection request. (CVE-2014-7202)
libzmq (aka ZeroMQ/C++) 4.0.x before 4.0.5 does not ensure that nonces are unique, which allows man-in-the-middle attackers to conduct replay attacks via unspecified vectors. (CVE-2014-7203)
Page editor: Jake Edge