
Hardening virtio

August 9, 2021

This article was contributed by Marta Rybczyńska

Traditionally, in virtualized environments, the host is trusted by its guests, and must protect itself from potentially malicious guests. With initiatives like confidential computing, this rule is extended in the other direction: the guest no longer trusts the host. This change of paradigm requires adding boundary defenses in places where there have been none before. Recently, Andi Kleen submitted a patch set attempting to add the needed protections in virtio. The discussion that resulted from this patch set highlighted the need to secure virtio for a wider range of use cases.

Virtio offers a standardized interface for a number of device types (such as network or block devices). With virtio, the guest runs a simplified, common driver, and the host handles the connection to the real underlying device. The communication between the virtio device (host side) and the driver (guest side) happens using data structures called virtqueues, which are typically memory buffers, though the actual implementation depends on the bus used.
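To make the virtqueue description more concrete, here is a sketch of the three areas that make up a split virtqueue, modeled on the structure definitions in the virtio 1.x specification. The specification uses little-endian types such as le16 and le32; plain fixed-width integers are substituted here for simplicity, and the optional event-index fields are omitted. The comments note which side writes each area.

```c
#include <stdint.h>

/* Descriptor flags as named in the virtio specification. */
#define VIRTQ_DESC_F_NEXT     1  /* chain continues via the "next" field */
#define VIRTQ_DESC_F_WRITE    2  /* buffer is device-writable (otherwise device-readable) */
#define VIRTQ_DESC_F_INDIRECT 4  /* buffer holds a table of further descriptors */

/* Descriptor table: written by the driver, read by the device. */
struct virtq_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* VIRTQ_DESC_F_* */
    uint16_t next;   /* index of the next descriptor in a chain */
};

/* Available ring: written by the driver, read by the device. */
struct virtq_avail {
    uint16_t flags;
    uint16_t idx;     /* free-running index of the next slot the driver fills */
    uint16_t ring[];  /* heads of descriptor chains offered to the device */
};

/* One used-ring entry, reported by the device. */
struct virtq_used_elem {
    uint32_t id;   /* head of the descriptor chain the device has finished with */
    uint32_t len;  /* number of bytes the device says it wrote */
};

/* Used ring: written by the device, read by the driver. */
struct virtq_used {
    uint16_t flags;
    uint16_t idx;
    struct virtq_used_elem ring[];
};

/* The layout is fixed by the specification, so check the sizes at compile time. */
_Static_assert(sizeof(struct virtq_desc) == 16, "descriptors are 16 bytes");
_Static_assert(sizeof(struct virtq_used_elem) == 8, "used entries are 8 bytes");
```

Everything the driver reads from the used ring, including its idx field and every id/len pair, originates from the device, which is why that is where the hardening effort concentrates.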

The scope of the hardening

In the confidential-computing world, the host is not allowed to access guest memory that was not explicitly shared with it. In addition, the guest's memory can be encrypted by the processor with a key unknown to the host. Kleen's work is designed to build on Intel's upcoming hardware feature, called Trust Domain Extensions (TDX), which is designed to protect guests in cloud environments. It is built using a number of architecture extensions, including memory encryption with Multi-Key Total Memory Encryption (MKTME) (covered here when a different memory-encryption API was proposed in 2019), and a new CPU mode called Secure-Arbitration Mode (SEAM). In the protected mode, code running under SEAM can only use a specified (encrypted) memory range, while all other processes (and DMA operations) cannot access that zone. Virtio, as a commonly used interface between the guest and the host, must take extra care to avoid compromising the security that TDX provides.

Until recently, virtio drivers assumed that the other side could be trusted. As a consequence, they have sometimes lacked necessary checks when working with the various metadata (operation descriptors, ring positions, result codes, etc.) shared with the device (i.e., the host), so they could fail to catch bad pointers, out-of-range buffer indices, and similar errors. A malicious host could then exploit buffer overruns to gain access to guest memory. Checking metadata from devices is also necessary in other cases, as virtio is no longer used only between a guest and a host: some physical devices now implement the virtio interface.

The patches can be grouped into three parts. The first is the hardening of virtio itself, placed in virtio-ring; it also includes code to disable some virtio modes. The second part enables those mode restrictions on x86 systems with TDX enabled. The last part changes swiotlb, which, by copying data through an intermediate ("bounce") buffer, enables DMA operations in situations where they would not otherwise be possible. The hardening in the patch set adds checks for malicious pointers.
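The bounce-buffer idea behind swiotlb can be sketched in a few lines: the device may only touch memory the guest has explicitly shared, so data is copied between a private buffer and a shared pool around each transfer. This is a deliberately simplified, single-slot illustration; the names and the fixed pool are hypothetical, and the kernel's actual swiotlb interface is different and manages many slots.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHARED_POOL_SIZE 4096

/* Stands in for memory the guest has explicitly shared with the host. */
static unsigned char shared_pool[SHARED_POOL_SIZE];

/* Before a transfer to the device: copy private data into the shared pool and
 * return the shared address that would be handed to the device. */
void *bounce_map_to_device(const void *priv, size_t len)
{
    assert(priv != NULL);
    if (len > SHARED_POOL_SIZE)
        return NULL;
    memcpy(shared_pool, priv, len);
    return shared_pool;
}

/* After a transfer from the device: copy the result back into private memory.
 * The length comes from the driver's own record, never from the device. */
int bounce_unmap_from_device(void *priv, size_t len)
{
    assert(priv != NULL);
    if (len > SHARED_POOL_SIZE)
        return -1;
    memcpy(priv, shared_pool, len);
    return 0;
}
```

Because the device only ever sees the shared pool, the rest of the guest's (encrypted) memory stays out of its reach; the cost is an extra copy on every transfer.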

Virtio modes

Virtio defines many modes with different memory organizations, depending on the needs of the device and the driver. This creates multiple code paths to harden, and apparently some of them are easier to fix than others. Kleen decided to protect only the so-called split mode, where each virtqueue consists of separate parts, each of which is writable by either the driver or the device, but not by both at the same time.

In the proposed patch set, the other modes are disallowed when the guest runs in TDX protected mode. This choice disables indirect descriptors, a split-mode extension that allows a number of descriptors to be allocated in a separate memory area, improving performance by increasing the capacity of the ring. Also disabled is packed mode, a more compact in-memory layout. This restriction drew a number of comments. Jason Wang observed that disabling indirect descriptors causes a significant performance loss. Kleen had trouble securing indirect descriptors and thinks they are too difficult to protect. Wang thinks the problem can be solved and promised to post a patch set.
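Both disabled modes correspond to feature bits that the driver and device negotiate, so one way to picture the restriction is as a mask applied to the offered features. The bit numbers below are the ones defined by the virtio specification (Linux's uapi headers use the same names); the function itself is a hypothetical illustration of the idea, not the code from the patch set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature-bit numbers from the virtio specification. */
#define VIRTIO_RING_F_INDIRECT_DESC 28
#define VIRTIO_F_RING_PACKED        34

/* Hypothetical: when the guest runs in a protected mode such as TDX, refuse
 * indirect descriptors and the packed layout, leaving plain split mode. */
uint64_t restrict_ring_features(uint64_t offered, bool protected_guest)
{
    if (!protected_guest)
        return offered;
    offered &= ~(1ULL << VIRTIO_RING_F_INDIRECT_DESC);
    offered &= ~(1ULL << VIRTIO_F_RING_PACKED);
    return offered;
}
```

Lutomirski's objection, in these terms, is that the same untrusted-device problem exists whether or not these bits are cleared.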

Andy Lutomirski also disagreed with the approach of disabling all modes except one. He highlighted, later in the thread, that devices must not be allowed to corrupt the driver in any setting, so the hardening should be more generic:

For most Linux drivers, a report that a misbehaving device can corrupt host memory is a bug, not a feature. If a USB device can corrupt kernel memory, that's a serious bug. If a USB-C device can corrupt kernel memory, that's also a serious bug, although, sadly, we probably have lots of these bugs. If a Firewire device can corrupt kernel memory, news at 11. If a Bluetooth or WiFi peer can corrupt kernel memory, people write sonnets about it and give it clever names. Why is virtio special?

According to Lutomirski, the driver should be made secure for all use cases, not just those using TDX. Disabling other modes only when running under TDX does not solve the problem, as bugs in those modes could be exploited to attack systems today. He also noted that virtio is not only implemented in software; there are also hardware devices that expose a virtio-compatible interface. In another message, he suggested splitting the driver into a modern version and a legacy one (containing all the modes that are not used in practice or could not be fixed without compatibility issues), and then fully hardening the modern one.

Kleen disagreed, stating that there is no memory protection in the other cases (possibly those not using a mechanism like TDX) and that there is a risk of compatibility problems (which he did not specify). In the patch set, the boundary checks are enabled unconditionally, but the other virtio modes are only disabled when TDX is active. The discussion ended there, without clear conclusions.

Similar work

In the discussion, Wang noted that there are similar hardening needs, including support for AMD Secure Encrypted Virtualization (SEV). Another need for virtio hardening comes from SmartNICs and devices implementing virtio, notably including vDPA — a device type that implements virtio for the data path, but has a vendor driver for the control path — and VDUSE, a vDPA device implemented in user space. They have similar problems and should not trust the metadata provided by the device. According to Kleen, those other cases should work with his changes with a few additions.

Conclusions and next steps

Hardening device drivers against malicious devices is an objective welcomed by kernel developers. The discussion shows that there is a need, with multiple use cases, and that different pieces have fixes in the works. Kleen's patch set received mixed reviews in its current form. The main issue seems to be the fact that it is too closely linked to the TDX work and the kernel developers would prefer a more generic solution. We are likely going to see more iterations of this work, and other hardening fixes in virtio, in the future.

Index entries for this article
Kernel: Virtualization/virtio
GuestArticles: Rybczynska, Marta



Hardening virtio

Posted Aug 9, 2021 15:57 UTC (Mon) by nickodell (subscriber, #125165)

Typo: "xa"

Typos

Posted Aug 9, 2021 16:02 UTC (Mon) by corbet (editor, #1)

Fixed. For future reference, please note the text in bold just over the comment window; typo reports sent to lwn@lwn.net will result in a fix without forcing many others to read your (now resolved) report. Thanks.

Typos

Posted Aug 12, 2021 5:40 UTC (Thu) by patrakov (subscriber, #97174)

Just in case - many Russian web sites use Orphus (https://orphus.ru/en/) to report typos. It's convenient, because all it takes for the reader is to select the bad word or phrase, and press Ctrl+Enter. Then a form pops up to optionally suggest the correct spelling or (if this is a factual error, not a typo) tell the author what's wrong.

Maybe LWN could adopt something similar? Email is archaic.

Typos

Posted Aug 14, 2021 5:51 UTC (Sat) by Seirdy (guest, #137326)

Currently, all features work without JS; I'd rather keep it that way.

Hardening virtio

Posted Aug 9, 2021 23:38 UTC (Mon) by roc (subscriber, #30627)

TDX and SGX overlap considerably in that they both (purport to) provide confidential computing. Intel also supports nested virtualization, so do we really need SGX now?

The interactions between all these processor modes really scare me. Intel seems to have an insatiable appetite for complexity.

Hardening virtio

Posted Aug 10, 2021 6:36 UTC (Tue) by pbonzini (subscriber, #60935)

TDX cannot be used in a nested VM, and you cannot use nested virtualization inside a TDX virtual machine.

Hardening virtio

Posted Aug 10, 2021 10:35 UTC (Tue) by roc (subscriber, #30627)

Is that an implementation limitation or something fundamental?

Hardening virtio

Posted Aug 10, 2021 18:30 UTC (Tue) by luto (subscriber, #39314)

For better or for worse, TDX depends on SGX for attestation.

Hardening virtio

Posted Aug 12, 2021 2:08 UTC (Thu) by mmirate (guest, #143985)

How can the guest possibly be able to not-trust its host? This seems fundamentally impossible.

Anything that the guest does, the host can skip over, as if by attaching gdb, setting a breakpoint right before the defense, and executing a jump to after it.

And the host can always spoof the encryption primitives - the same way it implements all the other CPU instructions - and do nasty things like respond to "please generate us a key" with a key that the host knows.

Hardening virtio

Posted Aug 12, 2021 9:08 UTC (Thu) by excors (subscriber, #95769)

> How can the guest possibly be able to not-trust its host? This seems fundamentally impossible.

I think the fundamental idea is that the guest can trust the CPU that the host is running on, and the CPU can observe and restrict the host's behaviour. Then a remote user can effectively open a secure authenticated channel to some special component of the CPU, ask if it's running the user's guest image in a secure configuration that's protected from the host, and the user can send their private data to the guest once they know it's good. That removes the need to trust the host.

The host can't spoof the responses to the remote user, because they'll be signed with some Intel-provided private key that's inaccessible to the host.

Hardening virtio

Posted Aug 16, 2021 12:30 UTC (Mon) by mrybczyn (subscriber, #81776)

Yes, you're right. The guest doesn't trust the host OS, but it does trust the CPU and its microcode here.

Hardening virtio

Posted Aug 12, 2021 9:35 UTC (Thu) by james (guest, #1325)

It's much the same problem as "how can you have working digital restrictions management on a general-purpose computer", and the solutions are similar, too.

For example, as the article briefly referred to, it's possible to have encrypted memory with per-VM keys that the host OS can't access, so gdb can't make sense of what it reads and can't write bytes that the guest will interpret as a jump. (I note AMD has an immutable per-VM flag set at VM creation allowing or preventing guest debugging.)

The keys are generated by firmware (on AMD Epyc systems, this is running on an embedded ARM processor): it's a simple matter of cryptography to generate an encrypted channel and attest to a remote system that the key was generated in firmware on a genuine CPU. It's also possible to measure the state of the guest, and for the firmware to attest to that: at this point, the remote system can provide necessary keys for the guest to continue booting and access whatever it needs.

You are, as always, trusting that the processor and microcode don't have critical bugs. And the host OS can obviously impose denial of service, by turning itself off if nothing else.


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds