Challenges in protecting virtual machines from untrusted entities
As an ever-growing number of workloads are being moved to the cloud, CPU vendors have begun to roll out purpose-built hardware features to isolate virtual machines (VMs) from potentially hostile parties. These processor features, and their extensions, enable the notion of "secure VMs" (or "confidential VMs") — where a VM's "sensitive state" needs to be protected from untrusted entities. Drawing from his experience contributing to the secure VM implementation for the s390 architecture, Janosch Frank described the challenges involved in a talk at the 2020 (virtual) KVM Forum. Though the implementations across CPU vendors may vary, there are many shared problems, which opens up possibilities for collaboration.
Secure Encrypted Virtualization (SEV) from AMD (more information is available in the slides [PDF] from a talk at last year's KVM Forum and LWN's brief recap of it), Trust Domain Extensions (TDX) by Intel, and IBM's Secure Execution for s390 (last year's KVM Forum talk [YouTube] about it) and Power are some of the hardware technologies that aim to protect virtual machines from potentially malicious entities. Other architectures, such as Arm, are expected to follow suit.
The sensitive state of a secure VM should not be accessible from the hypervisor; instead, a "trusted entity" — a combination of software, CPU firmware, and hardware — manages it. But this raises a question: what counts as "sensitive state"? The lion's share is the guest's memory contents, which can contain disk encryption keys and other sensitive data. In addition, guest CPU registers can hold sensitive cryptographic key fragments. The execution path of the VM is another piece of sensitive state; a rogue hypervisor can potentially change the execution flow of a VM — e.g. it can inject an exception into the guest, which is highly undesirable. Therefore, effective "VM controls" that decide which instructions to execute, and how they're executed, must be protected. Furthermore, a hostile hypervisor, even if it can't extract any information from its guests, can still mount a denial-of-service (DoS) attack on them.
Then there is "data at rest" (i.e. guest data stored on disk), which is often not protected by the trusted entity; it is the VM's responsibility to protect it with common techniques such as disk encryption. Successfully protecting VMs and their data allows users to deploy sensitive workloads in public clouds.
Threat vectors
One approach to narrowing down the threat vectors to defend against is to define the nature of trust, more commonly known as "threat modeling". In a public-cloud setup, co-located VMs and their host hypervisor are to be considered completely untrusted. AMD's SEV [PDF] uses the fuzzily defined "benign but vulnerable hypervisor" model, where "the hypervisor is not believed to be 100% secure, but it is trusted to act with benign intent" — i.e. the hypervisor might not actively try to compromise SEV-enabled VMs, but it could contain exploitable vulnerabilities. The stricter SEV-SNP model, by contrast, treats only the processor hardware and its firmware as fully trusted, along with the "secure VM" itself.
And what is not trusted? Cloud operators, the host's platform firmware (e.g. the BIOS), SMM (System Management Mode), the host OS and its hypervisor, all external PCI(e) devices, and more. "Untrusted" here means these components are assumed to be "malicious, potentially conspiring with other untrusted components in an effort to compromise the security guarantees of an SEV-SNP VM". In a similar vein, Intel's TDX [PDF] defines its "trust boundaries": Intel's hardware, including its TDX module, is trusted, as are its Authenticated Code Modules in firmware; the rest is all untrusted.
Frank outlined some common building blocks to guard against attacks from untrusted entities and protect the sensitive state of a VM. The first is encrypting the VM's memory and the other bits it can modify (e.g. guest CPU registers), so that an attacker sees only gibberish without the key. The second is restricting access to sensitive guest data via access controls. The third is "integrity verification", to make sure a VM only accesses state that has not been altered by a hostile party.
Protecting memory, vCPU registers, and boot integrity
Making the guest memory unreadable, by encrypting it so it is inaccessible to every entity except the guest itself and the trusted entity, provides "memory confidentiality". One way to achieve this is to let the CPU's memory controller do the heavy lifting for memory encryption: each guest gets its own key that is stored in hardware, never to leave it.
Storing encryption keys in hardware also protects against "cold-boot attacks", a kind of side-channel attack that requires physical access to the hardware to dump sensitive information from RAM. However, keeping keys in hardware means there is a limit to how many can be stored — e.g. the first-generation AMD EPYC ("Zen") CPU had a hard limit of 15 encryption keys, which severely limits the number of secure VMs. A later generation (AMD "Zen 2") extended that to 509 keys.
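These capabilities and limits can be queried directly from the processor. Below is a minimal sketch (assuming an x86 build with GCC's or Clang's <cpuid.h>) that reads AMD's memory-encryption CPUID leaf 0x8000001F, whose ECX register reports how many encrypted guests the hardware can run simultaneously:

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 0x8000001f: AMD memory-encryption capabilities. */
        if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 0x8000001f not available\n");
            return 1;
        }

        printf("SME supported:    %s\n", (eax & (1 << 0)) ? "yes" : "no");
        printf("SEV supported:    %s\n", (eax & (1 << 1)) ? "yes" : "no");
        printf("SEV-ES supported: %s\n", (eax & (1 << 3)) ? "yes" : "no");
        /* EBX[5:0]: position of the encryption ("C") bit in page-table entries. */
        printf("C-bit position:   %u\n", ebx & 0x3f);
        /* ECX: number of encrypted guests that can run at once (the key/ASID limit). */
        printf("Simultaneous encrypted guests: %u\n", ecx);
        return 0;
    }

On a "Zen 2" EPYC system, the last line would print a number on the order of the 509 keys mentioned above.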
Encrypting guest memory alone won't suffice; it also needs to be tamper-proof, because despite encryption, the hypervisor can still corrupt the guest's RAM. Tampering can be prevented by means of architecture-specific hardware access controls: reads and writes originating from outside the secure VM result in an exception. This protects the integrity of the memory, assuming it never leaves the protected state; it also allows "rogue accesses" to be traced and logged, Frank noted.
Guest CPU registers need to be unreadable by external parties. The hypervisor should only be able to read from or write to specific registers, in cases where it needs to emulate a CPU instruction. Therefore, the trusted entity must encrypt all or specific guest CPU registers. The state of a VM, both while it is being initialized to run and while it is running, is stored in a vendor-specific data structure known as the VM "control block". The guest CPU registers are isolated by providing a dummy VM control block to the hypervisor, while the trusted entity manages the real control block.
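On AMD hardware, for instance, this hand-off happens during the SEV-ES ("Encrypted State") launch: the hypervisor asks the trusted entity to encrypt each vCPU's register state (its "VM save area"), after which the hypervisor can no longer read or alter it directly. Below is a minimal sketch based on the command names in KVM's SEV API documentation; the preceding launch steps and error handling are omitted, and vm_fd and sev_fd are assumed to be open descriptors for the VM and /dev/sev:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /*
     * Sketch: during an SEV-ES launch, request that the guest's register
     * state (its VM save area) be encrypted, after which the hypervisor
     * sees only the opaque, trusted-entity-managed copy.
     */
    static int encrypt_vcpu_state(int vm_fd, int sev_fd)
    {
        struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_LAUNCH_UPDATE_VMSA,
            .sev_fd = (uint32_t)sev_fd,
        };

        return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }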
Yet another challenge is ensuring that only user-approved executables can be booted by the guest. There are two ways to handle this problem. One is boot-data encryption: the executable (e.g. a guest kernel) is encrypted and gets a header that holds a key slot for each physical machine the executable is allowed to run on, plus some integrity-verification data. The processor's firmware (a trusted entity) then searches for a key slot it can unlock, in order to retrieve the executable's encryption key. The other is "remote attestation" — an idea that has existed for ages, but is fiendishly difficult to implement and manage — which allows a virtual machine to authenticate itself to a remote party, the owner of the VM, by proving that it is indeed running approved executables. This proof gives the owner confidence that the guest is executing authorized workloads on genuine and authenticated hardware.
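Returning to the first method: as a purely conceptual sketch, an encrypted boot executable's header might look like the following. Every field name here is invented for illustration; real formats, such as the s390 images built with genprotimg, are vendor-specific but have roughly this shape:

    #include <stdint.h>

    /*
     * Conceptual sketch of an encrypted boot image header, as described
     * above: the image key is wrapped once per authorized host, and the
     * trusted entity unwraps it with a private key that only it holds.
     */
    struct key_slot {
        uint8_t host_key_id[32];       /* identifies one authorized machine */
        uint8_t wrapped_image_key[32]; /* image key, encrypted for that machine */
    };

    struct secure_boot_header {
        uint8_t         integrity_data[64]; /* e.g. a MAC over the encrypted image */
        uint32_t        nr_slots;
        struct key_slot slots[];       /* one slot per machine allowed to boot it */
    };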
Attestation allows authorization rules for VMs and other entities to be changed quickly, whereas boot-data encryption is easier to implement and doesn't require network connectivity. But relying only on boot-data encryption has its problems: authorizing a new machine involves rebuilding the executable with a new key slot, and updating the executable may require user intervention. There is also the perennial problem of distributing and verifying public/private key pairs.
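On AMD SEV, for instance, the attestation flow is anchored in a "launch measurement": once the initial guest image has been encrypted, the guest owner fetches a firmware-generated digest of it and compares it against the expected value before handing over any secrets. A hedged sketch using the commands documented in KVM's SEV API; the surrounding launch sequence and session setup are omitted:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /*
     * Sketch: fetch the launch measurement of an SEV guest, so that the
     * guest owner can verify remotely that the expected image was loaded
     * before releasing secrets to it.
     */
    static int get_launch_measurement(int vm_fd, int sev_fd,
                                      uint8_t *buf, uint32_t len)
    {
        struct kvm_sev_launch_measure measure = {
            .uaddr = (uint64_t)(unsigned long)buf,
            .len   = len,
        };
        struct kvm_sev_cmd cmd = {
            .id     = KVM_SEV_LAUNCH_MEASURE,
            .data   = (uint64_t)(unsigned long)&measure,
            .sev_fd = (uint32_t)sev_fd,
        };

        return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }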
Often, combining all of these techniques yields the best results, Frank emphasized.
What about I/O and swap?
Once a secure VM is up and running, it might want to do I/O and swap. A device trying to perform I/O on an encrypted guest-memory page will only see garbled data or get access exceptions, so there needs to be a way to "unprotect" some special I/O pages, based on an explicit request from the guest. Guest I/O is bounced through these shared pages; sensitive data must therefore be encrypted — by using common mechanisms such as SSH, HTTPS, LUKS disk encryption, and so on — before the guest does any I/O with it. Special handling for I/O means degraded performance, but Frank was confident that the degradation can be significantly reduced in the future.
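On x86 Linux guests, this explicit request is what the kernel's set_memory_decrypted() helper performs: it clears the encryption bit on the given pages and informs the hypervisor, so the pages can serve as bounce buffers. Here is a sketch of how a guest driver might set up one such shared page; this is kernel code, and in practice the SWIOTLB bounce-buffer machinery handles this transparently for DMA:

    #include <linux/gfp.h>
    #include <linux/mm.h>
    #include <linux/set_memory.h>

    /*
     * Sketch: allocate one page and share it with the host for I/O.
     * Everything bounced through this page is visible to the host, so
     * only already-encrypted data should ever land here.
     */
    static void *alloc_shared_io_page(void)
    {
        struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
        void *addr;

        if (!page)
            return NULL;

        addr = page_address(page);
        /* Clear the encryption bit so the host can read the page. */
        if (set_memory_decrypted((unsigned long)addr, 1)) {
            __free_page(page);
            return NULL;
        }
        return addr;
    }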
Getting swap to work is another challenge. During swap-out, where a memory page is pushed to a storage device, it needs to be made accessible in encrypted form to the host, so that it can be written to the device. On swap-in, where the page is pulled back to the main memory, it should get integrity-checked and decrypted before the guest can access the page again. There also needs to be protection against "replay" attacks, where an attacker can replace a VM's memory with a stale copy. The hypervisor mediates the swap-out and swap-in process with help from the trusted entity, which safeguards the entire operation.
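The replay problem is why encrypting a swapped-out page is not enough by itself: the host could store version N of a page and later feed back version N-1, both validly encrypted. Below is a purely conceptual sketch of the metadata a trusted entity might attach on swap-out; all names are invented for illustration, and real formats are architecture-specific:

    #include <stdint.h>

    #define PAGE_SIZE 4096

    /*
     * Conceptual sketch only: the blob a trusted entity might hand the
     * hypervisor on swap-out. The hypervisor stores it; on swap-in the
     * trusted entity re-checks it before the guest sees the page again.
     */
    struct exported_page {
        uint8_t  ciphertext[PAGE_SIZE]; /* page, encrypted with the guest's key */
        uint8_t  auth_tag[16];          /* integrity tag over the ciphertext */
        uint64_t version;               /* counter kept by the trusted entity */
    };

    /*
     * On swap-in, the trusted entity (not the hypervisor) verifies the
     * tag and rejects any blob whose version is older than the latest
     * export; that check is what defeats replay with a stale copy.
     */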
But as is the case with special I/O handling, swapping incurs a significant performance penalty, so secure VMs should be avoided in environments where memory is over-committed.
Current development efforts
An approach proposed to "generalize memory encryption models" proved quite difficult to find common ground on, as CPU architectures differ a tad too much in their implementations of secure VMs. Intel is still in the process of adding support for secure VMs (see also: Sean Christopherson's slides from a KVM Forum talk on Intel TDX [PDF]). Frank noted that s390 already hooks into the Linux memory-management subsystem to pin I/O pages; other platforms will need similar hooks. However, he expects the resulting mainline kernel inclusion to take more time, as it requires common-code changes. IBM's Secure Execution has been available since Linux 5.4 for the Power architecture and since Linux 5.7 for s390.
Support for AMD's SEV was introduced in Linux 4.16 for KVM-based guests; Linux 5.10 has support for AMD's Encrypted State (SEV-ES) extension, and support for Secure Nested Paging (SEV-SNP) is underway. Further, there is an in-progress effort to add support for managing SEV's "address space IDs" (ASIDs) via control groups in Linux; the problem it addresses is the earlier-mentioned hard limit on hardware encryption keys. Developers for other CPU architectures quickly expressed interest in finding a common approach to tackle this problem.
A pressing concern that Frank pointed out is the complexity of testing: setting up boot-data encryption and configuring attestation environments are both cumbersome. There will be more compile-time options for the kernel, new KVM ioctl() calls, new interfaces to the trusted entity, and changes to user-space components (e.g. QEMU, libvirt). All of this increases the testing burden.
Future
Live migration is expected to be tackled down the road. During migration, the entirety of the VM's state needs to be encrypted and its integrity verified on the destination. Frank anticipates backward compatibility to be a major challenge — migration logic that is traditionally handled by the hypervisor now partially moves to the trusted entity. Further, migration policies need to determine the possible target hosts a secure VM can be migrated to, which involves many variables. There is plenty of work for the coming years. Additionally, we might see "secure I/O devices" which only respond to I/O requests from an authorized secure VM and don't speak to the host at all.
Another potentially tricky topic is the ability to capture memory contents in the event of a guest kernel crash. With kdump, it is possible to capture a crash dump to an encrypted disk, but it only works if the code to do so can still be executed inside the guest. It needs to be possible to boot the "capture kernel" that kdump loads (via the kexec subsystem) to actually write the memory contents to disk.
Not least of all, Frank noted that further protections against side-channel attacks, such as disabling simultaneous multithreading (SMT) and performing extra cache flushing, need to be enforced by the trusted entity.
Overall, the basic blocks that underpin secure VMs are common across various CPU architectures. One concrete area where vendors can work together is on boot-related tooling. Both remote attestation and encrypting boot executables require extensive tooling. If CPU vendors can manage to not overly diverge in this area, they can work together on common tooling, instead of everyone maintaining their own bespoke tooling — e.g. sev-tool from AMD, genprotimg for s390, and so forth.
Enarx is a relatively recent project that wants to "make it simple to deploy [sensitive] workloads to a variety of different Trusted Execution Environments" (TEEs). It is CPU-architecture independent and thus aims to create an abstraction for the different TEEs from processor vendors. More specifically, Enarx provides encryption for "data in use" (as opposed to data at rest or in transit) and manages attestation — all without having to start from scratch for every hardware platform.
"The importance and complexity of secure VMs will continue to increase with each [hardware] extension that is released for an architecture. Fortunately, we still have time to come together and discuss the collaboration possibilities," Frank concluded. Since several CPU architectures have introduced the idea and developers are working on the implementation of secure VMs, it is the perfect time for all of the vendors to work together.
[I'd like to thank Janosch Frank for a careful review of this article.]
Index entries for this article
Security: Linux kernel
Security: Virtualization
GuestArticles: Chamarthy, Kashyap
Conference: KVM Forum/2020
But how will the virtual machine verify its host?
Posted Jan 3, 2021 8:47 UTC (Sun) by moxfyre (guest, #13847) [Link] (4 responses)

All of this technology sounds fascinating, if cumbersome to implement. But here's what I don't understand: how can a VM possibly verify that it's running under a "trusted entity" which takes pains not to be able to access the guest's secret data, as opposed to a standard omniscient hypervisor — or more pointedly a malicious hypervisor which pretends to be a "trusted entity"?
If the VM can't actually verify its host environment's construction, then… what's the point?
It seems to come down to "trusting your cloud/VM hosting provider not to do leaky or malicious things", which is about where we are anyway.

But how will the virtual machine verify its host?
Posted Jan 5, 2021 0:23 UTC (Tue) by moxfyre (guest, #13847) [Link] (3 responses)

Based on this very interesting blog post from James Bottomley (https://blog.hansenpartnership.com/deploying-encrypted-images-for-confidential-computing)… it appears that this is, essentially, a rather complex scheme that — if it works correctly — means that the VM/guest owners have to place very little trust in the cloud provider, but a lot of trust in the CPU/SoC manufacturer.

But how will the virtual machine verify its host?
Posted Jan 6, 2021 10:54 UTC (Wed) by penguin42 (guest, #72294) [Link]

To your first comment, it's not necessarily that the guest can verify it's running under a trusted entity - it's that a third party (running outside the potentially dodgy cloud) can verify that the VM it is talking to is running in a trusted setup before giving that VM work or a secret that is then used to do something.

But how will the virtual machine verify its host?
Posted Jan 6, 2021 10:54 UTC (Wed) by kashyap (subscriber, #55821) [Link]

And indeed, since the "trusted entity" is the hardware, you're placing trust in the CPU/SoC. I'll let people more clueful than me correct me or add further details.

But how will the virtual machine verify its host?
Posted Jan 6, 2021 19:44 UTC (Wed) by mjg59 (subscriber, #23239) [Link]

[…] back to the CPU vendor, and you have to trust the CPU vendor to have implemented the mechanism.