Keeping memory contents secret

By Jonathan Corbet
November 15, 2019

One of the many responsibilities of the operating system is to help processes keep secrets from each other. Operating systems often fail in this regard, sometimes due to factors — such as hardware bugs and user-space vulnerabilities — that are beyond their direct control. It is thus unsurprising that there is an increasing level of interest in ways to improve the ability to keep data secret, perhaps even from the operating system itself. The MAP_EXCLUSIVE patch set from Mike Rapoport is one example of the work that is being done in this area; it also shows that the development community has not yet really begun to figure out how this type of feature should work.

MAP_EXCLUSIVE is a new flag for the mmap() system call; its purpose is to request a region of memory that is mapped only for the calling process and inaccessible to anybody else, including the kernel. It is a part of a larger address-space isolation effort underway in the memory-management subsystem, most of which is based on the idea that unmapped memory is much harder for an attacker to access.

Mapping a memory range with MAP_EXCLUSIVE has a number of effects. It automatically implies the MAP_LOCKED and MAP_POPULATE flags, meaning that the memory in question will be immediately faulted into RAM and locked there — it should never find its way to a swap area, for example. The MAP_PRIVATE and MAP_ANONYMOUS flags are required, and MAP_HUGETLB is not allowed. Pages that are mapped this way will not be copied if the process forks. They are also removed from the kernel's direct mapping — the linear mapping of all of physical memory — making them inaccessible to the kernel in most circumstances.

The goal behind MAP_EXCLUSIVE seems to have support within the community, but the actual implementation has raised a number of questions about how this functionality should work. One area of concern is the removal of the pages from the direct mapping. The kernel uses huge pages for that mapping, since that gives a significant performance improvement through decreased translation lookaside buffer (TLB) pressure. Carving specific pages out of that mapping requires splitting the huge pages into normal pages, slowing things down for every process in the system. The splitting of the direct mapping in another context caused a 2% performance regression at Facebook, according to Alexei Starovoitov in October; that is not a cost that everybody is willing to pay.

Elena Reshetova indicated that she has been working on similar functionality; rather than enhancing mmap(), her patch provides a new madvise() flag and requires that the secret areas be a multiple of the page size. Her version will eventually wipe any secret areas before returning the memory to general use in case the calling process doesn't do that.

Reshetova also raised the idea of mapping this memory uncached. The benefit of doing so would be to protect its contents from a whole range of speculative-execution attacks, known and unknown. On the other hand, the effect on application performance would be something between "painful" and "crippling", depending on how often the memory is accessed. Some users would likely welcome the extra protection; many others may well find that the performance penalty rules out this feature's use entirely. Andy Lutomirski said that uncached memory should only be provided if it is explicitly asked for, but Alan Cox responded that users generally do not know whether they want uncached memory or not.

More to the point, Cox continued, there may be any of a number of things that the system might do to protect the contents of secret memory; those things will vary from one system to the next and users will not be in a position to know what any specific system should use. That makes it all the more important to nail down what the MAP_EXCLUSIVE flag really means:

IMHO the question is what is the actual semantic here. What are you asking for ? Does it mean "at any cost", what does it guarantee (100% or statistically), what level of guarantee is acceptable, what level is -EOPNOTSUPP or similar ?

James Bottomley took this argument even further, describing MAP_EXCLUSIVE as "a usability problem". Protecting secret data might, on some systems, involve hardware technologies like TME and SEV, for example, but developers cannot know that in a general way. Somehow, Bottomley suggested, the kernel should make the best choice it can for how to protect secret memory; one such choice could be to make the memory uncached only on systems where the speculative-execution mitigations are not active. Lutomirski worried that this approach would not work, though; there are too many variables and ways in which things could go wrong.

There is only one truly clear conclusion from this discussion: a desire for memory with higher levels of secrecy exists, but the development community lacks a clear idea of how that secrecy should be implemented and how it should be presented to the user. That suggests that this feature will not be showing up in a mainline kernel anytime soon. Getting memory secrecy wrong risks saddling the community with the maintenance of a misdesigned interface and, possibly, giving application developers a false sense of security. It is better to go slow in the hope of getting things right.

Index entries for this article
Kernel	Memory management/Address-space isolation

Keeping memory contents secret

Posted Nov 15, 2019 21:03 UTC (Fri) by SEJeff (guest, #51588) [Link] (8 responses)

Something somewhat related is AMD's SEV feature which will encrypt guest memory with a key created on demand for that vm. It allows VM memory to be unreadable from the host (as it is all encrypted by the CPU). Redhat has a project to run VMs securely in the cloud for this. It is named Enarx:

https://github.com/enarx/enarx.github.io/wiki/Enarx-Intro...

https://developer.amd.com/sev/

Keeping memory contents secret

Posted Nov 15, 2019 22:04 UTC (Fri) by roc (subscriber, #30627) [Link] (6 responses)

Reusing the virtualization boundary for encrypted enclaves makes a lot more sense to me than SGX. I assume Intel took the SGX route because they have (had?) a greater appetite for complexity and there are some use cases that are lower-overhead with SGX.

Keeping memory contents secret

Posted Nov 16, 2019 0:54 UTC (Sat) by wahern (subscriber, #37304) [Link] (5 responses)

The distinguishing characteristic of SGX is remote attestation[1], which is relied upon for sending decryption keys for DRM'd content. And it was designed to make it easy to load and run SGX-protected code (e.g. DRM modules packaged with web browsers) as shared libraries from *within* userspace applications. Stronger DRM in browsers would have been a more difficult sell if usage was more complicated.

The confidentiality guarantee of SEV was eviscerated by timing attacks long before SGX was eviscerated, though certainly after the SEV papers nobody should have trusted SGX absent affirmative evidence that such timing attacks weren't going to keep metastasizing. The benefit of SEV is effectively the same as simply not mapping certain userspace regions into kernel space. If you control the kernel/hypervisor you can read the memory either way--either the easy way or, with SEV, the "hard way", which is actually not that difficult per the published attacks.

[1] AFAIU, SGX permits the userspace library to sign a challenge from the content provider; the content provider in turns asks Intel (through an online and, presumably, very expensive web service) to confirm validity. The kernel has nothing to do with any of this except for some basic initialization and management of SGX, which AFAIU is still absent in Linus' tree.

Keeping memory contents secret

Posted Nov 16, 2019 9:14 UTC (Sat) by roc (subscriber, #30627) [Link] (2 responses)

Do any browser DRM modules actually use SGX today? I'm not aware of it.

Keeping memory contents secret

Posted Nov 16, 2019 9:55 UTC (Sat) by wahern (subscriber, #37304) [Link] (1 responses)

4K Netflix was and apparently still is restricted to Windows 10 on Kaby Lake CPUs, which I and others assumed was because Kaby Lake was the first (or second) desktop CPU with SGX support. See https://help.netflix.com/en/node/23931 (expand Netflix Features : Netflix website)

Here's an interesting/confusing HN thread from 10 months ago with various claims: https://news.ycombinator.com/item?id=18828654. Despite the contradictory assertions, there's enough context, like the distinction between L1 and L3, to suggest hardware-backed DRM schemes are being relied upon even by Firebox and Chrome, but probably not using SGX, especially considering the Netflix requirements.

Keeping memory contents secret

Posted Nov 25, 2019 10:48 UTC (Mon) by sandeep_89 (guest, #127524) [Link]

Well, Chrome and Firefox do play protected media from streaming websites like Netflix, Prime Video, Hotstar etc. with 1080p quality on WIndows, but with software decoding for some of them (Prime Video, Hotstar).

Keeping memory contents secret

Posted Nov 16, 2019 9:31 UTC (Sat) by wahern (subscriber, #37304) [Link] (1 responses)

I may have been confusing SME and SEV regarding timelines. Either I'm remembering attacks on SME, or I was remembering this *proposed* SEV attack from 2017 (before SEV shipped): https://arxiv.org/abs/1712.05090 And apparently SEV does provide remote attestation, though it's broken: https://arxiv.org/abs/1908.11680

In any event, with or without remote attestation it's not reasonable to trust that a guest's memory is unreadable from a host; these technologies aren't holding up well to side channel attacks, not even on AMD chips, which have otherwise been comparably resilient. Better to consider it as defense-in-depth.

I think all of these developments augur *against* providing exacting semantics for anything promising confidentiality. The situation is *far* too fluid. We can't even say with strong confidence that SEV, not to mention SME or SGX, suffice. Any interface will be best effort as a practical matter, and will very likely need to be tweaked in the future in ways that change the performance and security characteristics. If you don't want developers to develop a false sense of security, then keep things conspicuously vague! Alternatively or in addition, avoid abstraction and pass through specific architecture interfaces and semantics as closely as possible, conspicuously passing along the risk, uncertainty, and responsibility to the developer. Anyhow, sometimes security is best served by recognizing that choice is an illusion and avoid giving choices.

The irony is that aside from attestation and physical attacks, the demand and urgency of these things come from the failures of existing software and hardware to meet *existing* confidentiality and security guarantees; guarantees that should already suffice. We should think twice about writing any more checks (i.e. particular confidentiality semantics) we aren't absolutely sure can be cashed. Anyhow, no company would care whether an AWS hypervisor could read guest memory if they could absolutely expect AWS' software and hardware to work as designed. The desire for zero-trust only exists in the minds of geeks, techno-libertarians, and Hollywood studios. Organizations are only going to depend on these new features to prove service providers' and their own diligence. That's not a cynical statement, just a recognition that at the end of the day they depend, and must depend, on the geeks to make reasonable decisions and continual adjustments. And the same is true for these types of security issues when it comes to the relationship between kernel interfaces and userland applications. The history of system entropy interfaces is particularly instructive.

Keeping memory contents secret

Posted Nov 18, 2019 18:49 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

> I think all of these developments augur *against* providing exacting semantics for anything promising confidentiality. The situation is *far* too fluid. We can't even say with strong confidence that SEV, not to mention SME or SGX, suffice. Any interface will be best effort as a practical matter, and will very likely need to be tweaked in the future in ways that change the performance and security characteristics. If you don't want developers to develop a false sense of security, then keep things conspicuously vague! Alternatively or in addition, avoid abstraction and pass through specific architecture interfaces and semantics as closely as possible, conspicuously passing along the risk, uncertainty, and responsibility to the developer. Anyhow, sometimes security is best served by recognizing that choice is an illusion and avoid giving choices.

If I had to pick between a best-effort, vague interface, or a specific interface that's tied to implementation details, I'm pretty sure the former is more future proof. In the best case scenario, we can opportunistically begin offering real guarantees as they become available, and in a worst case scenario, we can just deprecate the whole thing since it never offered any guarantees to begin with.

> Anyhow, no company would care whether an AWS hypervisor could read guest memory if they could absolutely expect AWS' software and hardware to work as designed. The desire for zero-trust only exists in the minds of geeks, techno-libertarians, and Hollywood studios.

Certain industries have a tendency to ask for guarantees that are perhaps unnecessary or impractical, but are nevertheless required by some combination of laws, regulations, and industry standards. See for example PCI DSS, HIPAA, FIPS, and so on. It is entirely fair to think that this is a foolish thing for those industries to do, but ultimately, it's their money, and they are choosing to spend it (indirectly via AWS et al.) on building these features into the kernel.

Keeping memory contents secret

Posted Nov 16, 2019 9:49 UTC (Sat) by pbonzini (subscriber, #60935) [Link]

Note that Enarx is not specifically about running VMs. It provides a trusted execution environment where there is mutual distrust between the host and the application (only the processor is trusted by both). The fact that KVM plus SEV can be used to realize the trusted execution environment is more or less an implementation detail; whether KVM+SEV is used, or rather something like SGX that doesn't involve virtualization, is completely transparent to the application.

Caching policy decisions

Posted Nov 15, 2019 23:14 UTC (Fri) by brouhaha (subscriber, #1698) [Link] (1 responses)

I'm in favor of having the user decide whether they want the secret page(s) to be cacheable. If the user doesn't know whether they want cached memory or not, the user had better darn well figure that out. While the kernel may know about what side channel mitigations are present or necessary, in general the kernel has far LESS knowledge of what the user's intended use case is for the secret memory, e.g., what access patterns are likely to be used with it.

Caching policy decisions

Posted Nov 17, 2019 7:56 UTC (Sun) by abufrejoval (guest, #100159) [Link]

Just how effective is the cache segmentation introduced with Broadwell in cases like this?
Or could/should it be enhanced so that cache segments can be exclusively assigned to sensitive portions of memory?

I guess it's also time to look at cachability being binary choice, when three to four levels of cache are becoming common.

Security sensitive code/data often won't be the majority of a machine's workload and if you have a high-core count legacy CPU, why should you not make a core temporarily exclusive to such code, to avoid side-channel issues. If you can exclude L2+ caching, performance won't suffer nearly as much, right?

This might not work where 2% performance loss put you out of business, but only wish that were true for Facebook, I guess.

Keeping memory contents secret

Posted Nov 16, 2019 7:25 UTC (Sat) by smurf (subscriber, #17840) [Link]

Will ptrace'ing that process also be disallowed, or did they forget to think about that vector?

Keeping memory contents secret

Posted Nov 16, 2019 14:41 UTC (Sat) by jthill (subscriber, #56558) [Link]

Seems to me any security need worth extreme measures like this can afford requiring huge-tlb pages. So what if you don't need 2MB of your-eyes-only storage, that's what you get because that's what's cheapest to provide.

Keeping memory contents secret

Posted Nov 17, 2019 13:34 UTC (Sun) by jcm (subscriber, #18262) [Link]

These are all approaches designed to mitigate attacks that have already occurred. The idea of unmapping memory is perhaps a good one, after all lots of things don’t need be to speculatively (pre)fetched into the cache hierarchy (and exposed to cache side-channel timing analysis) but simply saying “don’t cache stuff” isn’t a good idea. There are many examples of attacks already that don’t rely upon caches. Killing performance to solve yesterday’s vulnerability isn’t the answer.