Keeping memory contents secret
MAP_EXCLUSIVE is a proposed new flag for the mmap() system call; its purpose is to request a region of memory that is mapped only for the calling process and inaccessible to anybody else, including the kernel. It is part of a larger address-space isolation effort underway in the memory-management subsystem, most of which is based on the idea that unmapped memory is much harder for an attacker to access.
Mapping a memory range with MAP_EXCLUSIVE has a number of effects. It automatically implies the MAP_LOCKED and MAP_POPULATE flags, meaning that the memory in question will be immediately faulted into RAM and locked there — it should never find its way to a swap area, for example. The MAP_PRIVATE and MAP_ANONYMOUS flags are required, and MAP_HUGETLB is not allowed. Pages that are mapped this way will not be copied if the process forks. They are also removed from the kernel's direct mapping — the linear mapping of all of physical memory — making them inaccessible to the kernel in most circumstances.
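As a concrete illustration, here is a minimal sketch of how a caller might use the flag. MAP_EXCLUSIVE exists only in the posted patch, so the value below is a locally defined stand-in; on an unpatched kernel the bit would simply be ignored.

    /* Minimal sketch; MAP_EXCLUSIVE is not in any mainline header,
     * so a placeholder value stands in for whatever the patch defines. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #ifndef MAP_EXCLUSIVE
    #define MAP_EXCLUSIVE 0x4000000   /* hypothetical value */
    #endif

    int main(void)
    {
        /* MAP_PRIVATE and MAP_ANONYMOUS are required; MAP_LOCKED and
         * MAP_POPULATE are implied by MAP_EXCLUSIVE itself. */
        char *secret = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_EXCLUSIVE,
                            -1, 0);
        if (secret == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        strcpy(secret, "key material lives here");

        /* Wipe before unmapping; nothing guarantees the kernel will. */
        memset(secret, 0, 4096);
        munmap(secret, 4096);
        return 0;
    }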
The goal behind MAP_EXCLUSIVE seems to have support within the community, but the actual implementation has raised a number of questions about how this functionality should work. One area of concern is the removal of the pages from the direct mapping. The kernel uses huge pages for that mapping, since that gives a significant performance improvement through decreased translation lookaside buffer (TLB) pressure. Carving specific pages out of that mapping requires splitting the huge pages into normal pages, slowing things down for every process in the system. The splitting of the direct mapping in another context caused a 2% performance regression at Facebook, according to Alexei Starovoitov in October; that is not a cost that everybody is willing to pay.
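For the curious, the current granularity of the direct mapping is visible on x86 in /proc/meminfo; this small program (purely illustrative, not part of any patch) prints the relevant counters, and a growing DirectMap4k value is a sign that huge mappings are being split:

    /* Print the x86 direct-map breakdown from /proc/meminfo. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f) {
            perror("/proc/meminfo");
            return 1;
        }
        while (fgets(line, sizeof(line), f))
            if (strncmp(line, "DirectMap", 9) == 0)
                fputs(line, stdout);
        fclose(f);
        return 0;
    }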
Elena Reshetova indicated that she has been working on similar functionality; rather than enhancing mmap(), her patch provides a new madvise() flag and requires that the secret areas be a multiple of the page size. Her version will eventually wipe any secret areas before returning the memory to general use in case the calling process doesn't do that.
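What that interface might look like from user space is sketched below; MADV_SECRET is a hypothetical name and value, since the advice constant in the posted patch is not part of any released kernel:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MADV_SECRET
    #define MADV_SECRET 100   /* hypothetical advice value */
    #endif

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);

        /* The secret area must be a multiple of the page size. */
        void *buf = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        if (madvise(buf, 2 * page, MADV_SECRET))
            perror("madvise");   /* EINVAL on an unpatched kernel */

        /* In this variant the kernel, not the application, is expected
         * to wipe the area when it returns to general use. */
        munmap(buf, 2 * page);
        return 0;
    }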
Reshetova also raised the idea of mapping this memory uncached. The benefit of doing so would be to protect its contents from a whole range of speculative-execution attacks, known and unknown. On the other hand, the effect on application performance would be something between "painful" and "crippling", depending on how often the memory is accessed. Some users would likely welcome the extra protection; many others may well find that the performance penalty rules out this feature's use entirely. Andy Lutomirski said that uncached memory should only be provided if it is explicitly asked for, but Alan Cox responded that users generally do not know whether they want uncached memory or not.
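For readers wondering what an uncached mapping means in practice: drivers can already hand out uncached memory from their mmap() handlers with pgprot_noncached(). The sketch below shows that existing mechanism only; the secret-memory patches would make any such decision internally rather than leaving it to individual drivers.

    #include <linux/fs.h>
    #include <linux/mm.h>

    /* Illustrative driver-style mmap() handler that maps the pages
     * uncached, so every access goes to RAM and bypasses the caches. */
    static int secret_mmap(struct file *file, struct vm_area_struct *vma)
    {
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot);
    }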
More to the point, Cox continued, there may be any of a number of things that the system might do to protect the contents of secret memory; those things will vary from one system to the next, and users will not be in a position to know what any specific system should use. That makes it all the more important to nail down what the MAP_EXCLUSIVE flag really means.
James Bottomley took this argument even further, describing MAP_EXCLUSIVE as "a usability problem". Protecting secret data might, on some systems, involve hardware technologies like TME and SEV, for example, but developers cannot know that in a general way. Somehow, Bottomley suggested, the kernel should make the best choice it can for how to protect secret memory; one such choice could be to make the memory uncached only on systems where the speculative-execution mitigations are not active. Lutomirski worried that this approach would not work, though; there are too many variables and ways in which things could go wrong.
There is only one truly clear conclusion from this discussion: a desire for memory with higher levels of secrecy exists, but the development community lacks a clear idea of how that secrecy should be implemented and how it should be presented to the user. That suggests that this feature will not be showing up in a mainline kernel anytime soon. Getting memory secrecy wrong risks saddling the community with the maintenance of a misdesigned interface and, possibly, giving application developers a false sense of security. It is better to go slow in the hope of getting things right.
Index entries for this article:
Kernel: Memory management/Address-space isolation
Posted Nov 15, 2019 21:03 UTC (Fri) by SEJeff (guest, #51588):
https://github.com/enarx/enarx.github.io/wiki/Enarx-Intro...
https://developer.amd.com/sev/
Posted Nov 16, 2019 0:54 UTC (Sat) by wahern (subscriber, #37304):
The confidentiality guarantee of SEV was eviscerated by timing attacks long before SGX was eviscerated, though certainly after the SEV papers nobody should have trusted SGX absent affirmative evidence that such timing attacks weren't going to keep metastasizing. The benefit of SEV is effectively the same as simply not mapping certain userspace regions into kernel space. If you control the kernel/hypervisor, you can read the memory either way: the easy way or, with SEV, the "hard way", which is actually not that difficult per the published attacks.
[1] AFAIU, SGX permits the userspace library to sign a challenge from the content provider; the content provider in turn asks Intel (through an online and, presumably, very expensive web service) to confirm validity. The kernel has nothing to do with any of this except for some basic initialization and management of SGX, which AFAIU is still absent in Linus' tree.
Posted Nov 16, 2019 9:55 UTC (Sat) by wahern (subscriber, #37304):
Here's an interesting/confusing HN thread from 10 months ago with various claims: https://news.ycombinator.com/item?id=18828654. Despite the contradictory assertions, there's enough context, like the distinction between L1 and L3, to suggest hardware-backed DRM schemes are being relied upon even by Firefox and Chrome, but probably not using SGX, especially considering the Netflix requirements.
Posted Nov 16, 2019 9:31 UTC (Sat) by wahern (subscriber, #37304):
In any event, with or without remote attestation it's not reasonable to trust that a guest's memory is unreadable from a host; these technologies aren't holding up well to side-channel attacks, not even on AMD chips, which have otherwise been comparably resilient. Better to consider it as defense in depth.
I think all of these developments augur *against* providing exacting semantics for anything promising confidentiality. The situation is *far* too fluid. We can't even say with strong confidence that SEV, not to mention SME or SGX, suffices. Any interface will be best-effort as a practical matter, and will very likely need to be tweaked in the future in ways that change the performance and security characteristics. If you don't want developers to develop a false sense of security, then keep things conspicuously vague! Alternatively or in addition, avoid abstraction and pass through specific architecture interfaces and semantics as closely as possible, conspicuously passing along the risk, uncertainty, and responsibility to the developer. Anyhow, sometimes security is best served by recognizing that choice is an illusion and simply not offering choices.
The irony is that aside from attestation and physical attacks, the demand and urgency of these things come from the failures of existing software and hardware to meet *existing* confidentiality and security guarantees; guarantees that should already suffice. We should think twice about writing any more checks (i.e. particular confidentiality semantics) we aren't absolutely sure can be cashed. Anyhow, no company would care whether an AWS hypervisor could read guest memory if they could absolutely expect AWS' software and hardware to work as designed. The desire for zero-trust only exists in the minds of geeks, techno-libertarians, and Hollywood studios. Organizations are only going to depend on these new features to prove service providers' and their own diligence. That's not a cynical statement, just a recognition that at the end of the day they depend, and must depend, on the geeks to make reasonable decisions and continual adjustments. And the same is true for these types of security issues when it comes to the relationship between kernel interfaces and userland applications. The history of system entropy interfaces is particularly instructive.
Posted Nov 18, 2019 18:49 UTC (Mon) by NYKevin (subscriber, #129325):
If I had to pick between a best-effort, vague interface and a specific interface that's tied to implementation details, I'm pretty sure the former is more future-proof. In the best-case scenario, we can opportunistically begin offering real guarantees as they become available, and in the worst-case scenario, we can just deprecate the whole thing, since it never offered any guarantees to begin with.
> Anyhow, no company would care whether an AWS hypervisor could read guest memory if they could absolutely expect AWS' software and hardware to work as designed. The desire for zero-trust only exists in the minds of geeks, techno-libertarians, and Hollywood studios.
Certain industries have a tendency to ask for guarantees that are perhaps unnecessary or impractical, but are nevertheless required by some combination of laws, regulations, and industry standards. See for example PCI DSS, HIPAA, FIPS, and so on. It is entirely fair to think that this is a foolish thing for those industries to do, but ultimately, it's their money, and they are choosing to spend it (indirectly via AWS et al.) on building these features into the kernel.
Posted Nov 17, 2019 7:56 UTC (Sun) by abufrejoval (guest, #100159):
I guess it's also time to look at cachability being a binary choice, when three to four levels of cache are becoming common.
Security-sensitive code/data often won't be the majority of a machine's workload, and if you have a high-core-count legacy CPU, why not make a core temporarily exclusive to such code to avoid side-channel issues? If you can exclude only L2+ caching, performance won't suffer nearly as much, right?
This might not work where a 2% performance loss puts you out of business, but I only wish that were true for Facebook, I guess.
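A rough user-space approximation of that idea is possible today with CPU affinity; in this sketch the CPU number is arbitrary, and it is assumed the administrator has otherwise isolated that core (with isolcpus= or cpusets, for instance):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        /* Pin this process to CPU 3, an assumed-isolated core; this
         * limits cross-core sharing but does nothing about the caches. */
        CPU_ZERO(&set);
        CPU_SET(3, &set);
        if (sched_setaffinity(0, sizeof(set), &set)) {
            perror("sched_setaffinity");
            return 1;
        }

        /* ... run the security-sensitive computation here ... */
        return 0;
    }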