Notes from the Intelpocalypse
Posted Jan 4, 2018 2:30 UTC (Thu) by excors (subscriber, #95769)
Parent article: Notes from the Intelpocalypse
ARM's information at https://developer.arm.com/support/security-update says the Meltdown issue ("variant 3") only affects Cortex-A75 (which is very new - I'm not sure it's in any shipping devices yet). Some more common ones (A15/A57/A72) are affected by "variant 3a", where you speculatively read a supposedly-inaccessible system register instead of memory, which is a less serious problem since system registers don't contain as much sensitive information as memory. I think that means most Android phone users don't need to worry much about it.
(But it looks like all the out-of-order ARMs are vulnerable to Spectre.)
Posted Jan 4, 2018 3:15 UTC (Thu)
by ariagolliver (✭ supporter ✭, #85520)
[Link] (49 responses)
Posted Jan 4, 2018 3:25 UTC (Thu)
by nix (subscriber, #2304)
[Link] (8 responses)
Definitely a major change from the way caches work internally now, but not in any way impossible.
Posted Jan 4, 2018 7:53 UTC (Thu)
by kentonv (subscriber, #92073)
[Link] (7 responses)
Posted Jan 4, 2018 11:36 UTC (Thu)
by nix (subscriber, #2304)
[Link] (6 responses)
Posted Jan 4, 2018 19:40 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
Posted Jan 4, 2018 20:19 UTC (Thu)
by bronson (subscriber, #4806)
[Link] (4 responses)
(In addition to being extremely well known for crypto timing attacks, it's how LIGO can measure 1/1000th of the width of a proton.)
Posted Jan 4, 2018 20:37 UTC (Thu)
by nix (subscriber, #2304)
[Link] (3 responses)
Posted Jan 4, 2018 21:44 UTC (Thu)
by roc (subscriber, #30627)
[Link] (2 responses)
Even if you think you can fix all those (I don't see how), it's difficult to be confident people aren't going to come up with new ways to estimate time. And each mitigation you introduce degrades the user experience.
Posted Jan 4, 2018 22:55 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Jan 5, 2018 17:26 UTC (Fri)
by anselm (subscriber, #2796)
[Link]
One important observation with covert channels is that in general, covert channels cannot be removed completely. Insisting that a system be 100% free of all conceivable covert channels is therefore not reasonable.
People doing security evaluations are usually satisfied when the covert channels that do inevitably exist provide such little bandwidth that they are, in practice, no longer useful to attackers.
Posted Jan 4, 2018 5:22 UTC (Thu)
by jimzhong (subscriber, #112928)
[Link] (31 responses)
Posted Jan 4, 2018 5:37 UTC (Thu)
by sfeam (subscriber, #2841)
[Link] (12 responses)
Posted Jan 4, 2018 7:22 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (11 responses)
Posted Jan 4, 2018 14:15 UTC (Thu)
by droundy (subscriber, #4559)
[Link]
Posted Jan 4, 2018 21:50 UTC (Thu)
by roc (subscriber, #30627)
[Link] (9 responses)
It would hurt performance but what else would really work?
Posted Jan 4, 2018 22:28 UTC (Thu)
by rahvin (guest, #16953)
[Link] (2 responses)
Posted Jan 4, 2018 22:40 UTC (Thu)
by roc (subscriber, #30627)
[Link] (1 responses)
Posted Jan 5, 2018 1:46 UTC (Fri)
by rahvin (guest, #16953)
[Link]
Posted Jan 4, 2018 22:51 UTC (Thu)
by sfeam (subscriber, #2841)
[Link] (1 responses)
Posted Jan 4, 2018 23:00 UTC (Thu)
by roc (subscriber, #30627)
[Link]
Posted Jan 4, 2018 22:57 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Jan 4, 2018 23:01 UTC (Thu)
by roc (subscriber, #30627)
[Link]
Posted Jan 5, 2018 0:03 UTC (Fri)
by excors (subscriber, #95769)
[Link]
Then you'd want to rearchitect software to minimise the amount of domain-switching. E.g. instead of a syscall accessing protected data from the same core as the application, it would just be a stub that sends a message to a dedicated kernel core. Neither core would have to flush its own cache, and they couldn't influence each other's cache. Obviously you'd have to get rid of cache coherence (I don't see how your proposal would be compatible with coherence either), and split shared L2/L3 caches into dynamically-adjustable per-domain partitions, and no hyperthreading, etc.
Then maybe someone will notice that DRAM chips remember the last row that was accessed, so a core can touch one of two rows and another core can detect which one responds faster, and leak information that way. Then we'll have to partition DRAM by domain too.
Eventually we might essentially have a network of tiny PCs, each with its own CPU and RAM and disk and dedicated to a single protection domain, completely isolated from each other except for an Ethernet link.
Hmm, I'm not sure that will be good enough either: Spectre gets code in one domain (e.g. the kernel) to leak data into cache that affects the timing of a memory read in another domain (e.g. userspace), but couldn't it work with a purely kernel-only cache, if you simply find an easily-timeable kernel call that performs the memory read for you? Then it doesn't matter how far removed the attacker is from the target.
Posted Jan 5, 2018 13:44 UTC (Fri)
by welinder (guest, #4699)
[Link]
I don't see tagging every memory location with an owner as a viable option.
Posted Jan 4, 2018 9:46 UTC (Thu)
by epa (subscriber, #39769)
[Link] (15 responses)
Posted Jan 4, 2018 11:08 UTC (Thu)
by excors (subscriber, #95769)
[Link] (9 responses)
The Meltdown PoC puts the memory read itself inside a speculative execution path, but I assume that's not strictly needed - it just makes the attack quicker/easier since you don't need to deal with a real page fault handler (because the fault gets unwound by the outer level of speculation).
Apparently the protection bits are stored alongside the data in L1$, so it seems like it shouldn't be expensive for the CPU to check those bits simultaneously with fetching the value, and then it can immediately replace the value with 0 or pretend it was a cache miss or whatever, so that it doesn't continue executing with the protected value. (But maybe it's more complicated than that in reality.)
Posted Jan 4, 2018 11:27 UTC (Thu)
by MarcB (subscriber, #101804)
[Link] (8 responses)
But this has no effect on Spectre, which is based on speculative execution without crossing security boundaries.
Posted Jan 4, 2018 22:56 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (3 responses)
I don't understand: array1->data[offset] is out of bounds. If it were not, then what information would be leaked?
Posted Jan 4, 2018 23:18 UTC (Thu)
by rahvin (guest, #16953)
[Link]
Posted Jan 4, 2018 23:26 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
Posted Jan 4, 2018 23:52 UTC (Thu)
by sfeam (subscriber, #2841)
[Link]
Posted Jan 4, 2018 23:17 UTC (Thu)
by samiam95124 (guest, #120873)
[Link] (3 responses)
Posted Jan 5, 2018 16:25 UTC (Fri)
by MarcB (subscriber, #101804)
[Link] (2 responses)
My understanding of Meltdown is that it uses the limited speculative execution caused by classic pipelining+out-of-order execution (it does not use the advanced speculative execution that is used by Spectre). Or does it just use the reordering of stages, i.e. "read" before "check"?
It boldly accesses memory it is not allowed to access and then "defuses" the exception by forking beforehand and sacrificing the child process. Or it avoids exceptions by using TSX and rolling back. It then checks if a given address was loaded into cache or not by the forbidden access.
And apparently this does not work on AMD - and AMD claimed to never make speculative accesses to forbidden addresses - i.e. they must be checking earlier or never reorder "read" before "check".
However, I do not see, how AMD could do this with TSX; there allowing this forbidden access seems to be part of the spec. Or does Ryzen not have TSX?
Posted Jan 5, 2018 17:34 UTC (Fri)
by foom (subscriber, #14868)
[Link]
(Also, no, AMD doesn't implement it)
Posted Jan 6, 2018 14:58 UTC (Sat)
by nix (subscriber, #2304)
[Link]
Needless to say, if you have a way to exfiltrate the data other than a shared memory region, you can use it: the basic attack (relying on side-effects of speculations bound to fail) is the same.
Posted Jan 4, 2018 23:15 UTC (Thu)
by samiam95124 (guest, #120873)
[Link] (4 responses)
The key to speculative execution is that it has to cause no side effects that would not be there if the processor didn't speculatively execute at all. Obviously there is one the CPU designers didn't think of, which is access time. That's what makes this exploit a really, really clever one.
Posted Jan 5, 2018 6:52 UTC (Fri)
by epa (subscriber, #39769)
[Link] (3 responses)
I suggest that if practical, speculative execution should take memory protection into account, and if it gets to the point where an exception would be triggered, just stop speculating at that point and don't actually fetch the value from memory.
Posted Jan 5, 2018 7:03 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Jan 10, 2018 11:35 UTC (Wed)
by epa (subscriber, #39769)
[Link] (1 responses)
The thought occurs that a processor could have two active permission modes: one for normal execution and one for speculation. So even though the processor is executing in kernel mode (Ring 0), speculative accesses still get the memory permissions associated with user space. So the transition from user to kernel space would be broken into two steps: first the processor switches to kernel mode but leaves speculative accesses unprivileged; later, once deep inside the kernel, an explicit instruction could enable speculative fetches to kernel memory too.
(That might still let you snoop on another userspace process, of course.)
Posted Jan 15, 2018 19:00 UTC (Mon)
by ttonino (guest, #4073)
[Link]
Posted Jan 4, 2018 23:10 UTC (Thu)
by samiam95124 (guest, #120873)
[Link] (1 responses)
I suspect with time we will see several hardware fixes, but obviously with brand new CPUs.
Posted Jan 5, 2018 11:01 UTC (Fri)
by epa (subscriber, #39769)
[Link]
I think your proposal of returning zeroes only for speculative loads and faulting on the normal ones is preferable, if it can be implemented efficiently.
Posted Jan 4, 2018 15:19 UTC (Thu)
by jcm (subscriber, #18262)
[Link] (7 responses)
Posted Jan 4, 2018 16:36 UTC (Thu)
by ortalo (guest, #4654)
[Link] (2 responses)
Posted Jan 4, 2018 18:12 UTC (Thu)
by jcm (subscriber, #18262)
[Link]
Posted Jan 4, 2018 21:51 UTC (Thu)
by roc (subscriber, #30627)
[Link]
Posted Jan 4, 2018 21:54 UTC (Thu)
by roc (subscriber, #30627)
[Link] (3 responses)
Posted Jan 4, 2018 22:52 UTC (Thu)
by jcm (subscriber, #18262)
[Link] (2 responses)
Posted Jan 4, 2018 23:02 UTC (Thu)
by roc (subscriber, #30627)
[Link]
Posted Jan 10, 2018 15:26 UTC (Wed)
by anton (subscriber, #25547)
[Link]
Other approaches for fixing the hardware without throwing out the baby with the bathwater could be to put any loaded cache lines in an enhanced version of the store buffer until speculation is resolved; and to (weakly) encrypt the address bits when accessing various shared hardware structures, combined with changing the secret frequently. I guess there are others, too.
That might narrow the timing window but I don't think it would be sufficient to prevent the attack. The analysis of Spectre shows that hundreds of instructions may be executed speculatively before the misprediction is recognized, so snooping on the cache contents would still be possible during that interval.
It's worse than you think. The use of the cache as a side channel was convenient for the proof-of-concept exploits but is not necessary, so mitigation that focuses on the cache rather than on the speculative execution of invalid code is necessarily incomplete. The Spectre report notes: "potential countermeasures limited to the memory cache are likely to be insufficient, since there are other ways that speculative execution can leak information. For example, timing effects from memory bus contention, DRAM row address selection status, availability of virtual registers, ALU activity, [...] power and EM."
the cpu chip -- memory reads that reach the main memory -- then you might get caching effects there.
Spectre is particularly nasty if the target code runs in kernel space, hence the concern about user-supplied BPF code. But that is a special case. The general case is that Spectre snoops information from any process you can persuade to execute the leaking code. The snooping is easiest if that is another thread in the same process (e.g. an un-sandboxed browser window). No kernel space is involved there.
It boldly accesses memory it is not allowed to access and then "defuses" the exception by forking beforehand and sacrificing the child process. Or it avoids exceptions by using TSX and rolling back. It then checks if a given address was loaded into cache or not by the forbidden access.
Nope. It boldly accesses memory and then uses the value read from that memory to read one of a variety of bits of memory it shares with the attacker, but it does all of that *behind a check which will fail*, so the reads are only ever done speculatively, and no exception is raised. Unfortunately the cache-loading done by that read still happens, and the hot cache is easily detectable by having the attacker time its own reads of the possible locations. (With more than two locations, you can exfiltrate more than one bit at once, possibly much more.)
Sure, it can't cause a memory exception based on something that might not happen -- but ideally it shouldn't speculate accesses to memory which isn't accessible. Currently, I think it is fair to say that speculative execution 'ignores' the memory protection, in this example at least. The accessibility of the memory doesn't have any impact on what speculative execution does.
The key to speculative execution is that it has to cause no side effects that would not be there if the processor didn't speculatively execute at all.
I think that is an impossible goal, at least if the purpose of speculation is to improve performance: the speedup is itself an observable side effect, and the whole point of speculating is to produce it. So the effects of speculative execution will always be observable; what matters is not to speculatively execute (and make observable) operations which you would not be allowed to do in non-speculative execution.
Otherwise it would be easy to load cache lines with an extra bit 'speculative=1' and if non-speculative execution encountered such a line, regard it as invalid.
Sadly, that does not work: all execution is speculative, and most (?) of it is just not rolled back.
Normally branch predictors don't tag (and check) their entries at all: they just use a bunch of address bits (possibly after mixing them in a non-cryptographic way) to index into the table and use whatever prediction they find there. Since no prediction is just as bad for performance as a misprediction, checking isn't worth the cost. Tagging entries with the ASID would be enough to keep the predictor from being primed by an attacking process, though it won't help against an attack from untrusted code within the same process (e.g., JavaScript code).