
Notes from the Intelpocalypse

Posted Jan 4, 2018 2:30 UTC (Thu) by excors (subscriber, #95769)
Parent article: Notes from the Intelpocalypse

> Intel and ARM processors seem to be vulnerable to this issue; AMD processors evidently are not.

ARM's information at https://developer.arm.com/support/security-update says the Meltdown issue ("variant 3") only affects Cortex-A75 (which is very new - I'm not sure it's in any shipping devices yet). Some more common ones (A15/A57/A72) are affected by "variant 3a", where you speculatively read a supposedly-inaccessible system register instead of memory, which is a less serious problem since system registers don't contain as much sensitive information as memory. I think that means most Android phone users don't need to worry much about it.

(But it looks like all the out-of-order ARMs are vulnerable to Spectre.)


Notes from the Intelpocalypse

Posted Jan 4, 2018 3:15 UTC (Thu) by ariagolliver (✭ supporter ✭, #85520) [Link] (49 responses)

Wouldn't all out-of-order chips be vulnerable to Spectre? I'd be interested to read how speculation and caching could coexist on the same chip without it being vulnerable to some kind of side channel.

Notes from the Intelpocalypse

Posted Jan 4, 2018 3:25 UTC (Thu) by nix (subscriber, #2304) [Link] (8 responses)

I think you'd have to leave enough cache empty to satisfy likely ongoing speculation, evict material read into the cache to satisfy speculation iff the speculation fails, and *not* evict anything merely to free up cache to satisfy speculations (i.e. evict at retirement time, to keep a bit of space free).

Definitely a major change from the way caches work internally now, but not in any way impossible.

Notes from the Intelpocalypse

Posted Jan 4, 2018 7:53 UTC (Thu) by kentonv (subscriber, #92073) [Link] (7 responses)

As noted in the paper, cache effects are not the only side effect of speculative execution. Other effects, like the amount of time spent speculating, seem difficult to hide...

Notes from the Intelpocalypse

Posted Jan 4, 2018 11:36 UTC (Thu) by nix (subscriber, #2304) [Link] (6 responses)

I just babbled about possible high-res-timer-related mitigations here: <https://lwn.net/Articles/742867/>. All a bit painful (and with user-visible consequences if you actually *need* accurate high-res times many times a second) but a lot less painful than the reported KPTI slowdown, ISTM.

Notes from the Intelpocalypse

Posted Jan 4, 2018 19:40 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

It won't work unless you also disable multi-threading entirely. You can cobble together a high-resolution timer by having one thread perform N writes to a buffer while another thread observes a value at a fixed offset within that buffer.
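That counter-thread clock can be sketched in a few lines of toy Python. Everything here (the class, the names, using a Python attribute instead of a shared memory buffer) is illustrative, not from the thread:

```python
import threading
import time

class CounterTimer:
    """Toy high-resolution 'timer' built from a busy-looping counter thread.

    A real attack would have the counter thread write into a shared memory
    buffer; the principle is the same: the counter advances monotonically
    and can stand in for a clock even when real timers are degraded.
    """
    def __init__(self):
        self.count = 0
        self._stop = False
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Spin as fast as possible; each iteration is one "tick".
        while not self._stop:
            self.count += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop = True
        self._thread.join()

    def now(self):
        return self.count

timer = CounterTimer()
timer.start()
t0 = timer.now()
time.sleep(0.05)          # stand-in for the operation being timed
t1 = timer.now()
timer.stop()
print(t1 > t0)            # the counter advanced while we "worked"
```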

Notes from the Intelpocalypse

Posted Jan 4, 2018 20:19 UTC (Thu) by bronson (subscriber, #4806) [Link] (4 responses)

If you have enough time to perform the attack, it won't work, period. Even if I'm only allowed a very low-resolution timer, I can compensate by performing lots of operations and running some statistics.

(In addition to being extremely well known for crypto timing attacks, it's how LIGO can measure 1/1000th of the width of a proton.)
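The statistical amplification can be simulated. The `coarse_measure` helper below is hypothetical (a model of a quantized clock, not a real timer API), but it shows how averaging many whole-tick readings recovers a sub-tick difference:

```python
import math
import random

def coarse_measure(duration, rng):
    """Measure `duration` with a timer that only reports whole ticks.

    The start phase is random, so a single measurement of e.g. 1.3 ticks
    reads as either 1 or 2 -- but the average over many runs converges to
    the true value. Attacks do the same with repeated probe operations.
    """
    start = rng.random()                       # random phase within a tick
    return math.floor(start + duration) - math.floor(start)

rng = random.Random(42)                        # fixed seed for repeatability
n = 100_000
mean_fast = sum(coarse_measure(1.3, rng) for _ in range(n)) / n
mean_slow = sum(coarse_measure(1.7, rng) for _ in range(n)) / n
print(round(mean_fast, 2), round(mean_slow, 2))  # ~1.3 vs ~1.7
```

The coarse timer can never distinguish the two durations in a single reading, but the averages separate cleanly; degrading timer resolution only raises the number of samples an attacker needs.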

Notes from the Intelpocalypse

Posted Jan 4, 2018 20:37 UTC (Thu) by nix (subscriber, #2304) [Link] (3 responses)

"Performing lots of operations and running some statistics" probably slows the attack from a 500KiB/s flood down to a trickle, though. It seems a useful amelioration, at least.

Notes from the Intelpocalypse

Posted Jan 4, 2018 21:44 UTC (Thu) by roc (subscriber, #30627) [Link] (2 responses)

But the multithreading approach Cyberax noted is a showstopper. Note that it also works with multiple single-threaded processes that share memory. It could even be made to work without shared memory, just with one process writing a counter to a file and another process reading it.

Even if you think you can fix all those (I don't see how), it's difficult to be confident people aren't going to come up with new ways to estimate time. And each mitigation you introduce degrades the user experience.

Notes from the Intelpocalypse

Posted Jan 4, 2018 22:55 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Another one I've heard is to submit an asynchronous disk request and time its completion.

Notes from the Intelpocalypse

Posted Jan 5, 2018 17:26 UTC (Fri) by anselm (subscriber, #2796) [Link]

One important observation with covert channels is that in general, covert channels cannot be removed completely. Insisting that a system be 100% free of all conceivable covert channels is therefore not reasonable.

People doing security evaluations are usually satisfied when the covert channels that do inevitably exist provide such little bandwidth that they are, in practice, no longer useful to attackers.

Notes from the Intelpocalypse

Posted Jan 4, 2018 5:22 UTC (Thu) by jimzhong (subscriber, #112928) [Link] (31 responses)

I think one way is to redesign branch prediction so that when a misprediction occurs, in addition to flushing instructions in the mispredicted branch, the cache is also restored to the state before taking the branch. But this fix might be expensive.
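That rollback idea can be sketched with the cache modeled as a plain set of addresses. All names here are illustrative; no real design is being described:

```python
class RollbackCache:
    """Toy model of restoring cache state after a misprediction: snapshot
    the cache when entering a speculative window, and restore the snapshot
    if the branch turns out to have been mispredicted. Real hardware would
    need to track this per in-flight branch, which is the expensive part.
    """
    def __init__(self):
        self.lines = set()           # addresses currently cached
        self._checkpoint = None

    def begin_speculation(self):
        self._checkpoint = set(self.lines)

    def access(self, addr):
        self.lines.add(addr)         # fill the line on access

    def resolve(self, mispredicted):
        if mispredicted:
            self.lines = self._checkpoint   # undo speculative fills
        self._checkpoint = None

cache = RollbackCache()
cache.access(0x1000)                 # architectural access, stays cached
cache.begin_speculation()
cache.access(0xdead)                 # secret-dependent speculative fill
cache.resolve(mispredicted=True)
print(0xdead in cache.lines)         # False: no trace left in the cache
```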

Notes from the Intelpocalypse

Posted Jan 4, 2018 5:37 UTC (Thu) by sfeam (subscriber, #2841) [Link] (12 responses)

That might narrow the timing window but I don't think it would be sufficient to prevent the attack. The analysis of Spectre shows that hundreds of instructions may be executed speculatively before the misprediction is recognized, so snooping on the cache contents would still be possible during that interval.

Notes from the Intelpocalypse

Posted Jan 4, 2018 7:22 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

And so the ways to snoop on the cache contents should be curtailed.

Notes from the Intelpocalypse

Posted Jan 4, 2018 14:15 UTC (Thu) by droundy (subscriber, #4559) [Link]

Indeed, I wonder about the possibility of separate speculative caches. Sounds terribly expensive, though.

Notes from the Intelpocalypse

Posted Jan 4, 2018 21:50 UTC (Thu) by roc (subscriber, #30627) [Link] (9 responses)

The only reasonable and watertight way to do that that I can think of is to partition the cache by protection domain. So cache lines would have owners: the kernel, specific user-space processes, and even within processes you'd want separate cache lines for JS vs the browser. A cache lookup would have to find a line owned by the current protection domain; if it did not, that has to be treated as a miss, and you would only be allowed to evict cache lines owned by the current domain.

It would hurt performance but what else would really work?
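A minimal model of such a domain-tagged cache, assuming a lookup only hits lines owned by the current protection domain (the class and names are hypothetical):

```python
class DomainTaggedCache:
    """Sketch of a cache partitioned by protection domain: every line has
    an owner, and a lookup from another domain is treated as a miss, so no
    domain can observe (or evict, ideally) another domain's fills."""
    def __init__(self):
        self.lines = {}              # addr -> owner domain

    def access(self, domain, addr):
        """Return 'hit' or 'miss'; a miss fills the line for this domain."""
        if self.lines.get(addr) == domain:
            return "hit"
        self.lines[addr] = domain    # fill, taking ownership of the line
        return "miss"

cache = DomainTaggedCache()
cache.access("kernel", 0x1000)           # kernel warms the line
print(cache.access("user", 0x1000))      # miss: timing reveals nothing
print(cache.access("user", 0x1000))      # hit: user only sees its own fills
```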

Notes from the Intelpocalypse

Posted Jan 4, 2018 22:28 UTC (Thu) by rahvin (guest, #16953) [Link] (2 responses)

That sounds like a fix that would destroy cache effectiveness; you'd probably also enable a DoS attack that partitions the cache until there isn't any left and things start locking up.

Notes from the Intelpocalypse

Posted Jan 4, 2018 22:40 UTC (Thu) by roc (subscriber, #30627) [Link] (1 responses)

There are probably ways to avoid lockup. I agree the performance impact would be bad. But what else really works?

Notes from the Intelpocalypse

Posted Jan 5, 2018 1:46 UTC (Fri) by rahvin (guest, #16953) [Link]

And we all thought Heartbleed was the worst thing ever; it kinda pales in comparison to this.

Notes from the Intelpocalypse

Posted Jan 4, 2018 22:51 UTC (Thu) by sfeam (subscriber, #2841) [Link] (1 responses)

It's worse than you think. The use of the cache as a side channel was convenient for the proof-of-concept exploits but was not necessary. Mitigation that focuses on the cache rather than on the speculative execution of invalid code is necessarily incomplete. The Spectre report notes: "potential countermeasures limited to the memory cache are likely to be insufficient, since there are other ways that speculative execution can leak information. For example, timing effects from memory bus contention, DRAM row address selection status, availability of virtual registers, ALU activity, [...] power and EM."

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:00 UTC (Thu) by roc (subscriber, #30627) [Link]

Yeah, I read the paper. Just addressing the cache question since it was raised.

Notes from the Intelpocalypse

Posted Jan 4, 2018 22:57 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Switchable caches (by PCID), perhaps?

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:01 UTC (Thu) by roc (subscriber, #30627) [Link]

That's basically the same as partitioning the cache, isn't it?

Notes from the Intelpocalypse

Posted Jan 5, 2018 0:03 UTC (Fri) by excors (subscriber, #95769) [Link]

Rather than restricting each domain to a tiny partition of the cache (which sounds painful for L1), perhaps you could let each domain use the whole cache (like now) but flush it every time you switch domain.

Then you'd want to rearchitect software to minimise the amount of domain-switching. E.g. instead of a syscall accessing protected data from the same core as the application, it would just be a stub that sends a message to a dedicated kernel core. Neither core would have to flush their own cache, and they couldn't influence each other's cache. Obviously you'd have to get rid of cache coherence (I don't see how your proposal would be compatible with coherence either), and split shared L2/L3 caches into dynamically-adjustable per-domain partitions, and no hyperthreading, etc.

Then maybe someone will notice that DRAM chips remember the last row that was accessed, so a core can touch one of two rows and another core can detect which one responds faster, and leak information that way. Then we'll have to partition DRAM by domain too.

Eventually we might essentially have a network of tiny PCs, each with its own CPU and RAM and disk and dedicated to a single protection domain, completely isolated from each other except for an Ethernet link.

Hmm, I'm not sure that will be good enough either: Spectre gets code in one domain (e.g. the kernel) to leak data into cache that affects the timing of a memory read in another domain (e.g. userspace), but couldn't it work with a purely kernel-only cache, if you simply find an easily-timeable kernel call that performs the memory read for you? Then it doesn't matter how far removed the attacker is from the target.
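A toy model of the flush-on-domain-switch alternative (names and structure are illustrative only):

```python
class FlushOnSwitchCache:
    """Sketch of letting each domain use the whole cache, but erasing all
    traces of the previous domain whenever control transfers between
    domains. The full flush on every switch is the expensive part, which
    is why you'd then want to rearchitect software to switch rarely."""
    def __init__(self):
        self.lines = set()
        self.domain = None

    def switch_to(self, domain):
        if domain != self.domain:
            self.lines.clear()       # full flush on domain switch
            self.domain = domain

    def access(self, addr):
        hit = addr in self.lines
        self.lines.add(addr)
        return "hit" if hit else "miss"

cache = FlushOnSwitchCache()
cache.switch_to("kernel")
cache.access(0x1000)                 # kernel warms the line
cache.switch_to("user")
print(cache.access(0x1000))          # miss: kernel's fills are invisible
```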

Notes from the Intelpocalypse

Posted Jan 5, 2018 13:44 UTC (Fri) by welinder (guest, #4699) [Link]

Even that might not be enough. If any information based on speculation has left
the cpu chip -- memory reads that reach the main memory -- then you might get
caching effects there.

I don't see tagging every memory location with an owner as a viable option.

Notes from the Intelpocalypse

Posted Jan 4, 2018 9:46 UTC (Thu) by epa (subscriber, #39769) [Link] (15 responses)

In the third example I was surprised that the access through the kernel pointer didn't generate a memory protection fault. But then I remembered that it never really 'happened' because the if-condition is always false (but mispredicted as true). The issue is surely that speculative execution ignores the memory protection. The fix would be to limit speculative execution to memory that's definitely permitted (even if that means a few odd cases now get slower).

Notes from the Intelpocalypse

Posted Jan 4, 2018 11:08 UTC (Thu) by excors (subscriber, #95769) [Link] (9 responses)

I don't think it's fair to say it ignores the memory protection - it fetches the value (from L1$) and just predicts that there won't be a fault, carries on speculatively executing as if there wasn't a fault, and then eventually checks the memory protection and unwinds (most of) the CPU state when it realises it predicted wrong. The problem is that the CPU's behaviour during the speculative part is subtly observable, and so the fetched value is observable.

The Meltdown PoC puts the memory read itself inside a speculative execution path, but I assume that's not strictly needed - it just makes the attack quicker/easier since you don't need to deal with a real page fault handler (because the fault gets unwound by the outer level of speculation).

Apparently the protection bits are stored alongside the data in L1$, so it seems like it shouldn't be expensive for the CPU to check those bits simultaneously with fetching the value, and then it can immediately replace the value with 0 or pretend it was a cache miss or whatever, so that it doesn't continue executing with the protected value. (But maybe it's more complicated than that in reality.)
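That parallel permission check can be sketched as follows, assuming the protection bits are stored alongside the data in the L1 line (everything here, the `Line` record and the mode strings included, is a hypothetical model):

```python
class Line:
    """One L1 cache line: the data plus the protection bits stored with it."""
    def __init__(self, value, perm):
        self.value = value
        self.perm = perm             # who may read this line

def speculative_load(l1, addr, mode):
    """Check the line's protection bits in the same step as the fetch; on a
    permission mismatch, hand the speculative pipeline a zero instead of
    the protected value. The fault itself is still raised at retirement."""
    line = l1[addr]
    if mode != "kernel" and line.perm == "kernel":
        return 0                     # blind the speculation
    return line.value

l1 = {0x1000: Line(value=0x42, perm="kernel")}
print(speculative_load(l1, 0x1000, mode="user"))    # 0: nothing leaks
print(speculative_load(l1, 0x1000, mode="kernel"))  # 66 (0x42): real value
```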

Notes from the Intelpocalypse

Posted Jan 4, 2018 11:27 UTC (Thu) by MarcB (subscriber, #101804) [Link] (8 responses)

The last paragraph is basically what seems to be the difference between Intel and AMD, and why AMD is not affected by Meltdown: AMD checks permissions - and aborts, if permissions would be violated - before measurable side-effects occur, Intel afterwards.

But this has no effect on Spectre, which is based on speculative execution without crossing security boundaries.

Notes from the Intelpocalypse

Posted Jan 4, 2018 22:56 UTC (Thu) by marcH (subscriber, #57642) [Link] (3 responses)

> But this has no effect on Spectre, which is based on speculative execution without crossing security boundaries.

I don't understand: array1->data[offset] is out of boundaries. If it were not then what information would be leaked?

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:18 UTC (Thu) by rahvin (guest, #16953) [Link]

That it was in bounds. The point, I believe, is that you can find the boundaries. Once you know the boundaries, you can start extracting data beyond them a bit at a time; after a number of cycles you've extracted something potentially valuable like login credentials or encryption keys.

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:26 UTC (Thu) by marcH (subscriber, #57642) [Link] (1 responses)

I kept missing one of the main differences between meltdown and spectre: spectre runs in kernel space, meltdown doesn't. Sorry for the noise.

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:52 UTC (Thu) by sfeam (subscriber, #2841) [Link]

Spectre is particularly nasty if the target code runs in kernel space, hence the concern about user-supplied BPF code. But that is a special case. The general case is that Spectre snoops information from any process you can persuade to execute the leaking code. The snooping is easiest if that is another thread in the same process (e.g. an un-sandboxed browser window). No kernel space is involved there.

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:17 UTC (Thu) by samiam95124 (guest, #120873) [Link] (3 responses)

Sorry, that is just not true. You are mixing speculative and non-speculative execution.

Notes from the Intelpocalypse

Posted Jan 5, 2018 16:25 UTC (Fri) by MarcB (subscriber, #101804) [Link] (2 responses)

What do you mean?

My understanding of Meltdown is that it uses the limited speculative execution caused by classic pipelining+out-of-order execution (it does not use the advanced speculative execution that is used by Spectre). Or does it just use the reordering of stages, i.e. "read" before "check"?

It boldly accesses memory it is not allowed to access and then "defuses" the exception by forking beforehand and sacrificing the child process. Or it avoids exceptions by using TSX and rolling back. It then checks if a given address was loaded into cache or not by the forbidden access.

And apparently this does not work on AMD - and AMD claimed to never make speculative accesses to forbidden addresses - i.e. they must be checking earlier or never reorder "read" before "check".

However, I do not see how AMD could do this with TSX; there, allowing the forbidden access seems to be part of the spec. Or does Ryzen not have TSX?

Notes from the Intelpocalypse

Posted Jan 5, 2018 17:34 UTC (Fri) by foom (subscriber, #14868) [Link]

TSX isn't supposed to allow you to access memory you don't have permission to access, it just triggers a different response if you try – instead of a sigsegv, you get a transaction abort.

(Also, no, AMD doesn't implement it)

Notes from the Intelpocalypse

Posted Jan 6, 2018 14:58 UTC (Sat) by nix (subscriber, #2304) [Link]

> It boldly accesses memory it is not allowed to access and then "defuses" the exception by forking beforehand and sacrificing the child process. Or it avoids exceptions by using TSX and rolling back. It then checks if a given address was loaded into cache or not by the forbidden access.
Nope. It boldly accesses memory and then uses the value read from that memory to read one of a variety of bits of memory it shares with the attacker, but it does all of that *behind a check which will fail*, so the reads are only ever done speculatively, and no exception is raised. Unfortunately the cache-loading done by that read still happens, and the hot cache is easily detectable by having the attacker time its own reads of the possible locations. (With more than two locations, you can exfiltrate more than one bit at once, possibly much more.)

Needless to say, if you have a way to exfiltrate the data other than a shared memory region, you can use it: the basic attack (relying on side-effects of speculations bound to fail) is the same.
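The flush+reload exfiltration nix describes can be modeled in a few lines, with set membership standing in for the fast-versus-slow timing measurement. This is a simulation of the principle, not a working exploit; all names are invented:

```python
CACHE_LINE = 64
SECRET = 0x5A            # the byte the "victim" reads behind a failing check

def victim(cache):
    """The dependent load happens only speculatively, behind a check bound
    to fail, so no exception fires -- but the probe-array line indexed by
    the secret byte is pulled into the cache as a side effect."""
    cache.add(SECRET * CACHE_LINE)

def attacker(cache):
    """Probe each of the 256 possible slots; the one that reads 'fast'
    (here: is present in the set) reveals the secret byte. A real attack
    times each read instead of testing membership."""
    for byte in range(256):
        if byte * CACHE_LINE in cache:
            return byte
    return None

cache = set()            # attacker flushed everything; victim refills one line
victim(cache)
print(attacker(cache) == SECRET)   # True: the byte leaked via the cache
```

With more than two probe locations per round, as nix notes, you can exfiltrate a whole byte (or more) per speculative read rather than a single bit.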

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:15 UTC (Thu) by samiam95124 (guest, #120873) [Link] (4 responses)

It's not that speculative execution "ignores memory protection", but that you can't raise exceptions based on what might not even happen. Go down that road and you would be causing faults everywhere.

The key to speculative execution is that it has to cause no side effects that would not be there if the processor didn't speculatively execute at all. Obviously there is one side effect the CPU designers didn't think of: access time. That's what makes this exploit a really, really clever one.

Notes from the Intelpocalypse

Posted Jan 5, 2018 6:52 UTC (Fri) by epa (subscriber, #39769) [Link] (3 responses)

Sure, it can't cause a memory exception based on something that might not happen -- but ideally it shouldn't speculate accesses to memory which isn't accessible. Currently, I think it is fair to say that speculative execution 'ignores' the memory protection, in this example at least. The accessibility of the memory doesn't have any impact on what speculative execution does.

I suggest that if practical, speculative execution should take memory protection into account, and if it gets to the point where an exception would be triggered, just stop speculating at that point and don't actually fetch the value from memory.

> The key to speculative execution is that it has to cause no side effects that would not be there if the processor didn't speculatively execute at all.
I think that is an impossible goal, at least if the purpose of speculation is to improve performance. The whole point of it is for the speedup side effects. So the effect of speculative execution will always be observable; what matters is to not speculatively execute (and make observable) operations which you would not be allowed to do in non-speculative execution.

Notes from the Intelpocalypse

Posted Jan 5, 2018 7:03 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

AMD checks permissions before speculatively executing stuff. But this doesn't protect against Spectre.

Notes from the Intelpocalypse

Posted Jan 10, 2018 11:35 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

You are right, I believe I was only talking about Meltdown, not Spectre.

The thought occurs that a processor could have two active permission modes: one for normal execution and one for speculation. So even though the processor is executing in kernel mode (Ring 0), speculative accesses still get the memory permissions associated with user space. So the transition from user to kernel space would be broken into two steps: first the processor switches to kernel mode but leaves speculative accesses unprivileged; later, once deep inside the kernel, an explicit instruction could enable speculative fetches to kernel memory too.

(That might still let you snoop on another userspace process, of course.)

Notes from the Intelpocalypse

Posted Jan 15, 2018 19:00 UTC (Mon) by ttonino (guest, #4073) [Link]

I'm afraid that all execution is speculative; it just isn't rolled back afterwards.
Otherwise it would be easy to tag loaded cache lines with an extra 'speculative=1' bit and have non-speculative execution treat such a line as invalid.
Sadly, that does not work: all execution is speculative, and most (?) of it is simply never rolled back.

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:10 UTC (Thu) by samiam95124 (guest, #120873) [Link] (1 responses)

I suspect the final hardware fix will be blinding the speculative execution unit to unpermissioned pages. I.e., you can't raise a fault from a speculative access to a non-permissioned page, because that would give a fault where none would actually occur; but the CPU knows that the memory accessed is not in the user ring. Without redesigning the entire speculative unit, you blind the data fetch by replacing it with, say, zeros. Then the side effects are not useful.

I suspect with time we will see several hardware fixes, but obviously with brand new CPUs.

Notes from the Intelpocalypse

Posted Jan 5, 2018 11:01 UTC (Fri) by epa (subscriber, #39769) [Link]

Or indeed you could change the way memory protection works altogether such that any access to a forbidden address returns zero and sets a processor flag to be checked asynchronously. (Speculative access would not set the flag.) The kernel could then kill the process a short while later if the flag is set. This would obviously make things less robust by allowing processes to continue blithely past bad pointer accesses, at least for a short while.

I think your proposal of returning zeroes only for speculative loads and faulting on the normal ones is preferable, if it can be implemented efficiently.

Notes from the Intelpocalypse

Posted Jan 4, 2018 15:19 UTC (Thu) by jcm (subscriber, #18262) [Link] (7 responses)

To be vulnerable to branch predictor abuse, you need to be able to train the predictor. If you (correctly) index your predictor using all of the bits of the VA, as opposed to the low order bits, you remove the most obvious route of attack. It's great we can finally talk about these problems together!

Notes from the Intelpocalypse

Posted Jan 4, 2018 16:36 UTC (Thu) by ortalo (guest, #4654) [Link] (2 responses)

Could you elaborate? (More specifically, what is the 'VA' here?)

Notes from the Intelpocalypse

Posted Jan 4, 2018 18:12 UTC (Thu) by jcm (subscriber, #18262) [Link]

At some point. Let's give this all time to settle down :)

Notes from the Intelpocalypse

Posted Jan 4, 2018 21:51 UTC (Thu) by roc (subscriber, #30627) [Link]

VA = Virtual Address

Notes from the Intelpocalypse

Posted Jan 4, 2018 21:54 UTC (Thu) by roc (subscriber, #30627) [Link] (3 responses)

Seems like that would help user -> kernel attacks but not user -> user attacks.

Notes from the Intelpocalypse

Posted Jan 4, 2018 22:52 UTC (Thu) by jcm (subscriber, #18262) [Link] (2 responses)

Nah, it's VA+ASID/PCID. It's actually very simple to have a branch predictor that is safe against variant 2: you just need your index to completely disambiguate against other live contexts. The only cost is more bits to compare, but compared to having no branch prediction within one of the contexts, or flushing the predictor, I know which I prefer. I expect all of the vendors to make this relatively trivial fix in future silicon and then apply CONFIG_MARKETING to overhype it.
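A toy model of a predictor whose entries are tagged with the full VA plus ASID. The table sizes and field layout are invented for illustration; the point is only that an aliasing index from another context no longer yields a usable prediction:

```python
INDEX_BITS = 10   # toy BTB indexed by the low-order virtual-address bits

class TaggedPredictor:
    """Sketch of a variant-2-safe predictor: each entry is tagged with the
    full VA and the ASID, so an entry trained in one context is never
    consumed by an aliasing branch in another context."""
    def __init__(self):
        self.table = {}

    def _index(self, va):
        return va & ((1 << INDEX_BITS) - 1)

    def train(self, asid, va, target):
        self.table[self._index(va)] = (asid, va, target)

    def predict(self, asid, va):
        entry = self.table.get(self._index(va))
        if entry and entry[0] == asid and entry[1] == va:
            return entry[2]
        return None                  # tag mismatch: no prediction at all

btb = TaggedPredictor()
# Attacker (ASID 1) trains a branch whose low-order bits alias the victim's.
btb.train(asid=1, va=0xffff_f800_0000_0123, target=0xbad)
# Victim (ASID 2) at an aliasing address gets no prediction, not 0xbad.
print(btb.predict(asid=2, va=0x0123))   # None
```

An untagged predictor would have returned `0xbad` here, steering the victim's speculation to attacker-chosen code; the tag check turns that into a mere lost prediction.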

Notes from the Intelpocalypse

Posted Jan 4, 2018 23:02 UTC (Thu) by roc (subscriber, #30627) [Link]

That makes sense, although in my defense you did say "all the bits of the VA" :-).

Notes from the Intelpocalypse

Posted Jan 10, 2018 15:26 UTC (Wed) by anton (subscriber, #25547) [Link]

Normally branch predictors don't tag (and check) their entries at all, they just use a bunch of bits (possibly after mixing them in a non-cryptographic way) to index into the table and use whatever prediction they find there (no prediction is just as bad for performance as misprediction, so they don't bother checking). Having the ASID as tag would be enough to avoid getting the predictor primed by an attacking process (won't help against an attack from untrusted code within the same process (e.g., JavaScript code), though).

Other approaches for fixing the hardware without throwing out the baby with the bathwater could be to put any loaded cache lines in an enhanced version of the store buffer until speculation is resolved; and to (weakly) encrypt the address bits when accessing various shared hardware structures, combined with changing the secret frequently. I guess there are others, too.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds