Avoiding retpolines with static calls
Posted Mar 29, 2020 18:47 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)
In reply to: Avoiding retpolines with static calls by NYKevin
Parent article: Avoiding retpolines with static calls
Posted Mar 29, 2020 20:56 UTC (Sun)
by anton (subscriber, #25547)
[Link] (6 responses)
Yes. E.g., do not update the caches from speculative loads. Instead, keep the load results in a private buffer and only update the caches if the loads become non-speculative, or when they are committed.

Another avenue is to make it hard to train the predictor for a Spectre v2 exploit, e.g., by scrambling the input and/or the output of the predictor with a per-thread secret that changes now and then. You can combine this avenue with the other one.
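To make the scrambling idea concrete, here is a minimal software model of a branch-target buffer whose lookup index and stored target are both XORed with a per-thread secret. All structure names, sizes, and functions are invented for illustration; no real predictor is being described.

```c
/*
 * Toy model of a scrambled branch-target buffer (BTB). The index and the
 * stored target are XORed with per-thread secrets, so an attacker cannot
 * tell which entry a given branch maps to, nor plant a chosen target.
 * Purely illustrative; not taken from any real CPU design.
 */
#include <stdint.h>

#define BTB_ENTRIES 4096              /* must be a power of two */

struct btb {
    uint64_t target[BTB_ENTRIES];
    uint64_t index_key;               /* per-thread secret, rotated now and then */
    uint64_t target_key;
};

static inline unsigned btb_index(const struct btb *b, uint64_t branch_pc)
{
    /* Scramble the lookup index with the per-thread secret. */
    return (unsigned)((branch_pc ^ b->index_key) & (BTB_ENTRIES - 1));
}

void btb_update(struct btb *b, uint64_t branch_pc, uint64_t target_pc)
{
    /* Store the target scrambled; it only decodes correctly with the same key. */
    b->target[btb_index(b, branch_pc)] = target_pc ^ b->target_key;
}

uint64_t btb_predict(const struct btb *b, uint64_t branch_pc)
{
    return b->target[btb_index(b, branch_pc)] ^ b->target_key;
}

void btb_rekey(struct btb *b, uint64_t new_index_key, uint64_t new_target_key)
{
    /* Changing the keys garbles old entries, so it should happen rarely
     * enough that prediction accuracy recovers, but often enough that an
     * attacker cannot extract much of the secret. */
    b->index_key = new_index_key;
    b->target_key = new_target_key;
}
```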
Posted Mar 29, 2020 21:01 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link]

It's not clear what attack scenario you have in mind. Can you give an example?
Posted Mar 30, 2020 1:52 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
The problem is that performance is a form of observable behavior. So you have to completely disentangle performance variations from every piece of data in main memory, cache, and registers, or else you have to exhaustively prove that the code under execution has access to that data, in every security model that the system cares about. The latter would extend to, for example, determining that the code under execution lives inside of a Javascript sandbox and should not be allowed to access the state of the rest of the web browser. I don't see how the processor can be reasonably expected to do that, so we're back to the first option: The processor's performance must not vary in response to any data at all. But then you can't have a branch predictor.
I would really like someone to convince me that I'm wrong, because if true, this is rather terrifying. But as far as I can see, there is no catch-all mitigation for this problem.
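For readers wondering what "performance is a form of observable behavior" means in practice, the sketch below (x86-specific, using GCC/Clang intrinsics) times the same load once while the line is cached and once after flushing it; the large difference is exactly the signal Spectre-style attacks use to exfiltrate data. It is a rough illustration only; precise cycle counts and the rdtscp/clflush behaviour vary by CPU.

```c
/* Illustrative cache-timing measurement: a load's latency reveals whether
 * the line was already cached. Build with GCC/Clang on x86. */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_clflush, _mm_lfence */

static uint64_t time_load(volatile uint8_t *p)
{
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    (void)*p;                        /* the access being timed */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

int main(void)
{
    static uint8_t probe[4096];

    probe[0] = 1;                    /* touch it: now (probably) cached */
    uint64_t hot = time_load(&probe[0]);

    _mm_clflush((void *)&probe[0]);  /* evict the line from all cache levels */
    _mm_lfence();
    uint64_t cold = time_load(&probe[0]);

    /* On typical hardware, "cold" is roughly an order of magnitude larger. */
    printf("cached: %llu cycles, uncached: %llu cycles\n",
           (unsigned long long)hot, (unsigned long long)cold);
    return 0;
}
```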
Posted Mar 30, 2020 2:32 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)

Pretty much all major cloud computing providers assign CPUs exclusively to customers, except maybe for the cheapest offerings (like T2/T3 instances on AWS).
Posted Apr 2, 2020 8:35 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (1 response)
I'd be surprised if the bulk of instances were package isolated. The Xeon Platinum 8175M used for M5 instances has 24 cores per package, 48 threads, and therefore 48 vCPUs. AFAIU, EC2 doesn't share cores, but a vCPU is still the equivalent of a logical SMT-based core, so any instance type using less than 47 vCPUs would be leaving at least an entire core unused. AWS offers 2-, 8-, 16-, 32-, 48-, 64-, and 96-vCPU M5 instances. I'd bet a large number and probably a majority of customers are utilizing 2- to 32-vCPU instances, that they invariably share packages, and thus share L3 cache. And I'd also bet that 64-vCPU instances share one of their packages with other instances.
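The arithmetic behind this argument is easy to check. The sketch below assumes, as above, 48 threads (vCPUs) per package and the listed M5 sizes, and simply reports how much of a package each size occupies; anything that does not fill whole packages shares L3 with other tenants.

```c
/* Back-of-the-envelope check: one vCPU = one SMT thread, and a
 * Xeon Platinum 8175M package has 24 cores / 48 threads. */
#include <stdio.h>

int main(void)
{
    const int threads_per_package = 48;       /* 24 cores x 2 SMT threads */
    const int m5_vcpus[] = { 2, 8, 16, 32, 48, 64, 96 };
    const int n = sizeof(m5_vcpus) / sizeof(m5_vcpus[0]);

    for (int i = 0; i < n; i++) {
        int vcpus = m5_vcpus[i];
        int full_packages = vcpus / threads_per_package;
        int leftover = vcpus % threads_per_package;   /* threads on a shared package */

        printf("%2d vCPUs: %d full package(s), %2d threads on a partial package%s\n",
               vcpus, full_packages, leftover,
               leftover ? "  -> shares L3 with other instances" : "");
    }
    return 0;
}
```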
Posted Apr 2, 2020 16:37 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
> I'd keep in mind for high-value assets absent details from AWS.

All major cloud providers also have dedicated instances that won't be shared across customers. In case of AWS you can even run your own T2/T3 instances on top of the dedicated hardware nodes.
Posted Mar 30, 2020 9:50 UTC (Mon)
by anton (subscriber, #25547)
[Link]
If you refer to the scrambling of the branch predictor inputs and outputs, note that the secret changes regularly (with an interval designed such that only a small part of the secret can be extracted).
Concerning attacks from within the same thread (as in JavaScript engines), assuming good-enough scrambling, the attacker cannot predict which other branches a branch is aliased with in the branch predictor, and also cannot get the prediction to branch to the gadget the attacker wants to be speculatively executed.
Concerning observable behaviour, lots of things are theoretically observable (e.g., through power consumption or electromagnetic emissions), and have been used practically on other components (e.g., emissions from CRTs or screen cables to watch what you are watching), but the danger is that you get into a "the sky is falling" mode of thinking that does not differentiate how realistically possible an attack is. I have seen hardware and software providers blamed for waiting for the demonstration of an attack before they start acting, but OTOH, if they acted on any far-fetched theoretical weakness (not even attack scenario) that somebody worries about, they would not have a product.
Posted Apr 2, 2020 9:32 UTC (Thu)
by ras (subscriber, #33059)
[Link] (1 response)
The question is how much a proper hardware mitigation costs compared to the current mitigations. So what are the costs of not updating caches on speculative memory accesses?

The more general problem is privilege separation. This is just a case of lower-privilege code being able to deduce what higher-privilege code left in its cache. I wonder how long it will be before someone figures out how to exploit Spectre via a JavaScript JIT, or wasm. The only solution is hardware protecting the browser memory from the JavaScript, but the hardware doesn't provide anything lightweight enough. If we did have something lightweight enough, microkernels would be more viable.
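A JIT does not have to emit anything exotic to be at risk: the well-known Spectre v1 ("bounds check bypass") pattern below is the kind of code a sandboxed runtime might generate for an ordinary bounds-checked array access. The array names and sizes are made up for illustration.

```c
/* Classic Spectre v1 gadget sketch. With the branch predictor trained to
 * take the bounds check, an out-of-range x is used speculatively, and the
 * secret-dependent load into probe[] leaves a cache footprint that can be
 * recovered afterwards by timing accesses to probe[]. */
#include <stdint.h>
#include <stddef.h>

uint8_t array[16];
size_t  array_len = 16;
uint8_t probe[256 * 4096];      /* one cache line per possible byte value */
volatile uint8_t sink;          /* keeps the probe load from being optimized away */

void victim(size_t x)
{
    if (x < array_len) {              /* predicted taken even when x is out of range */
        uint8_t secret = array[x];    /* speculative out-of-bounds read */
        sink = probe[secret * 4096];  /* secret-dependent load encodes the byte in the cache */
    }
}
```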
Posted Apr 2, 2020 12:04 UTC (Thu)
by anton (subscriber, #25547)
[Link]
If the access becomes non-speculative (the vast majority of cases in most code), the caches will be updated a number of cycles later. The performance impact of that is minimal.
If the access is on a mispredicted path, the caches will not be updated. If the line is then not accessed before it would be evicted, this is a win. If the line is accessed, we incur the cost of another access to an outer level or main memory. A somewhat common case is when you have a short if, and then a load after it has ended; if the if is mispredicted, and the load misses the cache, the speculative execution will first perform an outer access, and then it will be reexecuted on the corrected path and will perform the outer access again; the cost here is the smaller of the cache miss cost and the branch misprediction cost. And the combination is relatively rare.
Overall, not updating caches on speculative memory accesses reduces the usefulness of caches by very little, if at all (there are also cases where it increases the usefulness of caches).
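As a minimal sketch of what "keep the load results in a private buffer and only update the caches at commit" could look like: the toy model below parks speculative loads in a small fill buffer, installs the line in the cache only when the load commits, and drops it on a squash, leaving no cache footprint. All structures and function names are invented; this is a scheduling illustration, not real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINES 256      /* toy direct-mapped cache, keyed by line address */
#define FILL_SLOTS  8        /* private buffer for in-flight speculative loads */

static uint64_t cache_tag[CACHE_LINES];
static bool     cache_valid[CACHE_LINES];

static uint64_t fill_addr[FILL_SLOTS];
static bool     fill_valid[FILL_SLOTS];

static bool cache_hit(uint64_t line)
{
    unsigned idx = (unsigned)(line % CACHE_LINES);
    return cache_valid[idx] && cache_tag[idx] == line;
}

static void cache_install(uint64_t line)
{
    unsigned idx = (unsigned)(line % CACHE_LINES);
    cache_tag[idx] = line;
    cache_valid[idx] = true;
}

/* Speculative load: data may be fetched from the outer level into the
 * private fill buffer, but the cache itself is left untouched. */
void speculative_load(int slot, uint64_t line)
{
    /* (on a cache miss, the outer-level fetch would happen here) */
    fill_addr[slot] = line;
    fill_valid[slot] = true;
}

/* Load becomes non-speculative: only now does the cache learn about it. */
void commit_load(int slot)
{
    if (fill_valid[slot] && !cache_hit(fill_addr[slot]))
        cache_install(fill_addr[slot]);
    fill_valid[slot] = false;
}

/* Load was on a mispredicted path: drop it, leaving no cache footprint.
 * A later correct-path access to the same line pays the miss again, which
 * is the "smaller of miss cost and misprediction cost" case above. */
void squash_load(int slot)
{
    fill_valid[slot] = false;
}
```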
A Spectre-based exploit using JavaScript has been published pretty early. No, MMU-style memory protection is not the only solution, Spectre-safe hardware as outlined above would also be a solution.