Avoiding retpolines with static calls
Posted Mar 29, 2020 18:47 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)
In reply to: Avoiding retpolines with static calls by NYKevin
Parent article: Avoiding retpolines with static calls
Posted Mar 29, 2020 20:56 UTC (Sun)
by anton (subscriber, #25547)
[Link] (6 responses)
Yes. E.g., do not update the caches from speculative loads. Instead, keep the load results in a private buffer and only update the caches if the loads become non-speculative, or when they are committed.

Another avenue is to make it hard to train the predictor for a Spectre v2 exploit, e.g., by scrambling the input and/or the output of the predictor with a per-thread secret that changes now and then. You can combine this avenue with the other one.
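To make the scrambling idea concrete, here is a minimal software model of a branch-target buffer whose lookup index and stored target are both XORed with a per-thread secret. All structure names, sizes, and functions are invented for illustration; no real predictor is being described.

```c
/*
 * Toy model of a scrambled branch-target buffer (BTB). The index and the
 * stored target are XORed with per-thread secrets, so an attacker cannot
 * tell which entry a given branch maps to, nor plant a chosen target.
 * Purely illustrative; not taken from any real CPU design.
 */
#include <stdint.h>

#define BTB_ENTRIES 4096              /* must be a power of two */

struct btb {
    uint64_t target[BTB_ENTRIES];
    uint64_t index_key;               /* per-thread secret, rotated now and then */
    uint64_t target_key;
};

static inline unsigned btb_index(const struct btb *b, uint64_t branch_pc)
{
    /* Scramble the lookup index with the per-thread secret. */
    return (unsigned)((branch_pc ^ b->index_key) & (BTB_ENTRIES - 1));
}

void btb_update(struct btb *b, uint64_t branch_pc, uint64_t target_pc)
{
    /* Store the target scrambled; it only decodes correctly with the same key. */
    b->target[btb_index(b, branch_pc)] = target_pc ^ b->target_key;
}

uint64_t btb_predict(const struct btb *b, uint64_t branch_pc)
{
    return b->target[btb_index(b, branch_pc)] ^ b->target_key;
}

void btb_rekey(struct btb *b, uint64_t new_index_key, uint64_t new_target_key)
{
    /* Changing the keys garbles old entries, so it should happen rarely
     * enough that prediction accuracy recovers, but often enough that an
     * attacker cannot extract much of the secret. */
    b->index_key = new_index_key;
    b->target_key = new_target_key;
}
```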
Posted Mar 29, 2020 21:01 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link]

It's not clear what attack scenario you have in mind. Can you give an example?
Posted Mar 30, 2020 1:52 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
The problem is that performance is a form of observable behavior. So you have to completely disentangle performance variations from every piece of data in main memory, cache, and registers, or else you have to exhaustively prove that the code under execution has access to that data, in every security model that the system cares about. The latter would extend to, for example, determining that the code under execution lives inside of a Javascript sandbox and should not be allowed to access the state of the rest of the web browser. I don't see how the processor can be reasonably expected to do that, so we're back to the first option: The processor's performance must not vary in response to any data at all. But then you can't have a branch predictor.
I would really like someone to convince me that I'm wrong, because if true, this is rather terrifying. But as far as I can see, there is no catch-all mitigation for this problem.
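For readers wondering what "performance is a form of observable behavior" means in practice, the sketch below (x86-specific, using GCC/Clang intrinsics) times the same load once while the line is cached and once after flushing it; the large difference is exactly the signal Spectre-style attacks use to exfiltrate data. It is a rough illustration only; precise cycle counts and the rdtscp/clflush behaviour vary by CPU.

```c
/* Illustrative cache-timing measurement: a load's latency reveals whether
 * the line was already cached. Build with GCC/Clang on x86. */
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_clflush, _mm_lfence */

static uint64_t time_load(volatile uint8_t *p)
{
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    (void)*p;                        /* the access being timed */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

int main(void)
{
    static uint8_t probe[4096];

    probe[0] = 1;                    /* touch it: now (probably) cached */
    uint64_t hot = time_load(&probe[0]);

    _mm_clflush((void *)&probe[0]);  /* evict the line from all cache levels */
    _mm_lfence();
    uint64_t cold = time_load(&probe[0]);

    /* On typical hardware, "cold" is roughly an order of magnitude larger. */
    printf("cached: %llu cycles, uncached: %llu cycles\n",
           (unsigned long long)hot, (unsigned long long)cold);
    return 0;
}
```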
Posted Mar 30, 2020 2:32 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)

Pretty much all major cloud computing providers assign CPUs exclusively to customers, except maybe for the cheapest offerings (like T2/T3 instances on AWS).
Posted Apr 2, 2020 8:35 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (1 response)
I'd be surprised if the bulk of instances were package isolated. The Xeon Platinum 8175M used for M5 instances has 24 cores per package, 48 threads, and therefore 48 vCPUs. AFAIU, EC2 doesn't share cores, but a vCPU is still the equivalent of a logical SMT-based core, so any instance type using less than 47 vCPUs would be leaving at least an entire core unused. AWS offers 2-, 8-, 16-, 32-, 48-, 64-, and 96-vCPU M5 instances. I'd bet a large number and probably a majority of customers are utilizing 2- to 32-vCPU instances, that they invariably share packages, and thus share L3 cache. And I'd also bet that 64-vCPU instances share one of their packages with other instances.
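The arithmetic behind this argument is easy to check. The sketch below assumes, as above, 48 threads (vCPUs) per package and the listed M5 sizes, and simply reports how much of a package each size occupies; anything that does not fill whole packages shares L3 with other tenants.

```c
/* Back-of-the-envelope check: one vCPU = one SMT thread, and a
 * Xeon Platinum 8175M package has 24 cores / 48 threads. */
#include <stdio.h>

int main(void)
{
    const int threads_per_package = 48;       /* 24 cores x 2 SMT threads */
    const int m5_vcpus[] = { 2, 8, 16, 32, 48, 64, 96 };
    const int n = sizeof(m5_vcpus) / sizeof(m5_vcpus[0]);

    for (int i = 0; i < n; i++) {
        int vcpus = m5_vcpus[i];
        int full_packages = vcpus / threads_per_package;
        int leftover = vcpus % threads_per_package;   /* threads on a shared package */

        printf("%2d vCPUs: %d full package(s), %2d threads on a partial package%s\n",
               vcpus, full_packages, leftover,
               leftover ? "  -> shares L3 with other instances" : "");
    }
    return 0;
}
```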
Posted Apr 2, 2020 16:37 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
> I'd keep in mind for high-value assets absent details from AWS.

All major cloud providers also have dedicated instances that won't be shared across customers. In case of AWS you can even run your own T2/T3 instances on top of the dedicated hardware nodes.
Posted Mar 30, 2020 9:50 UTC (Mon)
by anton (subscriber, #25547)
[Link]
If you refer to the scrambling of the branch predictor inputs and outputs, note that the secret changes regularly (with an interval designed such that only a small part of the secret can be extracted).
Concerning attacks from within the same thread (as in JavaScript engines), assuming good-enough scrambling, the attacker cannot predict which other branches a branch is aliased with in the branch predictor, and also cannot get the prediction to branch to the gadget the attacker wants to be speculatively executed.
Concerning observable behaviour, lots of things are theoretically observable (e.g., through power consumption or electromagnetic emissions), and have been used practically on other components (e.g., emissions from CRTs or screen cables to watch what you are watching), but the danger is that you get into a "the sky is falling" mode of thinking that does not differentiate how realistically possible an attack is. I have seen hardware and software providers blamed for waiting for the demonstration of an attack before they start acting, but OTOH, if they acted on any far-fetched theoretical weakness (not even attack scenario) that somebody worries about, they would not have a product.
Posted Apr 2, 2020 9:32 UTC (Thu)
by ras (subscriber, #33059)
[Link] (1 response)
The question is how much a proper hardware mitigation costs compared to the current mitigations. So what are the costs of not updating caches on speculative memory accesses?

The more general problem is privilege separation. This is just a case of lower-privilege code being able to deduce what higher-privilege code left in its cache. I wonder how long it will be before someone figures out how to exploit Spectre via a JavaScript JIT, or wasm. The only solution is hardware protecting the browser memory from the JavaScript, but the hardware doesn't provide anything lightweight enough. If we did have something lightweight enough, microkernels would be more viable.
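A JIT does not have to emit anything exotic to be at risk: the well-known Spectre v1 ("bounds check bypass") pattern below is the kind of code a sandboxed runtime might generate for an ordinary bounds-checked array access. The array names and sizes are made up for illustration.

```c
/* Classic Spectre v1 gadget sketch. With the branch predictor trained to
 * take the bounds check, an out-of-range x is used speculatively, and the
 * secret-dependent load into probe[] leaves a cache footprint that can be
 * recovered afterwards by timing accesses to probe[]. */
#include <stdint.h>
#include <stddef.h>

uint8_t array[16];
size_t  array_len = 16;
uint8_t probe[256 * 4096];      /* one cache line per possible byte value */
volatile uint8_t sink;          /* keeps the probe load from being optimized away */

void victim(size_t x)
{
    if (x < array_len) {              /* predicted taken even when x is out of range */
        uint8_t secret = array[x];    /* speculative out-of-bounds read */
        sink = probe[secret * 4096];  /* secret-dependent load encodes the byte in the cache */
    }
}
```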
Posted Apr 2, 2020 12:04 UTC (Thu)
by anton (subscriber, #25547)
[Link]
If the access becomes non-speculative (the vast majority of cases in most code), the caches will be updated a number of cycles later. The performance impact of that is minimal.
If the access is on a mispredicted path, the caches will not be updated. If the line is then not accessed before it would be evicted, this is a win. If the line is accessed, we incur the cost of another access to an outer level or main memory. A somewhat common case is when you have a short if, and then a load after it has ended; if the if is mispredicted, and the load misses the cache, the speculative execution will first perform an outer access, and then it will be reexecuted on the corrected path and will perform the outer access again; the cost here is the smaller of the cache miss cost and the branch misprediction cost. And the combination is relatively rare.
Overall, not updating caches on speculative memory accesses reduces the usefulness of caches by very little, if at all (there are also cases where it increases the usefulness of caches).
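As a minimal sketch of what "keep the load results in a private buffer and only update the caches at commit" could look like: the toy model below parks speculative loads in a small fill buffer, installs the line in the cache only when the load commits, and drops it on a squash, leaving no cache footprint. All structures and function names are invented; this is a scheduling illustration, not real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINES 256      /* toy direct-mapped cache, keyed by line address */
#define FILL_SLOTS  8        /* private buffer for in-flight speculative loads */

static uint64_t cache_tag[CACHE_LINES];
static bool     cache_valid[CACHE_LINES];

static uint64_t fill_addr[FILL_SLOTS];
static bool     fill_valid[FILL_SLOTS];

static bool cache_hit(uint64_t line)
{
    unsigned idx = (unsigned)(line % CACHE_LINES);
    return cache_valid[idx] && cache_tag[idx] == line;
}

static void cache_install(uint64_t line)
{
    unsigned idx = (unsigned)(line % CACHE_LINES);
    cache_tag[idx] = line;
    cache_valid[idx] = true;
}

/* Speculative load: data may be fetched from the outer level into the
 * private fill buffer, but the cache itself is left untouched. */
void speculative_load(int slot, uint64_t line)
{
    /* (on a cache miss, the outer-level fetch would happen here) */
    fill_addr[slot] = line;
    fill_valid[slot] = true;
}

/* Load becomes non-speculative: only now does the cache learn about it. */
void commit_load(int slot)
{
    if (fill_valid[slot] && !cache_hit(fill_addr[slot]))
        cache_install(fill_addr[slot]);
    fill_valid[slot] = false;
}

/* Load was on a mispredicted path: drop it, leaving no cache footprint.
 * A later correct-path access to the same line pays the miss again, which
 * is the "smaller of miss cost and misprediction cost" case above. */
void squash_load(int slot)
{
    fill_valid[slot] = false;
}
```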
A Spectre-based exploit using JavaScript has been published pretty early. No, MMU-style memory protection is not the only solution, Spectre-safe hardware as outlined above would also be a solution.