The "Retbleed" speculative execution vulnerabilities
Kernel and hypervisor developers, in coordination with Intel and AMD, have developed mitigations. Mitigating Retbleed in the Linux kernel required a substantial effort, involving changes to 68 files, with 1,783 lines added and 387 removed. Our performance evaluation shows that mitigating Retbleed has unfortunately turned out to be expensive: we have measured between 14% and 39% overhead with the AMD and Intel patches, respectively.
Those mitigations were pulled into the mainline kernel today. They are not in the July 12 stable kernel updates, but will almost certainly show up in those channels soon.
Posted Jul 12, 2022 17:29 UTC (Tue)
by tome (subscriber, #3171)
[Link] (3 responses)
Ouch. It especially sucks to run Intel processors now.
Posted Jul 12, 2022 23:01 UTC (Tue)
by fulke (guest, #140430)
[Link]
Posted Jul 13, 2022 16:09 UTC (Wed)
by developer122 (guest, #152928)
[Link] (1 responses)
I remember that in 2018 (in the original round of patches), Skylake would fall back to the branch target buffer when the return stack buffer underflowed. Skylake was forced to use IBRS instead of retpolines for a while, until the return-stack stuffing sequence was perfected.
So, do any of the pre-Zen architectures speculate on returns the same way?
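(As an illustration of what such a stuffing sequence does, here is a deliberately simplified C sketch in the spirit of the kernel's __FILL_RETURN_BUFFER macro; the real macro in arch/x86/include/asm/nospec-branch.h loops and carries unwind annotations, and the function name and iteration count below are mine.)

    /* Simplified RSB-stuffing sketch; x86-64 with GCC/Clang, gcc -O2.
     * Each CALL pushes an entry into the return stack buffer pointing at
     * a speculation trap (PAUSE; LFENCE) that is only ever reached
     * speculatively, via a stale RSB entry; the architectural stack is
     * repaired at the end.  Note that the pushes clobber the red zone. */
    static void rsb_stuff(void)
    {
        __asm__ volatile(
            ".rept 16\n\t"
            "call 1f\n\t"          /* push one RSB entry + return address */
            "pause\n\t"            /* speculation trap */
            "lfence\n\t"
            "1:\n\t"
            ".endr\n\t"
            "add $128, %%rsp\n\t"  /* drop the 16 8-byte return addresses */
            ::: "memory");
    }

    int main(void)
    {
        rsb_stuff();
        return 0;
    }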
Posted Jul 14, 2022 18:39 UTC (Thu)
by peri (guest, #159703)
[Link]
Posted Jul 12, 2022 18:38 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link]
See you all in a few months or so, when we get to reset the counter again.
Posted Jul 12, 2022 18:39 UTC (Tue)
by NHO (subscriber, #104320)
[Link] (1 responses)
Posted Jul 12, 2022 18:46 UTC (Tue)
by JoeBuck (subscriber, #2330)
[Link]
Posted Jul 12, 2022 19:46 UTC (Tue)
by iabervon (subscriber, #722)
[Link] (2 responses)
Posted Jul 13, 2022 16:04 UTC (Wed)
by developer122 (guest, #152928)
[Link] (1 responses)
Posted Jul 13, 2022 18:02 UTC (Wed)
by iabervon (subscriber, #722)
[Link]
Posted Jul 12, 2022 20:16 UTC (Tue)
by flussence (guest, #85566)
[Link] (5 responses)
Posted Jul 12, 2022 23:23 UTC (Tue)
by nix (subscriber, #2304)
[Link] (4 responses)
Posted Jul 13, 2022 22:30 UTC (Wed)
by flussence (guest, #85566)
[Link] (3 responses)
Posted Jul 14, 2022 4:05 UTC (Thu)
by scientes (guest, #83068)
[Link] (2 responses)
Posted Jul 16, 2022 6:56 UTC (Sat)
by flussence (guest, #85566)
[Link] (1 responses)
Later Atoms are more or less Celerons; Intel changed focus to network appliances after the initial goal, "sandbag ARM out of the mainstream market", failed.
Posted Jul 19, 2022 17:02 UTC (Tue)
by anton (subscriber, #25547)
[Link]
Bonnell (first-generation Atom) is a two-wide in-order CPU, like the P5 (first Pentium). But Bonnell has 16-19 pipeline stages (according to wikichip), while P5 has 5. Silvermont (second-generation Atom) is a two-wide OoO CPU. Celeron is a marketing name used for many different microarchitectures; some Silvermont chips were sold as Celerons, but I guess you mean something else.
Posted Jul 12, 2022 21:13 UTC (Tue)
by yodermk (subscriber, #3803)
[Link] (2 responses)
If the former, there's going to be a lot of mitigation disabling. I can't really see paying that price on a single-user computer where, if someone else got access, I'd be screwed anyway.
Posted Jul 12, 2022 21:30 UTC (Tue)
by deepfire (guest, #26138)
[Link] (1 responses)
Are we being collectively marched down a death valley?
Posted Jul 13, 2022 6:46 UTC (Wed)
by davmac (guest, #114522)
[Link]
Posted Jul 12, 2022 22:14 UTC (Tue)
by ndesaulniers (subscriber, #110768)
[Link]
Posted Jul 12, 2022 22:24 UTC (Tue)
by Sesse (subscriber, #53779)
[Link] (4 responses)
It seems Zen 3 is unaffected, perhaps?
Posted Jul 12, 2022 22:44 UTC (Tue)
by Smon (guest, #104795)
[Link] (2 responses)
Posted Jul 13, 2022 0:05 UTC (Wed)
by JoeBuck (subscriber, #2330)
[Link] (1 responses)
Posted Jul 13, 2022 5:19 UTC (Wed)
by jukivili (subscriber, #60126)
[Link]
Posted Jul 13, 2022 6:24 UTC (Wed)
by Otus (subscriber, #67685)
[Link]
Posted Jul 12, 2022 22:29 UTC (Tue)
by jreiser (subscriber, #11027)
[Link]
Posted Jul 12, 2022 23:03 UTC (Tue)
by Subsentient (guest, #142918)
[Link] (2 responses)
If this one doesn't affect my ancient Intel CPUs, like the Core 2 Quad and the Nehalem i5 in my ThinkPad T410, that will be something.
I almost wish they hadn't found these vulnerabilities at all. The cost is proving to be quite high. Maybe ignorance is bliss here? I guess if they didn't disclose them, three-letter agencies would be using them instead.
Posted Jul 13, 2022 12:46 UTC (Wed)
by eru (subscriber, #2753)
[Link] (1 responses)
Posted Jul 19, 2022 17:10 UTC (Tue)
by anton (subscriber, #25547)
[Link]
The CPU architects know very well how to implement speculation properly: they do it for architectural state; if a speculative register change or memory write turns out to be wrong, it is never committed to permanent state.
For microarchitectural state (e.g., caches) they thought (and maybe still think) that they don't need to go to the lengths they go to for architectural state, and can, e.g., leave a speculatively loaded cache line in the L1 cache permanently, thus providing a side channel to attackers.
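(To make that side channel concrete, here is a minimal flush+reload-style timing sketch, my own construction rather than anything from the article: a flushed line reloads much more slowly than one already in L1, and that timing difference is the signal transient-execution attacks measure.)

    /* flush+reload timing sketch: x86-64, compile with gcc -O2 */
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>               /* _mm_clflush(), __rdtscp() */

    static uint8_t probe[4096];

    /* time one load; a short time means the line was already cached */
    static uint64_t time_load(volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    int main(void)
    {
        _mm_clflush(probe);                  /* evict the line */
        uint64_t cold = time_load(probe);    /* slow: fetched from memory */
        uint64_t warm = time_load(probe);    /* fast: now in L1 */
        printf("cold %llu cycles, warm %llu cycles\n",
               (unsigned long long)cold, (unsigned long long)warm);
        return 0;
    }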
Posted Jul 13, 2022 4:33 UTC (Wed)
by amarao (guest, #87073)
[Link] (9 responses)
Posted Jul 13, 2022 5:07 UTC (Wed)
by epa (subscriber, #39769)
[Link] (8 responses)
Posted Jul 13, 2022 9:24 UTC (Wed)
by amarao (guest, #87073)
[Link] (7 responses)
39% is an extremely high number to lose. In reverse, not having this 'fix' amounts to a 2.5× acceleration. That is the kind of number I got by switching from a 10-year-old Core 2 Duo to a shiny new Ryzen CPU, and it is really hurtful.
Posted Jul 13, 2022 13:00 UTC (Wed)
by birdie (guest, #114905)
[Link] (2 responses)
That doesn't mean it doesn't exist, but hackers don't seem to be interested: application- and system-level vulnerabilities are easier to detect and exploit.
Lastly, all the transient-execution CPU vulnerabilities require the ability to run arbitrary code on the remote system, which means you need a way to exploit it first.
All these vulnerabilities are primarily a headache for cloud providers, which share the same CPU/RAM among a large number of clients - those guys need to enable all the mitigations or risk clients sniffing each other's VM secrets and data.
Home/Enterprise users? They may as well run with `mitigations=off`.
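(For anyone taking that advice, the switch is a boot parameter; a hypothetical Debian-style setup would look like the following, after which the files under /sys/devices/system/cpu/vulnerabilities/ report what the kernel is, or is not, doing.)

    # in /etc/default/grub, then run update-grub and reboot
    GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"

    # check the result
    $ grep . /sys/devices/system/cpu/vulnerabilities/*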
Posted Jul 13, 2022 15:59 UTC (Wed)
by pclouds (guest, #76590)
[Link] (1 responses)
Isn't that door wide open (OK, not "wide") with JIT in the browser already? I'm aware there was some mitigation to prevent JavaScript from exploiting Spectre the last time, but would that still work with Retbleed?
Posted Jul 14, 2022 9:27 UTC (Thu)
by roc (subscriber, #30627)
[Link]
An uncompromised browser rendering process running hostile JS would find it very difficult, perhaps impossible, to exploit this Retbleed vulnerability. For example, a non-buggy JS engine is not going to try to branch into kernel space, so you won't be able to poison BTBs that way.
OTOH compromising a browser rendering process (e.g. by exploiting a bug in the JS engine) and then sniffing kernel memory from inside the browser renderer sandbox is a definite possibility.
Posted Jul 13, 2022 14:23 UTC (Wed)
by birdie (guest, #114905)
[Link] (1 responses)
And please don't BS me with "We are a small website, etc. etc. etc.":
1. You have a very small number of people commenting on your articles
2. I've not seen any SPAM attacks on you, ever
3. I single-handedly moderate a website where hundreds of people leave comments, and I've never enabled comment premoderation.
Posted Jul 13, 2022 14:24 UTC (Wed)
by corbet (editor, #1)
[Link]
In response to your complaints: anybody who reads LWN for any period of time knows that we are not inclined to over-moderate; indeed, we are frequently criticized for not moderating enough. I do not think that we are overstepping here.
Posted Jul 13, 2022 15:11 UTC (Wed)
by Paf (subscriber, #91811)
[Link] (1 responses)
I understand leaving the mitigations off, but this isn’t some crazy “we used mic power levels” thing.
Posted Jul 14, 2022 8:51 UTC (Thu)
by amarao (guest, #87073)
[Link]
The main difference between a demo and an exploit is reproducibility. An exploit shows that it can, indeed, do harm on a given (arbitrary or almost arbitrary) system. And that is what I (as operator/sysadmin/devops/SRE/you-name-it) care about. If some researcher can set up an amazing quantum-entanglement system which steals bytes with spooky action at a distance, that's cool, but to mitigate it I want to see it doing this on my (sacrificial) server with a usual workload.
If network interrupts are coming in and a few system processes are active, is it still viable, or does it require pristine conditions to run? If it's a threat to virtualization, what happens if we have a few guests with active network interfaces doing usual things (like ignoring ARP/BUM junk on the net)?
If their exploit can work through all this, it is a big story. If not, well, it's a nice paper for future research to reference, nothing more.
Posted Jul 13, 2022 16:35 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (5 responses)
- the fast ones to do fast stuff with trusted users (development, games, appliances, etc)
- the slow ones to do slow stuff with tons of untrusted users (including VMs, clouds etc)
And when the workload is critical, just don't share your resources.
You simply can't have it both fast and safe with untrusted users. Everything is observable. We're still sharing resources, and as soon as you suffer from someone else's activity, you can observe it in one way or another. Sometimes the observation is so fine-grained that you can see amazing details.
I'm wondering what the TOP500 list of HPC machines would look like if they enabled all those ridiculous mitigations! In addition, the mitigations are a pain for software engineers, who constantly scratch their heads trying to do better for certain use cases, only for a different usage pattern to turn into a total disaster; and the mitigations themselves are complicated.
Posted Jul 13, 2022 22:47 UTC (Wed)
by JoeBuck (subscriber, #2330)
[Link] (1 responses)
Posted Jul 14, 2022 9:29 UTC (Thu)
by roc (subscriber, #30627)
[Link]
It's also pretty easy to see which AWS instances give you an entire socket: it's the ones where AWS lets you use the PMU. They don't want you using the PMU to sniff other customers on the same socket via side channels. However, I don't think AWS *guarantees* that no one else is on the socket.
Posted Jul 14, 2022 15:19 UTC (Thu)
by dmoulding (subscriber, #95171)
[Link]
Posted Jul 14, 2022 18:39 UTC (Thu)
by peri (guest, #159703)
[Link]
Posted Jul 15, 2022 12:36 UTC (Fri)
by eduperez (guest, #11232)
[Link]
Posted Jul 18, 2022 15:51 UTC (Mon)
by paulj (subscriber, #341)
[Link] (8 responses)
The CPU design people who, in the 90s, argued for moving away from ever more complex, OOO, superscalar architectures might have had a point? They argued for CPU designs moving to simple, in-order cores, and fast switching between different (and parallel-running) execution contexts.
Is that basically what people have got - performance-wise - when they're running a recent Intel or AMD CPU with 16+ cores, with most of the speculative logic disabled? :)
Posted Jul 19, 2022 16:53 UTC (Tue)
by anton (subscriber, #25547)
[Link] (7 responses)
Who are those people that you mean? The only thing that comes to my mind is Niagara (UltraSPARC T1), but that came out in 2005, and maybe Bonnell (first-generation Atom), but that came out in 2008. The end of both stories is that they switched to OoO (SPARC T4 in 2011 and Intel Silvermont in 2013).
Unfortunately, we have not seen a hardware fix for Spectre yet. I think that a good fix would cost neither much silicon nor much performance. And no, disabling speculation is not a good fix.
I don't think anyone runs these CPUs "with most of the speculative logic disabled". But given that we don't see good hardware fixes, it seems to me that the hardware people think they can throw this stuff over the wall to the software people, who (they think) shall employ mitigations that include eliminating branch prediction (and thus speculation), e.g., retpolines.
Posted Jul 19, 2022 18:49 UTC (Tue)
by atnot (guest, #124910)
[Link] (6 responses)
It works the other way around too. Here goes: "Software people demand that hardware people make them faster and faster processors without changing the way it is programmed, insisting that the hardware people just go and apply more architectural optimizations. The whole reason all of this is done, why x86 is at this point a declarative language for describing how to distribute parallel compute tasks over hardware resources using an abstract control-flow graph and dataflow labels (aka "registers"), is to keep alive, for programmers, the fiction that computers haven't changed since the 80s.
Hardware people offered solutions long ago: Itanium had explicit speculation instructions. They were perfectly aware of the troubles the current direction would bring. But software people rejected it in favor of a 64-bit PDP-11 because it didn't look familiar enough to them and their PDP-11 languages, and then made fun of them. To the point that nobody has dared publish serious research on novel CPU architectures since around 2010."
This might not be completely fair, but it's not wrong either. Software developers are at least as culpable for the current situation as hardware vendors are. There are barriers, yes, but they need to be broken down in both directions.
Posted Jul 19, 2022 20:33 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
No. People rejected Itanic because it just Did Not Work. It's not possible to statically predict how the program will run, because a lot of timings are dependent on inputs. Even when caches are not in play, a good old divide instruction can take anywhere from 1 to 10 cycles to complete.
Posted Jul 19, 2022 20:52 UTC (Tue)
by deater (subscriber, #11746)
[Link] (3 responses)
You should look into the later Itaniums, such as Poulson, which had speculation and out-of-order execution in an attempt to catch up.
Also look into modern x86 systems where many of the divide instructions have a constant latency.
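(That claim is easy to probe. The following rough micro-benchmark, my own and not from the thread, times a dependent chain of 64-bit divisions with small versus large operands; on cores with data-dependent divide latency the two timings diverge, while on recent x86 cores they come out nearly identical.)

    /* divide-latency probe: x86-64, gcc -O2 divtime.c */
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>                 /* __rdtscp() */

    static volatile uint64_t sink;

    static uint64_t time_divs(uint64_t dividend, uint64_t divisor_in)
    {
        volatile uint64_t dv = divisor_in; /* defeat constant folding */
        uint64_t divisor = dv;
        unsigned aux;
        uint64_t x = dividend;
        uint64_t t0 = __rdtscp(&aux);
        for (int i = 0; i < 1000000; i++)
            x = (x / divisor) + dividend;  /* dependent chain through x */
        uint64_t t1 = __rdtscp(&aux);
        sink = x;                          /* keep the result live */
        return t1 - t0;
    }

    int main(void)
    {
        printf("small operands: %llu cycles\n",
               (unsigned long long)time_divs(7, 3));
        printf("large operands: %llu cycles\n",
               (unsigned long long)time_divs(UINT64_MAX / 3, 0x123456789ULL));
        return 0;
    }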
Posted Jul 19, 2022 22:55 UTC (Tue)
by atnot (guest, #124910)
[Link] (2 responses)
Absolutely, targeting C-like languages at VLIW is very difficult and requires advanced scheduling that was not really available at the time. This was a huge factor. It fared much better with GPUs, which were targeted with more easily parallelizable languages. Even those would move away eventually though, coincidentally around the time CUDA and GPGPU came about.
Itanium was definitely far from perfect. The initial implementation was terrible, and the decision to encode many implementation details of the first CPUs directly into the ISA was a mistake they quickly recognized. But so was x86; we've just gotten used to it. Certainly, today's 12-wide CPUs would have a much easier time emulating a mediocre 2000s explicitly parallel VLIW CPU than a mediocre 80s microprocessor. Even with its flaws, Itanium is still significantly closer to what a modern CPU actually looks like.
Posted Jul 20, 2022 10:24 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
The other issue with advanced scheduling is that an out-of-order execution design also benefits from a well-scheduled program. An out-of-order processor has a limited instruction window within which it can reschedule dynamically, and a well-scheduled program is set up so that all the rescheduling that can be done in that window is a consequence of the data the program is processing.
GPUs are a different case because they're designed for the world where single threaded performance is not particularly interesting - as long as all threads complete their work in a millisecond or so, we don't care how long each individual thread took. It's thus possible to avoid OoOE in favour of having more threads available to hardware, and better hardware for switching between threads when one thread gets blocked. In contrast, the whole point of CPUs in a modern system (with GPUs as well as CPUs) is to deal with the code where the time for one thread to complete its work sets the time for the whole operation.
I suspect that, for the subset of compute where the performance of a single thread is the most important factor, an out-of-order CPU is the best possible option. The wide-open question is whether we can design an ISA that allows us to avoid unwanted speculation completely; Itanium had that, because it was designed around making all the possible parallelism explicit, but Itanium wasn't a good ISA for out-of-order execution, and had low instruction density.
The other issue that Itanium's explicit speculation didn't account for is that we're starting to see uses of value prediction, not just memory access prediction; do we want to be explicit about all the possible speculative paths (e.g. "you can speculate that the value in r2 is less than the value in r3", or "you can speculate if you believe that r2 is between -16 and +96"), or do we instead want to find a good way to block speculation completely where it's potentially dangerous?
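(One existing answer to that last question, at least for bounds-check bypass, is the branch-free clamp behind the Linux kernel's array_index_nospec(). A condensed user-space rendering, simplified from include/linux/nospec.h and relying on GCC/Clang's arithmetic right shift on an LP64 target, might look like this.)

    #include <stdio.h>

    /* all-ones when index < size, zero otherwise, computed without a
     * branch: the bounds check becomes a data dependency, so a
     * mispredicted branch cannot speculatively index out of bounds */
    static inline unsigned long index_mask_nospec(unsigned long index,
                                                  unsigned long size)
    {
        return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * 8 - 1);
    }

    int main(void)
    {
        unsigned long table[8] = { 0 };
        unsigned long idx = 11;             /* attacker-controlled */
        idx &= index_mask_nospec(idx, 8);   /* clamps to 0 when out of range */
        printf("clamped index: %lu\n", idx);
        return (int)table[idx];
    }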
Posted Jul 20, 2022 18:56 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Targeting ANY language with VLIW is difficult. The fundamental issue is that scheduling depends on input data, and no language can change that.
> Even those would move away eventually though, coincidentally around the time CUDA and GPGPU came about.
Yup. It's just not efficient to use VLIW for anything, even when OOO is not needed.
Posted Jul 20, 2022 6:52 UTC (Wed)
by anton (subscriber, #25547)
[Link]
Concerning IA-64 (later renamed to the Itanium processor family): that's probably not what paulj meant, because Itanium is not simple, and switching between parallel-running execution contexts was not envisioned for it in the 1990s (although it was implemented around 2010 or so). Poulson is not OoO AFAIK.
As for implementing ("emulating") IA-64 with the techniques of today's OoO hardware (the widest of which is 8-wide AFAIK), I doubt that would be easier than implementing AMD64, AArch64, or RISC-V. I don't see anything in IA-64 that helps the implementation significantly, and one would have to implement all the special features, like the ALAT, that are better handled by the dynamic alias predictor in modern CPUs; likewise, one would have to implement the compiler speculation support (based on imprecise static branch prediction) and (for performance) still implement the much better dynamic branch prediction and hardware speculation.
Single-issue in-order (what you call a PDP-11) turns out to be a very good software-hardware interface (i.e., an architectural principle), even if a modern implementation is pretty far from the straightforward one.
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
+ retbleed= [X86] Control mitigation of RETBleed (Arbitrary
+ Speculative Code Execution with Return Instructions)
+ vulnerability.
+
+ off - no mitigation
+ auto - automatically select a mitigation
+ auto,nosmt - automatically select a mitigation,
+ disabling SMT if necessary for
+ the full mitigation (only on Zen1
+ and older without STIBP).
+ ibpb - mitigate short speculation windows on
+ basic block boundaries too. Safe, highest
+ perf impact.
+ unret - force enable untrained return thunks,
+ only effective on AMD f15h-f17h
+ based systems.
+ unret,nosmt - like unret, will disable SMT when STIBP
+ is not available.
+
+ Selecting 'auto' will choose a mitigation method at run
+ time according to the CPU.
+
+ Not specifying this option is equivalent to retbleed=auto.
+
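(Once a mitigation has been selected, or refused, the kernel reports the outcome through sysfs; the exact string varies by CPU and kernel version, but on an affected AMD machine it looks something like the following.)

    $ cat /sys/devices/system/cpu/vulnerabilities/retbleed
    Mitigation: untrained return thunk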
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
performance
performance
*: substitute the average user here.
performance
The "Retbleed" speculative execution vulnerabilities
Spoiler: Capt Mifune doesn't survive that scene.
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
"Retbleed" even with REP prefix?
Sometimes I see the sequence REP; RET advocated as a technique to speed execution on x86 because REP stops instruction prefetch. Does this sequence have any interaction with Retbleed?
Retbleed does, apparently, hit one of my personal AMD boxes with a Ryzen 3 2200G. Fuck.
Also, could these workarounds be a fool's errand, because the vulnerabilities drop out of how speculative execution works by design, so they will always be present in some form if the CPU speculates?
No, these vulnerabilities come from treating microarchitectural state as being irrelevant.
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
> Software people demand that hardware people make them faster and faster processors without changing the way it is programmed
Not sure about "demand", but that's the way it works in those areas affected by the software crisis (with the classic criterion being that software costs more than hardware), i.e., pretty much everything but supercomputers and tiny embedded controllers. It would be just too costly to rewrite all software for something like the TILE or (more extreme) Greenarrays hardware, especially in a way that performs at least as well as on mainstream hardware plus Spectre fixes.
