Better CPU vulnerability mitigation configuration
Modern CPUs all have multiple hardware vulnerabilities that the kernel needs to mitigate; the 6.13 kernel has workarounds for 14 security-sensitive CPU bugs just on x86_64. Several of those have multiple variants, or multiple mitigations that apply on different microarchitectures. There are different kernel command-line options for each of these mitigations, which leads to a confusing situation for users trying to figure out how to configure their systems. David Kaplan recently posted a patch set that adds a single, unified command-line option for controlling mitigations and simplifies the logic for detecting, configuring, and applying them as well. If it is merged, the patch set could make it much easier for users to navigate the complicated web of CPU vulnerabilities and their mitigations.
The kernel provides information about the vulnerabilities it's aware of via sysfs. The /sys/devices/system/cpu/vulnerabilities/ directory contains a file for each known class of attack, with information about whether the system is vulnerable, and which mitigations have been enabled. Some of these are easy to interpret, and merely state whether a mitigation is turned on or off. Others are somewhat cryptic. For example, my x86_64 machine has the following to say about Spectre v2:
    Mitigation: Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S
This line means that Enhanced Indirect Branch Restricted Speculation (Enhanced IBRS), Indirect Branch Predictor Barrier (IBPB), Return Stack Buffer (RSB) filling, Post-barrier Return Stack Buffer Enhanced IBRS (PBRSB-eIBRS), and Branch History Injection (BHI) are all enabled by the kernel. Enhanced IBRS is a technique that prevents the CPU from speculating through indirect branches. IBPB introduces a barrier to prevent code from before the barrier from influencing branch predictions after the barrier. RSB filling makes sure that any return addresses the CPU might speculate about when encountering a return instruction are overwritten on context switch. PBRSB-eIBRS extends that protection to cover exits from virtual machines on some CPUs. Finally, BHI protects against code that manipulates the branch predictor in order to speculate through a branch. Theoretically, all of those mitigations combined should render my computer mostly immune to Spectre-based attacks.
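The other files in the vulnerabilities directory can be inspected in the same way. The short C program below is a minimal sketch (not a kernel-provided tool) that prints each vulnerability alongside its status line; the files are world-readable, so no special privileges are needed:

    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>

    #define VULN_DIR "/sys/devices/system/cpu/vulnerabilities"

    int main(void)
    {
        DIR *dir = opendir(VULN_DIR);
        struct dirent *entry;

        if (!dir) {
            perror(VULN_DIR);
            return 1;
        }
        while ((entry = readdir(dir)) != NULL) {
            char path[PATH_MAX];
            char line[256];
            FILE *f;

            if (entry->d_name[0] == '.')    /* skip "." and ".." */
                continue;
            snprintf(path, sizeof(path), VULN_DIR "/%s", entry->d_name);
            f = fopen(path, "r");
            if (!f)
                continue;
            if (fgets(line, sizeof(line), f))
                printf("%-28s %s", entry->d_name, line);    /* line keeps its newline */
            fclose(f);
        }
        closedir(dir);
        return 0;
    }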
Each of those mitigations is controlled with one of the spectre_v2=, spectre_v2_user=, and spectre_bhi= kernel command-line options. The other mitigations have their own idiosyncratically named options. The kernel's documentation on hardware vulnerabilities describes the various options, but determining the best configuration for a given system requires understanding how the different vulnerabilities can be abused. In practice, most users probably leave the settings at their defaults, which is good for security but comes with a noticeable performance penalty.
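For example, a command line that tunes just the Spectre v2 family using today's per-mitigation options might read as follows (the values shown are a few of the documented choices, not a recommendation):

    spectre_v2=retpoline spectre_v2_user=prctl spectre_bhi=off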
Kaplan's patch restructures the kernel's mitigation-configuration logic to depend on a single command-line option, mitigations=. That option already exists, but can currently only be set to "off", "auto", or "auto,nosmt". Kaplan's patch extends it so that the user can append a semicolon and a comma-separated list of specific attack vectors not to bother mitigating:
- no_user_kernel — don't mitigate attacks from malicious user-space programs attempting to leak kernel data.
- no_user_user — don't mitigate attacks from one user-space program on another.
- no_guest_host — don't mitigate attacks from a malicious virtual machine trying to exfiltrate data from its hypervisor.
- no_guest_guest — don't mitigate attacks from one virtual machine on another.
- no_cross_thread — don't mitigate attacks from a program or virtual machine running on one simultaneous multithreading (SMT) thread against code running on its sibling thread on the same core.
The default is to mitigate all of these attack vectors, so there's no separate setting for "on". The exception is cross-thread attacks, which can generally only be completely prevented by disabling simultaneous multithreading. That's usually not desirable because of the large performance impact, so users need to specify "auto,nosmt" instead of just "auto" in order to get that behavior. Generally, leaving all of the mitigations enabled is a safe choice. But for systems that won't run virtual machines, for example, adding mitigations=auto;no_guest_host,no_guest_guest to the kernel command line could win back some lost performance. In Kaplan's cover letter, he justifies switching to this set of command-line options:
While many users may not be intimately familiar with the details of these CPU vulnerabilities, they are likely better able to understand the intended usage of their system. As a result, unneeded mitigations may be disabled, allowing users to recoup more performance.
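To make the new interface concrete, here are a few example command lines built only from the pieces described above (illustrative, not exhaustive):

    mitigations=auto
        (the default: mitigate every attack vector, but leave SMT enabled)
    mitigations=auto,nosmt
        (additionally disable SMT to close off cross-thread attacks)
    mitigations=auto;no_guest_host,no_guest_guest
        (skip the virtualization-related mitigations on a machine that runs no guests)
    mitigations=off
        (disable all mitigations)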
When the user specifies one of the new attack-vector options, the kernel uses its knowledge about which bugs exist in the computer's CPU to turn on the best mitigations for that case. For now, they're only effective on x86 (and x86_64), pending input from the maintainers for other architectures about which mitigations should be associated with each option. The existing mitigation-specific command-line options will continue to function as normal, although they will probably see a good deal less use going forward.
This particular design is the result of an extended discussion on the previous version of Kaplan's patch set. That version had five different new command-line options, one for each potential attack vector. Reviewers found that design unnecessarily complicated, though, and eventually agreed on the simpler interface described above. Pawan Gupta, Josh Poimboeuf, and Kaplan all suggested various ideas, but Borislav Petkov made the suggestion that ended up being closest to what Kaplan adopted in the new version of the patch set.
The reviewers were not completely convinced by Kaplan's taxonomy of potential attacks, although they seemed to generally agree that it was better than the confusing status quo. Poimboeuf thought that the cross-thread group, in particular, didn't make much sense to separate from the others. In a cross-thread attack, the code running on the attacking SMT sibling (also called a hardware thread) must be either a user-space program or a virtual machine; therefore, Poimboeuf argued, any cross-thread attack should already fall under one of the other four categories that Kaplan set out.
Kaplan didn't disagree, although he had noted earlier in the discussion that disabling SMT has to be treated specially. Still, even with the complication around SMT, Kaplan's command-line options are a good deal less complicated than the existing set.
The internal logic for selecting mitigations — which also received substantial attention from reviewers, although only in the form of code style comments — has also been cleaned up. Previously, the code was somewhat labyrinthine. With Kaplan's changes, the functionality should be the same, but the implementation has been split up into a "select", "update", and "apply" function for each mitigation the kernel knows about. The select function determines whether the mitigation ought to be enabled, the update function allows the specific configuration for the mitigation to be changed depending on which other mitigations have been selected, and the apply function does the actual job of applying the mitigation to the kernel.
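The resulting structure looks roughly like the sketch below. The vulnerability name, the enum, the X86_BUG_EXAMPLE and X86_FEATURE_EXAMPLE_MITIGATION constants, and example_covered_by_other_mitigation() are placeholders of my own, not code from the patch set; boot_cpu_has_bug(), cpu_mitigations_off(), and setup_force_cpu_cap() are existing kernel helpers:

    /* Illustrative only: "example" stands in for a real vulnerability. */
    enum example_mitigation_state { EXAMPLE_MITIGATION_NONE, EXAMPLE_MITIGATION_FULL };
    static enum example_mitigation_state example_mitigation = EXAMPLE_MITIGATION_NONE;

    /* Select: decide from the CPU's bug bits and the command line whether
     * this mitigation should be enabled at all. */
    static void example_select_mitigation(void)
    {
        if (!boot_cpu_has_bug(X86_BUG_EXAMPLE) || cpu_mitigations_off())
            return;
        example_mitigation = EXAMPLE_MITIGATION_FULL;
    }

    /* Update: revisit the choice once the other mitigations have been
     * selected; a related mitigation may already cover this vulnerability. */
    static void example_update_mitigation(void)
    {
        if (example_covered_by_other_mitigation())
            example_mitigation = EXAMPLE_MITIGATION_NONE;
    }

    /* Apply: enact the choice, for instance by forcing a synthetic feature
     * flag that causes the relevant MSR bits to be set on each CPU. */
    static void example_apply_mitigation(void)
    {
        if (example_mitigation == EXAMPLE_MITIGATION_FULL)
            setup_force_cpu_cap(X86_FEATURE_EXAMPLE_MITIGATION);
    }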
The most recent version of Kaplan's patch set was posted on March 10, and therefore hasn't had much time to attract any additional discussion. Given the positive reception of the previous version, it seems quite possible that these new command-line options will be available for x86 and x86_64 soon, hopefully with other architectures to follow.

Posted Mar 19, 2025 17:19 UTC (Wed) by DemiMarie (subscriber, #164188)
The hardware could obviate the need for software mitigations by implementing speculative taint tracking, which ensures that speculative execution does not create any new side channels (other than power consumption) that would not otherwise be present. This costs about 15% in performance.
The only way that getting rid of speculative execution could be remotely feasible is if the die area saved was used to vastly increase the core count and every program whose performance mattered was able to use all of the extra cores.

Posted Mar 19, 2025 19:36 UTC (Wed) by NYKevin (subscriber, #129325)
...which, for those of you playing at home, is basically impossible in practice. There are real-world algorithms that are "embarrassingly parallel," meaning that you can basically just throw more cores at them indefinitely, but they're fairly uncommon in practice. Most algorithms have some portion that can parallelize well, and some portion that has to be serial, and most problems do not admit a 100% parallel solution (you can't begin executing JavaScript until the page's DOM has been constructed, you can't construct the DOM until you finish parsing the HTML, etc., and even if we assume idealized perfect lazy evaluation, things can still block on each other when specific data is actually required). Worse, due to economies of scale, the (useful) algorithms that can benefit from massive parallelism are already offloaded to huge data centers and the like, so the problems that you're going to solve on one device at a time are mostly of the at-least-somewhat-serial variety (or else they are parallelizable, but too small to matter).
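Amdahl's law puts a number on that ceiling: if a fraction p of a program's work can be parallelized and the rest is serial, then on n cores the best achievable speedup is

    speedup(n) = 1 / ((1 - p) + p / n)

so even a program that is 95% parallel tops out at a factor of 1 / 0.05 = 20, no matter how many extra cores the saved die area buys.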

Posted Mar 19, 2025 20:16 UTC (Wed) by excors (subscriber, #95769)
Not disagreeing with your general point, but this specifically is inaccurate: you *must* interleave JS execution, DOM construction and HTML parsing. Scripts must be executed as soon as the parser sees the "</script>", because the script can use document.write() to insert new characters immediately after the ">" - e.g.:
<script>document.write("<h")</script>1>test
will end up producing an `h1` element. And the script can read, and modify, the DOM of everything that has been parsed before it (including any complete elements produced by document.write() during the script's execution).
That makes it even harder to implement with multithreading, because there's so much synchronisation between all the components. It would be nice if it was a sequential pipeline like you suggest, but it's far messier.
Despite that, browsers still do speculative parsing and speculative DOM construction in background threads - they keep parsing while downloading and executing scripts, so they can find more resources to preload and hopefully the scripts won't do anything funny that requires a rollback of the DOM changes. (https://hacks.mozilla.org/2017/09/building-the-dom-faster...)

Posted Mar 20, 2025 13:51 UTC (Thu) by excors (subscriber, #95769)
This is the simplified version. It used to be that every web browser had its own unique approach, based on some combination of reverse-engineering other browsers, reverse-engineering web pages that depended on the behaviour of other browsers, and just making it up as they went along. Sometimes their behaviour would depend on TCP packet boundaries. Sometimes they'd crash. None of it was documented.
Now there are standards that document it all in great detail, very carefully designed and tested to avoid breaking compatibility with billions of old web pages, and browsers have converged on those standards, so there's only one kind of bonkers behaviour instead of many.
If you're writing web pages you can avoid a lot of the complexity and performance issues by avoiding document.write(), using <script defer>, etc. But browsers can't avoid it, because the quickest way to lose users is to be incompatible with one web page that is important to them. Browsers, and CPU manufacturers, also need to compete on performance while supporting these features that were designed a decade before the first dual-core desktop CPU, so it's really hard to avoid being bottlenecked by single-thread CPU performance.

Posted Mar 20, 2025 14:05 UTC (Thu) by farnz (subscriber, #17727)

Because of backwards compatibility (you can't be sure that no web page anywhere does something bonkers), you have to be able to fall back to the interleaved serial execution model at any time.
That's why the general technique browsers use to handle this is to speculatively assume that the bonkers thing doesn't actually happen, and start again but using the slow interleaved serial route if they observe the bonkers thing. This puts pressure on the wider ecosystem to allow you to run things in parallel, since while you will work with the bonkers thing (you have to!), performance is much better if you stick to sanity. And if the browser has good tools for making your sites perform better, those tools will clearly flag up that you've done something bonkers that forces the browser to abandon the fast path and restart on the slow path.
The net effect is that bonkers stuff still works (even if the original author is long gone), so you can still look at a monstrosity from 1997 in your current browser and have it work, but most sites will go towards sane over time because sane is faster.
Similar applies to CPUs in some senses, too - it is reasonable for a CPU to slow down if you do something that's technically allowed but difficult to implement in a modern design, but not reasonable to break backwards compatibility just because it's hard to implement in a high performance fashion. After all, if the code ran "fast enough" on an 80386 without cache at 16 MHz, then it'll run "fast enough" on a modern PC, too, even if it's forcing the CPU to behave like a 100 MHz CPU, not a 3 GHz CPU.

Posted Mar 21, 2025 12:45 UTC (Fri) by amw (subscriber, #29081)
The amazon link on that page didn't work for me though. Here's one that does: https://a.co/d/3r94Xq4

Posted Mar 21, 2025 22:56 UTC (Fri) by anton (subscriber, #25547)

This chart shows the instructions per cycle of a number of microarchitectures on a number of benchmarks on a version of Gforth. The dashed lines are for in-order microarchitectures that do not speculate. The full lines are for out-of-order microarchitectures with speculative execution. Looking at the median, the best in-order microarchitecture has an IPC of about 0.9, while the best OoO microarchitecture has an IPC of almost 4. Plus, the best OoO microarchitecture is available at about 2.5 times the clock rate of the best in-order one, resulting in a total speed difference by a factor of 10.

An interesting pairing is Bonnell and Silvermont. Bonnell is the microarchitecture of the first Intel Atom and is in-order; Silvermont, its successor, is also two-wide but uses out-of-order execution. Silvermont has an advantage of about 1.5 in IPC on the median of these benchmarks, and another factor of 1.5 in clock rate.
Concerning the cost of the mitigations, I have seen a factor of 2-9.5 slowdown from using retpolines on Gforth, and that mitigates only Spectre v2 (but Gforth is unusually heavy on indirect branches). I have read about slowdowns by a factor of more than 2 by various "speculative load hardening" mitigations, and that mitigates only against Spectre v1. The Linux kernel developers try to keep the costs in check by being smart about where to apply the mitigations, but that approach has huge development costs and they just have to err once on the wrong side of that edge, and the kernel can be attacked through these CPU vulnerabilities.
The better approach is to design hardware without these vulnerabilities. The "invisible speculation" approach looks most promising to me, because it costs little performance (the papers give numbers between a 20% slowdown and a small speedup, depending on the implementation variant). There are various research papers on that, and I have dabbled in the area with an overview paper.

Posted Mar 20, 2025 12:45 UTC (Thu) by draco (subscriber, #1792)
That can't be detected automatically.

Posted Mar 21, 2025 7:42 UTC (Fri) by marcH (subscriber, #57642)
> But for systems that won't run virtual machines, for example,...
That assumption can be misleading. There are a lot of build processes that use containers and/or VMs purely for system-configuration convenience, with zero actual security boundary between the host and the guest; they're both in the same security domain. Containers and VMs are not just for security.