LWN: Comments on "Developers split over split-lock detection" https://lwn.net/Articles/806466/ This is a special feed containing comments posted to the individual LWN article titled "Developers split over split-lock detection". en-us Thu, 02 Oct 2025 04:04:27 +0000 Thu, 02 Oct 2025 04:04:27 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Cloud providers -- lead the pack! https://lwn.net/Articles/807831/ https://lwn.net/Articles/807831/ oldtomas <div class="FormattedComment"> Since the most interested parties are the cloud providers... why not let them lead the pack and test this feature until BIOSes and other funny firmware stabilize?<br> <p> After that, it may be time to declare it the default!<br> <p> IOW: why should everyone else play guinea pig for the benefit of the cloud providers?<br> </div> Fri, 20 Dec 2019 14:16:33 +0000 Developers split over split-lock detection https://lwn.net/Articles/807240/ https://lwn.net/Articles/807240/ GoodMirek <div class="FormattedComment"> Maybe, but the issue could be caused by faulty software or a hacked VM. Contract termination seems like a last resort, e.g. in case the customer does not cooperate. However, in the meantime many others could suffer a huge performance hit.<br> E.g. 
if I run 100 faulty worker nodes, spread over 100 random hosts, each host having 48 CPU cores (96 vCPUs due to hyperthreading/SMT), that looks like a potentially significant issue.<br> </div> Fri, 13 Dec 2019 08:47:07 +0000 Developers split over split-lock detection https://lwn.net/Articles/807238/ https://lwn.net/Articles/807238/ bof <div class="FormattedComment"> I'm sure if that ever becomes a thing, there will be performance counters to read and recognize split-lock membus locking, and automatic termination of contracts afterwards...<br> </div> Fri, 13 Dec 2019 08:26:16 +0000 Developers split over split-lock detection https://lwn.net/Articles/807235/ https://lwn.net/Articles/807235/ GoodMirek <div class="FormattedComment"> It actually sounds like DoSing a cloud provider this way would be an easy task, requiring only that one continuously create split locks on a single thread of a hyperthreaded core. Would that affect just one NUMA node or all of them?<br> </div> Fri, 13 Dec 2019 06:51:09 +0000 Developers split over split-lock detection https://lwn.net/Articles/807189/ https://lwn.net/Articles/807189/ dps <div class="FormattedComment"> A lot of commercial environments are agile but many of them build up vast piles of low priority bugs which never get fixed. In many agile environments it is hard to fix code which sorts vast buffers with bubble sort because the code works. Customers can't see the source code and therefore won't be aware of this problem.<br> <p> Tasks like fixing spelling mistakes, especially those in comments, become almost impossible.<br> <p> The only way agile environments will be made to fix split locks is for them to cause something very bad to happen somewhere they care about. A demonstration that an allegedly safe interface to root allowed anybody to read /etc/shadow was required to get time to fix it.<br> <p> As you might expect, no customers were told anything and no fix was issued. 
I will neither confirm nor deny any suggestions about the company or product involved.<br> </div> Thu, 12 Dec 2019 16:03:33 +0000 Developers split over split-lock detection https://lwn.net/Articles/806872/ https://lwn.net/Articles/806872/ corbet The signal only affects the process creating the split lock. The slowdown affects the entire system... Mon, 09 Dec 2019 23:50:17 +0000 Developers split over split-lock detection https://lwn.net/Articles/806871/ https://lwn.net/Articles/806871/ nix <div class="FormattedComment"> <font class="QuotedText">&gt; Personally, I think this feature should be enabled by default, since it's a security fix against a local DoS issue, with a compatibility knob for those running systems with trusted but buggy software.</font><br> <p> Isn't a SIGBUS also a local DoS? Rather more of one than a 1000-clock stall?<br> </div> Mon, 09 Dec 2019 23:46:03 +0000 Developers split over split-lock detection https://lwn.net/Articles/806860/ https://lwn.net/Articles/806860/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; The nuances are not lost on Zijlstra; [...] nobody's gonna fix anything until the plug's been pulled.</font><br> <p> ...<br> </div> Mon, 09 Dec 2019 20:27:18 +0000 Developers split over split-lock detection https://lwn.net/Articles/806837/ https://lwn.net/Articles/806837/ naptastic <div class="FormattedComment"> <font class="QuotedText">&gt; The Big Kernel Lock MUST be removed, otherwise everything will be/remain broken</font><br> <p> The nuances are not lost on Zijlstra; they've been covered elsewhere in the thread. His point is correct though: split-locks cause huge latencies, which are only going to get worse, and nobody's gonna fix anything until the plug's been pulled.<br> <p> (ATI's graphics drivers still relied on the BKL despite YEARS of advance warning that it was going away. 
When the BKL was fully removed, I couldn't upgrade my kernel; I think I waited 3 kernel releases before saying "screw it" and selling all my ATI/AMD gear in favor of Nvidia. Pull the plug. Break the bad things. It's the only way to make progress happen.)<br> </div> Mon, 09 Dec 2019 16:54:43 +0000 Developers split over split-lock detection https://lwn.net/Articles/806836/ https://lwn.net/Articles/806836/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; The logic behind that statement is: (1) most people use the defaults</font><br> <p> Why write "everyone" if/when you mean "most"? Considering the main question is "how many?", it gives the impression that one hasn't really thought about it.<br> <p> There are cases where the difference between "all" and "most" doesn't matter. But here it does because it takes very few early adopters to find bugs. Even fewer if they're running a data center.<br> <p> <font class="QuotedText">&gt; (2) there's no pressure on fixing the bug if most people aren't affected, [...] not only will broken software not be fixed (there being no pressure to do so), but also there will be pressure to never enable this feature.</font><br> <p> How's any of that worse than not merging the code at all?<br> <p> Is blocking the code itself merely used as leverage in the discussion about the default setting?<br> <p> <p> </div> Mon, 09 Dec 2019 16:34:18 +0000 Developers split over split-lock detection https://lwn.net/Articles/806762/ https://lwn.net/Articles/806762/ smooth1x <div class="FormattedComment"> <p> It would have to be optional as not every environment will be capable of being 100% clean.<br> <p> I would expect distributions to either have this turned off on Desktop installs or have their installer have a blacklist for firmware versions.<br> <p> However, for commercial environments, would they not test first? 
<br> <p> Surely all commercial environments are agile pipeline driven environments who break early and can fix things instantly? :-&gt;&gt;<br> </div> Mon, 09 Dec 2019 12:07:54 +0000 Developers split over split-lock detection https://lwn.net/Articles/806760/ https://lwn.net/Articles/806760/ cesarb <div class="FormattedComment"> The logic behind that statement is: (1) most people use the defaults; (2) there's no pressure on fixing the bug if most people aren't affected, especially if those few people affected are using a non-default setting which can be turned off without visibly breaking anything else.<br> <p> That is, in the scenario where this feature is default-enabled, broken software will end up being fixed; in the scenario where this feature is default-disabled, not only will broken software not be fixed (there being no pressure to do so), but also there will be pressure to never enable this feature.<br> <p> Personally, I think this feature should be enabled by default, since it's a security fix against a local DoS issue, with a compatibility knob for those running systems with trusted but buggy software.<br> </div> Mon, 09 Dec 2019 12:07:14 +0000 Developers split over split-lock detection https://lwn.net/Articles/806759/ https://lwn.net/Articles/806759/ xophos <div class="FormattedComment"> Maybe they have enough data from previous similar issues, to be certain.<br> Or this is a simple hyperbole to make a point.<br> </div> Mon, 09 Dec 2019 11:43:50 +0000 Developers split over split-lock detection https://lwn.net/Articles/806758/ https://lwn.net/Articles/806758/ pbonzini <div class="FormattedComment"> Truly ancient stuff is not going to use locked instructions at all (except perhaps XCHG which got an automatic LOCK prefix with the 386, but even that is quite unlikely)<br> </div> Mon, 09 Dec 2019 11:37:17 +0000 Developers split over split-lock detection https://lwn.net/Articles/806742/ https://lwn.net/Articles/806742/ hmh <div class="FormattedComment"> I 
foresee it getting reverted about one week after it gets exposed to a wide range of users, unless it is optional. It is almost certain that enough embedded crapware (aka firmware) and x86-only commercial applications out there will trigger split locking.<br> <p> Not to mention truly ancient stuff that runs under DOSEMU and friends...<br> <p> So, I bet one will be able to disable this "feature" in its final form during boot, if not at any time...<br> </div> Mon, 09 Dec 2019 03:20:31 +0000 Developers split over split-lock detection https://lwn.net/Articles/806738/ https://lwn.net/Articles/806738/ Tov <div class="FormattedComment"> So it might be a nice feature to be able to enable specifically on cloud servers. However, I am sure all desktop/laptop users with less than optimal firmware implementations and little chance of getting their firmware fixed will be thrilled by the thought of kernel panics starting to appear out of nowhere.<br> <p> Furthermore, people will have little sympathy for their trusty old applications suddenly being killed with SIGBUS errors due to some new standards of performance and "correctness".<br> <p> Hopefully I am misunderstanding how this is supposed to work...<br> </div> Mon, 09 Dec 2019 00:31:48 +0000 Developers split over split-lock detection https://lwn.net/Articles/806720/ https://lwn.net/Articles/806720/ tux3 <div class="FormattedComment"> I think I read that cloud providers are the main force pushing for this. If it only takes a single thread busy-looping on a split-lock to stall the other 0xXX cores of your expensive hardware, you can suddenly annoy a whole lot of people for not a whole lot of money. 
Talk about noisy neighbors.<br> </div> Sun, 08 Dec 2019 13:42:50 +0000 Developers split over split-lock detection https://lwn.net/Articles/806712/ https://lwn.net/Articles/806712/ flussence <div class="FormattedComment"> This seems like an extreme overkill reaction to something that, as described, doesn't sound much worse than x86 BIOS/UEFI crapware stalling ring 0 and above at bad times. Which we've dealt with for a *long* time.<br> <p> So maybe there's something else going on here that Intel's not telling us.<br> </div> Sun, 08 Dec 2019 05:20:25 +0000 Developers split over split-lock detection https://lwn.net/Articles/806707/ https://lwn.net/Articles/806707/ marcH <div class="FormattedComment"> <font class="QuotedText">&gt; This feature MUST be default enabled, otherwise everything will be/remain broken</font><br> <p> There are two statements too simplistic to be true in this single sentence:<br> <p> - "Everything" implies the subset of users who would explicitly turn the feature on is exactly zero.<br> - "Remain" implies the default setting can never be changed in the future after the feature is merged and - for instance - some transition period.<br> <p> The ability to predict the future is impressive, the ability to predict it with so much precision is even more.<br> <p> Nuances maybe?<br> </div> Sun, 08 Dec 2019 01:53:02 +0000