The search for the correct amount of split-lock misery

By Jonathan Corbet
October 19, 2022

Unlike many other architectures, x86 systems support atomic operations that affect more than one cache line. This support comes at a cost, though, in terms of overall system performance and, even, security. Over the last few years, kernel developers have worked to discourage the use of this sort of "split-lock" operation. Now, though, one group of users is feeling a little too discouraged, leading to a discussion of how much misery can appropriately be inflicted upon users who use problematic but architecturally legal operations.

The problem with atomic operations that cross cache-line boundaries is that the system bus must take special measures to ensure that both cache lines are simultaneously protected from concurrent access. In practice, that means locking the bus for the duration of the operation, which can stall every other processor in the system. A malicious program executing a tight loop with a split-lock operation can destroy the performance of the system as a whole. For this reason, split-lock operations have long been frowned upon.
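Whether a given operand straddles a cache-line boundary is simple address arithmetic. The following is a minimal sketch of that check, not anything taken from the kernel; the 64-byte line size is an assumption (it is typical for x86 hardware), and the function name is made up for illustration.

```python
CACHE_LINE_SIZE = 64  # assumption: typical x86 cache-line size in bytes


def crosses_cache_line(addr: int, size: int, line_size: int = CACHE_LINE_SIZE) -> bool:
    """Return True if the bytes [addr, addr + size) span more than one cache line."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = addr // line_size
    last_line = (addr + size - 1) // line_size
    return first_line != last_line


# A 4-byte operand at address 62 covers bytes 62..65, so it straddles the
# boundary between two 64-byte lines; the same operand at address 64 does not.
print(crosses_cache_line(62, 4))  # True
print(crosses_cache_line(64, 4))  # False
```

An atomic operation on a naturally aligned operand never trips this check, which is why packed or hand-offset data structures are the usual source of split locks.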

Unfortunately, software that is malicious (or just poorly written) turns out to be remarkably indifferent to even the most severe of frowns. So, starting in 2019, kernel developers sought more persuasive ways to get their point across. The initial work was done by Fenghua Yu but, in the end, this patch by Peter Zijlstra was merged in January 2020 for the 5.7 kernel release. It gives the kernel the ability to respond to traps caused by split-lock operations and provides three modes for that response, selectable by the split_lock_detect= command-line parameter:

off causes the kernel to behave as it did before; split-lock operations are not detected and nothing is done when they occur.
warn (the default) causes a (rate-limited) warning to go to the system log when a split-lock operation is detected.
fatal causes the kernel to immediately kill (with SIGBUS) any process attempting split-lock operations.
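As a rough illustration of how the three modes might be picked out of a kernel command line, here is a small sketch. It is not the kernel's parser: the function name is hypothetical, and the choices to let a later occurrence win and to fall back to warn for unrecognized values are assumptions made for this example.

```python
VALID_MODES = ("off", "warn", "fatal")  # the three modes described above
DEFAULT_MODE = "warn"


def split_lock_mode(cmdline: str) -> str:
    """Return the split_lock_detect= mode named in a kernel command line."""
    mode = DEFAULT_MODE
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "split_lock_detect" and sep and value in VALID_MODES:
            mode = value  # assumption: a later occurrence overrides an earlier one
    return mode


print(split_lock_mode("root=/dev/sda1 quiet split_lock_detect=fatal"))  # fatal
print(split_lock_mode("root=/dev/sda1 quiet"))                          # warn
```

On a running system, the string to feed in could be read from /proc/cmdline.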

The hope was that the warn mode would be sufficient to alert users to the problem, and lead to software being fixed, while not actually interfering with anybody's use of their systems. By the time the 5.19 development cycle came around earlier this year, though, it seemed that little progress toward the removal of split-lock operations had been made, and the denial-of-service problem was as present as before. So it was decided to take a stronger stance against split locks.

One option, of course, would be to just switch to the fatal mode by default, but that would be a rather draconian solution. Instead, Tony Luck wrote a patch with the descriptive title of "make life miserable for split lockers". It modified the warn mode to punish processes doing split locks without actually killing them. Instead, detection of a split lock would lead to a 10ms delay, then serialization via a semaphore. When this mode is selected, a malicious program performing split locks succeeds in slowing itself down, but no longer has much effect on the system as a whole. This change was applied during the 5.19 merge window.
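The two steps named above, a delay followed by serialization, can be mimicked in user space. The sketch below is only an analogy using Python threads, not the kernel's implementation; the function and variable names are invented for the example.

```python
import threading
import time

DELAY_SECONDS = 0.010                      # the 10ms delay described above
_split_lock_sem = threading.Semaphore(1)   # at most one split locker at a time


def handle_split_lock(operation):
    """Delay the caller, then run its operation while holding the semaphore."""
    time.sleep(DELAY_SECONDS)   # only the offending thread pays this cost
    with _split_lock_sem:       # offenders are serialized against each other
        return operation()


# Four "offending" threads each pay the delay and then take turns.
threads = [
    threading.Thread(target=handle_split_lock, args=(lambda: None,))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the design is visible even in this toy: threads that never call handle_split_lock() are not slowed at all, while those that do can only disturb the rest of the system one at a time.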

In mid-September, a GitHub user named "pibberflibbits" posted a bug report saying that the performance of the God of War game on Linux had become "insanely low". It took a little while, but the participants in the resulting discussion eventually figured out that the problem was the split-lock penalty. Evidently one cannot be a proper god of war using just ordinary locks, so the game does a lot of split locking. Luck's patch had achieved its intended purpose; God of War players are now suitably miserable.

Guilherme G. Piccoli, though, was not celebrating this victory over the Gods; instead, he posted a patch arguing that "it seems unacceptable to regress old/proprietary userspace programs through a default configuration that previously worked". This patch restored the old behavior of the warn mode and added a new seq mode that would slow down split-lock users like warn mode does now. The warn mode would remain the default, lifting the misery from the game-playing world.

Opinions on this change were mixed. Luck pointed out that gamers can simply disable split-lock detection by rebooting with split_lock_detect=off on the kernel command line. If the seq mode were to be added, he said, it should be the default. He also suggested filing a bug with the publishers of God of War to get its misbehavior fixed.

Others disagreed, though. Joshua Ashton argued that the problem is more widespread: "It's not just about God of War specifically. There are many old titles that will never, ever, get updated to fix this problem. These titles worked perfectly fine and were performant before." Others pointed out that many gamers are unlikely to be comfortable with adjusting kernel command-line parameters. Dave Hansen observed that the misery-inflicting mode had worked as intended and had brought the problem to light. Even so, he continued:

My gut says we should keep the warnings and kill the misery. The folks who are going to be able to fix the issues are probably also the ones looking at dmesg and don't need the extra hint from the misery. The folks running Windows games don't look at dmesg and just want to play their game without misery.

Luck, though, argued that split locking creates its own misery for processes other than the one responsible, and that the current mode "serves a very useful purpose on multi-user systems". He suggested that perhaps some sort of heuristic could be developed to confine the misery to multi-user systems.

The definitive answer, though, came from Thomas Gleixner, who pointed out that slowing down split lockers by default is the only choice that distributors could make; anything else creates an easily exploitable denial-of-service vulnerability. So the slowdown needs to remain: "Attack vector prevention has precedence over broken applications". He did suggest, though, that a sysctl knob could be added to control split-lock detection; that would allow users of broken applications to get their performance back without the need to figure out how to change command-line parameters or reboot their systems.

That is the approach that Piccoli has taken for his second attempt at addressing the problem. The new kernel.split_lock_mitigate knob, if set to zero, will disable the penalization of processes using split locking (while retaining the warning sent to the system log). The default is to retain the slowdown. This patch seems to have pleased everybody involved and looks likely to find its way into the 6.2 kernel. Affected gamers will have to set the new knob appropriately, but knowing which sysctls to tweak could be said to be part of being a true God of War.
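Sysctl knobs are exposed as files under /proc/sys, so kernel.split_lock_mitigate would appear as /proc/sys/kernel/split_lock_mitigate on a kernel that carries the patch. A minimal sketch of reading and setting it follows; the helper names are hypothetical, and the path argument exists only so that the code can be tried against an ordinary file.

```python
from pathlib import Path

# Where the kernel.split_lock_mitigate sysctl shows up in procfs.
KNOB = Path("/proc/sys/kernel/split_lock_mitigate")


def read_mitigate(path: Path = KNOB) -> bool:
    """Return True if the slowdown is enabled (the knob reads 1)."""
    return path.read_text().strip() == "1"


def set_mitigate(enabled: bool, path: Path = KNOB) -> None:
    """Write 1 to keep the slowdown, or 0 to disable it while keeping the warning."""
    path.write_text("1\n" if enabled else "0\n")
```

Writing the real file requires root privileges; running "sysctl kernel.split_lock_mitigate=0" as root should have the same effect as set_mitigate(False).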


Posted Oct 19, 2022 16:27 UTC (Wed) by iabervon (subscriber, #722)

I wonder if it would be viable to hook this into one of the process priority mechanisms that the kernel has. Surely, it's okay to potentially stall all of the other processors on the system if the process has sufficient priority to preempt any processes that are running on those processors, and gamers could use a new launcher that expresses the fact that they really don't mind if their software builds make little or no progress while they're playing God of War.

It seems like it might be sufficient to just remove the explicit delay; you'd still be heavily penalizing processes that do split-lock operations, but only if there's anything else for them to contend with, so they'd be unaffected if they have a whole computer to themselves.

Posted Oct 19, 2022 16:53 UTC (Wed) by epa (subscriber, #39769)

If it's just a game, perhaps lock safety is not that important? Can the kernel trap on the split-lock operation and execute a less strongly atomic operation instead? As an option, not enabled by default, of course.

Posted Oct 19, 2022 17:08 UTC (Wed) by mb (subscriber, #50428)

I would be really (!!1!11) upset, if a non-atomic increment of the score counter in my favorite Hello Kitty game spoiled my all time high score!

Posted Oct 19, 2022 17:44 UTC (Wed) by mss (subscriber, #138799)

Games, like other programs, can crash if their atomic operation semantics get suddenly weakened.

Just imagine how annoying random game crashes would be.

Posted Oct 19, 2022 19:37 UTC (Wed) by epa (subscriber, #39769)

To which the only answer is ‘suck it and see’. If the game still runs, doesn’t crash or behave strangely, then the looser behaviour is good enough.

Posted Oct 19, 2022 20:15 UTC (Wed) by mb (subscriber, #50428)

What you gained: Nothing at all. On a gaming system there is no concern of stalled or otherwise DoSed other applications, because there are none.

What you paid: The game crashes or sporadically misbehaves or is slower than before.

In a single-application scenario, which a game effectively is, the only task of the kernel is to make that application run as smoothly as possible.
If there is a change to the kernel that reduces game performance with no additional gain whatsoever, then it is a regression.
Clearly, there is no stalling problem for other applications, because it would have been noticed by the game developer before. This is mitigating a problem that doesn't exist in this scenario.

Posted Oct 19, 2022 22:14 UTC (Wed) by mpr22 (subscriber, #60784)

> On a gaming system there is no concern of stalled or otherwise DoSed other applications, because there are none.

Unless you're streaming to Twitch, or your multiplayer game has no built-in chat and you're using a third-party application to communicate with your fellow players, or you hate the in-game music and are running a media player to get a soundtrack you actually like, or ...

Posted Oct 20, 2022 3:10 UTC (Thu) by mb (subscriber, #50428)

And neither of these situations is improved or solved by stalling the game (the app that uses split locks).
It always reduces the user experience. It doesn't improve anything.

I would go as far as to argue that throttling an application due to split locks on a single user system is always wrong.
The only case where we gain something from throttling the app is on a multiuser system where one user is maliciously attacking the system as a whole. (This might be an unprivileged remote user who might not even need a system account).

Posted Oct 24, 2022 6:20 UTC (Mon) by marcH (subscriber, #57642)

It improves the future for maintained and future apps. For everything except orphaned apps.

Posted Oct 28, 2022 11:52 UTC (Fri) by Vipketsh (guest, #134480)

If you are only ever looking to make the future good, the present will *always* suck.

Is there any known exploit using this scenario to guard against ? I mean, users shouldn't go off and install random "break my machine" programs or if they do DoS-ing an individual's machine isn't exactly beneficial, nor is it the worst that an evil application can do.

I see this change and behaviour as more in line with the seemingly recent trend that the linux kernel is to do well out-of-the-box for cloud-based deployments, and only as a second concession allow individuals to tweak things so their single-user laptops and systems may work reasonably. A little sad and quite backwards since cloud operators have lots of talent on hand to tweak and analyse their systems (and they do anyway) with which individuals can't hope to compete.

Posted Oct 28, 2022 22:18 UTC (Fri) by marcH (subscriber, #57642)

> If you are only ever looking to make the future good, the present will *always* suck.

Actually, by stalling abusive applications this improves the present of other applications too. Single app gaming really does not feel like a top Linux use case today. Unless you're counting ChromeOS, Android and Steam deck but then you're not administering your own system and you can count on its admins to optimize all this for you.

> Is there any known exploit using this scenario to guard against ?

This is not just about intentional abuse but also about performance under unintentional abuse.

> recent trend that the linux kernel is to do well out-of-the-box for cloud-based deployments, and only as a second concession allow individuals

As already explained elsewhere this is not just about multi-user but about multi process in general.

Posted Oct 29, 2022 2:30 UTC (Sat) by Vipketsh (guest, #134480)

> by stalling abusive applications this improves the present of other applications too

In theory I understand, but do you or anyone else have any numbers on the magnitude of this slowdown on, say, a typical 4/8 core laptop ? Theoretical problems should always be trumped by practical breakage, until proven otherwise.

This is like arguing that some application causing heavy contention on a kernel lock is a DoS attack or, if not that, the application is slowing down the system and thus the kernel should prevent it from running. Clearly this is madness. How are split locks any different ?

> also about performance under unintentional abuse too.

The malicious case is all that matters. For everything else, one of two possible outcomes is possible: (i) the performance issue is fixed and so is not a problem or (ii) the performance issue is not fixed in which case the choices are "slow(er) system" and "program does not work". A slower system is *always* preferred to a non working one and that is doubly true for machines of individuals.

Also let's not forget that this case is only relevant to x86 -- the risc machines (arm & riscv) not only don't support this feature but often don't support *any* unaligned access, atomic or otherwise. With their rise in popularity there will practically not be any cases of this in opensource and with proprietary apps the question is whether it works or not, and working is always preferred.

Posted Oct 28, 2022 21:31 UTC (Fri) by mrugiero (guest, #153040)

> I would go as far to argue that throttling an application due to split locks on a single user system is always wrong.

You could, probably accurately, argue that this is always wrong if forced on the user. I.e., you could argue it should be opt-in for single user systems. But some of us would rather be able to report these kinds of issues. Let alone being the developer of one such game and wanting a fair warning that you may be causing problems. Do you code in a multi-user system most of the time? I don't.

Posted Nov 8, 2022 8:39 UTC (Tue) by roblucid (guest, #48964)

No, a performance bug that prevents effective multi-core operation has been detected and should be fixed.
Gamers are concerned about core scaling, the application is sabotaging itself.

Posted Nov 8, 2022 13:10 UTC (Tue) by Wol (subscriber, #4433)

The question is simple - if the majority of apps affected by this are games, then this is a user space regression. It needs to be reverted.

Seeing as it appears to be pretty much only gaming systems that are affected, then it most definitely is a user space regression. Especially as a lot of these applications cannot be fixed by the users. And driving gamers to Windows/Apple will be seriously damaging to linux.

And lastly, isn't the (mis)feature itself actually a HARDWARE bug? "Mitigating" the bug by rendering the computer pretty much unusable for its intended purpose isn't a good idea!

Cheers,
Wol

Posted Oct 20, 2022 15:41 UTC (Thu) by ericproberts (guest, #139553)

It certainly would make things more miserable.

Posted Oct 20, 2022 2:25 UTC (Thu) by kschendel (subscriber, #20465)

The sysctl knob seems a good idea, but IMO it's backwards -- the name is "split_lock_mitigate", so to mitigate, it should be set to 1. To cause pain, mitigation is turned off, ie should be set to zero.

Bikeshedding, I guess, but still.

Posted Oct 20, 2022 13:00 UTC (Thu) by tamara_schmitz (guest, #141258)

I think it's already as you described it:

0 Disables the misery mode - just warns the split lock on kernel log.
1 Enables the misery mode (this is the default) - penalizes the split lockers with intentional performance degradation.

Posted Oct 20, 2022 15:53 UTC (Thu) by WolfWings (subscriber, #56790)

In this case the mitigation is preventing impact to the rest of the system, which indeed is what "1" does. It directly penalizes anything that triggers split-cacheline-atomic scenarios by forcing it to eat a 10ms pause in its next scheduling.

"0" disables the mitigation, allowing any single application to hog the bus for its own locks and arbitrarily DDoS the rest of the system as a result.

Posted Oct 23, 2022 1:11 UTC (Sun) by NYKevin (subscriber, #129325)

It's not a DDoS, it's just a regular DoS. There's only one system involved.

More to the point, this whole idea is clownshoes anyway. The purpose of the kernel is to serve userspace, not to tell userspace what to do. If userspace wants to hurt its own performance, that's the sysadmin's* problem. For some configurations, it might make sense to allow the sysadmin to block or restrict split-lock operations, but it should function like an rlimit, not a system-wide "block first, ask questions later" flag.

* If there is no sysadmin, that means it's a single-user system and the "problem" is even more nonsensical.

Posted Oct 23, 2022 17:27 UTC (Sun) by marcH (subscriber, #57642)

> The purpose of the kernel is to serve userspace, not to tell userspace what to do.

No, the purpose of the kernel is to protect userspace from each other (some single application systems don't even have a kernel)

> If userspace wants to hurt its own performance, that's the sysadmin's* problem.

Default settings matter A LOT and it's really good to see overbusy maintainers spending so much time discussing and getting them right.

> > Affected gamers will have to set the new knob appropriately, but knowing which sysctls to tweak could be said to be part of being a true God of War.

Happy ending:
- The "bug" will not go unnoticed and new applications will avoid it
- Old applications will run too after a few minutes searching the Internet.

Very delicate trade-off perfectly found.

Posted Oct 21, 2022 4:08 UTC (Fri) by ballombe (subscriber, #9523)

Well, there should exist a mitigation that still allows God of War to run with decent performance.
The current one was made painful by design, not by necessity. This is a false dilemma.

Posted Oct 20, 2022 2:36 UTC (Thu) by developer122 (guest, #152928)

>Attack vector prevention has precedence over broken applications.
Taken in context, I find this statement absurd.

There is no attack being mitigated. The users of split-lock dependent software on single-user machines run those workloads fully accepting (nay, _expecting_*) the performance characteristics. It is tautologically impossible for this mechanism to provide them with any protection whatsoever from themselves.

The default is very plainly in the wrong place. The default should indeed prevent harm: by not disrupting existing workloads.

The place where the penalty is useful is in a multiuser system. These multiuser systems are *exactly* the place to find a capable sysadmin who can fiddle a knob to ward off bad behavior, even as it arises. User complaints roll in, as do dmesg messages, and the problem is swiftly rectified.

I think this is simply a case where annoyance that "software is still not being fixed" is fueling an impulse to steam-roll past the reports of harm that the new default is causing. The unjustified expectation is that inflicting additional pain on the reporters will somehow convince an unrelated population to change their behavior.

* "Why shouldn't my game/database/whatever consume all available resources for maximum performance?"

Posted Oct 20, 2022 2:39 UTC (Thu) by developer122 (guest, #152928)

As a footnote:

These users running various software may not be in a position to adjust the setting in question. There is a very good reason for the hard requirement that WINE and similar *MUST* work without elevated privileges available. This has come up before during the work on system call emulation.

Posted Oct 20, 2022 7:17 UTC (Thu) by taladar (subscriber, #68407)

The number of people gaming on a system where they have limited privileges must be extremely small though.

Posted Oct 20, 2022 10:19 UTC (Thu) by hkario (subscriber, #94864)

no, it isn't, wine _really_ doesn't like being run as root; everybody is running their wine environments or the whole Steam client from regular user accounts, no sudo prompts in sight

Posted Oct 20, 2022 10:32 UTC (Thu) by syrjala (subscriber, #47399)

I believe the point was that most people that need this have the ability to just edit /etc/sysctl.conf or whatever.

And I think most of the remaining cases are probably solved by: "Mom/Dad, can you toggle this sysctl knob for me?". Assuming the kid hasn't found some local root exploit already ;)

Posted Oct 20, 2022 20:32 UTC (Thu) by developer122 (guest, #152928)

These same arguments came up in system call emulation and it didn't fly there either.

Posted Oct 21, 2022 12:11 UTC (Fri) by bartoc (guest, #124262)

Presumably you would do something similar to how windows programs are installed on steam, upon first launch it (sometimes) requests elevation and installs "things", in this case one such thing could be a suid program that turns off misery mode and then launches the game, restoring misery mode when the game has exited.

Posted Oct 21, 2022 12:19 UTC (Fri) by mathstuf (subscriber, #69389)

Mmm, yes, the fun problem of figuring out "what did this program do to my machine?" when trying to uninstall it. Not sure I'd like to import that particular behavior from Windows.

Incidentally, more programs supporting `/etc/prog.d/*.conf`-style configuration would be greatly appreciated :) .

> restoring misery mode when the game has exited.

ABAB problems sound fun with this. Also sounds like a maze full of fun error code paths that will never be reliably tested.
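One way a save-and-restore scheme could go wrong, sketched with two overlapping launches. Everything here is hypothetical: Knob merely stands in for a system-wide setting such as kernel.split_lock_mitigate, and NaiveLauncher is an invented wrapper, not anything Steam or Wine does.

```python
class Knob:
    """Stand-in for a system-wide setting such as kernel.split_lock_mitigate."""

    def __init__(self, value: int = 1):
        self.value = value


class NaiveLauncher:
    """Hypothetical wrapper: save the knob, clear it, restore the saved value on exit."""

    def __init__(self, knob: Knob):
        self.knob = knob
        self.saved = None

    def start(self) -> None:
        self.saved = self.knob.value
        self.knob.value = 0

    def stop(self) -> None:
        self.knob.value = self.saved


knob = Knob(1)
a, b = NaiveLauncher(knob), NaiveLauncher(knob)
a.start()          # a saves 1, knob becomes 0
b.start()          # b saves 0, knob stays 0
a.stop()           # knob back to 1 while b's game is still running
mid = knob.value
b.stop()           # knob left at 0 even though no game is running any more
print(mid, knob.value)  # 1 0
```

Avoiding this would need some form of reference counting or a per-process setting, which is part of why the error paths look hard to test.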

Posted Oct 21, 2022 12:29 UTC (Fri) by bartoc (guest, #124262)

Yeah. Though actually another option might be to just have wine know how to bump around the allocations in question to avoid the split-locks in the first place. It would have to be a game-specific hack probably.

I wonder if windows has compat shims for this kinda stuff.

Posted Oct 21, 2022 15:00 UTC (Fri) by mathstuf (subscriber, #69389)

Yeah, it's done those kinds of things: https://arstechnica.com/gadgets/2022/10/windows-95-went-t...

TheOldNewThing blog by Raymond Chen also has lots of tales to this effect: https://devblogs.microsoft.com/oldnewthing/

Posted Oct 21, 2022 23:40 UTC (Fri) by bartoc (guest, #124262)

yeah ofc, but I mean _this specific problem_

Posted Oct 22, 2022 23:26 UTC (Sat) by developer122 (guest, #152928)

Not possible, see comment below. This approach was also suggested during the kernel syscall emulation saga and also shot down then too.

Posted Oct 22, 2022 23:25 UTC (Sat) by developer122 (guest, #152928)

This is not possible.

This same suggestion *also* came up during the kernel syscall emulation work. It was suggested that instead of having the kernel reroute attempts by a program to call a windows kernel syscall, wine could simply scan for and patch any such attempts.

This is not only hideously complicated and unreliable, it also doesn't work at all with exactly the class of programs we're interested in: Games.

They frequently include a wide variety of very gnarly anti-tampering features which cannot be automatically overcome. It is for this reason wine seeks to emulate the environment around the program, and never ever attempts to reach inside it.

Posted Oct 22, 2022 23:21 UTC (Sat) by developer122 (guest, #152928)

This is unfortunately completely unacceptable.

The entire steam linux runtime, and indeed every wine runtime MUST NOT require any more privileges than a regular user program. For installation or during runtime.

One philosophical reason: If it's emulating the windows ABI to run a program that doesn't require privileges, then it must not ask for any additional privileges of its own.

Privileges during installation have never been required and are not allowed. Privileges that are repeatedly granted through a permanently lodged setuid program that runs every time the game launches? Unthinkable.

Posted Oct 23, 2022 0:26 UTC (Sun) by NYKevin (subscriber, #129325)

> One philosophical reason: If it's emulating the windows ABI to run a program that doesn't require privileges, then it must not ask for any additional privileges of its own.
>
> Privileges during installation have never been required and are not allowed. Privileges that are repeatedly granted through a permanently lodged setuid program that runs every time the game launches? Unthinkable.

That is a somewhat odd position to take, because on Windows:

1. Steam installs a service, which runs as "Local System." This is entirely equivalent to a daemon running as root, it's just that Windows uses different terminology.
2. It is not uncommon for Steam games to display a UAC prompt on first run. This is usually to install MSVC++ redistributable or something of that nature, rather than doing some weird custom thing, but it still technically qualifies as elevated privileges.
3. The way that Steam manages to install games without requiring a UAC prompt for every installation is kind of... terrible. Basically, it sets the library as world-writable and world-executable. That means anyone can drop a DLL in the same directory as the game and silently hijack it, because Windows considers the directory permissions to constitute a security boundary (indeed, you could just as easily replace the whole exe file). Now, you may argue that this is irrelevant, since the game is running as an ordinary user rather than a superuser, but we have to consider the https://xkcd.com/1200/ factor.

Note: (3) is my understanding of how it works, but like a lot of gamers, I do something *even more terrible than the above* and install everything into C:\Games instead of Program Files, so I don't know for sure if (3) is the default behavior or just my own damn fault. Why do I do this, if I know it's terrible? Because life is too short, that's why. It is not uncommon for older games to randomly break if they don't have full write access to "everything," so deliberately subverting the OS's security is not uncommon for PC gamers. Even if Steam is doing everything right, that doesn't help when half the users are overriding the defaults and using a less-secure configuration.

Posted Oct 24, 2022 15:01 UTC (Mon) by developer122 (guest, #152928)

All of the above is pretty much invalidated by "on windows."
We all know that user account permissions on windows are a total mess.

The point is that high permissions should *never* be required for installing and running a game, which does nothing but:
1) download some files
2) copy them into the home directory
3) launch an executable
4) access graphics APIs/sound APIs/input APIs

NONE of those things have ever warranted elevated privileges. Wine has continued doing its thing for over a decade now without requiring sudo even once. The use of sudo wasn't needed when the ability to trap and redirect windows syscalls was added to the linux kernel, it sure as hell shouldn't be required because of a performance knob.

Posted Oct 24, 2022 13:35 UTC (Mon) by implr (subscriber, #159818)

>The entire steam linux runtime, and indeed every wine runtime MUST NOT require any more privileges than a regular user program. For installation or during runtime.
It already does, at least for VR. On first run SteamVR will ask for sudo, which it uses to setcap CAP_SYS_NICE on its various binaries. If it detects that it can already run at nice -10, it'll skip that step.

Posted Oct 24, 2022 15:08 UTC (Mon) by developer122 (guest, #152928)

*Due to unix's legacy as a multiuser system.* It dates back to the early days of unix that only root can decrease a nice value, giving more cpu time.

Does this privileged tunable make any sense on a single user system? No it does not.
Worst case scenario, the user only DoSs themselves.

Posted Oct 24, 2022 21:01 UTC (Mon) by Wol (subscriber, #4433)

And wasn't Unix originally written to run games? Specifically Star Trek, iirc ... :-)

Cheers,
Wol

Posted Oct 25, 2022 10:22 UTC (Tue) by geert (subscriber, #98403)

Didn't they just need an OS to develop a document formatting system for the AT&T patents division? ;-)

https://www.gnu.org/software/groff/manual/html_node/Histo...

Posted Oct 28, 2022 21:35 UTC (Fri) by mrugiero (guest, #153040)

> * "Why shouldn't my game/database/whatever consume all available resources for maximum performance?"

To be fair, split locks hurt their own performance as well. Maybe you do want your game to consume all available resources for maximum performance. Then take care of it actually providing maximum performance instead of harming it :)

The search for the correct amount of split-lock misery

Posted Oct 20, 2022 13:13 UTC (Thu) by DouglasJM (subscriber, #6435) [Link] (1 responses)

"...proper god of war using just ordinary locks, so the game does a lot of split locking. Luck's patch had achieved its intended purpose; God of War players are now suitably miserable. "

This is the type of writing that makes reading LWN so enjoyable, tongue in cheek comments that make me smile or laugh out loud. Keep up the great work folks, I look forward to reading LWN!

The search for the correct amount of split-lock misery

Posted Oct 20, 2022 14:12 UTC (Thu) by JamesErik (subscriber, #17417) [Link]

Couldn't agree more. For me, the welcome levity was this:

'''...split-lock operations have long been frowned upon. Unfortunately, software that is malicious (or just poorly written) turns out to be remarkably indifferent to even the most severe of frowns.'''

Thank you, Jonathan. I very much admire your editorial style, your professionalism, and your consistently human touch to your work... including humor that hits the funnybone of geeks everywhere!

The search for the correct amount of split-lock misery

Posted Oct 20, 2022 15:32 UTC (Thu) by jhoblitt (subscriber, #77733) [Link]

If a sysctl is being added, why not expose the ability to make split locks fatal via the knob (or even make that the default)?

The search for the correct amount of split-lock misery

Posted Oct 20, 2022 17:37 UTC (Thu) by excors (subscriber, #95769) [Link]

This article sounds a little suspicious of the merits of the gods and wars, but God of War is actually a heartwarming tale about a troubled man learning how to be a better father to his estranged son as they go on a journey together, hoping to achieve some closure by scattering his wife's ashes on the highest peak in the lands. It's surprisingly sophisticated and a lovely game.

Also you have a cool axe and kill a few gods and rip many thousands of monsters' heads off.

The search for the correct amount of split-lock misery

Posted Oct 21, 2022 10:04 UTC (Fri) by scientes (guest, #83068) [Link] (1 responses)

Considering the long history of silicon bugs around "transactional memory" which is basically atomic operations over a much larger amount of multiple cache lines, this is clearly just the "exercise for the reader" warm-up to getting that stuff correct.

Ivan Goddard said that using atomic operations rather than the cache-flushing approach is not really worth it, because it costs almost the same due to cache-coherency issues. I have not thought that through, however (I can't even get my head around L2 cache-coherent Zync-9000 FPGA with AXI yet...)

The search for the correct amount of split-lock misery

Posted Oct 23, 2022 17:22 UTC (Sun) by marcH (subscriber, #57642) [Link]

> Considering the long history of silicon bugs around "transactional memory" which is basically atomic operations over a much larger amount of multiple cache lines

Silicon bugs aside, this "basically" sounds weird in a discussion about split lock performance. The concept of a transaction is exactly what provides atomicity over a larger set of data _without_ any extra performance cost. Transactions make performance independent from the data size.

The search for the correct amount of split-lock misery

Posted Oct 25, 2022 6:27 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Can this be a prctl setting? Perhaps a cgroup one?

The search for the correct amount of split-lock misery

Posted Oct 29, 2022 8:40 UTC (Sat) by wtarreau (subscriber, #51152) [Link] (2 responses)

I was going to ask the same. prctl() is used for exactly such things: you can control FPU emulation, alignment checks, speculation bypass, etc. The sole reason people complain is that there's no wrapper allowing the behavior to change for the lifetime of a program, which prctl would do just fine.
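
To illustrate the kind of per-process control meant here, a rough sketch that queries the existing speculative-store-bypass setting through prctl(). It is Linux-only, the constants are copied from <linux/prctl.h>, and the helper name is hypothetical. No split-lock prctl option exists; this only shows the existing mechanism that such an option would resemble.

```python
import ctypes

# Values from <linux/prctl.h>; the helper name below is illustrative only.
PR_GET_SPECULATION_CTRL = 52
PR_SPEC_STORE_BYPASS = 0


def speculation_store_bypass_state():
    """Return the PR_SPEC_* bitmask for the calling process, or None if
    prctl() is unavailable or the kernel rejects the query."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError, TypeError):
        return None  # not a Linux libc, so nothing to query
    prctl.restype = ctypes.c_int
    result = prctl(
        ctypes.c_int(PR_GET_SPECULATION_CTRL),
        ctypes.c_ulong(PR_SPEC_STORE_BYPASS),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if result < 0:
        return None  # e.g. old kernel or unsupported architecture
    return result


print(speculation_store_bypass_state())
```

The setting applies to the calling process and is inherited across fork(), so a small wrapper can set it and then exec the real program.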

The search for the correct amount of split-lock misery

Posted Oct 29, 2022 20:06 UTC (Sat) by nybble41 (subscriber, #55106) [Link] (1 responses)

prctl() is for things which affect the operation of the process itself. This controls whether a program making use of split locks can negatively impact the performance of the entire *system*, which puts it in a category similar to real-time priorities. As such, a privileged sysctl knob is a reasonable solution. A cgroup option could also work, and would allow access to split locks to be more precisely targeted, but it would need to require some form of elevated privilege to enable it for a regular user account.

The search for the correct amount of split-lock misery

Posted Oct 29, 2022 21:11 UTC (Sat) by joib (subscriber, #8541) [Link]

Not sure cgroups are appropriate either, since the kernel cannot isolate the impact of the split-locking only to the members of the cgroup. I guess you could make some mechanism like "at most N split locks per second for this cgroup, after that split lock usage will be throttled", though I'm not sure if it would actually be useful for any practical situation. The sysctl knob sounds much simpler and should cover most use cases.
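
Just to make the "at most N per second" idea concrete, a toy sketch of a fixed-window budget. It is plain userspace Python with a made-up class name, not kernel code and not anything that has been proposed; the caller passes in the current time so the behavior is easy to reason about.

```python
class SplitLockBudget:
    """Toy fixed-window limiter: at most `max_per_second` events are
    allowed in each one-second window; the rest would be throttled."""

    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.window_start = None
        self.count = 0

    def allow(self, now):
        """Record one event at time `now` (seconds) and return True if it
        is within budget, False if it should be throttled."""
        if self.window_start is None or now - self.window_start >= 1.0:
            # Start a fresh one-second window.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.max_per_second


budget = SplitLockBudget(max_per_second=2)
for t in (0.0, 0.1, 0.2, 1.0):
    print(t, budget.allow(t))
```

Even with such a budget, every split lock that is allowed still stalls the other processors, so this would cap the damage rather than contain it.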

The search for the correct amount of split-lock misery

Posted Oct 28, 2022 22:20 UTC (Fri) by marcH (subscriber, #57642) [Link]

> The hope was that the warn mode would be sufficient to alert users to the problem, and lead to software being fixed, while [...]. By the time the 5.19 development cycle came around earlier this year, though, it seemed that little progress toward the removal of split lock operations had been made, and the denial-of-service problem was as present as before.

Wait, how does anyone know the warning had no effect? When we spot a warning and try to address it we don't tell anyone and AFAIK the kernel has no telemetry feature yet.