Another push for sched_ext

By Jonathan Corbet
May 9, 2024

The extensible scheduler class (or "sched_ext") is a comprehensive framework that enables the implementation of CPU schedulers as a set of BPF programs that can be loaded at run time. Despite having attracted a fair amount of interest from the development community, sched_ext has run into considerable opposition and seems far from acceptance into the mainline. The posting by Tejun Heo of a new version of the sched_ext series at the beginning of May has restarted this long-running discussion, but it is not clear what the end result will be.

As a quick refresher: sched_ext allows the creation of BPF programs that handle almost every aspect of the scheduling problem; these programs can be loaded (and unloaded) at run time. Sched_ext is designed to safely fall back to the completely fair scheduler should something go wrong (if a process fails to be run within a time limit, for example). It has been used to create a number of special-purpose schedulers, often with impressive performance benefits for the intended workload. See this 2023 article for a more detailed overview of this work.
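The fallback behavior can be pictured with a toy model. The Python sketch below is not sched_ext code and uses no kernel or BPF API; `Task`, `buggy_pick`, `fair_pick`, `run`, and the tick-based timeout are all hypothetical names chosen for illustration. It only shows the general idea of a watchdog that ejects a misbehaving custom policy once some task has waited too long, then hands control back to a built-in policy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    last_ran: int = 0  # tick at which the task last got the CPU


def buggy_pick(tasks, tick):
    """Hypothetical custom policy with a bug: it always runs the first task."""
    return tasks[0]


def fair_pick(tasks):
    """Stand-in for the built-in scheduler: run whoever has waited longest."""
    return min(tasks, key=lambda t: t.last_ran)


def run(tasks, custom_pick, timeout_ticks, total_ticks):
    """Use the custom policy until a task is starved past the limit, then
    fall back to the built-in policy for all remaining ticks.

    Returns (tick at which the fallback happened, or None; list of task
    names in the order they ran).
    """
    fallback_at = None
    schedule = []
    for tick in range(1, total_ticks + 1):
        # Watchdog: has any task gone unscheduled for longer than the limit?
        if fallback_at is None and any(
            tick - t.last_ran > timeout_ticks for t in tasks
        ):
            fallback_at = tick
        if fallback_at is None:
            task = custom_pick(tasks, tick)
        else:
            task = fair_pick(tasks)
        task.last_ran = tick
        schedule.append(task.name)
    return fallback_at, schedule
```

Running this with three tasks, `buggy_pick`, and a three-tick limit trips the watchdog at tick 4, after which the starved tasks start getting the CPU again. The real mechanism is considerably more involved; the sketch only captures the "time limit, then fall back" shape described above.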

Heo lists a number of changes that have been made to sched_ext since the previous version was posted in November. For the most part, these appear to be adjustments to the BPF API to make the writing of schedulers easier. There is also a new shutdown mechanism that, among other things, disables the BPF scheduler during power-management events like system suspend. There is now support for CPU-frequency scaling, and some debugging interfaces have been added to make developing schedulers easier. The core design of sched_ext appears to have stabilized, though.

Increasing interest

Even before getting to the changes, though, Heo called attention to the increasing interest in sched_ext that is being shown across the community and beyond. Valve is planning to use sched_ext for better game scheduling on the Steam Deck. Ubuntu is considering shipping it in the 24.10 release. Meta and Google are increasing their use of it in their production fleets. There is also evidently interest in using it in ChromeOS, and Oculus is looking at it as well. Heo concludes that section with:

Given that there already is substantial adoption which continues to grow and sched_ext doesn't affect the built-in schedulers or the rest of kernel in an invasive manner, I believe it's reasonable to consider sched_ext for inclusion.

Whether that inclusion will happen remains an open question, though. The posting of version 4 of the patch set in July 2023 led to a slow-burning discussion on the merits of this development. Scheduler maintainer Peter Zijlstra rejected the patches outright, saying:

There is not a single doubt in my mind that if I were to merge this, there will be Enterprise software out there that will mandate its own BPF sched thing, or else it won't work.
They will not care, they will not contribute, they might even pull a RedHat and only share the code to customers.

He added that he saw no value in merging the code, and dropped out of the conversation. Mel Gorman also expressed his opposition to merging sched_ext, echoing Zijlstra's concern that enterprise software would start requiring the use of special-purpose schedulers. He later added that, in his opinion (one shared with Zijlstra), sched_ext would work actively against the improvement of the current scheduler:

I generally worry that certain things may not have existed in the shipped scheduler if plugging was an option including EAS, throttling control, schedutil integration, big.Little, adapting to chiplets and picking preferred SMT siblings for turbo boost. In each case, integrating support was time consuming painful and a pluggable scheduler would have been a relatively easy out that would ultimately cost us if it was never properly integrated.

Heo, naturally, disagreed with a lot of the concerns that had been raised. There are, he said, scheduling problems that cannot be addressed with tweaks to the current scheduler, especially in "hyperscaling" environments like Meta. He disagreed that sched_ext would impose a maintenance burden, arguing that the intrusion of BPF into other parts of the kernel has not had that result. Making it possible for users to do something new is beneficial, even if there will inevitably be "stupid cases" resulting from how some choose to use the new feature. In summary, he said, opponents are focused on the potential (and, in his opinion, overstated) costs of sched_ext without taking into account the benefits it would bring.

Restarting the conversation

That message, in October, was the end of the conversation at the time. Heo is clearly hoping for a better result this time around, but Zijlstra's response was not encouraging:

I fundamentally believe the approach to be detrimental to the scheduler eco-system. Witness the metric ton of toy schedulers written for it, that's all effort not put into improving the existing code.

He said that he would not accept any part of this patch series until "the cgroup situation" has been resolved. That "situation" is a performance problem that affects certain workloads when a number of control groups are in use. Rik van Riel had put together a patch series to address this problem in 2019, but it never reached the point of being merged; Zijlstra seems to be insisting that this work be completed before sched_ext can be considered, and he gave little encouragement that it would be more favorably considered even afterward.

Heo expressed a willingness (albeit reluctantly) to work on the control-group problem if it would clear the way for sched_ext. He strongly disagreed with Zijlstra's characterization of sched_ext schedulers as "toy schedulers" and the claim that working on sched_ext will take effort away from the mainline scheduler, though. There is, he said, no perfect CPU scheduler, so the mainline scheduler has to settle for being good enough for all users. That makes it almost impossible to experiment with "radical ideas", and severely limits the pool of people who can work on the scheduler. Much of the energy that goes into sched_ext schedulers, he said, is otherwise unavailable for scheduler development at all.

There is, he said, value in some of those radical ideas:

Yet, the many different ways that even simple schedulers can demonstrates sometimes significant behavior and performance benefits for specific workloads suggest that there are a lot of low hanging fruits in the area. Low hanging fruits that we can't easily reach from our current local optimum. A single implementation which has to satisfy all users all the time is unlikely to be an effective vehicle for mapping out such landscape.

Igalia developer Changwoo Min, who is working with Valve on gaming-oriented scheduling, supported Heo's argument, saying that: "The successful implementation of sched_ext enriches the scheduler community with fresh insights, ideas, and code". That, as of this writing, is where this conversation stands.

What next?

Sched_ext is on the schedule for the BPF track of the Linux Storage, Filesystem, Memory-Management, and BPF Summit, which begins on May 13. That discussion will cover the future development of sched_ext but, most likely, will not be able to address the question of whether this work should be merged at all. That discussion could continue, on the mailing lists and elsewhere, for some time yet.

Sometimes, when a significant kernel development stalls in this way, distributors that see value in it will ship the patches anyway, as Ubuntu, Valve, and ChromeOS are considering doing. While shipping out-of-tree code is often discouraged, it can also serve to demonstrate interest in a feature and flush out any early problems that result from its inclusion. If things go well, this practice can strengthen the argument for merging the code into the mainline, albeit with the ever-present possibility of changes that create pain for the early adopters.

Whether that will be the path taken for sched_ext remains to be seen. What is certain is that this work has attracted a lot of interest and is unlikely to go away anytime soon. Sched_ext has the potential to enable a new level of creativity in scheduler development, even if it remains out of the mainline — but that potential will be stronger if it does end up being merged. Significant scheduler patches are not merged quickly even when they are uncontroversial; this one will be slower than most if it is accepted at all.

Another push for sched_ext

Posted May 9, 2024 15:36 UTC (Thu) by flussence (guest, #85566) [Link] (6 responses)

> > I fundamentally believe the approach to be detrimental to the scheduler eco-system. Witness the metric ton of toy schedulers written for it, that's all effort not put into improving the existing code.

No, no. There isn't a “scheduler ecosystem”. There is a scheduler *monoculture*, and for twenty years now people have interpreted that as damage and routed around it. The few of those who've dared to negotiate with the scheduler tyrant directly in the past have burned out and left kernel development forever - which is why it took a witheringly embarrassing fifteen years to properly address the Wasted Cores paper with more than a band-aid.

The only two choices remaining here are either accept the reality of the situation and address the root cause (in old hacker parlance, "maintainer needs face time with a LART"), or continue to be obstinate about it and encourage downstream hacks to proliferate - both individual and corporate. The corporations don't need royal consent to F up the kernels they ship on their devices, they've *been* doing it. And because they're butchering their device-specific kernels with hardwired hacks, for want of a sane pluggable mechanism, *users* are the only ones that get screwed because they can't turn it off.

Another push for sched_ext

Posted May 9, 2024 15:48 UTC (Thu) by intelfx (subscriber, #130118) [Link] (2 responses)

> in old hacker parlance, "maintainer needs face time with a LART"

Could you please translate that? I’m not old hacker enough and that doesn’t seem to parse (or google).

Another push for sched_ext

Posted May 9, 2024 16:03 UTC (Thu) by schessman (subscriber, #82966) [Link]

LART : Luser Attitude Readjustment Tool. See Clue-by-four.

Another push for sched_ext

Posted May 9, 2024 19:56 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> that doesn’t seem to parse (or google).

FWIW, DDG gave the "Luser" expansion in its info box. My Google result has Wikipedia/Wiktionary as the first result (same thing).

Another push for sched_ext

Posted May 15, 2024 21:05 UTC (Wed) by Conan_Kudo (subscriber, #103240) [Link] (2 responses)

I personally would love to see sched_ext land upstream for the simple reason that nobody upstream cares about desktop Linux and regular user needs for scheduling.

Having a way to build optimized schedulers outside of the kernel to bypass CFS and EEVDF and their flawed behaviors for regular users gives us an actual chance to deal with long-running responsiveness and performance-under-load issues that desktop Linux users observe that otherwise drive them to use alternative kernels that patch in other schedulers (like Con Kolivas' MuQSS and its predecessors).

Another push for sched_ext

Posted May 15, 2024 22:06 UTC (Wed) by jordalgo (guest, #170580) [Link]

Totally agree. It would be great to be able to experiment with new schedulers quickly.

Another push for sched_ext

Posted May 16, 2024 17:00 UTC (Thu) by Manifault (guest, #155796) [Link]

Improving the desktop experience was definitely one of the biggest targets we had in mind for potential improvements that could result from sched_ext. See this for an example of efforts to improve interactivity: https://drive.google.com/file/d/1fyHt9BYGha6apl7HAkibwpy5.... The techniques used there could also be added to EEVDF, which I'd be super happy to see.

Another push for sched_ext

Posted May 9, 2024 18:30 UTC (Thu) by summentier (guest, #100638) [Link] (3 responses)

> [Peter Zijlstra] said that he would not accept any part of this patch series until "the cgroup situation" has been resolved

I understand that it has been common practice to strongarm^Wencourage patch submitters into cleaning up about as much mildly related kernel code as they are willing to in order to get their patches merged.

This on the other hand seems to make just as much sense as the argument that Ukraine aid cannot be passed until the US-Mexican border is completely secure.

Or am I missing something here? Unfortunately, the maintainer does not really elaborate in his post ...

Another push for sched_ext

Posted May 9, 2024 19:13 UTC (Thu) by summentier (guest, #100638) [Link] (2 responses)

Umm, okay.

I was trying to be funny there, but upon rereading this, I realized I was WAY out of line. This is the kernel, nobody's life is on the line. Apologies to Mr Zijlstra for drawing such a connection.

Unfortunately, there is no way to delete a comment. Granted, it would have been better to catch this before posting. Mr Corbet, feel free to delete this.

Another push for sched_ext

Posted May 12, 2024 14:48 UTC (Sun) by marcH (subscriber, #57642) [Link]

Ignoring the "out of line" analogy, this seems like a good question. Hopefully some people can see past the former and help answer the latter.

Another push for sched_ext

Posted May 28, 2024 2:44 UTC (Tue) by DanilaBerezin (guest, #168271) [Link]

Okay what happened here?

This is probably futile, but...

Posted May 9, 2024 23:32 UTC (Thu) by mattdm (subscriber, #18) [Link] (4 responses)

The "pull a RedHat" comment is a non sequitur. The final, best, accepted versions of every improvement or patch Red Hat makes or applies to RHEL are distributed to everyone via CentOS Stream. Interim updates may go to customers only (usually, backported patches instead of an upstream with a new version which may incorporate slightly different code), but the actual code is shared with the world.

That situation _still_ may not make everyone happy, but it'd at least be nice if people were mad about the actual thing, rather than something very different.

If Red Hat makes improvements to the scheduler, via the mainline scheduler or BPF, they'll be shared back in a way that ultimately benefits everyone, and which emphasizes upstream collaboration -- _even if_ the BPF approach would technically allow us to not do so. (If this policy changes in the future, I'll go on record as saying that I'll be one of the people angry about it.)

This is probably futile, but...

Posted May 10, 2024 11:11 UTC (Fri) by hkario (subscriber, #94864) [Link] (3 responses)

Red Hat has a long-standing policy of first merging the patches upstream, only then shipping them in RHEL: https://www.redhat.com/en/blog/what-open-source-upstream we call it "upstream first"

(there are exceptions, of course, like security fixes, or unresponsive upstreams, etc. but the policy is still very much to get every patch shipped merged upstream)

disclaimer: I work at Red Hat

This is probably futile, but...

Posted May 10, 2024 11:24 UTC (Fri) by sdalley (subscriber, #18550) [Link] (2 responses)

I like these kinds of disclaimers. They show that you are very likely to know what you're talking about, and can be treated with rather greater confidence than those who smear from the outside.

This is probably futile, but...

Posted May 11, 2024 16:51 UTC (Sat) by mattdm (subscriber, #18) [Link] (1 responses)

For the record, I too work at Red Hat. So, same disclaimer.

This is probably futile, but...

Posted May 13, 2024 9:03 UTC (Mon) by sdalley (subscriber, #18550) [Link]

Thank you for speaking up!

Another push for sched_ext

Posted May 10, 2024 1:26 UTC (Fri) by hmanning77 (subscriber, #160992) [Link] (7 responses)

According to the BPF Licensing page of the kernel documentation, some BPF program types (currently Linux Security Modules and TCP Congestion Control) are restricted to loading programs licensed under the GPL. If the concern is that vendors will develop custom schedulers without contributing improvements back upstream, could the GPL requirement not be extended to CPU scheduler BPF programs?

Another push for sched_ext

Posted May 10, 2024 2:14 UTC (Fri) by Manifault (guest, #155796) [Link] (6 responses)

That requirement does apply to all BPF sched_ext schedulers. The verifier will fail to load any BPF sched_ext scheduler program that is not licensed with GPL v2.
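To illustrate the kind of check being described, here is a toy Python sketch. It is not the verifier's code: `may_load` and `GPL_COMPATIBLE` are hypothetical names, and the list of accepted strings is an assumption based on what the kernel's `license_is_gpl_compatible()` helper appears to accept. The point it makes is that the gate compares the license string a program declares against a short list of GPL-compatible values, so dual-licensed strings can pass while a plain "BSD" or "MIT" declaration would not.

```python
# Assumption: these are the license strings the kernel treats as
# GPL-compatible. Check include/linux/license.h before relying on this list.
GPL_COMPATIBLE = frozenset({
    "GPL",
    "GPL v2",
    "GPL and additional rights",
    "Dual BSD/GPL",
    "Dual MIT/GPL",
    "Dual MPL/GPL",
})


def may_load(declared_license: str, needs_gpl: bool) -> bool:
    """Toy gate: a program of a GPL-only type must declare a compatible license.

    declared_license is the string the program itself declares; needs_gpl says
    whether the program type is one that is restricted to GPL-compatible code.
    """
    return (not needs_gpl) or declared_license in GPL_COMPATIBLE
```

Under these assumptions, `may_load("Dual BSD/GPL", True)` passes and `may_load("BSD", True)` does not. The check is purely a string comparison on what the program claims about itself; it says nothing about whether that claim is legally accurate.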

Another push for sched_ext

Posted May 10, 2024 14:08 UTC (Fri) by Wol (subscriber, #4433) [Link] (5 responses)

Isn't that in itself a breach of the GPL? Any version?

From v3, section 2

All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

So if the verifier refuses to load a BPF program at runtime, ANY BPF program, isn't it in breach of the GPL?

Cheers,
Wol

Another push for sched_ext

Posted May 10, 2024 14:42 UTC (Fri) by Manifault (guest, #155796) [Link] (4 responses)

I'm not a lawyer, but that seems like a... rather naive interpretation of the text. The GPL does not guarantee your right to always have your program run _successfully_ in any environment. It's guaranteeing your right to _use_ your program as much as you want without being sued by some entity claiming that it's their intellectual property. The verifier rejecting an unsafe (or improperly licensed) BPF program is no different than if the main kernel returns an error code for some system call that has invalid inputs. If the verifier considers a BPF program unsafe or unsuitable to load it, it's under no legal obligation to load it. If your GPLv2 program is broken, you can still use it, but the operating system is (obviously) under no obligation to run it for you.

Also, my comment said that the verifier will reject sched_ext programs that are _not_ licensed with GPLv2. If it rejects programs with a different license, they wouldn't get any protections from GPLv2 regardless.

Another push for sched_ext

Posted May 10, 2024 21:10 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

I think the real questions are:

1. Is it legal to make a modified version of the BPF verifier which does not reject non-GPL-licensed programs? It might violate 17 USC 1201 (and similar laws in other countries). Version 3 of the GPL explicitly repudiates that provision of copyright law, but the kernel is licensed under GPLv2. OTOH, the GPL is much more permissive than a "typical" (proprietary) copyright license, so I don't know what courts would say in that situation. Certainly the spirit of the license is that you can change whatever you want, so long as you distribute those changes (if at all) under the GPL. I also believe this is the intent of most if not all of the people who have actually contributed source code to the kernel.
2. Is it legal to make a BPF sched_ext program that is not GPL? In most countries, this will turn on some kind of substantial similarity analysis. In the US, some courts have adopted a software-specific test called the "abstraction-filtration-comparison test," which complicates things, but the short version is that there must be portions of the allegedly infringing work (the BPF program) which are both eligible for copyright protection and also present in the original work (the kernel), and those portions must be substantial enough to infer that some kind of "copying" happened. This is not necessarily limited to literal copying of source code, but it can't be something as abstract as an entire algorithm, either (nobody owns quicksort, for example). The AFC test is basically just a more methodical and specific way of doing that analysis for software. Probably other countries won't do the exact same analysis, but it is hard to imagine them doing something wildly different.

Another push for sched_ext

Posted May 11, 2024 1:07 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (1 responses)

> Is it legal to make a modified version of the BPF verifier which does not reject non-GPL-licensed programs?

This feels more like a DMCA kind of thing to me.

Another push for sched_ext

Posted May 11, 2024 8:39 UTC (Sat) by Wol (subscriber, #4433) [Link]

There's another point here too.

Are BPF programs distributed as source? I thought the kernel ran them through a jit compiler?

Because we have the problem there that the GPL is not a particularly suitable licence for stuff distributed as source, about the only Freedom you're missing is the ability to distribute - the mere fact it's source gives you everything else.

Oh - and by the way, does it REALLY refuse to run anything that's not GPL2? What about BSD? MIT? MPL2? PD etc?

Cheers,
Wol

Another push for sched_ext

Posted May 11, 2024 6:54 UTC (Sat) by mb (subscriber, #50428) [Link]

>Is it legal to make a modified version of the BPF verifier which does not reject non-GPL-licensed programs?

Yes, of course. That does not change the licensing situation at all.

You can:
- Remove this check from the verifier.
- Use the modified verifier.
- Distribute the resulting verifier code.
- Write a non-GPL BPF program and use it locally without distributing.

It may be a license violation (courts have to decide), if:
- You distribute your non-GPL BPF program.

It might also be totally fine to distribute the non-GPL BPF program along with the modified verifier, if a court rules that the BPF program is not a derived work.

Another push for sched_ext

Posted May 11, 2024 23:58 UTC (Sat) by marcH (subscriber, #57642) [Link]

> Witness the metric ton of toy schedulers written for it, that's all effort not put into improving the existing code.

I would expect most of the work there to be not so much about "code" but much more about experimenting with and benchmarking a very wide range of workloads. It sounds like the real question here is not really about "improving existing code" but rather "Can a unique scheduler do everything well enough?" It feels like a very difficult question but it's probably even harder to answer with less room to experiment...

> While shipping out-of-tree code is often discouraged, it can also serve to demonstrate interest in a feature and flush out any early problems that result from its inclusion. If things go well, this practice can strengthen the argument for merging the code into the mainline, albeit with the ever-present possibility of changes that create pain for the early adopters.

Absolutely: trial and error. Sometimes that requires actually shipping it to gather measurements and experience on a very large scale.

I heard some guys even send rockets in space nowadays with the expectation that they will fail.

Another push for sched_ext

Posted May 12, 2024 15:15 UTC (Sun) by marcH (subscriber, #57642) [Link] (10 responses)

> There is not a single doubt in my mind that if I were to merge this, there will be Enterprise software out there that will mandate its own BPF sched thing, or else it won't work.

And? What's new?

I bet the vast majority of CPU cycles "scheduled" by Linux already involve large amounts of closed-source software. Even when most web browsers (the OS on top of the OS) are now "open-core", (1) they still include a fair amount of closed-source code, (2) the JavaScript they run is complex and minified, and (3) WASM is probably closed-source most of the time.

In other words, it's never been possible to reproduce and test "Enterprise" or any other real workload out of the box. If you want help from the maintainer and community, then you've always had to simplify, open and share your workload first. If no one can understand what you do then you're on your own, good bye. That's always been the deal. Simple.

Same logic with source code. Products almost always ship with custom stable branches with various backports and out-of-tree code. Even Linux distributions have always done this. So to engage with the community and get "free" support, you always had to switch to the latest commit on the main branch and to the maintainer's .config first (minor exaggeration to get the point across).

So I really don't see why a different logic would suddenly apply to custom BPF schedulers. If it's private then you're on your own as usual. Same thing if you share your BPF scheduler but maintainers think it's cr*p, as in "doctor, it hurts when I do this..." The answer to that question has never changed.

BTW what are the open-source test suites and workloads available for the scheduler? I'm surprised none was mentioned, it seems like a key element in that discussion.

Another push for sched_ext

Posted May 12, 2024 20:17 UTC (Sun) by Wol (subscriber, #4433) [Link] (9 responses)

> maintainers think it's cr*p, as in "doctor, it hurts when I do this..." The answer to that question has never changed.

The problem comes when the patient says "doctor it hurts when I do ..." something that's necessary for normal life, like go for a walk. That regularly lands my grand-daughter in considerable pain. It could easily land my daughter temporarily paralysed on the floor.

And this is the problem. I get the impression there are quite a few important people in the Linux community whose attitude seems to be "the computer is there to run the OS. Who cares about the users" ... this discussion seems to have added another one to the list ...

"BOFH, my computer crashes every time I run our production software" - "Well, don't run your production software, then ..."

Cheers,
Wol

Another push for sched_ext

Posted May 12, 2024 22:51 UTC (Sun) by marcH (subscriber, #57642) [Link] (8 responses)

More like:

- ... crashes when I run my production software.
- Interesting! What is that software, what does it do?
- Can't tell you.
- .....

Or like:
- Here is the list of all the stupid things the documentation told me not to do and that I'm doing anyway with my custom BPF scheduler...
- Good bye!

Another push for sched_ext

Posted May 13, 2024 9:43 UTC (Mon) by Wol (subscriber, #4433) [Link] (7 responses)

> Or like:

"We're running this commercial software on stock <distro of choice>".

Like my daughter / grand-daughter going for a walk.

What then?

Cheers,
Wol

Another push for sched_ext

Posted May 13, 2024 14:02 UTC (Mon) by pizza (subscriber, #46) [Link] (6 responses)

> What then?

"Contact <commercial software vendor> for support."

(And, more often than not, the <vendor> will say "you're not using the supported OS/hardware we specified, goodbye.")

Another push for sched_ext

Posted May 13, 2024 15:15 UTC (Mon) by Wol (subscriber, #4433) [Link] (5 responses)

Are you sure?

I'm running stock Excel on stock Windows. Response times are absolute shite - to the point that we are in danger of it taking longer to run than the time available.

(Crap choice of software, I know. Not my choice.)

The most likely response from the vendors I know of is "ask the community".

But the point is, you're all coming up with POSSIBLE excuses for the vendor. What happens when the vendor's recommended, supported environment is "not fit for the purpose intended" to quote UK legalese? ESPECIALLY if said environment contains obvious flaws (not necessarily the vendor's fault) that would help you massively if they were fixed.

That was the complaint with ext4 years ago. That's the complaint with sched_ext now. That's the complaint with Rust. There are people who are actively obstructing attempts to improve Linux, because all they can see is their personal downside.

Are they Luddites? I don't know, I sincerely hope not. After all, the true Luddites could see the benefits of the technology they destroyed - that's why they destroyed it, because they could see the benefits would go to others, not them.

Cheers,
Wol

Another push for sched_ext

Posted May 13, 2024 16:49 UTC (Mon) by pizza (subscriber, #46) [Link] (4 responses)

> The most likely response from the vendors I know of is "ask the community".

The fact that the vendors you know are universally crappy doesn't change whose obligation it is to provide support.

Again, $user has a *commercial* relationship with $vendor. $user has no such relationship with "the community", which made $user zero promises, received zero compensation, and thus has precisely zero legal or moral obligation to give $user the time of day, much less provide support for a product they had nothing to do with.

Another push for sched_ext

Posted May 13, 2024 17:08 UTC (Mon) by mb (subscriber, #50428) [Link] (3 responses)

Yes. All correct.
Yet, the users will spam the community mailing lists, because that's the next logical step for them, if the vendor refuses support.

Why should we merge a feature that doesn't benefit the community?
If sched_ext benefits the community, I'm all for it. If not, why can't the exceptional use cases just ship a patched kernel?

Another push for sched_ext

Posted May 13, 2024 17:30 UTC (Mon) by pizza (subscriber, #46) [Link]

> Yet, the users will spam the community mailing lists, because that's the next logical step for them, if the vendor refuses support.

Yep. And that's what $vendors are counting on as it lets them save money on knowledgeable support staff.

(Mind you, user support requests to vendors are not necessarily reasonable. After all, you wouldn't expect a company that makes hammers to "support" a customer complaining that the structure they are building keeps falling apart.)

> Why should we merge a feature that doesn't benefit the community?

Every feature benefits someone. Unfortunately every feature also brings along costs that rarely fall onto the same someones that are reaping the benefits.

Those costs can be short term ("performance regression under every other workload") or longer term (technical debt, combinatorial complexity, security vulnerabilities with cutesy names, etc.)

Another push for sched_ext

Posted May 13, 2024 22:02 UTC (Mon) by marcH (subscriber, #57642) [Link] (1 responses)

> Yet, the users will spam the community mailing lists, because that's the next logical step for them, if the vendor refuses support.

And? What's new?

> If sched_ext benefits the community, I'm all for it. If not, why can't the exceptional use cases just ship a patched kernel?

Agreed: if sched_ext turns out to be used ONLY by closed-source vendors then it shouldn't be merged. But that seems rather unlikely, doesn't it?

Another push for sched_ext

Posted May 25, 2024 0:16 UTC (Sat) by mrugiero (guest, #153040) [Link]

Maybe the same policy as with DRM drivers should be applied here. Give me some useful BPF programs (not just toy schedulers) that I may be interested in running, and then I may merge the code needed to run them. That's the safe bet: if you require that, you have already eliminated the possibility of those magical cool schedulers never reaching the wider public.

Another push for sched_ext

Posted May 13, 2024 16:17 UTC (Mon) by riking (subscriber, #95706) [Link]

I've seen a few people express the opinion elsewhere that the success of the EEVDF scheduler has massively reduced the impetus for sched_ext -- that is, they wanted sched_ext in order to implement basically EEVDF.

Another push for sched_ext

Posted May 16, 2024 9:14 UTC (Thu) by rwmj (subscriber, #5474) [Link] (2 responses)

I have a couple of technical questions which I don't think were answered in the article or comments so far ...

(1) Is BPF actually fast enough here? I mean, surely this code is called every time you do schedule(), which would be very frequent, so you'd want that to be as fast as it can be. BPF is JITted but does it compete with AOT-compiled C code?

(2) Related to (1), why wouldn't a pluggable system of regular kernel modules work for this use case? From a quick scan of kernel/sched in the sources it seems like the current schedulers are not modules.

Another push for sched_ext

Posted May 16, 2024 13:27 UTC (Thu) by corbet (editor, #1) [Link] (1 responses)

I'm not the expert in this area, but can try to answer the questions...

As I understand it, BPF is indeed fast enough to use in this way. It's all native code by the time it runs.

Using regular modules would forgo many of the safety features of BPF, making it much easier to hose the system. That would be undesirable in a system meant to encourage experimentation.

Another push for sched_ext

Posted May 17, 2024 14:57 UTC (Fri) by daroc (editor, #160859) [Link]

I'll also note that many of the sched_ext folks I've spoken with have been hopeful that people will use sched_ext for experimentation, and then contribute those algorithms as actual schedulers later, avoiding any overhead left from the JIT. I attended a talk at LSFMM (article forthcoming) where the speaker mentioned that BPF now has almost all the features needed to implement EEVDF; the main remaining item is the ability to safely call kfuncs while holding a lock. Once that is in place, they hope to see people use sched_ext to experiment directly on EEVDF.