
BFS vs. mainline scheduler benchmarks and measurements

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 15:07 UTC (Mon) by paragw (guest, #45306)
Parent article: BFS vs. mainline scheduler benchmarks and measurements

So what happened to pluggable schedulers? I recall there was plugsched by Peter Williams up until 2.6.22 but not sure where it went from there.

It sounds like one scheduler fits all approach may not be the right one - or am I mistaken and CFS is doing well for all desktop, server and in-between workloads? If it is then it makes pluggable schedulers less attractive.

However it still would be good to be able to do sched=server, sched=desktop, sched=netbook (lol!) type things. I think the scheduler code will also be definitely simplified if it is given a definite objective as opposed to the dance it has to do right now making sure everyone is happy. We could even do sillier things on the desktop by feeding the desktop scheduler a list of processes and its descendants to award more interactivity to - no matter what happens in the background, it can put the memory hogs and CPU hogs to rest and allow me to click on the windows etc.

/me goes digging plugsched on google.



BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 18:39 UTC (Mon) by niner (guest, #26151) [Link] (20 responses)

"We could even do sillier things on the desktop by feeding the desktop scheduler a list of
processes and its descendants to award more interactivity to - no matter what happens
in the background"

But you can do that already! In fact, you ought to have been able to do that for
decades. What you want is just the simple nice and renice commands. Works for any
list of processes you want and their descendants. No need to hardcode names into a
scheduler.

I keep wondering why people seem to have completely forgotten about nice values and
instead expect the scheduler to guess what are the important processes for them, when
they can simply tell it.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 20:37 UTC (Mon) by roskegg (subscriber, #105) [Link] (9 responses)

Because nice and renice don't affect interactivity issues as much as you would think they would.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 21:35 UTC (Mon) by mingo (subscriber, #31122) [Link] (8 responses)

Because nice and renice don't affect interactivity issues as much as you would think they would.

What do you mean?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 5:12 UTC (Tue) by realnc (guest, #60393) [Link] (7 responses)

Meaning that even if you nice 19 every other process, mplayer will still battle for CPU time with the compositor and drop frames and skip sound as soon as a composite effect kicks in.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 6:10 UTC (Tue) by mingo (subscriber, #31122) [Link] (6 responses)

Does it behave in an anomalous way for you? What would you expect it to do and what does it do for you currently?

I.e. the default behavior is that if both compiz and mplayer are running (and both are a single thread) then they should get 50%/50% of a single CPU - or be nicely on separate CPUs on dual-core. (with an added twist that Xorg generally tends to get some amount of CPU time as well when compiz is active - plus whatever other app that is generating X output.)

If that's not enough then nice levels come into play.

You can indeed renice up - but you can also renice down - so you can set mplayer to nice -5 for example.

Nice levels work according to a very simple rule: each nice level scales a task's weight by roughly 1.25x. So if you set mplayer to nice -1, it will get about 55% of CPU time and compiz gets 45%. Yet another nice level and it's roughly 60% versus 40%. It compounds with every nice level - so nice -5 should get you 75%/25%, and nice -10 gives you 90% CPU time and 10% CPU time for compiz.

More tasks can modify this behavior - but this is the general principle. If this does not work like that for you, please report it as a scheduler bug on lkml.

Note, you can set negative nice levels as an ordinary user as well. There's an rlimit for it (and PAM support): see the 'nice' attribute in /etc/security/limits.conf - you can set it per user.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 7:31 UTC (Tue) by epa (subscriber, #39769) [Link] (4 responses)

The thing is, nice levels mostly affect total throughput, but what needs improvement is latency. A 50-50 split between two tasks sounds ideal, but that only makes sense if they are both CPU-bound tasks. In the case of compiz and mplayer, the first spends most of its time blocking on user input, and the second doesn't need much CPU time (probably a lot less than 50% on a modern system) but it does need to respond quickly and not be blocked for too long. 'nice' doesn't really address these issues.

(Also I think that 'nice' won't help you if one process starts thrashing the memory and swapping; another process, even if nominally at a lower niceness level, will be heavily slowed down.)

When nice lets you specify desired maximum latencies, as well as just throughput, then it will be a suitable way to get good desktop performance.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 9:44 UTC (Tue) by mingo (subscriber, #31122) [Link] (3 responses)

The thing is, nice levels mostly affect total throughput, but what needs improvement is latency.

That's exactly what the upstream scheduler does. The upstream scheduler considers latency goals in a nice-level normalized way. See the wakeup_gran() function in kernel/sched_fair.c:


  static unsigned long
  wakeup_gran(struct sched_entity *curr, struct sched_entity *se)
  {
          unsigned long gran = sysctl_sched_wakeup_granularity;
          [...]
          if (unlikely(se->load.weight != NICE_0_LOAD))
                  gran = calc_delta_fair(gran, se);
          [...]
  }

See the calc_delta_fair() - that is the nice level normalizer. Plus-reniced tasks will get longer latencies - minus-reniced tasks will get shorter wakeup latencies.

If this does not work for you then that's a bug, please report it in that case.

Note that you can tune the basic kernel latency goals/deadlines via two dynamic sysctls: sched_wakeup_granularity_ns and sched_latency_ns. Lower those and you'll get a snappier desktop - at the expense of some throughput.

You can set these in /etc/sysctl.conf to make the settings permanent. (and please report it to us if a new setting improves some workload in a dramatic way - we constantly re-tune the upstream default as well, to make for a snappier desktop.)

(Note that for forced preemption of CPU-bound tasks HZ is a lower limit - but otherwise it's tunable in a fine-grained way. So, say, you will want to change from HZ=250 to HZ=1000 if you want to push the latency targets down to 1 millisecond.)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 10:22 UTC (Tue) by epa (subscriber, #39769) [Link] (2 responses)

Thanks for the info. I was still thinking of classic UNIX nice values. It would be even better if you could specify some units for the latency - Linux is not a hard real-time system but nonetheless users might want to say 'maximum latency 10ms for this process' as a best-effort goal and something to benchmark against. Do any distributions come with an appropriate set of nice values built in?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 11:58 UTC (Tue) by mingo (subscriber, #31122) [Link] (1 responses)

One difference is that nice levels are relative - that way "nice +5" makes relative sense from within a nice +10 workload. Latency values tend to be absolute. Relative makes more conceptual sense IMO - as workloads are fundamentally hierarchical and a sub-workload of some larger workload might not be aware of the larger entity it is running in.

Also, a practical complication is that there's not much of a culture of setting latencies and it would take years to build them into apps and to build awareness.

Also, latencies are hardware dependent and change with time. 100 msecs on an old box is very different from 100 msecs on a newer box.

Maybe for media apps it would make sense to specify some sort of deadline (a video app if it wants to display at fixed frequency, or an audio app if it knows its precise buffering hard limit) - but in practice these apps tend to not even know their precise latency target. For example the audio pathway could be buffered in the desktop environment, in the sound server and in the kernel too.

Nor would it solve much: most of the latencies that people notice and which cause skipping/dropped-frames etc. are bugs - they are unintended and need fixing.

Nevertheless this has come up before and could be done to a certain degree. I still hope that we can just make things behave by default, out of box, without any extra tweaking needed.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 12:41 UTC (Tue) by epa (subscriber, #39769) [Link]

I agree that relative niceness levels make the most sense in a batch processing environment or in a 'lightly interactive' environment such as a Unix shell, where it should respond quickly when you type 'ls', but there is no firm deadline.

I think they make a bit less sense for multimedia applications or even ordinary desktop software (where users nowadays expect smooth scrolling and animations). You are right that in the Unix world there isn't much culture of setting quantifiable targets for latency or CPU use; we are accustomed to mushy 'niceness' values, where setting a lower niceness somehow makes it go faster, but only the most greybearded of system administrators could tell you exactly how much.

One reason to specify a latency target in milliseconds is just to have something quantifiable. A lot of discussions on LKML and elsewhere about scheduling seem to suffer from a disconnect between one side running benchmarks such as kernel compiles, which give hard numbers but aren't typical of desktop usage, and another side who just talk in qualitative terms about how much faster it 'feels'.

I expect that if a 'max latency' option were added to the kernel and it did almost nothing at all to start with, it would still provide a framework for improvements to take place - a latency of 110ms when 100ms was requested could now be a quantifiable performance regression, and people could benchmark their kernel against a promised performance target rather than just trying to assess how it feels. (You yourself have provided such a latency benchmark - the 'load enormous JPEG in Firefox' test suite :-).)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 19:12 UTC (Tue) by realnc (guest, #60393) [Link]

Does it behave in an anomalous way for you? What would you expect it to do and what does it do for you currently?

It does behave anomalously. A simple example would be mplayer (or any other video player) or an OpenGL app "hanging" for a bit while I leave my mouse over the clock in the systray. This brings up details about the current time (what day it is, month, etc.) in a "bells and whistles" pop-up that doesn't just pop up out of the blue but slowly fades in using transparency. It is for the duration of this compositing effect (which actually doesn't even need that much CPU power) that mplayer stalls, barks and drops frames.

Now imagine how bad things can seem when virtually every action (opening menus, switching desktops, moving windows, etc.) results in frame skipping, sound stuttering, mouse pointer freezing, and so on. The applications perform well, that's not the problem. The problem is that due to the skips and lag, they *seem* to be sluggish. Not in a dramatic way, but still annoying. I was actually quite used to Linux behaving like that. But after applying the BFS patch, Linux joined the list of "smooth GUI" OSes (alongside OS X and MS Vista/7). That's how a desktop should feel. Frankly, I never quite suspected the kernel to be at fault here, but rather the applications themselves. But after seeing BFS solve all those problems, it seems the kernel can be at fault for such things.

The Android folks also confirmed that their devices ran much more fluidly and responsively after they loaded a custom firmware on them with a BFS-patched kernel. Folding users claim increased folding performance that doesn't interfere with their GUI anymore. This can't be coincidence.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 20:55 UTC (Mon) by paragw (guest, #45306) [Link] (9 responses)

"I keep wondering why people seem to have completely forgotten about nice values and instead expect the scheduler to guess what are the important processes for them, when they can simply tell it."

Did that ever work satisfactorily in practice though? If it did, why are people still cranking out different schedulers for desktops?

Thing is usability wise we have come further on a Linux desktop and I guess people are starting to expect the OS to do the right thing without them having to do work and make decisions. (About Xorg renice - what about its clients - every time I start a program, should I renice it if it is a Xorg client? If we instead had the desktop scheduler boost interactivity for all Xorg client programs - that makes it very easy for the user.)

And I was saying we can afford to do such silly things in the Desktop scheduler if the sole objective of the desktop scheduler was interactivity. If one scheduler was to do interactivity and throughput and what not - it quickly becomes complex and thus ineffective. If we had a pluggable scheduler for one thing we could simplify a lot of code and for another we can let people choose what fits their needs best.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 3:55 UTC (Tue) by fest3er (guest, #60379) [Link]

When I was going to do something that might get away from me (fork lots of
processes doing something), I would try to remember to 'nice --20 sh' in
another window. Because I would often have the boundary condition wrong
and would generate 20 000 to 40 000 processes running BTTW. That one
nice'd shell would save me almost every time. I've done this on my AT&T
UNIXPC, and systems running SysV/68, SysV/88, Irix, SunOS[345], Linux,
BeOS, BSD, and others.

There have been times in the past when nice'ing the X server improved
performance on my single-proc PIII-866; for 5-10 years now, only two or
more CPUs let X run smoothly.

There have been times in the past when nothing would smooth out the
choppiness of the EXT2/3 driver under heavy R/W load, whether I had two
PII-266's or a PIII-866. I solved that problem by switching to ReiserFS.

In recent years and on two completely different systems, I've noticed a
tendency for the kernel to do weird things with the PS/2 drivers (system
slows down, gets choppy, and even silently resets). This last time, I
pulled the plugs for the PS/2 ports and the system returned to normal.
(The chipset fan was overworking itself, so I had *some* clue where to
look.)

There can be many reasons why a system is 'choppy', and it's not always
the scheduler. Sometimes it's the interrupt handler dealing with some
device that's gone haywire. Sometimes it's the block layer not doing disk
I/O very nicely or a server process being very inefficient. Sometimes it's
an application that's gone braindead. And if a scheduler can be developed
that smooths out the choppiness in single- and dual-core systems, great!
Go for it! An older single-CPU system may never be fast, but it ought to
run smoothly under normal user operations.

The scheduler has gotten better over the past 15 years. And it will
continue to improve. But apps have to improve as well and not always
assume the 'system' will take care of everything.

As Ingo says, 8-core systems aren't mainline. But they will be. Perhaps
Con is looking to improve today's mainline systems, not tomorrow's. Is
this apples v. oranges? Or ain't it? Mayhap never the twain shall meet.
But all parties involved should strive to keep the discourse civil and
positive.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 6:39 UTC (Tue) by mingo (subscriber, #31122) [Link] (7 responses)

Did that ever work satisfactorily in practice though?

Yes. (See my other post about nice levels in this discussion.) If it does not it's a bug and needs to be reported to lkml.

There's also the /proc/sys/kernel/sched_latency_ns control in the upstream scheduler - that is global and if you set that to a very low value like 1 msec:

    echo 1000000 > /proc/sys/kernel/sched_latency_ns
you'll get very fine-grained scheduling. This tunable has been upstream for 7-8 kernel releases already.

If it did why are people still cranking out different scheduler for desktops?

Primarily because it's fun to do. Also, in no small part because it's much easier to do than to fix an existing scheduler (with all its millions of current users and workloads) :-)

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 12:30 UTC (Tue) by i3839 (guest, #31386) [Link] (6 responses)

Weird, I don't see /proc/sys/kernel/sched_latency_ns. After reading
the code it's clear it depends on CONFIG_SCHED_DEBUG, any reason for
that? It has nothing to do with debugging and the code saved is minimal.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 12:37 UTC (Tue) by mingo (subscriber, #31122) [Link] (5 responses)

Please send a patch - i think we could make it generally available, and also the other granularity options i think. CONFIG_SCHED_DEBUG defaults to y and most distros enable it (alongside CONFIG_LATENCYTOP).

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 9, 2009 8:42 UTC (Wed) by realnc (guest, #60393) [Link] (1 responses)

I've tried those tweaks. They don't really help much.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 9:53 UTC (Thu) by mingo (subscriber, #31122) [Link]

Thanks for testing it. It would be helpful (to keep reply latency low ;-) to move this to email and Cc: lkml.

You can test the latest upstream scheduler development tree via:

http://people.redhat.com/mingo/tip.git/README

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 9, 2009 11:50 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

I thought CONFIG_LATENCYTOP had horrible effects on the task_struct size and people were being encouraged to *disable* it as a result?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 9:56 UTC (Thu) by mingo (subscriber, #31122) [Link]

It shouldn't have too big a cost unless you are really RAM constrained (read: running a 32 MB system or so). So it's a nice tool if you want to see a general categorization of latency sources in your system.

latencytop is certainly useful enough so that several distributions enable it by default. It has size impact on task struct but otherwise the runtime cost should be near zero.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 10, 2009 19:35 UTC (Thu) by i3839 (guest, #31386) [Link]

I'll try to send a patch against tip later this week, not feeling too well at the moment.

pluggable schedulers vs. tunable schedulers

Posted Sep 8, 2009 9:51 UTC (Tue) by mingo (subscriber, #31122) [Link] (15 responses)

So what happened to pluggable schedulers?

In fact, wouldn't it be even cooler technically to have a scheduler that you could tune either for low-latency desktop workloads or for server-oriented throughput workloads? And this could all be done runtime, without rebooting the kernel.

Some easy runtime tunable parameter in /proc/sys/kernel/ that sets the expected preemption deadline of tasks. So on a server you could tune it to 100 msecs, on a desktop you could tune it to 5 msecs - all with the same scheduler.

No reboots needed, only a single scheduler needs to be maintained, only a single scheduler needs bugfixes - and improvements to both workloads will flow into the same scheduler codebase so server improvements will indirectly improve the desktop scheduler and vice versa.

Sounds like a nice idea, doesn't it?

pluggable schedulers vs. tunable schedulers

Posted Sep 8, 2009 13:59 UTC (Tue) by paragw (guest, #45306) [Link] (14 responses)

No reboots needed, only a single scheduler needs to be maintained, only a single scheduler needs bugfixes - and improvements to both workloads will flow into the same scheduler codebase so server improvements will indirectly improve the desktop scheduler and vice versa. Sounds like a nice idea, doesn't it? Well no, I don't think so. My line of thinking was that making one scheduler balance the arbitrary needs of multiple workloads leads to complexity and suboptimal behavior. If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing. And if you ask me I will take 5 different code modules that each do one simple thing rather than taking 1 code module that tries to achieve 5 different things at once. After all, if we can have multiple IO schedulers why cannot we have multiple selectable CPU schedulers? Are there technical limitations or complexity issues that make us not want to go to pluggable schedulers?

pluggable schedulers vs. tunable schedulers

Posted Sep 8, 2009 14:01 UTC (Tue) by paragw (guest, #45306) [Link] (13 responses)

[ Gaah - Here is a better looking copy of the above comment ]

No reboots needed, only a single scheduler needs to be maintained, only a single scheduler needs bugfixes - and improvements to both workloads will flow into the same scheduler codebase so server improvements will indirectly improve the desktop scheduler and vice versa. Sounds like a nice idea, doesn't it?

Well no, I don't think so. My line of thinking was that making one scheduler balance the arbitrary needs of multiple workloads leads to complexity and suboptimal behavior.

If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.

And if you ask me I will take 5 different code modules that each do one simple thing rather than taking 1 code module that tries to achieve 5 different things at once.

After all, if we can have multiple IO schedulers why cannot we have multiple selectable CPU schedulers? Are there technical limitations or complexity issues that make us not want to go to pluggable schedulers?

pluggable schedulers vs. tunable schedulers

Posted Sep 9, 2009 16:31 UTC (Wed) by martinfick (subscriber, #4455) [Link] (12 responses)

If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.

How does moving your tunable to boot time make it less of a tunable?

pluggable schedulers vs. tunable schedulers

Posted Sep 9, 2009 23:10 UTC (Wed) by paragw (guest, #45306) [Link] (11 responses)

How does moving your tunable to boot time make it less of a tunable?

Where did I say move the tunable to boot time? I said the particular modular scheduler can make reasonable assumptions that are best for the objective it is trying to meet - low latency for Xorg and its clients for example at the expense of something else (throughput) on the desktop systems.

pluggable schedulers vs. tunable schedulers

Posted Sep 10, 2009 9:50 UTC (Thu) by mingo (subscriber, #31122) [Link] (10 responses)

Note that what you propose is not what has been proposed on lkml under 'pluggable schedulers' before - that effort (PlugSched) was a build time / boot time scheduler selection approach.

Your model raises a whole category of new problems. For example under what model would you mix these pluggable schedulers on the same CPU? Add a scheduler of schedulers? Or can a CPU have only one pluggable scheduler defined at a time?

Also, how is this different from having per workload parameters in a single scheduler? (other than being inherently more complex to implement)

pluggable schedulers vs. tunable schedulers

Posted Sep 10, 2009 11:57 UTC (Thu) by paragw (guest, #45306) [Link]

[ Warning - long winded thoughtlets follow ]

About the plugsched - since it was boot-time selectable it could do what I was proposing, just not at runtime (which is no big deal really). And I wasn't suggesting mixing schedulers per CPU. My thought was to have one CPU scheduler exactly as we have it today - either selectable at boot time or, depending on how complex it would be to implement, at runtime.

If we talk about CFS as it is in mainline - I think its objective of being completely fair is a noble one on paper but does not work well on desktops with workloads that demand an interactivity bias in favor of only a certain set of apps. As many people have reported, CFS causes movie skips and does worse than BFS for interactivity. I am not saying the problems with CFS are 100% due to it being completely fair by design, but it is not hard to imagine it will try to be fair to all tasks, and that in itself will not be enough for mplayer to keep running the movie without skips if there are enough processes and not enough CPUs. If it favored running mplayer it would not be completely fair unless we also started renicing the processes - which, if you think about it, is fundamentally broken from a usability standpoint unless it was made fully automatic, which in turn is impossible without user involvement. (A desktop user is simply not going to renice every desktop process he works on, and then one has to select what gets more interactivity bonus apart from Xorg - now the browser, later the mail client, etc. - you get the idea. I explain more problems with nice a little further down.)

Now if we think about the CPU(s) as a finite resource - if people start running more tasks than there are CPUs it becomes clear that a bunch of tasks have to be scheduled less frequently and given less time slice than a bunch of other tasks if we are to maintain interactivity. (In Windows for example - one can set a scheduler switch that either favors foreground tasks (desktop workload) or background (server) tasks.)

So if we were to build a scheduler with the only goal of low latency for interactive processes - we then would not have to worry about throughput in that scheduler. I.e. no conflicting goals, so less complexity and better results. Then one can think of a per-process flag which Xorg and its clients can set that tells the desktop scheduler when the process window is foreground and interactive (when it is the topmost window or when a window needs user input), and the scheduler will ensure that it meets its goal of giving that process enough CPU resources to keep it running smoothly. This would solve the ugly problem of making the scheduler guess which process is interactive, needs user input, or needs an interactivity boost so that the desktop feels responsive to the user. In my opinion, making a scheduler with conflicting goals also guess which processes to give an interactivity boost simply does not work, as the scheduler doesn't have enough data to know for sure what process needs the most interactivity at any given point in time - at least it is not straightforward to make that guess reliably every time without any hint from the applications themselves.

Similarly for servers we could simplify CFS to make sure it remains completely fair and goes after throughput and latency comes second.

The benefit of having two schedulers is that of course users can choose one that does what they need - interactivity or fairness. So if someone complains my desktop is jerky when I run make -j128 kernel build, we can tell them to use the desktop scheduler and stop worrying about kernel build times if they are also going to play a movie at the same time. And for people needing fairness they can go with CFS and we can tell them to stop complaining about desktop jerkiness when running kernel builds as long as it is not anomalously jerky -i.e. not completely fair per goal.

We then also keep complexity in each scheduler to minimum without penalizing server workloads with interactivity logic and desktop workloads with fairness logic.

In short the point I am trying to make is that doing all things in one scheduler as we do it today, without any notion of what process needs user interaction or what process needs to be boosted in order to make the user feel the desktop is more interactive, is never going to be a 100% success for all parties. (Correct me if I am wrong, but I don't think we have any separate treatment for multimedia applications - they are just another process from the scheduler's PoV, and it fails when there are also 128 other runnable processes that need to run on vastly fewer than 128 CPUs.) Which means that the scheduler needs to be biased toward the apps the user cares most about - and nice does not work as long as it is a static, one-time, user-controlled thing. I don't want my browser to be nice -10 all the time - if it is minimized and not being used I want it to be nice +5 and instead have mplayer in the foreground nice'd to -5. Who decides what amount of nice in relation to other nice'd processes is sufficient so mplayer plays without skipping? We need something absolute there, unlike nice - if a multimedia application is playing in the foreground, it gets all the resources that it needs no matter what. That, IMHO, is the key to making desktop users happy.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 7:50 UTC (Sat) by trasz (guest, #45786) [Link] (8 responses)

Just do what Solaris does - schedulers are pieces of code that calculate thread priorities. This way you can assign different schedulers to different processes.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 8:37 UTC (Sat) by mingo (subscriber, #31122) [Link] (7 responses)

That does not answer the fundamental questions though.

Who schedules the schedulers? What happens if multiple tasks are on the same CPU with different 'schedulers' attached to them? For example a Firefox process scheduled by BFS and Thunderbird scheduled by CFS. How would it behave on the same CPU for it to make sense?

Really, i wish people who are suggesting 'pluggable schedulers!!!' spent five minutes thinking through the technical issues involved. They are not trivial.

Programming the kernel isn't like LEGO where you can combine bricks physically and have a nice fire station in addition to your police car ;-)

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 8:46 UTC (Sat) by trasz (guest, #45786) [Link] (2 responses)

Let me repeat - in Solaris, schedulers are the parts of code that calculate priorities. They don't do other things - specifically, they don't switch threads. You don't have to schedule them in any way - just switch threads conforming to the priorities calculated by the schedulers.

And if you don't like this approach, you could still do what FreeBSD has been doing for several years now - implement schedulers changeable at compile time.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 9:00 UTC (Sat) by mingo (subscriber, #31122) [Link]

Let me repeat - in Solaris, schedulers are the parts of code that calculate priorities. They don't do other things - specifically, they don't switch threads. You don't have to schedule them in any way - just switch threads conforming to the priorities calculated by the schedulers.

That's not pluggable schedulers. It's one scheduler with some flexibility in calculating priorities. The mainline Linux scheduler has something like that too btw: we have 'scheduling classes' attached to each process. See include/linux/sched.h::struct sched_class.
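[ The per-task scheduling class mentioned here is visible from userspace as the task's scheduling policy; a quick sketch, assuming util-linux's chrt tool is installed: ]

```shell
# every task already carries a scheduling class/policy; query this shell's
chrt -p $$

# moving a task between classes is a runtime operation, e.g. (root only):
#   chrt --fifo -p 50 <pid>
```

[ An ordinary process reports SCHED_OTHER here, i.e. the default class handled by the fair scheduler. ]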

And if you don't like this approach, you could still do what FreeBSD has been doing for several years now - implement schedulers changeable at compile time.

It's not about me 'liking' anything. My point is that i've yet to see a workable model for pluggable schedulers. (I doubt that one can exist - but i have an open mind about it and i'm willing to be surprised.)

Compile-time selection is not a real pluggable-scheduler concept - that would mean multiple schedulers acting _at once_. See the example i cited: setting Firefox to BFS and Thunderbird to CFS.

Compile-time (plus boot time) schedulers is what the PlugSched patches did for years.

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 12:24 UTC (Sat) by nix (subscriber, #2304) [Link]

But you still have to figure out which processes get their priorities decided by which 'schedulers' (it is not very useful to jump into a Linux discussion assuming that the terminology used is that of some other kernel's development community, btw).

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 14:44 UTC (Sat) by paragw (guest, #45306) [Link] (3 responses)

I don't really understand it when you say the technical issues involved [ in designing pluggable schedulers ] are not trivial to think through, since you already mentioned that PlugSched did just that prior to CFS.

It might be a terminology difference that is getting in the way - when I say "pluggable" I imply choice more than anything else. In other words it would be perfectly OK for the scheduler to be selectable only at compile and boot time and not at runtime just like PlugSched was.

We are advertising a completely fair scheduler that will do all things (ponies included ;) for everybody, but no one has so far explained HOW, fundamentally - at the conceptual and design level - we are going to ensure that when resources get scarce (2 CPU cores; 130 runnable processes, most of them CPU-heavy jobs, plus one mplayer playing video and another process encoding audio) we give a sufficient, continuous CPU share to mplayer, to the audio encoder, and to the desktop as a whole, so that it feels fluid to the user without the user having to play the nice games.

Making it even simpler, asking the same question differently - what logic in the current scheduler will hand out the most resources to mplayer, the audio-encoding process and the desktop window manager (switching between windows needs to be fluid as well) when the user is interacting with them? You can say the scheduler will be completely fair and give an equal chunk to every process, but desktop users get pissed if that means mplayer is going to skip - not enough CPUs and lots of processes to run.

In other words - if I hand $100 to a charity and ask them to be completely fair while distributing the amount equally, and 200 people turn up for help, the charity did the fair thing and gave 50c to everyone - without considering that 3 of the 200 badly needed at least $2 so they could not only eat but also buy their pills and stay alive. That would be an unfair result in the end. So the charity has to have some notion of bias toward the most needy, and for that it needs to figure out who the most needy are.

The point I am trying to make is that we need a scheduler that is both completely fair (server workloads) and desktop friendly, and these conflicting objectives can only be met by having 2 different, user-selectable schedulers. The desktop scheduler can get into the details of foreground vs. background, Xorg vs. non-Xorg, multimedia vs. non-multimedia processes and fight hard to keep the desktop fluid, without worrying about background jobs taking longer or about scaling to 1024 CPUs. The CFS scheduler can stay fair, moderately interactive and scalable as it is, and server people can select it.

So again, why do we not want to bring PlugSched back and let the user select BFS or CFS or DS (Desktop Scheduler) at compile or boot time? If we do want CFS to do everything while being fair, I don't think we have explained on paper how it would ensure desktop interactivity without having a notion of what constitutes the desktop. We have to question the CFS goals/design/implementation if we are to go by the reports that, after substantial development, interactivity issues with CFS still remain. (Please don't say the nice word - I have explained already that it doesn't work well in practice.) If it turns out that it is hard to meet conflicting goals well, or that we need to add more complexity to CFS to meet those conflicting goals even in "most" workloads, it is still prudent to ask: why not just have 2 different schedulers, each with one non-conflicting goal?

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 15:28 UTC (Sat) by mingo (subscriber, #31122) [Link] (1 responses)

What i believe you are missing relates to the very first question i asked: wouldn't it be better if a scheduler had nice runtime tunables that achieved the same?

Your original answer was (in part and way up in the discussion):

If we had a nice modular scheduler interface that allows us to load a scheduler at runtime or choose which scheduler to use at boot time or runtime that would solve the complexity problem and it will work well for the workloads it was designed for. As a bonus I will not have to make decisions on values of tunables - we can make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.

What you are missing is that 'boot time' or 'build time' scheduler selection (i.e. what PlugSched did, in essence) is itself a build-time/boot-time tunable. A complex one, but still a knob as far as the user is concerned.

Furthermore, they are worse tunables than nice runtime tunables. They inconvenience the user and they inconvenience the distro: flipping to another scheduler forces a reboot. Why do that?

For example, it does not allow the example i suggested: to run Firefox under BFS while Thunderbird under another scheduler.

So build-time/boot-time pluggable schedulers have various clear usage disadvantages, and there are also various things they cannot do.

So if you want tunability then i cannot understand why you are arguing for the technically worse solution - for a build time or boot time solution - versus a nice runtime solution.
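[ For reference, the runtime tunables referred to here are exposed as ordinary sysctls; a sketch of inspecting and changing them without a reboot (the exact knob names vary by kernel version, and writing them requires root): ]

```shell
# list the scheduler's runtime knobs - no rebuild or reboot involved
sysctl -a 2>/dev/null | grep '^kernel\.sched'

# e.g. on 2.6.31-era kernels one could tighten latency at runtime:
#   sysctl -w kernel.sched_latency_ns=20000000
```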

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 18:48 UTC (Sat) by paragw (guest, #45306) [Link]

Surely one single tunable (I want the desktop scheduler, for example, in the PlugSched case) is better - i.e. less complex - from the user's standpoint than having to figure out, say, 5 complex numerical things such as granularity and whatnot?

Or do we have one single tunable for CFS that makes it desktop-friendly? If it does have such a knob, then the next and most important question is how well it works for desktops. From the reports, I think we are still some way from claiming excellent "automatic" interactivity for desktops. Note that I am excluding the nicing games and making the user do a complex dance to figure out how to make his/her desktop interactive - I am sure you agree that does not work well.

To your point, if we have to have one tunable for the CFS scheduler to make it desktop friendly - essentially a single knob (like sched=desktop in the PlugSched case) - it is easy to see how that would fail to work satisfactorily for all desktop workloads. For one thing, unless the user messes with the nice level of each process that he/she opens, minimizes, closes or brings to the foreground (which is out of the question from a usability standpoint), the scheduler has no way to distinguish a foreground process from a background one; it has no way of distinguishing mplayer from the desktop window manager from some system daemon gone bad and eating CPU.

For another, the scheduler has no reliable way to know which processes it needs to favor. The window manager and the process owning the foreground window need to be registered with the scheduler as foreground processes; each minimized window needs to be registered as background. Then, as long as the window manager and the process owning the foreground window are not runnable, everyone else gets CPU. Multimedia applications need to be registered with the scheduler as such - automatically - so that mplayer always gets CPU when it needs it, favoring it even over the window manager and the process of another foreground window if there is only one available CPU. Until this coordination happens, I think we will remain far from great desktop interactivity that works for most desktop workloads.

Then the question would be that do we want to put all this "only needed on desktop" complexity into the completely fair scheduler or do we want to keep both separate. That is sort of a secondary question - the first question is how do we get the desktop to hint the scheduler as to which processes the user is actively interacting with, which ones are the ones he/she is likely to interact with (minimized windows) and then the scheduler favoring those accordingly - that ought to solve the interactivity problems in an automatic fashion.

[ Windows has this notion of distinguishing between "Programs" (running desktop applications) and background services (things without desktop interaction); in its default configuration it favors "Programs" on desktops and "Background services" (the Web Server service, for example) on servers. And it certainly helps interactivity. It can do this because it can distinguish what is a desktop application - foreground or background - from what is a non-desktop, background application. ]

pluggable schedulers vs. tunable schedulers

Posted Sep 12, 2009 18:31 UTC (Sat) by khc (guest, #45209) [Link]

I already have a compile time way to select scheduler:

patch -p1 < 2.6.31-sched-bfs-211.patch

