pluggable schedulers vs. tunable schedulers
Posted Sep 9, 2009 16:31 UTC (Wed)
by martinfick (subscriber, #4455)
In reply to: pluggable schedulers vs. tunable schedulers by paragw
Parent article: BFS vs. mainline scheduler benchmarks and measurements
If we had a nice modular scheduler interface that allowed us to load a scheduler at runtime, or to choose which scheduler to use at boot time or at runtime, that would solve the complexity problem, and each scheduler would work well for the workloads it was designed for. As a bonus I would not have to make decisions on the values of tunables - we could make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.
How does moving your tunable to boot time make it less of a tunable?
Posted Sep 9, 2009 23:10 UTC (Wed)
by paragw (guest, #45306)
[Link] (11 responses)
How does moving your tunable to boot time make it less of a tunable?
Where did I say move the tunable to boot time? I said the particular modular scheduler can make reasonable assumptions that are best for the objective it is trying to meet - low latency for Xorg and its clients, for example, at the expense of something else (throughput) on desktop systems.
Posted Sep 10, 2009 9:50 UTC (Thu)
by mingo (guest, #31122)
[Link] (10 responses)
Note that what you propose is not what has been proposed on lkml under 'pluggable schedulers' before - that effort (PlugSched) was a build-time / boot-time scheduler selection approach.
Your model raises a whole category of new problems. For example, under what model would you mix these pluggable schedulers on the same CPU? Add a scheduler of schedulers? Or can a CPU have only one pluggable scheduler defined at a time?
Also, how is this different from having per-workload parameters in a single scheduler? (Other than being inherently more complex to implement.)
Posted Sep 10, 2009 11:57 UTC (Thu)
by paragw (guest, #45306)
[Link]
About PlugSched - since it was boot-time selectable, it could do what I was proposing, just not at runtime (which is no big deal really). And I wasn't suggesting mixing schedulers per CPU. My thought was to have one CPU scheduler exactly as we have it today - selectable either at boot time or, depending on how complex it would be to implement, at runtime.
If we talk about CFS as it is in mainline - I think its objective of being completely fair is a noble one on paper, but it does not work well on desktops with workloads that demand an interactivity bias in favor of only a certain set of apps. As many people have reported, CFS causes movie skips and does worse than BFS for interactivity. I am not saying the problems with CFS are 100% due to it being completely fair by design, but it is not hard to imagine that it will try to be fair to all tasks, and that in itself will not be enough for mplayer to keep playing the movie without skips if there are enough processes and not enough CPUs. If it favored running mplayer it would not be completely fair, unless we also started renicing processes - which, if you think about it, is fundamentally broken from a usability standpoint unless it were made fully automatic, which in turn is impossible without user involvement. (A desktop user is simply not going to renice every desktop process he works with, and then one has to select what gets the interactivity bonus apart from Xorg - now the browser, later the mail client, and so on; you get the idea. I explain more problems with nice a little further down.)
Now, if we think about the CPU(s) as a finite resource - once people start running more tasks than there are CPUs, it becomes clear that some tasks have to be scheduled less frequently and given smaller time slices than others if we are to maintain interactivity. (In Windows, for example, one can set a scheduler switch that favors either foreground tasks (desktop workload) or background (server) tasks.)
So if we were to build a scheduler whose only goal is latency for interactive processes, we would not have to worry about throughput in that scheduler. That is, no conflicting goals, so less complexity and better results. Then one can think of a per-process flag which Xorg and its clients can set that tells the desktop scheduler when the process window is in the foreground and interactive (when it is the topmost window or when a window needs user input), and the scheduler will ensure that it meets its goal of giving that process enough CPU to keep running smoothly. This would solve the ugly problem of making the scheduler guess which process is interactive, needs user input, or needs an interactivity boost so that the desktop feels responsive to the user. In my opinion, making a scheduler with conflicting goals also guess which processes to give an interactivity boost simply does not work, as the scheduler does not have enough data to know for sure which process needs the most interactivity at any given point in time - at least it is not straightforward to make that guess reliably every time without any hint from the applications themselves.
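To make the per-process flag idea concrete, here is a rough sketch of what the window-manager side of such a hint might look like. Only the prctl() system call below is real; the PR_SET_FOREGROUND_HINT option is made up purely for illustration - no such interface exists in the mainline kernel, so the call will simply fail with EINVAL on a real system.

    #include <stdio.h>
    #include <sys/prctl.h>

    /* Made-up option number, for illustration only - NOT a real kernel interface. */
    #define PR_SET_FOREGROUND_HINT 0x59430001

    int main(void)
    {
            /* A window manager (or Xorg) would flip this hint as windows gain
             * and lose focus, letting a desktop-oriented scheduler favor the
             * process that currently owns the foreground window. */
            if (prctl(PR_SET_FOREGROUND_HINT, 1, 0, 0, 0) == -1)
                    perror("prctl (expected to fail: the option is made up)");
            return 0;
    }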
Similarly, for servers we could simplify CFS to make sure it remains completely fair and goes after throughput, with latency coming second.
The benefit of having two schedulers is of course that users can choose the one that does what they need - interactivity or fairness. So if someone complains that their desktop is jerky when they run a make -j128 kernel build, we can tell them to use the desktop scheduler and to stop worrying about kernel build times if they are also going to play a movie at the same time. And people needing fairness can go with CFS, and we can tell them to stop complaining about desktop jerkiness when running kernel builds - as long as it is not anomalously jerky, i.e. as long as the scheduler is still being completely fair per its goal.
We then also keep the complexity of each scheduler to a minimum, without penalizing server workloads with interactivity logic or desktop workloads with fairness logic.
In short, the point I am trying to make is that doing everything in one scheduler, as we do today, without any notion of which process needs user interaction or which process needs to be boosted to make the desktop feel more interactive, is never going to be a 100% success for all parties. (Correct me if I am wrong, but I don't think we have any separate treatment for multimedia applications - they are just another process from the scheduler's point of view, and that fails when there are another 128 runnable processes that need to run on far fewer than 128 CPUs.) That means the scheduler needs to be biased toward the apps the user cares most about - and nice does not work as long as it is a static, one-time, user-controlled thing. I don't want my browser to be nice -10 all the time - if it is minimized and not being used I want it to be nice +5 and instead have mplayer in the foreground niced to -5. Who decides what amount of nice, relative to the other niced processes, is sufficient for mplayer to play without skipping? We need something absolute there, unlike nice: if a multimedia application is playing in the foreground, it gets all the resources it needs, no matter what - that, IMHO, is the key to making desktop users happy.
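For reference, the manual renicing dance being dismissed above looks like this with the standard setpriority() call - a minimal sketch with placeholder PIDs; raising priority (negative nice) needs root or CAP_SYS_NICE, and the whole thing has to be repeated every time focus changes, which is exactly the usability problem:

    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/types.h>

    int main(void)
    {
            pid_t mplayer = 4321, browser = 8765;   /* placeholder PIDs */

            /* Boost the foreground player, demote the minimized browser. */
            if (setpriority(PRIO_PROCESS, mplayer, -5) == -1)
                    perror("renice mplayer");
            if (setpriority(PRIO_PROCESS, browser, +5) == -1)
                    perror("renice browser");
            return 0;
    }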
Posted Sep 12, 2009 7:50 UTC (Sat)
by trasz (guest, #45786)
[Link] (8 responses)
Posted Sep 12, 2009 8:37 UTC (Sat)
by mingo (guest, #31122)
[Link] (7 responses)
That does not answer the fundamental questions though.
Who schedules the schedulers? What happens if multiple tasks are on the same CPU with different 'schedulers' attached to them? For example, a Firefox process scheduled by BFS and a Thunderbird process scheduled by CFS: how would that behave on the same CPU in a way that makes sense?
Really, i wish people who are suggesting 'pluggable schedulers!!!' spent five minutes thinking through the technical issues involved. They are not trivial.
Programming the kernel isn't like LEGO, where you can combine bricks physically and have a nice fire station in addition to your police car ;-)
Posted Sep 12, 2009 8:46 UTC (Sat)
by trasz (guest, #45786)
[Link] (2 responses)
Let me repeat - in Solaris, schedulers are the parts of code that calculate priorities. They don't do other things - specifically, they don't switch threads. You don't have to schedule them in any way - just switch threads conforming to the priorities calculated by the schedulers.
And if you don't like this approach, you could still do what FreeBSD has been doing for several years now - implement schedulers changeable at compile time.
Posted Sep 12, 2009 9:00 UTC (Sat)
by mingo (guest, #31122)
[Link]
Let me repeat - in Solaris, schedulers are the parts of code that calculate priorities. They don't do other things - specifically, they don't switch threads. You don't have to schedule them in any way - just switch threads conforming to the priorities calculated by the schedulers.
That's not pluggable schedulers. It's one scheduler with some flexibility in calculating priorities. The mainline Linux scheduler has something like that too btw: we have 'scheduling classes' attached to each process. See include/linux/sched.h::struct sched_class.
And if you don't like this approach, you could still do what FreeBSD has been doing for several years now - implement schedulers changeable at compile time.
It's not about me 'liking' anything. My point is that i've yet to see a workable model for pluggable schedulers. (I doubt that one can exist - but i have an open mind about it and i'm willing to be surprised.)
Compile-time selection is not a real pluggable-scheduler concept - that would mean multiple schedulers acting _at once_. See the example i cited: being able to put Firefox under BFS and Thunderbird under CFS.
Compile-time (plus boot-time) scheduler selection is what the PlugSched patches did for years.
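For readers who have not looked at the scheduling classes mentioned above: the toy program below is a user-space illustration of the idea - one core scheduler dispatching through per-policy tables of callbacks. It is only a sketch; the real struct sched_class in include/linux/sched.h has many more hooks (enqueue_task, dequeue_task, pick_next_task, task_tick, load-balancing callbacks and so on) and its exact layout varies between kernel versions.

    #include <stdio.h>

    struct task { const char *name; int vruntime; };

    /* Toy analogue of struct sched_class: a named table of callbacks. */
    struct sched_class_toy {
            const char *name;
            struct task *(*pick_next_task)(struct task *rq, int nr);
    };

    /* CFS-like policy: the task with the lowest virtual runtime runs next. */
    static struct task *pick_fair(struct task *rq, int nr)
    {
            struct task *best = &rq[0];
            for (int i = 1; i < nr; i++)
                    if (rq[i].vruntime < best->vruntime)
                            best = &rq[i];
            return best;
    }

    /* FIFO-like realtime policy: the first queued task runs next. */
    static struct task *pick_fifo(struct task *rq, int nr)
    {
            (void)nr;
            return &rq[0];
    }

    static const struct sched_class_toy fair_class = { "fair", pick_fair };
    static const struct sched_class_toy rt_class   = { "rt",   pick_fifo };

    int main(void)
    {
            struct task rq[] = { { "firefox", 40 }, { "mplayer", 10 }, { "make", 90 } };
            const struct sched_class_toy *classes[] = { &rt_class, &fair_class };

            /* The same core loop consults whichever class is in effect. */
            for (int i = 0; i < 2; i++)
                    printf("%s class picks: %s\n", classes[i]->name,
                           classes[i]->pick_next_task(rq, 3)->name);
            return 0;
    }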
Posted Sep 12, 2009 12:24 UTC (Sat)
by nix (subscriber, #2304)
[Link]
...decided by which 'schedulers' (it is not very useful to jump into a Linux discussion assuming that the terminology used is that of some other kernel's development community, btw).
Posted Sep 12, 2009 14:44 UTC (Sat)
by paragw (guest, #45306)
[Link] (3 responses)
It might be a terminology difference that is getting in the way - when I say "pluggable" I mean choice more than anything else. In other words, it would be perfectly OK for the scheduler to be selectable only at compile or boot time and not at runtime, just like PlugSched.
We are advertising a completely fair scheduler that will do all things (ponies included ;) for everybody, but no one has so far explained HOW, fundamentally - at the conceptual level, at the design level - we are going to ensure that when resources get scarce (2 CPU cores, 130 runnable processes - mostly CPU-heavy jobs, plus one mplayer playing video and another process doing audio encoding) we give enough continuous CPU share to mplayer, the audio encoder, and the desktop as a whole, so that it feels fluid to the user without the user having to play the nice games.
Making it even simpler, asking the same question differently - what logic in the current scheduler will hand out the most resources to mplayer, the audio-encoding process, and the desktop window manager (switching between windows needs to be fluid as well) when the user is interacting with them? You can say the scheduler will be completely fair and give an equal chunk to every process, but desktop users get pissed if that means mplayer is going to skip - not enough CPUs and a lot of processes to run.
In other words - if I hand $100 to a charity and ask them to be completely fair and distribute the amount to everyone equally, and 200 people turn up for help, the charity does the fair thing and gives 50 cents to everyone - without considering that 3 of the 200 badly needed at least $2 so they could not only eat but also buy their pills and stay alive. That would be an unfair result in the end. So the charity has to have some bias toward the most needy, and for that it needs to figure out who the most needy are.
The point I am trying to make is that we need a scheduler that is both completely fair (server workloads) and desktop-friendly, and these conflicting objectives can only be met by having two different, user-selectable schedulers. The desktop scheduler can get into the details of foreground vs. background, Xorg vs. non-Xorg, and multimedia vs. non-multimedia processes, and fight hard to keep the desktop fluid without worrying about background jobs taking longer or about scaling to 1024 CPUs. CFS can stay fair, moderately interactive, and scalable as it is, and server people can select it.
So again, why do we not want to bring PlugSched back and have the user select BFS, CFS, or a DS (Desktop Scheduler) at compile or boot time? If we do want CFS to do everything while being fair - I don't think we have explained, on paper, how it would ensure desktop interactivity without having a notion of what constitutes the desktop. We have to question the CFS goals/design/implementation if we are to go by the reports that, after substantial development, interactivity issues with CFS still remain. (Please don't say the nice word - I have already explained that it doesn't work well in practice.) If it turns out that it is hard to meet conflicting goals well, or that we need to add more complexity to CFS to meet those conflicting goals even in "most" workloads, it is still prudent to ask why not just have two different schedulers, each with a single, non-conflicting goal.
Posted Sep 12, 2009 15:28 UTC (Sat)
by mingo (guest, #31122)
[Link] (1 responses)
What i believe you are missing relates to the very first question i asked: wouldn't it be better if a scheduler had nice runtime tunables that achieved the same?
Your original answer was (in part and way up in the discussion):
If we had a nice modular scheduler interface that allowed us to load a scheduler at runtime, or to choose which scheduler to use at boot time or at runtime, that would solve the complexity problem, and each scheduler would work well for the workloads it was designed for. As a bonus I would not have to make decisions on the values of tunables - we could make the particular scheduler implementation make reasonable assumptions for the workload it was servicing.
What you are missing is that 'boot time' or 'build time' schedulers (i.e. what PlugSched did, in essence) are themselves build-time / boot-time tunables. A complex one, but still a knob as far as the user is concerned.
Furthermore, they are worse tunables than nice runtime tunables. They inconvenience the user and they inconvenience the distro. Flipping to another scheduler would force a reboot. Why do that?
For example, it does not allow the example i suggested: running Firefox under BFS while Thunderbird runs under another scheduler.
So build-time/boot-time pluggable schedulers have various clear usage disadvantages, and there are also various things they cannot do.
So if you want tunability, then i cannot understand why you are arguing for the technically worse solution - a build-time or boot-time one - versus a nice runtime solution.
Posted Sep 12, 2009 18:48 UTC (Sat)
by paragw (guest, #45306)
[Link]
Or do we have a single tunable for CFS that turns it into a desktop-friendly scheduler? If it does have such a knob, then the next and most important question is how well it works for desktops. From the reports, I think we are still some way from claiming excellent "automatic" interactivity for desktops. Note that I am excluding the nicing games - making the user do a complex dance to figure out how to make his/her desktop interactive. I am sure you agree that does not work well.
To your point, if we have one tunable for the CFS scheduler to make it desktop-friendly - essentially a single knob (like sched=desktop in the PlugSched case) - it is easy to see how that would fail to work satisfactorily for all desktop workloads. For one thing, unless the user messes with the nice levels of each process that he/she opens, minimizes, closes, or brings to the foreground (which is out of the question from a usability standpoint), the scheduler has no way to distinguish a foreground process from a background one; it has no way of distinguishing mplayer from the desktop window manager from some system daemon going bad and eating CPU.
For another, the scheduler has no reliable way to know which processes it needs to favor. The window manager and the process owning the foreground window need to be registered with the scheduler as foreground processes; each minimized window needs to be registered as background. Then, as long as the window manager and the process owning the foreground window are not runnable, everyone else gets the CPU. Multimedia applications need to be registered with the scheduler as such - automatically - so that mplayer always gets the CPU when it needs it, even favoring it over the window manager and the process of another foreground window if there is only one available CPU. Until this coordination happens, I think we will remain far from achieving great desktop interactivity that works for most desktop workloads.
Then the question would be whether we want to put all this "only needed on the desktop" complexity into the completely fair scheduler, or keep the two separate. That is a secondary question, though - the first question is how we get the desktop to hint to the scheduler which processes the user is actively interacting with and which ones he/she is likely to interact with next (minimized windows), and then have the scheduler favor those accordingly - that ought to solve the interactivity problems in an automatic fashion.
[Windows has this notion of distinguishing between "Programs" (running desktop applications) and background services (things without desktop interaction); in its default configuration on the desktop it favors "Programs", and on servers it favors "Background services" (the web server service, for example). And it certainly helps interactivity. It can do this because it can distinguish what is a desktop application, whether it is in the foreground or the background, and what is a non-desktop, background application.]
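For what it is worth, part of this coordination can be approximated from user space today, with no new scheduler interfaces: a session daemon that already tracks window focus could bump the focused multimedia process into one of the existing realtime policies and drop it back when focus is lost. A minimal sketch under that assumption - the daemon, the focus tracking, and the choice of SCHED_RR are illustrative, not anything proposed in this thread; realtime policies need root or CAP_SYS_NICE, and a runaway realtime task can freeze the machine:

    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Hypothetical helper a focus-tracking daemon might call when a
     * multimedia window gains or loses focus.  Only the existing
     * sched_setscheduler() interface is used - no new kernel code. */
    static int set_focus_boost(pid_t pid, int focused)
    {
            struct sched_param sp = { .sched_priority = focused ? 10 : 0 };
            int policy = focused ? SCHED_RR : SCHED_OTHER;

            if (sched_setscheduler(pid, policy, &sp) == -1) {
                    perror("sched_setscheduler");
                    return -1;
            }
            return 0;
    }

    int main(void)
    {
            pid_t mplayer_pid = 12345;          /* placeholder: would come from the WM */

            set_focus_boost(mplayer_pid, 1);    /* window focused: boost */
            set_focus_boost(mplayer_pid, 0);    /* window minimized: back to SCHED_OTHER */
            return 0;
    }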
Posted Sep 12, 2009 18:31 UTC (Sat)
by khc (guest, #45209)
[Link]
patch -p1 < 2.6.31-sched-bfs-211.patch