Group scheduling and alternatives
The main change to the TTY-based group scheduling patch set is that it is, in fact, no longer TTY-based. The identity of the controlling terminal was chosen as a heuristic which could be used to group together tasks which should compete with each other for CPU time, but other choices are possible. An obvious possibility is the session ID. This ID is used to identify distinct login sessions (not process groups, which have their own ID); a process starts a new session with the setsid() system call. Since sessions are already used to group together related processes, it makes sense to use the session ID as the key when grouping processes for scheduling. More recent versions of the patch do exactly that. The session-based group scheduling mechanism appears to be stabilizing; chances are good that it will be merged in the 2.6.38 merge window.
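For those unfamiliar with the mechanics: a minimal sketch of the usual setsid() pattern follows - fork first, since a process-group leader cannot call setsid() - with the make invocation standing in for any workload one might want in its own group:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();

        if (child < 0) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (child == 0) {
            /* The child is not a process-group leader, so setsid()
               succeeds and makes it the leader of a new session. */
            if (setsid() < 0) {
                perror("setsid");
                exit(EXIT_FAILURE);
            }
            /* Under session-based group scheduling, this process and
               its descendants now compete for CPU as their own group. */
            execlp("make", "make", "-j4", (char *) NULL);
            perror("execlp");
            exit(EXIT_FAILURE);
        }
        return 0;
    }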
Meanwhile, there have been a couple of discussions led by vocal proponents of other approaches to interactive scheduling. It is fair to say that neither is likely to find its way into the mainline. Both are worth a look, though, as examples of how people are thinking about the problem.
Colin Walters asked whether group scheduling could be tied into the "niceness" priorities which have been implemented by Unix and Linux schedulers for decades. People are used to nice, he said, but they would like it to work better. Creating groups for nice levels would help to make that happen. But Linus was not excited about this idea; he claimed that almost nobody uses nice now and that this is unlikely to change.
More to the point, though: the semantics implemented by nice are very different from those offered by group scheduling. The former is entirely priority-based, promising that processes with a higher "niceness" will get less processor time than those with lower values. Group scheduling, instead, is about isolation - keeping groups of processes from interfering with each other. Group scheduling handles the concept of priorities poorly; that is simply not how the mechanism works. It will not cause one set of processes to run in favor of another; it just ensures that the division of CPU time between the groups is fair.
Colin went on to suggest that using groups would improve nice, giving the results that users really want. But changing something as fundamental as the effects of niceness would be, in a very real sense, an ABI change. There may not be many users of nice, but installations which depend on it would not appreciate a change in its semantics. So nice will stay the way it is, and group scheduling will be used to implement different (presumably better) semantics.
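For contrast, the traditional priority-based semantics look like this minimal sketch: setpriority(2) raises a process's nice value, causing it to yield to lower-niceness processes - but only relative to processes it actually competes with, which, under group scheduling, means processes in the same group:

    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/time.h>

    int main(void)
    {
        /* Raise our nice value to 10.  Group scheduling leaves this
           call alone; it only changes how CPU time is divided
           *between* groups, not within one. */
        if (setpriority(PRIO_PROCESS, 0, 10) < 0)
            perror("setpriority");
        /* (A careful program would clear and check errno here, since
           getpriority() can legitimately return -1.) */
        printf("running at nice %d\n", getpriority(PRIO_PROCESS, 0));
        return 0;
    }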
The group scheduling discussion also featured a rare appearance by Con Kolivas. Con's view is that the session-based group scheduling patch is another attempt to put interactivity heuristics into the kernel - an approach which has failed in the past.
Con's alternative suggestion was to put control of interactivity more directly into the hands of user space. He would attach a parameter to every process describing its latency needs. Applications could then be coded to communicate their needs to the kernel; an audio processing application would request the lowest latency, while make would inform the kernel that latency matters little. Con would also add a global knob controlling whether low-latency processes would also get more CPU time. The result, he says, would be to explicitly favor "foreground" processes (assuming those processes are the ones which request lower latency). Distributors could set up defaults for these parameters; users could change them, if they wanted to.
All of that, Con said, would be a good way to "move away from the fragile heuristic tweaks and find a longer term robust solution."
The suggestion has not been particularly well received, though. Group scheduling was defended against the "heuristics" label; it is simply an implementation of the scheduling preferences established by the user or system administrator. The session-based component is just a default for how the groups can be composed; it may well be a better default than "no groups," which is what most systems are using now. More to the point, changing that default is easily done. Lennart Poettering's systemd-driven groups are an example; they are managed entirely from user space. Group scheduling is, in fact, quite easy to manage for anybody who wants to set up a different scheme.
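As a rough sketch of how little that management requires, assuming a v1-style "cpu" control-group hierarchy mounted at /sys/fs/cgroup/cpu (the mount point varies from system to system), moving a process into its own group takes little more than a mkdir() and a write():

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Create a CPU control group and move a process into it - roughly
       what systemd (or any user-space policy daemon) does on the
       user's behalf. */
    static int add_to_group(const char *group, pid_t pid)
    {
        char path[256], buf[32];
        int fd;

        snprintf(path, sizeof(path), "/sys/fs/cgroup/cpu/%s", group);
        if (mkdir(path, 0755) < 0 && errno != EEXIST)
            return -1;

        snprintf(path, sizeof(path), "/sys/fs/cgroup/cpu/%s/tasks", group);
        fd = open(path, O_WRONLY);
        if (fd < 0)
            return -1;
        snprintf(buf, sizeof(buf), "%d\n", (int) pid);
        if (write(fd, buf, strlen(buf)) < 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    int main(void)
    {
        /* Put ourselves into a group named "session1". */
        return add_to_group("session1", getpid()) ? 1 : 0;
    }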
So we'll probably not see Con's knobs added anytime soon - even if somebody does actually create a patch to implement them. What we might see, though, is a variant on that approach where processes could specify exact latency and CPU requirements. A patch for that does exist - it's called the deadline scheduler. If clever group scheduling turns out not to solve everybody's problem (likely - somebody always has an intractable problem), we might see a new push to get the deadline scheduling patches merged.
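For the curious: the deadline scheduler's interface went through several revisions, but the form that was eventually merged (as SCHED_DEADLINE, in 3.14, well after this article) does let a task state its CPU and latency requirements explicitly. A sketch, assuming kernel headers that provide SYS_sched_setattr; glibc supplies no wrapper, so the structure is declared by hand here:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Declared by hand: the sched_attr layout used by sched_setattr(). */
    struct sched_attr {
        uint32_t size, sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime, sched_deadline, sched_period;  /* ns */
    };

    int main(void)
    {
        struct sched_attr attr = {
            .size           = sizeof(attr),
            .sched_policy   = 6,                 /* SCHED_DEADLINE */
            .sched_runtime  =  2 * 1000 * 1000,  /* 2 ms of CPU time... */
            .sched_deadline = 10 * 1000 * 1000,  /* ...within 10 ms...  */
            .sched_period   = 10 * 1000 * 1000,  /* ...of each 10 ms    */
        };

        if (syscall(SYS_sched_setattr, 0, &attr, 0) < 0)
            perror("sched_setattr");
        return 0;
    }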
Index entries for this article:
Kernel: Group scheduling
Kernel: Scheduler/Group scheduling
Already exists (since 2.6.16): SCHED_BATCH
Posted Dec 9, 2010 16:16 UTC (Thu) by sync (guest, #39669)
See sched_setscheduler(2), chrt(1).
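A sketch of what sync is pointing at: SCHED_BATCH marks a process as a CPU-bound batch job, so the scheduler mildly disfavors it for wakeup preemption. A small wrapper (run as, say, ./batchify make) shows the call:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* SCHED_BATCH (since 2.6.16) requires a static priority of 0. */
        struct sched_param sp = { .sched_priority = 0 };

        if (sched_setscheduler(0, SCHED_BATCH, &sp) < 0)
            perror("sched_setscheduler");
        if (argc > 1)
            execvp(argv[1], &argv[1]);   /* run the batch job itself */
        return 0;
    }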
Interactive versus batch processes
Posted Dec 9, 2010 18:15 UTC (Thu) by walters (subscriber, #7396)
chrt --idle 0 ionice -c 3 make -j 64
make -j level
Posted Dec 10, 2010 15:18 UTC (Fri) by walters (subscriber, #7396)
http://fedorapeople.org/gitweb?p=walters/public_git/homeg...
I picked a high number to emphasize the point basically, but yes, one needs to pick a good -j value.
make -j level
Posted Dec 10, 2010 20:59 UTC (Fri) by giraffedata (guest, #1954)
And if you have an exceptionally slow filesystem, also multiply by the expansion factor (single-thread total time / CPU time). On one system I use, with a single CPU, I found -j6 gave the minimum elapsed time.
Group scheduling and alternatives
Posted Dec 9, 2010 16:38 UTC (Thu) by jwarnica (subscriber, #27492)
It's like asking users what they want. Everything, now, for free. Of course! Thanks for that.
Group scheduling and alternatives
Posted Dec 9, 2010 22:48 UTC (Thu) by iabervon (subscriber, #722)
The right design is to assume that programs want everything, and let them say what they don't want; then you don't give them anything they don't want. The usual fairness and best-effort goals then essentially work again: if you have a batch process and a realtime process of the same priority, it is equally bad to miss the realtime process's window once as to not run the batch process at all for 1 ms; that is, the scheduler should try equally hard to avoid either happening, and fail about equally often under random load. If an application can only make use of the first 1 us of every 1 ms, and asks to run only then, the kernel may be able to give it 100% of the time it wants without any system impact; if, on the other hand, it can't tell the kernel, it has to busy-wait through a lot more processor time in order to have any chance of being running then, and it loads the system much more heavily. Of course, writing a scheduler that does this optimally is hard, but the theory shows that it is possible to give userspace controls such that a program can benefit by decreasing its demands on the system.
Group scheduling and alternatives
Posted Dec 10, 2010 1:23 UTC (Fri) by dtlin (subscriber, #36537)
What's wrong with nanosleep?
Group scheduling and alternatives
Posted Dec 10, 2010 2:46 UTC (Fri) by iabervon (subscriber, #722)
The scales in my example are different from what I was actually doing at the time, but I was trying to sample, at 20 Hz, an accelerometer attached to an i2c bus attached to a serial port; I needed to send a few bytes at the right time, which would cause the accelerometer to take a sample then. (The accelerometer device didn't support automatic periodic sampling.) It turned out that the only way to get data that I could analyze was to sleep until 1 ms before the time I wanted to sample and busy-wait until the right time; that meant I was generally running by the sample time, and generally hadn't used up my time slice. On the other hand, I was burning ~2% of the CPU on a power-limited system busy-waiting.
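That workaround is a common pattern; here is a minimal sketch of it, using the 20 Hz period and 1 ms guard interval from the comment (sample_accelerometer() is a hypothetical stand-in for the actual i2c traffic):

    #include <stdint.h>
    #include <time.h>

    #define NSEC_PER_SEC 1000000000LL
    #define GUARD_NS     1000000LL        /* wake up 1 ms early */

    static int64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
    }

    int main(void)
    {
        int64_t next = now_ns();

        for (;;) {
            next += NSEC_PER_SEC / 20;    /* 20 Hz sample period */

            /* Sleep until ~1 ms before the target time... */
            struct timespec until = {
                .tv_sec  = (next - GUARD_NS) / NSEC_PER_SEC,
                .tv_nsec = (next - GUARD_NS) % NSEC_PER_SEC,
            };
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &until, NULL);

            /* ...then burn CPU until exactly the right moment; this is
               the ~2% busy-wait overhead described above. */
            while (now_ns() < next)
                ;

            /* sample_accelerometer();  hypothetical I/O goes here */
        }
    }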
Group scheduling and alternatives
Posted Dec 10, 2010 4:59 UTC (Fri) by jzbiciak (guest, #5246)
For GUI-heavy users, nothing is bound to a TTY anyway! There aren't any processes bound to TTYs on my wife's machine, for example, other than some lonely gettys and the X server itself.
Group scheduling and alternatives
Posted Dec 4, 2016 10:52 UTC (Sun) by mkerrisk (subscriber, #1978)
"But changing something as fundamental as the effects of niceness would be, in a very real sense, an ABI change. There may not be many users of nice, but installations which depend on it would not appreciate a change in its semantics."
Ironically, changing the traditional semantics of niceness was exactly what the "group scheduling" feature (a.k.a. autogroup) did bring about. When autogrouping is on (which is the default in various distributions), then in many usages (e.g., when applied to one of two CPU bound jobs that is running in two different terminal windows), nice(1) becomes a no-op. See this note and details on the autogroup feature in the (soon to be released) revised sched(7) manual page. A web search easily finds many users who got surprised by the change.
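The interaction mkerrisk describes can be observed, and compensated for, through /proc: per sched(7), each autogroup carries its own nice value, readable and writable via /proc/[pid]/autogroup. A small sketch:

    #include <stdio.h>

    int main(void)
    {
        char line[128];
        FILE *f = fopen("/proc/self/autogroup", "r");

        if (f) {
            if (fgets(line, sizeof(line), f))
                printf("before: %s", line);  /* e.g. "/autogroup-25 nice 0" */
            fclose(f);
        }

        /* Writing a nice value here renices the whole autogroup,
           restoring a nice-like effect across terminal windows. */
        f = fopen("/proc/self/autogroup", "w");
        if (f) {
            fprintf(f, "10\n");
            fclose(f);
        }
        return 0;
    }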