TTY-based group scheduling
The core idea behind the completely fair scheduler is its complete fairness: if there are N processes competing for the CPU, each with equal priority, then each will get 1/N of the available CPU time. This policy replaced the rather complicated "interactivity" heuristics found in the O(1) scheduler; it yields better desktop response in most situations. There are places where this approach falls down, though. If a user is running ten instances of the compiler with make -j 10 along with one video playback application, each process will get a "fair" 9% of the CPU. That 9% may not be enough to provide the video experience that the user was hoping for. So it is not surprising that many users see "fairness" differently; wouldn't it be nice if the compilation job as a whole got 50%, while the video application got the other half?
The kernel has been able to implement that kind of fairness for years through a feature known as group scheduling. A set of processes placed within a group will each get a fair share of the CPU time allocated to the group as a whole, but groups will, themselves, compete for a fair share of the CPU. So, if the video player were to be placed in one group and the compilation in another, each group would get half of the available processor time. The various processes doing the compilation would then get a fair share of their group's half; they will compete with each other, but not with the video player. This arrangement will ensure that the video player gets enough CPU time to keep up with the stream and any interactivity requirements.
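The arithmetic behind the two notions of fairness can be checked with a small sketch. This is an idealized model, not scheduler code: it assumes every task is always runnable and has equal priority, and the task and group names (`cc0`, `build`, `player`) are made up for illustration.

```python
from fractions import Fraction

def flat_shares(tasks):
    """Plain CFS-style fairness: each of N equal-priority tasks gets 1/N."""
    n = len(tasks)
    return {task: Fraction(1, n) for task in tasks}

def grouped_shares(groups):
    """Group fairness: groups split the CPU equally, then the tasks
    inside each group split that group's share equally."""
    per_group = Fraction(1, len(groups))
    shares = {}
    for members in groups.values():
        for task in members:
            shares[task] = per_group / len(members)
    return shares

# Ten compiler instances (make -j 10) plus one video player.
compilers = [f"cc{i}" for i in range(10)]
flat = flat_shares(compilers + ["video"])
grouped = grouped_shares({"build": compilers, "player": ["video"]})

print(f"flat:    video gets {float(flat['video']):.1%}")
print(f"grouped: video gets {float(grouped['video']):.1%}, "
      f"each compiler gets {float(grouped['cc0']):.1%}")
```

Under these assumptions the video player goes from 1/11 of the CPU to half of it, while each compiler instance drops to 1/20.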
Groups are thus a nice feature, but they have not seen heavy use since they were merged for the 2.6.24 release. The reasons for that are clear: groups require administrative work and root privileges to set up; most users do not know how to tweak the knobs and would really rather not learn. What has been missing all these years is a way to make group scheduling "just work" for ordinary users. That is the goal of Mike Galbraith's per-TTY task groups patch.
In short, this patch automatically creates a group attached to each TTY in the system. All processes with a given TTY as their controlling terminal will be placed in the appropriate group; the group scheduling code can then share time between groups of processes as determined by their controlling terminals. A compilation job is typically started by typing "make" in a terminal emulator window; that job will have a different controlling TTY than the video player, which may not have a controlling terminal at all. So the end result is that per-TTY grouping automatically separates tasks run in terminals from those run via the window system.
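As a rough illustration of the grouping rule (not the patch itself, which works inside the kernel), the sketch below buckets a hypothetical process list by controlling terminal. The process names and TTY names are invented, and lumping TTY-less tasks together under a single `(no tty)` label is an assumption made here for simplicity.

```python
from collections import defaultdict

# Assumption for this sketch: tasks without a controlling terminal
# are collected under one default label.
NO_TTY = "(no tty)"

def group_by_tty(tasks):
    """Map each controlling TTY to the list of task names attached to it.

    `tasks` is an iterable of (name, tty) pairs; tty is None when the
    task has no controlling terminal.
    """
    groups = defaultdict(list)
    for name, tty in tasks:
        groups[tty if tty is not None else NO_TTY].append(name)
    return dict(groups)

# Hypothetical snapshot: a build started from one terminal emulator,
# a shell in another, and a video player launched from the desktop.
tasks = [
    ("make", "pts/0"),
    ("cc1-a", "pts/0"),
    ("cc1-b", "pts/0"),
    ("bash", "pts/1"),
    ("video-player", None),
]

groups = group_by_tty(tasks)
for tty, members in groups.items():
    print(tty, members)
```

The build and its compiler children end up together, so they compete with each other rather than with the video player.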
This behavior makes Linus happy; Linus, after all, is just the sort of person who might try to sneak in a quick video while waiting for a highly-parallel kernel compilation. He said:
Others have also reported significant improvements in desktop response, so this feature looks like one which has a better-than-average chance of getting into the mainline in the next merge window. There are, however, a few voices of dissent, most of whom think that the TTY is the wrong marker to use when placing processes in groups.
Most outspoken - as he often is - is Lennart Poettering, who asserted that "Binding something like this to TTYs is just backwards"; he would rather see something which is based on sessions. And, he said, all of this could better be done in user space. Linus was, to put it politely, unimpressed, but Lennart came back with a few lines of bash scripting which achieves the same result as Mike's patch - with no kernel patching required at all.
It turns out that working with control groups is not necessarily that hard.
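To give a feel for how little is involved, here is a sketch of the user-space idea in Python. It is not the script Lennart posted. It assumes a version-1 cgroup hierarchy with the cpu controller mounted somewhere, containing a parent directory that the user may write to; the function name and the layout (one child group per shell, named after its PID) are choices made for this example. On a real cgroup filesystem, creating a directory creates a group, and writing a PID to the group's `tasks` file moves that task into it.

```python
import os
import tempfile
from pathlib import Path

def join_own_cpu_group(parent, pid=None):
    """Create a child group named after `pid` under `parent` and move
    the task into it by writing the PID to the group's `tasks` file."""
    pid = os.getpid() if pid is None else pid
    group = Path(parent) / str(pid)
    group.mkdir(mode=0o700, exist_ok=True)
    (group / "tasks").write_text(f"{pid}\n")
    return group

if __name__ == "__main__":
    # Demonstration only: a temporary directory stands in for the
    # cgroup mount point, so nothing on the running system changes.
    with tempfile.TemporaryDirectory() as fake_mount:
        group = join_own_cpu_group(fake_mount)
        print(group.name, (group / "tasks").read_text().strip())
```

In real use the call would be made once per interactive shell at startup, with `parent` pointing at the writable directory inside the mounted cpu hierarchy; children forked afterward inherit the group.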
Linus, however, still likes the kernel version, mainly because it can be made to "just work" with no user intervention required at all:
In other words, an improvement that just comes with a new kernel is likely to be available to more users than something which requires each user to make a (one-time) manual change.
Lennart isn't buying it. A real user-space solution, he says, would not come in the form of a requirement that users edit their .bashrc files; it, too, would be in a form that "just works." It should come as little surprise that the form he envisions is systemd; it seems that future plans involve systemd taking over session management, at which time per-session group scheduling will be easy to achieve. He believes that this solution will be more flexible; it will be able to group processes in ways which make more sense for "normal desktop users" than TTY-based grouping. It also will not require a kernel upgrade to take effect.
Another idea which has been raised is to add a "run in separate group" option to desktop application launchers, giving users an easy way to control how the partitioning is done.
Linus seems to be holding his line on the kernel version of the patch:
Tough. I found out that I can solve it using cgroups, I asked people to comment and help, and I think the kernel approach is wonderful and _way_ simpler than the scripts I've seen. Yes, I'm biased ("kernels are easy - user space maintenance is a big pain").
The next merge window is not due until January, though; that is a fair amount of time for people to demonstrate other approaches. If a solution based in user space turns out to be more flexible and effective in the long run, it may yet prevail. That is especially true because merging Mike's patch does not in any way inhibit user-space solutions; if a systemd-based approach shows better results, that may be what the distributors decide to enable. One way or the other, it seems like better interactive response is coming in the near future.
Index entries for this article | |
---|---|
Kernel | Group scheduling |
Kernel | Interactivity |
Kernel | Scheduler/Group scheduling |
Posted Nov 18, 2010 2:27 UTC (Thu)
by neilbrown (subscriber, #359)
Those of us who use xterms (or whatever the program is called today) can benefit from Linus' approach now. Those who only use GUIs with no tty involved might have to wait for systemd to achieve desktop domination, but that isn't being slowed down by the per-tty approach ... is it?
Posted Nov 18, 2010 2:43 UTC (Thu)
by mjg59 (subscriber, #23239)
...which isn't inherently a bad thing. The people that do do that kind of thing are the people that are more likely to update their kernel more often than their userspace, and they're also the kind of enthusiast population that test kernels early and often and let us know when things have broken. There's obvious benefits in catering to them, but we shouldn't lose sight of the fact that there's way more people using Linux as something to run a web browser and a music player than there are people who ever launch a terminal.
Posted Nov 18, 2010 3:03 UTC (Thu)
by Trelane (subscriber, #56877)
Posted Nov 18, 2010 3:17 UTC (Thu)
by foom (subscriber, #14868)
But, the plan is for systemd (as a per-user process, separate from the root/pid 1 process) to take over as the session manager for GNOME (and who knows, maybe KDE too). Because, in the end, a user session manager has to do almost the same things as a root daemon manager anyways.
Posted Nov 18, 2010 3:24 UTC (Thu)
by Trelane (subscriber, #56877)
Honestly, I don't care who does it as long as it doesn't suck. :)
Posted Nov 18, 2010 20:03 UTC (Thu)
by aleXXX (subscriber, #2742)
Alex
Posted Nov 18, 2010 20:49 UTC (Thu)
by rahulsundaram (subscriber, #21946)
Posted Nov 18, 2010 23:27 UTC (Thu)
by cyd (guest, #4153)
Posted Nov 19, 2010 0:00 UTC (Fri)
by drag (guest, #31333)
Gone are the days of mucking around with my ~/.asoundrc file. Good riddance to them.
I couldn't imagine going back now.
Posted Nov 19, 2010 0:43 UTC (Fri)
by sfeam (subscriber, #2841)
Posted Nov 19, 2010 9:36 UTC (Fri)
by marcH (subscriber, #57642)
Yeah, but the other question is: how long and how much pain did this require?
When there is always something (different) broken, the dawn of the Linux Desktop is always "tomorrow".
Posted Nov 20, 2010 0:18 UTC (Sat)
by mezcalero (subscriber, #45103)
Also note that the man who currently does most of the maintenance work on PA during my hiatus working on systemd is actually a KDE contributor (Phonon), Colin Guthrie, so this worked out quite well in the end.
If KDE wants to be a leader in innovation when it comes to developing the infrastructure of the Linux desktop, then it actually has to become active and work with us. It's their duty, not ours. So far however, with HAL/udev, NM, PA, Gst and all the other techs we pushed from the GNOME side they ended up catching up eventually, but never led development. I would be happy if they'd work more with those doing the ground work.
Lennart
Posted Nov 18, 2010 3:23 UTC (Thu)
by russell (guest, #10458)
really? who? He's missing out on all the fun things.
Posted Nov 18, 2010 3:25 UTC (Thu)
by mjg59 (subscriber, #23239)
Posted Nov 18, 2010 12:09 UTC (Thu)
by dgm (subscriber, #49227)
I think Lennart has a point, policy may not belong to the kernel. But providing a sane default when we can sounds like music to my ears. Just let users override it if/when they can do better.
Posted Nov 18, 2010 14:38 UTC (Thu)
by vgoyal (guest, #49279)
- I have always been told that kernel should implement mechanism and not policy and this sounds like a policy to me. Not sure where to draw the line
- tty groups are hidden and don't appear in the cgroup hierarchy and are not user visible. So if some controller is implementing the upper limit, these groups might not be subjected to those limits.
- Once cgroups are visible to user space, one can write a nice monitoring tool to list all the cgroups, their resource consumption, processes running in these etc. With hidden groups, one can't do that.
So IMHO, this kind of automatic group creation is more of policy and should be done in user space. Even if we do it in kernel, at least we need to make sure these groups become visible in the appropriate cgroup hierarchy and are user controllable as regular cgroups are.
Posted Nov 18, 2010 11:10 UTC (Thu)
by kragil (guest, #34373)
Posted Nov 18, 2010 11:53 UTC (Thu)
by dgm (subscriber, #49227)
Posted Nov 21, 2010 11:28 UTC (Sun)
by Darkmere (subscriber, #53695)
http://www.bigrockcandymountain.info/2010/11/20/disrespect
Just because you feel entitled doesn't mean I have to care. Just because I do not care doesn't mean I can be an arse. However, if we continue off this tangent I'll be insulting more people here.
Posted Nov 18, 2010 12:34 UTC (Thu)
by nikanth (guest, #50093)
Posted Nov 18, 2010 16:36 UTC (Thu)
by hmh (subscriber, #3838)
The correct way to go about it is: policy belongs where it can be *realistically* made to work best by default.
Every interface that requires a kernel->userspace->kernel roundtrip to set policy _for no other reason_ than the "policy belongs in userspace" mentality, is clearly the product of bad engineering.
Posted Nov 18, 2010 18:27 UTC (Thu)
by vonbrand (subscriber, #4458)
The "policy is userspace" mentality is exactly one of the things that make Unixy systems flexible (and got Linux running from smartphones to Google). If it truly is setting policy, a roundtrip through the kernel won't be expensive enough to make any difference anyway.
Posted Nov 18, 2010 12:51 UTC (Thu)
by mjthayer (guest, #39183)
Posted Nov 18, 2010 18:20 UTC (Thu)
by Chousuke (subscriber, #54562)
If you can't watch video while the compilation is underway, what's the point of the scheduler allowing the video player to consume resources at all?
You can't tell users it's too much to demand decent interactivity while heavy background processes are running.
Posted Nov 18, 2010 13:14 UTC (Thu)
by iq-0 (subscriber, #36655)
But I fear cgroups don't scale to 1 level per process/process-group. Otherwise I think this would automagically work for all users (e.g. running chrome (multiprocess, possibly a cpu hog), a game, and listening to some music in another app instead of the in-game music).
Posted Nov 18, 2010 21:09 UTC (Thu)
by jzbiciak (guest, #5246)
Yeah, I'm skeptical that assigning groups by TTY is the way to go for the average user. For folks like me with 50-bazillion terminal windows open trying to multitask while running lots of compute? Yeah, sure. For someone who has never opened a terminal window? I can't see how it could possibly help. For giggles, I just logged into my wife's computer remotely over ssh. Out of ~167 processes, only 10 had controlling terminals. 6 were idle gettys on tty1-6, 1 was Xorg on tty7, and the other 3 were processes associated with my ssh session. What process group(s) would all the other 150+ tasks be assigned to? Or are they all effectively in their own per-process groups by default? How would this patch benefit users like my wife, who open few if any interactive sessions? It also breaks if you're the type that launches a bunch of GUI apps in the background from the same shell window. They might all inherit the same TTY but never use it. Now they'd get lumped in the same group.
Posted Nov 20, 2010 1:39 UTC (Sat)
by sayap (guest, #71380)
Posted Nov 18, 2010 15:33 UTC (Thu)
by Cato (guest, #7643)
Probably it's a VM issue, but it's hard to understand why Linux can't keep up with my typing these days given much faster hardware, when it always used to back in the days of formatting floppy disks in background.
Firefox is a key issue - with 100-200 tabs this freezing of keyboard input across the whole system is much more frequent, and "pidstat -d 5" shows that it and kjournald are doing some I/O, though at quite low rates. This is on a Core 2 Duo system with 4 GB RAM and PAE enabled, and about 50% of RAM used by programs typically.
Linux always used to be stunningly fast and responsive compared to Windows - with bugs like this, and the choice of a lumbering Firefox vs. a Chrome that lacks key extensions, it's still not quite there as a desktop. My main desktop is still Linux but I'd really like it to perform more consistently.
Posted Nov 18, 2010 16:42 UTC (Thu)
by vgoyal (guest, #49279)
You also might want to play a bit with blkio cgroup controller. Try putting firefox/VM in a separate cgroup or try putting your terminal in a separate cgroup and see if it helps.
For further information look at Documentation/cgroups/blkio-controller.txt
Posted Nov 18, 2010 21:33 UTC (Thu)
by clugstj (subscriber, #4020)
Posted Nov 19, 2010 12:03 UTC (Fri)
by Cato (guest, #7643)
I'm experimenting with a very small swapfile on Linux (512MB) etc but that hasn't solved this.
Posted Nov 26, 2010 9:11 UTC (Fri)
by jospoortvliet (guest, #33164)
Posted Nov 18, 2010 19:20 UTC (Thu)
by ccurtis (guest, #49713)
I'm thinking of fork(): What about an approach like:
In the 'make + video' scenario, all the 'gcc' processes will have a penalty while the video will perform with higher priority. In the daemon scenario, they will be reparented and thus have their penalty cleared. I expect a 'renice' would also want to clear any penalties.
Now, this isn't the same as creating a cgroup automatically - but it does seem like the place to do this is within the shell. Or with a shell script as Lennart posted. But there may be another heuristic around fork() that makes more sense than simply the controlling tty.
Posted Nov 19, 2010 5:43 UTC (Fri)
by naptastic (guest, #60139)
The problem isn't that the scheduler allocates time fairly; the problem is that the operating system sets all 10 children of make and the video player to the same priority. This is not a sane default. The Operating System should have an idea how much priority an application needs, and an end-user who doesn't know about .bashrc files shouldn't have to think about it. Make and its children should be reniced to 15 (for example) and totem should be reniced to -15 (for example) without any user intervention. Audio and video applications should have high priorities; Firefox, Thunderbird, Solitaire and OpenOffice should have medium priorities; video encoding and compilation should have low priorities by default.
If assigning arbitrary numbers to applications by category so they sort themselves out correctly seems crazy, remember that System V init has been doing it for years: every init script has a 2-digit number, and those numbers determine the order in which scripts are run.
A quick look at top on my system (Ubuntu 10.10) shows only 4 tasks with priorities != 1, and only 2 tasks with nice values != 0. Nice values and priorities are a simple tool that can solve these problems (assuming desktop preemption) so why aren't we using them? Cgroups are a great tool (killer feature!) but it seems like swatting a mosquito with a laser-guided rocket-propelled grenade launcher.
(Someday I want to try swatting a mosquito with a laser-guided rocket-propelled grenade launcher.)
Posted Nov 19, 2010 18:50 UTC (Fri)
by holstein (guest, #6122)
apparently, someone DID build something that uses laser to kill mosquito (no grenade though):
http://blog.makezine.com/archive/2010/09/make_23_how_to_s...
Besides, more on topic, the nice thing about this cgroup utilisation is that it's more or less automatic: no need to chase down pids to renice. Maybe both could be used together somehow?
Posted Nov 20, 2010 1:10 UTC (Sat)
by giraffedata (guest, #1954)
The video player is a special case, and there's a name for what it requires: real time scheduling. It's special because its correct execution is tied to the passage of real time. Well, we have real time scheduling facilities in Linux. Why doesn't the movie player program use them?
So the example in the article is probably not the best example of how the CTTY-based scheduling policy for non-realtime processes is good.
Posted Nov 19, 2010 10:13 UTC (Fri)
by job (guest, #670)
Personally I have a problem with disk I/O a lot more often. I suspect there is a significant amount of CPU work involved in SATA I/O and that this activity is not scheduled together with userspace, because when there is heavy disk usage interactivity in my terminals becomes very choppy.
This was actually better ten years ago, on yesterday's hardware, so something must have changed since then. If you want to help desktop interactivity, test under disk activity first. Your users will thank you (at least I would).
Posted Nov 19, 2010 19:47 UTC (Fri)
by Hypatia (guest, #57397)
That said, I fear that with this scheduling scheme I will have to stop launching my headless VMs from the commandline, since they will be badly throttled while my wobbly windows wobble smoothly and my screen saver dazzles an empty room at high priority. I use the 'nice' command a lot, and did especially back in the days when I did a lot of kernel compiling. Nice has always worked pretty well for me. Is this patch different than auto-nicing your interactive bash shells?
Posted Nov 20, 2010 18:16 UTC (Sat)
by mfedyk (guest, #55303)
btw, chrome does the same with memory at about 50 tabs...
Posted Nov 19, 2010 23:35 UTC (Fri)
by dmarti (subscriber, #11625)
Posted Nov 20, 2010 17:33 UTC (Sat)
by nlucas (guest, #33793)
though.
R E S P E C T
Policy
Policy belongs to kernel space
This could even be applied more generically in other schemes (systemd creating a cgroup and subgroups are automagically created within their parents group).
Responsiveness under disk I/O
I'm fine with the tty-based approach personally, but it doesn't really seem ideal as more gui gets involved. Desktops - at the risk of portability - can integrate the feature, but what about another trigger?
This is just conceptual, of course. I imagine 'penalty' to be something like 1/10 or 1/8 of a 'nice', so the scheduler would use a value like (nice << 3 | penalty).
Very poor underlying assumptions.
OT : Mosquito
I don't know that you can say make is generally a low-priority thing. When I type 'make' and am waiting for it to finish (which I expect to be in less than a minute), there's not much I'd like to see get CPU time before my make.
tty-based scheduler grouping vs scheduler priorities
I/O is the problem, not CPU
Someday when Lennart Poettering is 96 years old, he'll be sitting in the park painting a watercolor of a duck, and some LWN reader is going to come up to him, take one look at the painting, and say, "D3WD THAT SUX! PULSEAUDIO TOTALLY BR0KE MY SOUND IN 2004!!1!!"
Prediction