
Improving interactivity on Linux systems

The 2.5 kernel features a massively reworked scheduler which, among other things, improves the interactive feel of a desktop system. It goes to great lengths to try to separate interactive tasks from "background" processes, and to give a priority boost to the former. One way that this distinction is made is to look at how much time each process spends sleeping. Processes that sleep a lot are generally waiting for humans to do something, so the kernel tries to ensure that, when they wake up, they get quick access to the processor.
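
The idea can be sketched in a few lines of C. This is a minimal illustration only; the names, constants, and arithmetic are hypothetical, not the actual 2.5 scheduler code. A task earns "sleep credit" while blocked and spends it while running, and its dynamic-priority bonus scales with that credit:

```c
/* Hypothetical sketch of a sleep-based interactivity heuristic.
 * Not the actual 2.5 code: names and constants are illustrative. */

#define MAX_SLEEP_AVG 1000  /* cap on accumulated sleep credit (ms) */
#define MAX_BONUS     10    /* largest dynamic-priority boost */

struct task {
    int sleep_avg;          /* 0 .. MAX_SLEEP_AVG */
};

/* Called when a task wakes up after sleeping for slept_ms. */
static void account_sleep(struct task *t, int slept_ms)
{
    t->sleep_avg += slept_ms;
    if (t->sleep_avg > MAX_SLEEP_AVG)
        t->sleep_avg = MAX_SLEEP_AVG;
}

/* Called as a task consumes ran_ms of CPU time. */
static void account_run(struct task *t, int ran_ms)
{
    t->sleep_avg -= ran_ms;
    if (t->sleep_avg < 0)
        t->sleep_avg = 0;
}

/* The bonus scales with the fraction of time spent sleeping. */
static int interactivity_bonus(const struct task *t)
{
    return t->sleep_avg * MAX_BONUS / MAX_SLEEP_AVG;
}
```

A mostly-sleeping editor ends up with a large bonus, while a compute-bound compiler's credit drains to zero and it gets none.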

This heuristic works well much of the time, but it also fails badly in some situations. Consider, for example, the case of a user dragging a window across the screen. That sort of operation can require a fair amount of computation on the part of the X server. If the system is busy anyway (with a kernel compilation, for example), the X server can end up using all of the processor time that is available to it. When the server stops sleeping, the kernel concludes that it is a compute-bound background task and drops its priority. At that point, the pointer stops keeping up with the mouse, and the desktop experience becomes generally unpleasant.

A classic solution (which predates Linux) for this problem is to raise the priority of the X server. A higher-priority server can make things work better for some users, but it ignores the fact that similar situations can arise with other interactive processes that require a fair amount of processor time. Streaming media applications tend to work this way, for example. Raising the priority of the X server can make things worse for this sort of application. Also, as Linus points out, tweaking priorities in this way is an indication that the system has failed somehow:

Something is wrong, and we couldn't fix it, so here's the band-aid to avoid that problem for that particular case. It's acceptable as a band-aid, but if you don't realize that it's indicative of a problem, then you're just kidding yourself.

A few patches have gone into the 2.5.65 kernel which, by most reports, make things a lot better. One of them, which originally came from Linus, is based on the recognition that, if an interactive process is waiting for another process to do something, that other process should be considered interactive as well. The X server may be using a fair amount of CPU time, but, since interactive processes (i.e. the clients that the user works with) are waiting for it, the X server should still be seen as an interactive process.

The ideal time to make this adjustment might be when an interactive process goes to sleep waiting for an event. Unfortunately, that is hard to do; the kernel has no way to know, in the general case, who will be waking up processes that sleep on a particular queue. On the other hand, when the wakeup actually occurs, the relationship is immediately obvious. So the new scheduler will, at wakeup time, look at the interactivity bonus for the process being awakened. If that process has maxed out its bonus (as processes that sleep a lot will), the "excess" interactivity bonus is given, instead, to the process which is performing the wakeup. Thus, a sleeping mail client gives some of its bonus to the X server, which wakes it up. This patch is said to improve the interactivity of X significantly.
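
The transfer mechanism described above can be sketched as follows; this is an illustrative model under the sleep-credit scheme from earlier, not the actual patch, and the names are hypothetical. When the task being woken is already saturated, the excess credit goes to the waker:

```c
/* Hypothetical sketch of the wakeup-time bonus transfer.
 * If the sleeper has maxed out its interactivity credit, the
 * "excess" is given to the task performing the wakeup -- so the
 * X server, waking a sleepy client, inherits interactivity. */

#define MAX_SLEEP_AVG 1000  /* cap on accumulated sleep credit (ms) */

struct task {
    int sleep_avg;          /* 0 .. MAX_SLEEP_AVG */
};

static void wake_up_task(struct task *sleeper, struct task *waker,
                         int slept_ms)
{
    int room = MAX_SLEEP_AVG - sleeper->sleep_avg;

    if (slept_ms <= room) {
        sleeper->sleep_avg += slept_ms;
    } else {
        /* Sleeper is saturated: pass the excess to the waker. */
        sleeper->sleep_avg = MAX_SLEEP_AVG;
        waker->sleep_avg += slept_ms - room;
        if (waker->sleep_avg > MAX_SLEEP_AVG)
            waker->sleep_avg = MAX_SLEEP_AVG;
    }
}
```

In this model, a mail client that slept 200ms with only 50ms of headroom left passes the remaining 150ms of credit to the X server that woke it.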

Ingo Molnar has taken Linus's patch and merged it into a larger set of scheduler changes (which, in turn, has gone into 2.5.65). Some of the additional changes that have been made include:

  • Various scheduler parameter tweaks. The maximum timeslice given to any process has been reduced, for example (to 200ms).

  • One process can preempt another with the same priority, if the former has a longer remaining timeslice.

  • The first wakeup of a newly-forked child has been made smarter, resulting in less work being redone.
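
The same-priority preemption rule in the list above reduces to a simple comparison; the sketch below is illustrative (hypothetical names, not the kernel's code):

```c
#include <stdbool.h>

/* Sketch of the same-priority preemption rule: a waking task may
 * preempt the running one when their priorities are equal but the
 * waker has more of its timeslice remaining. Illustrative only. */

struct task {
    int prio;          /* lower number = higher priority */
    int slice_left;    /* remaining timeslice, in ms */
};

static bool should_preempt(const struct task *waking,
                           const struct task *running)
{
    if (waking->prio < running->prio)
        return true;   /* strictly higher priority always preempts */
    if (waking->prio == running->prio)
        return waking->slice_left > running->slice_left;
    return false;
}
```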

The end result of these changes is a kernel which provides a much more satisfying interactive experience. Note, however, that some causes of X server stalls - in particular, those related to disk I/O scheduling - still have not been resolved. Work is ongoing, however.

(See also: Jim Houston's self-tuning scheduler patch, which takes a different approach to scheduler improvement).



Improving interactivity on Linux systems

Posted Mar 13, 2003 7:20 UTC (Thu) by cpeterso (guest, #305) [Link] (2 responses)


Sounds like the Linux kernel developers are once again reinventing the wheel. They are reinventing multi-level feedback queue (MLFQ) scheduling and they don't even know it.

I remember reading an interview (on kerneltrap.com, sorry LWN ;-) with one Linux kernel developer who bragged that he never read any research papers about existing algorithms or systems. He didn't want his code to be biased by the lessons learned from other people's research or prototypes. He "knew" his code would be the optimal algorithm (even though he hadn't read about the other, existing algorithms).

Reading the Literature

Posted Mar 13, 2003 16:48 UTC (Thu) by brugolsky (guest, #28) [Link]

Richard Feynman didn't spend much time reading papers -- instead he worked from first principles. It seems to have worked for him. :-)

More seriously, Ingo (the kerneltrap interviewee to whom you refer) is a fount of ideas, some of which are brilliant, some of which are misguided. In almost every case, his brainstorms are accompanied by working code. Besides Linus and the core kernel developers, there are hundreds of lurkers who have read the literature, and stand ready to peer review his work.

Synthesizing the work of others is fine, but original ideas have to come from somewhere.

Improving interactivity on Linux systems

Posted Mar 15, 2003 6:53 UTC (Sat) by IkeTo (subscriber, #2122) [Link]

From the first day the O(1) process scheduler came out, the scheduler has been a (generalized) MLFQ. Previously that was not the case: there was a single queue with dynamically changing priorities, which made it very difficult to be quick (the priority had to be recomputed every time the scheduler was invoked). So the "reinvention" was done, I think, more than a year ago.

But that's not the point of the current change to the scheduler. The change has been about how to know when a process should change queues. Before the change, a process changed queues when it slept too much or too little, which is considered an indication of whether the process is interactive or CPU-bound. This is as opposed to some OS out there (that I don't want to name) that hard-codes some processes as interactive processes.

The current "reinvention" actually has something to do with the priority inversion that you see in deadlock-prevention work, where a lower-priority process is raised to a higher priority when a higher-priority task is waiting for it to finish. In that context, the solution is quite easy, since it is always mutexes or locks that we are talking about, and mutexes and locks remember who holds them.

I think the real innovation here is that (1) such priority balancing can be used as an effective means to guess which process is interactive, and (2) such priority balancing can take effect even when we don't know whom a process is waiting for. Well... that might not really be so "innovative": as Linus himself pointed out, it is a scheme that is used in research OSs as well, but it has never been used in any real-world OS.
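
The classic lock-based case mentioned above, where the lock remembers its owner, can be sketched like this (an illustrative model of priority inheritance, with hypothetical names):

```c
#include <stddef.h>

/* Sketch of classic priority inheritance through a lock.
 * Illustrative only: because the mutex records its owner, a
 * high-priority waiter can temporarily boost that owner. */

struct task {
    int prio;        /* effective priority; higher = more urgent */
    int base_prio;   /* priority before any inheritance */
};

struct mutex {
    struct task *owner;   /* NULL when unlocked */
};

/* A waiter blocking on the mutex boosts the current owner. */
static void block_on(struct mutex *m, struct task *waiter)
{
    if (m->owner && m->owner->prio < waiter->prio)
        m->owner->prio = waiter->prio;
}

/* Releasing the mutex drops any inherited boost. */
static void mutex_unlock(struct mutex *m)
{
    if (m->owner)
        m->owner->prio = m->owner->base_prio;
    m->owner = NULL;
}
```

The scheduler patch applies the same intuition at wakeup time, where the waker/sleeper relationship is known even though no lock is involved.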

Improving interactivity on Linux systems

Posted Mar 13, 2003 16:53 UTC (Thu) by NAR (subscriber, #1313) [Link] (6 responses)

I wonder why the Linux kernel tries to figure out which processes are interactive rather than simply asking them. The application itself surely knows whether it is interactive, so it could simply set an i_am_an_interactive_process bit somewhere in the kernel, and the scheduler would then only need to care about this bit instead of simple (or not so simple) heuristics. It might not be portable, but I can live with an extra #ifdef LINUX in the X/KDE/GNOME codebase...

Bye,NAR

Process "interactive" bits

Posted Mar 13, 2003 17:03 UTC (Thu) by corbet (editor, #1) [Link] (4 responses)

The problem, of course, is that you can't always trust the process involved. If setting the "I am interactive" bit yields a higher priority, it will certainly be abused. So it would have to be a privileged operation, at which point you might as well just use nice().

Process "interactive" bits

Posted Mar 13, 2003 17:20 UTC (Thu) by NAR (subscriber, #1313) [Link] (2 responses)

The problem, of course, is that you can't always trust the process involved.

Yes, but in the end it's the user (of the kernel interface) whom you don't trust. I think things would already have gone wrong if we couldn't trust, e.g., the authors of sendmail or postgres or apache not to abuse this feature. And I'm not sure they can't already abuse something in the kernel, like allocating lots of memory ("in case we might need it") or caching files ("we know better than the kernel") or forking many processes.

And even if they do abuse this feature, they might be publicly humiliated for it (after all, this is open source, so it would be noticed :-)

Bye,NAR

But a process may or may not be "interactive"

Posted Mar 13, 2003 19:12 UTC (Thu) by captrb (guest, #2291) [Link]


A process may or may not be interactive depending on how it is used, not how it is constructed. A relatively unused apache instance, usually sleeping but occasionally waking, would be an interactive process. In this situation you want apache given priority to finish its job and resume sleeping, taking momentary priority over the non-sleeping seti@home process that has been sucking CPU for weeks.

What this patch does above and beyond this is to share priority with the processes that the interactive process is I/O bound on. So if the awakened apache process contacts the seti@home process (I don't know why it would, but for the sake of example) and is waiting for a response, you actually want to give the seti@home process some of apache's priority. Giving more CPU time to apache at this point is useless, since it is just waiting for seti@home to respond; you may as well identify the relationship and help apache finish its request by helping out seti.

The reason this relates to an X server and "user interactive" processes is that all X programs (like the GNOME and KDE programs you suggest should have the interactive bit set) are clients of the X server. Even if they are given priority, they still have to wait for the X server to perform the display duties. Giving more priority to them doesn't help, as they are I/O bound and gagged. Furthermore, those X client programs could be pigs, sucking CPU without user interaction. If you depended on the "interactive bit" being set, you'd never be able to properly de-prioritize an "interactive bit" application that had spiraled out of control. Contrary to what a good scheduler should do, you'd be giving too much time to a process that doesn't need it.

Process "interactive" bits

Posted Mar 14, 2003 14:04 UTC (Fri) by torsten (guest, #4137) [Link]

It is indeed the user you don't trust. Long ago, on UNIX systems with perhaps thousands of accounts, where everyone was waiting for their jobs to finish, people used to adjust their nice levels to get ahead in the queue and get their jobs done quicker.

Try not to think of Linux as a single-user system.

Torsten

Process "interactive" bits

Posted Mar 15, 2003 5:43 UTC (Sat) by IkeTo (subscriber, #2122) [Link]

> The problem, of course, is that you can't always trust the
> process involved. If setting the "I am interactive" bit yields a
> higher priority, it will certainly be abused.

That's not really the problem. Anything can be abused, so adding one more avenue does not really matter. After all, it's the user's system that the program is abusing, so the adverse effect of the abuse falls on the user himself. If a program consistently abuses the OS to its own advantage and affects other programs on the system, it won't be installed by the user (or the system admin, for a multi-user system). The problem is that it is very inconvenient for programmers if programs must tell the OS that they are interactive---especially since the OS is supposed to manage the resources and figure out which programs are interactive by itself.

Improving interactivity on Linux systems

Posted Mar 15, 2003 5:33 UTC (Sat) by IkeTo (subscriber, #2122) [Link]

> The application itself surely knows if it is interactive or not

Hm... now you are relying on the programmer to do the right thing: to set a flag when he knows he is going to be "interactive" for a while, and reset the flag when he knows he is going to be "non-interactive" for a while. Do you really think programmers will use their coding time this way? Personally, I'd envision that generating a lot of bugs instead of a lot of interactivity improvements.

Maximum timeslices?

Posted Mar 15, 2003 23:46 UTC (Sat) by riddochc (guest, #43) [Link] (1 responses)

It occurs to me that the timeslice size should be somewhat related to how much a process can do in a certain amount of time - you don't want to have too much, or too little, time given to any single process at once. Or at least that's what I learned in my operating systems class.

The question I have is this: if the timeslice is fixed at, say, 200ms, the system will behave very differently on a 2.4GHz P4 than on a ~50MHz 486. Intuitively, the amount of time the scheduler spends on overhead work is going to depend on how much it's trying to do, and that overhead will take far less time on a really fast CPU than on a really slow one - it won't always be, say, 10ms.

Wouldn't it make sense to have parameters that are in units of milliseconds be changed so that they're proportional to the CPU speed instead? I've got doubts about whether 200 milliseconds is as good a number on a 386 as it is on a really modern SMP machine.

Maximum timeslices?

Posted Mar 16, 2003 8:21 UTC (Sun) by IkeTo (subscriber, #2122) [Link]

Timeslice length is much more a perception thing than a mathematical thing. Because what's important is what people perceive, it runs at the speed of humans rather than machines. If you find it fair and acceptable to let each of your CPU-bound processes work for 10ms on your 33MHz 386, I can't see why, once you upgrade to a 1.5GHz Pentium IV, you suddenly can't tolerate more than 0.2ms. Unless you run 50 times as many processes on it, of course.


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds