
The scheduler saga continues

At the conclusion of last week's episode, Con Kolivas and Ingo Molnar were busily trying to improve interactive response in the 2.6-test scheduler through a variety of techniques. Con had picked up some of Ingo's changes, but had passed over others. In particular, Con thought that Ingo's nanosecond timekeeping functionality added extra overhead without really helping with interactive scheduling.

So it was, perhaps, a surprise to some when Andrew Morton's 2.6.0-test2-mm3 kernel came with a little note: "Con's CPU scheduler rework has been dropped out and Ingo's changes have been added." There is a useful lesson here that has been learned several times on linux-kernel: when Ingo starts to think seriously about a development issue, it's usually worthwhile to pay attention to what he comes up with. (Incidentally, Andrew merged Ingo's 4G/4G patch as well.)

In particular, it seems that Ingo's nanosecond timekeeping in the scheduler was necessary after all. The interactivity patches try to give a priority boost to processes which perform short sleeps, and tracking those sleeps in jiffies (usually 1/1000 of a second in 2.6) was insufficiently precise. Con reworked his patch to use the higher-resolution timestamps; the resulting O12.2int patch found its way back into 2.6.0-test2-mm4. Beyond the timekeeping change, the patch continues to tweak the various parameters, but mostly sticks to the techniques for discovering interactive processes that were discussed last week.
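To see why jiffy-granularity tracking loses information, consider a rough sketch of the arithmetic. It is not taken from any of the patches; the function names are made up for illustration, and a tick rate of 1000 per second is assumed. The same 300-microsecond sleep is recorded as zero jiffies or one jiffy depending only on where it falls relative to a tick boundary, while a nanosecond clock reports the same length both times.

```python
NSEC_PER_SEC = 1_000_000_000
HZ = 1000  # assumed tick rate: one jiffy per millisecond
NSEC_PER_JIFFY = NSEC_PER_SEC // HZ


def sleep_in_jiffies(start_ns, end_ns):
    """Sleep length as seen by a clock that only advances once per tick."""
    return end_ns // NSEC_PER_JIFFY - start_ns // NSEC_PER_JIFFY


def sleep_in_ns(start_ns, end_ns):
    """Sleep length as seen by a nanosecond-resolution clock."""
    return end_ns - start_ns


# Two sleeps of identical length (300 microseconds), at different offsets.
within_one_tick = (100_000, 400_000)      # starts and ends inside the same jiffy
across_a_tick = (900_000, 1_200_000)      # straddles a tick boundary

print(sleep_in_jiffies(*within_one_tick), sleep_in_ns(*within_one_tick))
print(sleep_in_jiffies(*across_a_tick), sleep_in_ns(*across_a_tick))
```

A scheduler that hands out bonuses based on the jiffy count would treat these two identical sleeps quite differently.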

Con's O13int goes a little further, however, and denies an interactive bonus to processes for non-interruptible sleeps. This type of sleep (which shows up in ps output as the dreaded "D" state that can mark a non-killable process) is usually (but not always) associated with a wait for disk I/O. Con's observation was that processes which are pounding on the disk are usually not performing truly interactive work, and shouldn't get the associated bonus.
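The idea can be pictured with a small sketch. This is an illustration of the policy as described here, not Con's code: the function name is hypothetical, and the single-letter states simply follow the ps convention ("S" for an interruptible sleep, "D" for a non-interruptible one).

```python
def sleep_credit_ns(sleep_ns, state):
    """Return how much of a sleep counts toward the interactive bonus.

    state is the ps-style sleep state: "S" (interruptible) or
    "D" (non-interruptible). Hypothetical helper for illustration only.
    """
    if state == "D":
        # Non-interruptible sleep, most likely a wait for disk I/O: no credit.
        return 0
    # Interruptible sleep, such as waiting for a keystroke: full credit.
    return sleep_ns


print(sleep_credit_ns(5_000_000, "S"))  # waiting on the terminal
print(sleep_credit_ns(5_000_000, "D"))  # waiting on the disk
```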

This approach has a problem, however: the recently merged anticipatory I/O scheduler will, on completion of a read request, idle the disk briefly on the expectation that the reading process will immediately issue another, nearby request. But if the scheduler makes the reading process wait (since it was in a non-interruptible sleep and doesn't appear to be interactive), the next read request may not arrive in time, with the result that the I/O pause was done in vain. Idling a disk for no useful purpose does not help response, interactive or otherwise. In the end, Con tweaked the code to allow tasks to build up enough credit in non-interruptible sleeps to just barely qualify as "interactive."
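One way to picture that compromise is as a cap on the credit earned in non-interruptible sleeps. The sketch below is only a guess at the shape of such a rule; the names and both constants are invented for illustration and are not values from the patch. Credit from "D"-state sleeps can lift a task up to the interactive threshold but never beyond it, while ordinary sleeps accumulate up to the normal ceiling.

```python
# Hypothetical values, chosen only to make the example concrete.
MAX_SLEEP_AVG_NS = 1_000_000_000          # ceiling on accumulated sleep credit
INTERACTIVE_THRESHOLD_NS = 300_000_000    # credit needed to count as interactive


def add_sleep_credit(sleep_avg_ns, sleep_ns, uninterruptible):
    """Return the task's new accumulated sleep credit after one sleep."""
    if uninterruptible:
        if sleep_avg_ns >= INTERACTIVE_THRESHOLD_NS:
            # Already at or past the threshold: disk waits add nothing more.
            return sleep_avg_ns
        # Disk waits may raise the task to the threshold, but no further.
        return min(sleep_avg_ns + sleep_ns, INTERACTIVE_THRESHOLD_NS)
    return min(sleep_avg_ns + sleep_ns, MAX_SLEEP_AVG_NS)


def is_interactive(sleep_avg_ns):
    """A task qualifies once its credit reaches the threshold."""
    return sleep_avg_ns >= INTERACTIVE_THRESHOLD_NS


# A task that only ever waits on the disk just barely qualifies.
credit = add_sleep_credit(0, 500_000_000, uninterruptible=True)
print(credit, is_interactive(credit))
```

Under these assumptions a disk-bound reader is woken promptly enough to issue its next request while the anticipatory scheduler is still holding the disk idle, but it never outranks a task that earned its credit waiting on the user.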

Since then, scheduler tweaking activity has slowed a bit. For the time being, it seems, most of the ideas in circulation have been tried out. Perfection in the scheduler is probably an unattainable goal; it may be that it will soon be time to declare victory and move on to other issues.



The scheduler saga continues

Posted Aug 7, 2003 11:57 UTC (Thu) by Algol (subscriber, #2681) [Link]

Any word on the "bounded-latency soft realtime scheduler" mentioned in last week's scheduler story?

Interactivity is important, but it doesn't seem to be the only problem with the scheduler. The story seems to suggest that once the interactivity patches are in, the scheduler won't get any more attention for a while?

The scheduler saga continues

Posted Aug 7, 2003 17:09 UTC (Thu) by iabervon (subscriber, #722) [Link]

It seems like there is a need for the scheduler to decide what to schedule based on a number of factors, each corresponding to a common pattern. So there would be one for interactive tasks, one for soft realtime (e.g., audio buffer filling) tasks, and one for disk I/O tasks. The disk one would schedule a task which does a lot of disk I/O when an I/O operation completes for the duration that the I/O scheduler keeps the disk idle.

I suspect that some of Con's difficulties stem from trying to solve a heterogeneous problem with a single solution; furthermore, some of the cases ought to be tied to other code.

For that matter, it should be possible for the I/O scheduler to find out whether the task will get CPU time, and not idle the disk if the CPU scheduler has decided not to wake the task. (Of course, I haven't actually looked into the matter in enough detail to really know.)

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds