The scheduler saga continues
[Posted August 6, 2003 by corbet]
At the conclusion of last week's episode, Con Kolivas and Ingo Molnar were busily trying to improve
interactive response in the 2.6-test scheduler through a variety of
techniques. Con had picked up some of Ingo's changes, but had passed over
others. In particular, Con thought that Ingo's nanosecond timekeeping
functionality added extra overhead without really helping with interactive
scheduling.
So it was, perhaps, a surprise to some when Andrew Morton's 2.6.0-test2-mm3 kernel came with a little note:
"Con's CPU scheduler rework has been dropped out and Ingo's changes
have been added." There is a useful lesson here that has been
learned several times on linux-kernel: when Ingo starts to think seriously
about a development issue, it's usually worthwhile to pay attention to what
he comes up with. (Incidentally, Andrew merged Ingo's 4G/4G patch as well.)
In particular, it seems that Ingo's nanosecond timekeeping in the scheduler
was necessary after all. The interactivity patches try to give a priority
boost to processes which perform short sleeps, and tracking those sleeps in
jiffies (usually 1/1000 second in 2.6) was insufficiently precise. Con reworked
his patch to use the higher-resolution times; the resulting O12.2int patch found its way back into 2.6.0-test2-mm4. Beyond the timekeeping
change, the patch continues to tweak the various parameters, but mostly
sticks to the techniques for discovering interactive processes that were
discussed last week.
Con's O13int goes a little further, however,
and denies the interactive bonus for time spent in non-interruptible sleeps.
This type of sleep (which shows up in ps output as the dreaded
"D" state that can mark a non-killable process) is usually (but
not always) associated with a wait for disk I/O. Con's observation was
that processes which are pounding on the disk are usually not performing
truly interactive work, and shouldn't get the associated bonus.
This approach has a problem, however: the recently merged anticipatory
I/O scheduler will, on completion of a read request, idle the disk briefly
on the expectation that the reading process will immediately issue another,
nearby request. But if the CPU scheduler makes the reading process wait for the processor (since
it was in a non-interruptible sleep and doesn't appear to be interactive),
the next read request may not arrive in time, and the I/O
pause will have been in vain. Idling a disk for no useful purpose does not help
response, interactive or otherwise. In the end, Con tweaked the code to allow tasks to build up
enough credit in non-interruptible sleeps to just barely qualify as
"interactive."
Since then, scheduler tweaking activity has slowed a bit. For the time
being, it seems, most of the ideas in circulation have been tried out.
Perfection in the scheduler is probably an unattainable goal; it may be
that it will soon be time to declare victory and move on to other issues.