Some of the changes that have been stressing the I/O scheduler have gone in
very recently. A couple of patches from Andrew Morton are currently
sitting in Linus's 2.5.39 BitKeeper tree; they are worth a look.
In the end, much of the work done by the VM subsystem is deciding which
pages to move to which disks, and when. A good set of decisions will lead
to good performance; but if the VM is not smart about which pages it shoves
out, performance can suffer. Some of Andrew's recent efforts stem from an
important observation that has been somewhat overlooked until now: there is
little point in trying to write pages to disks which are already
overwhelmed with requests.
If you want to try to direct your efforts toward disks which are not overly
busy, you first need some sort of indication of just how much work each
drive has to do. So Andrew has added a new set of functions that report on
whether a device's request queue is congested or not. The test used is
simplistic: a device's read or write queue is not congested if at least 25%
of the allocated request queue entries (a fixed number of these is
allocated at queue creation time) are available for use. A simple test is
good enough, though, especially considering that the size of a request
queue tends to be volatile.
Once you can test for a congested state, you can start making smarter
decisions. Once these functions were in, Linus merged another patch which
causes the ext2 filesystem to cut back on speculative readahead operations
if the underlying device is busy. If the disks are backed up, presumably
there are more important things for them to be doing than reading ahead
data that may or may not be used.
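The readahead change can be pictured with a small hypothetical helper (this is not ext2's actual code, just a sketch of the policy): when the underlying device's read queue is congested, shrink the speculative readahead window rather than piling more optional work onto a busy disk.

```c
#include <stdbool.h>

/* Hypothetical policy sketch: cut the readahead window in half when the
 * backing device's read queue is congested, leave it alone otherwise. */
static unsigned long adjust_readahead(unsigned long window_pages,
                                      bool read_queue_congested)
{
    return read_queue_congested ? window_pages / 2 : window_pages;
}
```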
More impressive performance gains, however, can be had by looking at the
pdflush subsystem. pdflush is a set of kernel threads
whose job it is to write dirty file data back to the underlying
filesystems. A fair amount of effort goes into keeping separate
pdflush threads from trying to write back to the same device, and
into simply keeping the right number of pdflush threads around.
With the new scheme, life gets easier. pdflush does its best to
simply pass over pages when the destination queue is congested; instead, it
concentrates on pages that can be written to less busy devices. Thus
pdflush no longer blocks on request queues, and can concentrate on
keeping them all full. A side benefit is that a single pdflush
thread may now be sufficient.
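The new writeback strategy can be sketched as a single pass over a list of dirty pages (all names and types here are invented for illustration, not pdflush's real data structures): pages destined for a congested device are skipped rather than blocked on, and everything else is written immediately.

```c
#include <stdbool.h>

/* Illustrative dirty page: which device it belongs to, and whether it
 * has been written back yet. */
struct dirty_page {
    int device;        /* index of the backing device */
    bool written;
};

/* One writeback pass: write every page whose device can accept work
 * now, and simply pass over pages behind a congested queue.  Returns
 * the number of pages written.  Skipped pages stay dirty for a later
 * pass, so the thread never blocks on a busy request queue. */
static int writeback_pass(struct dirty_page *pages, int npages,
                          const bool *device_congested)
{
    int written = 0;
    for (int i = 0; i < npages; i++) {
        if (pages[i].written || device_congested[pages[i].device])
            continue;              /* don't block on a congested queue */
        pages[i].written = true;   /* stand-in for the actual I/O */
        written++;
    }
    return written;
}
```

Because no call in the loop can block, one such thread can keep cycling over all the devices, topping up whichever request queues have room.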
According to Andrew: "This code can keep sixty spindles saturated -
we've never been able to do that before."
It is increasingly apparent that the 2.6 kernel is going to be an amazing
performer in numerous areas, thanks to work like this.