The continuing development of I/O scheduling
[Posted February 11, 2003 by corbet]
The 2.5 development series has seen a great deal of work aimed at improving
the performance of the block I/O subsystem. Recently there has been a
resurgence of interest in I/O scheduling - deciding which disk I/O requests to
process in which order. Optimal scheduling can keep the disks running at
full speed and users happy, but the optimal solution can be hard to find.
That doesn't stop the kernel hackers from trying, however. The
anticipatory I/O scheduler work was covered here
a couple of weeks ago; now a new approach is
being tried which may improve I/O performance even more.
The technique being looked at is "stochastic fair queueing," and it is
intended to bring greater fairness to I/O scheduling decisions. In a fair
situation, all users of a particular drive would be able to execute about
the same number of I/O requests over a given time. This approach to
fairness gets rid of starvation problems, and ensures that all processes
can get some work done. The hope would be, for example, that a streaming
media application would be able to move its data without outages, even in
the presence of other, disk-intensive applications.
The stochastic fair queueing approach was first developed in the networking
world by Paul E. McKenney; his paper on the subject can be found on this page. In the
networking context, stochastic fair queueing tries to divide the available
bandwidth equally among all users. Ideally, a separate queue would be used
for each ongoing connection, but high-performance routers lack the
resources to do things that way. So a smaller number of queues is used,
with each connection being assigned to a queue via a hash function.
Packets are then taken from each queue in turn, dividing the bandwidth
between them. If two high-bandwidth connections happen to land on the same
queue, they will be penalized relative to the other queues; to address this
problem, the hash function is periodically changed to redistribute
connections among the queues. The algorithm works reasonably well and is
easy to make fast; the Linux networking code has had a stochastic queueing
module available for some time.
In the disk I/O context, the aim is to divide the available disk bandwidth
fairly between processes. The initial
implementation by Jens Axboe creates 64 subqueues for each block I/O
request queue, and distributes requests among the subqueues based on the
process ID of the requestor. (Actually, it uses the process ID of the
currently running process, which could, in some situations, not be the
originator of the request). When the time comes to dispatch requests, one
is taken from each subqueue, and the whole set is ordered before being sent
to the drive for execution.
Taking things even further, Jens has also posted a complete fair queueing scheduler, which does
away with the hash function used in the stochastic approach. Each process
has its own queue, and requests are taken equally from all queues. It is
hard to get fairer than that. Of course, as Jens points out, once you have
this infrastructure in place, it is relatively easy to make things less
fair again by adding, say, I/O priorities to processes.
Where this all appears to be heading (though probably not in the 2.5
series) is toward a configurable I/O scheduler with several possible
algorithms which can be mixed and matched according to a site's local
policy. In other words, it looks a lot like the traffic control code which
has existed in the networking subsystem for a few years. As with
networking, most sites will probably not need to tweak their disk
scheduling regimes. Users with special needs, however, will be glad for
the ability to fine-tune things to their specifications.
(
Log in to post comments)