Disk I/O priorities
[Posted November 11, 2003 by corbet]
Linux has long had a priority mechanism which controls access to the
processor(s). Other system resources, however, are not so easily managed.
Often, the real performance bottleneck is not the processor, but some other
resource, such as I/O bandwidth to a disk drive. If disk I/O is the real
limiting factor, even a very low-priority process can, by creating many I/O
requests, strongly affect the performance of higher-priority processes on
the system.
Jens Axboe has now taken a stab at the I/O priority issue with a new version of his "completely fair queueing"
(CFQ) I/O scheduler. We first mentioned the
CFQ scheduler back in February; it works by creating a separate request
queue for every process issuing disk I/O and taking an equal number of
requests from each one in turn. In this way, it seeks to distribute the
available I/O bandwidth equally across processes in the system and produce
"completely fair" results.
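CFQ's basic fairness rule can be modeled in a few lines of Python. This is a toy sketch, not the kernel code: requests are plain strings, `fair_dispatch()` and its `per_turn` parameter are hypothetical names, and the sorting and merging that a real I/O scheduler performs are ignored.

```python
from collections import deque


def fair_dispatch(queues, per_turn=1):
    """Serve per-process queues round-robin, taking up to `per_turn`
    requests from each non-empty queue per pass (hypothetical helper)."""
    # Work on copies so the caller's lists are left untouched.
    pending = [deque(reqs) for reqs in queues.values()]
    order = []
    while any(pending):          # a deque is truthy while it still has requests
        for q in pending:
            for _ in range(per_turn):
                if not q:
                    break
                order.append(q.popleft())
    return order
```

With `per_turn=1`, a process with a long backlog cannot starve one that has issued a single request: `fair_dispatch({"A": ["a1", "a2", "a3"], "B": ["b1"]})` returns `["a1", "b1", "a2", "a3"]`.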
The new version gives each process an I/O priority, which is a number
between zero and 20 (inclusive). At the bottom end (priority zero), disk
I/O is only allowed when the disk would otherwise be idle. At the other
end, a priority of 20 is the "real-time" level; all requests at that level
are satisfied before any other requests are considered. The levels in
between are for normal processes; by default, the I/O priority is set to
10. A pair of system calls has been added to adjust the I/O priority of a
process, though the form of those calls is likely to change in the future.
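The three classes of priority can be expressed as a small classification function. The constant names and `prio_class()` are hypothetical; the sketch does not model the new system calls, whose interface is expected to change.

```python
IDLE_PRIO = 0      # hypothetical name: I/O only when the disk would be idle
RT_PRIO = 20       # hypothetical name: "real-time", served before all others
DEFAULT_PRIO = 10  # default priority for a normal process


def prio_class(prio):
    """Map a 0..20 I/O priority to the class it belongs to."""
    if not IDLE_PRIO <= prio <= RT_PRIO:
        raise ValueError(f"I/O priority must be in 0..20, got {prio}")
    if prio == IDLE_PRIO:
        return "idle"
    if prio == RT_PRIO:
        return "real-time"
    return "normal"  # levels 1..19
```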
Internally, the per-process request queues are now grouped into an array
of 21 lists, one for each priority level. There is also a dispatch queue,
which holds the requests that have been selected for processing next. A
separate dispatch queue is still needed so that some amount of request
ordering and merging can take place.
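A toy model of this layout might look like the following. `ProcQueue`, `CfqModel`, and `add_request()` are hypothetical names chosen for illustration, not the kernel's structures, and a process changing its priority is not handled.

```python
from collections import deque

NUM_PRIOS = 21  # priority levels 0 (idle) through 20 (real-time)


class ProcQueue:
    """Pending requests from one process (hypothetical model type)."""

    def __init__(self, pid, prio):
        self.pid = pid
        self.prio = prio
        self.requests = deque()


class CfqModel:
    """Toy model of the per-priority lists plus a dispatch queue."""

    def __init__(self):
        # levels[p] lists the per-process queues running at priority p.
        self.levels = [[] for _ in range(NUM_PRIOS)]
        # Requests already selected to go to the drive next; keeping them
        # in their own queue leaves room for sorting and merging.
        self.dispatch = []
        self._by_pid = {}

    def add_request(self, pid, prio, req):
        if not 0 <= prio < NUM_PRIOS:
            raise ValueError("priority must be in 0..20")
        pq = self._by_pid.get(pid)
        if pq is None:
            # First request from this process: create its queue and file
            # it under its priority level (priority changes not modeled).
            pq = ProcQueue(pid, prio)
            self._by_pid[pid] = pq
            self.levels[prio].append(pq)
        pq.requests.append(req)
```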
When the time comes to fill the dispatch queue, the new scheduler starts
with the real-time queue. If requests are waiting there, they go straight
into the dispatch queue and the fill operation is complete. There is also
an anticipatory scheduling feature for real-time requests: when the last
real-time request is processed, the scheduler will wait a short period
(10ms, currently) to see if any more real-time requests show up before
opening the floodgates for everybody else.
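The real-time stage, including the anticipation window, might look like the sketch below. Time is passed in as plain millisecond values rather than read from a clock, and `fill_dispatch_rt()` and `last_rt_done_ms` are hypothetical names; only the 10ms window comes from the description above.

```python
RT_ANTIC_MS = 10  # anticipation window after the last real-time request


def fill_dispatch_rt(rt_queue, dispatch, now_ms, last_rt_done_ms):
    """One pass of the real-time stage (hypothetical helper).

    rt_queue: pending real-time requests; emptied if non-empty.
    last_rt_done_ms: time the most recent real-time request completed,
    or None if there has been none.
    Returns True if the pass is finished and lower levels must wait.
    """
    if rt_queue:
        # Real-time requests go straight to the dispatch queue.
        dispatch.extend(rt_queue)
        rt_queue.clear()
        return True
    if last_rt_done_ms is not None and now_ms - last_rt_done_ms < RT_ANTIC_MS:
        # Anticipation: hold off everybody else briefly in case more
        # real-time requests arrive.
        return True
    return False  # lower priority levels may now be served
```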
In the absence of real-time requests, the code passes through each
priority level in turn, taking a decreasing number of requests from each
one. Each process gets to contribute one request at a time to the dispatch
queue until the quota for its priority level (expressed both as a number
of requests and as a number of sectors to transfer) has been reached.
Requests are taken from the idle-priority queue only if no other requests
have been dispatched for a configurable period of time (100ms by default).
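The actual quota formula is not given here, so the sketch below assumes a hypothetical one (`quota_for()` scales linearly with priority) purely to illustrate the mechanics: levels are visited from 19 down to 1, each process contributes one request per round, a level stops when either its request or its sector quota is reached, and the idle level is served only after 100ms without other dispatches. `take_round_robin()` and `fill_normal()` are likewise hypothetical names.

```python
from collections import deque

IDLE_DELAY_MS = 100  # default wait before idle-priority I/O is allowed


def quota_for(prio):
    """HYPOTHETICAL quota (max requests, max sectors); grows with priority."""
    return 2 * prio, 128 * prio


def take_round_robin(queues, max_reqs, max_sectors, dispatch):
    """Take one request per process per round until a quota is reached."""
    n = sectors = 0
    progressed = True
    while progressed and n < max_reqs and sectors < max_sectors:
        progressed = False
        for q in queues:
            if q and n < max_reqs and sectors < max_sectors:
                name, size = q.popleft()
                dispatch.append(name)
                n += 1
                sectors += size
                progressed = True


def fill_normal(levels, dispatch, now_ms, last_dispatch_ms):
    """levels[p] is a list of per-process deques of (request, sectors)
    for p in 0..19. Returns the updated time of the last dispatch."""
    before = len(dispatch)
    for prio in range(19, 0, -1):  # highest normal level first
        max_reqs, max_sectors = quota_for(prio)
        take_round_robin(levels[prio], max_reqs, max_sectors, dispatch)
    if len(dispatch) > before:
        return now_ms
    # Idle level 0: served only after IDLE_DELAY_MS with no other dispatches.
    if now_ms - last_dispatch_ms >= IDLE_DELAY_MS:
        max_reqs, max_sectors = quota_for(1)  # hypothetical idle quota
        take_round_robin(levels[0], max_reqs, max_sectors, dispatch)
        if len(dispatch) > before:
            return now_ms
    return last_dispatch_ms


# Example: two processes at priority 15, one at 5, one idle-class process.
levels = [[] for _ in range(20)]
levels[15] = [deque([("a1", 8), ("a2", 8)]), deque([("b1", 8)])]
levels[5] = [deque([("c1", 8)])]
levels[0] = [deque([("idle1", 8)])]
```

The sector check is made before each request is taken, so a level may overshoot its sector quota by one request; whether the real scheduler behaves the same way is not stated.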
With the new CFQ scheduler, an I/O request is not guaranteed to be
serviced even after it makes it into the dispatch queue. If a new request
with real-time priority shows up, all lower-priority requests are yanked
back out of the dispatch queue and have to go through the whole selection
process again. Similarly, the arrival of any non-idle request causes
pending idle-priority requests to lose their place in the dispatch queue.
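These preemption rules reduce to a cutoff test on the priority of the newly arrived request. In this hypothetical `preempt()` helper the dispatch queue is a list of `(priority, request)` pairs, and evicted requests are simply returned; putting them back on their per-process queues is left to the caller.

```python
RT_PRIO, IDLE_PRIO = 20, 0


def preempt(dispatch, new_prio):
    """Evict the requests that lose their dispatch slot when a request of
    priority new_prio arrives. dispatch is modified in place; the evicted
    (priority, request) pairs are returned for requeueing."""
    if new_prio == RT_PRIO:
        cutoff = RT_PRIO        # evict every non-real-time request
    elif new_prio > IDLE_PRIO:
        cutoff = IDLE_PRIO + 1  # evict only idle-priority requests
    else:
        cutoff = IDLE_PRIO      # an idle request preempts nothing
    evicted = [item for item in dispatch if item[0] < cutoff]
    dispatch[:] = [item for item in dispatch if item[0] >= cutoff]
    return evicted
```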
The new scheduler appears to be uncontroversial, though it is clearly not
a critical fix and thus won't go into 2.6.0. The real debate appears to be
over how I/O priorities should be controlled. Some commenters would like
to see the nice() system call apply to I/O priorities as well as to
CPU priorities. That, however, would be a fairly fundamental ABI change,
and is unlikely to happen.