Anticipatory I/O scheduling
[Posted January 28, 2003 by corbet]
If an operating system is to perform well, it must get top performance out
of its disk drives. The best use of disks is generally obtained by
recognizing a couple of basic facts of life:
- Disk seeks are very slow. Overall transfer rates go up significantly
if requests can be ordered to minimize the number and distance of seek
operations.
- Write operations can (usually) happen whenever, but there is almost
always a process waiting for the completion of a read. Prioritizing
reads over writes will increase the parallelism and perceived
responsiveness of a system.
Unfortunately, minimizing seeks and prioritizing reads can be somewhat
contradictory goals. The way to keep seeks small and relatively rare is to
maintain a sizeable backlog of requests; that way, nearby requests can be
grouped and executed together. Getting the best performance on writes, in
other words, requires delaying write operations for a period of time.
If reads are to be handled quickly, however, they cannot be delayed and
kept in the request queue in this way.
It is even worse than that, actually. A process which is writing a file
can have several outstanding write requests at any given time, since that
process usually does not care when any particular request is completed.
Processes issuing reads, on the other hand, usually generate them one at a
time. The next read operation will not be requested until the previous one
completes. So, even if delaying read operations were an acceptible thing
to do, accumulating a backlog of close read operations is an unlikely
proposition.
The 2.4 kernel tends to mix reads in with the queue of write operations.
Given the way things work (a read is not particularly likely to be close to
an unrelated set of writes), reads tend to get put toward the end of the
queue and executed slowly. That is one reason why 2.4 can be slow to
respond when there is a lot of write activity going on.
In the 2.5 kernel, reads are not allowed to languish long before they are
pushed to the head of the queue and executed. This change can improve
performance significantly, but it still does not solve the whole problem.
As Andrew Morton put it in the 2.5.59-mm5
patch set:
So far so good, but these fixes are still dumb. Because we're
solving the dependent read problem by creating a seek storm. Every
time someone submits a read, we stop writing, seek over and service
the read, and then *immediately* seek back and start servicing
writes again.
But in the common case, the application which submitted a read is
about to go and submit another one, closeby on-disk to the first.
So whoops, we have to seek back to service that one as well.
As a result, overall performance suffers, since the disk is spending too
much time seeking.
2.5.59-mm5 contains a new "anticipatory I/O scheduler" by Nick Piggin which
attempts to address this problem. The basic idea is simple: if the drive has
just handled a read request, assume that there is another one coming behind
it and simply wait for a little bit. In this case, the request queue is
plugged, the I/O scheduler sets a timer, and no more requests are passed
down to the drive for a millisecond or so. If a "close" request shows up
during the wait
time, it is serviced right away; the distance that the kernel considers
"close" grows as time passes. Eventually the close requests will stop
coming, or the kernel will decide that it's time to get around to some
writes regardless; at that point normal request dispatching resumes.
Andrew reports a big improvement in performance (nearly a factor of six)
over 2.5.59 for a simple test he was running. The code, it is said, still
needs "quite some work," but it's a good start. 2.6 looks on track to be
the most responsive Linux kernel yet.
(
Log in to post comments)