Anticipatory I/O scheduling
- Disk seeks are very slow. Overall transfer rates go up significantly
if requests can be ordered to minimize the number and distance of seek
operations.
- Write operations can (usually) happen whenever, but there is almost always a process waiting for the completion of a read. Prioritizing reads over writes will increase the parallelism and perceived responsiveness of a system.
Unfortunately, minimizing seeks and prioritizing reads can be somewhat contradictory goals. The way to keep seeks small and relatively rare is to maintain a sizeable backlog of requests; that way, nearby requests can be grouped and executed together. Getting the best performance on writes, in other words, requires delaying write operations for a period of time. If reads are to be handled quickly, however, they cannot be delayed and kept in the request queue in this way.
It is even worse than that, actually. A process which is writing a file can have several outstanding write requests at any given time, since that process usually does not care when any particular request is completed. Processes issuing reads, on the other hand, usually generate them one at a time. The next read operation will not be requested until the previous one completes. So, even if delaying read operations were an acceptible thing to do, accumulating a backlog of close read operations is an unlikely proposition.
The 2.4 kernel tends to mix reads in with the queue of write operations. Given the way things work (a read is not particularly likely to be close to an unrelated set of writes), reads tend to get put toward the end of the queue and executed slowly. That is one reason why 2.4 can be slow to respond when there is a lot of write activity going on.
In the 2.5 kernel, reads are not allowed to languish long before they are pushed to the head of the queue and executed. This change can improve performance significantly, but it still does not solve the whole problem. As Andrew Morton put it in the 2.5.59-mm5 patch set:
But in the common case, the application which submitted a read is about to go and submit another one, closeby on-disk to the first. So whoops, we have to seek back to service that one as well.
As a result, overall performance suffers, since the disk is spending too much time seeking.
2.5.59-mm5 contains a new "anticipatory I/O scheduler" by Nick Piggin which attempts to address this problem. The basic idea is simple: if the drive has just handled a read request, assume that there is another one coming behind it and simply wait for a little bit. In this case, the request queue is plugged, the I/O scheduler sets a timer, and no more requests are passed down to the drive for a millisecond or so. If a "close" request shows up during the wait time, it is serviced right away; the distance that the kernel considers "close" grows as time passes. Eventually the close requests will stop coming, or the kernel will decide that it's time to get around to some writes regardless; at that point normal request dispatching resumes.
Andrew reports a big improvement in performance (nearly a factor of six)
over 2.5.59 for a simple test he was running. The code, it is said, still
needs "quite some work," but it's a good start. 2.6 looks on track to be
the most responsive Linux kernel yet.
