
The continuing development of I/O scheduling

The 2.5 development series has seen a great deal of work aimed at improving the performance of the block I/O subsystem. Recently there has been a resurgence of interest in I/O scheduling - deciding which disk I/O requests to process in which order. Optimal scheduling can keep the disks running at full speed and users happy, but the optimal solution can be hard to find. That doesn't stop the kernel hackers from trying, however. The anticipatory I/O scheduler work was covered here a couple of weeks ago; now a new approach is being tried which may improve I/O performance even more.

The technique being looked at is "stochastic fair queueing," and it is intended to bring greater fairness to I/O scheduling decisions. In a fair situation, all users of a particular drive would be able to execute about the same number of I/O requests over a given time. This approach to fairness gets rid of starvation problems, and ensures that all processes can get some work done. The hope would be, for example, that a streaming media application would be able to move its data without outages, even in the presence of other, disk-intensive applications.

The stochastic fair queueing approach was first developed in the networking world by Paul E. McKenney; his paper on the subject describes the technique in detail. In the networking context, stochastic fair queueing tries to divide the available bandwidth equally among all users. Ideally, a separate queue would be used for each ongoing connection, but high-performance routers lack the resources to do things that way. So a smaller number of queues is used, with each connection being assigned to a queue via a hash function. Packets are then taken from each queue in turn, dividing the bandwidth between them. If two high-bandwidth connections happen to land on the same queue, they will be penalized relative to the other queues; to address this problem, the hash function is periodically changed to redistribute connections among the queues. The algorithm works reasonably well and is easy to make fast; the Linux networking code has had a stochastic queueing module available for some time.
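The algorithm is simple enough to sketch in a few lines. Here is a toy model in Python (purely illustrative — the class and method names are invented here, and the real implementations are kernel C): flows are hashed onto a small, fixed set of queues, the queues are serviced round-robin, and changing the hash "perturbation" re-spreads flows that happened to collide.

```python
from collections import deque

class StochasticFairQueue:
    """Toy model of stochastic fair queueing: flows are hashed onto a
    fixed number of queues, which are then serviced round-robin."""

    def __init__(self, nqueues=8, perturbation=0):
        self.queues = [deque() for _ in range(nqueues)]
        # The perturbation is folded into the hash; changing it
        # periodically redistributes flows among the queues.
        self.perturbation = perturbation

    def _bucket(self, flow_id):
        # Python's built-in hash() stands in for the router's hash function.
        return hash((flow_id, self.perturbation)) % len(self.queues)

    def enqueue(self, flow_id, packet):
        self.queues[self._bucket(flow_id)].append(packet)

    def rehash(self, new_perturbation):
        # Two heavy flows that collided on one queue will, with high
        # probability, land on different queues after a rehash.
        self.perturbation = new_perturbation

    def dispatch(self):
        # One round-robin pass: take at most one packet per queue, so
        # each queue (and thus each flow bucket) gets an equal share.
        return [q.popleft() for q in self.queues if q]
```

Each `dispatch()` pass hands out at most one packet per queue, so a flow that has the bad luck to share a queue with another gets half a share until the next rehash.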

In the disk I/O context, the aim is to divide the available disk bandwidth fairly between processes. The initial implementation by Jens Axboe creates 64 subqueues for each block I/O request queue, and distributes requests among the subqueues based on the process ID of the requestor. (Actually, it uses the process ID of the currently running process, which, in some situations, may not be the originator of the request.) When the time comes to dispatch requests, one is taken from each subqueue, and the whole set is ordered before being sent to the drive for execution.
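Jens's patch is kernel C, but the scheme described above can be modeled in a few lines of Python (names invented for illustration): requests are spread over 64 subqueues by PID, and each dispatch round takes at most one request from every subqueue and sector-sorts the batch before it goes to the drive.

```python
from collections import deque

NR_SUBQUEUES = 64  # matches the 64 subqueues of the initial implementation

class SFQIOScheduler:
    """Toy model of the stochastic fair queueing I/O scheduler."""

    def __init__(self):
        self.subqueues = [deque() for _ in range(NR_SUBQUEUES)]

    def add_request(self, pid, sector):
        # The real code hashes the current process's PID; a simple
        # modulo stands in for the hash function here.
        self.subqueues[pid % NR_SUBQUEUES].append(sector)

    def dispatch_round(self):
        # Take at most one request from each subqueue...
        batch = [q.popleft() for q in self.subqueues if q]
        # ...then order the whole set by sector to minimize seeks
        # before it is sent to the drive.
        batch.sort()
        return batch
```

A process that floods its subqueue with requests still contributes only one request per round, which is where the fairness comes from; the final sort keeps the drive from seeking wildly between requests from different processes.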

Taking things even further, Jens has also posted a complete fair queueing scheduler, which does away with the hash function used in the stochastic approach. Each process has its own queue, and requests are taken equally from all queues. It is hard to get fairer than that. Of course, as Jens points out, once you have this infrastructure in place, it is relatively easy to make things less fair again by adding, say, I/O priorities to processes.
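In the same toy style, the complete fair queueing variant simply keeps one queue per process, and a per-process weight shows how I/O priorities could make things "less fair again." The weight parameter is my illustration, not part of Jens's posted patch.

```python
from collections import deque

class CFQScheduler:
    """Toy complete fair queueing: one queue per process, serviced
    round-robin, with a hypothetical priority weight layered on top."""

    def __init__(self):
        self.queues = {}   # pid -> deque of pending sectors
        self.weights = {}  # pid -> requests allowed per round

    def add_request(self, pid, sector, weight=1):
        self.queues.setdefault(pid, deque()).append(sector)
        self.weights[pid] = weight

    def dispatch_round(self):
        batch = []
        for pid, q in self.queues.items():
            # With all weights equal to 1 this is exact per-process
            # fairness; a weight of n lets a favored process dispatch
            # up to n requests per round.
            for _ in range(min(self.weights[pid], len(q))):
                batch.append(q.popleft())
        return batch
```

With default weights, every process with pending I/O gets exactly one request serviced per round; raising one process's weight is the kind of knob an I/O priority interface could expose.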

Where this all appears to be heading (though probably not in the 2.5 series) is toward a configurable I/O scheduler with several possible algorithms which can be mixed and matched according to a site's local policy. In other words, it looks a lot like the traffic control code which has existed in the networking subsystem for a few years. As with networking, most sites will probably not need to tweak their disk scheduling regimes. Users with special needs, however, will be glad for the ability to fine-tune things to their specifications.



The continuing development of I/O scheduling

Posted Feb 13, 2003 12:08 UTC (Thu) by zmi (guest, #4829) [Link]

I hope that this development goes even further until the release of
2.6.0:

- Use SFQ for disk accesses
- Use SFQ for net accesses

But it should be possible to set "per process net/disk priorities". Root
should be able to run a process with "nice 5", "net nice 5" and "disk
nice -5". It should also be possible to define default values for users
and/or processes.

That way, you could run Apache with a higher priority than your backup,
and the backup higher than user processes, and so on. You could also
easily watch a high-bandwidth video without frame drops.

That would be really nice. I've often been annoyed by a big copy
operation that almost blocked the machine, leaving all other processes
waiting for I/O. Giving that copy a low priority would solve all that at
once.

Great work, if only it comes true...

zmi

The continuing development of I/O scheduling

Posted Feb 13, 2003 17:19 UTC (Thu) by rwmj (subscriber, #5474) [Link] (1 responses)

But didn't His Billness tell us that open source developers cannot innovate? Which feature of NT is Jens slavishly copying?

The continuing development of I/O scheduling

Posted Feb 13, 2003 18:17 UTC (Thu) by smoogen (subscriber, #97) [Link]

[Can't resist troll... so pretty.. so beautiful.]

Jens is of course copying things from a release in the future that he got because his older self used NT-Time-Travel 3.0 to try and be the first.

The continuing development of I/O scheduling

Posted Feb 14, 2003 3:02 UTC (Fri) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (1 responses)

Much as I hate to admit it ;-), straight fair queuing is normally
preferable to SFQ if you can enumerate the competing entities
(such as users, processes, threads, or whatever).

SFQ was motivated by the situation in a network router, where there
is/was no nice way to count the number of TCP sessions passing through
the router.

That said, SFQ does have the nice property that you can quickly
change the hash function, thus changing the focus of the fairness
(e.g., from users to processes). OTOH, I bet that the people working
on this will have no problem implementing disk I/O scheduling modules
that can be stacked and unstacked in real time on a running system.

The continuing development of I/O scheduling

Posted Feb 14, 2003 6:17 UTC (Fri) by cpeterso (guest, #305) [Link]

On the LKML list, they are already discussing how to define "fair". That is, fair between users, processes, or ?? The current implementation of CFQ provides a separate I/O queue for every process, but for a multi-user system you might want separate I/O queues for every user.

The continuing development of I/O scheduling

Posted Feb 15, 2003 19:29 UTC (Sat) by cotte (subscriber, #7812) [Link]

Wow! I find "fair" I/O scheduling a very good idea. On our architecture
(s390[x]) we commonly use _some_ disks in parallel and frequently have
high I/O load. We've seen a huge improvement in I/O scalability through
the development progress 2.2->2.4->2.5, and current 2.5 kernels seem to
get close to the optimum. And on PCs, one can now easily work with a
machine that is running "updatedb" without too much performance penalty.
Congratulations, outstanding work!

Carsten Otte


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds