There has recently been much attention paid to the group CPU scheduling
feature built into the Linux kernel. Using group scheduling, it is
possible to ensure that some groups of processes get a fair share of the
CPU without being crowded out by a rather larger number of CPU-intensive
processes in a different group. Linux has supported this feature for some
years, but it has languished in relative obscurity; it is only with recent
efforts to make group scheduling "just work" that it has started to come
into wider use. As it happens, the kernel has a very similar feature for
managing access to block I/O devices which is also, arguably, underused.
In this case, though, I/O group scheduling is not as completely implemented
as CPU scheduling, but some ongoing work may change that situation.
The "completely fair queueing" (CFQ) I/O scheduler tries to divide the
available bandwidth on any given device fairly between the processes which
are contending for that device. "Bandwidth" is measured not in the number
of bytes transferred, but the amount of time that each process gets to
submit requests to the queue; in this way, the code tries to penalize
processes which create seek-heavy I/O patterns. (There is also a mode based
solely on the number of I/O operations submitted, but your editor suspects
it sees relatively little use). The CFQ scheduler also supports group
scheduling, but in an incomplete way.
Imagine the group hierarchy shown on the right; here we have three control
groups (plus the default root group), and four processes running within
those groups. If every process were contending fully for the available I/O
bandwidth, and they all had the same I/O priority, one would expect that
bandwidth to be split equally between
P0, Group1, and Group2; thus P0 should
get twice as much I/O bandwidth as either P1 or P3. If more
processes were to be added to the root, they should be able to take I/O
bandwidth at the expense of the processes in the other control groups.
Similarly, the creation of new control groups underneath Group1
should not affect anybody outside of that branch of the hierarchy. In
current kernels, though, that is not how things work.
With the current implementation of CFQ group scheduling, the above
hierarchy is transformed into something that looks like this:
The CFQ group scheduler currently treats all groups - including the root
group - as being equal, at the same level in the hierarchy. Every group is
a top-level group. This level of grouping will be adequate for a number of
situations, but there will be other users who want the full hierarchical
model. That is why control groups were made to be hierarchical in the
first place, after all.
The hierarchical CFQ group scheduling patch
set from Gui Jianfeng aims to make that feature available. These
patches introduce a new cfq_entity structure which is used for
the scheduling of both processes and groups; it is clearly modeled after
the sched_entity structure used in the CPU scheduling code. With
this in place, the I/O scheduler can just give bandwidth to the top-level
cfq_entity which has run up the least "vdisktime" so far;
if that entity happens to be a group, the scheduling code drops down a
level and repeats the process. Sooner or later, the entity which is
scheduled for I/O will be an actual process, and the scheduler can start
dispatching I/O requests.
This patch set is on its fourth revision; the previous iterations have led
to significant changes. It appears that there are a few things to fix up
still, but this work seems to be getting closer to being ready.
One thing is worth bearing in mind: there are two I/O bandwidth controllers
in contemporary Linux kernels: the proportional bandwidth controller (built
into the CFQ scheduler) and the throttling controller built into the block
layer. The group scheduling changes only apply to the proportional
bandwidth controller. Arguably there is less need for full group
scheduling with the throttling controller, which puts absolute caps on the
bandwidth available to specific processes.
Controlling I/O bandwidth has a lot of applications; providing some
isolation between customers on a shared hosting service is an obvious
example. But this feature may yet prove to have value on the desktop as
well; many interactivity problems come down to contention for I/O
bandwidth. Anybody who has tried to start an office suite while
simultaneously copying a video image on the same drive understands how bad
it can be. If the group I/O scheduling feature can be made to "just work"
like the group CPU scheduling, we may have made another step toward a truly
responsive Linux desktop.
to post comments)