By Jonathan Corbet
February 28, 2012
Control groups are one of those features that kernel developers love to
hate. It is not that hard to find developers complaining about control
groups and even threatening to rip them out of the kernel some dark night
when nobody is paying attention. But it is much less common to see
explicit discussion around the aspects of control groups that these
developers find offensive or what might be done to improve the situation.
A recent linux-kernel discussion may have made some progress in that
direction, though.
The control group mechanism is really just a way for a system administrator
to partition processes into one or more hierarchies of groups. There can
be multiple hierarchies, and a given process can be placed into more than
one of them at any given time. Associated with control groups is the
concept of "controllers," which apply some sort of policy to a specific
control group hierarchy. The group scheduling controller allows control groups to
contend against each other for CPU time, limiting the extent to which one
group can deprive another of time in the processor. The memory controller
places limits on how much memory and swap can be consumed by any given
group. The network priority controller,
merged for 3.3, allows an administrator to give different groups better or
worse access to network interfaces. And so on.
Tejun Heo started the discussion with a lengthy
message describing his complaints with the control group mechanism and
some thoughts on how things could be fixed. According to Tejun, allowing
the existence of multiple process hierarchies was a mistake. The idea
behind multiple hierarchies is that they allow different policies to be
applied using different criteria. The documentation added with control
groups at the beginning gives an example with two distinct hierarchies that
could be implemented on a university system:
- One hierarchy would categorize processes based on the role of their
owner; there could be a group for students, one for faculty, and one
for system staff. Available CPU time could then be divided between
the different types of users depending on the system policy;
professors would be isolated somewhat from student activity, but,
naturally, the system staff would get the lion's share.
- A second hierarchy would be based on the program being executed by any
given process. Web browsers would go into one group while, say,
bittorrent clients could be put into another. The available network
bandwidth could then be split according to the administrator's view of
each class of application.
On their face, multiple hierarchies provide a useful level of flexibility
for administrators to define all kinds of policies. In practice, though,
they complicate the code and create some interesting issues. As Tejun
points out, controllers can only be assigned to one hierarchy. For
controllers implementing resource allocation policies, this restriction
makes some sense; otherwise, processes would likely be subjected to
conflicting policies when placed in multiple hierarchies. But there are
controllers that exist for other purposes; the "freezer" controller simply
freezes the processes found in a control group, allowing them to be
checkpointed or otherwise operated upon. There is no reason why this kind
of feature could not be available in any hierarchy, but making that
possible would complicate the control group implementation significantly.
The real complaint with multiple hierarchies, though, is that few
developers seem to see the feature as being useful in actual, deployed
systems. It is not clear that it is being used. Tejun suggests that this
feature could be phased out, perhaps with a painfully long deprecation
period. In the end, Tejun said, the control group hierarchy could
disappear as an independent structure, and control groups could just be
overlaid onto the
existing process hierarchy. Some others disagree, though; Peter Zijlstra
said "I rather like being
able to assign tasks to cgroups of my making without having to mirror
that in the process hierarchy." So the ability to have a control
group hierarchy that differs from the process hierarchy may not go away,
even if the multiple-hierarchy feature does eventually become deprecated.
A related problem that Tejun raised is that different controllers treat the
control group hierarchy differently. In particular, a number of
controllers seem to have gone through an evolutionary path where the
initial implementation does not recognize nested control groups but,
instead, simply flattens the hierarchy. Later updates may add full
hierarchical support. The block I/O controller, for example, only finished the job with hierarchical support
last year; others still have not done it. Making the system work properly,
Tejun said, requires getting all of the controllers to treat the hierarchy
in the same way.
In general, the controllers have been the source of a lot of grumbling over
the years. They tend to be implemented in a way that minimizes their
intrusiveness on systems where they are not used - for good reason - but
that leads to poor integration overall. The memory controller, for
example, created its own shadow page tracking system, leading to a resource-intensive mess that was only
cleaned up for the 3.3 release. The hugetlb controller is not integrated
with the memory controller, and, as of 3.3, we have two independent network
controllers. As the number of small controllers continues to grow (there
is, for example, a proposed timer slack
controller out there), things can only get more chaotic.
Fixing the controllers requires, probably more than anything else, a person
to take on the role as the overall control group maintainer. Tejun and Li
Zefan are credited with that role in the MAINTAINERS file, but it is still
true that, for the most part, control groups have nobody watching over
the whole system, so
changes tend to be made by way of a number of different subsystems. It is
an administrative problem in the end; it should be amenable to solution.
Fixing control groups overall could be harder, especially if the
elimination of the multiple-hierarchy feature is to be contemplated. That,
of course, is a user-space ABI change; making it happen would take years,
if it is possible at all. Tejun suggests "herding people to use a
unified hierarchy," along with a determined effort to make all of
the controllers handle nesting properly. At some point, the kernel could
start to emit a warning when multiple hierarchies are used. Eventually, if
nobody complains, the feature could go away.
Of course, if nobody is actually using multiple hierarchies, things could
happen a lot more quickly. Nobody entered the discussion to say that they
needed multiple hierarchies, but, then, it was a discussion among kernel
developers and not end users. If there are people out there using the
multiple hierarchy feature, it might not be a bad idea to make their use
case known. Any such use cases could shape the future evolution of the
control group mechanism; a perceived lack of such use cases could have a
very different effect.
(
Log in to post comments)