By Jonathan Corbet
September 19, 2012
One of the (many) classic essays in
The Mythical Man-Month by
Frederick Brooks is titled "Plan to throw one away." Our first solution to
a complex software development problem, he says, is not going to be fit for
the intended
purpose. So we will end up dumping it and starting over. The free
software development process is, as a whole, pretty good at the "throwing
it away" task; some would argue that we're
too good at it. But
there are times when throwing one away is hard; the current discussion
around control groups in the kernel shows how that situation can come about.
What Brooks actually said (in the original edition) was:
In most projects, the first system built is barely usable. It may
be too slow, too big, awkward to use, or all three. There is no
alternative but to start again, smarting but smarter, and build a
redesigned version in which these problems are solved. The discard
and redesign may be done in one lump, or it may be done
piece-by-piece. But all large-system experience shows that it will
be done.
One could argue that free software development has taken this advice to
heart. In most projects of any significant size, proposed changes are
subjected
to multiple rounds of review, testing, and improvement. Often, a
significant patch set will go through enough fundamental changes that
it bears little resemblance to its initial version. In cases like this,
the new subsystem has, in a sense, been thrown away and redesigned.
In some cases it's even more explicit. The 2.2 kernel, initially, lacked
support for an up-and-coming new bus called USB. Quite a bit of work had
gone into the development of a candidate USB subsystem which, most people
assumed, would be merged sometime soon. Instead, in May 1999, Linus
looked at the code and decided to start over; the 2.2.7 kernel included a
shiny new USB subsystem that nobody had ever seen before. That code
incorporated lessons learned from the earlier attempts and was a better
solution — but even that version was eventually thrown away and replaced.
Brooks talks about the need for "pilot plant" implementations to turn up
the problems in the initial implementation. Arguably we have those in the
form of testing releases, development trees, and, perhaps most usefully,
early patches shipped by distributors. As our ability to test for
performance regressions grows, we should be able to do much of our
throwing-away before problems in early implementations are inflicted upon
users. For example, the 3.6 kernel was able to avoid a 20% regression in
PostgreSQL performance thanks to pre-release testing.
But there are times when the problem is so large and so poorly understood
that the only way to gain successful "pilot plant" experience is to ship
the best implementation we can come up with and hope that things can be
fixed up later. As long as the problems are internal, this fixing can
often be done without creating trouble for users. Indeed, the history of
most software projects (free and otherwise) can be seen as an exercise in
shipping inferior code, then reimplementing things to be slightly less
inferior and starting over again. The Linux systems we run today look, in
many ways, like those of ten years or so ago, but a great deal of the code
underneath has been replaced in the years since.
But what happens when the API design is part of the problem? User
interfaces are hard to design and, when they turn out to be wrong, they can
be hard to fix. It turns out that users don't like it when things change
on them; they like it even less if their programs and scripts break in
the process. As a result, developers at all levels of the stack work hard
to avoid making incompatible changes at user-visible levels. It
is usually better to live with one's mistakes than to push the cost of
fixing them onto the user community.
Sometimes, though, those mistakes are an impediment to the creation of a
proper solution. As an example, consider the control groups (cgroups) mechanism
within the kernel. Control groups were first added to the 2.6.24 kernel
(January, 2008) as a piece of
the solution to the "containers" problem; indeed, they were initially
called "process containers." They have since become one of the most deeply
maligned parts of the kernel, to the point that some developers routinely
threaten to rip them out when nobody is looking. But the functionality
provided by control groups is useful and increasingly necessary, so it's
not surprising that developers are trying to identify and fix the problems
that have been exposed in the current ("pilot") control group
implementation.
As can be seen in this cgroup TODO list
posted by Tejun Heo, a lot of those problems are internal in nature. Fixing
them will require a lot of changes to kernel code, but users should not
notice that anything has changed at all. But there are some issues that
cannot be hidden from users. In particular: (1) the cgroup design
allows for multiple hierarchies, with different controllers (modules that
apply policies to groups) working with possibly different views of the
process tree, and (2) the implementation of process hierarchies is
inconsistent from one controller to the next.
Multiple hierarchies seemed like an interesting feature at the outset; why
should the CPU usage controller be forced to work with the same view of the
process tree as, say, the memory usage controller? But the result is a
more complicated implementation that makes it nearly impossible for
controllers to coordinate with each other. The block I/O bandwidth
controller and the memory usage controller really need to share a view of
which control group "owns" each page in the system, but that cannot be done
if those two controllers are working with separate trees of control
groups. The hierarchy implementation issues also make coordination
difficult while greatly complicating the lives of system administrators,
who must figure out what behavior is actually implemented by each
controller. It is a mess that leads to inefficient implementations and
administrative hassles.
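To make the problem concrete, here is a minimal sketch (the mount points
and controller names are illustrative, not taken from Tejun's list) of how
two independent hierarchies come into existence; each mount(2) call naming
a different set of controllers creates a hierarchy with its own grouping
of processes:

    /* Sketch: create two independent cgroup hierarchies with mount(2).
     * Must run as root, and the target directories must already exist.
     * Afterward, the block I/O and memory controllers can each be given
     * a different arrangement of the same processes, leaving them with
     * no shared notion of which group "owns" a given page. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* One hierarchy seen only by the block I/O controller... */
        if (mount("cgroup", "/sys/fs/cgroup/blkio", "cgroup", 0, "blkio"))
            perror("mount blkio");
        /* ...and a completely separate one for the memory controller. */
        if (mount("cgroup", "/sys/fs/cgroup/memory", "cgroup", 0, "memory"))
            perror("mount memory");
        return 0;
    }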
How does one fix a problem like this? The obvious answer is to force the
use of a single control group hierarchy and to fix the controllers to
implement their policies over hierarchies in a consistent manner. But both
of those are significant, user-visible API and behavioral changes. And,
once again, a user whose system has just broken tends to be less than
appreciative of how much better the implementation is.
In the past, operating system vendors have often had to face issues like
this. They have responded by saving up all the big changes for a major
system update; users learned to expect things to break over such updates.
Perhaps the definitive example was the transition from "Solaris 1"
(usually known as SunOS 4 in those days) to Solaris 2, which switched
the entire system from a BSD-derived base to one derived from AT&T Unix. Needless
to say, lots of things broke in the process. In the
Linux world, this kind of transition still happens with enterprise
distributions; RHEL7 will have a great many incompatible changes from
RHEL6. But community distributions tend not to work that way.
More to the point, the components that make up a distribution are typically
not managed that way. Nobody in the kernel community wants to go back to
the pre-2.6 days when major features only got to users after a multi-year
delay. So, if problems like those described above are going to be fixed in
the kernel, the kernel developers will have to figure out a way to do it in
the regular, approximately 80-day development cycle.
In this case, the plan seems to be to prod users with warnings of upcoming
changes while trying to determine if anybody really has difficulties with
them. So, systems where multiple cgroup hierarchies are in use will emit
warnings saying that the feature is deprecated and inviting email
from anybody who objects. Similar warnings will be put into specific
controllers whose behavior is expected to change. Consider the memory
controller; as Tejun put it: "memcg asked itself the existential
question of to be hierarchical or not and then got confused and decided to
become both." The plan is to get distributors to carry a patch warning users of the non-hierarchical
mode and asking them to make their needs known if the change will truly be
a problem for them. In a sense, the distributors are being asked to run a
pilot for the new cgroup API.
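As a rough illustration (this is not Tejun's patch; the check and the
message are invented for this article), the condition such a warning keys
on can be approximated from user space by counting mounted cgroup
hierarchies:

    /* Illustrative only: spot the multiple-hierarchy configuration by
     * counting cgroup mounts in /proc/mounts.  A single hierarchy
     * mounted in several places will be counted more than once, so
     * treat the result as an approximation. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *mounts = fopen("/proc/mounts", "r");
        char dev[64], path[256], type[64], opts[256];
        int hierarchies = 0;

        if (!mounts) {
            perror("/proc/mounts");
            return 1;
        }
        while (fscanf(mounts, "%63s %255s %63s %255s %*d %*d",
                      dev, path, type, opts) == 4)
            if (!strcmp(type, "cgroup"))
                hierarchies++;
        fclose(mounts);

        if (hierarchies > 1)
            fprintf(stderr, "%d cgroup hierarchies mounted; multiple "
                    "hierarchies are on their way out\n", hierarchies);
        return 0;
    }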
It is possible that the community got lucky this time around; the features
that need to be removed or changed are not likely to be heavily used. In
other cases, there is simply no alternative to retaining the older,
mistaken design; the telldir() interface, which imposes heavy
implementation costs on filesystems, is a good example.
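To see why, look at the contract involved: telldir() hands back a single
long value that seekdir() must be able to restore at any later point in
the stream's life, so every filesystem must find a way to encode a stable
directory position into that one integer, even while other processes add
and remove entries. A minimal usage sketch:

    /* The contract that burdens filesystems: the long returned by
     * telldir() must remain a valid restore point for seekdir() for as
     * long as the directory stream stays open. */
    #include <dirent.h>

    int main(void)
    {
        DIR *dir = opendir("/tmp");
        long pos;

        if (!dir)
            return 1;

        readdir(dir);           /* consume one entry */
        pos = telldir(dir);     /* remember this position */

        /* ... any number of readdir() calls may happen here, while
         * other processes add and remove directory entries ... */

        seekdir(dir, pos);      /* must land exactly where we were */
        closedir(dir);
        return 0;
    }

We can never preserve our ability to "throw one away" in all
situations. But, as a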
whole, the free software community has managed to incorporate Brooks's
advice nicely. We throw away huge quantities of code all the time, and we
are better off for it.