One of the main selling points touted by many Linux-oriented vendors is
stability. Once a customer buys a subscription for an "enterprise" Linux
or embedded systems product, the vendor will fix bugs in the software but
otherwise keep it stable. The value for customers is that they can put
these supported distributions into important parts of their operations (or
products) secure in the knowledge that their supplier will provide updates
which keep the system bug-free and secure without breaking things. This
business model predates Linux by many years, but, as the success of certain
companies shows, there is still demand for this sort of service.
So it is interesting that, at the recently concluded Linux Foundation
Collaboration Summit, numerous people were heard expressing concerns about
this model. Grumbles were voiced in the official panels and over beer in
the evening; they came from representatives of the relevant vendors, their
customers, and from not-so-innocent bystanders. The "freeze and support"
model has its merits, but there appears to be a growing group of people who
are wondering if it is the best way to support a fast-moving system like
Linux.
The problem is that there is a great deal of stress between the "completely
stable" ideal and the desire for new features and hardware support. That
leads to the distribution of some interesting kernels. Consider, for
example, Red Hat Enterprise Linux 4, which was released
in February, 2005, with a stabilized 2.6.9 kernel. RHEL4 systems are still
running a 2.6.9 kernel, but it has seen a few changes:
- Update 1 added a disk-based crash dump facility (requiring driver-level
  support), a completely new Megaraid driver, a number of block I/O
  subsystem and driver changes to support filesystems larger than 2TB,
  and new versions of a dozen or so device drivers.
- Update 2 threw in SystemTap, an updated ext3 filesystem, the in-kernel
  key management subsystem, a new OpenIPMI module, a new audit
  subsystem, and about a dozen updated device drivers.
- For update 3, Red Hat added the InfiniBand subsystem, access control list
  support, the error detection and correction (EDAC) subsystem, and
  plenty of updated drivers.
- Update 4 added WiFi protected access (WPA) capability, ACL support in
  NFS, support for a number of processor models and low-level chipsets,
  and a large number of new and updated drivers.
The end result is that, while running uname -r on a RHEL4 system will
yield "2.6.9", what Red Hat is shipping is a far cry from the original
2.6.9 kernel, and, more to the point, it is far removed from the kernel
shipped with RHEL4 when it first became available. This enterprise kernel
is not quite as stable as one might have thought.
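As a rough illustration of why the version number alone says so little, the sketch below splits a kernel release string into its upstream base version and whatever the distributor appended. The function name split_release and the sample string "2.6.9-55.EL" are assumptions made for this example, not taken from any particular RHEL4 update; the only real API used is Python's platform.release(), which returns the same string as uname -r.

```python
import platform
import re


def split_release(release):
    """Split a kernel release string into (base_version, vendor_suffix).

    The base version is the leading dotted number (e.g. "2.6.9"); anything
    after it is treated as the distributor's own release tag.
    """
    match = re.match(r"^(\d+(?:\.\d+)*)(.*)$", release)
    if match is None:
        # Not a recognizable version string; return it unchanged.
        return release, ""
    return match.group(1), match.group(2).lstrip("-")


# Hypothetical sample string in the style of an enterprise kernel.
base, suffix = split_release("2.6.9-55.EL")
print(base, suffix)  # prints "2.6.9 55.EL"

# The same split applied to the kernel this script is running on.
print(split_release(platform.release()))
```

The base version stays fixed for the life of the product, so any record of what has been backported lives only in the suffix and in the distributor's changelog.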
Greg Kroah-Hartman recently posted an
article on this topic which makes it clear that Red Hat is not alone in
backporting features into its stable kernels:
    An example of how this works can be seen in the latest Novell
    SLES10 Service Pack 1 release. Originally the SLES10 kernel was
    based on the 2.6.16 kernel release with a number of bugfixes added
    to it. At the time of the Service Pack 1 release, it was still
    based on the 2.6.16 kernel version, but the SCSI core, libata core,
    and all SATA drivers were backported from the 2.6.20 kernel.org
    kernel release to be included in this 2.6.16 based kernel
    package. This changed a number of ABI issues for any external SCSI
    or storage driver that they would need to be aware of when
    producing an updated version of their driver for the Service Pack 1
    release.
Similar things have been known to happen in
the embedded world. In every case, the distributors are responding to two
conflicting wishes expressed by their customers: those customers want
stability, but they also want useful new features and support for new
hardware. This conflict forces distributors to walk a fine line, carefully
backporting just enough new stuff to keep their customers happy without
breaking things.
The word from the summit is that this balancing act does not always work.
There were stories of production systems falling over after updates were
applied - to the point that some high-end users are starting to reconsider
their use of Linux in some situations. It is hard to see how this problem
can be fixed: the backporting of code is an inherently risky operation. No
matter how well the backported code has been tested, it has not been
tested in the older environment into which it has been transplanted. This
code may depend on other, seemingly unrelated fixes which were merged at
other times; all of those fixes must be picked up to do the backport
properly. The backported code is also not the same code which is found in
current kernels;
distributor-private changes will have to be made to get the backported code
to work with the older kernel. Backporting code can only serve to
destabilize it, often in obscure ways which do not come to light until some
important customer attempts to put it into production.
All of this argues against the backporting of code into the stabilized
kernels used in long-term-support distributions. But customer demand for
features and (especially) hardware support will not go away. In fact, it
is likely to get worse. Quoting Greg again:
    For machines that must work with new hardware all the time (laptops
    and some desktops), the 12-18 month cycle before adding new device
    support makes them pretty much impossible to use at
    times. (i.e. people want you to support the latest toy they just
    bought from the store.) This makes things like "enterprise" kernels
    that are directed toward desktops quite uncomfortable to use after
    even a single year has passed.
So, if one goes on the assumption that the Plan For World Domination
includes moving Linux out of the server room onto a wider variety of
systems, the pressure for additional hardware support in "stabilized"
kernels can only grow.
What is to be done? Greg offers three approaches, the first two of which
are business as usual and the elimination of backports. The disadvantages
of the first option should have been made clear by now; going to a "bug
fixes only" mode has its appeal, but the resulting kernels will look
old and obsolete in a very short time. Greg's third option is one which
your editor heard advocated by several people at the Collaboration Summit:
the long-term-support distributions would simply move to a current kernel
every time they do a major update.
Such a change would have obvious advantages: all of the new features and
new drivers would come automatically, with no need for backporting.
Distributors could focus more on stabilizing the mainline, knowing that
those fixes would get to their customers quickly. Many more bug fixes
would get into kernel updates in general; no distributor can possibly hope
to backport even a significant percentage of the fixes which get into the
mainline. The attempt to graft onto Linux an old support model better suited
to proprietary systems would end, and long-term support Linux customers would
get something that looks more like Linux.
Of course, there may be some disadvantages as well. Dave Jones has expressed some
discomfort with this idea:
    The big problem with this scenario is that it ignores the fact that
    kernel.org kernels are on the whole significantly less stable these
    days than they used to be. With the unified development/stable
    model, we introduce a lot of half-baked untested code into the
    trees, and this typically doesn't get stabilised until after a
    distro rebases to that kernel for their next release, and uncovers
    all the nasty problems with it whilst it's in beta. As well as
    pulling 'all bugfixes and security updates', a rebase pulls in all
    sorts of unknown new problems.
As Dave also notes, some mainline kernel releases are better than others;
the current 2.6.21 kernel would probably not be welcomed in many stable
environments. So any plan which involved upgrading to current kernels
would have to give some thought to the problem of ensuring that those
kernels are suitably stable.
Some of the key ideas to achieve that goal may already be in place. There
was talk at the summit of getting the long-term support vendors to
coordinate their release schedules to be able to take advantage of an
occasional extra-stable kernel release cycle. It has often been suggested
that the kernel could go to an even/odd cycle model, where even-numbered
releases are done with stability as the primary goal. Such a cycle could
work well for distributors; an odd release could be used in beta
distribution releases, with the idea of fixing the resulting bugs for the
following even release. The final distribution release (or update) would
then use the resulting stable kernel. There is opposition to the even/odd
idea, but that could change if the benefits become clear enough.
Both Greg and Dave consider the effects such a change would have on the
providers of binary-only modules. Greg thinks that staying closer to the
upstream would make life easier by reducing the number of kernel variants
that these vendors have to support. Dave, instead, thinks that binary-only
modules would break more often; in his words, "This kind of breakage in an
update isn't acceptable for the people paying for those expensive support
contracts." If the latter position proves true, it can be seen as
an illustration of the costs imposed on the process by proprietary modules.
Dave concludes with the thought that the status quo will not change anytime
soon. Certainly distribution vendors would have to spend a lot of time
thinking and talking with their customers before making such a fundamental
change in how their products are maintained. But the pressures for change
would appear to be strong, and customers may well conclude that they would
be better off staying closer to the mainline. Linux and free software have
forced many fundamental changes in how the industry operates; we may yet
have a better solution to the long-term support problem as well.