As most LWN readers will be aware, the 2.6.21 kernel has been released.
The 2.6.21 process was relatively difficult, mostly as a result of the core
timer changes which went in. These changes were necessary - they are the
path forward to a kernel which works better on all types of hardware - but
they caused some significant delays in the release of the final 2.6.21
kernel. Even at release time, this kernel was known not to be perfect;
there were a dozen or so known regressions which had not been fixed.
The reason we know about these regressions is that Adrian Bunk has been
tracking them for the past few development cycles. Mr. Bunk has let it be known that he will not
be doing this tracking for future kernels. From his point of view, the
fact that the kernel was released with known regressions means that the
time spent tracking them was wasted. Why bother doing that work if it
doesn't result in the tracked problems being fixed?
What Mr. Bunk would like to see is a longer
stabilization period:
There is a conflict between Linus trying to release kernels every 2
months and releasing with few regressions. Trying to avoid
regressions might in the worst case result in an -rc12 and 4 months
between releases. If the focus is on avoiding regressions this has
to be accepted.
Here is where one finds the fundamental point of disagreement. The kernel
used to operate with long release cycles, but the "stable" kernels which
emerged at the end were not particularly well known for being regression
free. Downloading and running an early 2.4.x kernel should prove that
point to anybody who doubts it.
The reasoning behind the current development process (and the timing of the
2.6.21 release in particular), as stated by Linus Torvalds, is:
Regressions _increase_ with longer release cycles. They don't get
fewer.. This simply *does*not*work*. You might want it to work,
but it's against human psychology. People get bored, and start
wasting their time discussing esoteric scheduler issues which
weren't regressions at all.
In other words, holding up a release for a small number of known bugs
prevents a much larger set of fixes, updates, new features, additional
support, and so on from getting to the user base. Meanwhile, the
developers do not stop developing, and the pile of code to be merged in the
next cycle just gets larger, leading to even more problems when the
floodgates open. It would appear that most kernel developers believe that
it is better to leave the final problems for the stable tree and let the
development process move on.
The 2.6.21 experience might encourage a few small changes; in particular,
Linus has suggested that truly disruptive
changes should maybe have an entire development cycle to themselves. As a
whole, however, the process is not seen as being broken and is unlikely to
see any big "fixes."
For an entirely different example, let us examine the process leading to
the Emacs 22 release. Projects managed by the Free
Software Foundation have never been known for rapid or timely releases,
but, even with the right expectations in place, this Emacs cycle has been a
long one: the previous major release (version 21) was announced in
October, 2001. In those days, LWN was talking about the 2.4.11 kernel,
incorporation of patented technology into W3C standards, the upcoming
Mozilla 1.0 release, and the Gartner Group's characterization of Linux
as a convenient way for companies to negotiate lower prices from
proprietary software vendors. Things have moved on a bit since those days,
but Emacs 21 is still the current version.
The new Emacs major release was
recently scheduled for April 23, but it has not yet happened.
There is one significant issue in the way of this release: it seems that
there is a cloud over some of the code which was merged into the Emacs
Python editing mode. Until this code is either cleared or removed,
releasing Emacs would not be a particularly good idea. It also appears
that the wisdom of shipping a game called "Tetris" has been questioned anew
and is being run past the FSF's lawyers.
Before this issue came up, however, the natives in the Emacs development
community were getting a little restless. Richard Stallman may not do a
great deal of software development anymore, but he is still heavily
involved in the Emacs process. Emacs is still his baby. And this baby, it
seems, will not be released until it is free of known bugs. This approach
is distressing for Emacs developers who would like to make a release and
get more than five years' worth of development work out to the user
community.
This message from Emacs hacker Chong Yidong is worth quoting at length:
To be fair, I think RMS' style of maintaining software, with long
release cycles and insistence on fixing all reported bugs, was
probably a good approach back in the 80s, when there was only a
handful of users with access to email to report bugs.
Nowadays, of course, the increase in the number of users with email
and the fact that Emacs CVS is now publicly available means that
there will always be a constant trickle of bug reports giving you
something to fix. Insisting---as RMS does---on fixing all reported
bugs, even those that are not serious and not regressions, now
means that you will probably never make a release.
It has often been said that "perfect" is the enemy of "good." That saying
does seem to hold true when applied to software release cycles; an attempt
to create a truly perfect release results in no release at all. Users do
not get the code, which does not seem like a "perfect" outcome to them.
Mr. Yidong has another observation which mirrors what was said in the
kernel discussion:
There is also a positive feedback loop: RMS' style for maintaining
Emacs drives away valuable contributors who feel their efforts will
never be rewarded with a release (and a release is, after all, the
only reward you get from contributing to Emacs).
It's not only users who get frustrated by long development cycles; the
developers, too, find them tiresome. Projects which adopt shorter,
time-based release cycles rarely seem to regret the change. It appears
that there really are advantages to getting the code out there in a
released form. Your editor is not taking bets on when Emacs might move to
a bounded-time release process, though.
May 1, 2007
This article was contributed by Thomas Gleixner
The usage of proprietary operating systems in companies over the last 25
years has established a set of constraints which are not really applicable
to the way open source development - and Linux kernel development in
particular - works. My keynote talk ("The Embedded Linux Nightmare") at the
Embedded Linux Conference in Santa Clara addressed this mismatch; it
created quite a bit of discussion. I would like to follow up and add some
more details and thoughts about this topic.
Why follow mainline development?
The version cycles of proprietary operating systems are completely
different than the Linux kernel version cycles. Proprietary operating
systems have release cycles measured in years; the Linux kernel, instead,
is released about every three months with major updates to the
functionality and feature set and changes to internal APIs. This
fundamental difference is one of the hardest problems to handle for the
corporate mindset.
One can easily understand that companies try to apply the same mechanisms
which they applied to their formerly- (and still-) used operating systems
in order not to change procedures of development and quality
assurance. Jamming Linux into these existing procedures seems to be somehow
possible, but it is one of the main contributions to the embedded Linux
nightmare, preventing companies from tapping the full potential of open
source software. Embedded distribution vendors are equally guilty as
they try to keep up the illusion of the one-to-one replacement of
proprietary operating systems by creating heavily patched Linux kernel
variants.
It is undisputed that kernel versions need to be frozen for product
releases, but it can be observed that those freezes are typically done very
early in the development cycle and are kept across multiple versions of the
product or product family. These freezes, which are a vain attempt to
keep the existing procedures alive, lead to backports of features found in
newer kernel versions and create monsters which put the companies into
the isolated situation of maintaining their unique fork forever, without
the help of the community.
I was asked recently whether a backport of the new upcoming wireless
network stack into Linux 2.6.10 would be possible. Of course it is
possible, but it does not make any sense at all. Backporting such a feature
requires backporting other changes in the network stack and many other
places of the kernel as well, making it even more complex to verify and
maintain. Each update and bug fix in the mainline code needs to be tracked
and carefully considered for backporting. Bugfixes which are made in the
backported code are unlikely to apply to later versions and are therefore
useless for others.
During another discussion about backporting a large feature into an old
kernel, I asked why a company would want to do that. The answer was: the
quality assurance procedures would require a full verification when the
kernel would be upgraded to a newer version. This is ridiculous. What level
of quality does such a process assure when there is a difference between
moving to a newer kernel version and patching a heavy feature set into an
old kernel? The risk of adding subtle breakage into the old kernel with a
backport is orders of magnitude higher than the risk of breakage from an
up-to-date kernel release. Up-to-date kernels go through the community
quality assurance process; unique forks, instead, are excluded from this
free of charge service.
There is a fundamental difference between adding a feature to a
proprietary operating system and backporting a feature from a new Linux
kernel to an old one. A new feature of a proprietary operating system is
written for exactly the version which is enhanced by the feature. A new
feature for the Linux kernel is written for the newest version of the
kernel and builds upon the enhancements and features which have been
developed between the release of the old kernel and now. New Linux
kernel features are simply not designed for backporting.
I can only discourage companies from even thinking about such things.
The time spent doing backports and the maintenance of the resulting
unique kernel fork is better spent on adjusting the
internal development and quality assurance procedures to the way
in which the Linux kernel development process is done.
Otherwise it would be just another great example of a useless waste
of resources.
Benefits to companies from working with the kernel process
Many arguments are made for why mainlining code is not practicable in
the embedded world. One of the most commonly used arguments is that
embedded projects are one-shot developments and therefore mainlining is
useless and without value. My experience in the embedded area tells me,
instead, that most projects are built on previous projects and a lot of
products are part of a product series with different feature sets. Most
special-function semiconductors are parts of a product family and
development happens on top of existing parts. The IP blocks, which are the
base of most ASIC designs, are reused all over the place, so the code
to support those building blocks can be reused as well.
The one-shot project argument is a strawman for me. The real reasons are
the reluctance to give up control over a piece of code, the already
discussed usage of ancient kernel versions, the work which is related to
mainlining, and to some degree the fear of the unknown.
The reluctance to give up control over code is an understandable but
nevertheless misplaced relic of the proprietary closed source model.
Companies have to open up their modifications and extensions to the Linux
kernel and other open source software anyway when they ship their
product. So handing it over to the community in the first place should be
just a small step.
Of course mainlining of code is a fair amount of work and it forces
changes to the way development works in companies. There are
companies which have been through this change and they confirm that
there are benefits in it.
According to Andrew Morton, we change approximately 9000 lines of kernel
code per day, every day. That means that we touch something in the range of
3000 lines of code, when we take comments, blank lines and simple
reshuffling into account. The COCOMO estimate of the value of 3000 lines
of code is about $100k. So we have a total investment of $36 million per
year which flows into the kernel development. That's with all the relevant
factors set to 1. Taking David Wheeler's factors into account
would cause this figure to go up to $127 million.
This estimate does not take other efforts around the kernel into account,
like the test farms, the testing and documentation projects and the immense
number of (in)voluntary testers and bug reporters who "staff" the QA
department of the kernel.
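The arithmetic behind the yearly figure can be checked with a short sketch. It uses only the numbers quoted above (about $100k of value per day, and the $36 million and $127 million totals); the function names are hypothetical, and the "implied multiplier" is simply the ratio of the two quoted totals, not a reconstruction of how the COCOMO factors were actually applied.

```python
# Figures quoted in the text; the names below are illustrative only.
VALUE_PER_DAY_USD = 100_000   # quoted value of ~3000 touched lines per day
DAYS_PER_YEAR = 365           # "per day, every day"


def annual_value(value_per_day: int, days: int = DAYS_PER_YEAR) -> int:
    """Scale a per-day value estimate up to a full year."""
    return value_per_day * days


def implied_multiplier(adjusted_total: float, baseline_total: float) -> float:
    """Ratio between an adjusted yearly total and the baseline total."""
    return adjusted_total / baseline_total


baseline = annual_value(VALUE_PER_DAY_USD)
print(f"baseline: ${baseline / 1e6:.1f} million per year")
print(f"implied adjustment: {implied_multiplier(127e6, 36e6):.2f}x")
```

The unrounded baseline comes to $36.5 million, which the text rounds to $36 million; the adjusted figure corresponds to an overall factor of roughly three and a half.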
Some companies realize the value of this huge cooperative investment and
add their own stake for the long term benefit. We recently had a
customer who asked if we could write a driver for a yet-unsupported
flash chip. His second question was whether we would try to feed it
back into the mainline. He was even willing to pay for the extra hours,
simply because he understood that it was helpful for him. This is a small
company with fewer than 100 employees and a definitely limited budget. But
they cannot afford the waste of maintaining even such small drivers out
of tree. I have seen such efforts of smaller companies quite often in
recent years and I really hold those folks in great respect.
Bigger players in the embedded market apparently have budgets large enough
to ignore the benefits of working with the community and just concentrate
on their private forks. This is unwise with respect to their own
investments, not to mention the total disrespect for the value which is
given to them by the community.
It is understandable that companies want to open the code for new products
very late in the product cycle, but there are ways to get this done
nevertheless. One is to work through a community proxy, such as
consultants or service providers, who know how kernel development works and
can help to make the code ready for inclusion from the very beginning.
The value of community-style development is in avoiding mistakes and the
benefit of the experience of other developers. Posting an early draft of
code for comment can be helpful for both code quality and development time.
The largest benefit of mainlining code is the automatic updates when the
kernel internal interfaces are changed and the enhancements and bugfixes
which are provided by users of the code. Mainlining code allows easy
kernel upgrades later in a product cycle when new features and
technologies have to be added. This is also true for security fixes, which
can be hard to backport.
Benefits to developers
I personally know developers who are not interested in working in the open
at all for a very dubious reason: as long as they have control over their
own private kernel fork, they are the undisputed experts for code on which
their company depends. If forced to hand over their code to the
community, they fear losing control and making themselves easier to
replace. Of course this is a short-sighted view, but it happens. These
developers miss the beneficial effect of gaining knowledge and expertise by
working together with others.
One of my own employees went through a ten-round review-update-review
cycle which ended with satisfaction for both sides:
> Other than that I am very happy with this latest version. Great
> job! Thanks for your patience, I know it's always a bit
> frustrating when your code works well enough for yourself and you
> are still told to make many changes before it is acceptable
> upstream.
Well, I really appreciate good code quality. If this is the price,
I'm willing to pay it. Actually, I thank you for helping me so
much.
Over the course of this review cycle the code quality of the driver
improved; it also led to some general discussion about the affected
sensors framework and the improvement of it on the fly.
The developer improved his skills and he got an improved insight into
the framework with the result that his next project will definitely
have a much shorter review cycle. This growth makes him far more
valuable for the company than having him as the internal expert for
some "well it works for us" driver.
The framework maintainer benefited as well, as he needed to look at the
requirements of the new device and adjust the framework to handle it in a
generic way. This phenomenon is completely consistent with Greg
Kroah-Hartman's statement in his OLS keynote last year:
We want more drivers, no matter how "obscure", because it
allows us to see patterns in the code, and realize how we
could do things better.
All of the above leads to a single conclusion: working with the kernel
development community is worth the costs it imposes in changes to internal
processes. Companies which work with the kernel developers get a kernel
which better meets their needs, is far more stable and secure, and which
will be maintained and improved by the community far into the future.
Those companies which choose to stay outside the process, instead, miss
many of the benefits of millions of dollars' worth of work being
contributed by others. Developers are able to take advantage of working
with a group of smart people with a strong dedication to code quality and
long-term maintainability.
It can be a winning situation for everybody involved - far better than
perpetuating the embedded Linux nightmare.
Once upon a time, there was a software firm named AppForge, Inc. This
company sold development tools for mobile platforms, allowing others to
create applications which would run on a number of different devices.
These were all proprietary tools for proprietary systems, and so wouldn't
normally be of interest on LWN. What has happened with AppForge turns out
to be worth a look, however.
It seems that AppForge went bankrupt back in March. So there will be no
support for AppForge's products going into the future. But, as it turns
out, it's worse than that:
Crossfire licensing typically works by validating a serial number
against AppForge's server before installation on any new
device. Since AppForge went dark, end users have been unable to
provision new devices with software that they thought they
owned.
It does not take much searching to find forums full of AppForge customers
looking for ways to activate the product licenses they had already bought
and paid for. In the meantime, their businesses have come to a halt
because a core component of their products has suddenly been pulled out
from underneath them.
Adding the usual sanctimonious LWN sermon on the risks of using proprietary
software seems superfluous here.
More recently, Progeny Linux Systems ceased operations. This company,
which had based its hopes on a specialized, configurable version of the
Debian distribution aimed at appliance vendors, had been quiet for some
time. Founder Ian Murdock headed off to greener pastures (first the Free
Standards Group, then Sun) a while back. Press releases and other
communications had dried up. The last repository update posted to the
mailing lists happened in October, 2006. The DCC Alliance, a Progeny-led effort
to create a standard distribution based on Debian, has had no news to offer
since 2005. Now the company's web site
states that Progeny ceased operations on April 30.
Progeny seems to have lost out in the market to others with more
interesting offerings. Ubuntu declined to join the DCC Alliance for what
looks like a clear business reason: Ubuntu is becoming the standardized,
cleaned-up version of Debian that DCC wanted to be, and with predictable
releases as a bonus. Companies like rPath
appear to be finding more success at signing up customers in the appliance
market. With no wind in its sails, Progeny was unable to bring in the
revenue to keep going.
Progeny's customers, too, will lose the support offered by the company.
There will be no distribution upgrades, no security fixes, and nobody to
answer questions. This loss will clearly be a concern for any affected
customers, but those customers are in a very different position from those
who were dependent on AppForge tools. Since they were using a free
platform, nothing prevents Progeny's customers from continuing to ship
their products. These customers can also readily find companies (or
consultants) who can continue to support the Progeny platform, should they
need that support. The cost may be unwelcome, but the core truth remains:
any Progeny customer which has a need to keep the Progeny platform secure or fix
bugs in it will be able to do so.
The nature of the technology market is such that the failure of product
lines and entire companies is not an uncommon event. When one company
depends on another company's products, the risk of this sort of failure
must be kept in mind. That risk is far lower, however, when companies base
their products on free software.
(Thanks to Scott Preece for bringing the AppForge situation to our
attention.)
Page editor: Jonathan Corbet