
LWN.net Weekly Edition for May 3, 2007

A tale of two release cycles

As most LWN readers will be aware, the 2.6.21 kernel has been released. The 2.6.21 process was relatively difficult, mostly as a result of the core timer changes which went in. These changes were necessary - they are the path forward to a kernel which works better on all types of hardware - but they caused some significant delays in the release of the final 2.6.21 kernel. Even at release time, this kernel was known not to be perfect; there were a dozen or so known regressions which had not been fixed.

The reason we know about these regressions is that Adrian Bunk has been tracking them for the past few development cycles. Mr. Bunk has let it be known that he will not be doing this tracking for future kernels. From his point of view, the fact that the kernel was released with known regressions means that the time spent tracking them was wasted. Why bother doing that work if it doesn't result in the tracked problems being fixed?

What Mr. Bunk would like to see is a longer stabilization period:

There is a conflict between Linus trying to release kernels every 2 months and releasing with few regressions. Trying to avoid regressions might in the worst case result in an -rc12 and 4 months between releases. If the focus is on avoiding regressions this has to be accepted.

Here is where one finds the fundamental point of disagreement. The kernel used to operate with long release cycles, but the "stable" kernels which emerged at the end were not particularly well known for being regression free. Downloading and running an early 2.4.x kernel should prove that point to anybody who doubts it.

The reasoning behind the current development process (and the timing of the 2.6.21 release in particular), as stated by Linus Torvalds, is:

Regressions _increase_ with longer release cycles. They don't get fewer.. This simply *does*not*work*. You might want it to work, but it's against human psychology. People get bored, and start wasting their time discussing esoteric scheduler issues which weren't regressions at all.

In other words, holding up a release for a small number of known bugs prevents a much larger set of fixes, updates, new features, additional support, and so on from getting to the user base. Meanwhile, the developers do not stop developing, and the pile of code to be merged in the next cycle just gets larger, leading to even more problems when the floodgates open. It would appear that most kernel developers believe that it is better to leave the final problems for the stable tree and let the development process move on.

The 2.6.21 experience might encourage a few small changes; in particular, Linus has suggested that truly disruptive changes should maybe have an entire development cycle to themselves. As a whole, however, the process is not seen as being broken and is unlikely to see any big "fixes."

For an entirely different example, let us examine the process leading to the Emacs 22 release. Projects managed by the Free Software Foundation have never been known for rapid or timely releases, but, even with the right expectations in place, this Emacs cycle has been a long one: the previous major release (version 21) was announced in October, 2001. In those days, LWN was talking about the 2.4.11 kernel, incorporation of patented technology into W3C standards, the upcoming Mozilla 1.0 release, and the Gartner Group's characterization of Linux as a convenient way for companies to negotiate lower prices from proprietary software vendors. Things have moved on a bit since those days, but Emacs 21 is still the current version.

The new Emacs major release was recently scheduled for April 23, but it has not yet happened. There is one significant issue in the way of this release: it seems that there is a cloud over some of the code which was merged into the Emacs Python editing mode. Until this code is either cleared or removed, releasing Emacs would not be a particularly good idea. It also appears that the wisdom of shipping a game called "Tetris" has been questioned anew and is being run past the FSF's lawyers.

Before this issue came up, however, the natives in the Emacs development community were getting a little restless. Richard Stallman may not do a great deal of software development anymore, but he is still heavily involved in the Emacs process. Emacs is still his baby. And this baby, it seems, will not be released until it is free of known bugs. This approach is distressing for Emacs developers who would like to make a release and get more than five years' worth of development work out to the user community.

This message from Emacs hacker Chong Yidong is worth quoting at length:

To be fair, I think RMS' style of maintaining software, with long release cycles and insistence on fixing all reported bugs, was probably a good approach back in the 80s, when there was only a handful of users with access to email to report bugs.

Nowadays, of course, the increase in the number of users with email and the fact that Emacs CVS is now publicly available means that there will always be a constant trickle of bug reports giving you something to fix. Insisting---as RMS does---on fixing all reported bugs, even those that are not serious and not regressions, now means that you will probably never make a release.

It has often been said that "perfect" is the enemy of "good." That saying does seem to hold true when applied to software release cycles; an attempt to create a truly perfect release results in no release at all. Users do not get the code, which does not seem like a "perfect" outcome to them.

Mr. Yidong has another observation which mirrors what was said in the kernel discussion:

There is also a positive feedback loop: RMS' style for maintaining Emacs drives away valuable contributors who feel their efforts will never be rewarded with a release (and a release is, after all, the only reward you get from contributing to Emacs).

It's not only users who get frustrated by long development cycles; the developers, too, find them tiresome. Projects which adopt shorter, time-based release cycles rarely seem to regret the change. It appears that there really are advantages to getting the code out there in a released form. Your editor is not taking bets on when Emacs might move to a bounded-time release process, though.


The embedded Linux nightmare - an epilogue

May 1, 2007

This article was contributed by Thomas Gleixner

The usage of proprietary operating systems in companies over the last 25 years has established a set of constraints which are not really applicable to the way open source development - and Linux kernel development in particular - works. My keynote talk ("The Embedded Linux Nightmare") at the Embedded Linux Conference in Santa Clara addressed this mismatch; it created quite a bit of discussion. I would like to follow up and add some more details and thoughts about this topic.

Why follow mainline development?

The version cycles of proprietary operating systems are completely different from the Linux kernel's. Proprietary operating systems have release cycles measured in years; the Linux kernel, instead, is released about every three months, with major updates to functionality and the feature set and with changes to internal APIs. This fundamental difference is one of the hardest things for the corporate mindset to handle.

One can easily understand that companies try to apply the same mechanisms which they applied to their formerly- (and still-) used operating systems in order not to change their development and quality assurance procedures. Jamming Linux into these existing procedures seems to be somehow possible, but it is one of the main contributors to the embedded Linux nightmare, preventing companies from tapping the full potential of open source software. Embedded distribution vendors are equally guilty, as they try to keep up the illusion of a one-to-one replacement for proprietary operating systems by creating heavily patched Linux kernel variants.

It is undisputed that kernel versions need to be frozen for product releases, but those freezes are typically done very early in the development cycle and are kept across multiple versions of the product or product family. These freezes, a vain attempt to keep the existing procedures alive, lead to backports of features found in newer kernel versions and create monsters which put companies into the isolated position of maintaining their unique fork forever, without the help of the community.

I was asked recently whether a backport of the new upcoming wireless network stack into Linux 2.6.10 would be possible. Of course it is possible, but it does not make any sense at all. Backporting such a feature requires backporting other changes in the network stack and in many other places in the kernel as well, making it even more complex to verify and maintain. Each update and bug fix in the mainline code needs to be tracked and carefully considered for backporting. Bug fixes made in the backported code are unlikely to apply to later versions and are therefore useless to others.

During another discussion about backporting a large feature into an old kernel, I asked why a company would want to do that. The answer was that the quality assurance procedures would require a full verification if the kernel were upgraded to a newer version. This is ridiculous. What level of quality does such a process assure when it treats moving to a newer kernel version differently from patching a heavy feature set into an old one? The risk of adding subtle breakage to the old kernel with a backport is orders of magnitude higher than the risk of breakage from an up-to-date kernel release. Up-to-date kernels go through the community quality assurance process; unique forks, instead, are excluded from this free-of-charge service.

There is a fundamental difference between adding a feature to a proprietary operating system and backporting a feature from a new Linux kernel to an old one. A new feature of a proprietary operating system is written for exactly the version which is enhanced by the feature. A new feature for the Linux kernel is written for the newest version of the kernel and builds upon the enhancements and features which have been developed between the release of the old kernel and now. New Linux kernel features are simply not designed for backporting.

I can only discourage companies from even thinking about such things. The time spent doing backports and maintaining the resulting unique kernel fork is better spent on adjusting the internal development and quality assurance procedures to the way in which Linux kernel development is done. Otherwise it is just another great example of a useless waste of resources.

Benefits to companies from working with the kernel process

There are a lot of arguments made as to why mainlining code is not practical in the embedded world. One of the most common is that embedded projects are one-shot developments, so mainlining is useless and without value. My experience in the embedded area tells me, instead, that most projects are built on previous projects, and that a lot of products are part of a product series with different feature sets. Most special-function semiconductors are parts of a product family, and development happens on top of existing parts. The IP blocks which form the basis of most ASIC designs are reused all over the place, so the code to support those building blocks can be reused as well.

The one-shot project argument is a strawman for me. The real reasons are the reluctance to give up control over a piece of code, the already discussed usage of ancient kernel versions, the work which is related to mainlining, and to some degree the fear of the unknown.

The reluctance to give up control over code is an understandable but nevertheless misplaced relic of the proprietary closed source model. Companies have to open up their modifications and extensions to the Linux kernel and other open source software anyway when they ship their product. So handing it over to the community in the first place should be just a small step.

Of course, mainlining code is a fair amount of work, and it forces changes to the way development is done inside companies. There are companies which have been through this change, and they confirm that there are benefits in it.

According to Andrew Morton, we change approximately 9000 lines of kernel code per day, every day. That means we touch something in the range of 3000 lines of real code, once comments, blank lines, and simple reshuffling are taken into account. The COCOMO estimate of the value of 3000 lines of code is about $100,000, so a total investment of roughly $36 million per year flows into kernel development. That is with all the relevant adjustment factors set to 1; applying David Wheeler's factors would push the figure up to $127 million. This estimate does not take other efforts around the kernel into account, like the test farms, the testing and documentation projects, and the immense number of (in)voluntary testers and bug reporters who "staff" the QA department of the kernel.
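For readers who want to check the arithmetic, here is a minimal back-of-the-envelope sketch in Python. The inputs are simply the figures quoted above (3000 lines per day, roughly $100,000 per 3000 lines); the Wheeler multiplier is derived from the $36 million to $127 million jump, not computed independently from the COCOMO model:

	# Back-of-the-envelope reproduction of the figures quoted above; the
	# inputs are the article's numbers, not independently derived values.
	lines_touched_per_day = 3000        # "real" code lines changed daily
	value_per_3000_lines = 100000.0     # rough COCOMO figure quoted above, in dollars

	value_per_line = value_per_3000_lines / 3000
	yearly_investment = value_per_line * lines_touched_per_day * 365
	print("Yearly investment: $%.1f million" % (yearly_investment / 1e6))   # ~ $36.5 million

	# David Wheeler's cost-adjustment factors scale the figure up; the
	# multiplier implied by the $36M -> $127M jump above is about 3.5.
	wheeler_multiplier = 127.0 / 36.0
	print("With Wheeler's factors: $%.1f million"
	      % (yearly_investment * wheeler_multiplier / 1e6))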

Some companies realize the value of this huge cooperative investment and add their own stake for the long-term benefit. We recently had a customer who asked if we could write a driver for a yet-unsupported flash chip. His second question was whether we would try to feed it back into the mainline. He was even willing to pay for the extra hours, simply because he understood that it would be helpful for him. This is a small company with fewer than 100 employees and a definitely limited budget, but it cannot afford the waste of maintaining even such small drivers out of tree. I have seen such efforts from smaller companies quite often in recent years, and I really hold those folks in great respect.

Bigger players in the embedded market apparently have budgets large enough to ignore the benefits of working with the community and just concentrate on their private forks. This is unwise with respect to their own investments, not to mention the total disrespect it shows for the value given to them by the community.

It is understandable that companies want to open the code for new products very late in the product cycle, but there are ways to get this done nevertheless. One is to work through a community proxy, such as consultants or service providers, who know how kernel development works and can help to make the code ready for inclusion from the very beginning.

The value of community-style development lies in avoiding mistakes and in benefiting from the experience of other developers. Posting an early draft of code for comment can help both code quality and development time. The largest benefit of mainlining code is the automatic updates when kernel-internal interfaces are changed, along with the enhancements and bug fixes provided by other users of the code. Mainlined code also allows easy kernel upgrades later in a product cycle, when new features and technologies have to be added; the same is true for security fixes, which can be hard to backport.

Benefits to developers

I personally know developers who are not interested in working in the open at all for a very dubious reason: as long as they have control over their own private kernel fork, they are the undisputed experts for code on which their company depends. If forced to hand over their code to the community, they fear losing control and making themselves easier to replace. Of course this is a short-sighted view, but it happens. These developers miss the beneficial effect of gaining knowledge and expertise by working together with others.

One of my own employees went through a ten-round review-update-review cycle which ended with satisfaction for both sides:

	> Other than that I am very happy with this latest version. Great
	> job!  Thanks for your patience, I know it's always a bit
	> frustrating when your code works well enough for yourself and you
	> are still told to make many changes before it is acceptable
	> upstream.

	Well, I really appreciate good code quality. If this is the price,
	I'm willing to pay it. Actually, I thank you for helping me so
	much.

Over the course of this review cycle the code quality of the driver improved; the process also led to some general discussion about the affected sensors framework and to improvements of it along the way. The developer sharpened his skills and gained a deeper insight into the framework, with the result that his next project will definitely have a much shorter review cycle. This growth makes him far more valuable to the company than keeping him as the internal expert for some "well, it works for us" driver.

The framework maintainer benefited as well, as he needed to look at the requirements of the new device and adjust the framework to handle it in a generic way. This phenomenon is completely consistent with Greg Kroah-Hartman's statement in his OLS keynote last year:

We want more drivers, no matter how "obscure", because it allows us to see patterns in the code, and realize how we could do things better.

All of the above leads to a single conclusion: working with the kernel development community is worth the costs it imposes in changes to internal processes. Companies which work with the kernel developers get a kernel which better meets their needs, is far more stable and secure, and which will be maintained and improved by the community far into the future. Those companies which choose to stay outside the process, instead, miss many of the benefits of millions of dollars' worth of work being contributed by others. Developers are able to take advantage of working with a group of smart people with a strong dedication to code quality and long-term maintainability.

It can be a winning situation for everybody involved - far better than perpetuating the embedded Linux nightmare.


A tale of two dead companies

Once upon a time, there was a software firm named AppForge, Inc. This company sold development tools for mobile platforms, allowing others to create applications which would run on a number of different devices. These were all proprietary tools for proprietary systems, and so wouldn't normally be of interest on LWN. What has happened with AppForge turns out to be worth a look, however.

It seems that AppForge went bankrupt back in March. So there will be no support for AppForge's products going into the future. But, as it turns out, it's worse than that:

Crossfire licensing typically works by validating a serial number against AppForge's server before installation on any new device. Since AppForge went dark, end users have been unable to provision new devices with software that they thought they owned.

It does not take much searching to find forums full of AppForge customers looking for ways to activate the product licenses they had already bought and paid for. In the meantime, their businesses have come to a halt because a core component of their products has suddenly been pulled out from underneath them.

Adding the usual sanctimonious LWN sermon on the risks of using proprietary software seems superfluous here.

More recently, Progeny Linux Systems ceased operations. This company, which had based its hopes on a specialized, configurable version of the Debian distribution aimed at appliance vendors, had been quiet for some time. Founder Ian Murdock headed off to greener pastures (first the Free Standards Group, then Sun) a while back. Press releases and other communications had dried up. The last repository update posted to the mailing lists happened in October, 2006. The DCC Alliance, a Progeny-led effort to create a standard distribution based on Debian, has had no news to offer since 2005. Now the company's web site states that Progeny ceased operations on April 30.

Progeny seems to have lost out in the market to others with more interesting offerings. Ubuntu declined to join the DCC Alliance for what looks like a clear business reason: Ubuntu is becoming the standardized, cleaned-up version of Debian that DCC wanted to be, and with predictable releases as a bonus. Companies like rPath appear to be finding more success at signing up customers in the appliance market. With no wind in its sails, Progeny was unable to bring in the revenue to keep going.

Progeny's customers, too, will lose the support offered by the company. There will be no distribution upgrades, no security fixes, and nobody to answer questions. This loss will clearly be a concern for any affected customers, but those customers are in a very different position from those who were dependent on AppForge tools. Since they were using a free platform, nothing prevents Progeny's customers from continuing to ship their products. These customers can also readily find companies (or consultants) who can continue to support the Progeny platform, should they need that support. The cost may be unwelcome, but the core truth remains: any Progeny customer which has a need to keep the Progeny platform secure or fix bugs in it will be able to do so.

The nature of the technology market is such that the failure of product lines and entire companies is not an uncommon event. When one company depends on another company's products, the risk of this sort of failure must be kept in mind. That risk is far lower, however, when companies base their products on free software.

(Thanks to Scott Preece for bringing the AppForge situation to our attention).


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Security: IPv6 source routing: history repeats itself; New vulnerabilities in gimp, the kernel, tomcat, vim, and many other packages.
  • Kernel: Merged (and to be merged) for 2.6.22; UIO: user-space drivers; Large block size support.
  • Distributions: Looking into the future of Mandriva, Freespire and Linspire; new releases - Fedora 7 Test 4, OpenBSD 4.1 and YDL 5.0.1; Alinex
  • Development: Improved Linux debugging with Chronicle, Python 3000 PEP Parade, Adobe to open source Flex, new versions of SQLite, LCDproc, Apache SpamAssassin, SSL-Explorer, SilverStripe, Free-SA, Ardour, alsaplayer, eSpeak, jack_capture, GNOME, GARNOME, KJWaves, Cryptkeeper, SQL-Ledger, FreeCol, PyQt, Wine, pyliblo, Freevo, Wixi.
  • Press: U.S. Supreme Court patent decisions, wireless Linux robots, Akonadi Hacking Meeting, MySQL user conf, China Open Source Software Summit winners, Dell and Ubuntu, MySQL AB's IPO plans, OLPC for US Schools, GPLv3 Questions Answered, 64 bit audio workstation, functional languages, Alfresco CMS, Wikia open-source web search engine.
  • Announcements: Linux Foundation travel fund, Manifesto for Free Appliances. Coverity to scan 250 open-source projects, KOffice / KDE ODF Infrastructure Meeting, Libre Graphics Meeting, Life 2.0 keynotes announced, the Maker Faire, Cross Desktop Text Layout Summit, Cryptome shutdown by ISP.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds