The embedded Linux nightmare - an epilogue
Why follow mainline development?
The version cycles of proprietary operating systems are completely different from the Linux kernel version cycles. Proprietary operating systems have release cycles measured in years; the Linux kernel, instead, is released about every three months with major updates to the functionality and feature set and changes to internal APIs. This fundamental difference is one of the hardest problems to handle for the corporate mindset.
One can easily understand that companies try to apply the same mechanisms which they applied to their formerly- (and still-) used operating systems in order not to change procedures of development and quality assurance. Jamming Linux into these existing procedures seems to be somehow possible, but it is one of the main contributors to the embedded Linux nightmare, preventing companies from tapping the full potential of open source software. Embedded distribution vendors are equally guilty as they try to keep up the illusion of the one-to-one replacement of proprietary operating systems by creating heavily patched Linux kernel variants.
It is undisputed that kernel versions need to be frozen for product releases, but it can be observed that those freezes are typically done very early in the development cycle and are kept across multiple versions of the product or product family. These freezes, which are a vain attempt to keep the existing procedures alive, lead to backports of features found in newer kernel versions and create monsters which leave the companies isolated, maintaining their unique fork forever without the help of the community.
I was asked recently whether a backport of the upcoming wireless network stack into Linux 2.6.10 would be possible. Of course it is possible, but it does not make any sense at all. Backporting such a feature requires backporting other changes in the network stack and many other places of the kernel as well, making it even more complex to verify and maintain. Each update and bug fix in the mainline code needs to be tracked and carefully considered for backporting. Bug fixes which are made in the backported code are unlikely to apply to later versions and are therefore useless for others.
During another discussion about backporting a large feature into an old kernel, I asked why a company would want to do that. The answer was: the quality assurance procedures would require a full verification if the kernel were upgraded to a newer version. This is ridiculous. What level of quality does such a process assure when it treats moving to a newer kernel version differently from patching a heavy feature set into an old kernel? The risk of adding subtle breakage into the old kernel with a backport is orders of magnitude higher than the risk of breakage from an up-to-date kernel release. Up-to-date kernels go through the community quality assurance process; unique forks, instead, are excluded from this free-of-charge service.
There is a fundamental difference between adding a feature to a proprietary operating system and backporting a feature from a new Linux kernel to an old one. A new feature of a proprietary operating system is written for exactly the version which is enhanced by the feature. A new feature for the Linux kernel is written for the newest version of the kernel and builds upon the enhancements and features which have been developed between the release of the old kernel and now. New Linux kernel features are simply not designed for backporting.
I can only discourage companies from even thinking about such things. The time spent doing backports and maintaining the resulting unique kernel fork is better spent on adjusting the internal development and quality assurance procedures to the way in which Linux kernel development is done. Otherwise it would be just another great example of a useless waste of resources.
Benefits to companies from working with the kernel process
Many arguments are made for why mainlining code is not practicable in the embedded world. One of the most commonly used is that embedded projects are one-shot developments and therefore mainlining is useless and without value. My experience in the embedded area tells me, instead, that most projects are built on previous projects and a lot of products are part of a product series with different feature sets. Most special-function semiconductors are parts of a product family and development happens on top of existing parts. The IP blocks, which are the base of most ASIC designs, are reused all over the place, so the code to support those building blocks can be reused as well.
The one-shot project argument is a strawman for me. The real reasons are the reluctance to give up control over a piece of code, the already discussed usage of ancient kernel versions, the work which is related to mainlining, and to some degree the fear of the unknown.
The reluctance to give up control over code is an understandable but nevertheless misplaced relic of the proprietary closed source model. Companies have to open up their modifications and extensions to the Linux kernel and other open source software anyway when they ship their product. So handing it over to the community in the first place should be just a small step.
Of course mainlining code is a fair amount of work and it forces changes to the way development works inside companies. There are companies which have been through this change and they confirm that there are benefits in it.
According to Andrew Morton, we change approximately 9000 lines of kernel code per day, every day. Discounting comments, blank lines and simple reshuffling, that means we touch something in the range of 3000 lines of code. The COCOMO estimate of the value of 3000 lines of code is about $100k. So we have a total investment of $36 million per year which flows into the kernel development. That's with all the relevant factors set to 1. Taking David Wheeler's factors into account would cause this figure to go up to $127 million. This estimate does not take other efforts around the kernel into account, like the test farms, the testing and documentation projects and the immense number of (in)voluntary testers and bug reporters who "staff" the QA department of the kernel.
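As a rough cross-check of the arithmetic above, here is a minimal sketch in Python. It uses only the figures quoted in the text. The function names are made up for illustration, and the "implied multiplier" is simply the ratio of the two yearly totals given above, not a value taken from David Wheeler's own analysis.

```python
# Figures quoted in the text (all approximate).
EFFECTIVE_LINES_PER_DAY = 3_000      # after discounting comments, blanks, reshuffling
VALUE_PER_DAY_USD = 100_000          # COCOMO estimate for ~3000 lines, factors set to 1
DAYS_PER_YEAR = 365                  # "per day, every day"
ADJUSTED_TOTAL_USD = 127_000_000     # yearly figure quoted with Wheeler's factors


def annual_investment(value_per_day_usd=VALUE_PER_DAY_USD, days=DAYS_PER_YEAR):
    """Yearly value of the changes if every day is worth value_per_day_usd."""
    return value_per_day_usd * days


def implied_multiplier(adjusted_total_usd, baseline_total_usd):
    """Ratio between the adjusted and the baseline yearly totals (hypothetical helper)."""
    return adjusted_total_usd / baseline_total_usd


if __name__ == "__main__":
    baseline = annual_investment()
    print(f"baseline per year: ${baseline:,}")
    print(f"value per effective line: ${VALUE_PER_DAY_USD / EFFECTIVE_LINES_PER_DAY:.2f}")
    print(f"implied multiplier: {implied_multiplier(ADJUSTED_TOTAL_USD, baseline):.2f}")
```

The baseline comes out at $36.5 million, which the text rounds to $36 million; the quoted $127 million corresponds to scaling that baseline by a factor of roughly 3.5.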
Some companies realize the value of this huge cooperative investment and add their own stake for the long term benefit. We recently had a customer who asked if we could write a driver for a yet-unsupported flash chip. His second question was whether we would try to feed it back into the mainline. He was even willing to pay for the extra hours, simply because he understood that it was helpful for him. This is a small company with fewer than 100 employees and a definitely limited budget. But they cannot afford the waste of maintaining even such small drivers out of tree. I have seen such efforts of smaller companies quite often in recent years and I really hold those folks in great respect.
Bigger players in the embedded market apparently have budgets large enough to ignore the benefits of working with the community and just concentrate on their private forks. This is unwise with respect to their own investments, not to mention the total disrespect for the value which is given to them by the community.
It is understandable that companies want to open the code for new products very late in the product cycle, but there are ways to get this done nevertheless. One is to work through a community proxy, such as consultants or service providers, who know how kernel development works and can help to make the code ready for inclusion from the very beginning.
The value of community-style development lies in avoiding mistakes and in benefiting from the experience of other developers. Posting an early draft of code for comment can be helpful for both code quality and development time. The largest benefit of mainlining code is the automatic updates when the kernel internal interfaces are changed and the enhancements and bug fixes which are provided by users of the code. Mainlining code allows easy kernel upgrades later in a product cycle when new features and technologies have to be added. This is also true for security fixes, which can be hard to backport.
Benefits to developers
I personally know developers who are not interested in working in the open at all for a very dubious reason: as long as they have control over their own private kernel fork, they are the undisputed experts for code on which their company depends. If forced to hand over their code to the community, they fear losing control and making themselves easier to replace. Of course this is a short-sighted view, but it happens. These developers miss the beneficial effect of gaining knowledge and expertise by working together with others.
One of my own employees went through a ten-round review-update-review cycle which ended with satisfaction for both sides:
> Other than that I am very happy with this latest version. Great
> job! Thanks for your patience, I know it's always a bit
> frustrating when your code works well enough for yourself and you
> are still told to make many changes before it is acceptable
> upstream.

Well, I really appreciate good code quality. If this is the price, I'm willing to pay it. Actually, I thank you for helping me so much.
Over the course of this review cycle the code quality of the driver improved; it also led to some general discussion about the affected sensors framework, which was improved on the fly. The developer sharpened his skills and gained better insight into the framework, with the result that his next project will definitely have a much shorter review cycle. This growth makes him far more valuable for the company than having him as the internal expert for some "well it works for us" driver.
The framework maintainer benefited as well, as he needed to look at the requirements of the new device and adjust the framework to handle it in a generic way. This phenomenon is completely consistent with Greg Kroah-Hartman's statement in his OLS keynote last year:
All of the above leads to a single conclusion: working with the kernel development community is worth the costs it imposes in changes to internal processes. Companies which work with the kernel developers get a kernel which better meets their needs, is far more stable and secure, and which will be maintained and improved by the community far into the future. Those companies which choose to stay outside the process, instead, miss many of the benefits of millions of dollars' worth of work being contributed by others. Developers are able to take advantage of working with a group of smart people with a strong dedication to code quality and long-term maintainability.
It can be a winning situation for everybody involved - far better than perpetuating the embedded Linux nightmare.
| Index entries for this article | |
|---|---|
| GuestArticles | Gleixner, Thomas |
