
The embedded Linux nightmare - an epilogue

May 1, 2007

This article was contributed by Thomas Gleixner

The usage of proprietary operating systems in companies over the last 25 years has established a set of constraints which are not really applicable to the way open source development - and Linux kernel development in particular - works. My keynote talk ("The Embedded Linux Nightmare") at the Embedded Linux Conference in Santa Clara addressed this mismatch; it created quite a bit of discussion. I would like to follow up and add some more details and thoughts about this topic.

Why follow mainline development?

The version cycles of proprietary operating systems are completely different from those of the Linux kernel. Proprietary operating systems have release cycles measured in years; the Linux kernel, instead, is released about every three months with major updates to the functionality and feature set and changes to internal APIs. This fundamental difference is one of the hardest problems to handle for the corporate mindset.

One can easily understand that companies try to apply the same mechanisms which they applied to their formerly- (and still-) used operating systems in order not to change procedures of development and quality assurance. Jamming Linux into these existing procedures seems to be somehow possible, but it is one of the main contributors to the embedded Linux nightmare, preventing companies from tapping the full potential of open source software. Embedded distribution vendors are equally guilty, as they try to keep up the illusion of a one-to-one replacement for proprietary operating systems by creating heavily patched Linux kernel variants.

It is undisputed that kernel versions need to be frozen for product releases, but it can be observed that those freezes are typically done very early in the development cycle and are kept across multiple versions of the product or product family. These freezes, which are the vain attempt to keep the existing procedures alive, lead to backports of features found in newer kernel versions and create monsters which put the companies into the isolated situation of maintaining their unique fork forever, without the help of the community.

I was asked recently whether a backport of the new upcoming wireless network stack into Linux 2.6.10 would be possible. Of course it is possible, but it does not make any sense at all. Backporting such a feature requires backporting other changes in the network stack and in many other parts of the kernel as well, making it even more complex to verify and maintain. Each update and bug fix in the mainline code needs to be tracked and carefully considered for backporting. Bugfixes which are made in the backported code are unlikely to apply to later versions and are therefore useless for others.
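
To get a feeling for what "tracking mainline" means for such a backport, one can count the mainline commits which touch the affected subsystems between the frozen kernel and the current release. The sketch below is only an illustration of that kind of measurement: it assumes a local clone of the mainline kernel git tree, uses v2.6.12 (the oldest release tag in the git history, which starts shortly after 2.6.10) through v2.6.21 as stand-in versions, and the paths are my guesses at what a wireless-stack backport would drag in.

    # Rough sketch: count mainline commits touching the networking and
    # wireless code between an old and a new kernel release.  Run it from
    # inside a clone of the mainline kernel git tree; the tag names and
    # paths are illustrative, not taken from the article.
    import subprocess

    OLD, NEW = "v2.6.12", "v2.6.21"
    PATHS = ["net/", "drivers/net/wireless/", "include/net/"]

    for path in PATHS:
        out = subprocess.check_output(
            ["git", "rev-list", "%s..%s" % (OLD, NEW), "--", path])
        print("%-25s %6d commits to track" % (path, len(out.splitlines())))

Every one of those commits is a change which the maintainer of a private fork has to evaluate, adapt or ignore without the community's help.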

During another discussion about backporting a large feature into an old kernel, I asked why a company would want to do that. The answer was that the quality assurance procedures would require a full verification if the kernel were upgraded to a newer version. This is ridiculous. What level of quality does such a process assure when it treats moving to a newer kernel version differently from patching a heavy feature set into an old one? The risk of adding subtle breakage to the old kernel with a backport is orders of magnitude higher than the risk of breakage from an up-to-date kernel release. Up-to-date kernels go through the community quality assurance process; unique forks, instead, are excluded from this free-of-charge service.

There is a fundamental difference between adding a feature to a proprietary operating system and backporting a feature from a new Linux kernel to an old one. A new feature of a proprietary operating system is written for exactly the version which is enhanced by the feature. A new feature for the Linux kernel is written for the newest version of the kernel and builds upon the enhancements and features which have been developed between the release of the old kernel and now. New Linux kernel features are simply not designed for backporting.

I can only discourage companies from even thinking about such things. The time spent on backports and on maintaining the resulting unique kernel fork is better spent on adjusting the internal development and quality assurance procedures to the way the Linux kernel development process works. Otherwise it is just another great example of a useless waste of resources.

Benefits to companies from working with the kernel process

There are a lot of arguments made as to why mainlining code is not practicable in the embedded world. One of the most commonly used arguments is that embedded projects are one-shot developments and therefore mainlining is useless and without value. My experience in the embedded area tells me, instead, that most projects are built on previous projects and a lot of products are part of a product series with different feature sets. Most special-function semiconductors are parts of a product family and development happens on top of existing parts. The IP blocks, which are the base of most ASIC designs, are reused all over the place, so the code to support those building blocks can be reused as well.

The one-shot project argument is a strawman for me. The real reasons are the reluctance to give up control over a piece of code, the already discussed usage of ancient kernel versions, the work involved in mainlining, and to some degree the fear of the unknown.

The reluctance to give up control over code is an understandable but nevertheless misplaced relic of the proprietary closed source model. Companies have to open up their modifications and extensions to the Linux kernel and other open source software anyway when they ship their product. So handing it over to the community in the first place should be just a small step.

Of course mainlining code is a fair amount of work, and it forces changes to the way development works inside companies. Companies which have been through this change confirm that the benefits are real.

According to Andrew Morton, we change approximately 9000 lines of kernel code per day, every day. Once comments, blank lines and simple reshuffling are taken into account, that amounts to something in the range of 3000 lines of real code. The COCOMO estimate of the value of 3000 lines of code is about $100,000, so a total investment of roughly $36 million per year flows into kernel development. That is with all the relevant cost factors set to 1; taking David Wheeler's factors into account would push this figure up to $127 million. This estimate does not take other efforts around the kernel into account, like the test farms, the testing and documentation projects, and the immense number of (in)voluntary testers and bug reporters who "staff" the QA department of the kernel.
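
As a back-of-envelope check, the arithmetic behind those figures can be sketched as follows. Only the inputs stated above (roughly 3000 effective lines per day, about $100k per day, and the jump from $36 million to $127 million) come from the article; the basic COCOMO "organic" coefficients and the assumed fully-loaded cost per person-month are illustrative choices of mine.

    # Back-of-envelope sketch of the kernel-investment estimate above.
    # The COCOMO coefficients (basic "organic" model) and the cost per
    # person-month are assumptions; only the resulting ballpark figures
    # match the article.

    KLOC_PER_DAY = 3.0              # ~3000 effective new lines of code per day
    DAYS_PER_YEAR = 365
    COST_PER_PERSON_MONTH = 13000   # assumed fully-loaded cost in dollars

    # Basic COCOMO, organic mode: effort (person-months) = 2.4 * KLOC^1.05
    effort_pm = 2.4 * KLOC_PER_DAY ** 1.05

    cost_per_day = effort_pm * COST_PER_PERSON_MONTH    # roughly $100k
    cost_per_year = cost_per_day * DAYS_PER_YEAR        # roughly $36 million

    # David Wheeler's effort multipliers raise the estimate further; the
    # article's $36M -> $127M implies a combined factor of roughly 3.5.
    adjusted = cost_per_year * (127.0 / 36.0)

    print("per day:  $%12.0f" % cost_per_day)
    print("per year: $%12.0f" % cost_per_year)
    print("adjusted: $%12.0f" % adjusted)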

Some companies realize the value of this huge cooperative investment and add their own stake for the long-term benefit. We recently had a customer who asked if we could write a driver for a yet-unsupported flash chip. His second question was whether we would try to feed it back into the mainline. He was even willing to pay for the extra hours, simply because he understood that it was helpful for him. This is a small company with less than 100 employees and a definitely limited budget. But they cannot afford the waste of maintaining even such a small driver out of tree. I have seen such efforts by smaller companies quite often in recent years and I really hold those folks in great respect.

Bigger players in the embedded market apparently have budgets large enough to ignore the benefits of working with the community and just concentrate on their private forks. This is unwise with respect to their own investments, not to mention the total disrespect it shows for the value which the community gives them.

It is understandable that companies want to open the code for new products very late in the product cycle, but there are ways to get this done nevertheless. One is to work through a community proxy, such as consultants or service providers, who know how kernel development works and can help to make the code ready for inclusion from the very beginning.

The value of community-style development lies in avoiding mistakes and in benefiting from the experience of other developers. Posting an early draft of code for comment can be helpful for both code quality and development time. The largest benefit of mainlining code is the automatic updates when the kernel's internal interfaces are changed, along with the enhancements and bugfixes which are provided by users of the code. Mainlining code allows easy kernel upgrades later in a product cycle when new features and technologies have to be added. This is also true for security fixes, which can be hard to backport.

Benefits to developers

I personally know developers who are not interested in working in the open at all for a very dubious reason: as long as they have control over their own private kernel fork, they are the undisputed experts for code on which their company depends. If forced to hand over their code to the community, they fear losing control and making themselves easier to replace. Of course this is a short-sighted view, but it happens. These developers miss the beneficial effect of gaining knowledge and expertise by working together with others.

One of my own employees went through a ten-round review-update-review cycle which ended with satisfaction for both sides:

	> Other than that I am very happy with this latest version. Great
	> job!  Thanks for your patience, I know it's always a bit
	> frustrating when your code works well enough for yourself and you
	> are still told to make many changes before it is acceptable
	> upstream.

	Well, I really appreciate good code quality. If this is the price,
	I'm willing to pay it. Actually, I thank you for helping me so
	much.

Over the course of this review cycle the code quality of the driver improved; it also led to some general discussion about the affected sensors framework, which was improved along the way. The developer sharpened his skills and gained a deeper insight into the framework, with the result that his next project will definitely have a much shorter review cycle. This growth makes him far more valuable to the company than having him as the internal expert for some "well, it works for us" driver.

The framework maintainer benefited as well, as he needed to look at the requirements of the new device and adjust the framework to handle it in a generic way. This phenomenon is completely consistent with Greg Kroah-Hartman's statement in his OLS keynote last year:

We want more drivers, no matter how "obscure", because it allows us to see patterns in the code, and realize how we could do things better.

All of the above leads to a single conclusion: working with the kernel development community is worth the costs it imposes in changes to internal processes. Companies which work with the kernel developers get a kernel which better meets their needs, is far more stable and secure, and which will be maintained and improved by the community far into the future. Those companies which choose to stay outside the process, instead, miss many of the benefits of millions of dollars' worth of work being contributed by others. Developers are able to take advantage of working with a group of smart people with a strong dedication to code quality and long-term maintainability.

It can be a winning situation for everybody involved - far better than perpetuating the embedded Linux nightmare.



The embedded Linux nightmare - an epilogue

Posted May 1, 2007 21:23 UTC (Tue) by raven667 (subscriber, #5198) [Link]

I can only discourage companies from even thinking about such things. The time spent on backports and on maintaining the resulting unique kernel fork is better spent on adjusting the internal development and quality assurance procedures to the way the Linux kernel development process works. Otherwise it is just another great example of a useless waste of resources.

So how does this relate to long-term supported vendor releases such as RHEL, SLES, Ubuntu LTS or Debian? Each of these vendors maintains one forked kernel version into which they backport fixes for the life of their product. It would seem that you are arguing that this is also the wrong way to approach the problem of maintenance unless you see a fundamental difference between the two cases.

The embedded Linux nightmare - an epilogue

Posted May 1, 2007 22:14 UTC (Tue) by AdHoc (subscriber, #1115) [Link]

They backport fixes, which is much different from backporting features. A fix is generally localized; features may touch many parts of the kernel. I'm guessing they occasionally backport new drivers if the driver framework hasn't changed much.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 0:10 UTC (Wed) by drag (subscriber, #31333) [Link]

Plus the kernel differences don't seem as huge.

Currently Debian Stable and Ubuntu are using 2.6.18 kernel variants. Redhat EL 5 uses a 2.6.18 variant. All of them were released fairly recently.

Old Debian Stable shipped 2.6.8 and Redhat EL 4 used 2.6.9. Those systems are the longest-supported distributions of all the mainline Linux versions, and now they are basically obsolete, except as legacy installations. Nobody would want to install either of those systems for new servers or workstations.

Adding new hardware support from backports would be exponentially easier going from 2.6.21 to 2.6.18 than trying to go from 2.6.21 to 2.6.8.

So in the author's example he was asked about backporting drivers to 2.6.10... probably that means the device is not even in beta yet, if they are still working out the details of hardware support. So that is a very odd kernel for anything still in the relatively early stages of development.

That kernel is very old now. It will be ancient by the time it reaches market. So it seems very weird that you'd choose that version to work with.

I guess a good approach would be to keep up with the latest stable releases for as long as you can, and to 'freeze' the version as late in the development cycle as is comfortably possible.

but ...

Posted May 2, 2007 1:45 UTC (Wed) by JoeBuck (subscriber, #2330) [Link]

Red Hat is still supporting RHEL 3 for their customers.

but ...

Posted May 2, 2007 20:01 UTC (Wed) by tglx (subscriber, #31301) [Link]

right, because it's a product and they have support contracts. But it is in maintenance mode. They won't add the new wireless stack or whatever big feature to it anymore.

but ...

Posted May 2, 2007 21:18 UTC (Wed) by riel (subscriber, #3142) [Link]

RHEL releases only get select new hardware and feature support for the first 2 to 3 years after release. For the rest of the 7-year support cycle the product will only get fixes for security problems and other really severe bugs. RHEL 2.1 is in that latter stage of the support cycle and RHEL 3 will be there soon.

We also have a policy of making sure that bugs our customers find in RHEL also get fixed in the upstream kernel. Preferably, they get fixed in the upstream kernel before the fix is put in RHEL.

Not only do we get the benefit of upstream review of the code change, but the bug is also automatically fixed when we come out with our next RHEL release. We only need to fix each bug once, and the upstream community gets the benefits of our work.

When dealing with a long product cycle and the faster upstream Linux development cycle there are ways to ensure that both your product and the upstream Linux community benefit.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 3:48 UTC (Wed) by raven667 (subscriber, #5198) [Link]

Old Debian Stable shipped 2.6.8 and Redhat EL 4 used 2.6.9. Those systems are the longest-supported distributions of all the mainline Linux versions, and now they are basically obsolete, except as legacy installations. Nobody would want to install either of those systems for new servers or workstations.

I disagree with this statement. I still install RHEL4 in my business and plan to do so for some time; heck, I still install the occasional RHEL3 host. I'm not going to rebuild systems that currently work just to get the latest and greatest without some clear justification. It costs time to maintain another OS variant, re-checking our services and potentially porting them to the new release. Of course they are all similar (RHEL3-RHEL5) but each version has differences (or else it wouldn't be a new version) that I need to understand and account for. The long-term support for RHEL, SLES and Ubuntu LTS exists for a reason: users who don't want a high rate of churn in their base OS.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 5:17 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]

however, you won't be trying to install that old version on brand new hardware, you will be installing it on the hardware that it supports.

as such you just need bugfixes, you don't need new drivers and other major improvements from the new kernel.

so you are just fine. the maintenance nightmare is when you try to backport larger chunks of things, and they drag in more stuff that they depend on.

even bugfixes aren't going to be complete. many bugs get fixed by replacing the buggy code with new code, usually to add new features or as part of a cleanup effort (frequently without the person writing the new code recognising all the implications of the bug, sometimes without them noticing that there is a bug at all).

bugs that are fixed like this are extremely hard to backport.

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 11:59 UTC (Thu) by raven667 (subscriber, #5198) [Link]

> however, you won't be trying to install that old version on brand new hardware, you will be installing it on the hardware that it supports.
>
> as such you just need bugfixes, you don't need new drivers and other major improvements from the new kernel.

I hate to say it again but this is also not true. RHEL3 supports modern hardware; they have backported newer versions of drivers and other compatibility fixes. RHEL3 is supported on the latest Dell PowerEdge 9th generation systems, for example, and even comes with the Dell hardware management kernel drivers that were integrated with upstream.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 4:47 UTC (Wed) by AJWM (guest, #15888) [Link]

> Redhat EL 4 used 2.6.9. [...] Nobody would want to install either of those systems for new servers or workstations.

Heh, I wish I could agree with you, but within the next couple of days I'll be doing a fresh Redhat 4 install. The (4x 3GHz Xeon) server is being repurposed, currently it's still running RHEL 2.1. Most of the servers in this particular environment are running RH 3, with just a handful at RH 4. (Mind, they do get updates). Once you've got an enterprise system stable (eg, a four-node Oracle cluster with 8 CPUs/node (16 virtual with hyperthreading) and a SAN backend) you don't want to mess with it very often. (When that's upgraded it'll also be to new boxes, 2 each 32-way IA-64s, and we can kiss hugemem goodbye - but it'll be to RH4, not RH5).

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 11:51 UTC (Wed) by drag (subscriber, #31333) [Link]

I said _want_.

You don't want to use it, do you? I'd expect you'd rather just install the newest of what was available and leave it at that.

You use it because you _need_ to, right?

Also I suppose you'd like all new hardware, too.

:-)

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 5:28 UTC (Thu) by msmeissn (subscriber, #13641) [Link]

SUSE is still supporting 2.4.21 kernels (SLES 8)...

And it really only sees security updates at this time... Very small, self-contained fixes.

Ciao, Marcus

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 3:54 UTC (Wed) by sobdk (guest, #38278) [Link]

RHEL 4 with a 2.6.9 kernel has the mutex patches backported from 2.6.16. I would hardly call that a fix, nor is it a new driver. Perhaps they do this to track mainline more closely while pretending to keep the kernel at 2.6.9, but they certainly do not just backport fixes.

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 23:05 UTC (Thu) by ronaldcole (guest, #1462) [Link]

I believe that would be because RHEL4 still has one year before it enters maintenance mode, and would therefore only get bug fixes.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 7:14 UTC (Wed) by bos (guest, #6154) [Link]

The big vendors do *not* limit themselves to backporting fixes into their enterprise kernels. Each of the five extant RHEL 4 kernels (original and updates 1 through 4) is quite different from the others. For example, the core device model code changed TWICE during updates between RHEL 4 kernels.

Backporting drivers to older kernels absolutely sucks as a way to spend your time because of changes like this, and worse, it feels like a waste of effort while you do it. Dispiriting stuff.

The embedded Linux nightmare - an epilogue

Posted May 1, 2007 22:26 UTC (Tue) by dmarti (subscriber, #11625) [Link]

It seems like there's a difference between backporting fixes (the kind of changes that happen between 2.6.x.0 and 2.6.x.y in mainline) and backporting whole new features such as the new wireless code.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 15:34 UTC (Wed) by vmole (guest, #111) [Link]

One other thing that makes RedHat different: they have actual full-time kernel developers working for them. That makes it *far* more likely that they'll be able to successfully backport a feature/fix than J. Random Embedded. Not because Ms. Embedded is a bad programmer, but because she doesn't spend all her time working with and understanding the kernel.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 10:14 UTC (Wed) by NAR (subscriber, #1313) [Link]

During another discussion about backporting a large feature into an old kernel, I asked why a company would want to do that.

I guess they want the new feature (e.g. wireless stack), but don't want the new bugs in e.g. the memory management, scheduler or block layer. And please don't tell me there are no new bugs.

Bye,NAR

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 13:55 UTC (Wed) by jreiser (subscriber, #11027) [Link]

they want the new feature... but don't want the new bugs

This is exactly what matters for an embedded product, and exactly where the Linux development process fails. Last week Adrian Bunk quit in disgust at the regressions (http://lwn.net/Articles/231993/) and Andrew Morton said the situation is likely to get worse (http://lwn.net/Articles/232432/). Consumers do not tolerate such shoddy products.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 16:26 UTC (Wed) by khim (subscriber, #9252) [Link]

This is exactly what matters for an embedded product, and exactly where the Linux development process fails.

Not really. If you try to backport new features to an ancient kernel, you'll get the new features, new bugs (in the subsystems which must be modified to accommodate the new features) and your own unique bugs (because you'll have a mostly-untested mix of old and new kernel code).

Consumers do not tolerate such shoddy products.

Sure they do. If you take a line of mobile phones from any manufacturer and count the number of bugs and regressions, you'll probably find more than 14 of them. And most of them are known. And some are even fixed in unofficial firmware. This is way worse than what happens with the kernel - yet Nokia and Samsung are selling millions of phones...

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 14:50 UTC (Wed) by bene42 (subscriber, #32623) [Link]

On the other hand, they don't like old bugs. So you have to choose. And it is far easier to get help from nearly everybody to fix a new bug than to backport and maintain a bunch of patches to fix old bugs.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 15:48 UTC (Wed) by NAR (subscriber, #1313) [Link]

On the other hand they don't like old bugs.

Yes. But they spent half a year testing their stuff with the old kernel under the workloads they expect, so they can be pretty sure that even if there are old bugs, those bugs don't cause problems. Or, if they do, they can be fixed (even by backporting from newer versions) or worked around.

Bye,NAR

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 16:34 UTC (Wed) by bene42 (subscriber, #32623) [Link]

Where is the difference?
They have to spend a complete QA cycle in any case. Every shortcut here is window-dressing. Or, to say it in other words: every shortcut is the programmer's guaranteed right to fire up some viewy fireworks...
see 6/4/1996

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 15:51 UTC (Wed) by tglx (subscriber, #31301) [Link]

> I guess they want the new feature (e.g. wireless stack), but don't want the new bugs in e.g. the memory management, scheduler or block layer. And please don't tell me there are no new bugs.

Right, and therefore they backport the wireless stack, which includes the modifications to the network stack on which it relies. Now they need to fix up the memory management changes in both, and some other nasty details all over the place.

This is done by a very small group of people. Do they have the full expertise and overview of all this, and the testing and verification capacity at hand, to verify this hackery on their own?

If yes, then they should really go and identify and fix the bugs in mainline. They would be a great asset to the kernel community.

If no, the nightmare is there in its full glory.

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 16:02 UTC (Wed) by NAR (subscriber, #1313) [Link]

They are probably a little naive and think that e.g. the wireless network stack and the block layer are independent of each other, and that there are well-defined and stable internal APIs within the kernel which don't get redesigned every two months.

Bye,NAR

The embedded Linux nightmare - an epilogue

Posted May 2, 2007 23:14 UTC (Wed) by roelofs (guest, #2599) [Link]

It is undisputed that kernel versions need to be frozen for product releases, but it can be observed that those freezes are typically done very early in the development cycle and are kept across multiple versions of the product or product family. These freezes, which are the vain attempt to keep the existing procedures alive ...

That's a tad simplistic--sometimes the reason is as innocuous as "minimize your variables." For example, when we encountered bugs on a project on which I worked several years ago, it was rarely obvious whether it was an application bug, a toolchain bug (GCC/binutils/glibc), a kernel bug, or a hardware bug (CPU itself or support chips/boards, each of which may be literally unique due to manufacturing flaws). Indeed, over the course of the project we found examples of each flavor--and at least one of the kernel bugs wasn't in driver code; it was in regular, mainline code (albeit infrequently used mainline code).

That said, we did upgrade kernels at least two or three times (I recall at least 2.4.7, 2.4.14, and 2.4.18), and for all I know, there were further updates before the product(s) shipped. But it's an overstatement to solely blame a desire to "keep the existing procedures alive"--unless, of course, one of the procedures in question is the scientific method.

Bigger players in the embedded market apparently have budgets large enough to ignore the benefits of working with the community and just concentrate on their private forks.

I suspect it's less an issue of enjoying large budgets than of suffering large bureaucracies. ;-) Barring a champion with enough authority to make his or her own decisions on behalf of the company (rare!), it can be extremely difficult even to find out whose department has such authority, much less to make contact and convince her/him/them. Even my implication that there's a single such department is questionable; it could easily involve multiple departments/divisions/management chains, not to mention lawyers and perhaps even one or more third parties. Ten iterations of review/revise before acceptance into the mainline kernel is nothing compared to that level of brutality. :-(

Greg

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 1:14 UTC (Thu) by bene42 (subscriber, #32623) [Link]

> Ten iterations of review/revise before acceptance into the mainline kernel is nothing compared to that level of brutality. :-(

But we have to show the "kernel" way is the cheapest way. And every QA department, except the SM ones, is on our side. For a diligent QA it is easier to accept a new stable kernel version than a ton of backports. They do the same tests on each kernel, so the difference is one question: what backing?

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 4:19 UTC (Thu) by roelofs (guest, #2599) [Link]

But we have to show the "kernel" way is the cheapest way.

Oh, I get that, and I'm grateful for articles like this one and its predecessor--it should be an excellent tool to help embedded engineers (who are often favorably inclined toward FLOSS in the first place) convince their management chains to work in a more community-compatible way. I was just pointing out that one of Thomas's arguments was maybe not quite as strong as it might appear at first glance; sometimes there are good reasons for sticking with an older kernel for an extended period.

Also note that not all architectures get as much developer attention as i386 and x86_64, nor are they all as unified.

Greg

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 6:44 UTC (Thu) by drag (subscriber, #31333) [Link]

How much of what you described above can be blamed on just stumbling along new bugs by introducing new software versions versus maybe just not having the ability to quickly diagnose bugs in the first place?

Say this project, described in the following article, is successful:
http://lwn.net/Articles/232769/

So if that makes it much easier to detect and isolate bugs (say, by making it more obvious where in the hardware-to-application toolchain the bug is occurring), would it have a positive effect on embedded developers' ability to track the mainline kernel more closely?

Could it be that the lack of good debugging tools for Linux is going to hold people back from using newer Linux versions?

Or is it that what you have now is good enough for embedded developers and such things probably aren't going to make anybody's job any easier?

The embedded Linux nightmare - an epilogue

Posted May 4, 2007 3:50 UTC (Fri) by roelofs (guest, #2599) [Link]

How much of what you described above can be blamed on just stumbling along [upon?] new bugs by introducing new software versions versus maybe just not having the ability to quickly diagnose bugs in the first place?

Well, it was certainly true that, given the size of the problem space, we were more than a bit understaffed. And the Chronicle debugger (thanks for the link, btw--I hadn't gotten to this week's Development page yet), had it existed back then, probably would have helped, at least if we could have run it from the host system. But insofar as it runs on top of Valgrind, it appears to be limited to apps and libraries, and the really hard, time-consuming bugs were the ones in the kernel and hardware.

Could it be that the lack of good debugging tools for Linux is going to hold people back from using newer Linux versions? Or is it that what you have now is good enough for embedded developers and such things probably aren't going to make anybody's job any easier?

I'm no longer doing embedded work (quite the opposite, in fact!), but I don't know anyone who would claim the existing tools are "good enough." That is, they're sufficient to get most things done, but it's clear they could be much, much better. In fact, most reasonably unbiased developers with whom I've spoken seem to agree that development tools are one of the principal areas where Microsoft still holds a significant lead over the FLOSS community, and I fully believe them. After all, MS has been working on their toolchain and using it to woo developers (their own approach to World Domination) for a quarter-century, and even when they were spread the thinnest, they supported only three or four CPU architectures--and that for only a few years. GCC and gdb are truly marvels of portability, and it's really nice not to have to learn a new set of commands on every platform, but that very portability has been a huge drag on the pace of development, both in optimization and in debugging (and, no doubt, other areas).

That said, I continue to use GCC and gdb every day, both at work and at home. And as much as I love to eke the last little bit of performance out of my code, I'm far more excited about the possibility of major strides on the debugging end of things--especially for massively multithreaded applications. The sooner, the better...

Greg

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 1:35 UTC (Thu) by jkowing (subscriber, #5172) [Link]

I've often wondered why the comedi (http://www.comedi.org/) drivers are not part of the mainline kernel. Is there a reason why they aren't?

Comedi

Posted May 3, 2007 1:41 UTC (Thu) by corbet (editor, #1) [Link]

For code to get into the kernel, somebody has to post it to a relevant list, respond to comments, and generally work to get it merged. I've not seen that happening with the Comedi drivers. 'spose I could have missed it...

The embedded Linux nightmare - an epilogue

Posted May 3, 2007 23:59 UTC (Thu) by gregkh (subscriber, #8) [Link]

The developers of comedi have not wanted to get into the kernel tree; I have approached them numerous times about this in the past. Because of this, it hasn't been submitted.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds