Long-term support and backport risk
So it is interesting that, at the recently concluded Linux Foundation Collaboration Summit, numerous people were heard expressing concerns about this model. Grumbles were voiced in the official panels and over beer in the evening; they came from representatives of the relevant vendors, from their customers, and from not-so-innocent bystanders. The "freeze and support" model has its merits, but there appears to be a growing group of people who are wondering if it is the best way to support a fast-moving system like Linux.
The problem is that there is a great deal of tension between the "completely stable" ideal and the desire for new features and hardware support. That leads to the distribution of some interesting kernels. Consider, for example, Red Hat Enterprise Linux 4, which was released in February 2005 with a stabilized 2.6.9 kernel. RHEL4 systems are still running a 2.6.9 kernel, but it has seen a few changes:
- Update 1 added a disk-based crash dump facility (requiring driver-level support), a completely new Megaraid driver, a number of block I/O subsystem and driver changes to support filesystems larger than 2TB, and new versions of a dozen or so device drivers.
- Update 2 threw in SystemTap, an updated ext3 filesystem, the in-kernel key management subsystem, a new OpenIPMI module, a new audit subsystem, and about a dozen updated device drivers.
- For update 3, Red Hat added the InfiniBand subsystem, access control list support, the error detection and correction (EDAC) subsystem, and plenty of updated drivers.
- Update 4 added WiFi protected access (WPA) capability, ACL support in NFS, support for a number of processor models and low-level chipsets, and a large number of new and updated drivers.
The end result is that, while running uname -r on a RHEL4 system will yield "2.6.9", what Red Hat is shipping is a far cry from the original 2.6.9 kernel, and, more to the point, it is far removed from the kernel shipped with RHEL4 when it first became available. This enterprise kernel is not quite as stable as one might have thought.
Greg Kroah-Hartman recently posted an article on this topic which makes it clear that Red Hat is not alone in backporting features into its stable kernels:
Similar things have been known to happen in the embedded world. In every case, the distributors are responding to two conflicting wishes expressed by their customers: those customers want stability, but they also want useful new features and support for new hardware. This conflict forces distributors to walk a fine line, carefully backporting just enough new stuff to keep their customers happy without breaking things.
The word from the summit is that this balancing act does not always work. There were stories of production systems falling over after updates were applied - to the point that some high-end users are starting to reconsider their use of Linux in some situations. It is hard to see how this problem can be fixed: the backporting of code is an inherently risky operation. No matter how well the backported code has been tested, it has not been tested in the older environment into which it has been transplanted. This code may depend on other, seemingly unrelated fixes which were merged at other times; all of those fixes must be picked up to do the backport properly. It is also not the same code which is found in current kernels; distributor-private changes will have to be made to get the backported code to work with the older kernel. Backporting code can only serve to destabilize it, often in obscure ways which do not come to light until some important customer attempts to put it into production.
All of this argues against the backporting of code into the stabilized kernels used in long-term-support distributions. But customer demand for features and (especially) hardware support will not go away. In fact, it is likely to get worse. Quoting Greg again:
So, if one goes on the assumption that the Plan For World Domination includes moving Linux out of the server room onto a wider variety of systems, the pressure for additional hardware support in "stabilized" kernels can only grow.
What is to be done? Greg offers three approaches, the first two of which are business as usual and the elimination of backports. The disadvantages of the first option should be clear by now; going to a "bug fixes only" mode has its appeal, but the resulting kernels would look old and obsolete in a very short time. Greg's third option is one which your editor heard advocated by several people at the Collaboration Summit: the long-term-support distributions would simply move to a current kernel every time they do a major update.
Such a change would have obvious advantages: all of the new features and new drivers would come automatically, with no need for backporting. Distributors could focus more on stabilizing the mainline, knowing that those fixes would get to their customers quickly. Many more bug fixes would get into kernel updates in general; no distributor can possibly hope to backport even a significant percentage of the fixes which get into the mainline. The attempt to graft onto Linux an old support model better suited to proprietary systems would end, and long-term-support Linux customers would get something that looks more like Linux.
Of course, there may be some disadvantages as well. Dave Jones has expressed some discomfort with this idea:
As Dave also notes, some mainline kernel releases are better than others; the current 2.6.21 kernel would probably not be welcomed in many stable environments. So any plan which involved upgrading to current kernels would have to give some thought to the problem of ensuring that those kernels are suitably stable.
Some of the key ideas to achieve that goal may already be in place. There was talk at the summit of getting the long-term support vendors to coordinate their release schedules to be able to take advantage of an occasional extra-stable kernel release cycle. It has often been suggested that the kernel could go to an even/odd cycle model, where even-numbered releases are done with stability as the primary goal. Such a cycle could work well for distributors; an odd release could be used in beta distribution releases, with the idea of fixing the resulting bugs for the following even release. The final distribution release (or update) would then use the resulting stable kernel. There is opposition to the even/odd idea, but that could change if the benefits become clear enough.
Both Greg and Dave consider the effects such a change would have on the providers of binary-only modules. Greg thinks that staying closer to the upstream would make life easier by reducing the number of kernel variants that these vendors have to support. Dave, instead, thinks that binary-only modules would break more often, and "This kind of breakage in an update isn't acceptable for the people paying for those expensive support contracts". If the latter position proves true, it can be seen as an illustration of the costs imposed on the process by proprietary modules.
Dave concludes with the thought that the status quo will not change anytime soon. Certainly distribution vendors would have to spend a lot of time thinking and talking with their customers before making such a fundamental change in how their products are maintained. But the pressures for change would appear to be strong, and customers may well conclude that they would be better off staying closer to the mainline. Linux and free software have forced many fundamental changes in how the industry operates; we may yet have a better solution to the long-term support problem as well.
Posted Jun 20, 2007 17:56 UTC (Wed)
by oak (guest, #2786)
[Link] (2 responses)
Posted Jun 20, 2007 18:10 UTC (Wed)
by Bogerr (guest, #36700)
[Link] (1 responses)
Posted Jun 21, 2007 13:56 UTC (Thu)
by jamesh (guest, #1159)
[Link]
If after the 4th update to e.g. RHEL users are on 5 different kernel versions depending on when they did the install, the distributor is going to have to produce 5 different security updates. That doesn't sound like it would fly.
Posted Jun 20, 2007 18:08 UTC (Wed)
by pj (subscriber, #4506)
[Link] (2 responses)
Posted Jun 20, 2007 20:10 UTC (Wed)
by amikins (guest, #451)
[Link] (1 responses)
Posted Jun 23, 2007 2:38 UTC (Sat)
by kingdon (guest, #4526)
[Link]
Now, unless there is a fairly big cultural shift, I don't really see these kinds of emulators and autotests getting popular in the kernel world (and I'll fully admit that this kind of thing is easier for, say, compilers than kernels, which have things like locking and races all over the place). But that's more a statement of "I don't think people are going to try it" than "I think if they did try it, it would be useless".
Posted Jun 20, 2007 18:20 UTC (Wed)
by ballombe (subscriber, #9523)
[Link] (1 responses)
> One of the main selling points touted by many Linux-oriented vendors is stability. Once a customer buys a subscription for an "enterprise" Linux or embedded systems product, the vendor will fix bugs in the software but otherwise keep it stable.
Or they use Debian, which provides long-term support, adheres to a strict no-backport policy, and does not require paying for a subscription.
The issue with the 3rd option (move to a current kernel) is that new kernels sometimes require updated user-land tools (udev, etc.) to run properly. This can quickly cause a cascade of updates, defeating the whole point of a stable system.
Also, stability actually means that the software will not change too much, rather than that it stays bug-free. People tend to write fragile apps, secure in the knowledge that the environment will not change. If the environment changes too much, the apps will fail. This is a common pattern.
Posted Jul 2, 2007 1:01 UTC (Mon)
by hazelsct (guest, #3659)
[Link]
I have been encouraged, though, by the upkeep of the 2.6.16 kernel. For some reason I had believed that was due to its use in Dapper Drake; I had not known before this article that 2.6.16 is also in a major SUSE release.
So here's a fourth option: have a bunch of distros decide together to use a given kernel release, like 2.6.16 here. Then when security patches and new drivers are backported, they can all use the same patches.
The only trouble arises when new drivers require new infrastructure, like the new wireless stuff; the only way to get around that is option 3 with its set of problems. :-(
Posted Jun 20, 2007 18:36 UTC (Wed)
by pcampe (guest, #28223)
[Link]
I guess it's possible to change the balance between (time between major releases) and (time between minor releases) if we correspondingly change the balance between (new features in majors) and (new "features" in minors). Maybe a major release should be shipped once a year, and the <7 years of minor release support could cover only bug and security fixes.
Pros for the customer: stable ABI for third party drivers and applications once a version has been chosen; more time for choosing when (if) doing a major upgrade, knowing that a fallback exists. The vendor will have more versions to support, but each one will definitely require a lot less effort.
Posted Jun 20, 2007 18:52 UTC (Wed)
by dlang (guest, #313)
[Link] (1 responses)
even under the old even/odd release you had some odd kernels that were extremely stable and some even kernels that you wouldn't want to run
if the distros don't want to run the latest kernel.org kernels then let them pick a kernel 1-2 revisions back (after the -stable series has cleaned up what was found) and use that, accepting that it doesn't have all the latest fixes, but has been tested more.
I've been using kernel.org kernels in production environments since 1996 and I see no sign of the declining quality that people keep claiming.
what I do is when I look to do a kernel upgrade I take the latest kernel and start testing it. I also watch for reports of problems with that kernel and the type of things that are fixed in -stable. after a few weeks (with me spending no more than a couple of hours a week on this) it's pretty clear if this is a good candidate or not. if not I wait for the next release and try again, if so I build kernels for all my different hardware and set up some stress tests. I spend a day or two hammering on test boxes and then roll out the result to production. a year or so later I repeat the process. (unless there is a security hole found that forces an upgrade sooner)
if I were to go with distro kernels instead I would have to do about the same testing with the kernel the distro provides because I'm still the one responsible for any system failures and I know from painful experience that even if the failure is completely the vendor's fault I'm the one who gets blamed (after all, I selected that vendor, or I should have tested more to find the problem)
Posted Jun 21, 2007 4:30 UTC (Thu)
by arjan (subscriber, #36785)
[Link]
Posted Jun 20, 2007 19:58 UTC (Wed)
by edschofield (guest, #39993)
[Link] (1 responses)
Posted Jun 22, 2007 15:47 UTC (Fri)
by garloff (subscriber, #319)
[Link]
> The solution to the problem of regressions in newer kernels is obvious: [...]
For enterprise customers, it's often much better to see only 5 out of 10 old bugs fixed rather than all of them but at the cost of introducing 1 new one. They need predictability, and any regression or just the risk of it is much more painful than not having some limitations/bugs fixed. This is what makes the value proposition of enterprise Linux work today. Limited change that can be assessed.
It's pretty tough to avoid this occasional bug or the risk of it: the process to assure with a high enough level of confidence that there's no regression anywhere would require a really large effort. The last thing a customer wants is to test every vendor update extensively before deploying it. What would he be paying the vendor for?
Getting every bug out still would not be enough ... As ballombe correctly pointed out, stability has two dimensions:
1. Get the bugs out
2. Don't change anything that a user or an app could possibly depend on
Some people argue that the kernel is not keeping interfaces stable enough. There may be examples where I would agree, but I don't think that criticism is fair at large. With the speed that innovation happens in the kernel community, the stability of anything userspace can see and expect to be stable is quite OK.
But that's not good enough:
- Sometimes, we have not been clear enough about what we consider a stable interface and what not. Sometimes, to get to some information, an app actually has no choice but to use unstable interfaces. sysfs is the primary example for this.
- Sometimes, apps are horribly broken by making certain assumptions which only happen to be true, and no one sane would consider a change there to break an interface. Yet the application breaks.
And whenever we hear about such breakage we try to help the app creator fix it, but assuming that we can get the app world 100% clean is just too optimistic, I'm afraid. An OS vendor also will never (and should not need to) know about 100% of the apps that customers are running.
So my guess is that the model won't change anytime soon. Maybe it could be changed for a subset of use scenarios, where you work with white lists of what has been validated ...
Posted Jun 20, 2007 20:21 UTC (Wed)
by jwboyer (guest, #23296)
[Link] (2 responses)
That isn't specific to proprietary modules. It can happen for any out-of-tree module, regardless of license.
Posted Jun 21, 2007 6:46 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
Contrast that with a closed-source vendor's (*cough* NVidia *cough*) policy that their latest driver not only fixes bugs and supports newer hardware, but also discontinues support for "older" hardware.
Ouch.
Posted Jun 21, 2007 13:21 UTC (Thu)
by jwboyer (guest, #23296)
[Link]
Posted Jun 20, 2007 21:56 UTC (Wed)
by iabervon (subscriber, #722)
[Link] (5 responses)
They could have each stable series use the policy of the kernel.org -stable series, and have new stable series start internally reasonably frequently, becoming available to customers when they are well-tested and have no known regressions remaining relative to the previous stable series. In the testing region of a series before it gets to customers, the rules would probably permit disabling stuff and reverting problematic patches. And not every kernel.org version would ever get listed; with all the 2.6.21 problems, they'd probably just skip that one, since it'll probably be easier to get 2.6.22 into shape than figure out which changes between 2.6.21 and 2.6.22 fixed 2.6.21 regressions.
Posted Jun 21, 2007 0:27 UTC (Thu)
by smoogen (subscriber, #97)
[Link] (4 responses)
Posted Jun 21, 2007 4:38 UTC (Thu)
by iabervon (subscriber, #722)
[Link]
I think the people who want a really stable kernel really do best by getting the first kernel that supports everything they want, waiting for other people to hit the problems (or waiting for their system vendor to hit the problems) and then sticking with that kernel, with only -stable-quality patches afterward until they get a newer system that needs a newer kernel.
Posted Jun 21, 2007 7:51 UTC (Thu)
by dlang (guest, #313)
[Link] (2 responses)
Posted Jun 21, 2007 16:48 UTC (Thu)
by smoogen (subscriber, #97)
[Link] (1 responses)
[From someone who has done systems administration support of kernels from 1993.]
Posted Jun 21, 2007 23:53 UTC (Thu)
by dlang (guest, #313)
[Link]
I wouldn't be surprised if the 2.6.21->2.6.22 changes rivaled or exceeded the 1.2.0 -> 1.3.0 changes. the fact that it's happening so quickly with so few problems is amazing.
when I referred to the problems in the 2.0, 2.2, and 2.4 series, I wasn't just thinking of the couple of major problems, I'm remembering that there were several 'brown paper bag' releases scattered throughout the series
Posted Jun 20, 2007 23:34 UTC (Wed)
by error27 (subscriber, #8346)
[Link] (2 responses)
At my old job, I used to create tons of driver disks. Not because the drivers were proprietary but just to fix bugs and add support for newer cards. It's not so bad if you script it.
The 2.6 build system makes it easier as well.
But both RedHat and SuSE driver disk support is pretty crappy. The error messages are not useful. Debugging them is hard. It's not very well tested so out of 5 Fedora releases I supported, I think that driver disk support was completely broken in 2 (FC2 and FC5).
Also, the driver disk documentation sucks.
BTW. It's interesting to look at the aacraid driver packaging. They recompile the driver for over 100 kernels (guess) and stuff it all into 1 big rpm. :P
Posted Jun 21, 2007 4:28 UTC (Thu)
by arjan (subscriber, #36785)
[Link] (1 responses)
Posted Jun 21, 2007 8:22 UTC (Thu)
by error27 (subscriber, #8346)
[Link]
As far as I could see, RHEL3 had pretty recent libata. The last RHEL3 driver disk I created was for the 3ware 9550 which was pretty new at the time. It's been a while, but I'm pretty sure I patched the libata module in one driver disk so that's a possible option.
It is a problem dealing with kernel upgrades after the install, that's true.
I'm generally happy with RHEL.
Posted Jun 21, 2007 1:59 UTC (Thu)
by cine (guest, #5597)
[Link] (4 responses)
Posted Jun 21, 2007 2:19 UTC (Thu)
by loening (guest, #174)
[Link] (3 responses)
The majority of people using server hardware never update the hardware during the life of the equipment. That goes for most workstation hardware as well. This means the vast majority of people utilizing RHEL4 when it first came out do not need any of the backported features, they just need bug and security fixes.
The people who will need new features are the people with new equipment who are installing RHEL4 for the first time. By definition this equipment is not production yet, and as such it's not nearly as big a problem if a bug is encountered as if a bug was hit in the supposedly old stable kernel. For them they could use a more recent kernel that RHEL4 has only included recently.
Granted, this may mean supporting more kernel versions, but I gotta imagine it'd be a lot easier putting in bug fixes on 2-3 kernels at various levels of maturity than in trying to backport features to a single kernel while still trying to maintain a high level of stability.
Posted Jun 21, 2007 18:32 UTC (Thu)
by bronson (subscriber, #4806)
[Link]
The scenario: I can update all I want, I will only ever get a minimally invasive, bugfixed version of the kernel I'm already running. To get new features, I would have to explicitly ask for the new version. Of course, the newest kernel would always be used for new installations.
It's been a while since I've had to really care about uptime, thank goodness. But, frankly, if the enterprise kernel situation is as bad as this article hints, I would not touch RHEL with a 10 foot pole. Talk about scary!
Posted Jun 21, 2007 18:35 UTC (Thu)
by Fats (guest, #14882)
[Link]
Posted Jun 24, 2007 15:47 UTC (Sun)
by riel (subscriber, #3142)
[Link]
Yes, some enterprise customers want to eat their cake and have it.
No, we cannot leave the solving of that riddle to them. They are the
customer and we should get them a usable compromise.
Posted Jun 21, 2007 7:02 UTC (Thu)
by skybrian (guest, #365)
[Link] (7 responses)
Posted Jun 21, 2007 7:53 UTC (Thu)
by dlang (guest, #313)
[Link] (6 responses)
and a lot of the features that people want are more than just drivers anyway.
Posted Jun 21, 2007 8:38 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (5 responses)
And a more stable API would make it easier, not just to backport drivers, but also to forward-port out-of-tree drivers to new kernels, which in turn would make it easier for distributors to stick to kernel.org kernels, perhaps one or two releases behind the current one, with the current one as an option for non-critical setups.
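To make the forward-porting cost concrete, here is a minimal sketch - a hypothetical driver, but real macros from <linux/version.h> - of the kind of compatibility glue an out-of-tree driver ends up carrying; the example uses the real change to the workqueue callback signature in 2.6.20:

/*
 * Hypothetical out-of-tree driver fragment. The version-check macros are
 * real; in 2.6.20 the workqueue callback stopped taking a void pointer and
 * INIT_WORK() lost its third argument, so one source file must cope with
 * both forms.
 */
#include <linux/version.h>
#include <linux/kernel.h>
#include <linux/workqueue.h>

struct mydrv_dev {                      /* hypothetical device structure */
    struct work_struct work;
    int pending;
};

static void mydrv_do_work(struct mydrv_dev *dev)
{
    dev->pending = 0;                   /* stand-in for the real work */
}

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 20)
/* Old style: the handler receives the opaque pointer given to INIT_WORK() */
static void mydrv_work_handler(void *data)
{
    mydrv_do_work(data);
}
#define MYDRV_INIT_WORK(dev) INIT_WORK(&(dev)->work, mydrv_work_handler, (dev))
#else
/* New style: the handler receives the work_struct and digs the device out */
static void mydrv_work_handler(struct work_struct *work)
{
    mydrv_do_work(container_of(work, struct mydrv_dev, work));
}
#define MYDRV_INIT_WORK(dev) INIT_WORK(&(dev)->work, mydrv_work_handler)
#endif

Every interface change of this kind adds another #if branch to every out-of-tree driver that crosses it, which is exactly the maintenance burden a more stable API would reduce.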
Posted Jun 21, 2007 13:13 UTC (Thu)
by davecb (subscriber, #1574)
[Link] (4 responses)
There is a middle ground: versioned APIs. Multics did this long before Unix even existed, and the big commercial vendors are doing a subset of it for kernel->userland APIs.
Assume a function setvbuf(FILE *fp, char *buf, int type, size_t size), where the size_t is about to become a 64-bit variable. The user writes a call to setvbuf and at link time it's mapped internally to setvbuf'SYSVABI_1.3, where the SYSVABI_1.3 is only visible to tools like (Solaris) pvs.
Imagine that in SYSVABI_1.4, the int and pointer sizes become 64 bits, but our program doesn't call that version: it calls the 32-bit one from SYSVABI_1.3. Updated systems have two copies of setvbuf, one for each version, and the ELF loader uses the labelled ones to disambiguate calls from old versus new binaries.
New programs compiled after SYSVABI_1.4 came out use the new version, and if a program or driver is recompiled, it uses the newest ones.
This is a blatant oversimplification (Multics did it better), but you get the flavor...
--dave
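On Linux, the user-space analogue of the scheme described above is GNU symbol versioning; a minimal sketch, with a hypothetical libfoo, hypothetical version names, and a linker version script assumed to define the FOO_1.3 and FOO_1.4 nodes when the shared library is built:

/*
 * libfoo.c - sketch of GNU symbol versioning. "libfoo", "set_bufsize" and
 * the FOO_* names are hypothetical; the version nodes must be declared in a
 * version script passed to the linker (-Wl,--version-script=...).
 */
#include <stddef.h>

/* Old entry point, kept so binaries linked against FOO_1.3 keep working. */
int set_bufsize_old(void *stream, char *buf, int type, unsigned int size)
{
    (void)stream; (void)buf; (void)type; (void)size;
    return 0;                           /* 32-bit size semantics preserved */
}

/* Current entry point with the widened size_t argument. */
int set_bufsize_new(void *stream, char *buf, int type, size_t size)
{
    (void)stream; (void)buf; (void)type; (void)size;
    return 0;
}

/* Bind each implementation to a versioned symbol; '@@' marks the default
 * that newly linked programs will pick up, '@' keeps the old binding. */
__asm__(".symver set_bufsize_old,set_bufsize@FOO_1.3");
__asm__(".symver set_bufsize_new,set_bufsize@@FOO_1.4");

Old binaries keep resolving to the FOO_1.3 symbol while newly linked programs pick up FOO_1.4 by default - the "two copies of setvbuf" arrangement described above. Whether anything comparable could be made workable for in-kernel interfaces is, of course, the open question in this thread.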
Posted Jun 21, 2007 21:46 UTC (Thu)
by arjan (subscriber, #36785)
[Link] (2 responses)
this sounds easy... but once you need to include locking rules (as any in-kernel API has).. it gets a HECK of a lot more tricky. You end up with a WAAY too bloated kernel, or you get translation layers...
Posted Jun 22, 2007 7:19 UTC (Fri)
by mjthayer (guest, #39183)
[Link] (1 responses)
Posted Jun 22, 2007 14:28 UTC (Fri)
by davecb (subscriber, #1574)
[Link]
In practice, we only maintained one older version for anything in development, but froze anything we gave to end-user customers. Since this is a **kernel** API we're talking about, I wouldn't keep more than one "old" version.
The Multicians never seemed to get more than one old version, but that assumed a very active support process to keep all the customer machines sufficiently current.
--dave
Posted Jun 22, 2007 7:27 UTC (Fri)
by mjthayer (guest, #39183)
[Link]
Posted Jun 21, 2007 8:30 UTC (Thu)
by HenrikH (subscriber, #31152)
[Link] (7 responses)
Posted Jun 21, 2007 15:44 UTC (Thu)
by drag (guest, #31333)
[Link] (6 responses)
If you (meaning Linux, really) do not supply the features that the users _require_ then those users will go elsewhere.
If there is no elsewhere, they will make one or pay somebody else until they get one. In other words.. If you don't give the customers what they NEED they will simply say: "Fork you, buddy".
Sure that would be expensive and difficult. But the current situation, with the Linux kernel the way it is, already is expensive and difficult. Expensive and difficult enough that people are starting to want to look elsewhere.
Effectively (in a negative light) what you're saying is: "The users are stuck with Linux. They've been suckered into using something that is costing them money. It would cost them more money to get away from it. So they have no choice but to eat it."
All you have to do is look at SCO Unix or proprietary Solaris to see how far that attitude will take you.
Posted Jun 21, 2007 23:01 UTC (Thu)
by giraffedata (guest, #1954)
[Link] (5 responses)
Gee, I don't get that at all from HenrikH's comment. I read, "The users can't do any better than Linux. It may suck, but no worse than any alternative, because the problems are fundamental to operating systems, not special to Linux."
But here's why I think HenrikH is wrong: I think the proprietary alternatives do have stability and new features to a degree Linux doesn't and can't have, and here's why: per-copy licensing. Per-copy licensing gives Sun the money it takes to pay people (testers) to use the code and shake out the bugs. The economics of Linux make that impossible; with freedom of redistribution, how is the company that does that testing going to get paid for it?
Posted Jun 22, 2007 0:43 UTC (Fri)
by drag (guest, #31333)
[Link] (4 responses)
It's effectively the same thing. I don't think that he _meant_ it like I re-stated it, but more-or-less it's the same thing.
How, in the eyes of end users, is it really different? Except for any sort of emotional content, not much.
It's complacency. It's assuming that things can't get better, that they will always be this way.
> But here's why I think HenrikH is wrong: I think the proprietary alternatives do have stability and new features to a degree Linux doesn't and can't have, and here's why: per-copy licensing. Per-copy licensing gives Sun the money it takes to pay people (testers) to use the code and shake out the bugs. The economics of Linux make that impossible; with freedom of redistribution, how is the company that does that testing going to get paid for it?
Whether or not people have access to the source code is less and less relevant to whether or not people pay for per-seat licensing.
If you're a big enough company (or country) you _can_ get access to the Windows source code.
With Solaris they've effectively opened it up. Sure it still sucks compared to Linux in a lot of ways, but it's now open. People still are paying licenses.
Whether or not people choose to pay for Windows vs Solaris vs Linux comes down to more-or-less this one thing: "Do I make more money by paying Redhat/Microsoft/Solaris/IBM than if I don't?"
Forcing people to pay licenses because they can't use anything cheaper is a losing strategy. Same thing as figuring people don't have any choice but to put up with your BS because there are no other effective alternatives.
It works for a little while, but you don't want customers' resentment. You want customers to save money by paying you money. It's possible to have a net win for everybody.
And the nice thing about Linux and open source is that it not only makes it cheaper for the end users to use, it makes it cheaper for companies to develop and, more importantly, to support.
But if the development style of Linux... which is currently not only breaking proprietary drivers (which is immaterial, really), but also breaking out-of-tree drivers (which is very serious), AND breaking in-tree drivers (aka regressions.. ultra serious), AND breaking userspace API (mega ultra serious).... is costing more money than it is saving by being open source, then you have a serious serious problem.
It's a knife edge. Very difficult. I have no good idea on how to solve the problem.
Posted Jun 22, 2007 2:24 UTC (Fri)
by giraffedata (guest, #1954)
[Link] (3 responses)
No difference. The situation to which you responded, and the one that HenrikH described have the same effect on a user. I brought it up only because your comment was not responsive to the comment to which you attached it, indicating you probably misread it.
I don't think access to source code is relevant at all to this thread; you'll notice I didn't mention it.
What is relevant is that all sellers of Linux kernels permit their customers (because they have to) to make as many copies as they want and pass them on to as many people as they want, for the same price as one copy. Microsoft does not do that. Neither does Sun.
And that's why Microsoft and Sun can spend millions of dollars testing and Red Hat cannot.
Posted Jun 23, 2007 15:05 UTC (Sat)
by njs (subscriber, #40338)
[Link] (2 responses)
Err... I'm not an expert on this here enterprise-y stuff, but it sure looks to me like Red Hat, you know, charges a per-seat license fee just like Microsoft and Sun?
(Obviously you can use it without paying that licensing fee, but in practice you can use Windows or Solaris without paying that fee too; big businesses tend not to in both cases.)
Posted Jun 23, 2007 17:14 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (1 responses)
Red Hat does not charge for a copyright license for the Linux kernel. Its license, the GPL, is free, as required by the copyright license granted to Red Hat by the various authors of the Linux kernel (also the GPL).
The per-seat charge you're thinking of is for maintenance service, and it is way less than Sun or Microsoft charge for their copyright licenses. That's why many people believe that Linux is much cheaper to use than Solaris or Windows.
I think if Red Hat asked a maintenance fee large enough to cover a Microsoft-sized test department, lots of customers would decline, and hire someone else to do the maintenance. Or get SUSE, because Novell would, legally, just take the fruits of Red Hat's testing without paying anything and continue selling SUSE at Linux prices.
Posted Jun 23, 2007 17:20 UTC (Sat)
by dlang (guest, #313)
[Link]
I'll admit that I haven't looked at them recently, but the last time I did they were charging ~$1500/machine, which is about the same price that Sun and others charge for their proprietary Unix, and above many of the Microsoft license costs.
if you use the 'enterprise' pricing from Linux vendors then it's fairly easy to cook the books to show that Linux has a higher TCO than Windows (you need to choose hardware and models that minimize the fact that Linux is more efficient and find manpower costs that show that Windows admins get paid less than Unix admins, but you are close enough to make it work)
Posted Jun 21, 2007 9:54 UTC (Thu)
by MalcYates (guest, #45868)
[Link] (1 responses)
It is fine to discuss the ramifications for Linux development, but there is a whole ecosystem out there that uses the kernel interfaces to deliver their solutions (any virtualisation vendor for instance ...)
If Linux is to keep its advantages of stability etc, then this is a big consideration.
It is fine for us to rip and replace a kernel once a year, but that means all software and hardware certification would need to be revisited / retested / recertified, and that is not a cheap procedure for the vendors.
Posted Jun 21, 2007 10:32 UTC (Thu)
by mjthayer (guest, #39183)
[Link]
Posted Jun 21, 2007 11:40 UTC (Thu)
by jengelh (guest, #33263)
[Link] (1 responses)
Posted Jun 21, 2007 23:50 UTC (Thu)
by dlang (guest, #313)
[Link]
this is a common mistake people make, and I think the stability or lack of it for various releases is colored by their (incorrect) expectation.
now, it is true that immediately after a problem release there is extra effort put into making the next release stable, and sometimes the result is that the version after that gets even more drastic changes (hurting its stability a bit) and I think that's what you are seeing over the last year or so since 2.6.16
Why not have very stable server releases which don't offer (any?) UI functionality and separate desktop/laptop releases that are updated more often? Once a desktop/laptop release has shown itself to be stable (in half a year?), it (or parts of it) can be nominated as a server release?
or another option - freeze at installation time. Users get (choose) the latest stable-enough kernel and stick with it.
The problem with that model is what to do about security updates.
How about more and better testing? Get vendors to fund hardware virtualization projects for the peripherals they want to be able to keep supporting - that way every developer can have access to 'the hardware', even if it's only virtual hardware. As long as it works the same, it should be fine. This would also allow for more and better automated test suites to be built by people like the autotest project.
It is very difficult to build a software construct of a piece of hardware that acts sufficiently like the real hardware for driver testing.
This only really works well if the emulated device is created by the same folks who made the physical device.. and even then, there's a substantial probability of deviance in behavior.
In the case of devices where they're having to be reverse engineered just to produce a driver.. TALKING to the device is hard enough, without having to replicate behavior that isn't entirely understood.
Well, one side effect of writing such an emulator is that understanding of the hardware improves. Sure, if you are reverse engineering and guessing it becomes harder, but that is true with or without the emulator.
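For illustration, here is a minimal sketch - the registers and the driver logic are entirely made up - of the kind of software device model such a project could provide, letting a driver's reset path be exercised without the real card:

/*
 * Entirely hypothetical model of a device's register file. Real devices add
 * DMA, interrupts and timing, which are exactly the hard parts discussed in
 * the comments above.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define REG_CTRL    0                   /* write 1 to request a reset  */
#define REG_STATUS  1                   /* bit 0 set once device ready */

struct fake_dev {
    uint32_t regs[2];
};

/* Model the hardware's reaction to a register write. */
static void fake_dev_write(struct fake_dev *dev, int reg, uint32_t val)
{
    dev->regs[reg] = val;
    if (reg == REG_CTRL && (val & 1))
        dev->regs[REG_STATUS] |= 1;     /* reset completes instantly */
}

static uint32_t fake_dev_read(struct fake_dev *dev, int reg)
{
    return dev->regs[reg];
}

/* The "driver" code under test: issue a reset and check for readiness. */
static int driver_reset(struct fake_dev *dev)
{
    fake_dev_write(dev, REG_CTRL, 1);
    return (fake_dev_read(dev, REG_STATUS) & 1) ? 0 : -1;
}

int main(void)
{
    struct fake_dev dev = { { 0, 0 } };

    assert(driver_reset(&dev) == 0);
    printf("driver reset path passed against the fake device\n");
    return 0;
}

Once even a simple model like this exists, automated suites of the autotest sort can run against it on any developer's machine, which is the point of the proposal above.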
Great idea regarding Debian's kernel model. Just one problem though: try installing a two-year-old kernel on new hardware: it just won't work, the drivers aren't there.
Now there's a major release of RHEL every 18 months (from RHEL 4 to RHEL 5 there have been about 20 months, due to the difficult Xen integration into the kernel), and every 4 months there's a minor upgrade (5.0 -> 5.1 -> 5.2), and the support is for several years (7 IIRC).
the big problem with all these stability proposals is the idea that you can know ahead of time if the kernel release is going to be good enough or a dog.
so distros ship a 3 months old kernel, not the brand spanking latest. that's new enough to have all the hardware, but proven enough to know if it's good or not, and if not, how much it needs fixing.
This is an interesting article about an important topic. If enterprise distributors are breaking interfaces for external drivers despite their backporting efforts, Greg's third option seems an entirely sensible proposal. The solution to the problem of regressions in newer kernels is obvious: with all the time distributors' kernel teams save in not backporting features and drivers, they can do more testing and fixing of bugs in current kernels. An effective solution to the problem of proprietary drivers is to warn customers away from such hardware in favour of "certified" open hardware, by explicitly exempting uncertified and closed hardware from guarantees of stability between service-pack updates. As side-effects, the quality of mainline kernels improves for everybody and the costs of closed hardware are transferred to the vendors responsible in the form of lost sales.
> "This kind of breakage in an update isn't acceptable for the people paying for those expensive support contracts." If the latter position proves true, it can be seen as an illustration of the costs imposed on the process by proprietary modules.
Sure it can, but at least the open-source out-of-tree module is fixable.
Right, I was merely commenting that Dave's original quote was taken slightly out of context. There are open-source drivers that are out of tree and still go through certification on Enterprise distros. Having to rework and recertify those is something that vendors and users hate, especially when paying high dollars.
It seems to me that the sensible thing to do would be to never backport features, but to offer a recent kernel with each minor version upgrade. So if you're using RHEL 4 and you want infiniband, you upgrade to RHEL 4.3, and you get a kernel that infiniband was merged for that's also survived a lot of Red Hat testing. If you don't want infiniband or anything else new, you stick with RHEL 4 and that's only got bugfixes.
That seems to presuppose that 2.6.22 is a stability kernel.. there are a ton of new features added in.. and it may have as many problems as 2.6.21 does.. people won't know until a large enough population with more than just the latest 'dell/ibm' hardware can test it. The people who pay the money usually do not test something that has just come out. They wait until other people have had their knocks. This kind of logic worked in the old days when 2.0.x and 2.4.x meant just stability and few new features... but that caused too many issues for other people who want the latest thing NOW, and it caused too many breakages when the 2.1.x and 2.5.x trees opened. Linus has decided that he would prefer that if things break.. people are going to be held responsible for it right away.. not 18 months later when 2.8.0 shows up.
2.6.22 isn't changing anything as tricky as 2.6.21 changed. It's not going to be a stability kernel, in the sense of being focused on solidifying things without introducing anything new or substantially different, but it's probably going to go better than 2.6.21, just due to not undermining so many long-time assumptions.
people who think that 2.0, 2.2 or 2.4 kernels never had problem releases like 2.6.21 just weren't there to experience them. there were releases in all three series that make 2.6.21 look rock solid (even ignoring the early releases in all three series)
Actually I did gloss over the issues of major changes in the 2.0, 2.2, and 2.4 series (2.4.9->2.4.14.. actually 2.0.10, 2.2.10 all were areas of instability). In most cases they were usually the kernel people seeing that something was majorly borked in their assumptions and having to retrofix a lot of stuff that they didn't make assumptions for. However from the items that LWN has shown on the kernel pages.. the amount of code changes in those series for a period of 2-4 sub-releases was less than what occurs between 2.6.20 and 2.6.21 or 2.6.14 and 2.6.15.
there is no question that the rate of change has increased drastically.
Basically all the cons boil down to "It's a lot of work" and "driver disks are hard."
the driver disk model breaks down once you have to update the libata core to have the latest sata work, or to update drm to have the latest graphics work. If it were just drivers, it's one thing. but in general it spans wider the longer the time lag is.
Sometimes you do have to patch the kernel core, but most hardware support can be dealt with through driver disks.
Why are they not doing the obvious?
Putting both the old stable kernel and a newer kernel with better hardware support for those that require that in the release?
That sounds like the best of both worlds, with minimal effort on the system administrator.
Exactly what I was thinking.
This sounds like the only sane solution to me. Yes, it means a proliferation of kernels, but most of them will just be tiny updates. Not a big deal for the maintainers, and with the auto-update tools we have now, not a big deal for the customers either.
It is good to be able to standardize on a certain distro like RHEL4 company wide. For older and the most recently bought hardware.
> The majority of people using server hardware never update the hardware during the life of the equipment.
They do roll out new servers though, and want to support that new hardware. They also run 3rd party applications and want to run the same operating system across all their servers.
I suppose I'll be tarred and feathered for suggesting that stable device APIs would go a long way towards making backporting drivers easy. If the point is to change part of the kernel while leaving the rest alone, shouldn't there be a distinct boundary between them with a clear contract? Then an "out of tree" driver can be a driver from a future kernel release.
it would also prevent a lot of improvements by defining all these fixed APIs
I still wonder why a stable API has to be such an all or nothing thing? Is a middle ground not possible, along the lines of "we will take trouble to prevent API changes, but not at any cost"? Or to put it another way, stop and think three times before breaking the API, but if you still come to the same conclusion the third time then do it anyway.
Something like that could work (at least, I imagine it could :) ) if the API/ABI numbers applied to subsystems (e.g. the wireless extensions) and not to single functions. You would need at least two version numbers - the current one and the last one with which the current is backwards compatible. It would probably be more of an "ease the transition pain" measure though, as I can't see the kernel developers being happy to keep maintaining older APIs once they started needing too much code of their own.
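As a toy sketch of that idea - the structures, names and version numbers here are invented for illustration, not an existing kernel interface - a subsystem could advertise both the API version it implements and the oldest one it still honours, and refuse mismatched drivers cleanly at registration time:

/*
 * Hypothetical per-subsystem API versioning. The subsystem exports the
 * version it implements now and the oldest version it remains compatible
 * with; a driver states the version it was written against, and a mismatch
 * is rejected at registration time instead of breaking later.
 */
#include <stdio.h>

struct subsys_api {
    unsigned int current_version;       /* API version implemented today */
    unsigned int compat_version;        /* oldest version still honoured */
};

struct driver_info {
    const char *name;
    unsigned int built_for;             /* API version the driver targets */
};

static const struct subsys_api wireless_api = {
    .current_version = 21,
    .compat_version  = 19,
};

static int register_driver(const struct subsys_api *api,
                           const struct driver_info *drv)
{
    if (drv->built_for > api->current_version ||
        drv->built_for < api->compat_version) {
        fprintf(stderr, "%s: wants API %u, subsystem offers %u..%u\n",
                drv->name, drv->built_for,
                api->compat_version, api->current_version);
        return -1;
    }
    return 0;
}

int main(void)
{
    struct driver_info too_old = { .name = "ancient_wifi", .built_for = 17 };
    struct driver_info recent  = { .name = "recent_wifi",  .built_for = 20 };

    printf("ancient_wifi: %s\n",
           register_driver(&wireless_api, &too_old) ? "rejected" : "accepted");
    printf("recent_wifi:  %s\n",
           register_driver(&wireless_api, &recent) ? "rejected" : "accepted");
    return 0;
}

Keeping the older behaviour alive behind such a check is, of course, where the maintenance cost that worries the kernel developers would come in.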
Versioned APIs/ABIs could help here where changes are unavoidable, but I think that the main problem here - at least from the point of view of those who write out-of-tree kernel extensions - is that many kernel developers do not consider the kernel APIs to be stable external APIs at all, preferring instead that external drivers and subsystems be brought into the kernel at the earliest possible point in time. I think that they do not make too much effort to keep the interfaces stable because they do not really consider them to be interfaces.
> some high-end users are starting to reconsider their use of Linux in some situations
And what would the alternative be? It's not like there is some other magical OS out there that keeps getting the shiny new features that the vendors like while at the same time not changing any of the code so there are no stability problems...
Solaris? Windows? OpenSolaris?
> some high-end users are starting to reconsider their use of Linux in some situations
> And what would the alternative be? It's not like there is some other magical OS out there
> Effectively (in a negative light) what you're saying is: "The users are stuck with Linux. They've been suckered into using something that is costing them money. It would cost them more money to get away from it. So they have no choice but to eat it."
> Gee, I don't get that at all from HenrikH's comment. I read, "The users can't do any better than Linux. It may suck, but no worse than any alternative, because the problems are fundamental to operating systems, not special to Linux."
> How, in the eyes of end users, is it really different?
> Whether or not people have access to the source code is less and less relevant to whether or not people pay for per-seat licensing
> And that's why Microsoft and Sun can spend millions of dollars testing and Red Hat cannot.
> it sure looks to me like Red Hat, you know, charges a per-seat license fee just like Microsoft and Sun?
> Obviously you can use it without paying that licensing fee, but in practice you can use Windows or Solaris without paying that fee too; big businesses tend not to in both cases
have you looked at the redhat prices?
There is no view here of how any changes in the delivery of backports can affect the development and support of commercial applications that rely on stability of interfaces.
If those people are targeting more than a single distribution, then distribution kernels with thousands of backports can make their lives harder, not easier. It means that there is yet another kernel to support, possibly with major differences to the upstream one.
Odd/even sounds like a very good idea. Personally, I too think that 2.6.17, .19 and .21 seemed worse than .16, .18, .20 and (hopefully) the upcoming .22.
this is a coincidence; there has never been stable/testing for the y in 2.x.y - the stable/testing split was based on the value of x