Long-term support and backport risk

[Posted June 20, 2007 by corbet]

One of the main selling points touted by many Linux-oriented vendors is stability. Once a customer buys a subscription for an "enterprise" Linux or embedded systems product, the vendor will fix bugs in the software but otherwise keep it stable. The value for customers is that they can put these supported distributions into important parts of their operations (or products) secure in the knowledge that their supplier will provide updates which keep the system bug-free and secure without breaking things. This business model predates Linux by many years, but, as the success of certain companies shows, there is still demand for this sort of service.

So it is interesting that, at the recently-concluded Linux Foundation Collaboration Summit, numerous people were heard expressing concerns about this model. Grumbles were voiced in the official panels and over beer in the evening; they came from representatives of the relevant vendors, their customers, and from not-so-innocent bystanders. The "freeze and support" model has its merits, but there appears to be a growing group of people who are wondering if it is the best way to support a fast-moving system like Linux.

The problem is that there is a great deal of stress between the "completely stable" ideal and the desire for new features and hardware support. That leads to the distribution of some interesting kernels. Consider, for example, Red Hat Enterprise Linux 4, which was released in February, 2005, with a stabilized 2.6.9 kernel. RHEL4 systems are still running a 2.6.9 kernel, but it has seen a few changes:

Update 1 added a disk-based crash dump facility (requiring driver-level support), a completely new Megaraid driver, a number of block I/O subsystem and driver changes to support filesystems larger than 2TB, and new versions of a dozen or so device drivers.
Update 2 threw in SystemTap, an updated ext3 filesystem, the in-kernel key management subsystem, a new OpenIPMI module, a new audit subsystem, and about a dozen updated device drivers.
For update 3, Red Hat added the InfiniBand subsystem, access control list support, the error detection and correction (EDAC) subsystem, and plenty of updated drivers.
Update 4 added WiFi protected access (WPA) capability, ACL support in NFS, support for a number of processor models and low-level chipsets, and a large number of new and updated drivers.

The end result is that, while running uname -r on a RHEL4 system will yield "2.6.9", what Red Hat is shipping is a far cry from the original 2.6.9 kernel, and, more to the point, it is far removed from the kernel shipped with RHEL4 when it first became available. This enterprise kernel is not quite as stable as one might have thought.

Greg Kroah-Hartman recently posted an article on this topic which makes it clear that Red Hat is not alone in backporting features into its stable kernels:

An example of how this works can be seen in the latest Novell SLES10 Service Pack 1 release. Originally the SLES10 kernel was based on the 2.6.16 kernel release with a number of bugfixes added to it. At the time of the Service Pack 1 release, it was still based on the 2.6.16 kernel version, but the SCSI core, libata core, and all SATA drivers were backported from the 2.6.20 kernel.org kernel release to be included in this 2.6.16 based kernel package. This changed a number of ABI issues for any external SCSI or storage driver that they would need to be aware of when producing an updated version of their driver for the Service Pack 1 release.

Similar things have been known to happen in the embedded world. In every case, the distributors are responding to two conflicting wishes expressed by their customers: those customers want stability, but they also want useful new features and support for new hardware. This conflict forces distributors to walk a fine line, carefully backporting just enough new stuff to keep their customers happy without breaking things.

The word from the summit is that this balancing act does not always work. There were stories of production systems falling over after updates were applied - to the point that some high-end users are starting to reconsider their use of Linux in some situations. It is hard to see how this problem can be fixed: the backporting of code is an inherently risky operation. No matter how well the backported code has been tested, it has not been tested in the older environment into which it has been transplanted. This code may depend on other, seemingly unrelated fixes which were merged at other times; all of those fixes must be picked up to do the backport properly. It is also not the same code which is found in current kernels; distributor-private changes will have to be made to get the backported code to work with the older kernel. Backporting code can only serve to destabilize it, often in obscure ways which do not come to light until some important customer attempts to put it into production.

All of this argues against the backporting of code into the stabilized kernels used in long-term-support distributions. But customer demand for features, and (especially) hardware support will not go away. In fact, it is likely to get worse. Quoting Greg again:

For machines that must work with new hardware all the time (laptops and some desktops), the 12-18 month cycle before adding new device support makes them pretty much impossible to use at times. (i.e. people want you to support the latest toy they just bought from the store.) This makes things like "enterprise" kernels that are directed toward desktops quite uncomfortable to use after even a single year has passed.

So, if one goes on the assumption that the Plan For World Domination includes moving Linux out of the server room onto a wider variety of systems, the pressure for additional hardware support in "stabilized" kernels can only grow.

What is to be done? Greg offers three approaches, the first two of which are business as usual and the elimination of backports. The disadvantages of the first option should have been made clear by now; going to a "bug fixes only" mode has its appeal, but the resulting kernels will look old and obsolete in a very short time. Greg's third option is one which your editor heard advocated by several people at the Collaboration summit: the long-term-support distributions would simply move to a current kernel every time they do a major update.

Such a change would have obvious advantages: all of the new features and new drivers would come automatically, with no need for backporting. Distributors could focus more on stabilizing the mainline, knowing that those fixes would get to their customers quickly. Many more bug fixes would get into kernel updates in general; no distributor can possibly hope to backport even a significant percentage of the fixes which get into the mainline. The attempt to graft an old support model better suited to proprietary systems would end, and long-term support Linux customers would get something that looks more like Linux.

Of course, there may be some disadvantages as well. Dave Jones has expressed some discomfort with this idea:

The big problem with this scenario is that it ignores the fact that kernel.org kernels are on the whole significantly less stable these days than they used to be. With the unified development/stable model, we introduce a lot of half-baked untested code into the trees, and this typically doesn't get stabilised until after a distro rebases to that kernel for their next release, and uncovers all the nasty problems with it whilst it's in beta. As well as pulling 'all bugfixes and security updates', a rebase pulls in all sorts of unknown new problems.

As Dave also notes, some mainline kernel releases are better than others; the current 2.6.21 kernel would probably not be welcomed in many stable environments. So any plan which involved upgrading to current kernels would have to give some thought to the problem of ensuring that those kernels are suitably stable.

Some of the key ideas to achieve that goal may already be in place. There was talk at the summit of getting the long-term support vendors to coordinate their release schedules to be able to take advantage of an occasional extra-stable kernel release cycle. It has often been suggested that the kernel could go to an even/odd cycle model, where even-numbered releases are done with stability as the primary goal. Such a cycle could work well for distributors; an odd release could be used in beta distribution releases, with the idea of fixing the resulting bugs for the following even release. The final distribution release (or update) would then use the resulting stable kernel. There is opposition to the even/odd idea, but that could change if the benefits become clear enough.

Both Greg and Dave consider the effects such a change would have on the providers of binary-only modules. Greg thinks that staying closer to the upstream would make life easier by reducing the number of kernel variants that these vendors have to support. Dave, instead, thinks that binary-only modules would break more often, and "This kind of breakage in an update isn't acceptable for the people paying for those expensive support contracts". If the latter position proves true, it can be seen as an illustration of the costs imposed on the process by proprietary modules.

Dave concludes with the thought that the status quo will not change anytime soon. Certainly distribution vendors would have to spend a lot of time thinking and talking with their customers before making such a fundamental change in how their products are maintained. But the pressures for change would appear to be strong, and customers may well conclude that they would be better off staying closer to the mainline. Linux and free software have forced many fundamental changes in how the industry operates; we may yet have a better solution to the long-term support problem as well.

Long-term support and backport risk

Posted Jun 20, 2007 17:56 UTC (Wed) by oak (guest, #2786) [Link] (2 responses)

Why not have very stable server releases which don't offer (any?) UI
functionality and separate desktop/laptop releases that are updated more
often? Once a desktop/laptop release has shown itself to be stable (in
half a year?), it (or parts of it) can be nominated as a server release?

Long-term support and backport risk

Posted Jun 20, 2007 18:10 UTC (Wed) by Bogerr (guest, #36700) [Link] (1 responses)

or another option - freeze at installation time. User get(choose) the latest stable enough kernel and stick with it.

Long-term support and backport risk

Posted Jun 21, 2007 13:56 UTC (Thu) by jamesh (guest, #1159) [Link]

The problem with that model is what to do about security updates.

If after the 4th update to e.g. RHEL users are on 5 different kernel versions depending on when they did the install, the distributor is going to have to produce 5 different security updates. That doesn't sound like it would fly.

Long-term support and backport risk

Posted Jun 20, 2007 18:08 UTC (Wed) by pj (subscriber, #4506) [Link] (2 responses)

How about more and better testing? Get vendors to fund hardware virtualization projects for the peripherals they want to be able to keep supporting - that way every developer can have access to 'the hardware', even if it's only virtual hardware. As long as it works the same, it should be fine. This would also allow for more and better automated test suites to be built by people like the autotest project.

Long-term support and backport risk

Posted Jun 20, 2007 20:10 UTC (Wed) by amikins (guest, #451) [Link] (1 responses)

It is very difficult to build a software construct of a piece of hardware that acts sufficiently alike the real hardware enough for driver testing.
This only really works well if the emulated device is created by the same folks what made the physical device.. and even then, there's a substantial probability of deviance in behavior.
In the case of devices where they're having to be reverse engineered just to produce a driver.. TALKING to the device is hard enough, without having to replicate behavior that isn't entirely understood.

Long-term support and backport risk

Posted Jun 23, 2007 2:38 UTC (Sat) by kingdon (guest, #4526) [Link]

Well, one side effect of writing such an emulator is that understanding of the hardware improves. Sure, if you are reverse engineering and guessing it becomes harder, but that is true with or without the emulator.

Now, unless there is a fairly big cultural shift, I don't really see these kinds of emulators and autotests getting popular in the kernel world (and I'll fully admit that this kind of thing is easier for, say, compilers than kernels, which have things like locking and races all over the place). But that's more a statement of "I don't think people are going to try it" than "I think if they did try it, it would be useless".

Long-term support and backport risk

Posted Jun 20, 2007 18:20 UTC (Wed) by ballombe (subscriber, #9523) [Link] (1 responses)

>One of the main selling points touted by many Linux-oriented vendors is stability. Once a customer buys a subscription for an "enterprise" Linux or embedded systems product, the vendor will fix bugs in the software but otherwise keep it stable.

Or they use Debian which provide long-term support, adhere to a strict no-backport policy, and does not require to pay a subscription.

The issue with the 3rd option (move to a current kernel) is that new kernel sometimes require updated user-land tools (udev, etc.) to run properly. This can cause quickly a cascading of update, defeating the whole point of
a stable system.

Also stability actually means much more that the software will not change too much rather than it stays bug-free. People tend to write fragile apps, secure in the knowledge that the environment will not change. If the environment change too much, the apps will fail. This is a common pattern.

Debian model problem: drivers

Posted Jul 2, 2007 1:01 UTC (Mon) by hazelsct (guest, #3659) [Link]

Great idea regarding Debian's kernel model. Just one problem though: try installing a two-year-old kernel in new hardware: it just won't work, the drivers aren't there.

I have been encouraged though by the upkeep of the 2.6.16 kernel. For some reason I have believed that was due to its use in Dapper Drake; had not known before this article that 2.6.16 is also in a major SUSE release.

So here's a fourth option: have a bunch of distros decide together to use a given kernel release, like 2.6.16 here. Then when security patches and new drivers are backported, they can all use the same patches.

The only trouble arises when new drivers require new infrastructure, like the new wireless stuff; the only way to get around that is option 3 with its set of problems. :-(

Long-term support and backport risk

Posted Jun 20, 2007 18:36 UTC (Wed) by pcampe (guest, #28223) [Link]

Now there's a major release of RHEL every 18 months (from RHEL 4 to RHEL 5 there have been about 20 months, due to the difficult Xen integration into the kernel), and every 4 months there's a minor upgrade (5.0 -> 5.1 -> 5.2), and the support is for several years (7 IIRC).

I guess it's possible to change the balance between (time between major releases) and (time between minor releases) if we change accordingly the balance between (new features in majors) and (new "features" in minors). Maybe a major release should be shipped once a year, and the <7 years of minor release support could cover only bug and security fixes.

Pros for the customer: stable ABI for third party drivers and applications once a version has been chosen; more time for choosing when(if) doing a major upgrade, knowing that a fallback exists. The vendor will have more versions to support, but each one will definitely requires a lot less effort.

Long-term support and backport risk

Posted Jun 20, 2007 18:52 UTC (Wed) by dlang (guest, #313) [Link] (1 responses)

the big problem with all these stability proposals is the idea that you can know ahead of time if the kernel release is going to be good enough or a dog.

even under the old even/odd release you had some odd kernels that were extremely stable and some even kernels that you wouldn't want to run

if the distros don't want to run the latest kernel.org kernels then let them pick a kernel 1-2 revisions back (after the -stable series has cleaned up what was found) and use that, accepting that it doesn't have all the latest fixes, but has been tested more.

I've been useing kernel.org kernels in production environments since 1996 and I see no sign of the declining quality that people keep claiming.

what I do is when I look to do a kernel upgrade I take the latest kernel and start testing it. I also watch for reports of problems with that kernel and the type of things that are fixed in -stable. after a few weeks (with me spending no more then a couple hours a week on this) it's pretty clear if this is a good candidate or not. if not I wait for the next release and try again, if so I build kernels for all my different hardware and setup some stress tests. I spend a day or two hammering on test boxes and then roll out the result to production. a year or so later I repeat the process. (unless there is a security hole found that forces an upgrade sooner)

if I were to go with distro kernels instead I would have to do about the same testing with the kernel the distro provides becouse I'm still the one responsible for any system failures and I know from painful experiance that even if the failure is completely the vendors fault I'm the one who gets blamed (after all I selected that vendor, or I should have tested more to find the problem)

Long-term support and backport risk

Posted Jun 21, 2007 4:30 UTC (Thu) by arjan (subscriber, #36785) [Link]

so distros ship a 3 months old kernel, not the brand spanking latest. that's new enough to have all the hardware, but proven enough to know if it's good or not, and if not, how much it needs fixing.

Long-term support and backport risk

Posted Jun 20, 2007 19:58 UTC (Wed) by edschofield (guest, #39993) [Link] (1 responses)

This is an interesting article about an important topic. If enterprise distributors are breaking interfaces for external drivers despite their backporting efforts, Greg's third option seems an entirely sensible proposal. The solution to the problem of regressions in newer kernels is obvious: with all the time distributors' kernel teams save in not backporting features and drivers, they can do more testing and fixing of bugs in current kernels. An effective solution to the problem of proprietary drivers is to warn customers away from such hardware in favour of "certified" open hardware, by explicitly exempting uncertified and closed hardware from guarantees of stability between service-pack updates. As side-effects, the quality of mainline kernels improves for everybody and the costs of closed hardware are transferred to the vendors responsible in the form of lost sales.

Long-term support and backport risk

Posted Jun 22, 2007 15:47 UTC (Fri) by garloff (subscriber, #319) [Link]

> The solution to the problem of regressions in newer kernels is obvious:
[...]

For enterprise customers, it's often much better to see only 5 out of 10
old bugs fixed rather than all of them but at the cost of introducing 1
new one. They need predictability, and any regression or just the risk
of it is much more painful than not having some limitations/bugs
fixed.
This is what makes the value proposition of enterprise linux work today.
Limited change that can be assessed.

It's pretty tough to avoid this occasional bug or the risk of it:
The process to assure with a high enough level of confidence that
there's no regression anywhere would require a really large effort.
The least thing a customer wants is to test every vendor update
extensively before deploying it. What would he be paying the vendor
for?

Getting every bug out still would not be enough ...
As ballombe correctly pointed out, stability has two dimensions:
1. Get the bugs out
2. Don't change anything that a user or an app could possibly depend on

Some people argue that the kernel is not keeping interfaces stable
enough. There may be examples where I would agree, but I don't think
that criticism is fair at large.
With the speed that innovation happens in the kernel community, the
stability of anything userspace can see and expect to be stable is
quite OK.

But that's not good enough:
- Sometimes, we have not been clear enough of what we consider a stable
interface and what not. Sometimes, to get to some information, an app
actually has no choice but using unstable interfaces. sysfs is the
primary example for this.
- Sometimes, apps are horribly broken by making certain assumptions which
only happen to be true and noone sane would consider a change there to
break an interface. Yet the application breaks.

And whenever we hear about such breakage we try to help the app creator
to fix, but assuming that we can get the app world 100% clean is just
too optimistic, I'm afraid. An OS vendor also will never (and should
not need to) know about 100% of the apps that customers are running.

So my guess is that the model won't change anytime soon.

Maybe it could be changed for a subset of use scenarios, where you work
with white lists of what has been validated ...

Long-term support and backport risk

Posted Jun 20, 2007 20:21 UTC (Wed) by jwboyer (guest, #23296) [Link] (2 responses)

""This kind of breakage in an update isn't acceptable for the people paying for those expensive support contracts." If the latter position proves true, it can be seen as an illustration of the costs imposed on the process by proprietary modules."

That isn't specific to proprietary modules. It can happen for any out-of-tree module, regardless of license.

Long-term support and backport risk

Posted Jun 21, 2007 6:46 UTC (Thu) by smurf (subscriber, #17840) [Link] (1 responses)

Sure it can, but at least the open-source out-of-tree module is fixable.

Contrast that with a closed-source vendor's (*cough* NVidia *cough*) policy that their latest driver not only fixes bugs and supports newer hardware, but also discontinues support for "older" hardware.

Ouch.

Long-term support and backport risk

Posted Jun 21, 2007 13:21 UTC (Thu) by jwboyer (guest, #23296) [Link]

Right, I was merely commenting that Dave's original quote was taken slightly out of context. There are open-source drivers that are out of tree and still go through certification on Enterprise distros. Having to rework and recertifiy those is something that vendors and users hate, especially when paying high dollars.

Long-term support and backport risk

Posted Jun 20, 2007 21:56 UTC (Wed) by iabervon (subscriber, #722) [Link] (5 responses)

It seems to me that the sensible thing to do would be to never backport features, but to offer a recent kernel with each minor version upgrade. So if you're using RHEL 4 and you want infiniband, you upgrade to RHEL 4.3, and you get a kernel that infiniband waas merged for that's also survived a lot of Red Hat testing. If you don't want infiniband or anything else new, you stick with RHEL 4 and that's only got bugfixes.

They could have each stable series use the policy of the kernel.org -stable series, and have new stable series start internally reasonably frequently, becoming available to customers when they are well-tested and have no known regressions remaining relative to the previous stable series. In the testing region of a series before it gets to customers, the rules would probably permit disabling stuff and reverting problematic patches. And not every kernel.org version would ever get listed; with all the 2.6.21 problems, they'd probably just skip that one, since it'll probably be easier to get 2.6.22 into shape than figure out which changes between 2.6.21 and 2.6.22 fixed 2.6.21 regressions.

Long-term support and backport risk

Posted Jun 21, 2007 0:27 UTC (Thu) by smoogen (subscriber, #97) [Link] (4 responses)

That seems to presuppose that 2.6.22 is a stability kernel.. there are a ton of new features added in.. and it may have as many problems as 2.6.21 does.. people won't know until a large enough population with more than just the latest 'dell/ibm' hardware can test it. The people who pay the money usually do not test something that comes out straight. They wait until the other people have had their knocks. This kind of logic worked for the old days when 2.0.x and 2.4.x meant just stability and few new features... but that caused too many issues for other people who want the latest thing NOW and it caused too many breakings when the 2.1.x and 2.5.x tree opened. Linus has decided that he would prefer that if things break.. people are going to be held responsible for it right away.. not 18 months later then 2.8.0 shows up.

Long-term support and backport risk

Posted Jun 21, 2007 4:38 UTC (Thu) by iabervon (subscriber, #722) [Link]

2.6.22 isn't changing anything as tricky as 2.6.21 changed. It's not going to be a stability kernel, in the sense of being focused on solidifying things without introducing anything new or substantially different, but it's probably going to go better than 2.6.21, just due to not undermining so many long-time assumptions.

I think the people who want a really stable kernel really do best by getting the first kernel that supports everything they want, waiting for other people to hit the problems (or waiting for their system vendor to hit the problems) and then sticking with that kernel, with only -stable-quality patches afterward until they get a newer system that needs a newer kernel.

Long-term support and backport risk

Posted Jun 21, 2007 7:51 UTC (Thu) by dlang (guest, #313) [Link] (2 responses)

people who think that 2.0, 2.2 or 2.4 kernels never had problem releases like 2.6.21 just weren't there to experiance them. there were releases in all three series that make 2.6.21 look rock solid (even ignoreing the early releases in all three series)

Long-term support and backport risk

Posted Jun 21, 2007 16:48 UTC (Thu) by smoogen (subscriber, #97) [Link] (1 responses)

Actually I did gloss over the issues of major changes in the 2.0, 2.2, and 2.4 series (2.4.9->2.4.14.. actually 2.0.10, 2.2.10 all were areas of instability). In most cases they were usually the kernel people seeing that something was majorly borked in their assumptions and having to retrofix a lot of stuff that they didnt make assumptions for. However from the items that LWN has shown on the kernel pages.. the amount of code changes in those series for a period of 2-4 sub-releases was less than what occurs between 2.6.20 and 2.6.21 or 2.6.14 and 2.6.15.

[From someone who has done systems administration support of kernels from 1993.]

Long-term support and backport risk

Posted Jun 21, 2007 23:53 UTC (Thu) by dlang (guest, #313) [Link]

there is no question that the rate of change has increased drasticly.

I wouldn't be surprised if the 2.6.21->2.6.22 changes riveled or exceeded the 1.2.0 -> 1.3.0 changes. the fact that it's happening so quickly with so few problems is amazing.

when I referred to the problems in the 2.0, 2.2, and 2.4 series, I wasn't just thinking of the couple major problems, I'm remembering that there were several 'brown paper bag' releases scattered throughout the series

Long-term support and backport risk

Posted Jun 20, 2007 23:34 UTC (Wed) by error27 (subscriber, #8346) [Link] (2 responses)

Basically all the cons are boil down to "It's a lot of work" and "driver disks are hard."

At my old job, I used to create tons of driver disks. Not because the drivers were propietary but just to fix bugs and add support for newer cards. It's not so bad if you script it.

The 2.6 build system makes it easier as well.

But both RedHat and SuSE driver disk support is pretty crappy. The error messages are not useful. Debugging them is hard. It's not very well tested so out of 5 Fedora releases I supported, I think that driver disk support was completely broken in 2 (FC2 and FC5).

Also driver disk the documentation sucks.

BTW. It's interesting to look at the aacraid driver packaging. They recompile the driver for over 100 kernels (guess) and stuff it all into 1 big rpm. :P

Long-term support and backport risk

Posted Jun 21, 2007 4:28 UTC (Thu) by arjan (subscriber, #36785) [Link] (1 responses)

the driver disk model breaks down once you have to update the libata core to have the latest sata work, or to update drm to have the latest graphics work. If it were just drivers, it's one thing. but in general it spans wider the longer the time lag is.

Long-term support and backport risk

Posted Jun 21, 2007 8:22 UTC (Thu) by error27 (subscriber, #8346) [Link]

Sometimes you do have to patch the kernel core, but most hardware support can be dealt with through driver disks.

As far as I could see, RHEL3 had pretty recent libata. The last RHEL3 driver disk I created was for the 3ware 9550 which was pretty new at the time. It's been a while, but I'm pretty sure I patched the libata module in one driver disk so that's a possible option.

It is a problem dealing with kernel upgrades after the install, that's true.

I'm generally happy with RHEL.

Long-term support and backport risk

Posted Jun 21, 2007 1:59 UTC (Thu) by cine (guest, #5597) [Link] (4 responses)

Why are they not doing the obvious?
Putting both the old stable kernel and a newer kernel with better hardware support for those that require that in the release?
That sounds like the best of both worlds, with minimal effort on the system administrator.

Long-term support and backport risk

Posted Jun 21, 2007 2:19 UTC (Thu) by loening (guest, #174) [Link] (3 responses)

Exactly what I wss thinking.

The majority of people using server hardware never update the hardware during the life of the equipment. That goes for most workstation hardware as well. This means the vast majority of people utilizing RHEL4 when it first came out do not need any of the backported features, they just need bug and security fixes.

The people who will need new features are the people with new equipment who are installing RHEL4 for the first time. By definition this equipment is not production yet, and as such it's not nearly as big a problem if a bug is encountered as if a bug was hit in the supposedly old stable kernel. For them they could use a more recent kernel that RHEL4 has only included recently.

Granted, this may mean supporting more kernel versions, but I gotta imagine it'd be a lot easier putting in bug fixes on 2-3 kernels at various levels of maturity than in trying to backport features to a single kernel while still trying to maintain a high level of stability.

Long-term support and backport risk

Posted Jun 21, 2007 18:32 UTC (Thu) by bronson (subscriber, #4806) [Link]

This sounds like the only sane solution to me. Yes, it means a proliferation of kernels, but most of them will just be tiny updates. Not a big deal for the maintainers, and with the auto-update tools we have now, not be a big deal for the customers either.

The scenario: I can update all I want, I will only ever get a minimally invasive, bugfixed version of the kernel I'm already running. To get new features, I would have to explicitly ask for the new version. Of course, the newest kernel would always be used for new installations.

It's been a while since I've had to really care about uptime, thank goodness. But, frankly, if the enterprise kernel situation is as bad as this article hints, I would not touch RHEL with a 10 foot pole. Talk about scary!

Long-term support and backport risk

Posted Jun 21, 2007 18:35 UTC (Thu) by Fats (guest, #14882) [Link]

It is good to be able to standardize on a certain distro like RHEL4 company wide. For older and the most recent bought hardware.

Long-term support and backport risk

Posted Jun 24, 2007 15:47 UTC (Sun) by riel (subscriber, #3142) [Link]

The majority of people using server hardware never update the hardware during the life of the equipment.

They do roll out new servers though, and want to support that new hardware. They also run 3rd party applications and want to run the same operating system across all their servers.

Yes, some enterprise customers want to eat their cake and have it. No, we cannot leave the solving of that riddle to them. They are the customer and we should get them a usable compromise.

Driver API anyone?

Posted Jun 21, 2007 7:02 UTC (Thu) by skybrian (guest, #365) [Link] (7 responses)

I suppose I'll be tarred and feathered for suggesting that stable device API's would go a long way towards making backporting drivers easy. If the point is to change part of the kernel while leaving the rest alone, shouldn't there be a distinct boundary between them with a clear contract? Then an "out of tree" driver can be a driver from a future kernel release.

Driver API anyone?

Posted Jun 21, 2007 7:53 UTC (Thu) by dlang (guest, #313) [Link] (6 responses)

it would also prevent a lot of improvements by defining all these fixed API's

and a lot of the features that people want are more then just drivers anyway.

Driver API anyone?

Posted Jun 21, 2007 8:38 UTC (Thu) by mjthayer (guest, #39183) [Link] (5 responses)

I still wonder why a stable API has to be such an all or nothing thing? Is a middle ground not possible, on the lines of "we will take trouble to prevent API changes, but not at any cost"? Or to put it another way, stop and think three times before breaking the API, but if you still come to the same conclusion the third time then do it anyway.

And a more stable API would make it easier, not just to backport drivers, but also to forward-port out-of-tree drivers to new kernels, which in turn would make it easier for distributors to stick to kernel.org kernels, perhaps one or two releases behind the current one, with the current one as an option for non-critical setups.

Driver API anyone?

Posted Jun 21, 2007 13:13 UTC (Thu) by davecb (subscriber, #1574) [Link] (4 responses)

There is a middle ground: versioned APIs. Multics did this
long before Unix even existed, and the big commercial vendors
are doing a subset of it for kernel->userland APIs.

Assume a function setvbuf(FILE *fp, char *buf, int type, size_t size)
where the size_t is about to becomes a 64-bit variable. The user
writes a call to setvbuf and at link time it's mapped internally
to setvbuf'SYSVABI_1.3 where the SYSVABI_1.3 is only visible to
tools likle (Solaris) pvs.

Imagine that in SVABI_1.4, the int and pointer sizes bcome 64 bits,
but our program doesn't call it: it calls the 32-bit one
from SYSVABI_1.3. Updated systems have two copies of setvbuf,
one for each version, and the elf loader uses the labelled ones
to disambiguate calls from old versus new binaries.

New programs compiled after SVABI_1.4 came out use the new version,
and if a program or driver is recompiled, it uses the newest ones.

This is a blatant oversimplification (Multics did it better), but
you get the flavor...

--dave

Driver API anyone?

Posted Jun 21, 2007 21:46 UTC (Thu) by arjan (subscriber, #36785) [Link] (2 responses)

this sounds easy...

but once you need to include locking rules (as any in-kernel API has).. it gets a HECK of a lot more tricky. You end up with a WAAY too bloated kernel, or you get translations layers...

Driver API anyone?

Posted Jun 22, 2007 7:19 UTC (Fri) by mjthayer (guest, #39183) [Link] (1 responses)

Something like that could work (at least, I imagine it could :) ) if the API/ABI numbers applied to subsystems (e.g. the wireless extensions) and not to single functions. You would need at least two version numbers - the current one and the last one with which the current is backwards compatible. It would probably be more of an "ease the transistion pain" measure though, as I can't see the kernel developers being happy to keep maintaining older APIs once they started needing too much code of their own.

Driver API anyone?

Posted Jun 22, 2007 14:28 UTC (Fri) by davecb (subscriber, #1574) [Link]

In practice, we only maintained one older version for anything
in development, but froze anything we gave to end-user customers.
Since this is a **kernel** api we're talking about, I wouldn't
keep more tan one "old" version.

The Multicians never seemed to get more than one old version,
but that assumed a very active support process to keep all
the customer machines sufficiently current.

--dave

Driver API anyone?

Posted Jun 22, 2007 7:27 UTC (Fri) by mjthayer (guest, #39183) [Link]

Versioned APIs/ABIs could help here where changes are unavoidable, but I think that the main problem here - at least from the point of view of those who write out-of-tree kernel extensions - is that many kernel developers do not consider the kernel APIs to be a stable external APIs at all, preferring instead that external drivers and subsystems be brought into the kernel at the earliest possible point in time. I think that they do not make too much effort to keep the interfaces stable because they do not really consider them to be interfaces.

Long-term support and backport risk

Posted Jun 21, 2007 8:30 UTC (Thu) by HenrikH (subscriber, #31152) [Link] (7 responses)

>ome high-end users are starting to reconsider their use of Linux in some situations
And what would the alternative be? It's not like there is some other magical OS out there that keep getting the shiny new features that the vendors like while at the same time not changing any of the code so there is no stability problems...

Long-term support and backport risk

Posted Jun 21, 2007 15:44 UTC (Thu) by drag (guest, #31333) [Link] (6 responses)

Solaris?
Windows?
OpenSolaris?

If you (saying Linux, realy) does not supply the features that the users _require_ then those users will go elsewere.

If there is no elsewere, they will make one or pay somebody else until they get one. In other words.. If you don't give what the customers NEED they will simply say: "Fork you, buddy".

Sure that would be expensive and difficult. But the current situation with the Linux kernel the way it is already is expensive and difficult. Expensive and difficult enough that people are starting to want to look elsewere.

Effectively (in a negative light) what your saying is: "The users are stuck with Linux. They've been suckered into using something that is costing them money. It would cost them more money to get away from it. So they have no choice but to eat it."

All you have to do is look at SCO Unix or proprietary Solaris to see how far that attitude will take you.

Long-term support and backport risk

Posted Jun 21, 2007 23:01 UTC (Thu) by giraffedata (guest, #1954) [Link] (5 responses)

some high-end users are starting to reconsider their use of Linux in some situations
And what would the alternative be? It's not like there is some other magical OS out there
Effectively (in a negative light) what you're saying is: "The users are stuck with Linux. They've been suckered into using something that is costing them money. It would cost them more money to get away from it. So they have no choice but to eat it."

Gee, I don't get that at all from HenrikH's comment. I read, "The users can't do any better than Linux. It may suck, but no worse than any alternative, because the problems are fundamental to operating systems, not special to Linux."

But here's why I think HenrikH is wrong: I think the proprietary alternatives do have stability and new features to a degree Linux doesn't and can't have, and here's why: per-copy licensing. Per-copy licensing gives Sun the money it takes to pay people (testers) to use the code and shake out the bugs. The economics of Linux make that impossible; with freedom of redistribution, how is the company that does that testing going to get paid for it?

Long-term support and backport risk

Posted Jun 22, 2007 0:43 UTC (Fri) by drag (guest, #31333) [Link] (4 responses)

> Gee, I don't get that at all from HenrikH's comment. I read, "The users can't do any better than Linux. It may suck, but no worse than any alternative, because the problems are fundamental to operating systems, not special to Linux."

It's effectively the same thing. I don't think that he _ment_ it like I re-stated it, but more-or-less it's the same thing.

How, in the eyes of end users, is it realy different? Except for any sort of emotional content, not much.

It's compatency. It's assuming that things can't get better, that they will always be this way.

> But here's why I think HenrikH is wrong: I think the proprietary alternatives do have stability and new features to a degree Linux doesn't and can't have, and here's why: per-copy licensing. Per-copy licensing gives Sun the money it takes to pay people (testers) to use the code and shake out the bugs. The economics of Linux make that impossible; with freedom of redistribution, how is the company that does that testing going to get paid for it?

Weither or not people have access to the source code is less and less relevent to weither or not people pay for per-seat licensing.

If your a big enough company (or country) you _can_ get access to the Windows source code.

With Solaris they've effectively openned it up. Sure it still sucks compared to Linux in a lot of ways, but it's now open. People still are paying licenses.

Weither or not people choose to pay for Windows vs Solaris vs Linux comes down to more-or-less this one thing:
"Do I make more money by paying Redhat/Microsoft/Solaris/IBM then if I don't?"

Forcing people to pay licenses because they can't use anything cheaper is a losing stratigy. Same thing as figuring people don't have any choice to put up with your BS because there are no other effective alternatives.

It works for a little while, but you don't want customer's resentments. You want customers to save money by paying you money. It's possible to have a net win for everybody.

And the nice thing about Linux and open source is that it's not only makes it cheaper for the end users to use, it makes it cheaper for companies to develop and, more importantly, to support.

But if the development style of Linux... which is currently not only breaking proprietary drivers (which is immaterial, realy), but also breaking out-of-tree drivers (which is very serious), AND breaking in-tree drivers (aka regressions.. ultra serious), AND breaking userspace API (mega ultra serious).... is costing more money then it is saving by being open source then you have a serious serious problem.

It's a knife edge. Very difficult. I have no good idea on how to solve the problem.

Long-term support and backport risk

Posted Jun 22, 2007 2:24 UTC (Fri) by giraffedata (guest, #1954) [Link] (3 responses)

How, in the eyes of end users, is it really different?

No difference. The situation to which you responded, and the one that HenrikH described have the same effect on a user. I brought it up only because your comment was not responsive to the comment to which you attached it, indicating you probably misread it.

Whether or not people have access to the source code is less and less relevant to whether or not people pay for per-seat licensing

I don't think access to source code is relevant at all to this thread; you'll notice I didn't mention it.

What is relevant is that all sellers of Linux kernels permit their customers (because they have to) to make as many copies as they want and pass them on to as many people as they want, for the same price as one copy. Microsoft does not do that. Neither does Sun.

And that's why Microsoft and Sun can spend millions of dollars testing and Red Hat cannot.

Long-term support and backport risk

Posted Jun 23, 2007 15:05 UTC (Sat) by njs (subscriber, #40338) [Link] (2 responses)

>And that's why Microsoft and Sun can spend millions of dollars testing and Red Hat cannot.

Err... I'm not an expert on this here enterprise-y stuff, but it sure looks to me like Red Hat, you know, charges a per-seat license fee just like Microsoft and Sun?

(Obviously you can use it without paying that licensing fee, but in practice you can use Windows or Solaris without paying that fee too; big businesses tend not to in both cases.)

Long-term support and backport risk

Posted Jun 23, 2007 17:14 UTC (Sat) by giraffedata (guest, #1954) [Link] (1 responses)

it sure looks to me like Red Hat, you know, charges a per-seat license fee just like Microsoft and Sun?

Red Hat does not charge for a copyright license for the Linux kernel. Its license, GPL, is free as required as a condition of the copyright license to Red Hat by various authors of the Linux kernel (also GPL).

The per-seat charge you're thinking of is for maintenance service, and it is way less than Sun or Microsoft charge for their copyright licenses. That's why many people believe that Linux is much cheaper to use than Solaris or Windows.

Obviously you can use it without paying that licensing fee, but in practice you can use Windows or Solaris without paying that fee too; big businesses tend not to in both cases

I think if Red Hat asked a maintenance fee large enough to cover a Microsoft-sized test department, lots of customers would decline, and hire someone else to do the maintenance. Or get SUSE, because Novell would, legally, just take the fruits of Red Hat's testing without paying anything and continue selling SUSE at Linux prices.

Long-term support and backport risk

Posted Jun 23, 2007 17:20 UTC (Sat) by dlang (guest, #313) [Link]

have you looked at the redhat prices?

I'll admit that I haven't looked at them recently, but the last time I did they were chaging ~$1500/machine, which is about the same price that sun and others charge for the propriatary Unix, and above many of the microsoft license costs.

if you use the 'enterprise' pricing from linux vendors then it's fairly easy to cook the books to show that linux has a higher TCO then windows (you need to choose hardware and models that minimize the fact that linux is more efficant and find manpower costs that show that windows admins get paid less then unix admins, but you are close enough to make it work)

But what about commercial Hardware and Software Vendors

Posted Jun 21, 2007 9:54 UTC (Thu) by MalcYates (guest, #45868) [Link] (1 responses)

There is no view here of how any changes in the delivery of backports can affect the development and support of commercial applications that rely on stability of interfaces.

It is fine to discuss the ramifications for Linux development, but there is a whole eco system out there that use the kernal interfaces to deliver their solutions ( any virtualisation vendor for instance ...)

If Linux is to keep its advantages of stability etc, then this is a big consideration.

It is fine for us to rip and replace a kernel once a year, but that means all software and hardware certification would need to be revisited / retested / recertified, and that is not a cheap procedure for the vendors.

But what about commercial Hardware and Software Vendors

Posted Jun 21, 2007 10:32 UTC (Thu) by mjthayer (guest, #39183) [Link]

If those people are targeting more than a single distribution, then distribution kernels with thousands of backports can make their lives harder, not easier. It means that there is yet another kernel to support, possibly with major differences to the upstream one.

Long-term support and backport risk

Posted Jun 21, 2007 11:40 UTC (Thu) by jengelh (guest, #33263) [Link] (1 responses)

Odd/even sounds like a very good idea. Personally, I too think that 2.6.17, .19 and .21 seemed worse than .16, .18, .20 and (hopefully) the upcoming .22.

Long-term support and backport risk

Posted Jun 21, 2007 23:50 UTC (Thu) by dlang (guest, #313) [Link]

this is a coincidence, there has never been stable/testing for the y in 2.x.y the stable/testing was based on the value of x

this is a common mistake people make, and I think the stability or lack of it for various releases is colored by their (incorrect) expectation.

now, it is true that immediantly after a problem release there is extra effort put into makeing the next release stable, and sometimes the result is that the version after that gets even more drastic changes (hurting it's stability a bit) and I think that's what you are seeing over the last year or so since 2.6.16