
Possible changes to longterm kernel maintenance

Greg Kroah-Hartman has posted a proposal for some changes to how the stable and (especially) longterm kernels are maintained. The changes are being driven by users other than the enterprise distributors. "Now that 2.6.32 is over a year and a half, and the enterprise distros are off doing their thing with their multi-year upgrade cycles, there's no real need from the distros for a new longterm kernel release. But it turns out that the distros are not the only user of the kernel, other groups and companies have been approaching me over the past year, asking how they could pick the next longterm kernel, or what the process is in determining this." The core idea is to pick a new longterm kernel once a year; that kernel would then be maintained for two years thereafter. There is some discussion on Google+; it should move to the mailing list around August 15.
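To make the proposed cadence concrete, here is a minimal sketch of how many longterm kernels would be under maintenance at once if one is picked each year and each is maintained for two years. The kernel labels and pick dates are hypothetical, and counting the two years from the pick date is an assumption; the proposal text quoted here does not spell that out.

```python
from datetime import date

# From the proposal: each longterm kernel is maintained for two years.
MAINTENANCE_YEARS = 2


def maintained_on(picks, day):
    """Return the names of longterm kernels still maintained on `day`.

    `picks` maps a kernel label to the date it was picked. Maintenance is
    assumed (not stated in the proposal) to start on the pick date.
    Note: date.replace() would fail for a Feb 29 pick date; the example
    dates below avoid that case.
    """
    return sorted(
        name
        for name, picked in picks.items()
        if picked <= day < picked.replace(year=picked.year + MAINTENANCE_YEARS)
    )


# Hypothetical labels and dates, for illustration only.
picks = {
    "longterm-A": date(2011, 8, 1),
    "longterm-B": date(2012, 8, 1),
    "longterm-C": date(2013, 8, 1),
}

print(maintained_on(picks, date(2013, 1, 15)))
```

Under these assumptions, two longterm kernels are maintained at any given time once the scheme has been running for a year.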

Possible changes to longterm kernel maintenance

Posted Aug 13, 2011 17:04 UTC (Sat) by alvieboy (subscriber, #51617) [Link]

As the kernel/distro maintainer for a commercial (focused) linux distribution, I have always been in trouble choosing which kernel to use and evaluate (as well as other core parts like C library). Latest kernels (>=2.6.32.X) have too many issues for us to use them except in very specific scenarios. Those issues might not relate to the kernel directly (for example, we do have to support old mISDN releases), but the whole system integration is at stake [I remember a simple iptables formatting change broke all of our reporting system, as well as frequent crashes on Xen due to heavy memory/IO contention].

We cannot afford a full regression test. We need to rely on what kernel guys tell us - it's a "longterm" and "stable" release. This means we can expect patches for severe problems to be applied, without compromising the system as a whole. We don't hack the kernel ourselves, nor have the means and knowledge to do so. Nor can we afford to be a "debian" or a "redhat" - effort spent extracting, forward/backporting patches is just too high for a company like mine.

In case you're wondering, yes we do comply with all licensing terms in all our components, and often we do contribute back (although our contributions are rarely accepted mainstream).

2 years for a long term kernel is a killer. 2 years for developing a full-fledged product is just too narrow - a 5-year support would be ideal here.

Things must move on, but for small companies (<30 employees) doing linux, tracking all kernel changes and adapting them (not all of those changes are kernel-mode specific, they require userspace adaptation) is a no-go.

Remember official Xen Dom0 support was stuck at 2.6.18.

Alvie

Possible changes to longterm kernel maintenance

Posted Aug 13, 2011 17:38 UTC (Sat) by clugstj (subscriber, #4020) [Link]

Seems to me you should not try to be a "debian/redhat", but just use their kernel.

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 8:22 UTC (Sun) by lkundrak (subscriber, #43452) [Link]

Note that it is now somewhat hard to track the Red Hat kernel.

Possible changes to longterm kernel maintenance

Posted Aug 25, 2011 6:47 UTC (Thu) by hpro (subscriber, #74751) [Link]

There was a lot of stir about that when it landed, but what happened after that? Did everyone just settle, taking it, or did someone actually try to go through the motions of arguing that a tarball is not "preferred form"?

Possible changes to longterm kernel maintenance

Posted Aug 13, 2011 19:29 UTC (Sat) by jengelh (subscriber, #33263) [Link]

>I remember a simple iptables formatting change broke all of our reporting system

What change would that have been?

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 0:50 UTC (Sun) by HenrikH (guest, #31152) [Link]

And even so, wouldn't that be a userspace change?

Possible changes to longterm kernel maintenance

Posted Aug 20, 2011 18:58 UTC (Sat) by BenHutchings (subscriber, #37955) [Link]

> As the kernel/distro maintainer for a commercial (focused) linux distribution, I have always been in trouble choosing which kernel to use and evaluate (as well as other core parts like C library).

Then I suggest you create a derivative of a long-term supported distribution (Debian/RHEL/SLE/Ubuntu) instead of trying to create one from scratch.

> We cannot afford a full regression test. [...] Nor can we afford to be a "debian" or a "redhat" - effort spent extracting, forward/backporting patches is just too high for a company like mine.

I have to take issue with this idea that Debian has massive development resources. Adjusting for available time, the entire Debian kernel team probably adds up to no more than 2 full-time developers. AFAIK, none of us are paid to work on Debian; it's all spare time.

What you are saying is: we want other people to do maintenance for us. You are perfectly allowed to do that; in fact most distributions are largely dependent on upstream developers for maintenance of most packages. But I think you'll have to track another long-term distribution. If you care about support for new hardware, for example, you aren't going to get it in the longterm series, which only adds support where the necessary code change is very small.

Possible changes to longterm kernel maintenance

Posted Aug 13, 2011 18:51 UTC (Sat) by rahulsundaram (subscriber, #21946) [Link]

Seems like Google Plus has become a Linux kernel hacker destination for GNOME 3 rants and Linux kernel discussions as well. I wonder what the Free software implications are.

Possible changes to longterm kernel maintenance

Posted Aug 13, 2011 19:05 UTC (Sat) by cesarb (subscriber, #6266) [Link]

> Seems like Google Plus has become a Linux kernel hacker destination for GNOME 3 rants and Linux kernel discussions as well. I wonder what the Free software implications are.

Well, Linus is using it. So the Free Software implication probably is "if Linus is ever booted from it, he will write in two weeks something even faster, and licensed under GPLv2 (but not later)". Or something like that.

Possible changes to longterm kernel maintenance

Posted Aug 16, 2011 8:59 UTC (Tue) by psankar (subscriber, #68004) [Link]

Very nice comment. I wish I could "+1" this ;-)

Possible changes to longterm kernel maintenance

Posted Aug 13, 2011 19:05 UTC (Sat) by fest3er (guest, #60379) [Link]

If there is no way to tell which features will be available (or are targeted) in any future version of Linux, then throw a dart at the list of releases and pick the one with the hole in it.

In a perfect world, Linux would have some sort of published feature goals that it kept up-to-date (weekly); Linux would evolve from a stable version with good features usable for a long while, through a period of releases where new features are developed, perfected and released, to another release with a solid feature set that is usable for a long while. I would choose versions with solid feature sets for long-term support. I believe I would choose a kernel for long-term support once every 4 years. On the other hand, I would choose a kernel to receive mid-term (2-year) support every year; such a kernel would be targeted to product development: give the developer a few safe kernels to use through the current period of kernel development turmoil. This would give businessmen what they need: a long-term outlook on which they can build their own long-term plans for their own long-term support and new product development.

Think of Linux as a duck on water. On the surface, Linux appears solid and serene (long-term-support), but below, the feet (development) are going a mile a minute every which way. Over time, new ducklings (mid-term support) grow into ducks and old ducks die off.

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 3:28 UTC (Sun) by vonbrand (subscriber, #4458) [Link]

This idea was the 2.0/2.1 then 2.2/2.3, etc development model, which failed miserably because the hardware churn is just too fast. Besides, setting any type of longer-term goals just won't work. Not for the kernel (ever tried to herd cats?), and it hasn't worked elsewhere either (ever seen the messianic announcements by MSFT for the next Windows version and compared them with the resulting release?).

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 6:12 UTC (Sun) by slashdot (guest, #22014) [Link]

Also, in the current model, features that are introduced are required to be in good shape immediately at merge time, which improves stability.

Conversely, with the 2.1/2.3/2.5 models, changes were made which destabilized the whole development kernel for a long time, making it hard to test it, and casting doubt over whether the final release would actually have those issues fixed or not.

Also, in addition to new hardware support, pretty basic features are still added, requiring constant updates to enjoy them: for example Linux 3.0 added the ability to run ping without root, the ability to set alarms to wake up the system from suspend/shutdown, the ability to create holes in ext4 files and a system call to send multiple UDP packets at once, among others.
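As an illustration of the first item, here is a minimal sketch of an unprivileged ping using the ICMP datagram ("ping") sockets that arrived with Linux 3.0. It assumes a Linux system where the `net.ipv4.ping_group_range` sysctl has been set to include the caller's group (it is disabled by default); the function names, the loopback target and the payload are hypothetical choices for the example.

```python
import socket
import struct


def icmp_checksum(data: bytes) -> int:
    """Compute the 16-bit Internet checksum (RFC 1071) over `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"lwn") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def try_unprivileged_ping(host: str = "127.0.0.1", timeout: float = 1.0):
    """Return True on a reply, False on timeout/error, None if not permitted.

    SOCK_DGRAM + IPPROTO_ICMP requests a ping socket rather than a raw
    socket, so no root privileges are needed when the sysctl allows it.
    """
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_ICMP)
    except OSError:  # includes PermissionError when ping sockets are disabled
        return None
    with sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_echo_request(0, 1), (host, 0))
            sock.recvfrom(1024)
            return True
        except OSError:  # timeouts are a subclass of OSError
            return False


if __name__ == "__main__":
    print(try_unprivileged_ping())
```

On a kernel older than 3.0, or with the sysctl left at its default, the socket call fails and the sketch returns `None`, which is exactly the "requires constant updates to enjoy them" situation described above.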

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 9:39 UTC (Sun) by lmb (subscriber, #39048) [Link]

The whole idea of separate long-term maintained software releases is, in my usual most humble opinion, quite broken. (The technical term would be "fubar".)

A fear of regressions is of course real, but better addressed by test-driven development, code review, better quality assurance, and other such mechanisms.

The fear of deprecated functionality (ABIs, APIs, drivers) is, of course, real too: but the solution "backport selected fixes" has got it exactly backwards. Instead, the functionality should be *forward-ported* for the duration.

External large patchsets that mess with internals are another problem; that is not readily fixed, but I doubt that backporting fixes them much better. They're not community players, and the community doesn't really benefit from them. So it makes sense to off-load their cost back to them; let them maintain their own backports. But be willing to help them with forward-porting, merging, or even give reasonable consideration to them maintaining an API/ABI they can rely on.

The (real or imagined) need for a "stable" base very much indicates a failure of the development process to me. It is time to fix that, instead of perpetuating the problem.

(And I'd be grateful if instead of responding to implementation details that I am missing here and which I assure you I am quite acutely aware of since this post would otherwise be not so succinct, critics would instead focus on the strategy, not the tactics. Thanks. ;-)

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 15:01 UTC (Sun) by vonbrand (subscriber, #4458) [Link]

Please read Documentation/stable_api_nonsense.txt before going on about "stability guarantees." The problems stated there are quite real; see for example the links to Raymond Chen's blog on Windows at Dan's Data.

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 15:25 UTC (Sun) by lmb (subscriber, #39048) [Link]

Thankfully, I didn't say anything about eternally stable APIs.

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 19:16 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

It isn't clear what strategy you're advocating. Can you be more specific?

But you do seem to be saying there shouldn't be a release stream that, for a year or two, doesn't see any changes to add features; and you didn't address the primary reason people do that: every code change carries a risk of unintended regression. Many applications are themselves stable so would not benefit from new features. Hence, a code change to add a feature would be a net loss.

Are you maybe proposing that code changes to the mainline to add features not have bugs?

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 19:44 UTC (Sun) by lmb (subscriber, #39048) [Link]

Backports for "just bugfixes" also carry the risk of unintended regressions. For example, side effects that are either present in the tip as well, or occur just in the backport because the changeset interacts with other patches that have since been applied but not backported. Or even whole fixes that would be applicable to the user base but have not yet been identified and thus not backported (yet). Or that many, many users have run kernels leading up to the current tip, but the userbase testing the combination of patches in the backported environment is usually much smaller.

The whole notion that "backports are safer" is, well, a viable business model, but not necessarily sound engineering practice, at least if performed at any non-trivial scale.

Code changes to mainline carry a risk of introducing regressions or new bugs, sure. But so do backports. What I'm proposing is to strengthen tip against regressions by improved QA and process.

Another fallacy is that, because upgrades are scary, you want to do them less often. But that doesn't work out - the delta *keeps* getting larger, and the amount of time that passes during which you *didn't* pay attention does too. The cost does not go down, the effort to get it all working again actually *increases* and needs to be paid in much larger bills than if one had a reasonably fine grained continuous policy.

I already believe that, since code quality *is* getting better over time faster than it is getting worse, upgrading is generally the safer choice - at least if the regression tests pass. (I'm not saying that we're doing all that we can or should, sure, there are things that can be improved.) But people still cling to the enterprisey-mindset.

Guys and gals, if the enterprisey mindset worked and was overall the better choice, we'd still be running Solaris, IRIX, UnixWare and the like.

Possible changes to longterm kernel maintenance

Posted Aug 15, 2011 0:11 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

It would really help if there was support for older kernels for a little longer than there currently is, for those cases where a company's regression test does run into a problem with the new kernel.

Possible changes to longterm kernel maintenance

Posted Aug 18, 2011 18:48 UTC (Thu) by vonbrand (subscriber, #4458) [Link]

You know that if you come up with the manpower to do it, it will happen. Otherwise, good luck...

Possible changes to longterm kernel maintenance

Posted Aug 15, 2011 13:38 UTC (Mon) by NAR (subscriber, #1313) [Link]

> I already believe that, since code quality *is* getting better over time faster than it is getting worse

Even if it's true, I do think the improvement is not monotonic. And it's really hard to explain to customers that "in the name of forward progress we've just broken your system and you won't be able to get any work done for a week on the system you've paid thousands of dollars for".

> Guys and gals, if the enterprisey mindset worked and was overall the better choice, we'd still be running Solaris, IRIX, UnixWare and the like.

Actually we still run some of them. And we run stable distributions; for example, on the server I'm working on, the kernel is more than 3 years old.

Possible changes to longterm kernel maintenance

Posted Aug 15, 2011 20:16 UTC (Mon) by raven667 (subscriber, #5198) [Link]

In a previous existence we just started looking at migrating workloads from 2.4.21 (RHEL3 when it went unsupported) that hadn't even been run on a 2.6.x system before. The user space changes were more than the kernel changes though.

I run RH derived systems and I think the differences between kernel releases in the 2.6.x and now 3.x world are becoming less and less transformative and disruptive as the kernel is now a very mature software project. I feel somewhat ambivalent between running RH 2.6.18 or RH 2.6.32 (or even the older RH 2.6.9) which is very unlike moving from RH 2.4.21 to 2.6.x which was so clearly better.

I run a vendor kernel and not kernel.org so I'm not following the head of development but from reading LWN it seems that newer releases have higher required quality and fewer and less severe regressions than kernels from, say, 5 years ago. I don't see a technical reason why vendors couldn't move to a new 3.x version during each major service pack release and run 3.x.y for security-only updates during a particular release. As long as the kernel is run in the wild and put through their QA it shouldn't be that different from the current case of backporting.

The big problem is selling that change to enterprise customers who don't want to see the version number change even though that is the reality of what is going on as the kernel gets backported major new features at new service pack release levels.

Possible changes to longterm kernel maintenance

Posted Aug 14, 2011 23:00 UTC (Sun) by welinder (guest, #4699) [Link]

Them pesky customers! :-)

> The (real or imagined) need for a "stable" base very much indicates a
> failure of the development process to me.

Most users don't want to upgrade their software daily -- "the software
is here for me, not the other way around". Users will happily wait
years as long as the old photoshop (gimp, whatever) does what they
need.

But security updates are different: through no fault of the user,
he has to update because he is screwed if he doesn't update soon.
If software A needs to be updated and requires updates to software
B, C, and D, then pretty soon the menus in a program have changed
and two others are broken until the authors update them to work with
the newer libraries.

So clearly a stable base is needed -- for the kernel as well as other
parts of the software stack. Distributions fill this role reasonably
well.

> A fear of regressions is of course real, but better addressed by
> test-driven development

Lovely in theory, but for the kernel with bugs that often depend on
insanely complex interactions between multiple programs and/or
machines, I don't think anyone has a workable inkling about how to
even begin doing that.

> code review, better quality assurance, and other such mechanisms.

Infinite, highly skilled manpower is a pipe dream. (There wouldn't
be any need for security updates if it wasn't.)

Possible changes to longterm kernel maintenance

Posted Aug 15, 2011 20:24 UTC (Mon) by raven667 (subscriber, #5198) [Link]

The kernel doesn't change its userspace ABI, that is intentionally kept very stable, so while I agree that for large complex userspace software following the development head can put you in dependency hell, that problem doesn't really exist for the kernel. The kernel is tied to certain userspace components that are required to change on updates but those components generally don't have complex dependencies and so aren't going to drive A needing B and C where D also requires C=v1.2 which requires D to be bumped which then requires, blah blah blah. (Actually, the fact that modern package managers can model this kind of interaction at all is sickly amazing.)

I'm not convinced that the status quo of having vendors keep the version number string unchanging while they muck around with the internals backporting security fixes and features from future kernels is actually less disruptive than just QA'ing newer kernel releases periodically and shipping that.

Possible changes to longterm kernel maintenance

Posted Aug 18, 2011 15:27 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

The kernel doesn't intentionally break its ABI but that does happen. And if you think driver regressions don't happen in new kernels, you're sadly mistaken. When I was running Fedora 8, the Intel Wireless driver broke so often that I had to reconfigure Yum to keep a large number of old kernels around, because I could never be sure which one would work for me. And video drivers break too.

And if widely-used consumer hardware can have such obvious regressions, you can't tell me that less-well-tested specialty enterprise hardware won't also, at least occasionally, suffer regressions.

Possible changes to longterm kernel maintenance

Posted Aug 18, 2011 23:28 UTC (Thu) by raven667 (subscriber, #5198) [Link]

I think we are talking past one another. I am not suggesting that regressions don't happen, or that they should blindly ship new releases, damn the torpedoes, full speed ahead. The enterprise distributions already pull new versions of the major network and storage drivers, backport them to the older kernel release, then QA and ship that. If there are regressions in the drivers then they still have the same problem.

What I'm suggesting is to be less afraid to update the version number and track a kernel.org stable release and update to new kernel.org stable releases over the lifetime of the product rather than maintaining a private stable release that never shows a version number change over the product lifecycle. I've found that endlessly confusing to admins, auditors and consultants, trying to match up feature documentation and bugfixes between kernel.org and vendor kernels.

The enterprise vendors already have a large QA process before shipping new kernel features in service packs. Is there really a meaningful difference in the number of regressions between a large number of private backports and custom integrations, and just re-basing on upstream stable releases periodically? Would it really be harmful to have the enterprise vendor just take ownership of maintaining a stable kernel.org release so that patches and whatnot in that release are documented and the version number is bumped along with other kernel.org releases?

For example I'd like to dispense with the fiction that labeling the RHEL6 kernel 2.6.32 is really a meaningful description because it is _not_ accurate. If they said it's 2.6.32.45 maybe that would be better.

Possible changes to longterm kernel maintenance

Posted Aug 19, 2011 0:54 UTC (Fri) by mrshiny (subscriber, #4266) [Link]

Having the enterprise distros collaborate on the stable kernel branch is probably not a bad idea. But its lifespan is typically far less than an enterprise distro, on the order of years. So Red Hat still needs to maintain old kernels or, as you say, fully QA a new kernel.

My point is that no matter what the kernel.org people say, the kernel does have regressions often. This is particularly a problem because the drivers must ship with the kernel. This means that if you have a regression anywhere in the codebase that applies to your computer/workload, you need to upgrade or downgrade the whole kernel. This is why people are afraid of changing their kernel. For over a year I was unable to upgrade my kernel because one or two of the drivers I used were unstable from release to release. I didn't have the choice of pinning one driver version that worked, while upgrading all the rest of the kernel.

This is why the stable kernel series, or a stable kernel maintained by the enterprise distros is a needed thing. Customers don't want the new kernel to break something. Even a small bug in a single driver somewhere might severely disrupt operations after an upgrade. So upgrading the whole kernel is a risk because it all changes.

I don't subscribe to the "stable api nonsense" idea. I've been personally bitten by the fact that every driver in the kernel is permanently locked together in one big tarball with the kernel itself. I know I'm not the only one. The enterprise distros don't want this to happen to their customers, so they have no choice but to maintain a specific kernel.

Possible changes to longterm kernel maintenance

Posted Aug 19, 2011 1:52 UTC (Fri) by foom (subscriber, #14868) [Link]

And let me just relate my experience, as a perhaps representative example. I'm running mostly CentOS "2.6.18" right now. Upstream 2.6.22 was also used successfully in the past, but it seemed better to stay on a maintained kernel.

I've tried upgrading to a new kernel multiple times, because the new kernels do have some new features I'd like to use (actual new features, not just drivers). But each time I've tried upgrading, I've hit major regressions in my workload. Finding a new kernel version that works is not actually the most important work to do, so there's usually significant delays for someone to get around to figuring out what's wrong, or trying again with a different version.

The following versions have been attempted since 2.6.22 (to the best of my recollection this is right):
2.6.25: File corruption when using writev (introduced in 2.6.23!).
2.6.26: I forget what was wrong with the initial attempt, but something was broken. The debian lenny 2.6.26 appears to work now, though not using it much (since most prod boxes are running centos).
2.6.29: (or thereabouts) introduced serious performance degradation in our workload due (I think) to disk page-in behavior changes in mmaped files. Didn't really attempt to track down, got distracted by other stuff.
2.6.31: file corruption when using writev in an ext3 fs. (different bug from before)
2.6.32: works, but still worse performance than RH18 and Deb26, though not as bad as 29. Same behavior with Deb2.6.32.
2.6.38: Currently running on a few systems. Performance seems to be better. Haven't found anything critically wrong yet...so far so good?

So anyways, in *my* experience, new upstream kernel versions have a dismal success rate, while new patch releases of a working stable (distro) kernel version have a 100% success rate.

I expect it's not really the case that no releases between 22 and 38 worked, I just never managed to hit one -- unlucky.

Possible changes to longterm kernel maintenance

Posted Aug 19, 2011 2:49 UTC (Fri) by raven667 (subscriber, #5198) [Link]

Doing regression testing and QA work on kernel.org isn't something that the average admin wants to be spending a lot of time on, I agree, which is why people pay the big enterprise distributors to do that for them. For example, Red Hat shipped based on 2.6.18; do you think there was some push to make that a golden perfect release, different than the standards for 2.6.22-38 that you tested, or do you think it achieved its stability for your workload via QA and testing by Red Hat? Was it really more like a 2.6.18.n release with the major problems fixed that became 2.6.?.n++ as time went on? Do you think that the amount of change between the original shipping RH 2.6.18 and the current RH 2.6.18 given the backported drivers, any infrastructure the new drivers depend on, wholly new subsystems, backported fixes and infrastructure changes are any different than the changes between 2.6.18 and say 2.6.32 if you gave 2.6.32 the same kind of QA and testing that went into stabilizing 2.6.18 for the enterprise customers?

The vendors and kernel.org have been converging for years, since the low point of RH 2.4.21 which was basically _nothing_ like kernel.org 2.4.21. It'll be years until there is enough market pressure to justify working on making a new RHEL7, I've known people who are barely getting off RHEL3, but I think it's a worthwhile question to ask whether the next major enterprise version shouldn't just totally converge with kernel.org or at least periodically re-base and do the full QA/test cycle rather than try to maintain an ever-increasing diff off of some random kernel version that is no more or less pristine and bug-free than any other version.

Possible changes to longterm kernel maintenance

Posted Aug 19, 2011 13:04 UTC (Fri) by foom (subscriber, #14868) [Link]

> Do you think that the amount of change between the original shipping RH 2.6.18 and the current RH 2.6.18 [...] are any different than the changes between 2.6.18 and say 2.6.32 if you gave 2.6.32 the same kind of QA and testing that went into stabilizing 2.6.18 for the enterprise customers.

Yes, I do. I think the changes RH makes in their stable updates are significantly smaller than the changes that upstream makes in new releases, and thus significantly less likely to introduce regressions to existing users. And thus less costly for RH to test and qualify, as well.

Possible changes to longterm kernel maintenance

Posted Aug 20, 2011 19:08 UTC (Sat) by BenHutchings (subscriber, #37955) [Link]

> The kernel doesn't change its userspace ABI, that is intentionally kept very stable...

This is simply not true. Parts of procfs and sysfs are quite deliberately changed or removed. This generally happens after a deprecation period of years, but userland isn't always updated fast enough (and in the case of proprietary programs there is nothing that distributions can do about it).

Possible changes to longterm kernel maintenance

Posted Aug 15, 2011 11:14 UTC (Mon) by ahoogerhuis (subscriber, #4041) [Link]

I may be blind as a bat, but where is the official place to go look for which versions are currently being maintained for the longterm?

-A

Possible changes to longterm kernel maintenance

Posted Aug 18, 2011 7:47 UTC (Thu) by Zenith (subscriber, #24899) [Link]

http://www.kernel.org/ should do the trick

Possible changes to longterm kernel maintenance

Posted Aug 18, 2011 7:48 UTC (Thu) by ahoogerhuis (subscriber, #4041) [Link]

Thanks. I stand by my claim, being blind as a bat.

-A

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds