Kernel competition in the enterprise space

By Jonathan Corbet
March 14, 2012
Kernel developers like to grumble about the kernels shipped by enterprise distributions. Those kernels tend to be managed in ways that ignore the best features of the Linux development process; indeed, sometimes they seem to work against that process. But, enterprise kernels and the systems built on them are also the platform on which the money that supports kernel development is made, so developers only push their complaints so far. For years, it has seemed that nothing could change the "enterprise mindset," but recent releases show that there may, indeed, be change brewing in this area.

Consider Red Hat Enterprise Linux 6; its kernel is ostensibly based on the 2.6.32 release. The actual kernel, as shipped by Red Hat, differs from 2.6.32 by around 7,700 patches, though. Many of those are fixes, but others are major new features, often backported from more recent releases. Thus, the RHEL "2.6.32" kernel includes features like per-session group scheduling, receive packet/flow steering, transparent huge pages, pstore, and, of course, support for a wide range of hardware that was not available when 2.6.32 shipped. Throw in a few out-of-tree features (SystemTap, for example), and the end result is a kernel far removed from anything shipped by kernel.org. That is why Red Hat has had no real use for the 2.6.32 stable kernel series for some years.

Red Hat's motivation for creating these kernels is not hard to understand; the company is trying to provide its customers with a combination of the stability that comes from well-aged software and the features, fixes, and performance improvements from the leading edge. This process, when it goes well, can give those customers the best of both worlds. On the other hand, the resulting kernels differ widely from the community's product, have not been tested by the community, and exclude recent features that have not been chosen for backporting. They are also quite expensive to create; behind Red Hat's many high-profile kernel hackers is an army of developers tasked with backporting features and keeping the resulting kernel stable and secure.

When developers grumble about enterprise kernels, what they are really saying is that enterprise distributions might be better served by simply updating to more current kernels. In the process they would get all those features, improvements, and bug fixes from the community, in the form that they were developed and tested by that community. Enterprise distributors shipping current kernels could dispense with much of their support expense and could better benefit from shared maintenance of stable kernel releases. The response that typically comes back is that enterprise customers worry about kernel version bumps (though massive changes hidden behind a minor number change are apparently not a problem) and that new kernels bring new bugs with them. The cost of stabilizing a new kernel release, it is suggested, could exceed that of backporting desired features into an older release.

Given that, it is interesting to see two other enterprise distributors pushing forward with newer kernels. Both SUSE Linux Enterprise Server 11 Service Pack 2 and Oracle's Unbreakable Enterprise Kernel Release 2 feature much more recent kernels - 3.0.10 and 3.0.16, respectively. In each case, the shift to a newer kernel is a clear attempt to create a more attractive distribution; we may be seeing the beginning of a change in the longstanding enterprise mindset.

SUSE seems firmly stuck in a second-place market position relative to Red Hat. As a result, the company will be searching for ways to differentiate its distribution from RHEL. SUSE almost certainly also lacks the kind of resources that Red Hat is able to apply to its enterprise kernels, so it will be looking for cheaper ways to provide a competitive set of features. Taking better advantage of the community's work by shipping more current kernels is one obvious way to do that. By shipping recent releases, SUSE does not have to backport fixes and features, and it is able to take advantage of the long-term stable support planned for the 3.0 kernel. In that context, it is not entirely surprising that SUSE has repeatedly pulled its customers forward, jumping from 2.6.27 to 2.6.32 in the Service Pack 1 release, then to 3.0.

Oracle, too, has a need to differentiate its distribution - even more so, given that said distribution is really just a rebranded RHEL. To that end, Oracle would like to push some of its in-house features like btrfs, which is optimistically labeled "production-ready" in a recent press release. If btrfs is indeed ready for production use, it certainly has only gotten there in very recent releases; moving to the 3.0 kernel allows Oracle to push this feature while minimizing the amount of work required to backport the most recent fixes. Oracle is offering this kernel with releases 5 and 6 of Oracle Linux; had Oracle stuck with Red Hat's RHEL 5 kernel, Oracle Linux 5 users would still be running something based on 2.6.18. For a company trying to provide a more feature-rich distribution on a budget, dropping in a current kernel must seem like a bargain.

What about the downside of new kernels - all those new bugs? Both companies have clearly tried to mitigate that risk by letting 3.0 stabilize for six months or so before shipping it to customers. There have been over 1,500 fixes applied in the 24 updates to 3.0 released so far. The real proof, though, is in users' experience. If SLES or Oracle Linux users experience bugs or performance regressions as a result of the kernel version change, they may soon start looking for alternatives. In the Oracle case, the original Red Hat kernel remains an option for customers; SUSE, instead, seems committed to the newer version.

Between these two distributions there should be enough users to eventually establish whether moving to newer kernels in the middle of an enterprise distribution's support period is a smart move or not. If it works out, SUSE and Oracle may benefit from an influx of customers who are tired of Red Hat's hybrid kernels. If the new kernels prove not to be enterprise-ready, instead, Red Hat's position may become even stronger. Learning which way things will go may take a while. Should Red Hat show up one day with a newer kernel for RHEL customers, though, we'll know that the issue has been decided at last.


Kernel competition in the enterprise space

Posted Mar 15, 2012 9:42 UTC (Thu) by michaeljt (subscriber, #39183)

I thought that one of the main blockers was that RedHat wanted to be able to keep binary compatibility for kernel modules, so that any module built for a kernel in a particular RHEL version would work with any kernel of the same series. Note that I say "wanted" - I doubt it works perfectly in practice and may be wrong altogether.

Kernel competition in the enterprise space

Posted Mar 15, 2012 16:27 UTC (Thu) by pbonzini (subscriber, #60935)

It works as long as the module sticks to the defined set of symbols which make up the RHEL kernel stable ABI. RHEL kernel backports usually are also very cautious in changing the meaning of enums and other stuff like that.
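The rule described above amounts to a set-membership check: a module stays compatible only while every symbol it imports is on the stable list. A minimal sketch of that check follows; the list contents, the module's symbol list, and the function name `symbols_outside_abi` are all hypothetical and chosen for illustration, not taken from any real RHEL whitelist or tool.

```python
def symbols_outside_abi(module_symbols, stable_abi):
    """Return, sorted, the symbols a module imports that are not on the stable list."""
    return sorted(set(module_symbols) - set(stable_abi))


# Hypothetical stable-ABI list; a real one would come from the distributor.
stable_abi = {"printk", "kfree", "__kmalloc", "register_netdev"}

# Hypothetical imports of an out-of-tree module.
module_symbols = ["printk", "__kmalloc", "some_internal_helper"]

offenders = symbols_outside_abi(module_symbols, stable_abi)
if offenders:
    print("not covered by the stable ABI:", ", ".join(offenders))
else:
    print("module uses only stable-ABI symbols")
```

A check like this covers only symbol names. It cannot see a change in the meaning of an enum or a structure layout, which is why the care taken in backports matters as well.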

Kernel competition in the enterprise space

Posted Mar 15, 2012 9:44 UTC (Thu) by dag- (subscriber, #30207)

What I think this article fails to mention is the effect of newer kernels on userspace. While in theory kernels do not break binary compatibility, in practice switching kernels means that a bunch of user-space tools need to be updated, even up to udev and Gnome integration.

We, at the ELRepo project, offer backported drivers for RHEL and RHEL-clones (like CentOS, Scientific Linux and Oracle Linux) and one way of getting people to test newer drivers for us to backport is by providing mainline and stable kernels for RHEL. Our experience from testing these mainline kernels on RHEL5 is that the removal of certain infrastructure (in a specific case /proc/acpi entries or /dev/rtc nodes) would lead to various problems in userspace. For more information read Akemi Yagi's well written article "A kernel too far" at: http://blog.toracat.org/2011/03/a-kernel-too-far/

We did report those issues to the kernel developers, but there was little interest in getting them fixed.

And while it may be easier for SuSE and Oracle to contain such problems in userspace than it is in kernelspace, in the long run even this is unsustainable and forward-porting older infrastructure may be necessary to avoid having to upgrade larger software parts. I look forward to the next 5 years and see how this plays out, but at least Red Hat's way of working, while expensive and tedious, proved to have worked out well. Oracle will have to give up its RHEL compatibility at some point though in its 10 year life-span.

Either you are leading the way or you are following, trying to manage to do both will tear you apart ;-)

Kernel competition in the enterprise space

Posted Mar 15, 2012 19:49 UTC (Thu) by iabervon (subscriber, #722)

I suspect that SuSE and Oracle may have better luck getting newer kernels to work with older userspace, in part because they have kernel developers on staff to prepare patches to newer kernels that fix the incompatibilities they find, and in part because they have more influence on the kernel development community. If, for example, SuSE wanted to fix sysfs in 3.0 to accommodate older udev, they'd have to find someone to write a change to sysfs stuff, and they'd have to get the sysfs maintainer to sign off on it, and they'd have to get the stable maintainer to include it. At the time, this would have been Greg KH getting himself to write a patch, and convincing himself and himself.

Also, SuSE intended from the release of SLES 11 with 2.6.32 to be able to upgrade the major release of the kernel, and exerted pressure to keep things in mainline from breaking SLES 11 userspace. It's a lot easier to NACK patches and revert commits that haven't seen an official release yet than revert behavior changes where different programs may have come to rely on each behavior.

Kernel competition in the enterprise space

Posted Mar 16, 2012 4:34 UTC (Fri) by dlang (✭ supporter ✭, #313)

As someone who has been deploying custom kernels in production (in what decidedly qualifies as an "enterprise" datacenter) for 15 years, I can say that, as long as I avoided some of the most convoluted RedHat systems (back in the kernel 2.1 and 2.3 days when the 'stable' kernel was so out of date that nobody could use it), there are really very few cases where a new kernel requires a userspace upgrade.

There are some cases where you need some new userspace tools to take advantage of some of the new kernel features, but Linus is pretty good at preventing changes that require userspace updates (and at the same time RedHat has gotten better at avoiding userspace incompatible kernel modifications)

I expect there to be very few problems with these updates

Kernel competition in the enterprise space

Posted Mar 21, 2012 20:09 UTC (Wed) by dag- (subscriber, #30207)

Well, for my RHEL6.2 system kernel 3.2 is already one kernel too far. The Intel chipset on my Thinkpad X201 starts flickering once Xorg/Gnome is started. Adding intel_iommu=igfx_off suppresses some kernel messages, but does not fix the screen flickering. Kernel 3.3 makes no difference.

And we're not even 2 years into RHEL6, 8+ years to go :-)

Kernel competition in the enterprise space

Posted Mar 15, 2012 14:56 UTC (Thu) by cma (subscriber, #49905)

Use Ubuntu Server instead... Free and open...

Kernel competition in the enterprise space

Posted Mar 22, 2012 16:34 UTC (Thu) by jospoortvliet (subscriber, #33164)

They are talking *enterprise* space, 10+ years quality (as in, by upstream engineers) support and all that. How is Ubuntu relevant here? It's not, except maybe in Mark's wet dreams :D

Kernel competition in the enterprise space

Posted Mar 20, 2012 21:40 UTC (Tue) by csamuel (✭ supporter ✭, #2624)

I would argue that, given the advice is to *not* use anything earlier than 3.2 for btrfs due to a bug which can result in a corrupt filesystem on power loss, describing Oracle's characterisation of btrfs on 3.0 as production-ready as merely "optimistic" is being rather charitable.

Especially as there is likely a disk format change necessary to fix the painfully low hard link limit in a directory which can break backuppc, gnus and (apparently) git.

Kernel competition in the enterprise space

Posted Apr 9, 2012 12:22 UTC (Mon) by Lennie (subscriber, #49641)

Trust me, they probably backported all btrfs changes from 3.2 and 3.3 to 3.0.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds