
Ksplice and CentOS

By Jonathan Corbet
July 26, 2011
Ksplice first announced itself in 2008 as a project for "rebootless kernel security updates" based at MIT. The students behind the project soon graduated, and so did the project itself; a company by the same name was formed to offer commercial no-reboot patching to customers who cared deeply about uptime. Ksplice Inc. also offered free update services for a number of distributions. Much of this came to an end on July 21, when Oracle announced that it had acquired Ksplice Inc. and would incorporate its services into its own Linux support offerings. A free form of ksplice might just live on, though, with support from an interesting direction.

On the same day that Oracle announced the acquisition, CentOS developer Karanbir Singh suggested that one place the CentOS community could help out would be in the creation of a ksplice update stream. CentOS updates had been available from Ksplice Inc., on a trial basis at least; the company even somewhat snidely let it be known that they were providing updates for CentOS during the first few months of 2011, when the CentOS project itself had dropped the ball on that job. Oracle-ksplice still claims to support CentOS, but there is not even a trial service available for free; anybody wanting update service for CentOS must pay for it from the beginning. (The free service for Fedora and Ubuntu appears to still be functioning, for now - but who builds a high-availability system on those distributions?).

It is hard to blame Oracle too much for this decision. Oracle has bought a company which, it believes, will make its support offerings more attractive. Making the ksplice service available for free to CentOS users, in the process making CentOS more attractive relative to commercial enterprise offerings, would tend to undercut the rationale behind the entire acquisition. While it would certainly be a nice thing for Oracle to provide a stream of ksplice updates for CentOS users, that is not something the company is obligated to do.

So if CentOS is to have an equivalent service, it will have to roll its own. There are a few challenges to be overcome to bring this idea to fruition, starting with the ksplice code itself. That code, by some strange coincidence, disappeared from the Ksplice Inc. site just before the acquisition was announced. The Internet tends not to forget, though, so copies of this code (which was released under the GPL) were quickly located. Karanbir has posted a repository containing the ksplice 0.9.9 code as a starting place; for good measure, there are also mirrors on gitorious and github.

Getting the ksplice code is the easy part; generating the update stream will prove to be somewhat harder. Ksplice works by looking at which functions are changed by a kernel patch; it then creates a kernel module which (at runtime) patches out the affected functions and replaces them with the fixed versions. Every patch must be examined with an eye toward what effects it will have on a running kernel and, perhaps, modified accordingly. If the original patch changes a data structure, the rebootless version may have to do things quite differently, sometimes to the point of creating a shadow structure containing the new information. And, naturally, each patch in the stream must take into account whatever previous patches may have been applied to the running kernel.

Some more information on this process can be found in this article from late 2008. The point, though, is that the creation of these runtime patches is not always a simple or mechanical process; it requires real attention from somebody who understands what the original patches are doing. CentOS has not always been able to keep up with Red Hat's patch stream as it is; the creation of this new stream for kernel patches will make the task harder. It is not immediately obvious that the project will be able to sustain that extra effort. If it does work out, though, it would clearly make CentOS a more attractive distribution for a number of high-uptime use cases.

An interesting question (for those who are into license lawyering, anyway) is whether a patch in Oracle's ksplice stream constitutes a work derived from the kernel for which the source must be provided. Having access to the source for Oracle's runtime patches would obviously facilitate the process of creating CentOS patches.

Even if a credible patch stream can be created, there is another challenge to be aware of: software patents. The Ksplice Inc. developers did not hesitate to apply for patents on their work; a quick search turns up these applications:

The first of these has a claim reading simply:

A method comprising: identifying a portion of executable code to be updated in a running computer program; and determining whether it is safe to modify the executable code of the running computer program without having to restart the running computer program.

That is an astonishingly broad claim, even by the standards of US software patents. One should note that both of the applications listed above are exactly that: applications. Chances are that they will see modifications before an actual patent is granted - if it is granted at all. But the US patent office has not always demonstrated a great ability to filter out patents that overreach or that are clearly covered by prior art.

Once again, license lawyers could get into the game and debate whether the implied patent license in the GPL would be sufficient to protect those who are distributing and using the ksplice code. Others may want to look at Oracle's litigation history and contemplate how the company might react to a free service competing with its newly-acquired company. There are other companies holding patents in this area as well. Like it or not, this technology has a potential cloud over it.

It all adds up to a daunting set of challenges for the CentOS project if it truly chooses to offer this type of service. That said, years of watching this community has made one thing abundantly clear: one should never discount what a determined group of hackers can do if they set their minds to a task. A CentOS with no-reboot kernel updates would be an appealing option in situations where uptime needs to be maximized but there are no resources for the operation of a high-availability cluster. If the CentOS community wants this feature badly enough, it can certainly make it happen.



Who will want rebootless upgrades?

Posted Jul 26, 2011 23:42 UTC (Tue) by felixfix (subscriber, #242) [Link] (23 responses)

It might be fun to see how long a system can be kept running and updated without reboots, but it seems to me that systems too important to reboot are going to be clusters and not single systems. I am having trouble imagining what kind of system is so vital that it can't spare the reboot time, yet is all by itself.

Who will want rebootless upgrades?

Posted Jul 27, 2011 0:06 UTC (Wed) by ESRI (guest, #52806) [Link]

I think given proper planning and architecturing, you're right -- these sorts of systems shouldn't exist.

However, I am sure they can and do evolve inadvertently. Not here of course! ;-)

Who will want rebootless upgrades?

Posted Jul 27, 2011 1:09 UTC (Wed) by mleu (guest, #73224) [Link] (12 responses)

But isn't the trend that the hypervisor is a cluster with high availability for whole VMs? Then you are back to rebooting for kernel updates of your VM...

I think that for IT departments making applications 'cluster-aware' is sometimes seen as too much of a hassle and clusters are built at the hypervisor level. So for these companies Ksplice could be useful. Too bad it seems Oracle will keep it tied to OEL.

Who will want rebootless upgrades?

Posted Jul 27, 2011 1:39 UTC (Wed) by dlang (guest, #313) [Link] (11 responses)

VM failover doesn't actually work well if the system really fails, it only works well for planned outages.

it's impossible for the entire system state to be kept in sync on another machine in real-time (the bandwidth to RAM is _much_ higher than the bandwidth to the network), so if your system crashes you have state that has not been replicated.

when doing a planned outage, you can pause the system while you sync state and do a seamless failover.

but in an unplanned failure, you are out of luck.

Who will want rebootless upgrades?

Posted Jul 27, 2011 4:58 UTC (Wed) by dtlin (subscriber, #36537) [Link]

With the wider availability of 10Gb Ethernet links now, it's not that far outpaced by main memory bandwidth.

Stratus Avance keeps full system state synchronized between two nodes in realtime, or something close to it; I think it was a year or two ago when I first saw a demo of unplanned hardware outage causing software failover with only a few seconds of downtime.

Who will want rebootless upgrades?

Posted Jul 27, 2011 13:12 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

Ah, but you don't need to keep RAM coherent after each operation. You need to do it only after network communication and/or disk IO.

It's almost like quantum mechanics: until you observe a computer its status is indeterminate. So you just keep the master computer running and replicate RAM changes only after it communicates with something external. If the master computer fails then the slave will have the state of the master after the last IO operation and will just continue from that point.

This actually works surprisingly well in practice. We use it to run legacy QNX software inside a XEN cluster.
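A toy model of this idea, as a sketch only: state is copied to the backup at externally visible I/O rather than after every memory write. The classes and method names (`Primary`, `Backup`, `compute`, `send`) are hypothetical and are not taken from Xen or any other product. One assumption differs slightly from the wording above: the copy is made just before the output is released, so the backup never lags behind something the outside world has already seen.

```python
import copy


class Backup:
    """Holds the last state snapshot received from the primary."""

    def __init__(self):
        self.state = {}

    def receive(self, snapshot):
        self.state = snapshot


class Primary:
    def __init__(self, backup):
        self.state = {"counter": 0}
        self.backup = backup
        self.syncs = 0       # how many times state was replicated
        self.outputs = []    # what the outside world has observed

    def compute(self):
        """Internal work: changes memory only, so nothing is replicated."""
        self.state["counter"] += 1

    def send(self):
        """Externally visible output: replicate state first, then emit."""
        self.backup.receive(copy.deepcopy(self.state))
        self.syncs += 1
        self.outputs.append(self.state["counter"])


backup = Backup()
primary = Primary(backup)
for _ in range(1000):
    primary.compute()
primary.send()               # one sync covers 1000 internal updates
for _ in range(5):
    primary.compute()        # lost if the primary dies now, but never observed
```

If the primary failed at the end of this run, the backup would resume from the state at the last output. The five later updates would be redone, and nothing external ever depended on them.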

Who will want rebootless upgrades?

Posted Jul 27, 2011 16:22 UTC (Wed) by raven667 (subscriber, #5198) [Link]

That must be how the VMware Fault Tolerance feature works as well, as it is able to keep two VMs running in lock step across different pieces of hardware with low-latency failover if one of the machines dies. In any event the point is that this feature is not impossible and exists in shipping software right now.

Who will want rebootless upgrades?

Posted Jul 28, 2011 12:59 UTC (Thu) by nix (subscriber, #2304) [Link]

Quite so. The ReVirt reversible VM worked this way, but it was a research project and died :(

Who will want rebootless upgrades?

Posted Jul 28, 2011 21:35 UTC (Thu) by cmccabe (guest, #60281) [Link] (5 responses)

> Ah, but you don't need to keep RAM coherent after each operation.
> You need to do it only after network communication and/or disk IO.

It depends on how the application is written. And given that fsync still gets a blank stare from most application developers, I'm not optimistic.

Database-backed apps might have a little bit more of a chance, because database designers thought about these issues. But most database replication is still semi-synchronous rather than synchronous, because people don't want to pay the speed cost. There are also clustered databases and database engines, but they are the exception rather than the rule.

Who will want rebootless upgrades?

Posted Jul 28, 2011 22:17 UTC (Thu) by dlang (guest, #313) [Link] (2 responses)

one word:

mmap

Who will want rebootless upgrades?

Posted Aug 3, 2011 23:15 UTC (Wed) by cmccabe (guest, #60281) [Link] (1 responses)

mmap is great, but without calling msync from time to time, you have no guarantee that the data in memory matches what is on the disk. If you crash before the kernel chooses to write those pages back to disk, that work is gone.
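As a minimal sketch of the explicit write-back being described, Python's standard `mmap` module exposes it as `flush()`, which calls `msync` on Unix. The scratch file and the variable names here are just for illustration.

```python
import mmap
import os
import tempfile

# Create a one-page scratch file to map (a throwaway temporary file).
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * mmap.PAGESIZE)

with mmap.mmap(fd, mmap.PAGESIZE) as m:
    m[0:5] = b"hello"   # modifies the mapped page in memory
    m.flush()           # asks the OS to write dirty pages back to the file

os.close(fd)
with open(path, "rb") as f:
    on_disk = f.read(5)  # read the file back through the normal file API
os.remove(path)
```

Reading the file back only shows that the mapping and the file agree; it does not prove the data reached stable storage, since the read may be served from the page cache. The point of the explicit `flush()` is that the program chooses a moment at which write-back has been requested, instead of leaving the timing entirely to the kernel.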

Who will want rebootless upgrades?

Posted Aug 5, 2011 22:07 UTC (Fri) by dlang (guest, #313) [Link]

my point is that if you use mmap you don't know when a write to disk takes place. it could be well before you call msync

Who will want rebootless upgrades?

Posted Jul 29, 2011 17:38 UTC (Fri) by giraffedata (guest, #1954) [Link] (1 responses)

> > Ah, but you don't need to keep RAM coherent after each operation. You need to do it only after network communication and/or disk IO.
> It depends on how the application is written.

I don't see how. The principle seems sound; how would you write an application that won't fail over properly without multiple synchronizations of memory between I/Os?

The only thing I can think of is something where time itself is part of the function -- e.g. if a query has to be answered within 30 seconds and the primary has been working on it for 25 seconds before he dies, the backup won't be able to respond within 30 seconds of the query unless he knows what the primary computed between the query and the failure.

Who will want rebootless upgrades?

Posted Jul 29, 2011 19:27 UTC (Fri) by raven667 (subscriber, #5198) [Link]

The secondary host is calculating the result at the same time as the primary using the same data. Maybe this demonstration will help make more sense.

http://www.youtube.com/watch?v=NCMMwGC0hD8

Fast forward to the 4-5m mark if you want to skip the explanation.

Who will want rebootless upgrades?

Posted Jul 27, 2011 15:15 UTC (Wed) by mleu (guest, #73224) [Link]

When you have a single application instance inside a cluster resource and the cluster node fails you also need to restart the application. It will recover faster than having to boot a whole VM, but still, it is a lot easier to encapsulate an application into a VM instead of making it cluster aware.

Who will want rebootless upgrades?

Posted Jul 27, 2011 11:19 UTC (Wed) by jamesh (guest, #1159) [Link]

While trying to keep a system up indefinitely may not seem that useful, being able to choose when to reboot is.

If a critical security vulnerability is found, it can be patched in place and the proper reboot can be delayed until scheduled maintenance since it is no longer tied to the security vulnerability.

Who will want rebootless upgrades?

Posted Jul 27, 2011 16:33 UTC (Wed) by ballombe (subscriber, #9523) [Link] (3 responses)

I used to maintain HPC servers provided to researchers to run long running computation. Researchers have ssh access to the box and are expecting 3 months uptime for their programs to run without being killed by a reboot.

Who will want rebootless upgrades?

Posted Jul 28, 2011 4:26 UTC (Thu) by ringerc (subscriber, #3071) [Link] (2 responses)

In a happy world you could checkpoint their programs, reboot the server, and resume them from their saved state.

Alas, few programs provide any kind of checkpoint facilities, and operating systems are just too complicated for general-purpose OS-based application checkpointing to work well.

Who will want rebootless upgrades?

Posted Jul 28, 2011 20:15 UTC (Thu) by Lennie (subscriber, #49641) [Link] (1 responses)

That is why people got excited when they read this:

http://lwn.net/Articles/452184/ (Checkpoint/restart (mostly) in user space)

Having a good checkpoint/restart would be really useful.

Who will want rebootless upgrades?

Posted Jul 29, 2011 4:11 UTC (Fri) by csamuel (✭ supporter ✭, #2624) [Link]

Actually us HPC people (well, me at least) are far more interested in the Berkeley Labs Checkpoint Restart (BLCR) work as they are doing great work getting it integrated in Open-MPI and the Torque resource manager to allow the checkpoint/restart of parallel MPI jobs across multiple nodes in HPC clusters including (I believe) being able to restart on a different set of nodes to those the job was started on.

Here's the latest report on this work in progress:

http://www.supercluster.org/pipermail/torquedev/2011-July...

and here's the BLCR website:

https://ftg.lbl.gov/projects/CheckpointRestart/

Who will want rebootless upgrades?

Posted Aug 4, 2011 2:20 UTC (Thu) by zander76 (guest, #6889) [Link] (3 responses)

LOL, I don't like to reboot my computer let alone something collecting financial data.

You're forgetting one major aspect and that is MONEY. It costs money to do all of these things.

The other aspect you're not thinking about is that NEW systems might be clustered but what about all the old systems that have been running for years and haven't been touched because they worked so why mess with it.

I worked at a telephone company that had a system running that the admin had been fired 2 years before and he was the only person that still had root access. The box continued to do its job and everybody simply looked the other way ( it had no outside access that we knew of ).

Who will want rebootless upgrades?

Posted Aug 4, 2011 9:37 UTC (Thu) by elanthis (guest, #6227) [Link] (2 responses)

And these machines are security liabilities.

I for one was rather in favor of some of the old (dead) proposals a few distros had about having a "kill switch" in the init scripts that would refuse to allow Internet connectivity if the OS release date (from /etc/foo-release) was more than X years old.

We'd all be a lot better off if Microsoft had done this for Win9x/ME. ;)
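The age check at the heart of such a kill switch is simple; a minimal sketch, assuming the release date has already been obtained somehow (parsing a release file is left out, and `release_too_old` and its parameters are hypothetical names, not part of any distribution's init scripts):

```python
from datetime import date


def release_too_old(release_date, today, max_years):
    """Return True if release_date lies more than max_years before today."""
    # Compare (year, month, day) tuples so that no invalid calendar date
    # (such as 29 February in a non-leap year) is ever constructed.
    threshold = (today.year - max_years, today.month, today.day)
    released = (release_date.year, release_date.month, release_date.day)
    return released < threshold


old = release_too_old(date(2005, 1, 1), date(2011, 8, 4), 5)
recent = release_too_old(date(2010, 11, 10), date(2011, 8, 4), 5)
```

The check trusts whatever value is passed as `today`, so it is only as reliable as the system clock.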

Who will want rebootless upgrades?

Posted Aug 4, 2011 18:51 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

And when the kill switch fires up, how do you upgrade it? You have to set the clock back and kill ntpd or something. Why not just print a 'this is out of date' message like clamav does? (Though, OK, they used a kill switch too, but just once, and a dead clamav doesn't stop you from upgrading it.)

Who will want rebootless upgrades?

Posted Aug 5, 2011 3:44 UTC (Fri) by elanthis (guest, #6227) [Link]

Those issues are exactly why nobody actually does it.

My solution would basically be to print a "you're a moron who should've upgraded two major releases ago, you don't deserve to be on the Internet" message. Granted, I'm also the kind of person who thinks that cops should pull over people who don't use their turn signals, ask them for their license and registration, and then toss both into the nearest ditch while telling the driver that he's not competent enough to be allowed the privilege of operating a two ton steel missile on wheels... ;)

Ksplice and CentOS

Posted Jul 27, 2011 11:13 UTC (Wed) by jamesh (guest, #1159) [Link] (2 responses)

If these patents cover the KSplice code itself, then surely the fact that KSplice Inc licensed their code under the GPL should give everyone the rights they need.

It would only be a problem if you wanted to use the patents under less restrictive terms, or reimplement the idea from scratch.

Ksplice and CentOS

Posted Jul 27, 2011 19:28 UTC (Wed) by jhhaller (guest, #56103) [Link] (1 responses)

There is long standing prior art for applying patches to existing systems without needing a reboot. What the patents appear to cover is the process for generating the patches without the existing code needing to be prepared to be patched ahead of time, and generally automating that process of creating the patches.

Telephony systems have long had the capability to support some limited changes, but typically required data structures to have some extra space in case it was needed, and typically used indirect function calls so that the pointer could be changed. Creating patches for these systems would require manual allocation of formally free space. This is the prior art which seems to cover most if not all of the ksplice kernel functionality.

Some of the problem of structure resizing is averted by staying within a Redhat 5 or 6 or CentOS 5 or 6, as kernel data structure changes are minimized, as kernel modules generally don't need to be recompiled between 5.X versions. I doubt there is a ksplice patch to go from 5.X to 6.X, and moving between arbitrary kernel versions is also likely to be difficult.

Ksplice and CentOS

Posted Jul 28, 2011 3:30 UTC (Thu) by jamesh (guest, #1159) [Link]

Right, but the software used to perform these patented actions (well, patent pending at least) has been released by the patent holders under the GPL. If Oracle decided to sue you, you could make a good case that they had licensed their patents to you for the purpose of running/modifying/distributing the software.

This is different to cases where you have GPL software that infringes third party patents.

Ksplice and CentOS

Posted Jul 27, 2011 13:26 UTC (Wed) by dsommers (subscriber, #55274) [Link] (2 responses)

CentOS is by all means a project with great visions, and they have managed to provide a good distribution. But from CentOS 6.0 and CentOS 5.6 onward, their updates fell like rocks. It took twice as long for CentOS 5.6 to come out compared to earlier releases, and CentOS 6 needed something like 8 months to be released. Now they have to manage to get the 5.7 and 6.1 releases out. This is not a flame invite on their release cycle. These are the hard facts [1].

So how will CentOS be able to provide ksplice updates *in addition* to their pending updates? And especially if they try to provide this for 6.x kernels, where there are no patch-by-patch files to work with.

I honestly believe CentOS should rather focus now on getting their current supported releases updated and improve their working methods to make releases go more smoothly before looking at something like ksplice.

Seriously! ScientificLinux have until now managed to complete their 6.0, 5.6 and 6.1_rc1, with a pending 6.1 release within the coming couple of weeks (or even earlier). CentOS have barely managed to get 6.0 out the door, with 5.6 a bit earlier. On their announce-list there are not many CentOS 6 updates by now [2], even though ScientificLinux have had several updates [3].

Unless CentOS manages to improve their update processes, to make them more efficient, I have no faith in CentOS being able to support ksplice in addition. It's a great goal, but I don't believe it's currently possible.

Why so harsh? I consider ksplice a poor-man's cluster solution, for users who don't have a cluster to work with. If you have services which need to be running 24/7, you don't do that on a single box, you have a cluster where you can boot each node individually in the mean time. And sites that can afford proper clusters can most likely afford RHEL or OEL.

ksplice is great for short term highly critical updates, to fix an exploit instantly and to get the proper fixed kernel booted when the maintenance window appears. If CentOS continues to be slow at their kernel updates to start with, there is little benefit of having ksplice to start with. This can be twisted, if CentOS manages to get ksplice updates released quicker than normal kernel updates. But in that case, I would claim their priorities are not really well aligned with the vast majority of their users.

[1] http://en.wikipedia.org/wiki/CentOS#Release_history
[2] http://thread.gmane.org/gmane.linux.centos.announce/
[3] http://listserv.fnal.gov/scripts/wa.exe?A1=ind1107&L=... (Look for SL6.x subjects)

Ksplice and CentOS

Posted Jul 29, 2011 2:25 UTC (Fri) by sciurus (guest, #58832) [Link] (1 responses)

In fairness to CentOS, despite the problems they certainly have they got version 5.6 out the door much sooner than Scientific Linux did. Prioritizing it over version 6.0 was the right choice.

http://www.standalone-sysadmin.com/~matt/centos-sl-delays...

Ksplice and CentOS

Posted Jul 29, 2011 13:08 UTC (Fri) by dag- (guest, #30207) [Link]

Still you cannot claim that 3 months without security updates is a splendid achievement. So the original commenter is correct in the assessment that there's more merit in improving the release of security updates than forking new exciting projects. Timely security updates affect more users than a working ksplice (as that's only kernel related).

Unfortunately, CentOS makes it quite obvious that they don't plan to open up the CentOS rebuild effort, so we officially don't know what caused the 5.6 release to slip 3 months, or the 6.0 release to be 8 months behind. (QA had been waiting for several weeks before anything was there to QA, and even then known fixes were slow to materialize).

http://lists.centos.org/pipermail/centos/2011-May/111670....

Having only 3 people know how to do and perform this rebuild effort is a liability, not just for getting security releases out the door.

GPL

Posted Jul 28, 2011 11:34 UTC (Thu) by robbe (guest, #16131) [Link]

> [...] whether a patch in Oracle's ksplice stream constitutes a work derived from the kernel for which the source must be provided.

To my non-lawyerly mind, it most certainly does. Which means that not only has source to be provided, but the receiver of the patch stream is able to pass it on verbatim.

I guess Oracle will try to keep that from happening via the same tricks that Red Hat uses for its update service (see https://lwn.net/Articles/430098/), the shoe being on the other foot this time.

Ksplice and CentOS

Posted Jul 29, 2011 13:13 UTC (Fri) by dag- (guest, #30207) [Link]

On the RHEL5 mailinglist, an interesting remark

http://www.redhat.com/archives/rhelv5-list/2011-July/msg0...

about Oracle being a licensee of the Open-Invention-Network patent portfolio would mean that there are no legal threats for a similar (Open Source) implementation.

http://en.wikipedia.org/wiki/Open_Invention_Network

So the patents are likely no real issue to anyone in the community ?

Ksplice and CentOS

Posted Jul 30, 2011 0:41 UTC (Sat) by sbergman27 (guest, #10767) [Link] (2 responses)

"The free service for Fedora and Ubuntu appears to still be functioning, for now - but who builds a high-availability system on those distributions?"

Ubuntu LTS would actually be a good platform for such a service. Depending upon the requirements of the vendor of the software that is intended to be "highly available". If the vendor requires RHEL or Oracle Linux (or whatever they are calling it these days) for support, then you're screwed whether you are using Ubuntu LTS, or CentOS, or Scientific Linux.

Ksplice and CentOS

Posted Aug 1, 2011 19:10 UTC (Mon) by cdmiller (guest, #2813) [Link] (1 responses)

Indeed we use Ubuntu LTS for most of our stuff, with some key services set up HA (without a need for ksplice of course). I wouldn't be surprised to find lots of Fedora, Ubuntu LTS, Debian, etc. HA setups. Our Oracle installs however, live on RHEL.

Ksplice and CentOS

Posted Aug 2, 2011 0:23 UTC (Tue) by jspaleta (subscriber, #50639) [Link]

Speaking of your Ubuntu LTS stuff....

Are you a paying Canonical support customer in some fashion? Subscriber to Landscape management services or perhaps paying for "Ubuntu Advantage" support with the Landscape service bundled in?

-jef

Ksplice and CentOS

Posted Jul 30, 2011 17:04 UTC (Sat) by sbergman27 (guest, #10767) [Link] (2 responses)

The very last thing that the CentOS folks need to be doing is taking on more responsibilities. I had been waiting for CentOS 6, but my understanding was that the CentOS 5.6 effort was delaying it. Now, I find out that not only was the CentOS 5.6 release also delayed, but that security updates for other releases were also not happening. Sometimes it just makes sense to have a contractual agreement.

But still, I don't see a ksplice channel being attractive to folks who might require it until the CentOS project at least does a thorough and prolonged job of demonstrating that they are alive and dependable.

Sincerely,
Steve Bergman
Oklahoma City, OK (A suburb of Tuttle)

Ksplice and CentOS

Posted Aug 5, 2011 3:11 UTC (Fri) by slashdot (guest, #22014) [Link] (1 responses)

It's interesting that CentOS seems to claim that rebuilding RHEL is hard because they don't provide the exact build environment used.

However the GPLv2 clearly says:
"For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable"

So if RHEL can't be readily built, then informing both the FSF GPL violation contacts and Red Hat legal could be effective.

Ksplice and CentOS

Posted Aug 5, 2011 4:00 UTC (Fri) by elanthis (guest, #6227) [Link]

I think you're confusing something here. Red Hat provides the full SRPMs necessary to build any package they ship. That fully complies with the GPL's requirements, which are basically saying that you must ship the Makefiles (or the scripts that generate the Makefiles... or the scripts that generate the scripts that generate the Makefiles... or the scripts that generate the scripts that generate the scripts...).

What the CentOS folks are complaining about is that simply having a bunch of SRPMs does not make it particularly easy to build them all in the right order and to install them correctly to bootstrap a working OS image comprised of all those individual GPLed components. Requiring that would be similar to requiring that you provide the rsync scripts you use to mirror ftp.gnu.org so that end-users of your mirror can recreate it.


Copyright © 2011, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds