Ksplice and CentOS
On the same day that Oracle announced the acquisition, CentOS developer Karanbir Singh suggested that one place the CentOS community could help out would be in the creation of a ksplice update stream. CentOS updates had been available from Ksplice Inc., on a trial basis at least; the company even somewhat snidely let it be known that they were providing updates for CentOS during the first few months of 2011, when the CentOS project itself had dropped the ball on that job. Oracle-ksplice still claims to support CentOS, but there is not even a trial service available for free; anybody wanting update service for CentOS must pay for it from the beginning. (The free service for Fedora and Ubuntu appears to still be functioning, for now - but who builds a high-availability system on those distributions?).
It is hard to blame Oracle too much for this decision. Oracle has bought a company which, it believes, will make its support offerings more attractive. Making the ksplice service available for free to CentOS users, in the process making CentOS more attractive relative to commercial enterprise offerings, would tend to undercut the rationale behind the entire acquisition. While it would certainly be a nice thing for Oracle to provide a stream of ksplice updates for CentOS users, that is not something the company is obligated to do.
So if CentOS is to have an equivalent service, it will have to roll its own. There are a few challenges to be overcome to bring this idea to fruition, starting with the ksplice code itself. That code, by some strange coincidence, disappeared from the Ksplice Inc. site just before the acquisition was announced. The Internet tends not to forget, though, so copies of this code (which was released under the GPL) were quickly located. Karanbir has posted a repository containing the ksplice 0.9.9 code as a starting place; for good measure, there are also mirrors on gitorious and github.
Getting the ksplice code is the easy part; generating the update stream will prove to be somewhat harder. Ksplice works by looking at which functions are changed by a kernel patch; it then creates a kernel module which (at runtime) patches out the affected functions and replaces them with the fixed versions. Every patch must be examined with an eye toward what effects it will have on a running kernel and, perhaps, modified accordingly. If the original patch changes a data structure, the rebootless version may have to do things quite differently, sometimes to the point of creating a shadow structure containing the new information. And, naturally, each patch in the stream must take into account whatever previous patches may have been applied to the running kernel.
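One simple way to picture the first step, finding which functions a patch changes, is to compare each function's compiled code between a build without the patch and a build with it. The Python sketch below is an illustration under stated assumptions, not Ksplice's actual code. It assumes each build has already been split into per-function byte strings with relocations normalized, which is the genuinely hard part in practice. The name `classify_changes` and the sample data are hypothetical.

```python
from __future__ import annotations


def classify_changes(pre: dict[str, bytes], post: dict[str, bytes]) -> dict[str, set[str]]:
    """Compare per-function code from a pre-patch and a post-patch build.

    Assumes each dict maps a function name to its compiled body, with
    relocations already normalized so that unchanged functions compare equal.
    """
    pre_names, post_names = set(pre), set(post)
    return {
        # Present in both builds but compiled differently: must be replaced.
        "modified": {name for name in pre_names & post_names if pre[name] != post[name]},
        # New in the patched build: must be loaded alongside the replacements.
        "added": post_names - pre_names,
        # Gone from the patched build; nothing in the new code refers to it.
        "removed": pre_names - post_names,
    }


if __name__ == "__main__":
    # Hypothetical function bodies; real object code is far longer.
    pre = {"tcp_recv": b"\x55\x48", "udp_recv": b"\x55\x31", "old_helper": b"\xc3"}
    post = {"tcp_recv": b"\x55\x49", "udp_recv": b"\x55\x31", "new_helper": b"\x90\xc3"}
    changes = classify_changes(pre, post)
    print(sorted(changes["modified"]))  # ['tcp_recv']
```

Only the "modified" and "added" sets need code in the update module; everything else keeps running as-is.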
Some more information on this process can be found in this article from late 2008. The point, though, is that the creation of these runtime patches is not always a simple or mechanical process; it requires real attention from somebody who understands what the original patches are doing. CentOS has not always been able to keep up with Red Hat's patch stream as it is; the creation of this new stream for kernel patches will make the task harder. It is not immediately obvious that the project will be able to sustain that extra effort. If it does work out, though, it would clearly make CentOS a more attractive distribution for a number of high-uptime use cases.
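When a patch adds a field to a structure that already has live instances, the shadow-structure workaround keeps the new data on the side instead of changing the structure's layout. The sketch below models that idea in Python with a weak-keyed dictionary; the class, function, and field names are hypothetical, and a kernel implementation would have to handle locking and freeing explicitly rather than relying on a garbage collector.

```python
import weakref


class ShadowTable:
    """Hypothetical side table that attaches new fields to existing objects.

    The original objects keep their old layout; patched code looks up the
    extra fields here instead. Entries disappear when the original object is
    garbage-collected, because the table holds only weak references to keys.
    """

    def __init__(self) -> None:
        self._shadows = weakref.WeakKeyDictionary()

    def get(self, obj, field, default=None):
        return self._shadows.get(obj, {}).get(field, default)

    def set(self, obj, field, value) -> None:
        self._shadows.setdefault(obj, {})[field] = value


class Connection:
    """Stand-in for an existing structure whose layout cannot change."""

    def __init__(self, fd: int) -> None:
        self.fd = fd


def patched_send(conn: Connection, shadows: ShadowTable, nbytes: int) -> int:
    """Replacement function that needs a counter the old structure lacks."""
    total = shadows.get(conn, "bytes_sent", 0) + nbytes
    shadows.set(conn, "bytes_sent", total)
    return total


if __name__ == "__main__":
    shadows = ShadowTable()
    conn = Connection(fd=3)  # object created before the "patch" was applied
    patched_send(conn, shadows, 10)
    print(patched_send(conn, shadows, 5))  # 15
```

Objects created before the update simply start with no shadow entry, which is why the patched code needs a sensible default.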
An interesting question (for those who are into license lawyering, anyway) is whether a patch in Oracle's ksplice stream constitutes a work derived from the kernel for which the source must be provided. Having access to the source for Oracle's runtime patches would obviously facilitate the process of creating CentOS patches.
Even if a credible patch stream can be created, there is another challenge to be aware of: software patents. The Ksplice Inc. developers did not hesitate to apply for patents on their work; a quick search turns up these applications:
- Method of finding a safe time to modify code of a running computer program.
- Method of determining which computer program functions are changed by an arbitrary source code modification.
The first of these has a claim reading simply:
That is an astonishingly broad claim, even by the standards of US software patents. One should note that both of the applications listed above are exactly that: applications. Chances are that they will see modifications before an actual patent is granted - if it is granted at all. But the US patent office has not always demonstrated a great ability to filter out patents that overreach or that are clearly covered by prior art.
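The problem behind the first application is that replacing a function is only safe when no thread is in the middle of it. Ksplice's published design handles this, roughly, by briefly stopping all CPUs, checking each thread's stack for addresses inside the functions being replaced, and retrying later if any are found. The sketch below models only that check, on made-up addresses; it illustrates the general idea and is neither the claim language of the application nor Ksplice's code. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionRange:
    """Address range occupied by one function that the update replaces."""
    name: str
    start: int  # first byte of the function's code
    end: int    # one past the last byte


def busy_functions(stacks: list[list[int]], targets: list[FunctionRange]) -> set[str]:
    """Names of target functions that some thread may still be executing.

    Each stack is the list of code addresses found for one stopped thread:
    its current instruction pointer plus the return addresses on its stack.
    """
    busy = set()
    for stack in stacks:
        for addr in stack:
            for fn in targets:
                if fn.start <= addr < fn.end:
                    busy.add(fn.name)
    return busy


def safe_to_patch(stacks: list[list[int]], targets: list[FunctionRange]) -> bool:
    # Safe only if no thread is inside, or will return into, replaced code.
    return not busy_functions(stacks, targets)


if __name__ == "__main__":
    targets = [FunctionRange("tcp_recv", 0x1000, 0x1100)]
    print(safe_to_patch([[0x2000, 0x1050], [0x3000]], targets))  # False
    print(safe_to_patch([[0x2000], [0x3000]], targets))          # True
```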
Once again, license lawyers could get into the game and debate whether the implied patent license in the GPL would be sufficient to protect those who are distributing and using the ksplice code. Others may want to look at Oracle's litigation history and contemplate how the company might react to a free service competing with its newly-acquired company. There are other companies holding patents in this area as well. Like it or not, this technology has a potential cloud over it.
It all adds up to a daunting set of challenges for the CentOS project if it truly chooses to offer this type of service. That said, years of watching this community have made one thing abundantly clear: one should never discount what a determined group of hackers can do if they set their minds to a task. A CentOS with no-reboot kernel updates would be an appealing option in situations where uptime needs to be maximized but there are no resources for the operation of a high-availability cluster. If the CentOS community wants this feature badly enough, it can certainly make it happen.
Posted Jul 26, 2011 23:42 UTC (Tue)
by felixfix (subscriber, #242)
Posted Jul 27, 2011 0:06 UTC (Wed)
by ESRI (guest, #52806)
However, I am sure they can and do evolve inadvertently. Not here of course! ;-)
Posted Jul 27, 2011 1:09 UTC (Wed)
by mleu (guest, #73224)
I think that for IT departments making applications 'cluster-aware' is sometimes seen as too much of a hassle, and clusters are built at the hypervisor level. So for these companies Ksplice could be useful. Too bad it seems Oracle will keep it tied to OEL.
Posted Jul 27, 2011 1:39 UTC (Wed)
by dlang (guest, #313)
It's impossible for the entire system state to be kept in sync on another machine in real time (the bandwidth to RAM is _much_ higher than the bandwidth to the network), so if your system crashes you have state that has not been replicated.
When doing a planned outage, you can pause the system while you sync state and do a seamless failover.
But in an unplanned failure, you are out of luck.
Posted Jul 27, 2011 4:58 UTC (Wed)
by dtlin (subscriber, #36537)
With the wider availability of 10Gb Ethernet links now, it's not that far outpaced by main memory bandwidth. Stratus Avance keeps full system state synchronized between two nodes in realtime, or something close to it; I think it was a year or two ago when I first saw a demo of unplanned hardware outage causing software failover with only a few seconds of downtime.
Posted Jul 27, 2011 13:12 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
It's almost like quantum mechanics: until you observe a computer its status is indeterminate. So you just keep the master computer running and replicate RAM changes only after it communicates with something external. If the master computer fails then the slave will have the state of the master after the last IO operation and will just continue from that point.
This actually works surprisingly well in practice. We use it to run legacy QNX software inside a Xen cluster.
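The scheme described here is a form of output commit: internal memory changes may stay unreplicated until the moment the primary is about to say something to the outside world. The toy model below, with hypothetical class names and a dictionary standing in for RAM pages, shows why the backup's state is always consistent with everything clients have seen. Real systems built on this idea (Remus for Xen is one example) checkpoint periodically and buffer outgoing packets until the backup acknowledges, rather than copying pages synchronously as this sketch does.

```python
from __future__ import annotations


class Backup:
    """Replica that only ever sees memory as of the primary's last output."""

    def __init__(self) -> None:
        self.memory: dict[int, bytes] = {}

    def apply(self, pages: dict[int, bytes]) -> None:
        self.memory.update(pages)


class Primary:
    def __init__(self, backup: Backup) -> None:
        self.memory: dict[int, bytes] = {}
        self.dirty: set[int] = set()
        self.backup = backup
        self.sent: list[str] = []  # what the outside world has observed

    def write(self, page: int, data: bytes) -> None:
        # Purely internal change: no need to replicate it yet.
        self.memory[page] = data
        self.dirty.add(page)

    def output(self, message: str) -> None:
        # Before anything becomes externally visible, ship the dirty pages.
        self.backup.apply({page: self.memory[page] for page in self.dirty})
        self.dirty.clear()
        self.sent.append(message)


if __name__ == "__main__":
    backup = Backup()
    primary = Primary(backup)
    primary.write(0, b"request handled")
    primary.output("reply 1")
    primary.write(1, b"half-finished work")
    # Suppose the primary dies here: the backup lacks page 1, but no one
    # outside ever saw a result that depended on it.
    print(backup.memory)  # {0: b'request handled'}
```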
Posted Jul 27, 2011 16:22 UTC (Wed)
by raven667 (subscriber, #5198)
Posted Jul 28, 2011 12:59 UTC (Thu)
by nix (subscriber, #2304)
Posted Jul 28, 2011 21:35 UTC (Thu)
by cmccabe (guest, #60281)
It depends on how the application is written. And given that fsync still gets a blank stare from most application developers, I'm not optimistic.
Database-backed apps might have a little bit more of a chance, because database designers thought about these issues. But most database replication is still semi-synchronous rather than synchronous, because people don't want to pay the speed cost. There are also clustered databases and database engines, but they are the exception rather than the rule.
Posted Jul 28, 2011 22:17 UTC (Thu)
by dlang (guest, #313)
mmap
Posted Aug 3, 2011 23:15 UTC (Wed)
by cmccabe (guest, #60281)
Posted Aug 5, 2011 22:07 UTC (Fri)
by dlang (guest, #313)
Posted Jul 29, 2011 17:38 UTC (Fri)
by giraffedata (guest, #1954)
I don't see how. The principle seems sound; how would you write an application that won't fail over properly without multiple synchronizations of memory between I/Os?
The only thing I can think of is something where time itself is part of the function -- e.g. if a query has to be answered within 30 seconds and the primary has been working on it for 25 seconds before he dies, the backup won't be able to respond within 30 seconds of the query unless he knows what the primary computed between the query and the failure.
Posted Jul 29, 2011 19:27 UTC (Fri)
by raven667 (subscriber, #5198)
http://www.youtube.com/watch?v=NCMMwGC0hD8
Fast forward to the 4-5m mark if you want to skip the explanation.
Posted Jul 27, 2011 15:15 UTC (Wed)
by mleu (guest, #73224)
Posted Jul 27, 2011 11:19 UTC (Wed)
by jamesh (guest, #1159)
If a critical security vulnerability is found, it can be patched in place and the proper reboot can be delayed until scheduled maintenance since it is no longer tied to the security vulnerability.
Posted Jul 27, 2011 16:33 UTC (Wed)
by ballombe (subscriber, #9523)
Posted Jul 28, 2011 4:26 UTC (Thu)
by ringerc (subscriber, #3071)
Alas, few programs provide any kind of checkpoint facilities, and operating systems are just too complicated for general-purpose OS-based application checkpointing to work well.
Posted Jul 28, 2011 20:15 UTC (Thu)
by Lennie (subscriber, #49641)
http://lwn.net/Articles/452184/ (Checkpoint/restart (mostly) in user space)
Having a good checkpoint/restart would be really useful.
Posted Jul 29, 2011 4:11 UTC (Fri)
by csamuel (✭ supporter ✭, #2624)
Here's the latest report on this work in progress:
http://www.supercluster.org/pipermail/torquedev/2011-July...
and here's the BLCR website:
https://ftg.lbl.gov/projects/CheckpointRestart/
Posted Aug 4, 2011 2:20 UTC (Thu)
by zander76 (guest, #6889)
You're forgetting one major aspect, and that is MONEY. It costs money to do all of these things.
The other aspect you're not thinking about is that NEW systems might be clustered, but what about all the old systems that have been running for years and haven't been touched because they work, so why mess with them?
I worked at a telephone company that had a system whose admin had been fired two years earlier, and he was the only person who still had root access. The box continued to do its job and everybody simply looked the other way (it had no outside access that we knew of).
Posted Aug 4, 2011 9:37 UTC (Thu)
by elanthis (guest, #6227)
I for one was rather in favor of some of the old (dead) proposals a few distros had about having a "kill switch" in the init scripts that would refuse to allow Internet connectivity if the OS release date (from /etc/foo-release) was more than X years old.
We'd all be a lot better off if Microsoft had done this for Win9x/ME. ;)
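The age check at the heart of that proposal is simple to express. The sketch below assumes the release date is already known; typical /etc/*-release files carry a version string rather than a date, so where the date comes from is an open assumption. The function name and the five-year threshold in the usage lines are illustrative only, and actually blocking the network would be the init script's job, which is not shown.

```python
from datetime import date


def release_too_old(release_date: date, today: date, max_age_years: int) -> bool:
    """Hypothetical policy check: is this OS release past its allowed age?

    Where release_date comes from is left open; a real init script would
    need some reliable source for it.
    """
    try:
        cutoff = release_date.replace(year=release_date.year + max_age_years)
    except ValueError:
        # release_date is Feb 29 and the cutoff year is not a leap year.
        cutoff = date(release_date.year + max_age_years, 3, 1)
    return today > cutoff


if __name__ == "__main__":
    today = date(2011, 8, 4)
    print(release_too_old(date(2007, 4, 12), today, 5))  # False
    print(release_too_old(date(2000, 6, 1), today, 5))   # True
```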
Posted Aug 4, 2011 18:51 UTC (Thu)
by nix (subscriber, #2304)
Posted Aug 5, 2011 3:44 UTC (Fri)
by elanthis (guest, #6227)
My solution would basically be to print a "you're a moron who should've upgraded two major releases ago, you don't deserve to be on the Internet" message. Granted, I'm also the kind of person who thinks that cops should pull over people who don't use their turn signals, ask them for their license and registration, and then toss both into the nearest ditch while telling the driver that he's not competent enough to be allowed the privilege of operating a two-ton steel missile on wheels... ;)
Posted Jul 27, 2011 11:13 UTC (Wed)
by jamesh (guest, #1159)
It would only be a problem if you wanted to use the patents under less restrictive terms, or reimplement the idea from scratch.
Posted Jul 27, 2011 19:28 UTC (Wed)
by jhhaller (guest, #56103)
Telephony systems have long had the capability to support some limited changes, but typically required data structures to have some extra space in case it was needed, and typically used indirect function calls so that the pointer could be changed. Creating patches for these systems would require manual allocation of formerly free space. This is the prior art which seems to cover most if not all of the ksplice kernel functionality.
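The indirection technique described here can be modeled in a few lines: callers look up a routine in a table, so a live patch amounts to changing one pointer. In the Python sketch below a dictionary stands in for the function-pointer table; the routine names are hypothetical, and the reserved spare space in data structures mentioned above is not modeled.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[bytes], int]


def checksum_v1(data: bytes) -> int:
    # Original routine with a bug: it skips the last byte.
    return sum(data[:-1]) & 0xFF


def checksum_v2(data: bytes) -> int:
    # Corrected routine delivered by the patch.
    return sum(data) & 0xFF


# All callers go through this table instead of calling a routine directly,
# so a patch only has to change one entry.
DISPATCH: dict[str, Handler] = {"checksum": checksum_v1}


def call(name: str, data: bytes) -> int:
    return DISPATCH[name](data)


def apply_patch(name: str, new_handler: Handler) -> Handler:
    """Swap one table entry and return the old one so the patch can be backed out."""
    old = DISPATCH[name]
    DISPATCH[name] = new_handler
    return old


if __name__ == "__main__":
    print(call("checksum", b"\x01\x02\x03"))  # 3 (buggy)
    previous = apply_patch("checksum", checksum_v2)
    print(call("checksum", b"\x01\x02\x03"))  # 6
    apply_patch("checksum", previous)         # back out the patch
```

The cost of this design is that every call pays for an extra lookup, and the code must be written this way from the start; Ksplice's appeal is that it works on a kernel that was not designed for patching.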
Some of the problem of structure resizing is averted by staying within Red Hat 5 or 6 or CentOS 5 or 6, since kernel data structure changes are minimized there (kernel modules generally don't need to be recompiled between 5.X versions). I doubt there is a ksplice patch to go from 5.X to 6.X, and moving between arbitrary kernel versions is also likely to be difficult.
Posted Jul 28, 2011 3:30 UTC (Thu)
by jamesh (guest, #1159)
This is different to cases where you have GPL software that infringes third party patents.
Posted Jul 27, 2011 13:26 UTC (Wed)
by dsommers (subscriber, #55274)
So how will CentOS be able to provide ksplice updates *in addition* to their pending updates? And especially if they try to provide this for 6.x kernels, where there are no patch-by-patch files to work with.
I honestly believe CentOS should rather focus now on getting their current supported releases updated and improve their working methods to make releases go more smoothly before looking at something like ksplice.
Seriously! Scientific Linux has so far managed to complete its 6.0, 5.6 and 6.1_rc1, with a 6.1 release pending within the coming couple of weeks (or even earlier). CentOS has barely managed to get 6.0 out the door, with 5.6 a bit earlier. On their announce list there are not many CentOS 6 updates so far [2], even though Scientific Linux has had several [3].
Unless CentOS manages to improve its update processes and make them more efficient, I have no faith in CentOS being able to support ksplice in addition. It's a great goal, but I don't believe it's currently possible.
Why so harsh? I consider ksplice a poor man's cluster solution, for users who don't have a cluster to work with. If you have services which need to be running 24/7, you don't run them on a single box; you have a cluster where you can reboot each node individually in the meantime. Sites that can afford proper clusters can most likely afford RHEL or OEL.
ksplice is great for short-term, highly critical updates: to fix an exploit instantly and get the properly fixed kernel booted when the maintenance window comes. If CentOS continues to be slow with its kernel updates in the first place, there is little benefit in having ksplice at all. This could be turned around if CentOS manages to get ksplice updates released quicker than normal kernel updates. But in that case, I would claim their priorities are not really well aligned with the vast majority of their users.
[1] http://en.wikipedia.org/wiki/CentOS#Release_history
Posted Jul 29, 2011 2:25 UTC (Fri)
by sciurus (guest, #58832)
http://www.standalone-sysadmin.com/~matt/centos-sl-delays...
Posted Jul 29, 2011 13:08 UTC (Fri)
by dag- (guest, #30207)
Unfortunately, CentOS makes it quite obvious that they don't plan to open up the CentOS rebuild effort, so we officially don't know what caused the 5.6 release to slip 3 months, or the 6.0 release to be 8 months behind. (QA had been waiting for several weeks before anything was there to QA, and even then known fixes were slow to materialize).
http://lists.centos.org/pipermail/centos/2011-May/111670....
Having only 3 people know how to do and perform this rebuild effort is a liability, not just for getting security releases out the door.
Posted Jul 28, 2011 11:34 UTC (Thu)
by robbe (guest, #16131)
To my non-lawyerly mind, it most certainly does. Which means that not only has source to be provided, but the receiver of the patch stream is able to pass it on verbatim.
I guess Oracle will try to keep that from happening via the same tricks that Red Hat uses for its update service (see https://lwn.net/Articles/430098/), the shoe being on the other foot this time.
Posted Jul 29, 2011 13:13 UTC (Fri)
by dag- (guest, #30207)
http://www.redhat.com/archives/rhelv5-list/2011-July/msg0...
about Oracle being a licensee of the Open-Invention-Network patent portfolio would mean that there are no legal threats for a similar (Open Source) implementation.
http://en.wikipedia.org/wiki/Open_Invention_Network
So the patents are likely no real issue to anyone in the community?
Posted Jul 30, 2011 0:41 UTC (Sat)
by sbergman27 (guest, #10767)
Ubuntu LTS would actually be a good platform for such a service, depending upon the requirements of the vendor of the software that is intended to be "highly available". If the vendor requires RHEL or Oracle Linux (or whatever they are calling it these days) for support, then you're screwed whether you are using Ubuntu LTS, or CentOS, or Scientific Linux.
Posted Aug 1, 2011 19:10 UTC (Mon)
by cdmiller (guest, #2813)
Posted Aug 2, 2011 0:23 UTC (Tue)
by jspaleta (subscriber, #50639)
Are you a paying Canonical support customer in some fashion? A subscriber to Landscape management services, or perhaps paying for "Ubuntu Advantage" support with the Landscape service bundled in?
-jef
Posted Jul 30, 2011 17:04 UTC (Sat)
by sbergman27 (guest, #10767)
But still, I don't see a ksplice channel being attractive to folks who might require it until the CentOS project at least does a thorough and prolonged job of demonstrating that they are alive and dependable.
Sincerely,
Posted Aug 5, 2011 3:11 UTC (Fri)
by slashdot (guest, #22014)
However the GPLv2 clearly says:
So if RHEL can't be readily built, then informing both the FSF GPL violation contacts and Red Hat legal could be effective.
Posted Aug 5, 2011 4:00 UTC (Fri)
by elanthis (guest, #6227)
What the CentOS folks are complaining about is that simply having a bunch of SRPMs does not make it particularly easy to build them all in the right order and to install them correctly to bootstrap a working OS image comprised of all those individual GPLed components. Requiring that would be similar to requiring that you provide the rsync scripts you use to mirror ftp.gnu.org so that end-users of your mirror can recreate it.
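Working out that build order is, at its core, a topological sort of the packages' build-time dependencies. A minimal sketch using Python's standard graphlib module follows; the package names and dependencies are invented for illustration. Real package sets contain genuine dependency cycles that have to be broken by hand with bootstrap builds, which this sketch only detects and reports.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def build_order(build_requires: dict[str, set[str]]) -> list[str]:
    """Order source packages so each is built after everything it needs.

    build_requires maps a package to the packages it requires at build time.
    Raises graphlib.CycleError if the requirements are circular.
    """
    return list(TopologicalSorter(build_requires).static_order())


if __name__ == "__main__":
    # Made-up dependency data for illustration.
    deps = {
        "gcc": set(),
        "glibc": {"gcc"},
        "coreutils": {"glibc", "gcc"},
        "bash": {"glibc"},
    }
    print(build_order(deps))
    try:
        build_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle: needs a manual bootstrap step")
```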
Ah, but you don't need to keep RAM coherent after each operation.
You need to do it only after network communication and/or disk IO.
[2] http://thread.gmane.org/gmane.linux.centos.announce/
[3] http://listserv.fnal.gov/scripts/wa.exe?A1=ind1107&L=... (Look for SL6.x subjects)
Steve Bergman
Oklahoma City, OK (A suburb of Tuttle)
"For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable"
