Ksplice: kernel patches without reboots


Posted Apr 29, 2008 21:00 UTC (Tue) by danielpf (subscriber, #4723)
Parent article: Ksplice: kernel patches without reboots

For those wanting 100% uptime, the dual- (or more-) computer "high availability" (HA)
configuration already exists.  This technique covers many more potential problems than kernel
patches do.

So one can wonder whether the ksplice feature is really useful. It could perhaps change the uptime
from 99.995% to 99.996%, but those wanting 100% would need an HA configuration anyway. 
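Those availability figures translate into concrete downtime budgets; a quick back-of-the-envelope sketch (the conversion is the standard minutes-per-year one; the percentages are the ones from the comment above):

```python
# Convert an availability percentage into minutes of downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(availability_pct):
    """Yearly downtime budget implied by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

print(f"{downtime_minutes(99.995):.1f} min/year at 99.995%")  # ~26.3
print(f"{downtime_minutes(99.996):.1f} min/year at 99.996%")  # ~21.0
```

so the difference at stake here is on the order of five minutes of downtime a year.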






Ksplice: kernel patches without reboots

Posted Apr 29, 2008 22:16 UTC (Tue) by gravious (guest, #7662)

You are quite right.

As has been pointed out here recently, the system as a whole may achieve 100% availability by
using pervasive redundancy but the individual machines in the system won't. The associated PDF
suggests that there may be long lived processes that would be intolerant to any outages due to
their inability to save state properly or would exhibit a certain unhappiness with network
connections dying. One would suggest that a system administrator designing and implementing an
HA environment would make sure that each and every piece of software on each machine could
handle a reboot if necessary.

While I love the idea of hot updating and while I think this implementation is a fine fine
hack worthy of much admiration I would welcome a concrete use case other than, "look I have 5
years, 2 months, 4 hours and 16 minutes uptime".

Ksplice: kernel patches without reboots

Posted Apr 30, 2008 2:52 UTC (Wed) by gdt (subscriber, #6284)

There are many cases where the redundancy path to high availability is too expensive. Linux running on a closet Ethernet switch would be one example. Are you really going to provision a second set of switches and cabling for the average corporate desktop computer?

Ksplice: kernel patches without reboots

Posted Apr 30, 2008 2:57 UTC (Wed) by dlang (subscriber, #313)

there is not much overlap between the set of uses where cost prohibits having redundant
hardware and the set of uses where a system cannot be down for a reboot.

Many upgrades to Cisco equipment require a reboot, and they are used in many places that are
extremely sensitive to outages.

Ksplice: kernel patches without reboots

Posted May 1, 2008 2:33 UTC (Thu) by a9db0 (subscriber, #2181)

Here's one: telecom

Big phone switches aren't usually redundant, and are frequently utilized 24x7x365(6).  The
rise in VoIP has brought this into higher relief, and I'd expect to see some of the telecom
folks looking very hard at this.

My uptime is never that good - the power around here is way too flaky, even for my oversized
UPS.


Ksplice: kernel patches without reboots

Posted May 1, 2008 16:42 UTC (Thu) by piggy (subscriber, #18693)

The reason big phone switches have traditionally been non-redundant is that they were
staggeringly expensive when first created.

The telecom industry is still absorbing the consequences of a 1000X improvement in both price
and performance. Reliability of individual components has also dropped by a couple orders of
magnitude, so redundancy is becoming the solution of choice.

I agree with earlier assertions that the disjunction between businesses who want long uptimes
and those willing to put in redundant equipment is vanishingly small.

Perhaps individuals after long uptime for geek-cred are a large enough population to sustain
ksplice.

Ksplice: kernel patches without reboots

Posted Apr 30, 2008 18:14 UTC (Wed) by droundy (subscriber, #4559)

Indeed.  Another example would be that of scientific computing.  If I've got a job that has
been running for a couple of weeks, and will finish in just a couple weeks more, I'd rather
not reboot the system.  Redundancy gains me nothing (since I'm utilizing all my computing
resources already).  The code could be trained to checkpoint (and some of my code is so
enabled), but that generally has a high cost (in terms of bandwidth and disk use), so you
don't want to checkpoint very often, if at all.
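A minimal checkpoint/restore scheme along those lines might look like this (a hypothetical sketch using Python's pickle; the `path` argument, the state dict, and the atomic write-then-rename are all illustrative assumptions, not the commenter's actual code):

```python
import os
import pickle

def load_state(path, initial):
    """Resume from a checkpoint file if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return initial

def save_state(path, state):
    """Checkpoint atomically: write a temp file, then rename over the old
    checkpoint, so a crash mid-write never leaves a corrupt file behind."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

# Hypothetical long-running loop: checkpoint only every 1000 steps,
# since each save costs disk bandwidth and time.
state = load_state("job.ckpt", {"step": 0})
while state["step"] < 5000:
    state["step"] += 1  # stand-in for real computation
    if state["step"] % 1000 == 0:
        save_state("job.ckpt", state)
```

The infrequent interval is exactly the tradeoff described above: you accept losing up to 1000 steps on a crash in exchange for cheaper steady-state running.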

Ksplice: kernel patches without reboots

Posted Apr 30, 2008 20:53 UTC (Wed) by dlang (subscriber, #313)

any system like this should be isolated anyway, so delaying the security update for a week or
a month to let your job finish should not be a big problem.

remember that if a box is not exposed it doesn't need a security update.

Ksplice: kernel patches without reboots

Posted May 1, 2008 0:22 UTC (Thu) by gdt (subscriber, #6284)

> any system like this should be isolated anyway

In practice that's increasingly difficult. Datasets are growing so large that the last thing you want is two copies of them, so you end up with the input data being remotely hosted and pulled across the Internet on demand. It's this sort of use that the academic community created the Internet for.

The other problem with scientific computing is simply that I might not want to reboot the system at this moment. Imagine that I've concurrently booked four radiotelescopes, which is about a six-month wait. I've got them streaming into my processing cluster. A security patch arrives. If I apply the patch and reboot then I lose resolution, and thus my experiment may be inconclusive. If I don't apply the patch and the machine is subverted then there are data integrity issues and again the experiment is inconclusive. In both cases I wait another six months and try again. My favoured choice would be to apply the patch whilst still running the telescope correlation.

I'm not saying that ksplice is the best thing since sliced bread. But it does have some use, particularly outside of the typical server applications that Linux is generally used for.

Ksplice: kernel patches without reboots

Posted May 1, 2008 0:36 UTC (Thu) by dlang (subscriber, #313)

if you don't have more than one copy of your data you run the serious risk of losing it. 

even for huge datasets, it's cheaper to keep an extra copy than to recreate the data.

I'm not saying that ksplice is worthless, I'm disagreeing with the idea that was posted that
it's required for these situations.

Ksplice: kernel patches without reboots

Posted May 1, 2008 12:57 UTC (Thu) by nix (subscriber, #2304)

Yeah, but using an extra copy for failover requires that it be online 
*now*. Using an extra copy for redundancy only does not require that (and 
is much cheaper: how will you keep an extra online copy of the ATLAS 
detector's collected data? It's far too large to keep even *one* copy at 
any one site: keeping an extra online copy means doubling the size of an 
already large collaboration...)

Ksplice: kernel patches without reboots

Posted May 8, 2008 11:40 UTC (Thu) by anandsr21 (guest, #28562)

Do you know how much data Google keeps? And they keep three copies, not two, in
geographically separated locations. So the solution is essentially to make multiple copies.
Actually, as Google has shown, even two copies are not enough.

Ksplice: kernel patches without reboots

Posted May 1, 2008 13:04 UTC (Thu) by richardr (guest, #14799)

But the point about academic workloads is that often we use every desktop in the department as
a distributed supercomputer, so the nodes are both exposed to every possible attack because
people want their desktops accessible from outside (at least via ssh) and want to be able to
surf the web, and may be running background jobs for weeks at a time belonging to other people
who don't want them to be restarted. The conflict between these two factors is where this kind
of technology becomes important.

Ksplice: kernel patches without reboots

Posted May 1, 2008 21:20 UTC (Thu) by dlang (subscriber, #313)

if you are running on random desktops that are used for other things, your software had better
be able to handle reboots/crashes/power outages anyway as those events will happen.

while I see some use for live patching, I really don't see where it becomes a killer feature.

Orthogonal elements

Posted Apr 29, 2008 22:26 UTC (Tue) by tialaramex (subscriber, #21167)

These seem like orthogonal elements of an HA systems approach.

You seem to equate "there are two boxes and some HA software" with 100% availability, but
obviously it doesn't really deliver that, otherwise the phrase "belt, braces and skyhook"
wouldn't exist.

Suppose you have a two-box HA system, and one day you take down box A to upgrade the kernel.
But just a short while into this procedure a tiny capacitor blows up inside box B, and the smoke
leaks out. Now you have zero boxes, and your HA solution has failed.

If, in contrast, you had moved the workload from box A to box B not so that you could reboot
box A into a new kernel, but just so that it could be patched using ksplice while quiescent,
then you'd survive this disaster with only mild inconvenience (box B having failed, box A
would suddenly become very busy, most likely causing your ksplice to fail).

Availability is a trade, usually against some combination of hardware investment,
administrative overheads and system performance. For some people ksplice could let them spend
a bit less money and get the same availability.

Orthogonal elements

Posted Apr 30, 2008 6:52 UTC (Wed) by NAR (subscriber, #1313)

Obviously you can't ever achieve 100% availability - what if the Sun turns into a supernova and
burns the whole Earth? Of course, this has very low probability, but I believe that hardware
failure on the live machine while the standby is being upgraded is also a pretty low
probability event. 

The other important thing: 8 fixes out of 50 couldn't be applied this way - in other words:
the user will have to reboot anyway, sooner or later. So why not design the system so that
it can deal with reboots without a service outage, if that's really important?

Orthogonal elements

Posted Apr 30, 2008 12:20 UTC (Wed) by nix (subscriber, #2304)

Because it may be really *expensive*.

Fundamentally, if ever a system is taken out of service, whatever redundancy mechanisms may be
in place may happen to fail then. Thus it is wise to reduce the time for which a machine is
out of service. ksplice can, for a large proportion of smaller patches, reduce this period to
a fraction of a second, with no loss of non-persistent state.

And that seems worthwhile to me.

Orthogonal elements

Posted Apr 30, 2008 12:48 UTC (Wed) by NAR (subscriber, #1313)

> Because it may be really *expensive*.

I believe in this case there's no free lunch. I think a user who can't accept a couple of minutes of downtime for a reboot, but isn't willing to invest in spare hardware, won't really get high availability, only a sense of high availability.

I've checked CVE and found 10 Linux kernel problems so far this year. At this rate there could be 30 such problems in a year - if all of them need a reboot, then that's about 60-90 minutes of downtime. I guess a hardware failure would lead to a longer downtime (it takes time to get the new hardware), and a hardware failure isn't schedulable to off-business or off-peak hours (unlike the kernel change).
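The reverse calculation - what that reboot schedule does to availability - is equally quick. A hypothetical sketch (the ~30 fixes/year rate is from the comment above; the 2-3 minutes per reboot is an assumed figure):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def availability_pct(downtime_minutes_per_year):
    """Availability implied by a yearly downtime total."""
    return 100 * (1 - downtime_minutes_per_year / MINUTES_PER_YEAR)

for minutes_per_reboot in (2, 3):
    downtime = 30 * minutes_per_reboot  # ~30 reboots a year
    print(f"{downtime} min/year -> {availability_pct(downtime):.4f}% available")
```

i.e. reboot-only patching still leaves better than 99.98% availability, before counting hardware failures.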

To me, this solution looks to be more like a toy (or a weird hack, as it was put earlier in this thread). It's nice, but I wouldn't use it for real.

Orthogonal elements

Posted May 1, 2008 3:56 UTC (Thu) by himi (guest, #340)

60-90 minutes of downtime for thirty reboots? Hell, I couldn't guarantee that with the tiny
little Dell PE860s I'm working with at the moment - it'd be more like 300 minutes for some of
the larger and more interesting hardware configurations we use in my organisation. Or far
more, when one of those reboots includes an fsck of a multi-terabyte filesystem.

ksplice sounds like a weird kludge at first but I can actually see it being very useful,
particularly for systems running unmodified vendor kernels. It might fail occasionally (and a
1 in 5 failure rate is pretty good for something like this) but that still means 80% fewer
reboots.

himi

Orthogonal elements

Posted Apr 30, 2008 12:26 UTC (Wed) by pboddie (guest, #50784)

> Obviously you can't ever achieve 100% availability - what if the Sun turns into a supernova and burns the whole Earth?

You'll be glad you had that redundant Sun and redundant Earth floating around! ;-)

Orthogonal elements

Posted Apr 30, 2008 23:16 UTC (Wed) by nix (subscriber, #2304)

Hence the term `hot spare'...

Orthogonal elements

Posted May 1, 2008 4:18 UTC (Thu) by lysse (guest, #3190)

> Obviously you can't ever achieve 100% availability - what if the Sun turns into a supernova
> and burns the whole Earth? Of course, this has very low probability

...not to mention that if your clients' main concern in such an eventuality is the
discontinuation of their service, there are a few thousand UFOlogists out there who are going
to want to examine your contract very closely...

Orthogonal elements

Posted Apr 30, 2008 14:36 UTC (Wed) by Los__D (guest, #15263)

Dammit, we lost the magic smoke again.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds