Orthogonal elements

Posted Apr 29, 2008 22:26 UTC (Tue) by tialaramex (subscriber, #21167)
In reply to: Ksplice: kernel patches without reboots by danielpf
Parent article: Ksplice: kernel patches without reboots

These seem like orthogonal elements of an HA systems approach.

You seem to equate "there are two boxes and some HA software" with 100% availability, but
obviously it doesn't really deliver that, otherwise the phrase "belt, braces and skyhook"
wouldn't exist.

Suppose you have a two-box HA system, and one day you take down box A to upgrade the kernel.
But a short while into this procedure a tiny capacitor blows up inside box B, and the smoke
leaks out. Now you have zero boxes, and your HA solution has failed.

If, in contrast, you had moved the workload from box A to box B not so that you could reboot
box A into a new kernel, but just so that it could be patched using ksplice while quiescent,
then you'd survive this disaster with only mild inconvenience (box B having failed, box A
would suddenly become very busy, most likely causing your ksplice to fail).

Availability is a trade, usually against some combination of hardware investment,
administrative overheads and system performance. For some people ksplice could let them spend
a bit less money and get the same availability.


Orthogonal elements

Posted Apr 30, 2008 6:52 UTC (Wed) by NAR (subscriber, #1313)

Obviously you can't ever achieve 100% availability - what if the Sun turns into a supernova and
burns the whole Earth? Of course, this has very low probability, but I believe that hardware
failure on the live machine while the standby is being upgraded is also a pretty
low-probability event.

The other important thing: 8 fixes out of 50 couldn't be applied this way - in other words,
the user will have to reboot sooner or later anyway. So why not design the system so that it
can deal with reboots without a service outage, if that's really important?

Orthogonal elements

Posted Apr 30, 2008 12:20 UTC (Wed) by nix (subscriber, #2304)

Because it may be really *expensive*.

Fundamentally, whenever a system is taken out of service, whatever redundancy mechanisms are
in place may happen to fail at that moment. It is therefore wise to reduce the time for which a
machine is out of service. ksplice can, for a large proportion of smaller patches, reduce this
period to a fraction of a second, with no loss of non-persistent state.
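The window-of-exposure argument can be made concrete under a simple model. This Python sketch assumes, purely for illustration, that the surviving box fails as a Poisson process with a hypothetical MTBF of 50,000 hours, that a reboot keeps its partner out of service for 5 minutes, and that a ksplice pause lasts half a second. None of these numbers come from the thread. Under that model, the chance of losing the second box scales almost linearly with the length of the window.

```python
import math


def failure_probability(window_seconds: float, mtbf_hours: float) -> float:
    """Chance that the surviving box fails at least once during a window,
    assuming failures arrive as a Poisson process with rate 1/MTBF."""
    rate_per_second = 1.0 / (mtbf_hours * 3600.0)
    # 1 - exp(-x), computed with expm1 so tiny windows keep their precision
    return -math.expm1(-rate_per_second * window_seconds)


MTBF_HOURS = 50_000.0    # hypothetical MTBF for box B
REBOOT_WINDOW_S = 300.0  # hypothetical: 5 minutes out of service for a reboot
SPLICE_WINDOW_S = 0.5    # hypothetical: "a fraction of a second" for ksplice

p_reboot = failure_probability(REBOOT_WINDOW_S, MTBF_HOURS)
p_splice = failure_probability(SPLICE_WINDOW_S, MTBF_HOURS)

print(f"P(box B fails during reboot window):  {p_reboot:.3e}")
print(f"P(box B fails during ksplice window): {p_splice:.3e}")
print(f"exposure ratio: {p_reboot / p_splice:.0f}x")
```

The absolute probabilities depend entirely on the assumed MTBF. What matters is the ratio, which tracks the ratio of the two windows. The model also ignores correlated failures and the extra load on the surviving box.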

And that seems worthwhile to me.

Orthogonal elements

Posted Apr 30, 2008 12:48 UTC (Wed) by NAR (subscriber, #1313)

> Because it may be really *expensive*.

I believe in this case there's no free lunch. I think a user who can't accept a couple of minutes of downtime for a reboot, but isn't willing to invest in spare hardware, won't really get high availability, only a sense of high availability.

I've checked CVE and found 10 Linux kernel problems for this year. At this rate there could be 30 such problems in a year - if all of them need a reboot, then at 2-3 minutes per reboot that's about 60-90 minutes of downtime. I guess a hardware failure would lead to a longer downtime (it takes time to get the new hardware), and a hardware failure can't be scheduled for off-business or off-peak hours (unlike the kernel change).
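The back-of-the-envelope estimate can be spelled out as a small Python calculation. The 2-3 minutes per reboot follows from the comment's own numbers (60-90 minutes over 30 reboots). Treating "this year" as roughly four months of 2008 and counting a 365-day year are assumptions. The conversion to an availability percentage is an extra step that does not appear in the comment.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def annualize(count: int, months_elapsed: float) -> float:
    """Extrapolate a year-to-date count to a 12-month rate."""
    return count * 12.0 / months_elapsed


def yearly_downtime(reboots: float, minutes_per_reboot: float) -> float:
    """Total scheduled downtime, in minutes, if every fix needs a reboot."""
    return reboots * minutes_per_reboot


# 10 kernel CVEs so far; assume about 4 months of 2008 had elapsed by late April.
cves_per_year = annualize(10, 4.0)         # 30.0

best = yearly_downtime(cves_per_year, 2)   # 60.0 minutes
worst = yearly_downtime(cves_per_year, 3)  # 90.0 minutes

for label, minutes in (("best", best), ("worst", worst)):
    availability = 1 - minutes / MINUTES_PER_YEAR
    print(f"{label}: {minutes:.0f} min/year -> {availability:.4%} available")
```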

To me, this solution looks more like a toy (or a weird hack, as it was put earlier in this thread). It's nice, but I wouldn't use it for real.

Orthogonal elements

Posted May 1, 2008 3:56 UTC (Thu) by himi (guest, #340)

60-90 minutes of downtime for thirty reboots? Hell, I couldn't guarantee that with the tiny
little Dell PE860s I'm working with at the moment - it'd be more like 300 minutes for some of
the larger and more interesting hardware configurations we use in my organisation. Or far
more, when one of those reboots includes an fsck of a multi-terabyte filesystem.

ksplice sounds like a weird kludge at first, but I can actually see it being very useful,
particularly for systems running unmodified vendor kernels. It might fail occasionally (and a
1-in-5 failure rate is pretty good for something like this), but that still means 80% fewer
reboots.
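himi's reasoning can be made concrete in a few lines of Python. The 10 minutes per reboot is inferred from "300 minutes" for thirty reboots. The sketch also assumes that a failed ksplice attempt costs nothing beyond the reboot it falls back to. For comparison, it plugs in the 8-out-of-50 figure quoted earlier in the thread as well as the rounded 1-in-5 rate.

```python
def yearly_reboot_minutes(patches: int, minutes_per_reboot: float,
                          splice_failure_rate: float = 1.0) -> float:
    """Downtime when each patch is first tried with ksplice and only the
    failures fall back to a reboot (a rate of 1.0 means: always reboot).
    Assumes a failed ksplice attempt adds no downtime of its own."""
    return patches * splice_failure_rate * minutes_per_reboot


PATCHES_PER_YEAR = 30     # the extrapolated yearly CVE count from the thread
MINUTES_PER_REBOOT = 10   # assumption: 300 minutes / 30 reboots

reboot_only = yearly_reboot_minutes(PATCHES_PER_YEAR, MINUTES_PER_REBOOT)

for label, rate in (("1 in 5", 1 / 5), ("8 of 50", 8 / 50)):
    with_splice = yearly_reboot_minutes(PATCHES_PER_YEAR, MINUTES_PER_REBOOT, rate)
    saved = 1 - with_splice / reboot_only
    print(f"{label}: {with_splice:.0f} of {reboot_only:.0f} min, {saved:.0%} fewer reboots")
```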

himi

Orthogonal elements

Posted Apr 30, 2008 12:26 UTC (Wed) by pboddie (guest, #50784)

> Obviously you can't ever achieve 100% availability - what if the Sun turns into a supernova and burns the whole Earth?

You'll be glad you had that redundant Sun and redundant Earth floating around! ;-)

Orthogonal elements

Posted Apr 30, 2008 23:16 UTC (Wed) by nix (subscriber, #2304)

Hence the term `hot spare'...

Orthogonal elements

Posted May 1, 2008 4:18 UTC (Thu) by lysse (guest, #3190)

> Obviously you can't ever achieve 100% availability - what if the Sun turns into a supernova
> and burns the whole Earth? Of course, this has very low probability

...not to mention that if your clients' main concern in such an eventuality is the
discontinuation of their service, there are a few thousand UFOlogists out there who are going
to want to examine your contract very closely...

Orthogonal elements

Posted Apr 30, 2008 14:36 UTC (Wed) by Los__D (guest, #15263)

Dammit, we lost the magic smoke again.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds