Who will want rebootless upgrades?

Posted Jul 27, 2011 1:39 UTC (Wed) by dlang (✭ supporter ✭, #313)
In reply to: Who will want rebootless upgrades? by mleu
Parent article: Ksplice and CentOS

VM failover doesn't actually work well if the system really fails; it only works well for planned outages.

it's impossible to keep the entire system state in sync on another machine in real time (the bandwidth to RAM is _much_ higher than the bandwidth to the network), so if your system crashes, some of its state has not been replicated yet.

when doing a planned outage, you can pause the system while you sync state and do a seamless failover.
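A minimal sketch of the "pre-copy" approach commonly used for planned migrations, in a deliberately crude toy model: guest memory is a set of page numbers, and the guest dirties a fixed number of random pages per copy round. `precopy_migrate` and its parameters are hypothetical. Pages are copied while the guest keeps running, and the guest is only paused for the final, small batch.

```python
import random

def precopy_migrate(num_pages, dirty_per_round, max_rounds, stop_threshold, seed=0):
    """Toy pre-copy migration.

    Returns (pages sent while the guest ran, pages sent while it was paused).
    """
    rng = random.Random(seed)
    dirty = set(range(num_pages))  # first round: every page has to be sent
    sent_live = 0
    for _ in range(max_rounds):
        if len(dirty) <= stop_threshold:
            break  # remaining work is small enough to do during a short pause
        sent_live += len(dirty)  # copy the current dirty set while the guest runs
        # while that copy was in flight, the guest dirtied some pages again
        dirty = {rng.randrange(num_pages) for _ in range(dirty_per_round)}
    # stop-and-copy: pause the guest and send whatever is still dirty
    return sent_live, len(dirty)

live, paused = precopy_migrate(num_pages=100_000, dirty_per_round=500,
                               max_rounds=30, stop_threshold=1_000)
print(f"sent while running: {live} pages, sent while paused: {paused} pages")
```

In this model the pause only has to cover what was dirtied during one round, not all of RAM. If the source host dies partway through the loop instead, the pages still in `dirty` never reached the other machine, which is exactly the unreplicated state described above.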

but in an unplanned failure, you are out of luck.

Who will want rebootless upgrades?

Posted Jul 27, 2011 4:58 UTC (Wed) by dtlin (✭ supporter ✭, #36537)

With 10Gb Ethernet links now more widely available, network bandwidth is not that far outpaced by main memory bandwidth.
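A back-of-the-envelope version of that comparison, assuming peak theoretical figures for dual-channel DDR3-1333 memory and a 10Gb/s link, and ignoring protocol overhead. The numbers are illustrative assumptions, not measurements of any particular machine.

```python
# Assumed peak figures, for illustration only
DDR3_1333_TRANSFERS_PER_S = 1333e6  # 1333 MT/s
BYTES_PER_TRANSFER = 8              # one 64-bit memory channel
TEN_GBE_BITS_PER_S = 10e9           # 10 Gb/s line rate

def memory_bandwidth_gbs(transfers_per_s, bytes_per_transfer, channels=1):
    """Peak memory bandwidth in GB/s (10**9 bytes per second)."""
    return transfers_per_s * bytes_per_transfer * channels / 1e9

def network_bandwidth_gbs(bits_per_s):
    """Raw link bandwidth in GB/s, ignoring framing and protocol overhead."""
    return bits_per_s / 8 / 1e9

mem = memory_bandwidth_gbs(DDR3_1333_TRANSFERS_PER_S, BYTES_PER_TRANSFER, channels=2)
net = network_bandwidth_gbs(TEN_GBE_BITS_PER_S)
print(f"memory: {mem:.2f} GB/s, 10GbE: {net:.2f} GB/s, ratio: {mem / net:.1f}x")
```

Under these assumptions the gap is roughly 17x; more memory channels widen it, while faster links, or replicating only the pages that actually changed, narrow it.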

Stratus Avance keeps full system state synchronized between two nodes in real time, or something close to it; I think it was a year or two ago that I first saw a demo of an unplanned hardware outage causing a software failover with only a few seconds of downtime.

Who will want rebootless upgrades?

Posted Jul 27, 2011 13:12 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

Ah, but you don't need to keep RAM coherent after each operation. You need to do it only after network communication and/or disk IO.

It's almost like quantum mechanics: until you observe a computer, its state is indeterminate. So you just keep the master computer running and replicate RAM changes only when it communicates with something external. If the master computer fails, the slave will have the state the master had after its last I/O operation and will just continue from that point.
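A toy sketch of the scheme described above, sometimes called output commit: the primary tracks which memory it has dirtied and ships those changes to the backup, waiting for an acknowledgement, before any output is allowed to leave the machine. The `Primary` and `Backup` classes and the dict-as-memory model are hypothetical; a real implementation does this at the hypervisor level with dirty-page tracking and buffered network and disk I/O (the Remus project for Xen worked along these lines).

```python
from dataclasses import dataclass, field

@dataclass
class Backup:
    memory: dict = field(default_factory=dict)

    def apply_checkpoint(self, pages):
        """Install the primary's dirtied pages and acknowledge."""
        self.memory.update(pages)
        return True

@dataclass
class Primary:
    backup: Backup
    memory: dict = field(default_factory=dict)
    dirty: dict = field(default_factory=dict)     # changed since the last checkpoint
    released: list = field(default_factory=list)  # output the outside world has seen

    def write(self, addr, value):
        # purely internal computation: nothing is replicated yet
        self.memory[addr] = value
        self.dirty[addr] = value

    def send_external(self, message):
        # output commit: replicate dirty state and wait for the ack
        # before the message is allowed to leave the machine
        if self.backup.apply_checkpoint(dict(self.dirty)):
            self.dirty.clear()
            self.released.append(message)

p = Primary(Backup())
p.write("x", 1)
p.write("x", 2)
p.write("y", 3)
p.send_external("reply to client")  # checkpoint first, then release the reply
p.write("z", 4)                     # not replicated yet
# If the primary dies now, the backup holds x=2 and y=3, and the outside world has
# only seen "reply to client"; z was never observed, so losing it is harmless.
print(p.backup.memory, p.released, p.dirty)
```

Overwriting `x` twice costs nothing extra: only the latest value goes out at the next checkpoint, which is why this is far cheaper than keeping RAM coherent after every write.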

This actually works surprisingly well in practice. We use it to run legacy QNX software inside a Xen cluster.

Who will want rebootless upgrades?

Posted Jul 27, 2011 16:22 UTC (Wed) by raven667 (subscriber, #5198)

That must be how the VMware Fault Tolerance feature works as well: it keeps two VMs running in lockstep on different pieces of hardware, with low-latency failover if one of the hosts dies. In any event, the point is that this feature is not impossible and exists in shipping software right now.

Who will want rebootless upgrades?

Posted Jul 28, 2011 12:59 UTC (Thu) by nix (subscriber, #2304)

Quite so. The ReVirt reversible VM worked this way, but it was a research project and died :(

Who will want rebootless upgrades?

Posted Jul 28, 2011 21:35 UTC (Thu) by cmccabe (guest, #60281)

> Ah, but you don't need to keep RAM coherent after each operation.
> You need to do it only after network communication and/or disk IO.

It depends on how the application is written. And given that fsync still gets a blank stare from most application developers, I'm not optimistic.
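For reference, the write, fsync, rename pattern that application developers are regularly urged to use, sketched in Python under the assumption of a POSIX system (calling fsync on a directory descriptor does not work on Windows). `durable_write` is a hypothetical helper, not a library function.

```python
import os
import tempfile

def durable_write(path, data):
    """Replace the contents of path with data (bytes) so the new contents survive a crash."""
    directory = os.path.dirname(os.path.abspath(path))
    # temporary file in the same directory, so the rename below stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to put the data on stable storage
        os.replace(tmp_path, path)  # atomic: readers see old or new contents, never half
    except BaseException:
        os.unlink(tmp_path)
        raise
    # fsync the directory too, so the rename itself is durable (POSIX only)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "state.txt")
    durable_write(target, b"first version")
    durable_write(target, b"second version")
    with open(target, "rb") as f:
        contents = f.read()
    leftovers = sorted(os.listdir(d))
print(contents, leftovers)
```

Skipping the fsync calls makes the function faster, and that speed difference is usually why they get left out.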

Database-backed apps might have a little bit more of a chance, because database designers thought about these issues. But most database replication is still semi-synchronous rather than synchronous, because people don't want to pay the speed cost. There are also clustered databases and database engines, but they are the exception rather than the rule.

Who will want rebootless upgrades?

Posted Jul 28, 2011 22:17 UTC (Thu) by dlang (✭ supporter ✭, #313)

one word:

mmap

Who will want rebootless upgrades?

Posted Aug 3, 2011 23:15 UTC (Wed) by cmccabe (guest, #60281)

mmap is great, but without calling msync from time to time, you have no guarantee that the data in memory matches what is on the disk. If you crash before the kernel chooses to write those pages back to disk, that work is gone.
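A minimal Python illustration, assuming a Unix system, where `mmap.mmap.flush()` is implemented with msync: stores through the mapping only dirty pages in memory, and `flush()` is the point after which the changes are known to have been written back. `update_mapped_file` is a hypothetical helper.

```python
import mmap
import os
import tempfile

def update_mapped_file(path, offset, data):
    """Overwrite len(data) bytes at offset through a shared mapping, then msync."""
    with open(path, "r+b") as f:
        # length 0 maps the whole (non-empty) file; on Unix the mapping is shared
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset:offset + len(data)] = data  # only dirties pages in memory
            mm.flush()  # on Unix this calls msync(); returns once pages are written back

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.bin")
    with open(path, "wb") as f:
        f.write(b"hello world")
    update_mapped_file(path, 6, b"WORLD")
    with open(path, "rb") as f:
        result = f.read()
print(result)
```

Note that `flush()` only bounds the write-back from one side: the kernel is free to write dirty pages back earlier on its own.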

Who will want rebootless upgrades?

Posted Aug 5, 2011 22:07 UTC (Fri) by dlang (✭ supporter ✭, #313)

my point is that if you use mmap, you don't know when a write to disk takes place; it could be well before you call msync.

Who will want rebootless upgrades?

Posted Jul 29, 2011 17:38 UTC (Fri) by giraffedata (subscriber, #1954)

>> Ah, but you don't need to keep RAM coherent after each operation.
>> You need to do it only after network communication and/or disk IO.
>
> It depends on how the application is written.

I don't see how. The principle seems sound; how would you write an application that won't fail over properly without multiple synchronizations of memory between I/Os?

The only thing I can think of is something where time itself is part of the function -- e.g. if a query has to be answered within 30 seconds and the primary has been working on it for 25 seconds before he dies, the backup won't be able to respond within 30 seconds of the query unless he knows what the primary computed between the query and the failure.

Who will want rebootless upgrades?

Posted Jul 29, 2011 19:27 UTC (Fri) by raven667 (subscriber, #5198)

The secondary host is calculating the result at the same time as the primary using the same data. Maybe this demonstration will help make more sense.

http://www.youtube.com/watch?v=NCMMwGC0hD8

Fast-forward to the 4-5 minute mark if you want to skip the explanation.

Who will want rebootless upgrades?

Posted Jul 27, 2011 15:15 UTC (Wed) by mleu (subscriber, #73224)

When you have a single application instance inside a cluster resource and the cluster node fails, you also need to restart the application. It will recover faster than a whole VM can boot, but still, it is a lot easier to encapsulate an application in a VM than to make it cluster-aware.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds