TCP connection hijacking and parasites - as a good thing

TCP connection hijacking and parasites - as a good thing

Posted Aug 15, 2011 20:50 UTC (Mon) by raven667 (subscriber, #5198)
In reply to: TCP connection hijacking and parasites - as a good thing by dlang
Parent article: TCP connection hijacking and parasites - as a good thing

I think you are confused and talking about two different things. If a system dies, the VMs which were running on it crash and are booted on other systems in the cluster; there are no snapshots or time warps involved. This has nothing to do with live migration, which does not in any way make machines more highly available, except that one can manually live migrate VMs off of one host in a cluster to do maintenance on it. Live migration does not cause time warps; at most the system is unavailable on the network for a few moments while the switches relearn what port its MAC address is coming from.

Of course, claiming that VMs solve our HA/DR problems like so much secret sauce is BS unless the application is also made resilient to the same kinds of problems that can happen on bare metal, such as unexpected crashes, and there is some way to handle DR, which is very different from HA.

Actually VMware does have a feature now called Fault Tolerance where a VM is run in lockstep on two different nodes in a cluster. AFAIK they are kept running in sync, instruction for instruction, so there would be no time warp during a failover event. There are some product demos on YouTube showing a video file playing over a remote desktop without skipping a frame during a failover. That could work as an HA solution but would do nothing for DR.



TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 0:08 UTC (Tue) by dlang (subscriber, #313) [Link]

I fully agree that for live migration when nothing is wrong, virtual machines (and containers with checkpoint/restore) can be very nice.

In this case the vendor was claiming that this also solved crash problems because VMware would just restart the application on the other server exactly where it left off when the first server crashed.

I doubt that the VMware Fault Tolerance would work for any system that used random numbers, as I don't see how they could find all the places that generate or use random numbers and make both machines produce exactly the same results.

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 20:36 UTC (Tue) by robbe (subscriber, #16131) [Link]

> In this case the vendor was claiming that this also solved crash problems because vmware would just restart the application on the other server exactly where it left off when the first server crashed.

VMware's HA feature just restarts the VM affected by a server crash on another ESX server. From the perspective of the VM it looks like a spontaneous reboot. This guarantees minimum downtime, but does not make the crash invisible to the application.

> I doubt that the vmware Fault Tolerance would work for any system that used random numbers [...]

Remember that all the HW except the CPU is emulated. Fault Tolerance works by replicating the external events (e.g. an incoming network packet) seen by the original VM at the same point in time on the backup VM. The usual sources of entropy will therefore be identical on both machines.
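To make the replay idea concrete, here is a toy sketch in Python. It is not how VMware implements anything; `ToyVM`, `run_primary` and `run_backup` are hypothetical names made up for illustration, and `os.urandom` merely stands in for whatever nondeterministic input (packet arrival, timer, entropy) the primary happens to see. The point is only that if every such input is logged and replayed in the same order, a guest whose state depends solely on those inputs ends up with the same "random" numbers on both sides.

```python
import hashlib
import os


class ToyVM:
    """Hypothetical guest whose state changes only through delivered events."""

    def __init__(self):
        self.state = hashlib.sha256(b"boot")
        self.steps = 0

    def deliver(self, event: bytes) -> None:
        # Every external event is mixed into the guest state.
        self.state.update(event)
        self.steps += 1

    def random_bytes(self, n: int = 16) -> bytes:
        # Guest "RNG": derived purely from the accumulated state (n <= 32).
        return hashlib.sha256(self.state.digest() + b"rng").digest()[:n]


def run_primary(vm: ToyVM, n_events: int) -> list:
    """Feed nondeterministic inputs to the primary and record each one."""
    log = []
    for _ in range(n_events):
        event = os.urandom(8)  # assumption: stands in for any external input
        log.append(event)
        vm.deliver(event)
    return log


def run_backup(vm: ToyVM, log: list) -> None:
    """Replay the recorded inputs, in order, on the backup."""
    for event in log:
        vm.deliver(event)


primary, backup = ToyVM(), ToyVM()
event_log = run_primary(primary, 5)
run_backup(backup, event_log)
same = primary.random_bytes() == backup.random_bytes()
```

The sketch says nothing about entropy that bypasses the hypervisor; it only illustrates the case where every input the guest can observe goes through the log.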

Maybe someone could deliberately construct a program that runs differently in the original and the backup VM. The only thing I can think of right now is measuring time via a covert channel (the normal means like RDTSC are covered, of course) -- but this is too vague still to make into an "exploit".

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 20:52 UTC (Tue) by dlang (subscriber, #313) [Link]

Given that the entropy sources include nanosecond timing and, if the CPU supports it, thermal noise on the CPU, I _really_ doubt that the resulting random numbers would be the same on the two systems.

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 22:12 UTC (Tue) by raven667 (subscriber, #5198) [Link]

I think you underestimate the state of the art. Considering how often modern systems use random numbers, I can't imagine this case not being handled, at the very least by disabling any hardware mechanism that could introduce non-determinism.

Looking at the VMware FT docs, the hypervisor for the primary VM very thoroughly records anything that could change state, and does not pass it through to the primary until it has been transmitted to, and receipt acknowledged by, the hypervisor for the secondary VM. Features such as SMP or hardware MMU are disabled because their state can't be recorded and could introduce non-determinism. Each event is injected into the secondary at the same execution point. That certainly has to work with nanosecond timing, from the point of view inside the secondary VM. According to wall clock time the secondary will always be lagging behind (the demos show lag in the millisecond range), but because events are recorded it can be brought up to date during a failover event, so no state should be lost.
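As a rough illustration of that ordering (record, ship to the secondary, wait for the ack, only then deliver to the primary), here is a minimal Python sketch. It reflects my reading of the description above, not VMware's code; every class and method name is hypothetical, and the "execution point" is modelled as a plain instruction counter.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    exec_point: int  # guest instruction count at which the event is injected
    payload: str


@dataclass
class Guest:
    instr_count: int = 0
    trace: list = field(default_factory=list)

    def inject(self, ev: Event) -> None:
        # Run deterministically up to the recorded point, then deliver.
        self.instr_count = max(self.instr_count, ev.exec_point)
        self.trace.append((self.instr_count, ev.payload))


class SecondaryHypervisor:
    def __init__(self):
        self.guest = Guest()
        self.pending = []  # events received but not yet replayed

    def receive(self, ev: Event) -> bool:
        self.pending.append(ev)
        return True  # acknowledgement back to the primary's hypervisor

    def catch_up(self) -> None:
        # On failover, replay everything still queued.
        for ev in self.pending:
            self.guest.inject(ev)
        self.pending.clear()


class PrimaryHypervisor:
    def __init__(self, secondary: SecondaryHypervisor):
        self.guest = Guest()
        self.secondary = secondary

    def external_event(self, payload: str, exec_point: int) -> None:
        ev = Event(exec_point, payload)
        # The event is withheld from the primary until the secondary acks it.
        if not self.secondary.receive(ev):
            raise RuntimeError("no ack from secondary; event not delivered")
        self.guest.inject(ev)


secondary = SecondaryHypervisor()
primary = PrimaryHypervisor(secondary)
primary.external_event("net packet", 100)
primary.external_event("timer tick", 250)
lag_before = len(secondary.pending)  # secondary is behind in wall clock time
secondary.catch_up()                 # simulated failover
```

Because nothing reaches the primary guest before the secondary has it queued, the secondary can always replay its way to the primary's last state, which is why lagging behind in wall clock time should not lose anything.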

If you are interested you may want to do a little research on the topic on your own. When I get this set up in my test environment I'll definitely run through creating ssh keys and whatnot to validate my understanding that this does work.

TCP connection hijacking and parasites - as a good thing

Posted Aug 17, 2011 9:25 UTC (Wed) by Lennie (guest, #49641) [Link]

Does that mean you would expect it to work even if I use something like http://www.issihosts.com/haveged/ as one of the sources of randomness?

TCP connection hijacking and parasites - as a good thing

Posted Aug 17, 2011 13:24 UTC (Wed) by raven667 (subscriber, #5198) [Link]

That's what I would expect. I'd love to get my test environment straightened out and then I could determine one way or another.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds