TCP connection hijacking and parasites - as a good thing

Posted Aug 13, 2011 5:00 UTC (Sat) by dlang (✭ supporter ✭, #313)
In reply to: TCP connection hijacking and parasites - as a good thing by raven667
Parent article: TCP connection hijacking and parasites - as a good thing

no, if the password gets changed on a remote machine and the time warp happens, the password is now known by no system.

even if they were smart enough to not change the remote password until they stored it (and then what happens if they can't change the password? they did _not_ keep a record of old passwords), I've seen too many cases where a dying system scribbles on the disk to be comfortable with the idea of betting that such a thing does not happen for access to my servers.

the company abandoned the product (shut it down, didn't even sell it off) about 9 months later, so I don't think I was the only person asking such hard questions of them.
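
The ordering concern raised above (store the new password before changing it remotely, and keep the old ones) can be sketched roughly as follows. This is a minimal illustration, not the vendor's design: `PasswordVault`, `rotate`, `login`, and the `change_remote` and `try_password` callables are all hypothetical names. It only covers a failed or interrupted change; it does nothing for a vault whose own storage is rolled back or corrupted.

```python
class PasswordVault:
    """Hypothetical vault: keeps every password ever issued for a host."""

    def __init__(self):
        self.history = {}  # host -> list of passwords, oldest first

    def record(self, host, password):
        self.history.setdefault(host, []).append(password)

    def candidates(self, host):
        # Newest first: the most recent entry is the likeliest to be live.
        return list(reversed(self.history.get(host, [])))


def rotate(vault, host, new_password, change_remote):
    """Record the new password before touching the remote machine.

    change_remote is a hypothetical callable that returns True on success.
    If it fails, the old password is still in the history, so nothing is lost.
    """
    vault.record(host, new_password)
    return change_remote(host, new_password)


def login(vault, host, try_password):
    """Try each known password, newest first; return the one that works, or None."""
    for candidate in vault.candidates(host):
        if try_password(host, candidate):
            return candidate
    return None
```

Because the history is never pruned, a change that fails on the remote side simply leaves an unused entry at the top of the list, and `login` falls back to the previous one.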


TCP connection hijacking and parasites - as a good thing

Posted Aug 13, 2011 22:37 UTC (Sat) by raven667 (subscriber, #5198) [Link]

I think you misunderstood my point; I wasn't clear enough. I do not disagree that, in a system like that, losing the record of a password change to a rolled-back transaction during crash recovery would be very problematic. I only disagree that the introduction of a VM environment meaningfully changes that risk one way or another.

I am surprised that they couldn't give you a better answer though, I can think of three or four ways to mitigate that risk off the top of my head. Maybe their sales force was undertrained or their engineers were chuckleheads.

TCP connection hijacking and parasites - as a good thing

Posted Aug 13, 2011 22:46 UTC (Sat) by raven667 (subscriber, #5198) [Link]

One other thing: unless you are taking snapshots of a running VM and reverting back to them, I do not think you can have the "time warp" you describe. Failing to commit changes to permanent storage is a different issue.

TCP connection hijacking and parasites - as a good thing

Posted Aug 13, 2011 22:55 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

the thing is, that's exactly what vmware does when it migrates machines, it takes snapshots of the VM periodically, and then if a system dies, boots the latest snapshot on another system.

this works quite well if you aren't sensitive to time warps, and everything is in one geographic location so that you can have the snapshots stored on shared disks.

if you are sensitive to time warps, and you need to allow for your recovery to be in a different geographic location (so that there is a significant lag in the storage changes getting replicated to the new datacenter) then the approach of just saying "vmware solves our HA/DR issues, we don't need to care about them at an application design level" is a very bad approach to take.

TCP connection hijacking and parasites - as a good thing

Posted Aug 15, 2011 20:50 UTC (Mon) by raven667 (subscriber, #5198) [Link]

I think you are confused and talking about two different things. If a system dies, then the VMs which were running on it crash and are booted on other systems in the cluster; there are no snapshots or time warps involved. This has nothing to do with live migration of machines, which does not in any way make machines more highly available, except that one can manually live migrate VMs off of one host in a cluster to do maintenance on it. Live migration does not cause time warps; at worst the system is unavailable on the network for a few moments while the switches relearn what port its MAC address is coming from.

Of course, saying that a VM solves our HA/DR problems like so much secret sauce is BS unless you also make the application resilient to the same kinds of problems that can happen on bare metal, such as unexpected crashes, and have some way to handle DR, which is very different from HA.

Actually, VMware does have a feature now called Fault Tolerance, where a VM is run in lockstep on two different nodes in a cluster. They are kept running the same, instruction for instruction, AFAIK, so there would be no time warp during a failover event. There are some product demos on YouTube showing them playing a video file over a remote desktop and it not skipping a frame while failing over. That could work as an HA solution but would do nothing for DR.

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 0:08 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

I fully agree that for live migration when nothing is wrong, virtual machines (and containers with checkpoint/restore) can be very nice.

In this case the vendor was claiming that this also solved crash problems because vmware would just restart the application on the other server exactly where it left off when the first server crashed.

I doubt that the vmware Fault Tolerance would work for any system that used random numbers as I don't see how they could find all the places that generated or used the random numbers and make both machines have the exact same results.

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 20:36 UTC (Tue) by robbe (guest, #16131) [Link]

> In this case the vendor was claiming that this also solved crash problems because vmware would just restart the application on the other server exactly where it left off when the first server crashed.

VMware's HA feature just restarts the VM affected by a server crash on another ESX server. From the perspective of the VM it looks like a spontaneous reboot. This guarantees minimum downtime, but does not make the crash invisible to the application.

> I doubt that the vmware Fault Tolerance would work for any system that used random numbers [...]

Remember that all the HW except the CPU is emulated. Fault Tolerance works by replicating the external events (e.g. an incoming network packet) seen by the original VM at the same point in time on the backup VM. The usual sources of entropy will be the same on both machines.

Maybe someone could deliberately construct a program that runs differently in the original and the backup VM. The only thing I can think of right now is measuring time via a covert channel (the normal means like RDTSC are covered, of course) -- but this is too vague still to make into an "exploit".

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 20:52 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

given that the entropy sources include nanosecond timing, and if the CPU supports it, thermal noise on the CPU, I _really_ doubt that the resulting random numbers would be the same on the two systems.

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 22:12 UTC (Tue) by raven667 (subscriber, #5198) [Link]

I think you underestimate the state of the art. Considering how often modern systems use random numbers I can't imagine this case not being handled, by disabling any hardware mechanism that could introduce non-determinism at the very least.

Looking at the VMware FT docs, the hypervisor for the primary VM very thoroughly records anything that could change state and does not pass it through to the primary until it has been transmitted to, and its receipt acknowledged by, the hypervisor for the secondary VM. Features such as SMP or hardware MMU are disabled, as their state can't be recorded and could introduce non-determinism. Each event is injected into the secondary at the same execution point. That certainly has to work with nanosecond timing, from the point of view inside the secondary VM. According to wall clock time the secondary will always be lagging behind (the demos show lag in the millisecond range), but because events are recorded it can be brought up to current during a failover event, so no state should be lost.

If you are interested, you may want to do a little research on the topic on your own. When I get this set up in my test environment I'll definitely run through creating ssh keys and whatnot to validate my understanding that this does work.
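
The record-and-replay idea described above can be illustrated with a toy model. This is only a sketch of the general technique, not how VMware FT is implemented: `RecordingSource`, `ReplaySource`, and `make_key` are hypothetical names, and a real hypervisor records events at the virtual hardware level, not inside the application.

```python
import hashlib
import os


class RecordingSource:
    """Primary side: reads real entropy and logs every value it hands out."""

    def __init__(self):
        self.log = []

    def read(self, n):
        data = os.urandom(n)
        self.log.append(data)
        return data


class ReplaySource:
    """Secondary side: returns the logged values in the same order."""

    def __init__(self, log):
        self._events = iter(log)

    def read(self, n):
        data = next(self._events)
        if len(data) != n:
            raise ValueError("replay diverged from the recorded run")
        return data


def make_key(source):
    """Stand-in for key generation: deterministic apart from its entropy input."""
    seed = source.read(32)
    nonce = source.read(8)
    return hashlib.sha256(seed + nonce).hexdigest()


primary = RecordingSource()
key_on_primary = make_key(primary)
key_on_secondary = make_key(ReplaySource(primary.log))
print(key_on_primary == key_on_secondary)  # True
```

The point of the toy is that the secondary never samples its own entropy: every non-deterministic input is captured once on the primary and fed to the secondary, so both sides compute the same "random" result.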

TCP connection hijacking and parasites - as a good thing

Posted Aug 17, 2011 9:25 UTC (Wed) by Lennie (subscriber, #49641) [Link]

Does that mean you would expect it to work even if I use something like http://www.issihosts.com/haveged/ as one of the sources of randomness?

TCP connection hijacking and parasites - as a good thing

Posted Aug 17, 2011 13:24 UTC (Wed) by raven667 (subscriber, #5198) [Link]

That's what I would expect. I'd love to get my test environment straightened out and then I could determine one way or another.

TCP connection hijacking and parasites - as a good thing

Posted Aug 13, 2011 22:48 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

the 'correct' answer (and what competing products do) is to have multiple servers, and have the data replicated to all the servers.

it's fairly common (although complex) at the application level to implement updates of small amounts of information across a geographically distributed cluster of machines (look up two-phase commit). but if you rely on the entire VM state being synced, there is just too much unnecessary data involved to keep things synced in real time.
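
For readers who have not met two-phase commit, here is a minimal in-memory sketch of the idea, applied to a small host-to-password update. `Replica` and `two_phase_update` are hypothetical names, and the sketch assumes away everything that makes the real protocol complex: durable logs on each participant, timeouts, and recovery when the coordinator itself crashes between the two phases.

```python
class Replica:
    """Hypothetical in-memory participant holding host -> password."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.committed = {}
        self.staged = {}

    def prepare(self, txid, host, password):
        # Phase 1: stage the update and vote. An unreachable replica votes no.
        if not self.reachable:
            return False
        self.staged[txid] = (host, password)
        return True

    def commit(self, txid):
        # Phase 2: make the staged update visible.
        host, password = self.staged.pop(txid)
        self.committed[host] = password

    def abort(self, txid):
        # Discard anything staged for this transaction, if present.
        self.staged.pop(txid, None)


def two_phase_update(replicas, txid, host, password):
    """Apply the update on every replica or on none of them."""
    votes = [r.prepare(txid, host, password) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit(txid)
        return True
    for r in replicas:
        r.abort(txid)
    return False
```

Only the small record (host and password) crosses the wire, which is the contrast being drawn with replicating an entire VM image between datacenters.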

TCP connection hijacking and parasites - as a good thing

Posted Aug 15, 2011 21:00 UTC (Mon) by raven667 (subscriber, #5198) [Link]

One thing I thought of when mentally designing a system like the one you described is that you would need both a complete history and a large window of near-future passwords. If you restore a host from last year, you need to be able to log into it, and if you restore last week's backup of your password management system, it needs to know what the current passwords would be; otherwise you have no DR, just HA.
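
One possible way to get both a history and a window of near-future passwords is to derive each password from a long-lived secret and a rotation counter, so that any past or future value can be recomputed. This is an assumption made for illustration, not something proposed in the thread: `derive_password`, `candidate_passwords`, and the notion of a numbered rotation "epoch" are all hypothetical.

```python
import base64
import hashlib
import hmac


def derive_password(master_secret: bytes, host: str, epoch: int) -> str:
    """Derive the password for one host and one rotation epoch."""
    message = f"{host}:{epoch}".encode()
    digest = hmac.new(master_secret, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest)[:24].decode()


def candidate_passwords(master_secret, host, believed_epoch, back, ahead):
    """Passwords to try, nearest epoch to the believed one first."""
    epochs = range(max(0, believed_epoch - back), believed_epoch + ahead + 1)
    ordered = sorted(epochs, key=lambda e: abs(e - believed_epoch))
    return [derive_password(master_secret, host, e) for e in ordered]
```

A password manager restored from an old backup would believe in a stale epoch but could still reach a host that has moved a few epochs ahead, and a host restored from last year would be covered by the `back` window. The obvious cost is that the master secret becomes the single thing that must never be lost or leaked.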

TCP connection hijacking and parasites - as a good thing

Posted Aug 16, 2011 0:09 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

yes, you do need to have a record of the historic passwords as well.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds