> Redirect the app to an offscreen buffer (just like the compositor already does), but instead of rendering it to the screen you instead compress, motion-diff, and encode the data and push it across a channel in your SSH session (just like X already does), and then the remote end decodes and displays the result. Send input events back. Super easy
Yeah, it's entirely feasible to do this with the current X stack (start a headless X session, run apps in there, and then attach as a compositing manager to get at the pixels in each window and ship them across SSH individually). I use this every day, and the initial version was a two-weekend hack.
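The encode half of that pipeline (diff each new frame against the last one, compress only what changed, ship the updates) can be sketched in miniature. Everything below is illustrative, not code from the actual hack: the tile size, the one-byte-per-pixel frames, and the function names are all made up, and a real version would pull pixels out of X via the Composite extension rather than take byte strings.

```python
# Toy motion-diff encoder/decoder: frames are flat byte strings, split into
# fixed-size tile strips; only tiles that differ from the previous frame are
# zlib-compressed and sent. All names and parameters here are hypothetical.
import zlib

TILE = 8  # tile height in pixel rows; a real encoder would tune this


def tiles(frame, width):
    """Yield (offset, tile_bytes) for each TILE-row strip of a flat frame."""
    stride = width  # 1 byte per pixel for simplicity
    for off in range(0, len(frame), TILE * stride):
        yield off, bytes(frame[off:off + TILE * stride])


def encode_delta(prev, curr, width):
    """Return [(offset, compressed_tile)] for tiles that changed since prev."""
    prev_tiles = dict(tiles(prev, width)) if prev is not None else {}
    return [(off, zlib.compress(data))
            for off, data in tiles(curr, width)
            if prev_tiles.get(off) != data]


def apply_delta(base, updates):
    """Decoder side: patch the previous frame with decompressed tiles."""
    out = bytearray(base)
    for off, blob in updates:
        data = zlib.decompress(blob)
        out[off:off + len(data)] = data
    return bytes(out)
```

An unchanged frame encodes to an empty update list, which is why this beats shipping full frames: the SSH channel only carries the tiles that actually moved, plus input events going the other way.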
It turns out that all the hard parts are in dealing with X nonsense: coordinating app<->WM interaction between the two sides is nasty, you have to juggle this ridiculous stack around your headless X server, and I'm not convinced it's possible to get keyboard handling really right.
So I agree with those saying that X network transparency is not very interesting anymore -- we can and will accomplish network transparency somehow, and it'll likely be better for not being baked into the GUI system itself. OTOH, a lot of the complexity I mention is just intrinsic to the task -- you need conventions for how apps talk to each other, how input is configured and routed, etc., and that complexity will end up in your protocol one way or another.