That is a really interesting writeup. It sounds like a hard problem.
I wonder what the Windows 8 compositor is doing. In my experience, and from what other people report, it seems to have lower latency, yet it seems to be at least as GPU power efficient as Windows 7. It does a good job on media playback too, although I don't have any way to check whether it is really presenting frames at 24 fps.
I think his best option may be the one he mentioned for Wayland: having the client put a preferred display timestamp on each frame.
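To make the timestamp idea concrete, here is a rough sketch of how a compositor might use it. This is only a toy model, not how Wayland or any real compositor does it: the names `Frame`, `pick_frame`, and `simulate` are made up for illustration, and it assumes the compositor simply shows the newest queued frame whose requested time has arrived at each vblank.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Frame:
    frame_id: int
    target_time: Fraction  # client's preferred display time, in seconds (hypothetical field)


def pick_frame(queue, vblank_time):
    """Pop and return the newest queued frame whose target time has arrived.

    Older frames that are also due get dropped. Returns None if nothing is due.
    """
    chosen = None
    while queue and queue[0].target_time <= vblank_time:
        chosen = queue.pop(0)
    return chosen


def simulate(content_fps, refresh_hz, n_frames):
    """Return the frame id on screen at each vblank until the queue drains."""
    # The client stamps frame i with the time it would like it displayed.
    queue = [Frame(i, Fraction(i, content_fps)) for i in range(n_frames)]
    shown = []
    current = None
    vblank = 0
    while queue:
        now = Fraction(vblank, refresh_hz)
        frame = pick_frame(queue, now)
        if frame is not None:
            current = frame
        # If nothing new is due, the previous frame stays on screen.
        shown.append(current.frame_id if current is not None else None)
        vblank += 1
    return shown


if __name__ == "__main__":
    # 24 fps content on a 60 Hz display.
    print(simulate(24, 60, 4))
```

With 24 fps content on a 60 Hz display, this model holds frames for alternating runs of three and two refreshes, which is the usual 3:2 cadence. The point is that the compositor can work out the cadence from the timestamps instead of guessing from when the client happened to submit each frame.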