Ranting on the X protocol
Posted Jun 2, 2010 23:25 UTC (Wed) by roc (subscriber, #30627)
In reply to: Ranting on the X protocol by nix
Parent article: Danjou: Thoughts and rambling on the X protocol
X penalizes the performance of local applications by forcing you to communicate with the X server over a socket, whereas other window systems can do the work with a user-space library, kernel module or customized fast IPC. X is effectively penalizing local applications in order to support network transparency.
But then X bungles network display by being horrendously bandwidth-inefficient and intolerant of latency, especially over WANs. That's why VNC and rdesktop are much more pleasant to use than remote X.
X11: The Worst Of Both Worlds.
Posted Jun 3, 2010 1:00 UTC (Thu)
by jonabbey (guest, #2736)
[Link]
The problem isn't the network transparency.
Posted Jun 3, 2010 9:25 UTC (Thu)
by modernjazz (guest, #4185)
[Link] (27 responses)
This myth has been debunked many times---profiling shows that socket communication accounts for an absolutely negligible fraction of the problem. For local applications, the real issues tend to be inadequacies in the way video hardware is used, which is why this area has received so much attention (but still isn't complete) over the last few years.
I agree with the latency problems over WAN, though. For anything beyond LAN I use NX, which I like much better than VNC. Back when I had a 56k modem, NX was still somewhat usable, whereas nothing else was. NX proves the (perhaps unsurprising) fact that there is some value in a protocol that doesn't just rasterize everything.
Posted Jun 3, 2010 12:02 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (26 responses)
No. Socket communication is fairly slow. X becomes fast if shared memory is used to transfer images - i.e. if you forget about network transparency.
And if you really want to do it, then moving graphics to the kernel (like Windows did) can make your system even faster. I quite distinctly remember that X was unbearably slow compared to Windows in the '90s. Today the overhead is pretty small, but it's still there.
Anyway, way forward seems to be client-side direct drawing using OpenGL. In this way X effectively only manages window frames and input events.
Posted Jun 4, 2010 4:15 UTC (Fri)
by rqosa (subscriber, #24136)
[Link] (19 responses)
> Socket communication is fairly slow.
Compared to what? Unix domain sockets should have about the same performance as any other type of local IPC, except for shared memory.
> X becomes fast if shared memory is used to transfer images - i.e. if you forget about network transparency.
But that's beside the point the two posters above were trying to make, which is this: contrary to what "roc" said, X's capability for network transparency does not "[penalize] the performance of local applications".
> moving graphics to the kernel (like Windows did)
Didn't it move back to userspace in Vista?
> can make your system even faster
It might reduce the amount of context-switches slightly, but it's bad for system stability and security.
> Anyway, way forward seems to be client-side direct drawing using OpenGL.
What's wrong with the AIGLX way?
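(As a rough illustration of the claim about local IPC cost, here is a minimal C sketch, assuming a Linux/POSIX system, that times small-message round trips over an AF_UNIX socketpair between a parent and a child process; it is indicative only, not a rigorous benchmark.)

    /* pingpong.c - time small-message round trips over an AF_UNIX socketpair.
     * Illustrative only.  Build: cc -O2 pingpong.c (add -lrt on older glibc). */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/socket.h>

    #define ROUNDS 100000

    int main(void)
    {
        int sv[2];
        char buf[64] = "ping";

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            perror("socketpair");
            return 1;
        }
        if (fork() == 0) {                     /* child: echo every message back */
            for (int i = 0; i < ROUNDS; i++) {
                if (recv(sv[1], buf, sizeof buf, 0) <= 0)
                    break;
                send(sv[1], buf, strlen(buf) + 1, 0);
            }
            _exit(0);
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ROUNDS; i++) {     /* parent: send, wait for the echo */
            send(sv[0], buf, strlen(buf) + 1, 0);
            recv(sv[0], buf, sizeof buf, 0);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("average round trip: %.0f ns\n", ns / ROUNDS);
        return 0;
    }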
Posted Jun 4, 2010 5:30 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (18 responses)
Windows-style message passing, for example.
"But that's beside the point the two posters above were trying to make, which is this: contrary to what "roc" said, X's capability for network transparency does not "[penalize] the performance of local applications"
But it does. X penalizes local applications, it's just that the workaround for this is so ancient that it is thought of as normal.
"Didn't it move back to userspace in Vista?"
Partially. They have now moved some parts to userspace to reduce the number of context switches. Vista still has the user32 subsystem in the kernel.
"It might reduce the amount of context-switches slightly, but it's bad for system stability and security."
Ha. X used to be the most insecure part of the OS - several megabytes of code running with root privileges and poking hardware directly.
KMS has finally fixed this by moving some functionality to the kernel, where it belongs.
And AIGLX is a temporary hack, good DRI2 implementation is better.
Posted Jun 4, 2010 7:19 UTC (Fri)
by rqosa (subscriber, #24136)
[Link] (17 responses)
> Windows-style message passing, for example.
Do you have numbers from a benchmark to support that statement? And is there any property of the API that makes it inherently faster than the send(2)/recv(2) API? (When using Unix domain sockets, all that send(2)/recv(2) does is to copy bytes from one process to another. How can you make it any faster, except for using shared memory to eliminate the copy-operation?)
> X penalizes local applications, it's just that the workaround for this is so ancient that it is thought of as normal.
You're contradicting yourself. If the "workaround" exists, then the "penalty" doesn't exist.
> Ha. X used to be the most insecure part of the OS - several megabytes of code running with root privileges and poking hardware directly.
> KMS has finally fixed this by moving some functionality to the kernel, where it belongs.
One of the major benefits of KMS is that it can get rid of that need for the X server to run as root and access the hardware directly. Moving the windowing system into the kernel, like you're suggesting, would throw away that benefit.
> And AIGLX is a temporary hack, good DRI2 implementation is better.
How is it "better"?
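(To make the "shared memory to eliminate the copy" point concrete, here is a minimal sketch of the idea behind MIT-SHM, assuming Linux: the drawing side writes pixels straight into a buffer the other side can already see, and only a tiny notification crosses the pipe. The dimensions and pixel values are made up for the example.)

    /* shm-frame.c - pass a "frame" between two processes without copying the
     * pixels.  Illustrative sketch only; a real client/server pair would use
     * shm_open() or the MIT-SHM extension rather than a fork. */
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define WIDTH  640
    #define HEIGHT 480

    int main(void)
    {
        size_t len = (size_t)WIDTH * HEIGHT * 4;        /* 32-bit pixels */
        uint32_t *pixels = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (pixels == MAP_FAILED) { perror("mmap"); return 1; }

        int notify[2];
        if (pipe(notify) < 0) { perror("pipe"); return 1; }

        if (fork() == 0) {                              /* "client": render a frame */
            for (size_t i = 0; i < (size_t)WIDTH * HEIGHT; i++)
                pixels[i] = 0xff336699;                 /* pretend this is real drawing */
            char done = 1;
            write(notify[1], &done, 1);                 /* tiny message, no pixel copy */
            _exit(0);
        }

        char done;                                      /* "server": wait, then use it */
        read(notify[0], &done, 1);
        printf("server sees pixel[0] = 0x%08x without copying %zu bytes\n",
               (unsigned)pixels[0], len);
        wait(NULL);
        return 0;
    }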
Posted Jun 4, 2010 8:14 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (16 responses)
Quick-LPC in Windows NT ( http://windows-internal.net/Wiley-Undocumented.Windows.NT... ). It was waaaay better than anything at the time. It's phased out now, because overhead of real LPC is not that big for modern CPUs.
>> X penalizes local applications, it's just that the workaround for this is so ancient that it is thought of as normal.
No, penalty is right there - in the architecture. It just can be worked around.
>>One of the major benefits of KMS is that it can get rid of that need for the X server to run as root and access the hardware directly. Moving the windowing system into the kernel, like you're suggesting, would throw away that benefit.
I'm not suggesting moving the whole windowing system; that makes no sense _now_.
Vista is 100% correct in its approach (IMO). Each application there, in essence, gets a virtualized graphics card so it can draw everything directly on its surfaces. And OS only manages compositing and IO.
So we get the best of both worlds - applications talk directly to hardware (it might even be possible to make a userland command submission using memory protection on recent GPUs!) and windowing system manages windows.
>> And AIGLX is a temporary hack, good DRI2 implementation is better.
Faster, fewer context switches, etc. And if you try to accelerate AIGLX - you'll get something isomorphic to DRI2.
Posted Jun 4, 2010 9:58 UTC (Fri)
by rqosa (subscriber, #24136)
[Link] (15 responses)
> Quick-LPC in Windows NT ( http://windows-internal.net/Wiley-Undocumented.Windows.NT... ). It was waaaay better than anything at the time. It's phased out now, because overhead of real LPC is not that big for modern CPUs.
The first reason why it says that Quick LPC is faster than regular LPC, "there is a single server thread waiting on the port object and servicing the requests", does not exist for Unix domain sockets. For a SOCK_STREAM socket, you can have a main thread which does accept(2) and then hands each new connection off to a worker thread/process, or you can have multiple threads/processes doing accept(2) (or select(2) or epoll_pwait(2) or similar) simultaneously on the same listening socket (by "preforking"). For a SOCK_DGRAM socket, (unless I'm mistaken) you can have multiple threads/processes doing recvfrom(2) simultaneously on the same socket (again, preforking).
As for the second reason, "the context switching between the client thread and the server thread happens in an "uncontrolled" manner", that is also the way it is for Unix domain sockets, but is that really a problem? If "the thread waiting on the signaled event is the next thread to be scheduled", then what would prevent a pair of malicious threads from hogging the CPU by constantly sending messages back and forth?
> No, penalty is right there - in the architecture. It just can be worked around.
If it's possible for local clients to be fast, then there's no "penalty". ("Penalty" would mean that the capability for clients to be non-local prevents local clients from being fast. But it doesn't, because local clients can use things that remote processes can't, like shared memory, etc.)
> Each application there, in essence, gets a virtualized graphics card so it can draw everything directly on its surfaces. And OS only manages compositing and IO.
X can work essentially that way too, except that compositing is done by another user-space process (a "compositing window manager"). Each application using OpenGL generates a command stream which is sent to the driver and rendered off-screen, and then the compositing window manager composites the off-screen surfaces together. Alternately, 2D apps can use OpenVG instead of OpenGL (at least, they should be able to fairly soon; is OpenVG supported in current Xorg and current intel/ati/nouveau/nvidia/fglrx drivers?)
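(A minimal sketch of the "preforking" pattern described above, assuming Linux: several worker processes all block in accept(2) on the same listening AF_UNIX socket, and the kernel hands each incoming connection to exactly one of them. The socket path and worker count are arbitrary.)

    /* prefork.c - one listening AF_UNIX socket, several workers in accept(2).
     * Illustrative only.  Build: cc prefork.c */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    #define SOCK_PATH "/tmp/prefork-demo.sock"
    #define WORKERS   4

    static void worker(int lfd)
    {
        for (;;) {
            int cfd = accept(lfd, NULL, NULL);  /* one worker wins each connection */
            if (cfd < 0)
                continue;
            char msg[64];
            snprintf(msg, sizeof msg, "served by pid %d\n", (int)getpid());
            write(cfd, msg, strlen(msg));
            close(cfd);
        }
    }

    int main(void)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, SOCK_PATH, sizeof addr.sun_path - 1);
        unlink(SOCK_PATH);

        int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
            listen(lfd, 16) < 0) {
            perror("listen socket");
            return 1;
        }
        for (int i = 0; i < WORKERS; i++)       /* prefork the workers */
            if (fork() == 0)
                worker(lfd);                    /* never returns */
        pause();                                /* parent just sits here */
        return 0;
    }

Connecting a few times with something like "socat - UNIX-CONNECT:/tmp/prefork-demo.sock" should show the replies coming from different pids.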
Posted Jun 4, 2010 10:18 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (13 responses)
Quick-LPC had special hooks in the scheduler. You could hog the CPU, of course, but that was not a big deal at the time. It's so little of a problem that you can still do this in Windows, and could until a few years ago in Linux: http://www.cs.huji.ac.il/~dants/papers/Cheat07Security.pdf :)
Also, Quick-LPC had some special hooks that allowed NT to do faster context switches. I remember investigating it for my own purposes - it was wickedly fast, but quite specialized.
"X can work essentially that way too, except that compositing is done by another user-space process (a "compositing window manager")"
And by that time only a hollow shell of X remains. About the same size as Wayland. So it makes sense to ditch X completely and use it only for compatibility with old clients. Like Apple did in Mac OS X.
"Each application using OpenGL generates a command stream which is sent to the driver and rendered off-screen, and then the compositing window manager composites the off-screen surfaces together."
And where's the place for network transparency? Remote GLX _sucks_ big time. It sucks so hopelessly that people want to use server-side rendering instead of trying to optimize it: http://www.virtualgl.org/About/Background
"Alternately, 2D apps can use OpenVG instead of OpenGL (at least, they should be able to fairly soon; is OpenVG supported in current Xorg and current intel/ati/nouveau/nvidia/fglrx drivers?)"
OpenVG is partially supported by Gallium3D.
Posted Jun 4, 2010 11:19 UTC (Fri)
by rqosa (subscriber, #24136)
[Link] (11 responses)
> And by that time only a hollow shell of X remains.
If you put it that way, then it's pretty much already the case that "only a hollow shell of X remains"; XRender has replaced the core drawing protocol, and compositing window managers using OpenGL and AIGLX are already in widespread use. So the next step is probably to replace XRender with OpenVG for 2D applications, to offload more of the work onto the GPU. (An OpenVG backend for Cairo has been around for some time already.) And maybe another next step is to have GPU memory protection replace AIGLX, like you suggested, in the case of local clients.
> So it makes sense to ditch X completely and use it only for compatibility with old clients. Like Apple did in Mac OS X.
An X server that supports OpenGL/OpenVG + off-screen rendering + compositing should be able to perform as well as anything else (when using clients that do drawing with OpenGL or OpenVG), while at the same time retaining backwards compatibility with old clients that use XRender and with even older clients that use the core protocol. So there's no good reason to drop the X protocol.
> And where's the place for network transparency? Remote GLX _sucks_ big time. It sucks so hopelessly that people want to use server-side rendering instead of trying to optimize it: http://www.virtualgl.org/About/Background
What about 2D apps using OpenVG? If desktop apps that don't require very high graphics performance (for example Konsole, OpenOffice, Okular, etc.) migrate from XRender to OpenVG, then it seems like it would be useful to make the OpenVG command stream network-transparent, because performance should be adequate to run these clients over a LAN.
As for remote-side rendering for remote GLX clients: what GLX clients would anyone actually want to run remotely? It seems like most apps that need fast 3D graphics would be run locally.
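(The 2D-application side of this is easiest to see through Cairo: the application draws through the same Cairo calls whatever sits underneath, whether that is the plain image backend, the xlib/XRender backend, or, where it exists, the OpenVG backend mentioned above. A minimal, self-contained sketch using the image backend:)

    /* cairo-demo.c - draw with Cairo into an image surface and save a PNG.
     * The same drawing calls work unchanged over Cairo's other backends.
     * Build: cc cairo-demo.c $(pkg-config --cflags --libs cairo) */
    #include <cairo.h>

    int main(void)
    {
        cairo_surface_t *surface =
            cairo_image_surface_create(CAIRO_FORMAT_ARGB32, 200, 100);
        cairo_t *cr = cairo_create(surface);

        cairo_set_source_rgb(cr, 1.0, 1.0, 1.0);   /* white background */
        cairo_paint(cr);

        cairo_set_source_rgb(cr, 0.2, 0.4, 0.8);   /* an antialiased filled circle */
        cairo_arc(cr, 100, 50, 40, 0, 2 * 3.14159265);
        cairo_fill(cr);

        cairo_surface_write_to_png(surface, "demo.png");
        cairo_destroy(cr);
        cairo_surface_destroy(surface);
        return 0;
    }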
Posted Jun 4, 2010 12:00 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (10 responses)
Applications are skipping straight to OpenGL. Qt already almost works; try running Qt applications with the "-graphicssystem opengl" switch. GTK has something similar.
"And maybe another next step is to have GPU memory protection replace AIGLX, like you suggested, in the case of local clients."
AIGLX is already unnecessary with the open-source driver stack, which uses DRI2. The main reason for AIGLX was the impossibility of compositing DRI1 applications - they pass commands directly to the hardware. DRI2 fixed this by allowing proper offscreen rendering and synchronization.
"What about 2D apps using OpenVG? If desktop apps that don't require very high graphics performance (for example Konsole, OpenOffice, Okular, etc.) migrate from XRender to OpenVG, then it seems like it would be useful to make the OpenVG command stream network-transparent, because performance should be adequate to run these clients over a LAN."
Makes no sense, OpenVG is stillborn. It's already obsoleted by GL4 - you can get good antialiased rendering using shaders with double-precision arithmetic.
"As for remote-side rendering for remote GLX clients: what GLX clients would anyone actually want to run remotely? It seems like most apps that need fast 3D graphics would be run locally."
A nice text editor with 3D effects? :)
The problem with X is that it's crufty. It's single-threaded and legacy code has a non-negligible architectural impact (do you know that X.org has an x86 emulator to interpret BIOS code for VESA modesetting? I'm kidding you not: http://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/... ). So IMO it makes sense to design an "X12" protocol to break away from the legacy and just run rootless X.org for compatibility.
Posted Jun 4, 2010 12:58 UTC (Fri)
by rqosa (subscriber, #24136)
[Link] (7 responses)
> OpenVG is stillborn. Says who? > It's already obsoleted by GL4 And will ARM-based cell phones and netbooks be able to run OpenGL 4 with good performance? > The problem with X is that it's crufty. It's single-threaded Is there anything inherent in the X protocol that requires an X server to be single-threaded? I doubt it. Also, if the X server no longer does the rendering (replaced by GPU offscreen rendering), then does it really matter if the X server is single-threaded? > and legacy code has a non-negligible architectural impact (do you know that X.org has an x86 emulator to interpret BIOS code for VESA modesetting? I'm kidding you not: http://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/... ). That's probably not used much any more (not used at all with KMS). And if you're already not using it, why should you even care whether it exists?
Posted Jun 4, 2010 13:59 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
>Says who?
Me, obviously.
http://www.google.com/search?q=OpenVG
About 45,600 results (0.32 seconds)
Oh, and Google too.
>> It's already obsoleted by GL4
>And will ARM-based cell phones and netbooks be able to run OpenGL 4 with good performance?
In a few years - yep. There's nothing in GL4 which makes it intrinsically slow.
>Is there anything inherent in the X protocol that requires an X server to be single-threaded? I doubt it.
And who's going to rewrite X.org? And it matters if server is single-threaded (because of input latency, for example).
And old legacy code in X.org does have its effect. For example, it's not possible to have tiled frontbuffer - because all of the code in X.org has to be rewritten. And X.org is LARGE.
Posted Jun 4, 2010 20:12 UTC (Fri)
by rqosa (subscriber, #24136)
[Link]
> About 45,600 results (0.32 seconds)
How do those numbers prove anything? (Incidentally, a search for "Gallium3D" gives only "About 26,300 results".) Also, if OpenVG is useless, then why are Qt and Cairo both implementing it?
> And who's going to rewrite X.org?
It's being rewritten all the time. Just look at how much has changed since X11R6.7.0.
> And it matters if server is single-threaded (because of input latency, for example).
I don't remember ever seeing users complaining about the input latency of the current Xorg.
Posted Jun 5, 2010 18:25 UTC (Sat)
by daniels (subscriber, #16193)
[Link]
> For example, it's not possible to have tiled frontbuffer - because all of the code in X.org has to be rewritten. And X.org is LARGE.
This will probably come as a huge surprise to the >95% of desktop X users (all Intel, most AMD, all NVIDIA beyond G80) who have a tiled frontbuffer.
Posted Jun 8, 2010 17:09 UTC (Tue)
by nix (subscriber, #2304)
[Link] (2 responses)
It was abandoned, because the locking overhead made it *slower* than a singlethreaded server.
Perhaps it is worth splitting the input thread out from a SIGIO handler into a separate thread (last I checked the work in that direction was ongoing). But more than that seems a dead loss, which is unsurprising given the sheer volume of shared state in the X server, all of which must be lock-protected and a lot of which changes very frequently.
Posted Jun 10, 2010 12:37 UTC (Thu)
by renox (guest, #23785)
[Link] (1 responses)
Could you explain?
The input thread needs to read the display state to pass the events to the correct applications, so there's also a kind of locking which must be done here: wouldn't this create the same issue as before?
Posted Jun 14, 2010 20:12 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Jun 8, 2010 17:07 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 responses)
I am confused. You seem to be contradicting yourself without need for any help from the rest of us.
Posted Jun 9, 2010 11:07 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
DRI1 passes commands directly to the card, but it can't be composited (no Compiz for you with DRI1).
AIGLX (which is used _without_ DRI1) passes commands through the X-server, but can be composited.
DRI2 passes commands directly to the card and can be composited.
Posted Jun 8, 2010 17:05 UTC (Tue)
by nix (subscriber, #2304)
[Link]
That's hopeless, that is.
Posted Jun 5, 2010 20:09 UTC (Sat)
by renox (guest, #23785)
[Link]
I would argue that for latency purposes the lack of cooperation between the scheduler and current IPCs can really be a problem.
>>then what would prevent a pair of malicious threads from hogging the CPU by constantly sending messages back and forth?<<
IMHO, there should be an IPC provided by the OS which would allow a process to say: deliver this message to this other process and run it as a part of my 'runtime quota'.
This wouldn't allow CPU hogging, and would provide lower latency, but note that for this to truly work, you still need the shared server to work on the message provided by the client who gave it the 'scheduling time', and not on something else.
Posted Jun 4, 2010 16:54 UTC (Fri)
by jd (guest, #26381)
[Link] (5 responses)
It's easy to say it would be faster, but context switches are expensive and therefore you need to make sure that you place the division between kernel and userspace application at the point where there would be fewest such switches. I've not seen any analysis of the code of this kind, so I'm not inclined to believe anyone actually knows where a split should be incorporated.
Personally, I remember X being considerably faster than Windows in the 1990s. I was using a Viglen 386SX-16 with maths co-processor, and X ran just fine. Windows was a bugbear. I was even able to relay X over the public Internet half-way across Britain and have no problems.
Posted Jun 4, 2010 17:43 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
"It's easy to say it would be faster, but context switches are expensive and therefore you need to make sure that you place the division between kernel and userspace application at the point where there would be fewest such switches. I've not seen any analysis of the code of this kind, so I'm not inclined to believe anyone actually knows where a split should be incorporated."
Read about KMS and DRI2, please.
"Personally, I remember X being considerably faster than Windows in the 1990s. I was using a Viglen 386SX-16 with maths co-processor, and X ran just fine. Windows was a bugbear. I was even able to relay X over the public Internet half-way across Britain and have no problems."
On a fast machine - sure. However, Windows 95 could be started on a computer with 3Mb of RAM (official minimal requirement was 4Mb) and ran just fine on 8Mb. Linux with X struggled to start xterm on that configuration.
Posted Jun 4, 2010 19:01 UTC (Fri)
by jonabbey (guest, #2736)
[Link]
Horses for courses.
Posted Jun 4, 2010 19:12 UTC (Fri)
by anselm (subscriber, #2796)
[Link] (1 responses)
Hm. IIRC, in the 1990s, the previous poster's 386SX-16 wasn't a »fast machine« by any stretch of the imagination. It probably would have struggled with Windows 95, considering that Windows 95, according to Microsoft, required at least a 386DX processor (i.e., one with a 32-bit external bus).
No. In 1994 my computer was a 33 MHz 486DX notebook with 8 megabytes of RAM – not a blazingly fast machine even by the standards of the time but what I could afford. I used that machine to write, among other things, a TeX DVI previewer program (like xdvi but better in some respects, worse in others) based on Tcl/Tk, with the interesting parts written in C of course, but even so. There was no problem running that program in parallel with TeX and Emacs, and in spite of funneling all the user interaction through Tcl it felt just as quick as xdvi, even with things like the xdvi-like magnifier window.
Five years before that, a SPARCstation 1 was considered a nice machine to have, and that ran X just fine, thank you very much. I doubt that the machine I was hacking on at the time had anywhere near 8 megs of RAM.
Posted Jun 5, 2010 0:27 UTC (Sat)
by jd (guest, #26381)
[Link]
Yet I was using X11R4, with OpenLook (BrokenLook?). I could open up to 20 EMACS sessions simultaneously before experiencing noticeable degradation in performance. When gaming, I'd have a Netrek server, Netrek client and 19 Netrek AIs running in parallel on the same machine.
Compiling was no big deal. GateD would sometimes cause OOM to kick in and kill processes, but I don't recall any huge difficulty. It was the machine I wrote my undergraduate degree project on (fractal image compression software).
Mind you, I paid huge attention to setup. The X11 config file was hand-crafted. Everything was hand-compiled - kernel on upwards - to squeeze the best performance out of it. Swap space was 2.5x RAM.
Posted Jun 7, 2010 15:13 UTC (Mon)
by nye (subscriber, #51576)
[Link]
> On a fast machine - sure.
> Windows 95 could be started on a computer with 3Mb of RAM (official minimal requirement was 4Mb) and ran just fine on 8Mb. Linux with X struggled to start xterm on that configuration.
No.
Everyone not working for Microsoft stated a minimum of 16MB to run Windows 95. I actually tried running it on a computer with 8MB of RAM - as long as you had no applications running you might just about be able to use explorer to browse the filesystem, very slowly, but trying to launch an application would cause a 5 minute swap storm.