Ranting on the X protocol

Posted Jun 4, 2010 7:19 UTC (Fri) by rqosa (subscriber, #24136)
In reply to: Ranting on the X protocol by Cyberax
Parent article: Danjou: Thoughts and rambling on the X protocol

> Windows-style message passing, for example.

Do you have numbers from a benchmark to support that statement? And is there any property of the API that makes it inherently faster than the send(2)/recv(2) API? (When using Unix domain sockets, all that send(2)/recv(2) does is to copy bytes from one process to another. How can you make it any faster, except for using shared memory to eliminate the copy-operation?)

> X penalizes local application, it's just that the workaround for this is so ancient that it is though of as normal.

You're contradicting yourself. If the "workaround" exists, then the "penalty" doesn't exist.

> Ha. X used to be the most insecure part of the OS - several megabytes of code running with root privileges and poking hardware directly.

> KMS has finally fixed this by moving some functionality to the kernel, where it belongs.

One of the major benefits of KMS is that it can get rid of that need for the X server to run as root and access the hardware directly. Moving the windowing system into the kernel, like you're suggesting, would throw away that benefit.

> And AIGLX is a temporary hack, good DRI2 implementation is better.

How is it "better"?

Ranting on the X protocol

Posted Jun 4, 2010 8:14 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

>Do you have numbers from a benchmark to support that statement? And is there any property of the API that makes it inherently faster than the send(2)/recv(2) API? (When using Unix domain sockets, all that send(2)/recv(2) does is to copy bytes from one process to another. How can you make it any faster, except for using shared memory to eliminate the copy-operation?)

Quick-LPC in Windows NT ( http://windows-internal.net/Wiley-Undocumented.Windows.NT... ). It was waaaay better than anything at the time. It's phased out now, because overhead of real LPC is not that big for modern CPUs.

>> X penalizes local application, it's just that the workaround for this is so ancient that it is though of as normal.
>You're contradicting yourself. If the "workaround" exists, then the "penalty" doesn't exist.

No, penalty is right there - in the architecture. It just can be worked around.

>>One of the major benefits of KMS is that it can get rid of that need for the X server to run as root and access the hardware directly. Moving the windowing system into the kernel, like you're suggesting, would throw away that benefit.

I'm not suggesting moving the whole windowing system, it makes no sense _now_.

Vista is 100% correct in its approach (IMO). Each application there, in essence, gets a virtualized graphics card so it can draw everything directly on its surfaces. And OS only manages compositing and IO.

So we get the best of both worlds - applications talk directly to hardware (it might even be possible to make a userland command submission using memory protection on recent GPUs!) and windowing system manages windows.

>> And AIGLX is a temporary hack, good DRI2 implementation is better.
>How is it "better"?

Faster, less context switches, etc. And if you try to accelerate AIGLX - you'll get something isomorphic to DRI2.

Ranting on the X protocol

Posted Jun 4, 2010 9:58 UTC (Fri) by rqosa (subscriber, #24136) [Link] (15 responses)

> Quick-LPC in Windows NT ( http://windows-internal.net/Wiley-Undocumented.Windows.NT... ). It was waaaay better than anything at the time. It's phased out now, because overhead of real LPC is not that big for modern CPUs.

The first reason why it says that Quick LPC is faster than regular LPC, "there is a single server thread waiting on the port object and servicing the requests", does not exist for Unix domain sockets. For a SOCK_STREAM socket, you can have a main thread which does accept(2) and then hands each new connection off to a worker thread/process, or you can have multiple threads/processes doing accept(2) (or select(2) or epoll_pwait(2) or similar) simultaneously on the same listening socket (by "preforking"). For a SOCK_DGRAM socket, (unless I'm mistaken) you can have multiple threads/processes doing recvfrom(2) simultaneously on the same socket (again, preforking).

As for the second reason, "the context switching between the client thread and the server thread happens in an "uncontrolled" manner", that is also the way it is for Unix domain sockets, but is that really a problem? If "the thread waiting on the signaled event is the next thread to be scheduled", then what would prevent a pair of malicious threads from hogging the CPU by constantly sending messages back and forth?

> No, penalty is right there - in the architecture. It just can be worked around.

If it's possible for local clients to be fast, then there's no "penalty". ("Penalty" would mean that the capability for clients to be non-local prevents local clients from being fast. But it doesn't, because local clients can use things that remote processes can't, like shared memory, etc.)

> Each application there, in essence, gets a virtualized graphics card so it can draw everything directly on its surfaces. And OS only manages compositing and IO.

X can work essentially that way too, except that compositing is done by another user-space process (a "compositing window manager"). Each application using OpenGL generates a command stream which is sent to the driver and rendered off-screen, and then the compositing window manager composites the off-screen surfaces together. Alternately, 2D apps can use OpenVG instead of OpenGL (at least, they should be able to fairly soon; is OpenVG supported in current Xorg and current intel/ati/nouveau/nvidia/fglrx drivers?)

Ranting on the X protocol

Posted Jun 4, 2010 10:18 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (13 responses)

"As for the second reason, "the context switching between the client thread and the server thread happens in an "uncontrolled" manner", that is also the way it is for Unix domain sockets, but is that really a problem? If "the thread waiting on the signaled event is the next thread to be scheduled", then what would prevent a pair of malicious threads from hogging the CPU by constantly sending messages back and forth?"

Quick-LPC had special hooks in scheduler. You could hog CPU, of course, but that was not a big deal at that time. It's so no big deal that you still can do this in Windows and up until several years ago in Linux: http://www.cs.huji.ac.il/~dants/papers/Cheat07Security.pdf :)

Also, Quick-LPC had some special hooks that allowed NT to do faster context switches. I remember investigating it for my own purposes - it was wickedly fast, but quite specialized.

"X can work essentially that way too, except that compositing is done by another user-space process (a "compositing window manager")"

And by that time only a hollow shell of X remains. About the same size as Wayland. So it makes sense to ditch X completely and use it only for compatibility with old clients. Like Apple did in Mac OS X.

"Each application using OpenGL generates a command stream which is sent to the driver and rendered off-screen, and then the compositing window manager composites the off-screen surfaces together."

And where's the place for network transparency? Remote GLX _sucks_ big time. It sucks so hopelessly that people want to use server-side rendering instead of trying to optimize it: http://www.virtualgl.org/About/Background

"Alternately, 2D apps can use OpenVG instead of OpenGL (at least, they should be able to fairly soon; is OpenVG supported in current Xorg and current intel/ati/nouveau/nvidia/fglrx drivers?)"

OpenVG is partially supported by Gallium3D.

Ranting on the X protocol

Posted Jun 4, 2010 11:19 UTC (Fri) by rqosa (subscriber, #24136) [Link] (11 responses)

> And by that time only a hollow shell of X remains.

If you put it that way, then it's pretty much already the case that "only a hollow shell of X remains"; XRender has replaced the core drawing protocol, and compositing window managers using OpenGL and AIGLX are already in widespread use. So the next step is probably to replace XRender with OpenVG for 2D applications, to offload more of the work onto the GPU. (An OpenVG backend for Cairo has been around for some time already.) And maybe another next step is to have GPU memory protection replace AIGLX, like you suggested, in the case of local clients.

> So it makes sense to ditch X completely and use it only for compatibility with old clients. Like Apple did in Mac OS X.

An X server that supports OpenGL/OpenVG + off-screen rendering + compositing should be able to perform as well as anything else (when using clients that do drawing with OpenGL or OpenVG), while at the same time retaining backwards compatibility with old clients that use XRender and with even older clients that use the core protocol. So there's no good reason to drop the X protocol.

> And where's the place for network transparency? Remote GLX _sucks_ big time. It sucks so hopelessly that people want to use server-side rendering instead of trying to optimize it: http://www.virtualgl.org/About/Background

What about 2D apps using OpenVG? If desktop apps that don't require very high graphics performance (for example Konsole, OpenOffice, Okular, etc.) migrate from XRender to OpenVG, then it seems like it would be useful to make the OpenVG command stream network-transparent, because performance should be adequate to run these clients over a LAN.

As for remote-side rendering for remote GLX clients: what GLX clients would anyone actually want to run remotely? It seems like most apps that need fast 3D graphics would be run locally.

Ranting on the X protocol

Posted Jun 4, 2010 12:00 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (10 responses)

""If you put it that way, then it's pretty much already the case that "only a hollow shell of X remains"; XRender has replaced the core drawing protocol, and compositing window managers using OpenGL and AIGLX are already in widespread use. So the next step is probably to replace XRender with OpenVG for 2D applications, to offload more of the work onto the GPU. (An OpenVG backend for Cairo has been around for some time already.) "

Applications skip directly to OpenGL. QT right now even almost works, try running QT applications with "-graphicssystem opengl" switch. GTK has something similar.

"And maybe another next step is to have GPU memory protection replace AIGLX, like you suggested, in the case of local clients."

AIGLX is not necessary already with the open stack drivers which use DRI2. The main reason for AIGLX was impossibility of compositing of DRI1-applications - they pass commands directly to hardware. DRI2 fixed this by allowing proper offscreen rendering and synchronization.

"What about 2D apps using OpenVG? If desktop apps that don't require very high graphics performance (for example Konsole, OpenOffice, Okular, etc.) migrate from XRender to OpenVG, then it seems like it would be useful to make the OpenVG command stream network-transparent, because performance should be adequate to run these clients over a LAN."

Makes no sense, OpenVG is stillborn. It's already obsoleted by GL4 - you can get good antialiased rendering using shaders with double-precision arithmetic.

"As for remote-side rendering for remote GLX clients: what GLX clients would anyone actually want to run remotely? It seems like most apps that need fast 3D graphics would be run locally."

A nice text editor with 3D effects? :)

The problem with X is that it's crufty. It's single-threaded and legacy code has a non-negligible architectural impact (do you know that X.org has an x86 emulator to interpret BIOS code for VESA modesetting? I'm kidding you not: http://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/... ). So IMO it makes a sense to design "X12" protocol to break away from legacy and just run rootless X.org for compatibility.

Ranting on the X protocol

Posted Jun 4, 2010 12:58 UTC (Fri) by rqosa (subscriber, #24136) [Link] (7 responses)

> OpenVG is stillborn.

Says who?

> It's already obsoleted by GL4

And will ARM-based cell phones and netbooks be able to run OpenGL 4 with good performance?

> The problem with X is that it's crufty. It's single-threaded

Is there anything inherent in the X protocol that requires an X server to be single-threaded? I doubt it.

Also, if the X server no longer does the rendering (replaced by GPU offscreen rendering), then does it really matter if the X server is single-threaded?

> and legacy code has a non-negligible architectural impact (do you know that X.org has an x86 emulator to interpret BIOS code for VESA modesetting? I'm kidding you not: http://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/... ).

That's probably not used much any more (not used at all with KMS). And if you're already not using it, why should you even care whether it exists?

Ranting on the X protocol

Posted Jun 4, 2010 13:59 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

>> OpenVG is stillborn.
>Says who?

Me, obviously.

http://www.google.com/search?q=OpenVG
About 45,600 results (0.32 seconds)

Oh, and Google too.

>> It's already obsoleted by GL4
>And will ARM-based cell phones and netbooks be able to run OpenGL 4 with good performance?

In a few years - yep. There's nothing in GL4 which makes it intrinsically slow.

>Is there anything inherent in the X protocol that requires an X server to be single-threaded? I doubt it.

And who's going to rewrite X.org? And it matters if server is single-threaded (because of input latency, for example).

And old legacy code in X.org does have its effect. For example, it's not possible to have tiled frontbuffer - because all of the code in X.org has to be rewritten. And X.org is LARGE.

Ranting on the X protocol

Posted Jun 4, 2010 20:12 UTC (Fri) by rqosa (subscriber, #24136) [Link]

> About 45,600 results (0.32 seconds)

How do those numbers prove anything? (Incidentally, a search for "Gallium3D" gives only "About 26,300 results".)

Also, if OpenVG is useless, then why are Qt and Cairo both implementing it?

> And who's going to rewrite X.org?

It's being rewritten all the time. Just look at how much has changed since X11R6.7.0.

> And it matters if server is single-threaded (because of input latency, for example).

I don't remember ever seeing users complaining about the input latency of the current Xorg.

Ranting on the X protocol

Posted Jun 5, 2010 18:13 UTC (Sat) by drago01 (subscriber, #50715) [Link]

http://lists.x.org/archives/xorg-devel/2010-June/009571.html

Ranting on the X protocol

Posted Jun 5, 2010 18:25 UTC (Sat) by daniels (subscriber, #16193) [Link]

For example, it's not possible to have tiled frontbuffer - because all of the code in X.org has to be rewritten. And X.org is LARGE.

This will probably come as a huge surprise to the >95% of desktop X users (all Intel, most AMD, all NVIDIA beyond G80) who have a tiled frontbuffer.

Ranting on the X protocol

Posted Jun 8, 2010 17:09 UTC (Tue) by nix (subscriber, #2304) [Link] (2 responses)

There is nothing intrinsic in the X protocol that requires the server to be single-threaded. In fact, the X Consortium spent a lot of effort in the 1990s making a multithreaded X server.

It was abandoned, because the locking overhead made it *slower* than a singlethreaded server.

Perhaps it is worth splitting the input thread out from a SIGIO handler into a separate thread (last I checked the work in that direction was ongoing). But more than that seems a dead loss, which is unsurprising given the sheer volume of shared state in the X server, all of which must be lock-protected and a lot of which changes very frequently.

Ranting on the X protocol

Posted Jun 10, 2010 12:37 UTC (Thu) by renox (guest, #23785) [Link] (1 responses)

>> Perhaps it is worth splitting the input thread out from a SIGIO handler into a separate thread <<

Could you explain?
The input thread needs to read the display state to pass the events to the correct applications, so there's also a kind of locking which must be done here: wouldn't this create the same issue as before?

Ranting on the X protocol

Posted Jun 14, 2010 20:12 UTC (Mon) by nix (subscriber, #2304) [Link]

Yes, some locking is still necessary, but perhaps not so much that the whole thing would be a dead loss (as happens if e.g. you try to handle each X client with a separate thread in the X server).

Ranting on the X protocol

Posted Jun 8, 2010 17:07 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

Hang on. You're (IMHO correctly) promoting KMS/DRI2 over AIGLX/DRI1 because DRI1 passed commands directly to hardware, which is bad... but you're proposing direct hardware access as a way to fix the (unproven) 'slowness' of X?

I am confused. You seem to be contradicting yourself without need for any help from the rest of us.

Ranting on the X protocol

Posted Jun 9, 2010 11:07 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

No.

DRI1 passes commands directly to the card, but it can't be composited (no Compiz for you with DRI1).

AIGLX (which is used _without_ DRI1) passes commands through the X-server, but can be composited.

DRI2 passes commands directly to the card and can be composited.

Ranting on the X protocol

Posted Jun 8, 2010 17:05 UTC (Tue) by nix (subscriber, #2304) [Link]

Yeah, remote GLX sucks. So hopelessly that I just ran World of Goo remotely (which 2D app uses OpenGL to do its rendering) over a 1Gb LAN, without noticing until I quit that it was remote.

That's hopeless, that is.

Ranting on the X protocol

Posted Jun 5, 2010 20:09 UTC (Sat) by renox (guest, #23785) [Link]

>>As for the second reason, "the context switching between the client thread and the server thread happens in an "uncontrolled" manner", that is also the way it is for Unix domain sockets, but is that really a problem? <<

I would argue that for latency purpose the lack of cooperation between the scheduler and current IPCs can really be a problem.

>>then what would prevent a pair of malicious threads from hogging the CPU by constantly sending messages back and forth?<<
IMHO, there should be an IPC provided by the OS which would allow a process to say: deliver this message to this other process and run it as a part of my 'runtime quota'.
This wouldn't allow CPU hogging, and would provide lower latency, but note that for this to truly work, you still need that a shared server works on the message provided by the client who give the 'scheduling time' and not on something else..