LWN: Comments on "Ekstrand: Plumbing explicit synchronization through the Linux ecosystem"

Explicit synchronization is like systemd?

excors — Sat, 14 Mar 2020 12:54:12 +0000

As I understand it, OpenGL doesn't actually require a flush/fence/etc after each step; it just requires the driver to (mostly) behave as if there was a flush. It's then the driver's job to figure out which flushes can be safely omitted to improve performance. E.g. if the vertices, indices and texture are all stored in separate buffers it clearly doesn't matter which order it copies them into GPU memory, so it can freely rearrange them or copy them all in parallel, though it must still wait for all the copies to complete and must invalidate GPU caches etc before instructing the GPU to start rendering with those buffers.

Sometimes the driver has to be particularly clever to optimise flushes. E.g. if the application wants to modify the vertex data every frame, the driver might secretly allocate a second copy of the vertex buffer so it can do double-buffering. The CPU can be copying vertex data for frame N+1 into the second buffer, while the GPU is still rendering frame N using data from the first buffer.
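A minimal Python sketch of the double-buffering idea described above, not driver code. The class name `DoubleBufferedVertices` and its `upload` method are hypothetical and model only the bookkeeping. A real driver would also need to know when the GPU has finished with the older copy before reusing it.

```python
class DoubleBufferedVertices:
    """Toy model of a driver-side double-buffered vertex buffer (hypothetical)."""

    def __init__(self, size):
        # Two backing stores: the CPU fills one while the "GPU" reads the other.
        self.buffers = [[0.0] * size, [0.0] * size]
        self.write_index = 0  # which copy the CPU fills next

    def upload(self, data):
        """Copy new vertex data into the idle copy and return that copy."""
        buf = self.buffers[self.write_index]
        buf[:] = data
        # Swap roles so the next upload goes to the other copy.
        self.write_index ^= 1
        return buf


vb = DoubleBufferedVertices(3)
frame_n = vb.upload([1.0, 2.0, 3.0])    # GPU renders frame N from this copy
frame_n1 = vb.upload([4.0, 5.0, 6.0])   # CPU prepares frame N+1 meanwhile
# The data for frame N is untouched by the upload for frame N+1.
print(frame_n, frame_n1)
```

With only two copies, a third upload would overwrite the data for frame N, which is why the real thing still needs some form of fence underneath.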

Generally that works well in practice. OpenGL application developers don't have to think about synchronisation, and still get near-optimal performance in most cases. But it also means the drivers are really complicated, and it's hard for application developers to debug performance issues because all these optimisations are undocumented and hidden inside the driver, and it works decreasingly well as applications get more complex (e.g. mixing GPU compute with rendering).

OpenGL has added a few explicit synchronisation features, like some of the options in https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming . Vulkan makes pretty much all synchronisation explicit: the application is required to specify the dependencies between API calls in great detail, and optimisations like double-buffering must be implemented entirely by the application.

In both cases the hardware is scheduled using a dependency graph, not a strictly linear ordering. The difference is just whether the graph is constructed primarily by the driver (which must make a conservative approximation; better to over-synchronise and reduce performance, than to under-synchronise and produce incorrect frames) or primarily by the application.
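To make the graph-versus-linear point concrete, here is a rough sketch in Python. The task names and the `schedule_levels` helper are made up for illustration, and the fully serialised "conservative" graph is a deliberately extreme assumption; real drivers are usually far smarter than that.

```python
def schedule_levels(deps):
    """Return {task: level}. Tasks on the same level could run in parallel."""
    levels = {}

    def level(task):
        if task not in levels:
            # A task runs one level after its latest dependency (level 0 if none).
            levels[task] = 1 + max((level(d) for d in deps[task]), default=-1)
        return levels[task]

    for task in deps:
        level(task)
    return levels


# Graph as an application might state it: the three copies are independent.
explicit = {
    "copy_vertices": [],
    "copy_indices": [],
    "copy_texture": [],
    "draw": ["copy_vertices", "copy_indices", "copy_texture"],
}

# Worst-case conservative graph: every step waits for the previous one.
conservative = {
    "copy_vertices": [],
    "copy_indices": ["copy_vertices"],
    "copy_texture": ["copy_indices"],
    "draw": ["copy_texture"],
}

print(schedule_levels(explicit)["draw"])      # draw can start at level 1
print(schedule_levels(conservative)["draw"])  # draw cannot start before level 3
```

Both graphs give a correct frame; the over-synchronised one just takes more sequential steps to get there.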

I think the issue in this article is that the graphics API only provides part of the full dependency graph. Once the GPU has rendered the frame onto a framebuffer, it has to be passed to the display hardware (or to a compositing window manager or to a video encoder or whatever) which will use the framebuffer for some time before it's safe for the GPU to render another frame onto it. Ideally those external dependencies would be added to the graph in a fine-grained way: e.g. when the application wants to render a new frame onto a framebuffer that the display hardware is still using, the application could do the initial steps for that frame (copying data to the GPU, running GPU compute, generating shadow maps, etc), and only at the point where the GPU is about to write onto the framebuffer does it have to wait for the display hardware. That gives better performance than delaying the entire frame until the framebuffer is ready. But that requires suitable APIs between the graphics drivers and display drivers and applications, and it sounds like those APIs are currently lacking.
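A toy illustration of the fine-grained wait described above, using Python threads. `threading.Event` is only a stand-in for whatever fence object the real APIs would pass around, and the step names are hypothetical. The `early_done` event exists purely to make the demo's output deterministic.

```python
import threading


def render_frame(framebuffer_free, early_done, log):
    # Early steps do not touch the framebuffer, so they need no wait.
    for step in ("upload_data", "gpu_compute", "shadow_maps"):
        log.append(step)
    early_done.set()
    # Only the final write depends on the display hardware being finished.
    framebuffer_free.wait()
    log.append("write_framebuffer")


log = []
framebuffer_free = threading.Event()  # stand-in for a fence from the display side
early_done = threading.Event()        # demo-only, keeps the log order deterministic

renderer = threading.Thread(
    target=render_frame, args=(framebuffer_free, early_done, log)
)
renderer.start()

early_done.wait()                     # the early work ran while the display was busy
log.append("display_releases_framebuffer")
framebuffer_free.set()                # signal the "fence"
renderer.join()

print(log)
```

Delaying the whole frame would be the same as moving `framebuffer_free.wait()` to the top of `render_frame`, at which point none of the early work overlaps with the display.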

Explicit synchronization is like systemd?

kazer — Sat, 14 Mar 2020 04:57:32 +0000

Oh, and the same steps (with a sync after each step) may happen many times per second (once per frame), depending on what data changes and what can be reused (just the view transformation, or an entirely different mesh, etc.). Shaders running on the GPU are another thing, and so forth.

So that really needs to be efficient since it is used so heavily.

Explicit synchronization is like systemd?

kazer — Sat, 14 Mar 2020 04:46:24 +0000

> Is the above correct, and a valid way of understanding things?

Umm, not really, you have added some extraneous things which muddle the issue.

The way something like 3D graphics works is you batch some information for rendering:
* here's my list of vertices
* here's my list of indices (the way vertices are related)
* here's my texture
-> please render this on screen.

The way something like OpenGL works (according to the article) is that after each step there is a sync (a fence; think of it as a flush). This causes overhead when you could instead leave the sync until after all the steps.

Things get more complicated when you are not working on pure software constructs but on hardware resources (memory areas) that multiple things might be competing for. If you have glxgears in one window and video playing in another, you can't really let them step on each other's toes when competing for some buffer, so you need to sync when there is a frame ready to show or something to render.

Of course things get more complex when you have software with different toolkits and rendering methods running at the same time.

The above is with some handwaving and simplification, naturally.

Explicit synchronization is like systemd?

sdalley — Fri, 13 Mar 2020 12:50:05 +0000

Thank you Jason for this interesting article. The forest of different interlocking bits and pieces in the Linux display stack is fundamentally daunting to understand, and articles like this which cast more light on the whole thing are very welcome.

In trying to get my head around the difference between implicit and explicit sync in the display stack, it occurs to me that implicit sync is like how sysvinit works, and explicit sync is like how systemd works.

In the one case, you have a more-or-less-linear progression of "I have to initialize/complete unit A before I can start to initialize/complete units B, C, D etc., because B depends on A, F depends on C and D, etc." This works okay so long as the dependencies are all backward-looking for all the steps. But it becomes more and more cumbersome and inefficient as the later dependencies begin to branch out and get more sparse (e.g. K needs C and J but doesn't care at all about the others). Many of the leaf units have the same or similar dependencies, and doing stuff in parallel to speed things up becomes complicated and error-prone.

In the other case, each unit U has explicit "sync" dependencies on the other units it needs, units X, Y etc., no more and no less. U calls getXthingy(), which does "X, can you please give me one of your <Xthingy>'s, and if it's not ready yet I'll just wait until you get it ready; whatever else you depend on to do that is of no interest to me and I don't need to know", and ditto for <Ythingy>. The beauty of this way of doing things is that every unit Ui can start in parallel with no knowledge needed of any dependencies but its own. The sequencing is automatic, and scaling and parallelism are built in.
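The model in the paragraph above can be sketched with Python's asyncio. This only illustrates "start everything, wait on exactly what you need"; it says nothing about how systemd or sync_file are actually implemented. The unit names are borrowed from the earlier example, and `DEPS`, `start_unit` and `main` are hypothetical.

```python
import asyncio

# Hypothetical units and the units each one explicitly depends on.
DEPS = {
    "A": [], "B": ["A"], "C": [], "D": [],
    "F": ["C", "D"], "J": [], "K": ["C", "J"],
}


async def start_unit(name, tasks, started):
    # Wait only for this unit's own dependencies, nothing else.
    for dep in DEPS[name]:
        await tasks[dep]
    started.append(name)


async def main():
    started = []
    tasks = {}
    # Launch every unit at once; the ordering falls out of the awaits.
    for name in DEPS:
        tasks[name] = asyncio.create_task(start_unit(name, tasks, started))
    await asyncio.gather(*tasks.values())
    return started


order = asyncio.run(main())
print(order)
```

No unit knows the global order; each one appears in `order` only after the units it asked for.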

A sync_file thingy in the display stack is equivalent to a service socket in systemd.

Is the above correct, and a valid way of understanding things?

Ekstrand: Plumbing explicit synchronization through the Linux ecosystem

kazer — Fri, 13 Mar 2020 09:41:01 +0000

It seems like a good proposal. There's some technical debt (a backlog of ancient tech in need of updating) and this might go some way toward addressing it.

Ekstrand: Plumbing explicit synchronization through the Linux ecosystem

jafd — Thu, 12 Mar 2020 20:41:41 +0000

You wish they did.

Ekstrand: Plumbing explicit synchronization through the Linux ecosystem

bovinespirit — Thu, 12 Mar 2020 08:18:44 +0000

This is a weighty article crossing many systems and projects and presenting a proposal that could have far-reaching consequences. I expect the LWN community is carefully considering the proposal and maybe conducting its own research before reaching its conclusions and contributing carefully considered comments, if appropriate.

That's how internet forums work, isn't it?

Ekstrand: Plumbing explicit synchronization through the Linux ecosystem

scientes — Thu, 12 Mar 2020 07:01:24 +0000

It is a nice demonstration of the bike shed principle that the most interesting thing posted in this week's LWN has zero comments, while the article about a silly new syscall has many.