FOSDEM09: RandR 1.3 and multimedia processing extensions for X

February 18, 2009

This article was contributed by Koen Vervloesem

At FOSDEM 2009 (Free and Open Source Software Developers' European Meeting) in Brussels, your author attended a number of talks about the state of graphics in Linux. Two of them stood out: Matthias Hopf's talk about RandR 1.3 and Helge Bahmann's work on multimedia processing extensions for X.

RandR 1.3: panning, transformations and properties

Matthias Hopf of SUSE R&D gave an update about RandR 1.3 (Resize and Rotate Extension). RandR 1.2 already exposes an interface to dynamically set and query properties such as the displayed and known video modes, the framebuffer size, and the attachment of a monitor. However, some important features are still lacking. For example, querying the state of an output involves output probing, and there is no way for applications to distinguish between the internal panel and an external output, which could be interesting for presentation software. Panning is also missing, as is the ability to display the framebuffer at anything other than a 1:1 scale. And last but not least: the framebuffer size of X is limited to its initial allocation.

RandR 1.3, which is to be released with X Server 1.6, should implement a number of these features. With the new version of the extension, it is finally possible to query the state without output probing. The function RRGetScreenResourcesCurrent is equivalent to RRGetScreenResources but does not use polling. However, you won't get notified of new monitors this way. The xrandr command to query the state of a VGA output would be:

    xrandr --output VGA --current

Other additions are multi-monitor panning and display transformations. When the mouse hits the screen borders, the viewport has to be changed. For a seamless movement without flickering, the graphics driver needs an update. The --panning option of the new xrandr command has three sets of four parameters, as in this example:

    xrandr --output VGA --panning 2000x1200+0+0/2000x1200+0+0/100/100/100/100

The first parameter set is the panning area, the second is the tracking area, and the third gives the borders. For example, setting the right border to 100 means that panning begins when the pointer comes within 100 pixels of the right edge of the physical screen. The panning area is the area that might be visible on the screen, while the tracking area is the area in which mouse pointer movements influence the pan. In most circumstances these two are identical.
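As a rough illustration of how the three parameter sets fit together, the following Python sketch parses a --panning argument and applies the border rule described above. It is not code from xrandr itself: the names parse_panning and should_pan_right are hypothetical, and the border order (left, top, right, bottom) is an assumption based on the xrandr manual page.

```python
import re
from typing import NamedTuple


class Rect(NamedTuple):
    width: int
    height: int
    x: int
    y: int


class Borders(NamedTuple):
    # Assumed order: left/top/right/bottom.
    left: int
    top: int
    right: int
    bottom: int


class PanningSpec(NamedTuple):
    panning: Rect
    tracking: Rect
    borders: Borders


_RECT = re.compile(r"^(\d+)x(\d+)\+(\d+)\+(\d+)$")


def parse_rect(text: str) -> Rect:
    """Parse a WIDTHxHEIGHT+X+Y rectangle."""
    match = _RECT.match(text)
    if match is None:
        raise ValueError(f"not a WxH+X+Y rectangle: {text!r}")
    return Rect(*(int(group) for group in match.groups()))


def parse_panning(spec: str) -> PanningSpec:
    """Split a full panning argument into its three parameter sets."""
    parts = spec.split("/")
    if len(parts) != 6:
        raise ValueError("expected panning/tracking/left/top/right/bottom")
    return PanningSpec(
        panning=parse_rect(parts[0]),
        tracking=parse_rect(parts[1]),
        borders=Borders(*(int(part) for part in parts[2:])),
    )


def should_pan_right(pointer_x: int, viewport_x: int,
                     screen_width: int, border_right: int) -> bool:
    """True once the pointer is within border_right pixels of the right
    edge of the physical screen (hypothetical helper for illustration)."""
    return pointer_x >= viewport_x + screen_width - border_right


spec = parse_panning("2000x1200+0+0/2000x1200+0+0/100/100/100/100")
print(spec.panning, spec.tracking, spec.borders)
```

With a 1280-pixel-wide physical screen whose viewport starts at x=0, should_pan_right would start returning True at pointer position 1180 for the border value used in the example.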

There are still some conceptual problems to be solved, according to Hopf. He wonders what the combination of a dual-head configuration and panning should mean: should the whole space pan when the user reaches the side of the virtual space, or should each physical space pan separately? Or should it be a combination of the two? Xrandr needs an update to accommodate these possibilities. Another problem is that panning and display transformations don't fit together.

Display transformations in RandR 1.3 make it possible to transform the perspective of the CRTC content. This could be used for rotation, flipping, scaling and keystone correction. Under the hood, the code is using homogeneous coordinate transformations, implemented by a 3-component matrix-vector multiplication. The user has to specify this transformation matrix in the appropriate xrandr command, as in:

    xrandr --output VGA --transform 2,0,0,0,2,0,0,0,1

which scales the image down by a factor of 2. A more pragmatic use of this display transformation would be a keystone correction matrix, which transforms the distorted image of an incorrectly positioned projector to a perfect rectangle. It would seem that there is ample scope for the creation of more user-friendly interfaces to this functionality, though.
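The homogeneous coordinate arithmetic can be sketched in a few lines of Python. This is an illustration only, not the X server's code: the function names are hypothetical, and the sketch assumes that the matrix maps an output (screen) pixel to the framebuffer coordinate it displays, which is why a factor of 2 on the diagonal makes the picture look half as large.

```python
def matrix_from_xrandr(arg):
    """Turn the nine comma-separated values into a row-major 3x3 matrix."""
    values = [float(v) for v in arg.split(",")]
    if len(values) != 9:
        raise ValueError("a transformation matrix needs exactly 9 values")
    return [values[0:3], values[3:6], values[6:9]]


def apply_transform(matrix, x, y):
    """Multiply the matrix by the homogeneous vector (x, y, 1) and
    project the result back to 2D by dividing by w."""
    (a, b, c), (d, e, f), (g, h, i) = matrix
    xh = a * x + b * y + c
    yh = d * x + e * y + f
    w = g * x + h * y + i
    if w == 0:
        raise ZeroDivisionError("point maps to infinity")
    return xh / w, yh / w


scale = matrix_from_xrandr("2,0,0,0,2,0,0,0,1")
# Output pixel (640, 400) shows framebuffer position (1280, 800).
print(apply_transform(scale, 640, 400))
```

Rotation, flipping and scaling only use the first two rows. Keystone correction is the case where the bottom row is not 0,0,1, so that w varies across the screen and the division produces the perspective effect.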

Distinguishing between different types of screens can be done in RandR 1.3 with standard properties, such as output and signal types. RandR will require graphics drivers to implement some mandatory properties to claim RandR 1.3 support. Hopf added that "unknown" is a valid value, so initial support is trivial. Two of these mandatory properties are SignalFormat and ConnectorType. The former describes the physical protocol format, such as VGA, TMDS, LVDS, Composite, Composite-PAL, Composite-NTSC, Composite-SECAM, SVideo, Component or DisplayPort. The graphics driver changes this property when the underlying hardware indicates a protocol change, and X clients can change this property to select a protocol. The ConnectorType property is immutable, and can have one of VGA, DVI, DVI-I, DVI-A, DVI-D, HDMI, Panel, TV, TV-Composite, TV-SVideo, TV-Component, TV-SCART, TV-C4 or DisplayPort as its value. A presentation application can use this property to detect unambiguously which is the laptop display and which is the projector display.
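A presentation application could use the ConnectorType values roughly as in the following Python sketch. The sketch does not talk to an X server: the output names and the classify_outputs function are hypothetical, and treating "Panel" as the internal display is an assumption in line with the laptop example above.

```python
# ConnectorType values as listed in the talk.
KNOWN_CONNECTORS = {
    "VGA", "DVI", "DVI-I", "DVI-A", "DVI-D", "HDMI", "Panel", "TV",
    "TV-Composite", "TV-SVideo", "TV-Component", "TV-SCART", "TV-C4",
    "DisplayPort",
}

# Assumption: only "Panel" denotes a built-in display.
INTERNAL_CONNECTORS = {"Panel"}


def classify_outputs(connector_types):
    """Split outputs into internal, external and unknown lists.

    connector_types maps an output name to its ConnectorType value;
    anything not in the known list (including "unknown") is reported
    separately so the application can ask the user.
    """
    internal, external, unknown = [], [], []
    for name, connector in sorted(connector_types.items()):
        if connector in INTERNAL_CONNECTORS:
            internal.append(name)
        elif connector in KNOWN_CONNECTORS:
            external.append(name)
        else:
            unknown.append(name)
    return internal, external, unknown


# Hypothetical laptop with a projector attached to the VGA port.
print(classify_outputs({"LVDS": "Panel", "VGA-0": "VGA", "DVI-0": "unknown"}))
```

Because "unknown" is a valid value, an application written this way still needs a fallback for drivers that only offer the trivial implementation.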

Other, non-mandatory properties are SignalProperties, ConnectorNumber, EDID (formerly EDID_DATA, the raw EDID data from the monitor), CompatibilityList and CloneList. Many of these properties haven't been implemented by any driver yet. A final problem that cannot be solved in RandR 1.3 is the framebuffer size limitation. The culprit is the current XAA implementation: XAA calls don't get the pitch as an argument and assume it stays the same for the whole life of the X Server.

Multimedia processing extensions for X

Helge Bahmann, a research assistant at the Technical University of Freiburg in Germany, talked about his experimental multimedia processing extensions for the X Window System. At this moment, multimedia applications either bypass X (e.g. by DRI), or they use X as a video playback service for computed images (e.g. by XVideo). The network transparency for which X is famous fails in both cases. If you want to display video remotely, you have to be able to transmit compressed media data, and you have to synchronize audio and video. For this purpose, Bahmann introduced three new experimental X extensions.

The TIME extension, part of Bahmann's master's thesis in 2002, introduces two new server-side objects: Clocks and Schedules. An X client can start, stop and query the X server's clock. The client also submits commands to the server with execution and expiration timestamps, and the scheduler executes these requests at the appropriate time. This mechanism allows the application to schedule drawing requests (using the RENDER extension). It's important to note the (non-)obligations of the client and server: the X server doesn't guarantee the timely execution of the commands, and the client thus cannot rely on a created state. At the other end, the client can "change its mind": it can retract scheduled commands and it can replace them with completely different commands. Retracting and replacing can fail if the server has already started the execution.
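The contract between client and server can be modelled with a small Python sketch. It is a toy model of the behaviour described above, not the extension's protocol: the Schedule class and its method names are hypothetical, and time is represented by plain numbers.

```python
import heapq
import itertools


class Schedule:
    """Toy model of a server-side schedule of timestamped commands."""

    def __init__(self):
        self._heap = []      # (execute_at, command_id), ordered by time
        self._pending = {}   # command_id -> (expire_at, action)
        self._ids = itertools.count(1)

    def submit(self, execute_at, expire_at, action):
        """Queue an action; returns an id usable for retract/replace."""
        command_id = next(self._ids)
        self._pending[command_id] = (expire_at, action)
        heapq.heappush(self._heap, (execute_at, command_id))
        return command_id

    def retract(self, command_id):
        """Withdraw a command. Fails (returns False) if the command has
        already been executed or dropped."""
        return self._pending.pop(command_id, None) is not None

    def replace(self, command_id, execute_at, expire_at, action):
        """Swap a pending command for a different one; None on failure."""
        if not self.retract(command_id):
            return None
        return self.submit(execute_at, expire_at, action)

    def run_until(self, now):
        """Execute everything that is due. Commands whose expiration has
        passed are dropped silently: execution is never guaranteed."""
        results = []
        while self._heap and self._heap[0][0] <= now:
            _, command_id = heapq.heappop(self._heap)
            entry = self._pending.pop(command_id, None)
            if entry is None:
                continue  # retracted earlier
            expire_at, action = entry
            if now > expire_at:
                continue  # too late, skip it
            results.append(action())
        return results


schedule = Schedule()
first = schedule.submit(10, 20, lambda: "draw frame 1")
print(schedule.run_until(12))
print(schedule.retract(first))
```

In this model, a client that resizes its window would retract or replace every pending drawing command and submit new ones, accepting that some of those calls fail because the command has already run.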

The second extension Bahmann introduced is the AUDIO extension. It implements a SampleBuffer object, which is server-side storage for audio samples, equivalent to pixmaps for images. A PCMContext object can serve as a clock: a client can bind an execution scheduler to it, which allows operations to be executed synchronized to audio playback or capture. This also allows a simple synchronization between audio and video. The AUDIO extension spawns a dedicated real-time thread, but the rest of the X server is completely unaware of the thread. Because of the audio thread, the X server has to be linked to a thread-safe libc.
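Using audio playback as the clock boils down to counting samples. The following Python sketch shows the arithmetic; the SampleClock class is a hypothetical stand-in for a PCMContext, and the sample and frame rates are example values rather than figures from the talk.

```python
class SampleClock:
    """Clock that only advances when audio samples are played."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.samples_played = 0

    def advance(self, sample_count):
        """Called as the audio device consumes samples."""
        self.samples_played += sample_count

    def seconds(self):
        return self.samples_played / self.sample_rate


def video_frame_due(clock, frames_per_second):
    """Index of the video frame that belongs to the current audio
    position, computed with integer arithmetic to avoid rounding drift."""
    return clock.samples_played * frames_per_second // clock.sample_rate


clock = SampleClock(sample_rate=48000)
clock.advance(4800)  # a tenth of a second of audio
print(clock.seconds(), video_frame_due(clock, 25))
```

If audio playback stalls, the clock stalls with it, so video frames scheduled against it are delayed instead of running ahead of the sound.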

The TIME and AUDIO extensions are the basic infrastructure for multimedia processing, but as you have to deal with huge amounts of data, this will not work well on low-bandwidth networks. That's why Bahmann introduced a third X extension: COMPRESS, which is actually a misnomer because it uncompresses data. The ImageSequenceDecompressor and AudioSequenceDecompressor can receive and buffer individual compressed JPEG frames, and convert them into an uncompressed representation which can be processed by the X server. The client must submit compressed frame data and hence has to understand the compression format.

Summarizing his talk, Bahmann stressed that his X extensions are conceptually relatively simple and reuse existing X functionality (such as communication, client/resource management and security) rather than duplicating it. A drawback for the programmer is that these multimedia extensions are very low-level and hence not at all easy to use. The client application has a big responsibility: it has to understand, parse and partition the media data, to plan ahead, submit compressed data and schedule commands, and to handle synchronization. It also needs to back-track and reschedule commands, for example when the window size changes. Bahmann warns that this can be very complex to implement.

Bahmann calls his extensions "experimental" because they work for him but probably require diving into the code if you want to use them. Audio, timing and synchronization basically work, and the protocol part of the compression is finished. However, the backend interface to plug in new decompressors is still in flux. But all in all, it works:

Simple networked media player applications work quite well across our campus network. For really low-bandwidth media, such as MPEG1 at 70 to 80 kbytes per second, it also works across my DSL connection. The overhead added by the X protocol is quite minuscule.
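To put the quoted bitrate into perspective, a back-of-the-envelope calculation in Python compares it with the same video sent as uncompressed frames. The resolution, frame rate and pixel size below are assumptions chosen for illustration (a 352x288 picture at 25 frames per second with 3 bytes per pixel); only the 70 to 80 kbytes per second figure comes from Bahmann.

```python
def raw_video_rate(width, height, bytes_per_pixel, frames_per_second):
    """Bytes per second needed to send every frame uncompressed."""
    return width * height * bytes_per_pixel * frames_per_second


# Assumed example stream: 352x288 pixels, 24-bit colour, 25 frames/s.
raw = raw_video_rate(352, 288, 3, 25)

# Midpoint of the 70 to 80 kbytes/s range quoted above.
compressed = 75 * 1000

print(raw)               # 7603200 bytes per second
print(raw / compressed)  # how many times more data the raw stream needs
```

Under these assumptions the uncompressed stream needs roughly a hundred times the bandwidth, which is the gap that server-side decompression is meant to close.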

While implementing and thinking about these extensions, Bahmann encountered some deficiencies of the current design of the X server. There's the security problem of audio/image decompressors in a process running with root privileges. Knowing the bad security track record of media players such as VLC and MPlayer, one should be cautious with such complex code running with root privileges. Bahmann is currently auditing the decompressors, and that's the principal reason why there are so few codecs available. Secondly, the compute-intensive decompression operations of the COMPRESS extension may stall the X server. Bahmann suggests giving up the current single-threaded design of the X server, but that is an idea which has not been accepted by the X development community in the past. In the absence of multiple threads, decompression must be handled carefully so as to avoid interfering with other X operations.


FOSDEM09: RandR 1.3 and multimedia processing extensions for X

Posted Feb 19, 2009 2:44 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (1 responses)

Why not spin the decompression off into a separate process and use some kind of shared-memory ring buffer instead? You get the concurrency benefits without the security problems, since the spun-off process can have vastly reduced privileges.

FOSDEM09: RandR 1.3 and multimedia processing extensions for X

Posted Feb 23, 2009 11:57 UTC (Mon) by helge.bahmann (subscriber, #56804) [Link]

That would work, but it requires quite a bit of "protocol" between the X
server and the decompressor (select which compressed/uncompressed images
to retain or delete inside the decompressor process).

Getting the concurrency benefits is however more tricky -- if a client
calls "Decompress" in the X server, the X server must delegate the
operation to the helper process, suspend the calling client's request
queue (the X server is single-threaded!), receive completion, resume the
client's request queue etc.

One thing I dislike about the "helper process" approach is that I am not
yet sure what the interface should look like -- currently all decompressors
are for efficiency reasons (cache locality) "iterator-based": you pass
compressed data (as well as decoding dependencies) in, and what you get is
an iterator that traverses the image top to bottom and yields
horizontal "bands" of the image. Currently you receive pixels, usually in
some sort of YCrCb format, which must then be converted to RGB etc. (but
the interface should later also allow getting e.g. DCT coefficients +
motion vectors for hardware-assisted decode). Mapping this model to the
X/decompressor process is probably not going to work due to excessive
context switches.

FOSDEM09: RandR 1.3 and multimedia processing extensions for X

Posted Feb 19, 2009 2:57 UTC (Thu) by daniels (subscriber, #16193) [Link]

I think our current position on threads could probably be summed up as 'we did it once and it went really badly; doing it again is going to be really, really hard'. It doesn't mean it's a bad idea, though. :)

FOSDEM09: RandR 1.3 and multimedia processing extensions for X

Posted Feb 19, 2009 7:20 UTC (Thu) by aleXXX (subscriber, #2742) [Link] (1 responses)

Are these multimedia extensions similar to MAS? Does this actually still
exist?

Regarding RandR: I was quite surprised when I noticed that suddenly
virtual screens (now called panning) didn't work anymore after upgrading
my distro (to Slackware 12.1). The keyword "virtual" is still supported
in xorg.conf, but it does something different now. It sets up a frame
buffer of the given size, but you can't see it, since you can't move to
make it visible :-/
IMO it's not very good style to keep a keyword/function with the same
name but make it do something very different. If they had just
removed support for "virtual", X would have told me that this is no
longer supported, which would have saved me a lot of time.

Alex

FOSDEM09: RandR 1.3 and multimedia processing extensions for X

Posted Feb 26, 2009 15:10 UTC (Thu) by Duncan (guest, #6647) [Link]

AFAIK, in theory at least, RandR 1.2 supported not panning, but specifying
the viewport location. At least xrandr had parameters for it. It is/was
thus possible (in theory) to use xrandr (my usage was in a script) to set
the viewport to wherever one wanted it, and one could (in theory)
programmatically pan by repeatedly invoking said script or application to
move X and Y pixels.

You'll note the "in theory"s however. Apparently not all video drivers
implemented the settable viewport functionality. The resolution change
and virtual size functions of my xrandr invoking script worked just fine,
but I've never been able to get the viewport parameters to work -- xrandr
takes them -- they just don't do anything. When the resolution is set
below the maximum virtual size, the viewport stays nailed to the top left
corner, no matter /what/ position parameters I give xrandr. This is with
xf86-video-ati, 6.9 and 6.10 at least, and I think I tried it with 6.8 but
that was far enough back IDR for sure, on a Radeon 9200SE AGP with dual
outputs, DVI-I and VGA, formerly running dual 22" CRT analog monitors in
1600x1200 stacked for 1600x2400, now running dual 24" LCDs 1920x1200
stacked for 1920x2400. (Yes, I need to upgrade video cards.)

Thus I could change resolution with the orientation staying correct, but
for the most part it wasn't of much use since all I could see at the lower
resolutions was the top left corner of the screen, no matter where I told
it to put the viewport. Still, I was able to work around that problem to
split-resolution top and bottom to play the only piece of proprietaryware
I have left, Master of Orion original DOS edition (1993 update copyright),
full-screen (single-screen) in DOSBOX at 640x480, while keeping the other
screen normal resolution to run my ksysguard graphs and a music player,
etc. That worked even tho I couldn't pan, because I could use kwin's
absolute positioning options to put dosbox/orion right under the 640x480
viewport of the one screen, while keeping the other at normal resolution.

Acceleration Architectures, state of the art.

Posted Feb 19, 2009 10:02 UTC (Thu) by ctg (guest, #3459) [Link] (11 responses)

It is a little hard to see where development is going:

is EXA a one-for-one replacement for XAA, or just for some use cases?
Is UXA a replacement for EXA, or a branch of some ideas that will be remerged?

Surely 99.999% of graphics implementations will be one of Intel, Nvidia, AMD/ATI and Via (I've not come across anything else for years now, and certainly not one which allows you to use multiple adapters).

Is there a case for streamlining X down to EXA/UXA and drivers which support one of those....?

I've rootled around on xorg.freedesktop.org, but can't really find obvious answers to these sorts of questions.

Users of, say Matrox, would probably be pretty content with the feature set of Xorg 7.current.

Acceleration Architectures, state of the art.

Posted Feb 19, 2009 12:48 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

As far as I can tell the intention is for EXA to replace XAA entirely. As of (very) recently it even does core fonts nice and fast, so unless you rely on things like stipple acceleration (!) I'd say it's getting there quite well.

(downside: it looks like Speedo and Type1 core fonts are about to stop working, as the code to handle them has been torn out of libXfont 1.4.0. I doubt anyone misses Speedo, but Type1 may very well be missed, because they *do* still work client-side.)

Acceleration Architectures, state of the art.

Posted Feb 20, 2009 5:05 UTC (Fri) by jamesh (guest, #1159) [Link] (3 responses)

While the Type1 code is gone from libXfont, it still has the FreeType backend. Given that FreeType can read Type1 fonts, is it possible that they were just removing a superfluous backend and switching to the code base that everyone else uses? (including the apps that use Xft).

Acceleration Architectures, state of the art.

Posted Feb 20, 2009 10:44 UTC (Fri) by nix (subscriber, #2304) [Link] (1 responses)

It's possible. I'm not entirely sure *where* libXfont is used, to be
honest (something or other to do with core fonts). I used to know; I need
a prosthetic memory.

Acceleration Architectures, state of the art.

Posted Feb 20, 2009 14:47 UTC (Fri) by jamesh (guest, #1159) [Link]

LibXfont is pretty much just the core font support library. Any application that has been ported to fontconfig (using Xft or cairo) won't be using it, and should retain Type1 support through the use of freetype.

Looking at the libXfont 1.4.0 release notes, it looks like you were correct about Type1 fonts not working with the new version. That said, I totally understand the desire to reduce the size/complexity of the core font code: the feature is deprecated, but can't easily be removed as it is not an extension to the X protocol.

If they can support Type1 fonts through FreeType with minimal effort, then great. If they can't, it isn't a huge loss.

Acceleration Architectures, state of the art.

Posted Feb 23, 2009 16:54 UTC (Mon) by jcristau (subscriber, #41237) [Link]

Only the old type1 libXfont backend has been removed, support for type1 fonts from the freetype backend is still there.

Acceleration Architectures, state of the art.

Posted Feb 19, 2009 18:59 UTC (Thu) by kingdon (guest, #4526) [Link]

Those four vendors are about 92% based on the statistics at http://smolt.fedoraproject.org/static/stats/by_class_VIDE... .

Acceleration Architectures, state of the art.

Posted Feb 20, 2009 0:57 UTC (Fri) by wookey (guest, #5501) [Link]

Those manufacturers may account for nearly all PC/x86 type machines, but that is not the whole X space. Modern ARM chips, as used in things like smartphones, and soon to be seen in netbooks, now have 3D graphics cores in them, such as the powerVR from Imagination in Cortex A8-based CPUs. This looks likely to be a non-trivial fraction of devices in the future, so X shouldn't be ignoring them, although my understanding is that Imagination are about as helpful as nVidia when it comes to free drivers.

Acceleration Architectures, state of the art.

Posted Feb 21, 2009 14:00 UTC (Sat) by roblucid (guest, #48964) [Link]

> Surely 99.999% of graphics implementations will be one of Intel, Nvidia,
> AMD/ATI and Via (I've not come across anything else for years now, and
> certainly not one which allows you to use multiple adapters).

Some of us still have Matrox cards, which were well supported if you liked Open Source drivers not so long ago, and were thus recommended. I don't know if the dual monitor output still works, but it'd be bad form to drop support for cards that were FOSS-favoured only five years ago.

Acceleration Architectures, state of the art.

Posted Feb 23, 2009 17:48 UTC (Mon) by dbnichol (subscriber, #39622) [Link] (1 responses)

It has definitely been the plan for a long time to get all drivers onto EXA. XAA is only receiving bug fixes in the server. On the other hand, UXA is used by the intel driver to play nice with GEM and KMS. It started as a straight clone of EXA, though. I don't know if the UXA features can be ported back to EXA in the server. You'd probably have to ask keithp or anholt.

Acceleration Architectures, state of the art.

Posted Feb 28, 2009 12:53 UTC (Sat) by daenzer (subscriber, #7050) [Link]

> On the other hand, UXA is used by the intel driver to play nice with GEM
> and KMS. It started as a straight clone of EXA, though. I don't know if
> the UXA features can be ported back to EXA in the server. You'd probably
> have to ask keithp or anholt.

And take their answers with a big grain of salt... There's nothing to be
ported back, really. UXA basically just removes the EXA code the intel
driver doesn't need. However, it could 'play nice with GEM and KMS' (i.e.
use buffer objects for pixmap storage) in pretty much the same way with EXA,
and the code that was removed would be inactive.

Acceleration Architectures, state of the art.

Posted Feb 28, 2009 13:10 UTC (Sat) by daenzer (subscriber, #7050) [Link]

> is EXA a one-for-one replacement for XAA, or just for some use cases?

EXA is designed for modern compositing environments, whereas XAA is
excessively optimized for every little bit of core X11 rendering operations
and hasn't really evolved much this millennium. So while EXA should be on par
or faster with modern applications (especially with a compositing manager,
even just xcompmgr -a) it'll probably never be able to match XAA in all
cases with older applications, at least not without turning into the same
kind of monster.

Why bother?

Posted Feb 19, 2009 20:15 UTC (Thu) by NAR (subscriber, #1313) [Link] (5 responses)

Exactly what's the point of these multimedia extensions? We already have at least one framework for audio over network (I've heard that this is the advantage of pulseaudio over e.g. esd) and I don't really see why anyone would decode e.g. MPEG4, then encode the resulting picture to JPEG, send it over the network, then decode it again. Isn't that one decoding step lossy enough? On the other hand, media players nowadays are usually able to play over the network directly.

Why bother?

Posted Feb 20, 2009 11:03 UTC (Fri) by nix (subscriber, #2304) [Link] (1 responses)

esd gives you audio over the network. The advantage of pulseaudio over it
is its pluggability, its lower CPU consumption, its lower latencies, its
huge swarms of extra features, the way it can imitate or output to
virtually anything, its active development, and the fact that it generally
doesn't suck.

Why bother?

Posted Feb 24, 2009 3:20 UTC (Tue) by jlokier (guest, #52227) [Link]

Yet I still have to kill the PulseAudio daemon sometimes before some apps produce sound. I don't believe the hype about PA's ability to imitate everything else, because it plainly fails at it.

Why bother?

Posted Feb 20, 2009 17:10 UTC (Fri) by shapr (subscriber, #9077) [Link]

The point is that ssh -X would one day be able to forward sound and video across the network, as if you were sitting in front of the computer itself.

Why bother?

Posted Feb 23, 2009 12:04 UTC (Mon) by helge.bahmann (subscriber, #56804) [Link]

"Audio over network" works well with for example PulseAudio, but what
doesn't work is "audio+video+synchronization over network". While you
could tie X+PulseAudio server close enough together to make this feasible,
I suspect the result is going to be quite messy.

As for MPEG-4... the idea would of course be to provide an AVC
decompressor accessible through the X protocol as well -- reencoding at
the client is quite obviously highly undesirable (maybe useful as an
emergency fallback, but that's it).

And last time I checked, there was no media player that could play
through "ssh -X" :)

Why bother?

Posted Feb 26, 2009 11:24 UTC (Thu) by muwlgr (guest, #35359) [Link]

Exactly. Hardly anyone would want to stream a sequence of JPEGs over the network. Streaming of MP3/MP4/AVI/OGG/Theora right to the X-server, then decoding and playing it there, would look much more savvy and tasty. I think so.

FOSDEM09: RandR 1.3 and multimedia processing extensions for X

Posted Feb 27, 2009 12:51 UTC (Fri) by phdm (guest, #56884) [Link]

For the image part, this reminds me of XIE (X11 Image Extension), which 15 years ago allowed X servers to display JPEG images sent in compressed form using the X protocol. I miss XIE, which was dropped 5 years ago by xorg (or was it xfree?), but nowadays we should have XMME (multimedia), not XIE + audio.