Development

GStreamer Conf: Linux media subsystems

By Nathan Willis
September 6, 2012

GStreamer is a framework designed for application development, but the memory and processing demands of multimedia mean that it leans heavily on the support of the operating system's underlying media layers. At the 2012 GStreamer Conference, representatives from Video4Linux, ALSA, and Wayland were on hand to report on recent developments and ongoing work in the world of Linux media capture, sound, and display technology.

Video4Linux

Hans Verkuil presented a session on the Video4Linux (V4L) subsystem, which primarily handles video input, along with related matters. The major change in the V4L arena, he said, has been the emergence of the system-on-chip (SoC). In the desktop paradigm of years past, V4L had relatively simple hardware to deal with: video capture cards and webcams, the majority of which had similar capabilities. SoCs are markedly different; many include discrete components like hardware decoders and video scalers, and the system provides a flexible AV pipeline, with multiple ways to route through the on-board components depending on the processing needed.

Initially most SoC vendors wrote their own, proprietary modules to make up for the features V4L lacked, he said, but V4L has caught up. The core framework now includes a v4l2_subdev structure to communicate with sub-devices like decoders and scalers. Although these devices can vary from board to board in theory, he said, in practice most vendors tend to stick with the same parts over many hardware generations. There is also a new Media Controller API to handle managing multi-function devices (including USB webcams that include an integrated microphone, in addition to the flexible SoC routing mentioned above) and the 3.1 kernel introduced a new control framework that provides a consistent interface for brightness, contrast, frame rate, and other settings.

V4L's roots were in the standard-definition era, so the project has also struggled to make life easier for HDTV users. The initial attempt was the Presets API in kernel 2.6.33, which provided fixed settings for video in a handful of HDTV formats (720p30, 1080p60, etc.). That API eventually proved too coarse for vendors, and was replaced in kernel 3.5 with the Timings API, which allows custom modeline-like video settings. The Event API is another recent addition, significantly improved in 3.1, which allows code to subscribe to immediate notification on events like the connection or disconnection of an input port.
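To make "modeline-like" concrete, here is a minimal Python sketch of the arithmetic such a timing description encodes. The VideoTiming class and its field names are hypothetical, chosen for illustration, and are not the kernel's data structure; the 1080p60 numbers are the standard CEA-861 figures, while the "custom" timing is invented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VideoTiming:
    """Illustrative modeline-style timing (hypothetical, not a kernel struct)."""
    width: int            # active pixels per line
    height: int           # active lines per frame
    pixel_clock_hz: int
    h_front_porch: int
    h_sync: int
    h_back_porch: int
    v_front_porch: int
    v_sync: int
    v_back_porch: int

    @property
    def h_total(self) -> int:
        # Active pixels plus horizontal blanking
        return self.width + self.h_front_porch + self.h_sync + self.h_back_porch

    @property
    def v_total(self) -> int:
        # Active lines plus vertical blanking
        return self.height + self.v_front_porch + self.v_sync + self.v_back_porch

    @property
    def refresh_hz(self) -> float:
        # One frame takes h_total * v_total pixel clocks
        return self.pixel_clock_hz / (self.h_total * self.v_total)


# Standard CEA-861 timing for 1080p60
TIMING_1080P60 = VideoTiming(1920, 1080, 148_500_000, 88, 44, 148, 4, 5, 36)

# An invented non-standard variant that a fixed preset list could not express
CUSTOM_TIMING = replace(TIMING_1080P60, pixel_clock_hz=123_750_000)
```

A fixed preset can only name the first of these; a timing-based description covers both, since the refresh rate falls out of the pixel clock and the total line and frame sizes.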

The videobuf2 framework is another major overhaul; the previous incarnation of the framework (which provides an abstraction layer between applications and video device drivers) did not conform to V4L's own API and provided a memory management framework so flawed that most drivers did not even use it. The new framework separates buffer operations from memory management operations, and by removing the need for each driver to implement its own memory management, should simplify device driver code significantly.

Other noteworthy changes include support for the H.264 codec, new input cropping controls, and the long-awaited ability for radio tuners to tune multiple frequency bands (such as FM and AM). Radio Data System (RDS) support has also been upgraded, and now includes the Traffic Message Channel (TMC) coding used in many urban areas. Cisco hired a student for the summer to write a new RDS library to replace the older, broken one. Finally, a contiguous memory allocator was written by Samsung and others for kernel 3.5; it helps video hardware allocate the large chunks of physically contiguous memory it needs for direct memory access.

There is further work still in the pipeline, of course, and Verkuil mentioned three topics of importance to GStreamer. The first is buffer sharing; video decoding pipelines would prefer to avoid copying large buffers whenever possible, but currently V4L's video buffers are specific to an individual video node. Integrating V4L with DMAbuf is probably the solution, he said, and is likely to arrive in kernel 3.8. The second is better support for newer video connector types like HDMI and DisplayPort, in particular hot-pluggability and signal detection, for use by embedded devices that need to set up these connections without user intervention. Finally, he hopes to complete a V4L compliance testing tool, which he described as 90% finished. The tool is used to test device drivers against the API, and drivers are required to pass its tests before they get into the kernel. Verkuil said that the tool is actually stricter than the published API, because it checks for a number of optional features that are easy to implement and that can annoy users if they are left out.

ALSA

Takashi Iwai presented an update on the ALSA subsystem. In recent years, ALSA has not seen as many major changes as the various video subsystems have, but there are still plenty of challenges. The first is that, as with video, more and more audio devices now support decoding compressed audio in hardware. Kernel 3.3 added an API for offloading audio decoding to a hardware device, though the bigger improvement is likely to be kernel 3.7's merge of compressed audio hardware decoding for the ALSA System on Chip (ASoC) layer.

ASoC accounts for the majority of ALSA code (both in terms of lines and number of commits), Iwai said, followed by the HD-audio layer used in the majority of modern laptops. The third-largest component is USB-audio, which provides a single generic driver used by all USB audio devices. But while USB devices can share a common driver, the HD-audio layer covers roughly 4000 devices, each of which has a different configuration (in regard to which pin performs which function). It is not possible for the ALSA project to maintain and update 4000 separate configuration files, he said, so it instead relies on user reports to discover differences between hardware. That is a pain point, but most of the time hardware vendors use a consistent configuration so most devices work without configuration.

Ongoing work in ALSA includes the Use Case Manager (UCM) abstraction layer, a high-level device management layer that describes hardware routing and configuration for common tasks like "phone call" or "music playback." Jack detection is another continuing development. Currently there is no API to detect whether or not a connector has a jack plugged in, so multiple methods are in use, including Android's external connector class extcon and ALSA's general controls API.

Also still in the works is improved power management, both for HD-audio devices and for hardware decoders. Improvements are expected to land with kernel 3.7. HD-audio devices might also benefit from the ability to "patch" device firmware and change the pin configuration, so that recompiling the driver can be avoided.

The biggest outstanding issue at present is a channel mapping API, which encodes the surround-sound position associated with the speaker attached to each output channel (e.g., Front Left, Center, Right Rear, Low-Frequency Effects). Each needs to receive its own PCM audio stream, but there are multiple standards on the market, and the problem becomes even trickier when the system needs to combine channels for a setup with fewer speakers. There is a proposal in the works, which was discussed at length later in the week at the Linux Plumbers Conference audio mini-summit.
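The problem can be illustrated with a small Python sketch. It assumes a hypothetical map from speaker position to PCM channel index and folds one frame of surround samples down to stereo. The position names, the maps, and the mixing coefficient are all assumptions made for illustration; none of them come from the ALSA proposal.

```python
import math

# Hypothetical channel map: speaker position -> index of its PCM channel
SURROUND_5_1 = {"FL": 0, "FR": 1, "FC": 2, "LFE": 3, "RL": 4, "RR": 5}

# Another (equally hypothetical) ordering, as a different standard might use
SURROUND_5_1_ALT = {"FL": 0, "FC": 1, "FR": 2, "RL": 3, "RR": 4, "LFE": 5}

# Roughly -3 dB; a common choice for folding channels, but only an assumption
ATTENUATION = 1 / math.sqrt(2)


def downmix_to_stereo(frame, channel_map):
    """Fold one frame (one float sample per channel) down to (left, right)."""

    def sample(position):
        # Positions missing from the map contribute silence
        index = channel_map.get(position)
        return frame[index] if index is not None else 0.0

    center = ATTENUATION * sample("FC")
    left = sample("FL") + center + ATTENUATION * sample("RL")
    right = sample("FR") + center + ATTENUATION * sample("RR")
    # The LFE channel is simply dropped in this sketch
    return left, right
```

Without the map, the code could not know which index carries the center channel; with it, the same downmix works for either ordering.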

Wayland

Kristian Høgsberg presented an update on the Wayland display protocol and how it will differ from X. The session was not overly GStreamer-specific, but more of an introduction to Wayland. Since Wayland is not being used in the wild yet, preparing GStreamer developers in advance should simplify the eventual transition.

Høgsberg related the reasons for Wayland's creation — namely that as separate window managers and compositors have become the norm on Linux desktops, the X server itself is increasingly doing little but acting as a middleman. Many of the earlier functions of the X server have been moved out into separate libraries, such as Freetype, Fontconfig, Qt, and GTK+. Other key functions, such as mode-setting and input devices, are handled at lower levels, and many applications use Cairo or OpenGL to paint their window contents. Compositing was the final blow, however: in a compositing desktop, each window gets a private buffer of its own, which is drawn to the screen by the compositor. In this situation, X does nothing but add cost: another copy operation for the buffer, and more memory.

He described the basics of the Wayland protocol, which he said he expected to reach 1.0 status before the end of the year. That event will not mark Wayland's world domination, however. Weston, the reference compositor, already runs on most video hardware, but the major desktop projects and distributions will each implement their own Wayland support in their existing compositors (e.g., Mutter or KWin), and that is when the majority of users will first encounter Wayland.

The more practical section of the talk followed, an explanation of how Wayland handles video content. An application allocates a pixel buffer and shares it with the compositor; the compositor then attaches the buffer to an output "surface." Whenever a new frame is drawn to the screen, the compositor sends a notification to the application, which can then send the next frame. The big difference is that Wayland always works with complete frames. In contrast, X is fundamentally a stream protocol: it sends a series of events that must be de-queued and processed.
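The attach-and-notify cycle described above can be modeled with a toy Python simulation. The class and method names here are invented for illustration and are not Wayland API; the toy also drops the buffer after showing it once, which is a simplification.

```python
from collections import deque


class ToyCompositor:
    """Stand-in for a compositor; none of these names are Wayland API."""

    def __init__(self):
        self.attached = None        # buffer currently attached to the surface
        self.on_frame_done = None   # notification the client asked for
        self.displayed = []         # complete frames that reached the screen

    def attach(self, buffer, on_frame_done):
        self.attached = buffer
        self.on_frame_done = on_frame_done

    def repaint(self):
        if self.attached is None:
            return  # nothing new to show in this toy model
        # The whole frame is shown at once; there is no partial drawing
        self.displayed.append(self.attached)
        callback, self.on_frame_done = self.on_frame_done, None
        self.attached = None
        if callback is not None:
            callback()  # tell the client it may send the next frame


class ToyVideoClient:
    def __init__(self, compositor, frames):
        self.compositor = compositor
        self.pending = deque(frames)

    def submit_next(self):
        # Share the next complete frame and ask to be notified when it is drawn
        if self.pending:
            self.compositor.attach(self.pending.popleft(), self.submit_next)


def play(frames, repaints):
    compositor = ToyCompositor()
    client = ToyVideoClient(compositor, frames)
    client.submit_next()
    for _ in range(repaints):
        compositor.repaint()
    return compositor.displayed
```

The client never produces frames faster than the compositor draws them, because each new frame is sent only in response to the previous frame's notification.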

Video support is really only a matter of extending the color spaces that Wayland understands, he said. A video buffer may contain YUV data, for example. Wayland needs to be able to put YUV data into a rendering surface, and to composite RGB and YUV data together (such as in a video overlay).

This is still a work-in-progress, with a variety of options under consideration. One would allow only RGB buffers, and require client applications to handle the conversion, which could be costly in CPU usage. Another is to decode the frames directly into OpenGL textures and let OpenGL worry about the conversions. A third is to allocate shared-memory YUV buffers, then require the compositor to copy them into OpenGL textures and perform the conversion at composite-time. The entire puzzle is further complicated when one adds in the possibility of hardware-decoded video content, which is increasingly common. If the possibilities sound a tad confusing, do not worry: Høgsberg said the project still finds it unclear which approach would be best.
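Whichever component ends up doing the conversion, the per-pixel arithmetic looks roughly like the Python sketch below. It assumes full-range BT.601 coefficients and 8-bit samples; the talk did not specify a matrix or range, and a real implementation would run in a shader or in hardware rather than in Python.

```python
def clamp_to_byte(value):
    """Round to the nearest integer and clamp into the 0..255 range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, cb, cr):
    """Convert one 8-bit YCbCr pixel to RGB (full-range BT.601, an assumption)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp_to_byte(r), clamp_to_byte(g), clamp_to_byte(b)
```

Doing this for every pixel of every frame on the CPU is the cost that the RGB-only option would push onto client applications.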

GStreamer's video acceleration API (VA-API) plugin already supports Wayland, so whichever path Wayland takes as it finalizes 1.0, GStreamer support should follow in short order. Of course, GStreamer itself is also preparing for its 1.0 release. But as the Wayland, ALSA, and Video4Linux talks demonstrate, multimedia support on Linux is in an ever-changing state.

Brief items

Quote of the week

I know everything is not perfect in Gnome land and my statements are pointing the good parts only but that's what we have to point out, right? Instead of pointing failures (they are always obvious), people need to point out what others are doing better than them so we can all improve. Whatever people might say about Gnome - as a developer I can only say that they are on the right path and sooner or later the technical excellence will pay off.

— Alexander Kurtakov

QEMU 1.2 released

Version 1.2 of the QEMU processor emulator has been released. "Even though this was the shortest release cycle in QEMU's history, it contains an impressive 1400 changesets from 180 unique authors." New features include support for LPAE (large physical address extensions) on the ARM Cortex A-15, a new ARM i.MX31 machine type, a way to produce ELF dumps of guest memory, support for PowerPC e5500 cores, better device tree support for PowerPC, and more. See the change log for all the details.

Qt5 beta released

The beta release of the Qt5 toolkit has been announced. "The Qt project aims to make developers’ life easier by enabling faster creation of great Qt apps and UIs on one or multiple targets. With Qt 5 we aim to make Qt better for addressing the latest UI paradigm shifts that i.e. touch screens and tablets require." See the announcement and the Qt5 feature list for details.

LilyPond 2.16.0 released

Version 2.16.0 of the LilyPond music typesetting system is out. New features include support for Kievan notation, a number of improved interfaces, and much more; see the new features page for details.

Twisted 12.2.0 released

Version 12.2 of the Twisted framework for Python is available. This release drops support for Python 2.5, provides a TCP endpoint that resolves IPv6 host addresses, and introduces several new compatibility features for integrating with other packages.

Cython 0.17 released

Version 0.17 of the Cython language has been released. Cython is a superset of Python that adds support for C functions and types. Among other things, this release "rounds up some rough edges of the compiler and adds (preliminary) support for CPython 3.3 and PyPy".

IcedTea-Web 1.3 released!

Version 1.3 of IcedTea-Web, the free software Java browser plugin from the IcedTea project, has been released. Highlighted features of this release include cookie-writing support, clarified security warnings, and better handling of applets that refer to missing classes.

Newsletters and articles

Development newsletters from the last week

Caml Weekly News (September 4)
What's cooking in git.git (August 29)
What's cooking in git.git (August 31)
What's cooking in git.git (September 4)
Haskell Weekly News (August 30)
Mozilla Hacks Weekly (August 30)
OpenStack Community Weekly Newsletter (August 31)
Perl Weekly (September 3)
PostgreSQL Weekly News (September 3)
Ruby Weekly (August 30)
Tahoe-LAFS Weekly (September 3)

Day: Taking GNOME 3 to the next level

Allan Day previews the upcoming GNOME 3.6 release. "I’m more excited about this release than any since 3.0. The list of major updates is impressive: new message tray, updated Activities Overview, lock screen, integrated input sources, accessibility on by default, new Nautilus. Then there are all the small changes: new style modal dialogs, bags of improvements to System Settings, a new Empathy buddy list, SkyDrive support, natural scrolling, new backgrounds, an overhauled Baobab… the list goes on and on."

The new Firefox command-line interface

The folks at Mozilla have concluded that what the Firefox browser really needs is a command-line interface. "The 'pagemod' command lets you quickly make some bulk changes to the page. If you’re looking at a page and there’s something flashing at you, you can nuke it using the 'pagemod remove element' command."

Page editor: Nathan Willis