By Nathan Willis
September 6, 2012
GStreamer is a framework designed for application development, but the
memory and processing demands of multimedia mean that it leans heavily
on the support of the operating system's underlying media layers. At
the 2012 GStreamer Conference, representatives from Video4Linux, ALSA,
and Wayland were on hand to report on recent developments and ongoing
work in the world of Linux media capture, sound, and display
technology.
Video4Linux
Hans Verkuil presented a session on the Video4Linux (V4L) subsystem,
which primarily handles video input, along with related matters. The
major change in the V4L arena, he said, has been the emergence of the
system-on-chip (SoC). In the desktop paradigm of years past, V4L had
relatively simple hardware to deal with: video capture cards and
webcams, the majority of which had similar capabilities. SoCs are
markedly different; many include discrete components like hardware
decoders and video scalers, and the system provides a flexible AV
pipeline, with multiple ways to route video through the on-board
components depending on the processing needed.
Initially most SoC vendors wrote their own, proprietary modules to
make up for the features V4L lacked, he said, but V4L has caught up.
The core framework now includes a v4l2_subdev structure to
communicate with sub-devices like decoders and scalers. Although
these devices can vary from board to board in theory, he said, in
practice most vendors tend to stick with the same parts over many
hardware generations. There is also a new Media Controller API for
managing multi-function devices (such as USB webcams with an
integrated microphone, in addition to the flexible SoC routing
mentioned above), and the 3.1 kernel introduced a new control
framework that provides a consistent interface for brightness,
contrast, frame rate, and other settings.
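The control framework itself lives inside the kernel, but from user
space it shows up as consistent behavior of the long-standing control
ioctls. A minimal sketch (the device path is an example and error
handling is abbreviated) might look like this:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        int fd = open("/dev/video0", O_RDWR);   /* example device node */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Query the control's range and default before setting it. */
        struct v4l2_queryctrl qc = { .id = V4L2_CID_BRIGHTNESS };
        if (ioctl(fd, VIDIOC_QUERYCTRL, &qc) == 0) {
            struct v4l2_control ctrl = {
                .id    = V4L2_CID_BRIGHTNESS,
                .value = qc.default_value,
            };
            ioctl(fd, VIDIOC_S_CTRL, &ctrl);    /* set brightness */
        }

        close(fd);
        return 0;
    }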
V4L's roots were in the standard-definition era, so the project has
also struggled to make life easier for HDTV users. The initial
attempt was the Presets API in kernel 2.6.33, which provided fixed
settings for video in a handful of HDTV formats (720p30, 1080p60,
etc.). That API eventually proved too coarse for vendors, and was
replaced in kernel 3.5 with the Timings API, which allows custom
modeline-like video settings. The Event API is another recent
addition, significantly improved in 3.1, which allows code to
subscribe to immediate notification of events like the connection or
disconnection of an input port.
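The event interface works by subscription: an application subscribes
with an ioctl, then waits for an exceptional condition on the device
file descriptor. The sketch below uses a control-change event purely
as an illustration; connector events follow the same pattern, and the
device path is an example:

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        int fd = open("/dev/video0", O_RDWR);
        if (fd < 0)
            return 1;

        struct v4l2_event_subscription sub = {
            .type = V4L2_EVENT_CTRL,
            .id   = V4L2_CID_BRIGHTNESS,
        };
        if (ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub) < 0) {
            perror("VIDIOC_SUBSCRIBE_EVENT");
            return 1;
        }

        /* Events are signaled as exceptional conditions on the fd. */
        struct pollfd pfd = { .fd = fd, .events = POLLPRI };
        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLPRI)) {
            struct v4l2_event ev;
            if (ioctl(fd, VIDIOC_DQEVENT, &ev) == 0)
                printf("event type %u, sequence %u\n",
                       ev.type, ev.sequence);
        }
        return 0;
    }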
The videobuf2 framework, which provides an abstraction layer between
applications and video device drivers, is another major overhaul; the
previous incarnation did not conform to V4L's own API, and its memory
management was so flawed that most drivers did not even use it. The
new framework separates
buffer operations from memory management operations, and by removing
the need for each driver to implement its own memory management,
should simplify device driver code significantly.
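From the application's point of view, the buffers that videobuf2
manages on the driver side are still reached through the standard V4L2
streaming ioctls. A stripped-down capture sequence (error handling
omitted, device path illustrative) looks roughly like this:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        int fd = open("/dev/video0", O_RDWR);

        /* Ask the driver to allocate buffers it manages internally. */
        struct v4l2_requestbuffers req = {
            .count  = 4,
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .memory = V4L2_MEMORY_MMAP,
        };
        ioctl(fd, VIDIOC_REQBUFS, &req);

        /* Map each buffer and queue it for capture. */
        for (unsigned i = 0; i < req.count; i++) {
            struct v4l2_buffer buf = {
                .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
                .memory = V4L2_MEMORY_MMAP,
                .index  = i,
            };
            ioctl(fd, VIDIOC_QUERYBUF, &buf);
            void *mem = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, buf.m.offset);
            (void)mem;                  /* captured frames appear here */
            ioctl(fd, VIDIOC_QBUF, &buf);
        }

        enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        ioctl(fd, VIDIOC_STREAMON, &type);

        /* Dequeue one filled buffer, then hand it back to the driver. */
        struct v4l2_buffer buf = {
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .memory = V4L2_MEMORY_MMAP,
        };
        ioctl(fd, VIDIOC_DQBUF, &buf);
        ioctl(fd, VIDIOC_QBUF, &buf);

        ioctl(fd, VIDIOC_STREAMOFF, &type);
        return 0;
    }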
Other noteworthy changes include support for the H.264 codec, new
input cropping controls, and the long-awaited ability for radio tuners
to tune multiple frequency bands (such as FM and AM). Radio Data
System (RDS) support has also been upgraded, and now includes
Traffic Message Channel (TMC) coding used in many urban areas. Cisco
hired a student for the summer to write a new RDS library to
replace the older, broken one. Finally, a contiguous memory allocator was written
by Samsung and others for kernel 3.5; it helps video devices allocate
the large chunks of physically contiguous memory they need for direct
memory access.
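The multi-band tuning support is exposed through a band-enumeration
ioctl; a hedged sketch (the radio device node is an example, and
frequencies are reported in the tuner's own units):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    int main(void)
    {
        int fd = open("/dev/radio0", O_RDWR);
        if (fd < 0)
            return 1;

        /* Walk the band indexes until the driver reports no more. */
        for (unsigned i = 0; ; i++) {
            struct v4l2_frequency_band band = {
                .tuner = 0,
                .type  = V4L2_TUNER_RADIO,
                .index = i,
            };
            if (ioctl(fd, VIDIOC_ENUM_FREQ_BANDS, &band) < 0)
                break;
            printf("band %u: %u - %u (tuner frequency units)\n",
                   band.index, band.rangelow, band.rangehigh);
        }
        return 0;
    }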
There is further work still in the pipeline, of course, and Verkuil
mentioned three topics of importance to GStreamer. The first is
buffer sharing; video decoding
pipelines would prefer to avoid copying large buffers whenever
possible, but currently V4L's video buffers are specific to an
individual video node. Integrating V4L with DMAbuf is probably the
solution, he said, and is likely to arrive in kernel 3.8. The second
is better support for newer video connector types like HDMI and
DisplayPort — in particular hot-pluggability and signal
detection, for use by embedded devices that need to set up these
connections without user intervention. Finally, he hopes to complete
a V4L compliance testing tool, which he describes as 90% finished.
The tool is used to test device drivers against the API, and drivers
are required to pass its tests before they get into the kernel.
Verkuil said that the tool is actually stricter than the published
API, because it checks for a number of optional features that are
easy to implement but can annoy users if they are left out.
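On the buffer-sharing front, the direction under discussion is an
"export" ioctl that turns a driver-allocated buffer into a DMABUF file
descriptor that another device can import. A hedged sketch of that
proposed VIDIOC_EXPBUF interface (not yet in a released kernel at the
time of the talk):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* Export buffer "index" of an already-configured capture queue as
     * a DMABUF fd; returns the fd, or -1 on failure. */
    static int export_capture_buffer(int video_fd, unsigned int index)
    {
        struct v4l2_exportbuffer exp = {
            .type  = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .index = index,
            .flags = O_CLOEXEC,
        };

        if (ioctl(video_fd, VIDIOC_EXPBUF, &exp) < 0)
            return -1;

        return exp.fd;   /* hand this fd to the GPU or display driver */
    }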
ALSA
Takashi Iwai presented an update on the ALSA subsystem. In recent
years, ALSA has not seen as many major changes as the various video
subsystems have, but there are still plenty of challenges. The first
is that, like video, more and more audio devices now support decoding
compressed audio in hardware. Kernel 3.3 added an API for offloading
audio decoding to such hardware, though the bigger improvement is
likely to be kernel 3.7's merging of compressed audio decoding support
into the ALSA System on Chip (ASoC) layer.
ASoC accounts for the majority of ALSA code (both in terms of lines
and number of commits), Iwai said, followed by the HD-audio layer used
in the majority of modern laptops. The third-largest component is
USB-audio, which provides a single generic driver used by all USB
audio devices. But while USB devices can share a common driver, the
HD-audio layer covers roughly 4000 devices, each of which has a
different configuration (in regard to which pin performs which
function). It is not possible for the ALSA project to maintain and
update 4000 separate configuration files, he said, so it instead
relies on user reports to discover differences between devices.
That is a pain point, but most of the time hardware vendors use a
consistent pin configuration, so most devices work out of the box.
Ongoing work in ALSA includes the Use Case Manager (UCM) abstraction
layer, a high-level device management layer that describes hardware
routing and configuration for common tasks like "phone call" or "music
playback." Jack detection is another continuing development.
Currently there is no single, standard API for detecting whether or
not a connector has a jack plugged in, so multiple methods are in use,
including Android's external connector class (extcon) and ALSA's
general controls API.
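To give a flavor of what UCM provides, here is a rough sketch of how a
sound server or application might drive it through alsa-lib's
use-case API; the card name, verb, and device below are illustrative,
not taken from the talk:

    #include <alsa/asoundlib.h>
    #include <alsa/use-case.h>

    int main(void)
    {
        snd_use_case_mgr_t *ucm;

        /* Open the use-case configuration for a given card. */
        if (snd_use_case_mgr_open(&ucm, "PandaBoard") < 0)
            return 1;

        /* Switch the hardware routing to a high-level use case. */
        snd_use_case_set(ucm, "_verb", "HiFi");       /* music playback */
        snd_use_case_set(ucm, "_enadev", "Speaker");  /* enable a device */

        snd_use_case_mgr_close(ucm);
        return 0;
    }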
Also still in the works is improved power management, both for
HD-audio devices and for hardware decoders. Improvements are expected
to land with kernel 3.7. HD-audio devices might also benefit from the
ability to "patch" device firmware and change the pin configuration,
so that recompiling the driver can be avoided.
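That "patch" mechanism takes the form of a small firmware-style text
file loaded through the snd-hda-intel driver's patch= module option; a
hedged example in which all IDs and values are placeholders:

    [codec]
    0x10ec0262 0x104381e1 0

    [pincfg]
    0x14 0x99130110
    0x15 0x411111f0

The [codec] line identifies the codec by vendor ID, subsystem ID, and
address, and each [pincfg] line overrides one pin's default
configuration, all without rebuilding the driver.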
The biggest outstanding issue at present is a channel mapping API,
which encodes the surround-sound position associated with the speaker
attached to each output channel (e.g., Front Left, Center, Right Rear,
Low-Frequency Effects). Each channel needs to receive its own PCM audio
stream, but there are multiple standards on the market, and the
problem becomes even trickier when the system needs to combine
channels for a setup with fewer speakers. There is a proposal in the
works, which was discussed at length later in the week at the Linux
Plumbers Conference audio mini-summit.
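For reference, the shape this proposal later took in alsa-lib is a
channel-map query/set API; the sketch below is hedged accordingly (it
postdates the talk, and opening the PCM device is omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <alsa/asoundlib.h>

    /* Print the speaker position assigned to each channel of an
     * already-opened PCM device. */
    static void print_channel_map(snd_pcm_t *pcm)
    {
        snd_pcm_chmap_t *map = snd_pcm_get_chmap(pcm);
        if (!map)
            return;     /* the driver does not report a channel map */

        for (unsigned int i = 0; i < map->channels; i++)
            printf("channel %u -> %s\n", i,
                   snd_pcm_chmap_name(
                       (enum snd_pcm_chmap_position)map->pos[i]));

        free(map);
    }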
Wayland
Kristian Høgsberg presented an update on the Wayland display protocol
and how it will differ from X. The session was not overly
GStreamer-specific, but more of an introduction to Wayland. Since
Wayland is not being used in the wild yet, preparing GStreamer
developers in advance should simplify the eventual transition.
Høgsberg related the reasons for Wayland's creation — namely
that as separate window managers and compositors have become the norm
on Linux desktops, the X server itself is increasingly doing little
but acting as a middleman. Many of the earlier functions of the X
server have been moved out into separate libraries, such as FreeType,
Fontconfig, Qt, and GTK+. Other key functions, such as mode setting
and input device handling, are performed at lower levels, and many applications
use Cairo or OpenGL to paint their window contents. Compositing was
the final blow, however: in a compositing desktop, each window gets a
private buffer of its own, which is drawn to the screen by the
compositor. In this situation, X does nothing but add cost: another
copy operation for the buffer, and more memory.
He described the basics of the Wayland protocol, which he said he
expected to reach 1.0 status before the end of the year. That event
will not mark Wayland's world domination, however. Weston, the
reference compositor, already runs on most video hardware, but the
major desktop projects and distributions will each implement their own
Wayland support in their existing compositors (e.g., Mutter or KWin),
and that is when the majority of users will first encounter Wayland.
The more practical section of the talk followed: an explanation of how
Wayland handles video content. An application allocates a pixel
buffer and shares it with the compositor; the buffer is then attached
to an output "surface." Whenever a new frame is drawn to
the screen, the compositor sends a notification to the application,
which can then send the next frame. The big difference is that Wayland
always works with complete frames. In contrast, X is fundamentally a
stream protocol: it sends a series of events that must be de-queued
and processed.
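In libwayland-client terms, that handshake boils down to attaching a
buffer, requesting a frame callback, and committing the surface. The
sketch below omits setting up the connection and the buffer itself,
and the final comment marks where the application's own rendering
would go:

    #include <wayland-client.h>

    static void frame_done(void *data, struct wl_callback *cb,
                           uint32_t time);

    static const struct wl_callback_listener frame_listener = {
        .done = frame_done,
    };

    /* Submit a frame and ask to be told when the compositor is ready
     * for the next one. */
    static void submit_frame(struct wl_surface *surface,
                             struct wl_buffer *buffer,
                             int width, int height)
    {
        wl_surface_attach(surface, buffer, 0, 0);
        wl_surface_damage(surface, 0, 0, width, height);

        struct wl_callback *cb = wl_surface_frame(surface);
        wl_callback_add_listener(cb, &frame_listener, surface);

        wl_surface_commit(surface);
    }

    static void frame_done(void *data, struct wl_callback *cb,
                           uint32_t time)
    {
        wl_callback_destroy(cb);
        /* The compositor has consumed the previous frame; render and
         * submit the next one here. */
    }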
Video support is really only a matter of extending the color spaces
that Wayland understands, he said. A video buffer may contain YUV data, for example.
Wayland needs to be able to put YUV data into a rendering surface, and
to composite RGB and YUV data together (such as in a video overlay).
This is still a work-in-progress, with a variety of options under
consideration. One would allow only RGB buffers, and require client
applications to handle the conversion, which could be costly in CPU
usage. Another is to decode the frames directly into OpenGL textures
and let OpenGL worry about the conversions. A third is to allocate
shared-memory YUV buffers, then require the compositor to copy them
into OpenGL textures and perform the conversion at composite time. The
entire puzzle is further complicated when one adds in the possibility
of hardware-decoded video content, which is increasingly common. If
the possibilities sound a tad confusing, do not worry: Høgsberg
said that the project itself is not yet sure which approach is best.
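To see why the client-side option worries people, consider the
per-pixel arithmetic it implies: a naive BT.601 YUV-to-RGB step like
the sketch below has to run over every pixel of every frame on the
CPU, which is exactly the work the other options push onto the GPU.

    #include <stdint.h>

    static uint8_t clamp8(int v)
    {
        return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
    }

    /* Convert one pixel; u and v are the chroma samples that apply to
     * this pixel (full-range BT.601 coefficients). */
    static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                           uint8_t *r, uint8_t *g, uint8_t *b)
    {
        int c = y, d = u - 128, e = v - 128;

        *r = clamp8(c + (int)(1.402 * e));
        *g = clamp8(c - (int)(0.344 * d) - (int)(0.714 * e));
        *b = clamp8(c + (int)(1.772 * d));
    }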
GStreamer's video acceleration API (VA-API) plugin already supports
Wayland, so whichever path Wayland takes as it finalizes 1.0,
GStreamer support should follow in short order. Of course, GStreamer
itself is also preparing for its 1.0 release. But as the Wayland, ALSA,
and Video4Linux talks demonstrate, multimedia support on Linux is in
an ever-changing state.