The history, status, and future of audio for Linux systems were the topic of
two talks—coming at the theme from two different directions—at the
Linux Plumbers Conference (LPC). Ardour and JACK developer Paul Davis looked
at audio from mostly the
professional audio perspective, while PulseAudio developer Lennart Poettering, unsurprisingly,
discussed desktop audio. Davis's talk ranged over the full history of
Linux audio and gave a look at where he'd like to see things go, while
Poettering focused on the changes since last year's conference and "action
items" for the coming year.
Davis: origins and futures
Davis started using Linux in 1994, while working as the second employee at
Amazon, and started working on audio and MIDI software for Linux in 1998. So, he has
been working in Linux audio for more than ten years. His presentation was
meant to provide a historical overview on why "audio on linux still
sucks, even though I had my fingers in all the pies that make it
suck". In addition, Davis believes there are lessons to be learned from the
other two major desktop operating systems, Windows and Mac OS X, which
may help in getting to better Linux audio.
He outlined what kind of audio support is needed for Linux, or, really, any
operating system. Audio data should be able to be brought in or sent out
of the system via any available audio interface as well as via the
network. Audio data, as well as audio routing information, should be able
to be shared between applications, and that routing should be able to be
changed on the fly based on user requests or hardware reconfiguration.
There needs to be a "unified approach" to mixer controls, as
well. Most important, perhaps, is that the system needs to be
"easy to understand and to reason about".
Linux audio support began in the early 1990s with the Creative SoundBlaster
driver, which became the foundation for the Open Sound System (OSS). By
1998, Davis said, there was growing dissatisfaction with the design of OSS,
which led Jaroslav Kysela and others to begin work on the Advanced Linux
Sound Architecture (ALSA).
Between 1999 and 2001, ALSA was redesigned several times, each time
requiring audio applications to change because they would no longer
compile. The ALSA sequencer, a kernel-space MIDI router, was also added
during this time frame. By the end of 2001, ALSA was adopted as the
official Linux audio system
replacing OSS. But, OSS didn't disappear and
is still developed and used both on Linux and other UNIXes.
In the early parts of this decade, the Linux audio developer community
started discussing techniques for connecting audio applications together,
something that is not supported directly by ALSA. At roughly the same
time, Davis started working on the Ardour
digital audio workstation, which led to JACK. The audio handling engine from
Ardour was turned into JACK, which is an "audio connection kit" that works
on most operating systems. JACK is mostly concerned with low-latency
professional audio and music creation, rather than the needs of desktop users.
Since that time, the kernel has made strides in supporting realtime
scheduling that can be used by JACK and others to provide skip-free audio
performance, but much of that work is not available to users. Access to
realtime scheduling is tightly controlled, so there is a significant amount
of per-system configuration that must be done to access this functionality.
Most distributions do not provide a means for regular users to enable
realtime scheduling for audio applications, so most users are not
benefiting from those changes.
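That gating is easy to see from an application's point of view. Here is a minimal Python sketch (the helper name request_realtime is mine, not anything from the talk) of what an audio program has to attempt on Linux; without an rtprio limit configured for the user, the kernel simply refuses:

```python
import os

def request_realtime(priority=10):
    """Try to move the calling process to SCHED_FIFO realtime
    scheduling, as a JACK client would.  Returns True on success,
    False when the system refuses -- the common case for an
    unprivileged user on a stock distribution."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        # Typically EPERM: no rtprio rlimit has been granted.
        return False

print(request_realtime())
```

Whether this prints True or False depends entirely on per-system configuration (limits.conf entries, group membership, and so on), which is exactly the problem Davis described. (os.sched_setscheduler() is Linux-only.)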
In the mid-2000s, Poettering started work on the PulseAudio server, KDE
stopped using the aRts sound server, GStreamer emerged as a means for
intra-application audio streaming, and so on. Desktops wanted "simple"
audio access APIs and created things like Phonon and libsydney, but
meanwhile JACK was the only way to access Firewire audio. All of that led
to great confusion for Linux audio users, Davis said.
Audio application models
At the bottom, audio hardware works in a very simple manner. For record
(or capture), there is a circular buffer in memory to which the hardware
writes, and from which the software reads. Playback is just the reverse.
In both cases, user space can add buffering on top of the circular buffer
used by the hardware, which is useful for some purposes, and not for others.
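To make the capture path concrete, here is a toy Python model (the names are illustrative, not any real driver API): the "hardware" advances a write pointer around the ring while the application drains from a read pointer, both wrapping modulo the buffer size:

```python
class CaptureRing:
    """Toy model of the capture-side circular buffer."""
    def __init__(self, size):
        self.buf = [0] * size
        self.size = size
        self.wr = 0   # advanced by the hardware (interrupt side)
        self.rd = 0   # advanced by the reading application

    def hw_write(self, samples):
        # The hardware deposits captured samples at the write pointer.
        for s in samples:
            self.buf[self.wr % self.size] = s
            self.wr += 1

    def app_read(self, n):
        # The application may only read what the hardware has written.
        n = min(n, self.wr - self.rd)
        out = [self.buf[(self.rd + i) % self.size] for i in range(n)]
        self.rd += n
        return out
```

Playback simply swaps the roles: the application writes ahead of a hardware read pointer.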
There are two separate models that can be used between the software and the
hardware. In a "push" model, the application decides when to read or write
data and how much, while the "pull" model reverses that, requiring the
hardware to determine when and how much I/O needs to be done. Supporting a
push model requires buffering in the system to smooth over arbitrary
application behavior. The pull model requires an application that can meet
deadlines imposed by the hardware.
Davis maintains that supporting push functionality on top of pull is easy,
just by adding buffering and an API. But supporting pull on top of push is
difficult and tends to perform poorly. So, audio support needs to be based
on the pull model at the low levels, with a push-based API added on top.
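That layering argument can be sketched in a few lines of Python (a toy model, not any real audio API): the pull core demands exactly one period of data on its own schedule, and a push wrapper is little more than a FIFO plus silence on underrun:

```python
from collections import deque

class PullDevice:
    """Pull-model core: the 'hardware' calls back for each period."""
    def __init__(self, callback):
        self.callback = callback

    def run_period(self, nframes):
        # The application must deliver exactly nframes, on time.
        return self.callback(nframes)

class PushAdapter:
    """Push API layered on the pull core: the application writes
    whenever it likes, and the periodic callback drains the FIFO,
    padding with silence (zeros) on underrun."""
    def __init__(self):
        self.fifo = deque()
        self.device = PullDevice(self._pull)

    def write(self, samples):
        # Push side: any time, any amount.
        self.fifo.extend(samples)

    def _pull(self, nframes):
        # Pull side: exactly nframes, when the hardware asks.
        return [self.fifo.popleft() if self.fifo else 0
                for _ in range(nframes)]
```

Going the other way, faking a deadline-driven pull out of an application that pushes at its own pace, is where the extra buffering and poor performance come in.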
Audio and video have much in common
OSS is based around the standard POSIX system calls, such as
open(), read(), write(), mmap(), and so on.
ALSA (which supports those same calls) is generally accessed through
libasound, which has a
"huge set of functions". Those functions provide ways to
control hardware and software configuration along with a large number of
commands to support various application styles.
In many ways, audio is like video, Davis said. Both generate a
"human sensory experience" by rescanning a data buffer and
"rendering" it to the output device. There are differences as well, mostly
in refresh rates and the effect of missing refresh deadlines. Unlike
audio, video data doesn't change that frequently when someone is just
running a GUI—unless they are playing back a video. Missed video
deadlines are often imperceptible, which is generally not true for audio.
So, Davis asked, does anyone seriously propose that video/graphics
applications should talk to the hardware directly via open/read/write/etc.?
For graphics, that has been mediated by a server or server-like API for
many years. Audio should be the same way, even though some disagree,
"but they are wrong", he said.
The problem with UNIX
The standard UNIX methods of device handling, using open/read/write/etc.,
are not necessarily suitable interfaces for interacting with realtime
hardware. Davis noted that he has been using UNIX for 25 years and loves
it, but that the driver API lacks some important pieces for handling audio.
Both temporal and data format semantics are not part of that API, but are
necessary for handling that audio/video data. The standard interfaces can be
used, but don't promote a pull-based application design.
What is needed is a "server-esque architecture" and API that
can explicitly handle data format, routing, latency inquiries, and
synchronization. That server would mediate all device interaction, and
would live in user space. The API would not require that various services
be put into the kernel. Applications would have to stop believing that
they can and should directly control the hardware.
The OSS API must die
The OSS API requires that any services (like data format conversion, routing,
etc.) be implemented in the kernel. It also encourages applications to do
things that do not work well with other applications that are also trying
to do some kind of audio task. OSS applications are written such that they
believe they completely control the hardware.
Because of that, Davis was quite clear that the "OSS API must
die". He noted that Fedora no longer supports OSS and was hopeful that
other distributions would follow that lead.
When ALSA was adopted, there might have been an opportunity to get rid of
OSS, but, at the time, there were a number of reasons not to do that, Davis
said. Backward compatibility with OSS was felt to be important, and there
was concern that doing realtime processing in user space was not going to
be possible—which turned out to be wrong. He noted that even today
there is nothing stopping users or distributors from installing OSS, nor
anything stopping developers from writing OSS applications.
Looking at OS X and Windows audio
Apple took a completely different approach when they redesigned the audio
API for Mac OS X. Mac OS 9 had a "crude audio architecture"
that was completely replaced in OS X. No backward compatibility was
supported and developers were just told to rewrite their applications. So,
the CoreAudio component provides a single API that can support users on
the desktop as well as professional audio applications.
On the other side of the coin, Windows has had three separate audio
interfaces along the way. Each maintained backward compatibility at the
API level, so that application developers did not need to change their
code, though driver writers were required to. Windows has taken much
longer to get low latency audio than either Linux or Mac OS X.
The clear implication is that backward compatibility tends to slow things
down, which may not be a big surprise.
JACK and PulseAudio: are both needed?
JACK and PulseAudio currently serve different needs, but, according to
Davis, there is hope that there could be convergence between them down the
road. JACK is primarily concerned with low latency, while PulseAudio is
targeted at the desktop, where application compatibility and power
consumption are two of the highest priorities.
Both are certainly needed right now, as JACK conflicts with the
application design of many desktop applications, while PulseAudio is not
able to support professional audio applications. Even if an interface were
designed to handle all of the requirements that are currently filled by
JACK and PulseAudio, Davis wondered if there were a way to force the
adoption of a new API. Distributions dropping support for OSS may provide
the "stick" to move application developers away from that interface, but
could something similar be done for a new API in the future?
If not, there are some real questions about how to improve the Linux audio
infrastructure, Davis said. The continued existence of both JACK and
PulseAudio, along with supporting older APIs, just leads to
"continued confusion" about what the right way to do audio on
Linux really is. He believes a unified API is possible from a technical
perspective—Apple's CoreAudio is a good example—but it can only
happen with "political and social manipulation".
Poettering: The state of Linux audio
The focus of Poettering's talk was desktop audio, rather than embedded or
professional audio applications. He started by looking at what had changed
since last year's LPC, noting that EsounD and OSS were officially gone
("RIP"), at least in Fedora. OSS can still be enabled in
Fedora, but it was a "great achievement" to have it removed.
Bugs were reported against only three applications because
of the OSS removal, VMware and quake2 amongst them. He said that there
"weren't many complaints", but an audience member noted the
"12,000 screaming users" of VMware as a significant problem.
Poettering shrugged that off, saying that he encouraged other distributions
to follow suit.
Confusion at last year's LPC led him to create the "Linux Audio API
Guide", which has helped clarify the situation, though there were
complaints about what he said about KDE and OSS.
Coming in Fedora 12, and in other distributions at "roughly the same
time", is the use of realtime scheduling by default on the desktop for
audio applications. There is a new mechanism to hand out realtime priority
(RealtimeKit) that will
prevent buggy or malicious applications from monopolizing the
CPU—essentially causing a denial of service. The desktop now makes use of the
high-resolution timers, because they "really needed to get better
than 1/HZ resolution" for audio applications.
Support for buffers of up to 2 seconds has been added. ALSA used to
restrict the buffer size to 64K, which equates to
roughly 370ms of CD quality
audio. Allowing bigger buffers is "the best thing you can do for
power consumption" as well as dropouts, he said.
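The quoted figure is easy to check: at CD quality (44100 frames per second, two channels, two bytes per sample), the old 64K cap works out to a bit over 370ms:

```python
bytes_per_second = 44100 * 2 * 2   # CD quality: frames * channels * bytes
old_limit_bytes = 64 * 1024        # the former ALSA buffer cap
ms = old_limit_bytes / bytes_per_second * 1000
print(round(ms))                   # about 372ms of buffered audio
```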
Several things were moved into the audio server, including timer-based
audio scheduling which allows the server to "make decisions with
respect to latency and interrupt rates". A new mixer abstraction
was added, even though there are four existing already in ALSA. Those were
very hardware specific, Poettering said, while the new one is a very basic
abstraction.
Audio hardware has acquired udev integration over the last year,
and there is now "Bluetooth audio that actually works".
Poettering also noted that audio often didn't work "out of the box" because
there was no mixer information available for the hardware. Since last
year, an ALSA mixer initialization database has been created and populated:
"It's pretty complete", he said.
Challenges for the next year
There were a number of issues with the current sound drivers that
Poettering listed as needing attention in the coming year. Currently,
for power saving purposes, PulseAudio shuts down devices two seconds after
they become idle. That can
lead to problems with drivers that make noise when they are opened or
closed.
In addition, there are areas where the drivers do not report correct
information to the system. Decibel range of the device is one of those,
along with the device strings that are either broken or missing in many
drivers, which makes it difficult to automatically discover the hardware.
The various mixer element names are often wrong as well; in the past it
"usually didn't matter much", but it is becoming increasingly
important for those elements to be consistently named by drivers.
Some drivers are missing from the mixer initialization database, which
should be fixed as well.
The negotiation logic for sample rates, data formats, and so on is not
standardized. The order in which those parameters are changed can be
interpreted differently by each driver, which leads to problems at the
higher levels, he said. There are also problems with timing for
synchronization between audio and video that need to be addressed.
Poettering also had a whole slew of changes that need to be made to the
ALSA API so that PulseAudio (and others) can get more information about the
hardware. Things like the routing and mixer element mappings as well as
jack status (and any re-routing that is done on jack insertion) and data
transfer parameters such as the timing and the granularity of transfers.
Many of the current assumptions are based on consumer-grade hardware which
doesn't work for professional or embedded hardware, he said. It would be
"great if ALSA could give us a hint how stuff is connected".
There is also a need to synchronize multiple PCM clocks within a device,
along with adding atomic mixer updates that sync to the PCM clock.
Latency control, better channel mapping, atomic status updates, and HDMI
negotiation are all on his list as well.
Further out, there are a number of additional problems to be solved.
Codec pass-through—sending unaltered codec data, such as SPDIF, HDMI,
or A2DP, to the
device—is "very messy" and no one has figured out how to
handle synchronization issues with that. There is a need for a simpler,
higher-level PCM API, Poettering said, so that applications can use the
pull model, rather than being forced into the push model.
Another area that needs work is handling 20 second buffering. There are a
whole new set of problems that come with that change. As an example,
Poettering pointed out the problems that can occur if the user changes some
setting after that much audio data has been buffered. There need to be
ways to revoke the data that has been buffered or there will be up to 20
second lags between user action and changes to the audio.
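A toy Python model (my own sketch, not PulseAudio's actual implementation) shows the idea of revoking, or "rewinding", the unplayed tail of a deep buffer so that a settings change need not wait behind seconds of queued audio:

```python
class RewindableBuffer:
    """Toy model of server-side rewind: audio is queued far ahead,
    but the unplayed tail can be dropped when the user changes a
    setting, so the change takes effect immediately."""
    def __init__(self):
        self.data = []
        self.play = 0                 # index of the next frame to play

    def queue(self, samples):
        self.data.extend(samples)

    def play_frames(self, n):
        out = self.data[self.play:self.play + n]
        self.play += len(out)
        return out

    def rewind(self):
        """Drop everything queued but not yet played; return the
        number of frames revoked."""
        dropped = len(self.data) - self.play
        del self.data[self.play:]
        return dropped
```

After a rewind, the application re-renders the audio with the new settings and queues it again, instead of waiting for up to 20 seconds of stale data to drain.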
Both presentations gave a clear sense that things are getting better
in the Linux audio space, though perhaps not with the speed that users
would like to see. Progress has clearly been made and there is a roadmap
for the near future. Whether Davis's vision of a unified API for Linux
audio can be realized remains to be seen, but there are lots of smart
hackers working on Linux audio. Sooner or later, the "one true Linux audio
API" may come to pass.