An Introduction to Linux Audio (O'Reilly)

John Littler discusses the writing of Linux audio software on O'Reilly. "Now, let's have a look at what we're trying to do and the main options available for doing it. The three main things to do are capturing (recording) audio, replaying it, and altering it. All of this comes under the heading of Digital Signal Processing (DSP). We'll be looking at the first two options: capturing and replaying. What we want to do is talk to the sound card in the computer, tell it what to do, what sort of arrangement the data should have (bearing in mind the card's capabilities), and then store it somewhere."

An Introduction to Linux Audio (O'Reilly)

Posted Aug 2, 2007 22:41 UTC (Thu) by allesfresser (subscriber, #216) [Link]

I'd just like to say that I found this article very refreshing. Unmitigated and unapologetic technical discussion, with C code examples, very nice. Makes me nostalgic for the old days...

An Introduction to Linux Audio (O'Reilly)

Posted Aug 2, 2007 23:02 UTC (Thu) by wertigon (guest, #42963) [Link]

I just find it interesting how complicated all the Linux and/or UNIX sound stuff has become.

As a person who recently gave a shot at this myself, when I asked why everything is in the sorry mess it appears to be, I got directed to a page which is quite an eye-opener.

And you know the saddest part? The thing that sparked this finally got fixed and again released as Open Source. Linux sound has a long way to go yet unfortunately. :(

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 0:31 UTC (Fri) by rfunk (subscriber, #4054) [Link]

I followed your link to the page that opened your eyes, and while its basic
history was decent, after that I found it to be one of the more ill-informed rants
I've read in a while. (Even ignoring the all-too-common confusion of the
words "deprecated" and "depreciated".)

Too bad. I was hoping to have my eyes opened too.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 11:26 UTC (Fri) by ken (subscriber, #625) [Link]

Yes, it was a rant, and I do not agree with a lot of what he said, but the fact is that sound is a mess. I did not know how big a mess until I tried to do some sound work myself about two years ago and got seriously confused about what to use. In the end I selected ALSA with libasound, but that was not based on any evaluation: it was supposed to be the new and better way to do it, and since it was a "realtime" project I did not want to put more layering than absolutely necessary in the sound path. It's newer, that's for sure, but better? That project is in need of someone to write a coherent document on how to use it.

As a user with a Sound Blaster (emu10k) based card, I have an alsamixer covering two or three screen widths with options, most doing nothing at all as far as I know, and I never know which knob to change.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 14:13 UTC (Fri) by rfunk (subscriber, #4054) [Link]

Seems to me that your issues are addressed in the article. Be sure to read all
the way through to the JACK part. (Despite being an additional layer, JACK is
specifically intended for realtime audio.)

BTW, the emu10k, which I also have, is actually a rather high-capability chip.
Some people like to take advantage of all those capabilities, and ALSA allows
that while OSS doesn't. Don't blame ALSA for the fact that your sound chip has
more capabilities than you need.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 14:15 UTC (Fri) by rfunk (subscriber, #4054) [Link]

Quoting myself:
"Don't blame ALSA for the fact that your sound chip has more capabilities than
you need."

... or that alsamixer doesn't have a very good/smart user interface.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 23:52 UTC (Fri) by drag (subscriber, #31333) [Link]

Alsamixer is fine.

It's a low-level way to interact directly with the mixer settings present on your sound card.

The problem is that it's currently the _best_ way to do the mixer settings. There needs to be a higher level way of interacting with it. A standardized something-or-other that effectively abstracts away the differences in audio cards.

A generic way to interact with audio cards can never effectively suit the needs for advanced users, hence the need for alsamixer and the need for something that accurately reflects the capabilities of the hardware.

Our problem is that there isn't anything for people that don't give a crap about the capabilities of their audio card and just want to make recordings.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 4, 2007 8:01 UTC (Sat) by bersl2 (guest, #34928) [Link]

> The problem is that it's currently the _best_ way to do the mixer settings. There needs to be a higher level way of interacting with it. A standardized something-or-other that effectively abstracts away the differences in audio cards.

HAL, perhaps?

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 16:27 UTC (Fri) by johnkarp (subscriber, #39285) [Link]

I don't think you can rightfully blame ALSA for the proliferation of other
APIs. They generally solve problems that neither ALSA nor OSS can solve
directly. For example:

Portability to Windows & Mac: SDL, Phonon, gstreamer
A/V [de]compression and [de]muxing: gstreamer
Networking: nas
Low-latency interprocess routing: Jack
Interprocess mixing: esd, aRts

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 18:44 UTC (Fri) by rfunk (subscriber, #4054) [Link]

ALSA can solve the interprocess mixing problem (with dmix, unfortunately not enabled by default), but esd and arts came about before ALSA was common.
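
For readers unfamiliar with what "interprocess mixing" amounts to: at its core, a software mixer sums the samples of each stream and clamps the result to the range of the sample format. A minimal sketch in Python, assuming signed 16-bit PCM at a common sample rate. The name `mix_s16` is made up for illustration, and this is not how dmix itself is implemented:

```python
S16_MIN, S16_MAX = -32768, 32767


def mix_s16(*streams):
    """Mix equal-rate signed 16-bit PCM streams by summing and clamping."""
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        # Streams that have already ended contribute silence (0).
        total = sum(s[i] for s in streams if i < len(s))
        # Clamp instead of letting the sum wrap around, which would sound far worse.
        mixed.append(max(S16_MIN, min(S16_MAX, total)))
    return mixed


# Hypothetical sample data: two applications playing at once.
music = [1000, 20000, -30000, 500]
beep = [500, 20000, -10000]
print(mix_s16(music, beep))
```

The hard part in practice is not the arithmetic but getting several processes to share one buffer in time with the hardware, which is where the different sound servers make different trade-offs.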

I'm still not quite clear on why Jack should be necessary, but since I haven't needed it I'm not in a position to know.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 19:47 UTC (Fri) by drag (subscriber, #31333) [Link]

Actually nowadays dmix is enabled by default. There is no longer any need for a typical end user to muck around with their asoundrc.

Generally, the only time Linux users are forced to deal with Alsa now is when faced with Artsd (which is easy, since it can be configured to use Alsa) or with OSS-only applications, which typically can't use dmix.

You're behind the times by a year or so. :)

Arts is effectively dead. But Pulseaudio has replaced ESD and is API/ABI compatible while being much superior. (esd has always sounded like utter crap..)

If you have a need for networked sound I strongly suggest looking at pulseaudio. It supports all sorts of fancy features... For example, Zeroconf, so that the services will automatically be declared over a network and be picked up by things like Avahi or other pulseaudio software. Also, it can be set up to follow X. So if you're accessing remote X and there is sound involved, then you can use pulseaudio to automatically follow it.

Also there is a 'pulse' plugin for Alsa. This way anything that supports Alsa can be used over the network via pulseaudio.

Jack is very nice for audio work.

Its special feature is that it is low-latency. This makes it suitable for realtime audio editing with multiple audio streams and multiple applications.

This gives Linux a unique advantage: applications that support Jack can route and re-route audio and MIDI between one another. Software synths, recording applications, MIDI controllers, LADSPA plugins, etc. It'll work with internal and external MIDI stuff and allow you to easily control the I/O on your sound card, both digital and analog, for setting up things like monitoring audio channels and playback/record stuff.

You can connect all these things to create very sophisticated super-applications. Sort of like you can string awk, sed, cat, and other things together to make sophisticated text handling applications.

In comparison, with a typical proprietary Windows or OS X setup you're depending on one company or another's flagship product, and your options are much more limited.

Combined with special low-latency patches like Ingo Molnar's realtime-preempt, this allows a Linux PC to work as a REAL audio workstation and be suitable for studio, DJ, and live audio work. Whatever you want. There are currently people selling Linux-based audio workstations for professional/studio-level work.

I think that the difference between audio work and desktop or network audio is big enough that there is room for Jack, Alsa, and Pulseaudio to exist beside each other.

Also keep in mind that it's possible to get Jack and Pulseaudio to work together. There are also Jack and Pulse plugins for Alsa applications to use.

For arranging your applications to work through Pulseaudio take a look at:
http://www.pulseaudio.org/wiki/PerfectSetup

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 20:01 UTC (Fri) by rfunk (subscriber, #4054) [Link]

Ah, thanks for updating me! :-) Maybe I can delete my .asoundrc then.

But despite its lack of continued development, arts won't be entirely dead until
KDE 3 is dead. Also, my understanding of dmix is that it does work with ALSA's
OSS-compat mode.

I've used arts for networked sound in the past, but now I'll have to look into
pulse.

Last time I looked at Jack (on Debian) it wanted to rip out a bunch of my
sound-related infrastructure, which didn't give me a good feeling. I hope that's
been fixed somehow. (It was probably more of a Debian library issue of the
sort that makes people jump to Gentoo.)

An Introduction to Linux Audio (O'Reilly)

Posted Aug 4, 2007 0:05 UTC (Sat) by drag (subscriber, #31333) [Link]

I don't have any problems with jack nowadays, it should play nicely with everything else.

But keep in mind that it's for audio work. You can't use it with Dmix because dmix adds latency between jack and the sound card. It's not a very convenient or desktop-friendly way of doing things.

> But despite its lack of continued development, arts won't be entirely dead until KDE 3 is dead.

Yep. It's pretty much impossible for a normal user to use any KDE application and NOT use Artsd. The KDELibs will start artsd up automatically when you launch applications. But it's not a difficult problem to deal with, since artsd can be set to use Alsa and that solves most of the problems people have with it.

> Also, my understanding of dmix is that it does work with ALSA's OSS-compat mode.

Sorta.

Quake3-style games, for example, require 'mmap' support in the audio card for this to work. Some cards simply WILL NOT work with these games and OSS, and although Alsa supports mmap emulation I can't seem to get it to work.
(although with artsdsp -m they will work.. so you can go artsdsp --> artsd ---> dmix ---> alsa, though with the penalty of pretty high latency)

Also, for normal OSS applications with dmix/Alsa, you need to start the applications with the aoss wrapper. Using this they can work with dmix and play well with everybody else.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 19:55 UTC (Fri) by johnkarp (subscriber, #39285) [Link]

Say you are using a computer to do audio recording in a studio. With Jack,
you could route inputs from various audio interfaces into various effects
plugins, send some to a multitrack recorder, and also combine them all
into a mix for the musicians, so they can play off the effects. During the
recording session, you can easily reroute the signals from a central
patchbay.

You could probably hack together a workalike with other Linux audio
technologies, but it would be unusable because of the latencies. (It's
difficult to play an instrument well when you can only hear what you were
playing 0.3 seconds ago.)
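
To put a rough number on that: the delay added by an audio buffer is its length in frames divided by the sample rate. A minimal sketch in Python, where the period sizes, period counts, and sample rates are illustrative assumptions and not measurements of any particular setup:

```python
def buffer_latency_ms(frames_per_period: int, periods: int, sample_rate_hz: int) -> float:
    """Worst-case delay contributed by one buffer, in milliseconds."""
    total_frames = frames_per_period * periods
    return 1000.0 * total_frames / sample_rate_hz


# Hypothetical low-latency setting: 2 periods of 64 frames at 48 kHz.
low = buffer_latency_ms(64, 2, 48000)

# Hypothetical generous desktop setting: 4 periods of 4096 frames at 44.1 kHz.
high = buffer_latency_ms(4096, 4, 44100)

print(f"{low:.2f} ms vs {high:.1f} ms")
```

Every layer that adds its own buffer adds its own delay on top, so stacking several sound servers can push the total well past what a musician can play against.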

I don't think Jack will become the standard general-purpose audio API
though, because each user on a system requires a separate Jack daemon that
has to run with realtime privileges.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 3, 2007 20:14 UTC (Fri) by rfunk (subscriber, #4054) [Link]

My Jack confusion is/was mostly about why ALSA couldn't do it all internally.
But I think the part about multiple audio interfaces is what I was missing.

An Introduction to Linux Audio (O'Reilly)

Posted Aug 17, 2007 21:07 UTC (Fri) by ehowland (guest, #46882) [Link]

I read those articles and am quite interested that OSS is now GPL (or CDDL depending on your operating system) as noted here before

The key question is whether anyone will care and if the trend toward ALSA will slow or stop.

Hannu's blog (http://www.4front-tech.com/hannublog/) suggests that the version of OSS that is now in the kernel is not very good, and that in the 10 years since they closed it they have really improved both the API and the quality of the code. Will OSS come back with the new API or continue to fade? I cannot comment on developers' preferences as I have not done any sound development, but if this means that OSS is no longer deprecated then I guess developers will vote with their code.

Still a long way to go

Posted Aug 3, 2007 8:31 UTC (Fri) by chel (guest, #11544) [Link]

IMHO the many standards for sound APIs (ALSA, OSS, JACK, ARTS, ESD, etc.) are still a problem. A mature system should have a single API that is usable for all applications (and is used by all applications). At the moment, simultaneous audio from different applications, in sync with each other or with video, is still a problem.

Still a long way to go

Posted Aug 3, 2007 13:52 UTC (Fri) by beoba (guest, #16942) [Link]

I'm thinking/hoping that over time a common de-facto standard will emerge. For example, two of those APIs you mentioned (aRts, ESD) are on their way out.

Maybe something at the level of GStreamer or SDL will become the standard, since they aren't as dependent on what OS you're using. Or at least that's the impression I've been getting from their marketing.

Still a long way to go

Posted Aug 3, 2007 18:36 UTC (Fri) by oak (subscriber, #2786) [Link]

Strange that nobody mentioned PulseAudio:
http://www.pulseaudio.org/wiki/FAQ

It's supposed to soon replace OSS and ESD (as API & server) while still
providing API compatible wrapper libraries for older code depending on
those and to offer nicer API on top of ALSA. I.e. it aims to be the
standard linux sound server. It's not intended to replace Phonon,
Gstreamer or JACK, they are intended for different purposes.

Still a long way to go

Posted Aug 3, 2007 20:11 UTC (Fri) by drag (subscriber, #31333) [Link]

Yep. Pulseaudio kicks-ass.

For example it supports zeroconf stuff, Avahi and that sort of thing. It can also be configured to follow networked X, so that you have network-transparent audio and video for X terminals (full duplex for things like VoIP...). It's a drop-in replacement for ESD. And there are plugins for Alsa so normal Alsa-using applications can use it automatically.

Alsa --- low level audio API

OpenAL --- cross-platform gaming audio API. Originally developed by Loki for porting games to Linux, it is now supported by Creative, who is trying to use it as a gaming audio API for Microsoft Vista after Microsoft destroyed the ability to use hardware acceleration for 3D audio in Direct-whatever. (And yes, of course Creative is being dicks about it and keeping the hardware acceleration stuff closed.) Supposed to be to audio what OpenGL is to graphics.

SDL --- cross-platform 2-d, 3-d, controller I/O, and audio api. Generally for games. Linux, OS X, Windows, and so on. OpenAL was traditionally more sophisticated with dealing with surround sound stuff, but I think SDL has caught up. Nowadays I expect that Linux games should use SDL for audio and controller and use SDL or OpenGL for graphics, depending on how much control the developer needs.

Gstreamer --- used to _build_ applications. Think of it as a collection of building blocks.

Jack --- used for combining and routing PCM audio and MIDI between multiple applications for audio work. Designed specifically for low latency, it is good for realtime audio editing and such.

LADSPA --- used for creating audio plugins for audio work for use in other applications

OSS --- now widely considered deprecated in Linux (people debate that), it's a standard audio interface for Unix-like systems.

Esd --- now deprecated in Linux, it was designed to solve networked audio and the classic lack of software mixing support in Gnome stuff (since effectively solved by default use of dmix in Alsa, except when you throw OSS or Artsd into the mix).

Pulseaudio --- provides desktop audio for Linux. Drop-in replacement for Esd that solves its performance and quality issues. Allows for network-transparent audio. Used to be called Polypaudio.

libao --- Xiph.org's cross platform audio library. Used in simple applications such as mpg321

Arts --- designed to create a standard API for sophisticated video and audio in KDE, it was mainly used to solve the lack of support in Linux for software mixing on sound cards without hardware mixing. Mostly deprecated, gone in KDE4.

Xine --- a nice application used for media playback. Other applications took advantage of its functionality for their own use.

Phonon --- simple API for audio for use in KDE4 applications. Designed to abstract away low-level api issues.

That's what I can think of right now.

PulseAudio

Posted Aug 4, 2007 9:50 UTC (Sat) by wolfgang.oertl (subscriber, #7418) [Link]

I have looked at PulseAudio recently, did almost all of the "perfect setup" described on the pulseaudio.org wiki. Works nicely, but it has a high CPU overhead on my system - while doing nothing (well, a few % of CPU) but even more when playing back audio.

This inefficiency needs to be fixed before it can be a standard, especially in the light of all the power saving efforts currently under way.

PulseAudio

Posted Aug 4, 2007 20:46 UTC (Sat) by drag (subscriber, #31333) [Link]

It's configurable how much cpu time you want to allow it to use.

The lower the latency and the tighter your audio requirements, the more CPU time you're going to use. If you loosen things up it will use much less CPU time.

From the /usr/share/doc/pulseaudio/README.Debian on my system:

> PLEASE NOTE: PulseAudio's default configuration uses high quality sample rate conversion that may be overly CPU intensive. If PulseAudio's CPU usage is unacceptable on your hardware, please change the resample-method option in /etc/pulse/daemon.conf to either src-linear or trivial. See daemon.conf for more details.
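
The README names cheaper resamplers without saying what they do. As a rough illustration of why simple methods cost less CPU, here is a sketch of the two simplest approaches to sample rate conversion: reusing the nearest earlier input sample, and interpolating linearly between neighbours. The function names are hypothetical, this is not PulseAudio's code, and the real "trivial" and "src-linear" methods may differ in detail:

```python
def resample_nearest(samples, src_rate, dst_rate):
    """Cheapest approach: reuse the input sample at or just before each output instant."""
    n_out = len(samples) * dst_rate // src_rate
    return [samples[i * src_rate // dst_rate] for i in range(n_out)]


def resample_linear(samples, src_rate, dst_rate):
    """Slightly costlier: interpolate between the two neighbouring input samples."""
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # fractional position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # hold the last sample at the end
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# Hypothetical input: a short ramp, upsampled by a factor of two.
ramp = [0, 10, 20, 30]
print(resample_nearest(ramp, 1, 2))
print(resample_linear(ramp, 1, 2))
```

Both touch only one or two input samples per output sample. High quality converters instead run a long filter over many input samples for each output sample, which is where the extra CPU time goes.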

Still a long way to go

Posted Aug 5, 2007 6:23 UTC (Sun) by njs (subscriber, #40338) [Link]

> Alsa --- low level audio API

The problem with Alsa is that it is a bit confused about what level API it is supposed to be -- on the one hand there are the kernel drivers, which export a very low level interface to userspace, and then there is libasound. In practice, libasound is the only way to use the low level kernel drivers (IIRC they even get in trouble with the kernel developers because the kernel<->libasound interface is not even stable), and libasound adds all sorts of strange non-low-level stuff. It has plugins, you can route audio in weird directions, "dmix" actually uses sysv ipc to perform software mixing between multiple libasound users, etc. etc. All this in the lowest-level stable interface available.

My vague and uninformed impression is that alsa should back off to focus strictly on exposing the hardware, then pulseaudio and jack should provide the fancy stuff on top of that, and apps should target whichever is appropriate for the app in question. (Or pulseaudio and jack could hybridize to create the Ultimate User-Space Sound System, that'd be cool too.)

Still a long way to go

Posted Aug 7, 2007 20:40 UTC (Tue) by nix (subscriber, #2304) [Link]

libsydney: another does-everything API with a silly name which PulseAudio
will eventually transition to, only I'm not sure it's actually even alpha
yet.

Still a long way to go

Posted Aug 3, 2007 21:04 UTC (Fri) by jrigg (subscriber, #30848) [Link]

Given the conflicting requirements of realtime audio recording in a professional studio versus desktop sound, I think it would be a little unrealistic to expect one API to handle everything. I make a significant part of my living from sound recording using Linux audio software. It works extremely well for this. I just hope that any changes designed to make things easier for the average user don't render it useless for professional purposes. IMO the two types of application are better kept separate.

Still a long way to go

Posted Aug 3, 2007 23:14 UTC (Fri) by johnkarp (subscriber, #39285) [Link]

What you say makes perfect sense to me.

Apparently, though, Apple uses a single API for both applications,
CoreAudio. I'd be curious to see how that is implemented, particularly how
they deal with latency.

Still a long way to go

Posted Aug 4, 2007 4:06 UTC (Sat) by drag (subscriber, #31333) [Link]

It's supposed to be very good in terms of latency. People claim sub-10msec is reliable, but people claim all sorts of stuff about OS X that they shouldn't.

Still a long way to go

Posted Aug 4, 2007 9:17 UTC (Sat) by chel (guest, #11544) [Link]

I don't think the requirements are conflicting. At the moment mixed use of e.g. Flash, Mplayer, XMMS and VLC is a big problem. Just read the HOWTO pages and you will see it is unreachable without the thorough knowledge of the many sound systems available. For my professional audio (http://www.vangennip.com) I still use the Samplitude application that came with the soundcard I bought in the previous century. For the moment I don't have working tools in Linux to check if my audio and video really are in sync. Of the list posted by Drag, 90% or more should be removed completely, the remaining parts should be fixed to get a real working sound system. BTW, there is a major difference between realtime and fast. The major requirement of realtime is "predictable timing" and that won't hurt any application.

User level comment

Posted Aug 3, 2007 22:16 UTC (Fri) by sjj (guest, #2020) [Link]

Back when this audio issue was last discussed here, I whined about Gnome's "thou shalt not have more than one sound card".

Well, lo and behold, at least on Ubuntu Gutsy (7.10), you can have separate inputs and outputs for music, video, chat, and system events. All configured from a nice GUI that recognizes my sound cards. *And* it lets you hook the system volume to multiple controls (on M-Audio you need one per channel).

I hook my good speakers to M-Audio, system sounds to the monitor speakers, headset to its own in/output. Perfect.

And setting "pipeline = musicaudiosink" in Quod Libet works!

Whoever fixed this, thank you!

Now, I'm going to install ubuntustudio and we'll see how happy I am next week.

User level comment

Posted Aug 4, 2007 4:04 UTC (Sat) by drag (subscriber, #31333) [Link]

Don't forget the commercially supported variant of Debian for audio work called 64studio.

It's commercially supported Free software. No proprietary 'value-added' items with that and they provide no-cost iso image downloads.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.