
Interview with Audacity developer Dominic Mazzoni

May 26, 2004

This article was contributed by Dave Fancella

As a longtime musician, or so I like to call myself, and a free software enthusiast, I have personally found Audacity to be an indispensable tool for mastering mixes. Other people find a variety of uses for it, including deployment in public radio stations, restoring LPs for CD burning, and more. Audacity has been in continuous development since 1999. It is a multi-track recorder, mixer, waveform visualization tool, and editor all rolled into one. It's the Free Software equivalent of ProTools, Sound Forge, and Cakewalk, albeit without the MIDI portion of any of those programs.

Recently I exchanged some email with Dominic Mazzoni, the Lead Developer and founder of Audacity. As a long-time lurker on the Audacity-devel mailing list, I've come to be familiar with Dominic as one of those kind, gentle spirits who leads first with his coding and second with his ideas, and is an inspiration to us all. Here, then, is the email interview with Dominic Mazzoni:

Q: In the Audacity FAQ, it says "Audacity was started in the fall of 1999 by Dominic Mazzoni while he was a graduate student at Carnegie Mellon University in Pittsburgh, PA, USA. He was working on a research project with his advisor, Professor Roger Dannenberg, and they needed a tool that would let them visualize audio analysis algorithms. Over time, this program developed into a general audio editor, and other people started helping out." Would you provide some information on the nature of the tool? How did it turn into a general audio editor? Was it a graded assignment, and if so, what grade did you get?

A: I was in a Ph.D. program at CMU, and the way it works there is that grad students are supposed to work on independent research right from day one, even while we're taking classes. My dream was to develop automatic music transcription software that could take any recording and turn it into sheet music. This was too difficult, of course, so I was working on monophonic pitch transcription and melody matching, which eventually led to some reasonably successful research in how to retrieve a melody from a database of songs based on a sung/hummed query. While I was trying to visualize pitch transcription algorithms, I started developing my own tool. Since there weren't any other audio editors for Linux that I liked, and I couldn't afford any good editors for the Mac (my two preferred platforms), I thought it would be fun to turn my project into a complete editor.

My advisor, Roger [DF: Roger Dannenberg is the mastermind behind the Nyquist scripting library which is now embedded in Audacity and is just one way to extend Audacity's sound processing capabilities], was very supportive of the project, and convinced me to turn the editor into a Computer Science research project. So I came up with an interesting data structure that could do editing operations quickly, and we wrote a paper on it. By the end of that year, though, I was having a lot of fun with the audio editor and was spending more and more time on it outside of my official research. I came up with the name "Audacity" and released it on SourceForge. It was pretty limited at the time, but it was cross-platform, which was a big deal, and it worked well enough to generate interest. From that point on I worked on it mostly as a hobby, rather than as a part of my research, though I did find it useful for my research, too.

By the way, I never finished my Ph.D. - I decided I needed a break from grad school after a couple of years, so I moved back to California to work for a while. I received my Master's degree from CMU last year after completing an additional course on my own time.

Q: What do you do when you're not working on Audacity? Family? Kids? I seem to remember reading somewhere that you were working out at NASA Jet Propulsion Laboratories. Do you program there, or actually build rockets?

A: I work in the Machine Learning Systems group at the Jet Propulsion Laboratory. I split my time between theoretical research in machine learning algorithms and applied research, using machine learning algorithms to find patterns in data from NASA science instruments. In particular I've focused on using Support Vector Machines (a relatively new machine learning technique that can be used in place of Artificial Neural Networks) to develop pixel classifiers - for example to distinguish between different types of clouds in satellite images so that scientists can develop better global climate models.

I'm single, but I try to keep an active social life. When I'm not at JPL or working on Audacity, you might find me playing the piano in a jazz band, cooking vegetarian food, ballroom dancing, playing board games, or riding my unicycle.

Q: Audacity has been gaining a lot of traction in the market, lately. How do you feel about that? Do you ever get the "15-minutes of fame" feeling, or is it something you ever really think about?

A: I've thoroughly enjoyed all of the attention that Audacity has gotten. I enjoy working on something that people find useful, and I would choose fame over fortune any day. I've invested so much time into Audacity that it can affect me pretty seriously - seeing a good review or getting an email full of praise can give me an emotional high that lasts all week, but unfortunately bug reports, especially serious ones where people have lost work because of a bug in Audacity, can really make me feel depressed. Recently I had to take a step back and give myself a vacation from responding to emails to audacity-help for my own sanity (thankfully, other developers and users have done a great job of answering the mail).

Q: Do you envision a day when you might turn away from Audacity as Lead Developer and pursue something else? Do you visualize yourself in your retirement, with a wheelchair, oxygen tanks, and a laptop still hacking on Audacity (or something to that effect)?

A: Ha, that's a good one. I'll keep on working on Audacity as long as I enjoy it. It does get in the way of me pursuing other large programming projects, because it's hard to get involved in too many large projects at once. Between the programming I do at work and Audacity, I get pretty burned out. But I take the time to work on little projects on the side often.

If somebody else wanted to take over as the project leader, I would probably let them (assuming I thought they'd do a good job, of course). Until then, I'll put in as much time as I can; unless somebody else takes over, I don't plan to retire anytime soon.

Q: Occasionally, users post thank-you messages to the developer's list, thanking you and the other developers. Frequently they provide links to their own music that they recorded and mastered with Audacity. Do you ever follow the links? If so, what kind of diversity do you find among the users of Audacity?

A: I follow quite a few links and really enjoy hearing music and reading about projects created with Audacity. Some things have been very touching; someone sent me a CD of interviews high-school students recorded (using Audacity) of their personal heroes; someone else told me he used Audacity to record an interview with his father before he died.

Q: It can be said that many Free Software projects are just Free as in Speech knock-offs of popular commercial applications. Is Audacity just a knock-off of commercial applications?

A: I have been inspired by many commercial programs, in particular the original "SoundEdit" for the Macintosh, and also CoolEdit, Sound Forge, Cubase, ProTools, and several other programs. Many aspects of Audacity bear a superficial resemblance to many of these programs, but no more so than they all resemble each other.

Q: So I take it this is a 'no'?

A: No, we're not a free knockoff of any other software. That doesn't mean that we shouldn't implement features that other programs have. It means that when a lot of users request a feature, instead of copying another program, I prefer to think about it, survey users, and come up with what I think is the best solution for Audacity. Sometimes that ends up being similar to other programs, but often it ends up quite different, but equally good.

Q: How do you feel, and how do you respond when users show up that want specific features found only in specific commercial applications?

A: Actually I don't think that anyone has ever said they wouldn't use Audacity if it didn't work exactly like their favorite proprietary application. Most people are perfectly happy to do things a different way as long as it's equally intuitive and powerful. Sometimes we're able to satisfy users by making Audacity as customizable as possible - for example you can edit all of Audacity's keyboard shortcuts and make them the same as some other program if you want. The other Audacity developers and I came up with our own keyboard shortcuts based on what we thought would be the most intuitive and useful, but users are free to modify that (and they can even save their keyboard layouts as XML and share them with other users).

I've been a Mac user since the very beginning (my parents bought an original Macintosh in 1984) so I've always been a fan of intuitive, "discoverable" interfaces. My main complaint with other audio editors is that too often they are trying to emulate the interfaces of analog mixing boards, which I didn't think was very intuitive for the rest of us. I wanted to create an interface that anyone computer-literate could figure out how to use on their own.

Q: For that matter, even the digital mixing boards are trying to emulate the analog interfaces when they don't really have to. :) Are there any specific areas where you think Audacity could really take advantage of the fact that it's software for a general use computer to make some really nice interface?

A: There are lots of areas where an audio editor could be "smarter" than it is now to save users time. I'd like to see Audacity do automatic beat detection and have an option to snap the selection to the nearest beat boundary, making it easier to cut an entire chorus out of a song without breaking the tempo, for example. I'm sure there are hundreds of other things like that.

If you look closely, you'll see lots of subtle differences in the way that Audacity operates. Unlike almost every other audio program I've seen, Audacity lets you have multiple tracks, each with a different sample format (16-bit/32-bit) and sample rate (44100 Hz, etc) - and Audacity automatically mixes them on the fly. It also has a rather unique built-in amplitude envelope editor, and one of the best frequency analysis views.

Q: How would you define Audacity's target market?

A: Well, it's free, so everyone. Seriously. I'd like Audacity to be good enough to meet the needs of 90% of the users who just want to record a song or an interview, create a mix, convert a tape or LP to CD, etc. Then for everyone who has more advanced needs than that, there are plenty of other tools available - but there's no reason not to keep Audacity around also for the few things that Audacity might do best.

Audacity is a particularly good choice when it's helpful to have a truly cross-platform tool, such as in a mixed-operating-system school computer lab - or when the licensing cost of other tools is prohibitive, such as in third-world countries or at public radio stations.

Q: If I recall correctly, Audacity still doesn't play well with other platforms when exchanging project files across platforms. Is that still true?

A: Actually the projects open fine on different platforms, but if you switch from a big-endian to a little-endian platform or vice-versa (usually that means to or from a Mac) the waveform display will look wrong on-screen, even though it will play fine. As a workaround, you can just apply an effect that does nothing (like Amplify 0) and it will fix the display. This was a design flaw in our blocked-file format, but we might try to work around this in the future. It's not a big deal, though, because aside from the cosmetic problem the project copies perfectly from one platform to another, and there's an easy workaround for the cosmetic problem.
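The byte-order effect described above can be illustrated in general terms. The sketch below is not Audacity's blocked-file format (the interview does not say which stored data is byte-order dependent); it only shows, with Python's standard `struct` module and made-up sample values, what happens when 16-bit numbers written on a little-endian machine are read as big-endian without swapping.

```python
import struct

# Four small 16-bit samples, written the way a little-endian machine stores them.
samples = [1, 2, -3, 4]
raw = struct.pack("<4h", *samples)

# Reading with the matching byte order recovers the samples.
matching = list(struct.unpack("<4h", raw))

# Reading the same bytes with the opposite byte order, without swapping,
# yields values that no longer resemble the original signal.
mismatched = list(struct.unpack(">4h", raw))

print(matching)
print(mismatched)
```

Rewriting the data on the current machine, which is presumably what a do-nothing effect triggers, would store it again in the native byte order.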

Q: What innovations do you think Audacity brings to the table?

A: It's not necessarily obvious, but I think that perhaps the greatest innovative feature of Audacity is that it's truly cross-platform. Instead of being developed for one platform and then ported later (often badly) to another platform, Audacity has been developed right from day one to run on Mac OS, Windows, and Linux. The two most important libraries we're using to help with this are wxWidgets and PortAudio - Audacity would not be possible without them - but it's taken a lot of work (and a lot of patches to wxWidgets and PortAudio) to make Audacity really run smoothly on all platforms. Anyway, I think that this is a really big deal, because there are very, very few graphical/multimedia programs that really run natively on Mac, Windows, and Linux. The biggest ones I can think of other than Audacity are RealPlayer and Netscape/Mozilla. Note that I'm not counting any software written in Java because it doesn't use the native look and feel, nor do I count programs that run natively on Linux and Windows but only run on Mac OS using X11, because they don't look or feel like native programs.

I also think that Audacity has lots of subtle innovations hidden in surprising places. For example, when you select a note in Audacity and then open the "Change Pitch" dialog, Audacity analyzes the selected audio and automatically fills in the fundamental frequency in the "from" box, letting you type in something else in the "to" box. Another innovation is the "Import Raw" command that can automatically figure out the format of any (uncompressed) audio file, even in a weird unsupported format, by determining what interpretation of the bits results in the most continuous-looking signal. I also believe that the double-handles on the amplitude envelopes (which let you amplify a signal beyond 100%) and the multi-mode tool are all innovations that are unique to Audacity.
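The "Import Raw" idea of picking the interpretation that gives the most continuous-looking signal might be sketched roughly as follows. This is a guess at the general technique, not Audacity's code: the candidate list, the `roughness` score, and the function names are all assumptions made for illustration.

```python
import math
import struct

# Hypothetical candidates: (label, byte-order prefix, struct type code, bytes per sample).
CANDIDATES = [
    ("16-bit little-endian", "<", "h", 2),
    ("16-bit big-endian", ">", "h", 2),
    ("8-bit signed", "<", "b", 1),
]

def roughness(values):
    """Total jump between neighbouring samples, relative to the total signal level."""
    jumps = sum(abs(b - a) for a, b in zip(values, values[1:]))
    level = sum(abs(v) for v in values)
    return jumps / level if level else float("inf")

def guess_format(raw):
    """Return the label of the candidate whose decoding looks smoothest."""
    scores = {}
    for label, order, code, width in CANDIDATES:
        count = len(raw) // width
        values = struct.unpack(f"{order}{count}{code}", raw[:count * width])
        scores[label] = roughness(values)
    return min(scores, key=scores.get)

# Test signal (an assumption for the demo): a sine wave stored as 16-bit little-endian.
wave = [int(10000 * math.sin(2 * math.pi * i / 50)) for i in range(200)]
raw = struct.pack("<200h", *wave)
print(guess_format(raw))
```

A wrong interpretation scrambles the significant bits of each sample, so neighbouring values jump around far more than they do in the correct decoding; that difference is what the score measures.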

Q: Have you had to make many compromises in Audacity's interface just so it can be consistent with specific platform expectations that differ across platforms?

A: I don't think we ever cut out a feature because we couldn't get it to work on one platform. Instead we just leave it out of one platform for a while. For example, I couldn't figure out how to support dragging toolbars out of the main window on the Mac for a while, so many versions of Audacity didn't support that feature. But later, after some bugs in wxWidgets were fixed, we added it. Similarly, Mac cursors are 16x16 pixels, while on newer versions of Windows and Linux they can be up to 32x32 pixels. Since we wanted a few of Audacity's custom cursors to be larger than 16x16 where possible, we designed two sets of cursors. Also, early versions of Audacity for Linux did not support full-duplex recording (playing existing tracks while recording new ones). We released Audacity anyway and added that feature later.

Audacity would definitely be easier if we only had to support one platform. Windows has a nice toolbar widget that we could have used instead of writing our own. Mac OS X has a great preference dialog we could have used. And on Linux there are lots of libraries we could link to that would provide all sorts of useful functionality, but it wouldn't be cross-platform.

Q: I understand that Audacity uses a block file approach, where instead of manipulating each track as one large file you guys have broken each track down into many small files. Would you tell us more about this setup? Why did you choose it over other methods? What are the benefits and drawbacks with using block files?

A: Well, to be honest, when I started Audacity I didn't know about Edit Decision Lists. My only experience was with tools like SoundEdit and (early versions of) CoolEdit, both of which were very slow at doing things like Cut, Copy, Paste, and Undo, because they rewrote the entire audio file on disk after each operation.

Q: How about some more information on Edit Decision Lists?

A: An edit decision list is a list of all of the modifications you made to the original audio. The original audio file is left alone, and when you press play, the computer applies all of the edits in real-time to render the audio. This makes editing very fast, since the program is just manipulating a list of edits, but it can increase the amount of processing power required to play back audio in real-time. These days, though, you can do hundreds of edits before you even begin to slow down a modern PC.
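As a minimal sketch of the edit decision list idea, assuming a toy representation in which audio is a list of numbers and each edit is a `(kind, start, end)` tuple (both hypothetical, not taken from any real editor): the recording is never changed, and playback replays the list of edits.

```python
def render(original, edits):
    """Apply every edit in order to a copy of the original samples."""
    audio = list(original)          # the source recording itself is never modified
    for kind, start, end in edits:  # positions refer to the audio after earlier edits
        if kind == "cut":
            del audio[start:end]
        elif kind == "silence":
            audio[start:end] = [0] * (end - start)
    return audio

original = list(range(10))          # stand-in for a recorded file
edits = [("cut", 2, 5), ("silence", 0, 2)]

played = render(original, edits)        # what the listener hears
undone = render(original, edits[:-1])   # undo = drop the last list entry
```

Editing is cheap because only the short list changes, while the cost of `render` grows with the number of edits, which is the playback trade-off mentioned above.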

I knew I could do better using my Computer Science knowledge, and soon I had worked out a method that involves splitting each track into small pieces - say about 2 MB each. If you allow each piece to be any size from 1 MB to 2 MB, but no smaller or larger, then it turns out you can implement all of the basic editing operations (cut, paste, etc.) without ever having to modify more than 5 pieces ("blocks") at a time. This was what I ended up writing a paper on.
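The bounded-block idea can be sketched as follows. This is only an illustration under stated assumptions, not the data structure from the paper: blocks are Python tuples, the sizes 4 and 8 stand in for "1 MB" and "2 MB", and `split_evenly` and `cut` are hypothetical helper names. The point it tries to show is that a cut rewrites only the blocks at the edges of the cut region, while every other block is reused untouched.

```python
MIN_BLOCK = 4   # stand-in for the lower size limit
MAX_BLOCK = 8   # stand-in for the upper size limit (twice the minimum)

def split_evenly(data):
    """Cut data into blocks of MIN_BLOCK..MAX_BLOCK samples (shorter only if data is)."""
    if not data:
        return []
    pieces = -(-len(data) // MAX_BLOCK)          # ceiling division
    base, extra = divmod(len(data), pieces)
    blocks, pos = [], 0
    for i in range(pieces):
        size = base + (1 if i < extra else 0)
        blocks.append(tuple(data[pos:pos + size]))
        pos += size
    return blocks

def cut(blocks, start, end):
    """Remove samples [start, end); return (new_blocks, number_of_blocks_rewritten)."""
    before, after, leftover = [], [], []
    pos = 0
    for block in blocks:
        lo, hi = pos, pos + len(block)
        pos = hi
        if hi <= start:
            before.append(block)                 # entirely before the cut: reused
        elif lo >= end:
            after.append(block)                  # entirely after the cut: reused
        else:
            leftover.extend(block[:max(start - lo, 0)])
            leftover.extend(block[max(end - lo, 0):])
    # If the surviving scraps are too small for a block, merge in one neighbour.
    if 0 < len(leftover) < MIN_BLOCK:
        if before:
            leftover = list(before.pop()) + leftover
        elif after:
            leftover = leftover + list(after.pop(0))
    fresh = split_evenly(leftover)
    return before + fresh + after, len(fresh)

track = split_evenly(range(40))                  # five blocks of 8 samples
edited, rewritten = cut(track, 9, 16)
```

In this sketch a cut never writes more than two new blocks, however long the track is; the figure of five blocks quoted above covers the full set of editing operations in the real design.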

In doing the research for the paper, I learned about Edit Decision Lists and other techniques for nondestructive audio editing. In the end I decided while there were some advantages to EDLs, there were just as many advantages to the blocked-file approach, so it would be better to keep Audacity unique and capitalize on the strengths of this approach, rather than switch to EDLs just to copy everyone else.

One advantage of the blocked-file approach is that you can have multiple "references" to the same data in multiple places. So duplicating a track in Audacity, or creating a loop (using the Repeat effect), are both virtually instantaneous. Also, because Audacity never splits files smaller than about a megabyte, it doesn't slow down trying to play back a region that contains hundreds of edits, which can be a problem with EDL-based editors.

Q: So does Audacity use a Copy On Write method, then? Or is it somewhat beyond something as rudimentary as Copy On Write?

A: It's very much like the Copy On Write method used by your operating system's virtual memory system, except that in VM all of the pages are the same size, and in Audacity the blocks can vary in size.
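A small sketch of the shared-reference, copy-on-write behaviour, assuming tracks are plain Python lists of immutable tuples (a toy model, not Audacity's on-disk layout; `amplify_block` is a hypothetical helper):

```python
def amplify_block(track, index, gain):
    """Return a new track in which only block `index` has been rewritten."""
    new_track = list(track)                      # copies references, not samples
    new_track[index] = tuple(s * gain for s in track[index])
    return new_track

original = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]     # three immutable blocks
duplicate = list(original)                       # "Duplicate": shares every block
looped = original * 4                            # "Repeat": same blocks, referenced again
louder = amplify_block(duplicate, 1, 2)          # writing copies just one block
```

Duplicating or looping costs one reference per block regardless of how much audio each block holds, and a change to one track leaves the blocks still shared with the others intact.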

There are definitely some problems with the approach, though, that we're trying to work around. The idea that an Audacity project is actually a file plus a data folder is confusing to many people at first.

Q: More recently, there has been a bit of buzz over a new back end implementation of Audacity's work code in a library that has been named "Mezzo". Would you tell us a bit about Mezzo?

A: We've been talking about something like Mezzo for years, but Joshua Haberman (one of the earliest Audacity developers) and I finally started working on it a couple months ago. We did a lot of redesigning and rewriting together early on, but now that we're mostly happy with the new design, Joshua has been doing most of the work.

Mezzo is a rewrite of all of the major core features of Audacity aside from the graphical interface. While Audacity is distributed under the terms of the GNU General Public License, which means that the source code can only be borrowed for use in other GPL or GPL-compatible programs, Mezzo will be released under a very unrestrictive BSD-like license that will allow it to be used by almost anyone. We hope that this will encourage many more people to use Mezzo in projects unrelated to Audacity, including commercial products, which will lead to Mezzo being much more robust and stable.

Q: Are there any plans to support Mezzo with bindings to other languages, such as Python or Perl? On a related note, if you take all the work code out of Audacity and leave just the GUI, what are the potential ramifications to Audacity itself if you were to look at a GUI-oriented language, such as wxPython, in order to facilitate GUI development? Would the C++ dependency be lessened enough to make it feasible to switch the GUI to a different language, one that arguably would in fact facilitate development?

A: Mezzo would definitely be the appropriate place to create Python or Perl bindings. It would definitely be interesting to consider Audacity in wxPython. I'm a big fan of Python in general. Note, however, that it would still require a fair amount of C++ code in order to be efficient - in particular all of the track drawing code (which is a lot more complicated than it looks) and most effects processing code would still need to be in C++, otherwise it would be abysmally slow. Using Numerical Python could help a lot, but that's not a standard part of the Python install for most people yet.

I think it will be years before we could consider creating the GUI for Audacity in wxPython. But much sooner than that I'm sure that somebody could create a very simple prototype editor using Mezzo and wxPython. That would be a fun exercise. I was thinking of writing a very simple text-only audio editor sometime just for fun.

Right now the biggest advantage of Mezzo over the equivalent part of the Audacity code is that Mezzo is much cleaner and easier to read. We've already started to add some enhancements, though, like more flexible blocked-file formats.

Q: Any ideas how soon we'll get a release that uses Mezzo?

A: It will be many months before there's a version of Audacity out that depends on Mezzo, and probably a year or more before there's a stable release. In the meantime, though, I'll be devoting a lot of my time to working on incremental improvements to the current version of Audacity. For example, I'm working on VU meters now, which was our most often requested feature.

Well, thank you very much Dominic for your time, both in this interview and your time spent bringing us Audacity. It definitely fills a hole for many of us, and as usual, there isn't really any way to properly thank you other than continuing to use and support Audacity.

Audacity can be found at audacity.sourceforge.net. Information on Mezzo can be found in the Audacity Wiki.





Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds