Stretching and squishing audio files with rubberband

By Forrest Cook
March 18, 2009

The Rubber Band Library and the associated rubberband audio processing utility have been developed by Breakfast Quay, creators of the dssi-vst VST audio plugin adapter. The software is dual-licensed, with GPLv2-licensed source code and a commercially licensed edition. The Rubber Band description states:

Rubber Band Library is a high quality software library for audio time-stretching and pitch-shifting. It permits you to change the tempo and pitch of an audio recording or stream dynamically and independently of one another. Rubber Band Library is intended for use by developers creating their own application programs rather than directly by end users, although it does also include a simple command-line utility program of its own that you can use for simple adjustments to the speed and pitch of existing audio files.

The features document discusses the capabilities of the library in more detail, and the Rubber Band Technical notes explain some of the underlying software methods:

Rubber Band Library is a block-based phase vocoder with phase resets on percussive transients, an adaptive stretch ratio between phase reset points, and a "lamination" method to improve vertical phase coherence. It is implemented in portable C++, and it requires separate library support for the FFT and resampling implementations: for the Free Software edition, this means FFTW and libsamplerate (the proprietary edition supports other options as well).

See the Rubber Band API documentation for more information on the library's components. Version 1.3 of Rubber Band Library was announced on March 16, 2009; no new features were added, but a number of build and runtime bugs were fixed.

The source code for rubberband can be downloaded and built, or a pre-compiled executable is available for the curious (and trusting). The Usage document explains the various options available for rubberband. A simple batch run such as: rubberband -T1.5 infile.wav outfile.wav takes infile.wav and produces outfile.wav at 1.5 times the original tempo, with the pitch unchanged. A number of additional options select other types of audio conversion and fine-tune the processing methods. A number of example audio files are also available for listening.
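To make the effect of the tempo multiple concrete, here is a minimal Python sketch. It only builds the command line quoted above and works out the resulting duration; it does not run rubberband. The helper names are hypothetical, and the 60-second input length is an assumption chosen for illustration.

```python
import shlex


def stretched_duration(seconds, tempo_multiple):
    """Length of the output when the tempo is multiplied by tempo_multiple."""
    if tempo_multiple <= 0:
        raise ValueError("tempo multiple must be positive")
    return seconds / tempo_multiple


def rubberband_tempo_command(infile, outfile, tempo_multiple):
    """Build the argument list for the article's -T example (not executed here)."""
    return ["rubberband", f"-T{tempo_multiple}", infile, outfile]


cmd = rubberband_tempo_command("infile.wav", "outfile.wav", 1.5)
print(shlex.join(cmd))                 # rubberband -T1.5 infile.wav outfile.wav
print(stretched_duration(60.0, 1.5))   # 40.0 (a 60 s file becomes 40 s long)
```

The argument list could be handed to subprocess.run() on a system where rubberband is installed.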

A number of interesting uses for rubberband come to mind. The software could be used in radio production for making those annoying compressed legalese notices that show up at the end of pharmaceutical ads. It could be used to greatly speed up the time it takes to listen to audio books and podcasts, or the producers of those files could use it for compressing their sound files to reduce bandwidth usage. Recordings can be pitch-shifted to correct the speed problems that can be caused by older analog recording equipment. Those who are learning a new language could use the software to slow down the speed of the foreign speech. The software could be useful for producing musical sound effects or amusing answering machine messages.

Rubberband addresses a fairly narrow range of audio processing needs, but gives the user more control when compared to built-in pitch-shifting and/or tempo-shifting functions found in software such as the popular Audacity and Ardour audio editors. It makes a useful addition to a collection of open-source audio processing utilities.



Stretching and squishing audio files with rubberband

Posted Mar 19, 2009 1:57 UTC (Thu) by lordsutch (guest, #53) [Link]

Another useful application: transcoding audio when converting film sources (24p fps) to PAL (25p/50i fps) or vice-versa; usually pitch-shifting the audio is more desirable and less noticeable than altering the frame-rate, which creates artifacts like stuttering pans or blurry frame interpolations.

PAL TV in particular is amenable to transcoding to 23.976 fps "NTSC-Film" DVD format; just re-encode the video to a DVD-legal resolution and fix up the audio for DVD frame rates.
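The arithmetic behind these frame-rate conversions can be sketched in a few lines of Python. The function names and the 120-minute running time are assumptions for illustration; the pitch figure is what results if the audio is simply sped up or slowed down along with the picture, which is the shift a time-stretcher could avoid.

```python
import math


def speed_ratio(source_fps, target_fps):
    """Factor by which playback speeds up when frames are replayed at target_fps."""
    return target_fps / source_fps


def pitch_shift_semitones(ratio):
    """Pitch change, in equal-tempered semitones, caused by a plain speed change."""
    return 12.0 * math.log2(ratio)


film_to_pal = speed_ratio(24.0, 25.0)
print(round(film_to_pal, 4))                          # 1.0417
print(round(pitch_shift_semitones(film_to_pal), 2))   # 0.71 semitones sharp
print(round(120.0 / film_to_pal, 1))                  # 115.2 minutes

# PAL to "NTSC-Film": 23.976 fps is shorthand for 24000/1001 fps.
pal_to_ntsc_film = speed_ratio(25.0, 24000.0 / 1001.0)
print(round(pal_to_ntsc_film, 4))                     # 0.959
```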

Stretching and squishing audio files with rubberband

Posted Mar 19, 2009 10:55 UTC (Thu) by nettings (subscriber, #429) [Link]

You write:

> The software [...] could be used to greatly speed up the time it takes to listen to audio books and podcasts, or the producers of those files could use it for compressing their sound files to reduce bandwidth usage.

that wouldn't be very clever. using a psychoacoustic codec (whether mp3 or vorbis) will provide vastly better results than time-shrinking a file and asking the user to expand it later. and as to listening to the compressed version, whointheirrightmindwouldwanttodothat(andthefewweirdosthatwillcanalwaysshrinktheaudiothemselves).

> Recordings can be pitch-shifted to correct the speed problems that can be caused by older analog recording equipment.

not the case. tapes (or records, for that matter) running at the wrong speed will also run at the wrong pitch - all that's required is simple resampling, which will yield cleaner results than timestretching.

> Those who are learning a new language could use the software to slow down the speed of the foreign speech. The software could be useful for producing musical sound effects or amusing answering machine messages.

another important use of this software is to time-align unsynchronized audio sources. say i have two-camera footage from a live event, and since i'm using consumer equipment, there is no way to use wordclock. cam1 gets the p.a. sound, cam2 takes ambience. now i can timestretch the ambience track so that it exactly aligns with the dry sound, even though the two cams may have drifted significantly over the course of a few hours.

Stretching and squishing audio files with rubberband

Posted Mar 19, 2009 15:51 UTC (Thu) by johnkarp (subscriber, #39285) [Link]

If you have two recordings, made with drifting sample clocks, the differences in sample rate would affect the recorded pitch, not just the timings. So I think you'd want to use resampling in that case too.
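A small sketch of why resampling changes pitch and duration together, assuming a naive linear-interpolation resampler written purely for illustration. A real tool would use band-limited interpolation, as libsamplerate does; the function names and the 100 Hz test tone are hypothetical.

```python
import math


def resample_linear(samples, ratio):
    """Naive linear-interpolation resampler; output has about len(samples) * ratio samples."""
    last = len(samples) - 1
    out_len = int(last * ratio) + 1
    out = []
    for j in range(out_len):
        pos = j / ratio                      # position in the input signal
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def count_cycles(samples):
    """Count upward zero crossings, one per cycle of a simple tone."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)


rate = 8000
# One second of a 100 Hz tone; the 0.5 rad phase offset keeps samples off exact zeros.
tone = [math.sin(2 * math.pi * 100.0 * n / rate + 0.5) for n in range(rate)]
half = resample_linear(tone, 0.5)

# The same number of cycles now fits in half as many samples, so at the
# same playback rate the tone lasts half as long and sounds an octave higher.
print(len(tone), count_cycles(tone))   # 8000 100
print(len(half), count_cycles(half))   # 4000 100
```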

Stretching and squishing audio files with rubberband

Posted Mar 20, 2009 1:25 UTC (Fri) by jwoithe (subscriber, #10521) [Link]

Correct, resampling is what is required. I have a similar situation with my AV rig. In this case we have a "prosumer" camera without wordclock and a multichannel audio interface. The audio clock in the interface is therefore not locked to the video frame clock. In essence, the audio interface is recording at a rate somewhat different from 48 kHz relative to the video timebase (it might be 47998.1 Hz for example). So to synchronise the audio interface's recording to the video frames (or even the audio recorded by the camera) the audio from the interface simply needs to be resampled in post-production so it is at 48 kHz relative to the video timebase. What this does is give us an approximation of the audio samples which would have been recorded had the audio clock been locked to the video timebase originally.

This works best if the individual time references are stable for the duration of the recording - otherwise you have different effective audio sampling rates for different sections of the recording, which involves more work to fix up.
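As a rough illustration of the numbers involved, the following Python sketch computes the resampling ratio and the accumulated drift for the 47998.1 Hz example, assuming the clock offset stays constant for the whole recording. The function names and the one-hour duration are assumptions for illustration.

```python
def resample_ratio(nominal_hz, effective_hz):
    """Factor by which the sample count must grow to match the video timebase."""
    return nominal_hz / effective_hz


def drift_seconds(nominal_hz, effective_hz, elapsed_seconds):
    """Video time minus audio playback time when the audio is played at nominal_hz."""
    samples_recorded = effective_hz * elapsed_seconds
    return elapsed_seconds - samples_recorded / nominal_hz


ratio = resample_ratio(48000.0, 47998.1)
drift = drift_seconds(48000.0, 47998.1, 3600.0)

print(f"{ratio:.7f}")   # 1.0000396
print(f"{drift:.4f}")   # 0.1425 (seconds of drift per hour of video)
```

A drift of roughly a seventh of a second per hour is several video frames, which is why the correction matters even though the ratio is so close to 1.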

Note we synchronise to the video timebase because video can't be "resampled" to synchronise to something else.

Of course the best outcome is to just use a camera with a clock input, but I can't afford that.

Stretching and squishing audio files with rubberband

Posted Mar 19, 2009 16:26 UTC (Thu) by cook (subscriber, #4) [Link]

> that wouldn't be very clever. using a psychoacoustic codec (whether mp3 or vorbis) will provide vastly better results than time-shrinking a file and asking the user to expand it later.

If you first squished the file with rubberband, *then* converted the output to mp3, it should produce a smaller mp3 file. There is a tradeoff between listenability and size reduction. One thing rubberband does not do is eliminate the "audio whitespace", or silent parts between words. I'm not aware of any open-source tools that can do that.

Stretching and squishing audio files with rubberband

Posted Mar 19, 2009 16:52 UTC (Thu) by johnkarp (subscriber, #39285) [Link]

If you're really concerned about the space/bandwidth taken by speech, you should use a codec designed for it, like speex. And with those codecs, I don't think increasing the tempo would save any space, since they'd have to encode more phonemes per second to keep up.

Stretching and squishing audio files with rubberband

Posted Mar 21, 2009 0:25 UTC (Sat) by dododge (subscriber, #2870) [Link]

> and as to listening to the compressed version, whointheirrightmindwouldwanttodothat

Actually I do something like that pretty regularly. I often watch things like lectures and videogame trailers in xine with the "stretch" filter applied. This speeds up playback but tries to preserve the audio pitch. When you just want to get through the content quickly and don't care that the visuals are running too fast, it usually works pretty well. In fact after a lot of this I sometimes find "normal" speed to seem plodding and tedious.

So how does it compare?

Posted Mar 19, 2009 14:07 UTC (Thu) by smurf (subscriber, #17840) [Link]

Don't Ardour / Audacity have their own pitch/tempo-shifting tools?

If so, is there a quantifiable difference between these?

In any case, I'd assume that a plug-in for one or both of the above editors would be preferable to writing a separate tool.

So how does it compare?

Posted Mar 19, 2009 15:56 UTC (Thu) by johnkarp (subscriber, #39285) [Link]

For my uses, the tempo-shifters that come with the Linux audio editors have been unusable; they create a slap-back echo effect, and the fidelity is very bad. It sounds like 'rubberband' might be a better implementation.

If there's going to be widespread use, a C library is much more useful than a single-application plugin. You can always write a plugin to encapsulate the library, but the reverse is less practical.

So how does it compare?

Posted Mar 22, 2009 17:48 UTC (Sun) by alankila (subscriber, #47141) [Link]

There is also the venerable choice of autocorrelating the sample to find an approximation of its shortest repeating period, and then skipping or replaying earlier blocks of data when such a correlation has been found.

Speech is one of the simpler sound sources to modify in this fashion because it is stable and has a relatively high base frequency, which means that correlated blocks of audio can usually be found only a few dozen samples away. For offline work, constraints on latency are also very flexible, so it doesn't really matter if this kind of program delays the audio by 100 ms or so.
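The autocorrelation approach described above can be sketched roughly as follows. This is a simplified illustration on a synthetic tone, not how rubberband works (it is a phase vocoder); the function names, the 8 kHz sample rate and the 200 Hz test signal are assumptions. Practical implementations typically crossfade at the splice rather than cutting hard.

```python
import math


def estimate_period(samples, min_lag, max_lag):
    """Return the lag, in samples, with the highest normalised autocorrelation."""
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break
        num = sum(samples[i] * samples[i + lag] for i in range(n))
        energy_a = sum(samples[i] ** 2 for i in range(n))
        energy_b = sum(samples[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(energy_a * energy_b)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def drop_one_period(samples, start, period):
    """Shorten the signal by skipping one whole period beginning at start."""
    return samples[:start] + samples[start + period:]


rate = 8000
# A 200 Hz tone sampled at 8 kHz repeats every 40 samples.
tone = [math.sin(2 * math.pi * 200.0 * n / rate) for n in range(400)]

period = estimate_period(tone, 20, 60)
shorter = drop_one_period(tone, 100, period)

print(period)         # 40
print(len(shorter))   # 360
```

Because a whole period is removed, the waveform continues almost seamlessly across the cut, so the signal gets shorter without its pitch changing. Replaying a period instead of skipping it lengthens the signal in the same way.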

Stretching and squishing audio files with rubberband

Posted Mar 21, 2009 11:29 UTC (Sat) by lbt (subscriber, #29672) [Link]

I've been looking for this kind of thing for a while - adjusting the tempo of some music without changing the pitch is incredibly useful in dance.

I wonder if the Mixxx (http://www.mixxx.org/) guys know about it...

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds