June 29, 2011
This article was contributed by Nathan Willis
Acoustic fingerprinting has been given a tremendous boost by the mobile
smartphone business. You have probably seen the basic scenario in
television commercials, if not in person: a user holds up a phone to
capture a few seconds of audio playing nearby, and the application computes
a "fingerprint" of the track, which is then used to query a remote database
for the mystery artist and track name. The space has been dominated by
proprietary software, but a new — and open source — project was
unveiled last week, named Echoprint.
Fingerprints on the databases
Despite the name, acoustic fingerprinting has little in common with hash-based digital fingerprinting techniques used to detect alteration of a file. While a hash function is sensitive to changes in individual bits, an acoustic fingerprint function must robustly analyze the way the audio sounds, in a manner independent of the codec used, bitrate, or even static and ambient noise. Acoustic fingerprints focus on extracting perceptual data from the audio track, such as its tempo, average spectrum, and pattern of recurring tones. The canonical use is to discover the track information of an unknown audio clip, but other uses are possible as well, such as finding similar-sounding music (on any number of factors supported by the algorithm).
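The bit-sensitivity of a hash function is easy to demonstrate. The following is a minimal Python sketch using the standard hashlib module; the input is made-up bytes standing in for an audio file (an assumption for illustration, not real audio). Flipping a single bit produces an unrelated digest, which is exactly the behavior an acoustic fingerprint has to avoid when the same song arrives in a different codec or bitrate.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_lowest_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    changed = bytearray(data)
    changed[index] ^= 0x01
    return bytes(changed)


# Made-up bytes standing in for an audio file; not real audio.
original = bytes(range(256)) * 4
altered = flip_lowest_bit(original, 100)

digest_a = sha256_hex(original)
digest_b = sha256_hex(altered)

# Count the hex digits that differ between the two digests.
differing = sum(1 for a, b in zip(digest_a, digest_b) if a != b)
print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex digits differ")
```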
The proprietary software market is home to several acoustic
fingerprinting services, most famously Shazam, SoundHound, and Gracenote.
Gracenote is known to many in the free software community due to the
controversy that erupted a decade ago when its corporate parent suddenly
restricted the usage of CDDB, its user-built compact disc identifying
database. Many felt betrayed by the policy change, because the CDDB data
was submitted voluntarily by users when playing or ripping CDs, not
entered or harvested by CDDB's owners themselves. The database was an
early example of Internet crowd-sourcing, and many saw the sudden
cut-off of access as outright exploitation of their effort.
Fast-forward to 2011, and most open source applications use competing,
"open content" services instead, such as MusicBrainz, which is managed by the
501(c)(3) nonprofit MetaBrainz Foundation. For the past several years,
MusicBrainz has supported acoustic fingerprinting through the closed-source
MusicDNS service provided by MusicIP (later renamed AmpliFIND).
Although MusicBrainz had a perpetual contract with AmpliFIND for the service, it was never considered a good fit, since MusicBrainz's social contract requires it to remain "100% free." Recently, a handful of open source acoustic fingerprinting projects, such as Lukáš Lalinský's Acoustid, started picking up steam, and MusicBrainz decided to start looking for open source and open content replacements for MusicDNS. Around the same time, the team at acoustic fingerprinting startup Echo Nest decided that its best strategy was to take its entire product open source and attempt to commoditize acoustic fingerprint services, rather than attempt to take on the entrenched players head-to-head.
Echo Nest and MusicBrainz had collaborated in the past on projects such
as Rosetta
Stone — a utility to match artist and track IDs between
various music services' ID databases — so the mutual decision to
launch Echoprint as an open project and begin integrating it with
MusicBrainz was a good fit from both perspectives. But it did not hurt
that AmpliFIND also sold off its intellectual property holdings —
including MusicDNS and the portable unique identifier (PUID) database
— to none other than Gracenote.
The Echoprint release
The Echoprint system consists of three components. The Codegen
fingerprint generator takes an audio file (or audio sample) as input, and
generates a fingerprint based on the Echo Nest Musical Fingerprint (ENMFP)
algorithm. The Echoprint server maintains a database of fingerprints
indexed to track information, and supports remote queries as well as
inserting new fingerprints and tracks. The Echoprint database itself
contains publicly-accessible track and fingerprint data. The database contains fingerprint codes for the entire
duration of each track, but as in most acoustic fingerprinting
techniques, only a shorter segment is usually sent for comparison. Echo
Nest claims that Echoprint provides accurate matches for fingerprint
blocks computed from samples of at least 20 seconds in length.
In practical usage, an application would sample audio (either captured
or from a file), use the Codegen library to compute a fingerprint, and query a compatible Echoprint server. The server would return any matching track records in JSON format. Alternatively, if there are no decent matches, the application could submit its fingerprint information to the server's database along with track metadata acquired through some other means.
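The query-then-submit flow described above can be sketched in Python as follows. Everything here is hypothetical: the InMemoryServer class, its query and ingest methods, and the "matches" key are stand-ins invented for illustration, not the Echoprint server's actual API. The stand-in also does exact-match lookup only, whereas a real server has to match partial and noisy fingerprints.

```python
import json


class InMemoryServer:
    """Hypothetical stand-in for an Echoprint server (exact-match lookup only)."""

    def __init__(self):
        self._tracks = {}

    def query(self, fingerprint):
        """Return a JSON string listing any matching track records."""
        match = self._tracks.get(fingerprint)
        return json.dumps({"matches": [match] if match else []})

    def ingest(self, fingerprint, metadata):
        """Store a fingerprint together with its track metadata."""
        self._tracks[fingerprint] = metadata


def identify(server, fingerprint, known_metadata=None):
    """Query the server; if nothing matches, submit the fingerprint instead."""
    matches = json.loads(server.query(fingerprint))["matches"]
    if not matches and known_metadata is not None:
        # No decent match: contribute the fingerprint with metadata
        # obtained through some other means.
        server.ingest(fingerprint, known_metadata)
    return matches


server = InMemoryServer()
fingerprint = "example-fingerprint"  # placeholder, not a real fingerprint
metadata = {"artist": "Example Artist", "title": "Example Track"}

first = identify(server, fingerprint, metadata)   # unknown track: gets submitted
second = identify(server, fingerprint)            # now known: gets matched
print(first)
print(second)
```

The first call finds nothing and submits the track; the second call finds the record that the first one contributed.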
The code for Codegen, the server, and various utilities (including an example iPhone app) is hosted at GitHub. The Codegen application and shared library are available under the MIT license, while the server (which is based on Apache Solr and Tokyo Tyrant) is under the Apache License 2.0.
The public Echoprint database is provided under its own terms, dubbed the "Echoprint
Database License." It allows for commercial and noncommercial
usage, and requires that anyone who downloads the data and adds to it
contributes the additional data back to Echo Nest. That clause is something
less than a Creative Commons-style "Share Alike" requirement, because it
requires sending the data to Echo Nest alone. The preamble to the
license seems to indicate that all such contributions will be shared with
the public, but Echo Nest assumes no obligation to share the data.
The initial
release is "seeded" with approximately 13 million fingerprints
generated from online music vendor 7Digital's digital holdings, with
metadata provided by MusicBrainz.
There are some other potentially worrisome terms in the agreement,
including a requirement to use Echoprint "powered by" logos in any
application that accesses the data. In addition, the agreement is not
clear about how Echo Nest can modify or terminate the agreement down the
road. Those who were burned by the CDDB debacle should pause over this
agreement, as it is not at all
clear that the same couldn't happen with the Echoprint database.
At the moment, Echo Nest has not published the details of its algorithm in a form suitable for casual reading. The source code for Codegen is provided, of course, but a white paper is supposed to be released shortly that will explain the process at length. Unfortunately, the current legal documents do not explicitly address patent grants in relation to the software (the MIT license is very brief), which might concern some developers. Acoustic fingerprinting is a patent-laden field, and indeed a little searching reveals several relevant filings in the name of Echo Nest and its founders Brian Whitman and Tristan Jehan. On the plus side, all of the proprietary acoustic fingerprinting services are in roughly the same position.
Currently Echo Nest's own "song/identity" server is the only up-and-running Echoprint database, although obviously any application authors could set up their own servers for testing purposes. The Codegen command-line application will build on any reasonably modern Linux system; the only significant dependencies are TagLib, Boost, and FFmpeg. The application generates a fingerprint from a file argument (optionally followed by a start time and duration, both in seconds). The output is a JSON object including ID3 tag information from the file and a base64-encoded representation of the fingerprint. This output can be posted directly to the Echo Nest server with cURL or a similar tool, as documented in the Codegen README file.
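As a rough illustration of driving Codegen from a script, the Python sketch below builds the argument list described above and parses JSON of the general shape the application prints. Several details are assumptions rather than documented facts: the executable name "echoprint-codegen", the "metadata" and "code" key names, and the sample output, which is made up and carries placeholder bytes instead of a real fingerprint. The Codegen README remains the authority on the actual format.

```python
import base64
import json
import subprocess
from typing import List, Optional

# Assumed name of the built Codegen executable; adjust to the local build.
CODEGEN_BINARY = "echoprint-codegen"


def build_command(path: str, start: Optional[int] = None,
                  duration: Optional[int] = None) -> List[str]:
    """Build the argument list: file, then optional start and duration (seconds)."""
    command = [CODEGEN_BINARY, path]
    if start is None:
        if duration is not None:
            raise ValueError("a duration must be preceded by a start time")
        return command
    command.append(str(start))
    if duration is not None:
        command.append(str(duration))
    return command


def parse_output(text: str) -> dict:
    """Extract tag metadata and the fingerprint string from Codegen-style JSON."""
    data = json.loads(text)
    # Accept either a single object or a list of objects (one per input file).
    record = data[0] if isinstance(data, list) else data
    return {
        "metadata": record.get("metadata", {}),  # assumed key for tag info
        "code": record["code"],                  # assumed key for the fingerprint
    }


def fingerprint_file(path: str, start: Optional[int] = None,
                     duration: Optional[int] = None) -> dict:
    """Run Codegen on a file and parse what it prints (requires the binary)."""
    result = subprocess.run(build_command(path, start, duration),
                            capture_output=True, text=True, check=True)
    return parse_output(result.stdout)


# Made-up output for illustration; the payload is placeholder bytes, not a
# real fingerprint.
sample_code = base64.b64encode(b"placeholder fingerprint bytes").decode("ascii")
sample_output = json.dumps({
    "metadata": {"artist": "Example Artist", "title": "Example Track"},
    "code": sample_code,
})

parsed = parse_output(sample_output)
print(build_command("song.mp3", 10, 30))
print(parsed["metadata"]["title"])
```

The example asks for 30 seconds of audio starting 10 seconds in, which stays above the 20-second minimum that Echo Nest cites for accurate matches. The fingerprint_file helper is defined but not called, since it needs a working Codegen build on the local system.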
Play or Pause
MusicBrainz's Robert Kaye said that the project plans to retain support
for PUIDs and MusicDNS in the MusicBrainz database for the foreseeable
future (or until "people pester me to get rid of it."). The
project is running a test
server that uses Echoprint in lieu of MusicDNS, but there is no time frame to add tables to the main database to support Echoprint.
Kaye said that he expects more tuning to be done to the Echoprint
product before it is ready for widespread adoption, but he observed that
"critical mass" is the most important factor — meaning
support in client applications and a sizable database of reliable
fingerprints. The 13 million tracks pre-loaded with 7Digital's help may
sound like a lot, but for comparison, Shazam claims to have
identified more than one billion songs.
Given the number of open source audio projects that use MusicBrainz, it is safe to say that Echoprint has its foot in the door. It is the first entirely open source acoustic fingerprinting system to hit the market in "ready to use" form, so it may spawn considerable development of song recognition in open source mobile applications. Without the burden of licensing fees, the technology could spread beyond stand-alone song-recognition apps, open or closed.
Nevertheless, Kaye emphasized that MusicBrainz's post-MusicDNS move is
meant to make the project agnostic to acoustic fingerprinting algorithms.
Acoustid is also still in active development, has documented
the details of its algorithm, and does not require changing the MusicBrainz
database format for support.
Whether the two fingerprinting techniques overlap, complement, or compete
may ultimately be up to the users to determine. Echoprint is so new that it is
difficult to predict where it will go from here. The MusicBrainz
support is naturally a big boost, but better technical documentation and
clarification of the fuzzy legal questions may be required before
application authors can be expected to pick up the technology in large
numbers. But without doubt, it is poised to fill a visible hole in
open source mobile software. An open solution that works
well with the crowd-sourcing techniques needed to build the fingerprint
database will likely have staying power in a niche with so many similar
proprietary offerings.