The eSpeak Speech Synthesizer
[Posted July 6, 2006 by cook]
Your author has been interested in computer speech synthesis since
the late 1970s, when he interfaced a
Votrax SC-01A
speech synthesizer chip to his
IMSAI 8080 computer with some wire-wrap wire.
News of the recently created
eSpeak project
naturally piqued his long-time interest in speech synthesis.
eSpeak is a compact
phoneme-based
speech synthesis system that is available under version 2 of the
GNU General Public License.
The project describes itself this way:
"eSpeak is a software speech synthesizer for English, and potentially other languages.
eSpeak produces good quality English speech. It uses a different synthesis method from other open source TTS engines, and sounds quite different. It's perhaps not as natural or "smooth", but I find the articulation clearer and easier to listen to for long periods."
eSpeak is a much simpler system than
Festival,
a popular speech synthesis project from the University of Edinburgh's
Centre for Speech Technology Research. Unfortunately, the Festival
project has been
stuck at version 1.95 (2.0 beta) for the last two years.
The
installation and usage document explains how to set up the software. Installation is trivial, if somewhat different
from that of most applications. It involves copying the binary
speak file to an executable directory and moving a
library directory to /usr/share.
The combined executable and library files weigh in at under 500 KB,
small enough to make eSpeak suitable for use in embedded systems.
Source code for eSpeak
is available for those who wish to compile the software locally.
Using the software is equally simple: typing "speak 'what you want to say'"
causes the desired speech to be rendered and output to the speaker.
The contents of a file can be spoken with the command
"speak -f filename". eSpeak can also read its input from stdin,
which allows it to be used with other applications.
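
For readers who want to drive the program from a script, the three input modes described above could be wrapped roughly as follows. This is only a sketch, not part of eSpeak: the helper names build_speak_command() and speak() are made up for illustration, the binary is assumed to be installed as "speak" somewhere on the PATH, and running the command with no text and no -f option is assumed to be the way it reads from stdin.

```python
import shutil
import subprocess

SPEAK = "speak"  # binary name used in this article; assumed to be on PATH


def build_speak_command(text=None, filename=None):
    """Return the argv list for one `speak` invocation (hypothetical helper).

    With neither argument the list is just ["speak"], which is the form
    assumed here for reading text from stdin.
    """
    if text is not None and filename is not None:
        raise ValueError("pass either text or filename, not both")
    if filename is not None:
        return [SPEAK, "-f", filename]  # speak the contents of a file
    if text is not None:
        return [SPEAK, text]  # speak the text given on the command line
    return [SPEAK]


def speak(text=None, filename=None, stdin_text=None):
    """Run `speak`; return its exit status, or None if it is not installed."""
    cmd = build_speak_command(text, filename)
    if shutil.which(cmd[0]) is None:
        return None
    # input= feeds stdin_text to the child's stdin (None inherits our stdin).
    result = subprocess.run(cmd, input=stdin_text, text=True, check=False)
    return result.returncode
```

Calling speak(text="what you want to say") corresponds to the first command above, speak(filename="notes.txt") to the -f form, and speak(stdin_text="piped text") to feeding the program through a pipe. Passing the arguments as a list avoids any shell quoting problems with the text to be spoken.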
There are currently nineteen
English phoneme sets available, which provide a variety of
British accents, male/female voices and tonal characteristics.
German and Esperanto phoneme sets are also available.
Other languages can also be supported, but the work has not yet been done.
eSpeak can output directly to the sound driver, create
.wav files, or send the audio to stdout. The -x option causes the
program to output phoneme mnemonics to the screen.
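
The phoneme output can be handy when tracking down a mispronunciation. A script could collect it along these lines; this is a sketch under assumptions rather than a documented interface: the function names are invented for illustration, the binary is assumed to be installed as "speak", and the mnemonics are assumed to arrive on standard output (the audio is presumably still played as usual).

```python
import shutil
import subprocess


def phoneme_command(text):
    """Build the argv for `speak -x`, the phoneme option mentioned above."""
    return ["speak", "-x", text]


def phonemes_for(text):
    """Return whatever `speak -x` prints, or None if `speak` is not installed."""
    cmd = phoneme_command(text)
    if shutil.which(cmd[0]) is None:
        return None
    # Capture standard output instead of letting it go to the screen.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.strip()
```

The exact format of the captured text depends on the eSpeak release, so it is best treated as something to read rather than something to parse.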
The speech quality is quite mechanical, but is fairly easy to understand.
It is not as refined as the output of Festival, but should suffice for
many applications. As with most speech synthesis applications,
mispronunciation is fairly common: English pronunciation rules
involve many special exceptions and ambiguities, so accurate
text-to-speech conversion is a non-trivial software task.
The most recent release of eSpeak is version 1.10,
released on April 29, 2006. The
change log file indicates recent work on UTF-8 encoding, support for
embedded pitch and amplitude modulation, improvements to numerical
pronunciations, several new command line capabilities and more.
If you need a decent open-source speech synthesis application for
your latest project, or simply want to play with some interesting
software, give eSpeak a try.