Ken Starks and the text-to-speech dilemma
Ken Starks is best known in free-software circles for working with a charity that repurposes computers for low-income schoolchildren. But, in recent years, the loss of his voice has turned him into a campaigner for a different cause: improving the quality of text-to-speech (TTS) software. At Texas Linux Fest (TXLF) 2015, Starks came down hard on today's free-software speech-synthesis options and addressed the difficult road ahead for users who need text-to-speech functionality.
Starks is the founder of the Austin-area charity Reglue, which rebuilds donated computers, installs GNU/Linux on them, and gives them to local children in underprivileged families. In March, Reglue was awarded the Free Software Foundation's Award for Projects of Social Benefit.
In 2012, he was diagnosed with throat cancer, and surgery in January of 2015 left him unable to talk. But, as Starks—through a recording played from his laptop—told the TXLF crowd, he knew well ahead of the scheduled surgery that he would come out on the other side needing some sort of software-based assistance to speak. So before the surgery, he set out to find the best free-software solution.
The results, however, were a severe disappointment, as the title of his talk, "How much do text-to-speech in Linux sucketh? Leteth us counteth the ways", communicated. That title and the session itself were peppered with a heavy dose of Starks's trademark sardonic humor, but he told the crowd that the shortcomings of free-software TTS were quite serious.
For starters, the voice quality available in free-software TTS systems is lacking. The voices typically cannot vary their pitch and tempo, which makes them sound monotonous and arrhythmic to listeners. In most cases, the voices also produce a robotic tone that does not approach the natural timbre of a human speaker. At that point, Starks said, "you might ask why this text-to-speech app I'm using here sounds just fine." But in fact, he explained, the audio that the audience was listening to was generated by a proprietary web service to which Starks subscribes for $100 a year. His search for a free-software solution had ended only in frustration.
In addition to the shoddy voice quality, he said, most of the free-software TTS programs are incomplete. He does not blame the developers, he said. In most cases, they did everything they could, but published their incomplete work or research in the hope that someone else would continue it. Unfortunately, it leaves the average user without a viable solution. "We're always quick to promote open source as something available to everyone, but it's not pixie dust." That means users face a harsh reality check when they discover that the free TTS systems are not ready for use.
Worse still, he said, in many cases the packages in question are broken or misconfigured. For example, he noted that the Gespeaker package in Ubuntu has a dependency on eSpeak (so that it can use the latter's MBROLA voices), but that the package is configured to look for eSpeak's data in the wrong directory. To the average user, it will look like all of the dependencies are met, yet Gespeaker fails to run anyway. Starks, of course, is familiar enough with desktop Linux systems to find and fix many such bugs, but the average user would likely find the challenge insurmountable.
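The kind of mismatch Starks describes, where a front end looks for a speech engine's data in a directory that does not exist, can be checked for mechanically. The following is a minimal Python sketch of such a check, not Gespeaker's actual logic; the candidate paths and the `find_data_dir` helper are assumptions chosen for illustration, and real distributions may install the data elsewhere.

```python
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical candidate locations for eSpeak's data directory.
# These are illustrative assumptions; a given distribution may differ.
CANDIDATE_DIRS = [
    "/usr/share/espeak-data",
    "/usr/lib/espeak-data",
    "/usr/lib/x86_64-linux-gnu/espeak-data",
]


def find_data_dir(candidates: Iterable[str] = CANDIDATE_DIRS) -> Optional[Path]:
    """Return the first candidate that exists as a directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


if __name__ == "__main__":
    found = find_data_dir()
    if found is None:
        print("No eSpeak data directory found in the candidate locations")
    else:
        print(f"eSpeak data directory: {found}")
```

A front end that ran a check like this at startup could at least tell the user which directory it expected, rather than failing silently with every dependency apparently satisfied.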
The alternative is installing the necessary packages from source, he said, which is also a pain. He noted that installing a recent version of Festival from source required downloading and building eight separate packages, in a particular order and arranged into a specific directory hierarchy. Even at the end of that process, users still had to manually edit Scheme configuration files.
Lest anyone think the problems originated with him rather than with the software, Starks related how he had issued a public challenge in January looking for anyone who could get TTS working well enough for daily use on a desktop Linux box. The responses he received ranged from "I have failed everyone; I lay my sword down at your feet" to recommendations that he give up on a desktop solution and use a mobile app instead.
Suggesting the use of mobile apps or browser plugins is a common response whenever he laments the quality of desktop Linux TTS, Starks said. But that is a cop-out, and it comes with limitations for the user, such as requiring an active Internet connection and (in the mobile case) an expensive phone.
Starks does use mobile apps some of the time, he said, but it still bothered him that the mobile market had decent offerings that desktop Linux lacked. So he set out to recruit volunteers interested in improving on the status quo. A public appeal in June eventually led to a team of three volunteers, who have started working on a GUI front end for MaryTTS.
In that respect, he said, "I was lucky. Asking developers to donate their personal time is not a solution." The team settled on MaryTTS as the most viable option after examining Festival, eSpeak, and many others. MaryTTS is a Java application, which makes it a controversial pick to some people, but the volunteers decided it was the easiest program to work with.
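For readers curious about what a front end to MaryTTS has to do at a minimum, the sketch below builds a synthesis request for a locally running MaryTTS server. It is not taken from the volunteers' code. It assumes that the server's HTTP interface is listening on localhost port 59125 and accepts the `/process` parameters shown; `build_request_url` and `synthesize` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of a locally running MaryTTS server.
MARY_URL = "http://localhost:59125/process"


def build_request_url(text: str, locale: str = "en_US", base: str = MARY_URL) -> str:
    """Build a URL asking the server to render plain text as WAVE audio."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
    }
    return f"{base}?{urlencode(params)}"


def synthesize(text: str, out_path: str) -> None:
    """Fetch synthesized audio and save it; requires a running server."""
    with urlopen(build_request_url(text)) as response, open(out_path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # Only prints the request URL, so it runs without a server.
    print(build_request_url("Hello from the desktop."))
```

With a server running, `synthesize("Hello from the desktop.", "hello.wav")` would write the audio to a file that any desktop player could open. Keeping the engine behind a local request like this is one way a GUI could stay small while the synthesis work remains in the Java application.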
The front end, which is still in heavy development, is called SpeechLess. If anyone doubts that there is a market for the tool, Starks noted that he had accidentally posted a blog entry pointing to an early release, and the resulting traffic brought down the server.
In the meantime, however, Starks said he is continuing to learn how to adapt to voicelessness. As good as a software solution is, he said, it still doesn't help you if you break your arm downhill skiing and have to call out for help. So he has made compromises, including using the one thing he swore he would never resort to: a "buzz box" electrolarynx device. Such devices may not have changed in the last fifty years, he said, but that is a good reminder that there are precious few working solutions for people with disabilities.
Starks closed out the session by thanking the community. As it turned out, he said, "losing my voice wasn't the end of anything. It was a doorway into understanding what many people face when they lose the ability to verbalize what they want to say. I hope you can help me build a better application for those yet to come."
The TTS landscape does not sit still, of course, and Starks remains an active advocate on the subject even as he helps push the SpeechLess effort forward. He recently weighed in on Intel's open-source release of the TTS system built for Stephen Hawking, for example. The Hawking speech software is an immense codebase, he said, and while developers will find it helpful to study, it does not change much for Linux users, since it is Windows-only.
| Index entries for this article | |
|---|---|
| Conference | Texas Linux Fest/2015 |
