
Ken Starks and the text-to-speech dilemma

By Nathan Willis
September 2, 2015

TXLF

Ken Starks is best known in free-software circles for working with a charity that repurposes computers for low-income schoolchildren. But, in recent years, the loss of his voice has turned him into a campaigner for a different cause: improving the quality of text-to-speech (TTS) software. At Texas Linux Fest (TXLF) 2015, Starks came down hard on the awfulness of today's free-software speech-synthesis options, and addressed the difficult road ahead for users who need text-to-speech functionality.

Starks is the founder of the Austin-area charity Reglue, which rebuilds donated computers, installs GNU/Linux on them, and gives them to local children in underprivileged families. In March, Reglue was awarded the Free Software Foundation's Award for Projects of Social Benefit.

In 2012, he was diagnosed with throat cancer, and surgery in January of 2015 left him unable to talk. But, as Starks—through a recording played from his laptop—told the TXLF crowd, he knew well ahead of the scheduled surgery that he would come out on the other side needing some sort of software-based assistance to speak. So before the surgery, he set out to find the best free-software solution.

[Ken Starks at TXLF]

The results, however, were a severe disappointment, as his talk title, "How much do text-to-speech in Linux sucketh? Leteth us counteth the ways", communicated. That title and the session itself were peppered with a heavy dose of Starks's trademark sardonic humor, but he told the crowd that the shortcomings of free-software TTS were quite serious.

For starters, the voice quality available in free-software TTS systems is lacking. The voices typically cannot vary their pitch and tempo, which makes them sound monotonous and arrhythmic to listeners. In most cases, the voices also produce a robotic tone that does not approach the natural timbre of a human speaker. At that point, Starks said, "you might ask why this text-to-speech app I'm using here sounds just fine." But in fact, he explained, the audio that the audience was listening to was generated by a proprietary web service to which Starks subscribes for $100 a year. His search for a free-software solution had ended only in frustration.

In addition to the shoddy voice quality, he said, most of the free-software TTS programs are incomplete. He does not blame the developers, he said. In most cases, they did everything they could, then published their incomplete work or research in the hope that someone else would continue it. Unfortunately, that leaves the average user without a viable solution. "We're always quick to promote open source as something available to everyone, but it's not pixie dust." That means users face a harsh reality check when they discover that the free TTS systems are not ready for use.

Worse still, he said, in many cases the packages in question are broken or misconfigured. For example, he noted that the Gespeaker package in Ubuntu has a dependency on eSpeak (so that it can use the latter's MBROLA voices), but that the package is configured to look for eSpeak's data in the wrong directory. To the average user, it will look like all of the dependencies are met and Gespeaker fails to run anyway. Starks, of course, is familiar enough with desktop Linux systems to find and fix many such bugs, but the average user would likely find the challenge insurmountable.

The alternative is installing the necessary packages from source, he said, which is also a pain. He noted that installing a recent version of Festival from source required downloading and building eight separate packages, in a particular order and arranged into a specific directory hierarchy. Even at the end of that process, users still had to manually edit Scheme configuration files.

Lest anyone think the problems originated with him rather than with the software, Starks related how he had issued a public challenge in January looking for anyone who could get TTS working well enough for daily use on a desktop Linux box. The responses he received ranged from "I have failed everyone; I lay my sword down at your feet" to recommendations that he give up on a desktop solution and use a mobile app instead.

Suggesting the use of mobile apps or browser plugins is a common response whenever he laments the quality of desktop Linux TTS, Starks said. But that is a cop-out, and it comes with limitations for the user, such as requiring an active Internet connection and (in the mobile case) an expensive phone.

Starks does use mobile apps some of the time, he said, but it still bothered him that the mobile market had decent offerings that desktop Linux lacked. So he set out to recruit volunteers interested in improving on the status quo. A public appeal in June eventually led to a team of three volunteers, who have started working on a GUI front end for MaryTTS.

In that respect, he said, "I was lucky. Asking developers to donate their personal time is not a solution." The team settled on MaryTTS as the most viable option after examining Festival, eSpeak, and many others. MaryTTS is a Java application, which makes it a controversial pick to some people, but the volunteers decided it was the easiest program to work with.

The front end, which is still in heavy development, is called SpeechLess. If anyone doubts that there is a market for the tool, Starks noted that he had accidentally posted a blog entry pointing to an early release, and the resulting traffic brought down the server.

In the meantime, however, Starks said he is continuing to learn how to adapt to voicelessness. However good a software solution is, he said, it still doesn't help you if you break your arm downhill skiing and have to call out for help. So he has made compromises, including using the one thing he swore he would never resort to: a "buzz box" electrolarynx device. Such devices may not have changed in the last fifty years, he said, but that is a good reminder that there are precious few working solutions for people with disabilities.

Starks closed out the session by thanking the community. As it turned out, he said, "losing my voice wasn't the end of anything. It was a doorway into understanding what many people face when they lose the ability to verbalize what they want to say. I hope you can help me build a better application for those yet to come."

The TTS landscape does not sit still, of course, and Starks remains an active advocate on the subject even as he helps push the SpeechLess effort forward. He recently weighed in on Intel's open-source release of the TTS system built for Stephen Hawking, for example. The Hawking speech software is an immense codebase, he said, and while developers will find it helpful to study, it does not change much for Linux users, since it is Windows-only.

Index entries for this article
Conference: Texas Linux Fest/2015



Ken Starks and the text-to-speech dilemma

Posted Sep 3, 2015 21:14 UTC (Thu) by riddochc (guest, #43) [Link] (5 responses)

This may not be the best forum, but I haven't found a better one, so here goes.

I have extensive experience with Linux and computational linguistics. I want to be working on improving the state of accessibility software for Linux, including text-to-speech, speech recognition, alternative I/O mechanisms, and so forth. I want to continue to be able to pay for rent and food while doing so. I don't want to be working on random useless websites any more.

If you know who I should talk to about getting funding to spend my time doing this, please contact me. evqqbpup@tznvy.pbz

Ken Starks and the text-to-speech dilemma

Posted Sep 3, 2015 21:20 UTC (Thu) by riddochc (guest, #43) [Link]

Sorry to reply to myself; I just realized that the proliferation of TLDs potentially makes it less than obvious that you should use ROT13 to get my contact information there.

Ken Starks and the text-to-speech dilemma

Posted Sep 4, 2015 10:35 UTC (Fri) by dgm (subscriber, #49227) [Link] (1 responses)

Have you considered a crowdfunding campaign? Try to find a realistic, well-defined goal and ask the Internet if they are interested. You could ask the accessibility people at major projects (GNOME, KDE) and distributions (Debian, Ubuntu, and Gentoo seem to have the most organized efforts) for hints on the most pressing needs.

Ken Starks and the text-to-speech dilemma

Posted Sep 7, 2015 10:42 UTC (Mon) by Shugyousha (subscriber, #93672) [Link]

I would definitely be willing to donate to a crowdfunding campaign for a functioning open-source solution, and I don't think I would be the only one.

It would be important to properly market the campaign. Putting links here and on r/linux would most likely not be enough but it would be a start...

Ken Starks and the text-to-speech dilemma

Posted Sep 7, 2015 18:20 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Joey Hess did a successful Kickstarter for git-annex[1] development and also did a campaign[2] for further development. Now it seems the NSF and DataLad are funding[3] git-annex development. Something with more wide-spread interest such as good TTS could certainly get bootstrapped in a similar way.

[1]https://www.kickstarter.com/projects/joeyh/git-annex-assi...
[2]https://campaign.joeyh.name/
[3]https://git-annex.branchable.com/thanks/

Ken Starks and the text-to-speech dilemma

Posted Sep 8, 2015 2:17 UTC (Tue) by cry_regarder (subscriber, #50545) [Link]

Ken Starks and the text-to-speech dilemma

Posted Sep 4, 2015 7:35 UTC (Fri) by Felix.Braun (guest, #3032) [Link]

I do believe that the open source model is best adapted for certain kinds of problems: specifically, those that are common and have a low barrier to entry for making initial progress. That increases the likelihood that, among all the people affected by the problem, some have the ability and time to begin solving it. These first visible steps usually attract other problem solvers, so that a project can start to take off.

MaryTTS-Frontend

Posted Sep 16, 2015 15:13 UTC (Wed) by arne (guest, #67053) [Link]

I showed this post to a colleague and he was quick to web-startify(?) one of his demos. Therefore, you can try out MaryTTS using Java Web Start (or IcedTea netx for that matter). It is a simple GUI where you just type and it will speak each word you type. Note: This is research software; it may crash :-)


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds