Leading items

Ken Starks and the text-to-speech dilemma

By Nathan Willis
September 2, 2015

Ken Starks is best known in free-software circles for working with a charity that repurposes computers for low-income schoolchildren. But, in recent years, the loss of his voice has turned him into a campaigner for a different cause: improving the quality of text-to-speech (TTS) software. At Texas Linux Fest (TXLF) 2015, Starks came down hard on the awfulness of today's free-software speech-synthesis options, and addressed the difficult road ahead for users who need text-to-speech functionality.

Starks is the founder of the Austin-area charity Reglue, which rebuilds donated computers, installs GNU/Linux on them, and gives them to local children in underprivileged families. In March, Reglue was awarded the Free Software Foundation's Award for Projects of Social Benefit.

In 2012, he was diagnosed with throat cancer, and surgery in January of 2015 left him unable to talk. But, as Starks—through a recording played from his laptop—told the TXLF crowd, he knew well ahead of the scheduled surgery that he would come out on the other side needing some sort of software-based assistance to speak. So before the surgery, he set out to find the best free-software solution.

The results, however, were a severe disappointment—as his talk title, "How much do text-to-speech in Linux sucketh? Leteth us counteth the ways" communicated. That title and the session itself were peppered with a heavy dose of Starks's trademark sardonic humor, but he told the crowd that the shortcomings of free-software TTS were quite serious.

For starters, the voice quality available in free-software TTS systems is lacking. The voices typically cannot vary their pitch and tempo, which makes them sound monotonous and arrhythmic to listeners. In most cases, the voices also produce a robotic tone that does not approach the natural timbre of a human speaker. At that point, Starks said, "you might ask why this text-to-speech app I'm using here sounds just fine." But in fact, he explained, the audio that the audience was listening to was generated by a proprietary web service, for which Starks pays $100 a year. His search for a free-software solution had ended only in frustration.

In addition to the shoddy voice quality, he said, most of the free-software TTS programs are incomplete. He does not blame the developers, he said. In most cases, they did everything they could, but published their incomplete work or research in the hope that someone else would continue it. Unfortunately, it leaves the average user without a viable solution. "We're always quick to promote open source as something available to everyone, but it's not pixie dust." That means users face a harsh reality check when they discover that the free TTS systems are not ready for use.

Worse still, he said, in many cases the packages in question are broken or misconfigured. For example, he noted that the Gespeaker package in Ubuntu has a dependency on eSpeak (so that it can use the latter's MBROLA voices), but that the package is configured to look for eSpeak's data in the wrong directory. To the average user, it will look like all of the dependencies are met, yet Gespeaker fails to run anyway. Starks, of course, is familiar enough with desktop Linux systems to find and fix many such bugs, but the average user would likely find the challenge insurmountable.

The alternative is installing the necessary packages from source, he said, which is also a pain. He noted that installing a recent version of Festival from source required downloading and building eight separate packages, in a particular order and arranged into a specific directory hierarchy. Even at the end of that process, users still had to manually edit Scheme configuration files.

Lest anyone think the problems originated with him rather than with the software, Starks related how he had issued a public challenge in January looking for anyone who could get TTS working well enough for daily use on a desktop Linux box. The responses he received ranged from "I have failed everyone; I lay my sword down at your feet" to recommendations that he give up on a desktop solution and use a mobile app instead.

Suggesting the use of mobile apps or browser plugins is a common response whenever he laments the quality of desktop Linux TTS, Starks said. But that is a cop-out, and it comes with limitations for the user, such as requiring an active Internet connection and (in the mobile case) an expensive phone.

Starks does use mobile apps some of the time, he said, but it still bothered him that the mobile market had decent offerings that desktop Linux lacked. So he set out to recruit volunteers interested in improving on the status quo. A public appeal in June eventually led to a team of three volunteers, who have started working on a GUI front end for MaryTTS.

In that respect, he said, "I was lucky. Asking developers to donate their personal time is not a solution." The team settled on MaryTTS as the most viable option after examining Festival, eSpeak, and many others. MaryTTS is a Java application, which makes it a controversial pick to some people, but the volunteers decided it was the easiest program to work with.

The front end, which is still in heavy development, is called SpeechLess. If anyone doubts that there is a market for the tool, Starks noted that he had accidentally posted a blog entry pointing to an early release, and the resulting traffic brought down the server.

In the meantime, however, Starks said he is continuing to learn how to adapt to voicelessness. As good as a software solution is, he said, it still doesn't help you if you break your arm downhill skiing and have to call out for help. So he has made compromises, including using the one thing he swore he would never resort to: a "buzz box" electrolarynx device. Such devices may not have changed in the last fifty years, he said, but that is a good reminder that there are precious few working solutions for people with disabilities.

Starks closed out the session by thanking the community. As it turned out, he said, "losing my voice wasn't the end of anything. It was a doorway into understanding what many people face when they lose the ability to verbalize what they want to say. I hope you can help me build a better application for those yet to come."

The TTS landscape does not sit still, of course, and Starks remains an active advocate on the subject even as he helps push the SpeechLess effort forward. He recently weighed in on Intel's open-source release of the TTS system built for Stephen Hawking, for example. The Hawking speech software is an immense codebase, he said, and while developers will find it helpful to study, it does not change much for Linux users, since it is Windows-only.

Using Python to investigate EFI and ACPI

By Jake Edge
September 2, 2015

LinuxCon North America

In a talk that could easily be seen as a follow-on to his PyCon 2015 talk, Josh Triplett presented at LinuxCon North America on using Python to explore the low-level firmware of today's systems. The BIOS Implementation Test Suite (BITS) provides an environment that hearkens back to the days of BASIC, PEEK, and POKE, as he demonstrated at PyCon in Montréal in April, but it is much more than that. In Seattle at LinuxCon, he showed that it can also be used to look at and use the Extensible Firmware Interface (EFI) and Advanced Configuration and Power Interface (ACPI) code in a system—all from Python.

Triplett started his talk with a bit of nostalgia: his slides showed pictures of various home computers from the 1980s (e.g. Commodore 64, TRS-80, Apple II). He polled the audience about which of those had been their first computers before showing a picture of the first IBM PC, which was his first computer. There was a common element to all of those early home computers, he said: they provided access to the low-level hardware of the system. These days, we have lost a lot of that access because the operating system mediates access to the hardware.

The IBM PC ran DOS, which accessed the hardware through the Basic Input/Output System (BIOS). There were fixed data tables and addresses in the BIOS for accessing the hardware. Various system services (e.g. disk, display, serial ports) were available via interrupts. If the BIOS did not know about the hardware, the system couldn't talk to it.

EFI and ACPI came along "to solve every problem BIOS ever had and quite a few it didn't", Triplett said. The key concept behind both is "extensibility", but they also have a reputation for being "subtle, complicated, and quick to anger", he said. The operating system and bootloader both use the facilities provided by EFI and ACPI, but it is mostly done from C code.

BITS came about because of a need to access BIOS, EFI, and ACPI without writing any C code. Triplett (and his father, Burt Triplett, both of whom work for Intel) ported Python to run in the GRUB bootloader, which allowed using the language to poke at the low-level firmware. As with his PyCon presentation, his slides were displayed and his demos were run from within the BITS environment in a virtual machine (VM) on his laptop.

He uses KVM with Open Virtual Machine Firmware (OVMF), which provides Unified EFI (UEFI, the successor to EFI) from Tiano for use in a VM. It is much safer to play with EFI and ACPI in a VM, he said, so that if things "blow up", they won't also take your system with them.

BITS has a full Python interpreter that runs in ring 0 on x86 systems. That gives it the same privilege level that an operating system running on x86 has. In addition, many of the Python standard library modules are available in BITS, along with a few modules that provide useful types and functions for BITS, EFI, ACPI, and so on.

Triplett then demonstrated using the BITS Python to access memory from a specific address in the firmware, which showed the path where he had built OVMF:

    >>> import bits
    >>> from ctypes import *

    >>> mem = (c_char * 128).from_address(0xf1390)
    >>> print bits.dumpmem(mem)
    00000000: 2f 68 6f 6d 65 2f 6a 6f 73 68 ... /home/josh...
    ...

The ctypes module provides access to C data types (like c_char) from Python. The code creates an array of 128 bytes from the address specified (found using strings on the binary), which can then be manipulated by the Python code—and dumped to the screen using a utility function from the bits module.
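The same technique of laying a ctypes array over a raw address can be tried on an ordinary Python 3 interpreter, without BITS. What follows is a minimal sketch under stated assumptions: the address comes from a buffer that the script itself owns (not from firmware), and hexdump() is a hypothetical stand-in for bits.dumpmem(), not its actual implementation.

```python
import ctypes


def hexdump(data, width=16):
    """Format bytes as offset, hex values, and ASCII (dots for everything else)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % byte for byte in chunk)
        # Like the dumps shown in the talk, spaces are rendered as dots too.
        text = "".join(chr(byte) if 32 < byte < 127 else "." for byte in chunk)
        lines.append("%08x: %s  %s" % (offset, hex_part.ljust(width * 3 - 1), text))
    return "\n".join(lines)


# A 16-byte buffer owned by this process; its contents are illustrative only.
backing = ctypes.create_string_buffer(b"BITS demo", 16)
address = ctypes.addressof(backing)

# Create a second view of the same memory, starting from nothing but the address.
mem = (ctypes.c_char * 16).from_address(address)
dump = hexdump(mem.raw)
print(dump)

# Writes through the view land in the original buffer: both share one region.
mem[0] = b"b"
```

Because from_address() does not copy anything, it is only safe when the address is known to be valid; in a normal process, a bad address will crash the interpreter.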

He then moved on to look at ACPI:

    >>> import acpi
    >>> acpi.get_table_list()
    ['APIC', 'DSDT', 'FACP', 'FACS', 'HPET', 'RSDP', 'RSDT', ...
    >>> print bits.dumpmem(acpi.get_table('RSDP'))
    00000000: 52 53 ...                     RSD.PTR..BOCHS..
    ...

RSDP is the ACPI Root System Description Pointer and the "RSD PTR" string—represented as "RSD.PTR" in the memory dump—is how BIOS finds that data structure in the firmware. RSDP contains information about the ACPI version and the original equipment manufacturer (OEM) that provided it (for QEMU, its ACPI descends from the Bochs emulator, thus "BOCHS" as the OEM ID). The RSDP also has a pointer to the Root System Description Table (RSDT), which points to the rest of the system description tables, including those that describe hardware and other features that are specific to that particular system. The acpi.parse_table() function can be used to examine these tables, as Triplett demonstrated.
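For readers who want to see what those fields look like in bytes, here is a rough sketch in ordinary Python 3 that builds and parses a synthetic 20-byte ACPI 1.0 RSDP. It assumes the original layout (8-byte signature, checksum byte, 6-byte OEM ID, revision byte, 32-bit RSDT address) with a checksum that makes all 20 bytes sum to zero modulo 256. The build_rsdp() and parse_rsdp() helpers and the RSDT address are hypothetical; none of this is how BITS itself does it.

```python
import struct

# signature, checksum, OEM ID, revision, RSDT physical address (little-endian)
RSDP_V1 = struct.Struct("<8sB6sBI")


def build_rsdp(oem_id, rsdt_address):
    """Build a synthetic ACPI 1.0 RSDP with a valid checksum."""
    raw = bytearray(RSDP_V1.pack(b"RSD PTR ", 0, oem_id.ljust(6), 0, rsdt_address))
    # Choose the checksum byte so that all 20 bytes sum to 0 modulo 256.
    raw[8] = (-sum(raw)) % 256
    return bytes(raw)


def parse_rsdp(raw):
    """Validate and decode the first 20 bytes of an RSDP."""
    signature, checksum, oem_id, revision, rsdt_address = RSDP_V1.unpack(
        raw[:RSDP_V1.size])
    if signature != b"RSD PTR ":
        raise ValueError("missing 'RSD PTR ' signature")
    if sum(raw[:RSDP_V1.size]) % 256 != 0:
        raise ValueError("bad checksum")
    return {
        "oem_id": oem_id.decode("ascii").rstrip(),
        "revision": revision,
        "rsdt_address": rsdt_address,
    }


# The OEM ID matches the dump above; the RSDT address is made up.
rsdp = build_rsdp(b"BOCHS", 0x1000)
info = parse_rsdp(rsdp)
print(info)
```

Note that the signature ends with a space, which is why the dump shows a dot after "PTR".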

In the classic-BIOS world, serial ports can be found by consulting a table at a fixed address but, in a modern system, hardware is found differently. Resources are discovered using identifiers such as "COM1" (for the first serial port). Those identifiers lead to various descriptors in the ACPI tables that specify everything needed to talk to the device (address, interrupts, etc.). In today's systems, this is how all of the built-in hardware is discovered. By using acpi.display_resources() and acpi.parse_descriptor(), Triplett was able to show some of the "guts" of the COM1 "current resource settings" (_CRS), including its 0x3f8 address—which he then used to output a string from a loop in Python using bits.outb().
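The output loop can be approximated outside of BITS with a simulated port space. The sketch below is not Triplett's code: FakePorts and serial_write() are hypothetical, the (port, value) argument order of the stand-in outb() is an assumption, and the wait on the line status register (offset 5, bit 0x20 on a 16550-style UART) is an addition that real hardware generally wants but that the talk did not necessarily show.

```python
COM1_BASE = 0x3F8        # address reported by the COM1 _CRS in the talk
LSR_OFFSET = 5           # 16550 line status register
LSR_THR_EMPTY = 0x20     # set when the transmit holding register is empty


class FakePorts:
    """Simulated I/O port space that records every byte written."""

    def __init__(self):
        self.written = []

    def inb(self, port):
        # Pretend the UART is always ready to accept another byte.
        if port == COM1_BASE + LSR_OFFSET:
            return LSR_THR_EMPTY
        return 0

    def outb(self, port, value):
        self.written.append((port, value))


def serial_write(ports, base, text):
    """Send text one byte at a time, waiting for the transmitter each time."""
    for byte in text.encode("ascii"):
        while not ports.inb(base + LSR_OFFSET) & LSR_THR_EMPTY:
            pass
        ports.outb(base, byte)


ports = FakePorts()
serial_write(ports, COM1_BASE, "Hi\n")
print(ports.written)
```

On real hardware, or in BITS, the two methods of FakePorts would be replaced by actual port reads and writes.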

Both ACPI and UEFI are huge specifications, each with thousands of pages of documentation. ACPI is largely concerned with how to find the hardware in the system, while UEFI is the way to get modern "BIOS" services. Triplett proceeded to demonstrate accessing UEFI using the "efi" module that comes with BITS.

He started by printing out the EFI system table data structure, which is accessed in BITS with efi.system_table. That table provides a bunch of information about the system and its EFI firmware, including things like pointers to the standard input and output as well as to the boot and runtime services provided by EFI. The system table is what gets passed to an EFI program (e.g. a bootloader) as one of its arguments.

In EFI, objects are identified using globally unique identifiers (GUIDs). Triplett showed how to retrieve and use those GUIDs to access various pieces of information. For example, the ACPI table GUID can be used to retrieve a pointer to the ACPI RSDP, which is how that data structure is found on modern systems (rather than searching for the magic "RSD PTR" string as BIOS does).
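As a rough illustration of that lookup, the sketch below searches a synthetic configuration table for the ACPI 2.0 table GUID using ordinary Python 3. It assumes the 64-bit layout of a 16-byte GUID followed by an 8-byte pointer per entry; find_vendor_table(), the second GUID, and both pointer values are made up for the example, and the efi module in BITS provides its own way of doing this.

```python
import struct
import uuid

# ACPI 2.0 table GUID as defined by the UEFI specification.
ACPI_20_TABLE_GUID = uuid.UUID("8868e871-e4f1-11d3-bc22-0080c73c8881")

# One configuration-table entry on a 64-bit system: GUID, then a pointer.
ENTRY = struct.Struct("<16sQ")


def find_vendor_table(raw, count, guid):
    """Return the pointer stored next to guid, or None if it is absent."""
    for index in range(count):
        guid_bytes, pointer = ENTRY.unpack_from(raw, index * ENTRY.size)
        # EFI stores the first three GUID fields little-endian, which is
        # exactly what the bytes_le form of uuid.UUID expects.
        if uuid.UUID(bytes_le=guid_bytes) == guid:
            return pointer
    return None


# Synthetic table: an unrelated (made-up) GUID, then the ACPI entry.
other_guid = uuid.UUID(int=1)
table = (ENTRY.pack(other_guid.bytes_le, 0x1111) +
         ENTRY.pack(ACPI_20_TABLE_GUID.bytes_le, 0x7FE0000))

rsdp_pointer = find_vendor_table(table, 2, ACPI_20_TABLE_GUID)
print(hex(rsdp_pointer))
```

The mixed-endian storage is the main trap here: comparing the raw 16 bytes against the big-endian form of the GUID would never match.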

Triplett then demonstrated how to clear the screen using the ClearScreen() function in the "text output protocol" that is attached to the standard output channel (i.e. ConOut) in the system table. He also showed using the "file system protocol" to access files in the EFI filesystem.

That was all a prelude to his final act, however. He noted that he had done graphics before in talks about BITS (e.g. the Mandelbrot set in his PyCon presentation), so that was now "boring". This time, he used the "input text protocol" along with graphics to build a simple, interactive Tron-inspired game. It drew a line that would change direction based on arrow-key input and would halt when the line intersected itself. Triplett clearly knows how to finish up a conference talk as, once again, there was a nice round of applause to acknowledge his amusing hackery and the talk as a whole.

[I would like to thank the Linux Foundation for travel assistance to Seattle for LinuxCon North America.]

The next weekly edition will be on September 11

The US Labor Day holiday falls on Monday, September 7. As we have for the past few years, we will be taking that day off, which will delay next week's edition by one day. The holiday is usually something of an "end of summer" celebration—or lament—at least in the northern hemisphere. Whether you celebrate the holiday or not, best wishes from all of us here at LWN!

Page editor: Jonathan Corbet