A speech framework and a GUI for automotive systems

By Nathan Willis
July 9, 2014
ALS 2014

At the 2014 Automotive Linux Summit (ALS) in Tokyo, several sessions highlighted new work from the Automotive Grade Linux (AGL) and Tizen IVI projects, including a flexible speech recognition and generation framework and a graphical user interface (GUI) for in-dash head units. In addition, AGL offered teasers of several upcoming new releases and put out a call for application developers interested in open-source automotive software.

Formally speaking, AGL is a working group of the Linux Foundation focused on the task of increasing Linux adoption in vehicles. But as a practical matter, this has meant group members putting resources into developing open source software. Just prior to ALS 2014, AGL announced the release of its reference Linux platform, which is built on top of the in-vehicle infotainment (IVI) version of Tizen.

The AGL release contains several components not found in the contemporaneous Tizen IVI release, but there is clearly a close working relationship between the two projects. Some of the AGL release's additions may not make it upstream into Tizen IVI in the foreseeable future, whether because they are contributed by member companies that have not yet shown an interest in Tizen, because there are licensing issues, or because they are evidently intended only as proof-of-concept code with less general appeal. The exact reasons, though, are not always clear.

For example, the AGL release includes support for controlling a MOST-connected audio amplifier. MOST (Media Oriented Systems Transport) is an automotive industry standard data bus that runs over fiber-optic cable; it provides a number of benefits compared to other vehicle buses (such as high throughput and resistance to electrical noise), but the standard is proprietary, and MOST's governing organization reportedly has no interest in opening up the specification or loosening its licensing restrictions. There is, therefore, little chance that general-purpose MOST support will come to Tizen IVI, but AGL has an interest in demonstrating that MOST integration is possible.

The Modello user interface

[Geoffrey van Cutsem at ALS]

On the other hand, Intel's Geoffrey van Cutsem gave a talk about Tizen IVI's new GUI project, Modello, which actually started off as an AGL add-on project but is now developed within Tizen IVI. Modello is a suite of free-software HTML5 applications that cover basic GUI functionality. There is a "home" screen, a dashboard that shows vehicle statistics and sensor readings, a media player, a heater/air-conditioner controller, a phone-tethering application for hands-free usage, and a navigation tool that connects to Google Maps.

The Modello system is completely modular, Van Cutsem said; the home screen launcher can launch any application, not just those already mentioned. But the official Modello applications are all designed to look the same; they pick up the same UI elements from a central theme. As of right now, there are just two themes to choose from—and they differ only in color—but the theming engine is a flexible one. Someone could create a "nighttime" theme, he said, and have it activated automatically when the car's light sensors indicate that it is getting dark.
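
As a rough illustration only (Modello's actual theme mechanism was not described in detail), a light-sensor-driven theme switch in an HTML5 application might look something like the TypeScript sketch below; the theme names, stylesheet paths, and readAmbientLight() callback are all invented for the purpose of the example.

    // Hypothetical sketch: switch a Modello-style HTML5 app between a "day"
    // and a "night" theme based on an ambient-light reading. The theme names,
    // stylesheet paths, and readAmbientLight() callback are invented for
    // illustration; they are not Modello's actual API.

    type ThemeName = "day" | "night";

    // Threshold (in lux) below which we consider it dark; an arbitrary value.
    const DARK_THRESHOLD_LUX = 50;

    function chooseTheme(ambientLux: number): ThemeName {
      return ambientLux < DARK_THRESHOLD_LUX ? "night" : "day";
    }

    function applyTheme(theme: ThemeName): void {
      // Assume each theme is a standalone stylesheet; swapping the href
      // re-skins every UI element that takes its colors from the central theme.
      const link = document.getElementById("theme-css") as HTMLLinkElement | null;
      if (link) {
        link.href = `/themes/${theme}/modello.css`;
      }
    }

    // Poll a (hypothetical) vehicle light sensor and re-theme when needed.
    function watchLightSensor(readAmbientLight: () => number): void {
      let current: ThemeName | null = null;
      setInterval(() => {
        const next = chooseTheme(readAmbientLight());
        if (next !== current) {
          current = next;
          applyTheme(next);
        }
      }, 5000);
    }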

The Modello applications are also designed to run on 720p portrait-orientation screens, which are not the norm in today's vehicles. Van Cutsem explained the rationale: Modello is targeting the IVI systems of the future, when larger screens are expected to be commonplace. Most center consoles are "portrait-shaped," he said, and if the screen replaces many of the physical controls in use today (including climate-control knobs), users are likely to expect the biggest screen that will fit. The Tesla Model S, he said, is a good example: it sports a 17-inch portrait display.

The Modello project has also been filling in some miscellaneous missing pieces in Tizen IVI; it implements a GUI system settings utility, which has been prominently missing from prior Tizen releases. Perhaps most importantly, it allows GUI configuration of Bluetooth and WiFi networking, which, up until now, had only been configurable with command-line tools.

There is still more to come, he said. The navigation application is still quite rough; as of today it only supports pre-set destinations. Although Van Cutsem did not discuss it, navigation is in a state of flux in both Tizen and AGL at present. Tizen IVI dropped the navigation application Navit from its builds in 2013. The word around the project is that either Navit or some other free-software routing application will return in due course; the Google Maps tool may not last due to its reliance on a single, proprietary data provider.

Also still to come in Modello are a port from Tizen IVI's older web runtime to the newer Crosswalk, support for localization, and integration with the Wayland-based Layer Manager. A new release is expected within the week.

[Matt Jones at ALS]

Van Cutsem also noted that the Modello project would be working to add support for "twenty plus" new applications written by AGL. Jaguar Land Rover's Matt Jones provided a preview of that application collection in part of his ALS keynote talk. The new additions being developed include Modello-compatible versions of older software, such as the SmartDeviceLink mobile device tethering and screen-sharing tool. But they also include several entirely new applications, such as fingerprint recognition and voiceprint recognition utilities, a weather application, and a news carousel.

Jones pointed out that Jaguar Land Rover was interested in funding open-source projects like these AGL reference applications, and told anyone interested in contributing to get in touch. The company has found working with open-source developers to be in its best interests, he said. The average time from concept to deployment in a car is 39 months, but the average software startup only has a lifespan of 18 months. So pairing with startups is not a strategic option.

In contrast, he said, for every dollar that the company puts into Tizen and open source, it estimates that it generates at least $20 in return. He now hopes that the company can start working on more interesting new applications, such as the biometric systems mentioned above. "I hope we're done with implementing Bluetooth profiles and FM radio, and can start doing the unique stuff."

At last: the talking car

Intel's Jaska Uimonen provided a look at one of those possible new developments in his presentation on Winthorpe, an open-source framework for adding speech support to Tizen IVI applications. Winthorpe supports both speech recognition for input and speech synthesis for output, and it provides both as a system-wide service.

This design is distinct from most of the other speech recognition systems on the market, he said. The others tend to either be a standalone, "assistant" application like Apple's Siri, or else each individual app (search, navigation, etc.) is its own "silo"—linked internally to a third-party provider's speech recognition module.

The assistant model can be linked to other apps (such as voice dialing and web searching), but adding new features to those apps requires making changes to the assistant. The silo approach, in contrast, does mean that multiple apps can each have speech support, but it has serious drawbacks: the apps are not aware of each other, so they cannot cooperate, and their fate depends entirely on the continued support of the third-party speech-engine supplier. In addition, he said, most of the popular speech recognition services (including Google's and Apple's) rely on an active network connection to a remote cloud service.

[Jaska Uimonen at ALS]

Winthorpe attempts to improve on these shortcomings. It provides a platform-level API service with multiple back-ends, so that speech-enabling an application is a one-time process—you do not need to rewrite your code to start using a different speech engine. The API also lets applications stay simple, offloading the speech processing to the service.

The process of speech-enabling an application is straightforward, he said. The program registers itself with the Winthorpe process and declares a set of commands that it wants to listen for. Winthorpe listens for speech input, then notifies its registered client if it recognizes a command—delivering the notification event and, if requested, passing the speech input buffer to the application.
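
Winthorpe's real interface was not shown in detail at the talk; the following TypeScript sketch only models the flow Uimonen described (register, declare commands, get notified), with every name invented for illustration.

    // Hypothetical model of the registration flow described above. None of
    // these names come from Winthorpe's actual API; this only illustrates the
    // shape of the interaction: a client registers, declares commands, and is
    // called back when one of them is recognized.

    interface CommandEvent {
      command: string;        // the command phrase that was matched
      audio?: ArrayBuffer;    // raw input buffer, included only if requested
    }

    type CommandHandler = (event: CommandEvent) => void;

    class SpeechClient {
      private handlers = new Map<string, CommandHandler>();

      constructor(private appName: string) {}

      // Declare a command phrase the application wants to listen for.
      register(command: string, handler: CommandHandler): void {
        this.handlers.set(command, handler);
      }

      // Called by the (simulated) speech service on a recognized command.
      notify(event: CommandEvent): void {
        const handler = this.handlers.get(event.command);
        if (handler) {
          handler(event);
        }
      }
    }

    // Usage: a media player registers two commands and reacts to notifications.
    const player = new SpeechClient("media-player");
    player.register("play music", () => console.log("starting playback"));
    player.register("next track", () => console.log("skipping to next track"));

    // Simulate the service recognizing a phrase and notifying its client.
    player.notify({ command: "play music" });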

For deciding which registered application gets "voice focus" for a recognized command when there are multiple options, Winthorpe delegates the decision to Tizen IVI's Murphy policy manager—though how Murphy makes that decision is up to the system implementor. Winthorpe is context-aware, he said. When the user makes or answers a phone call, all audio is sent to the phone application and speech recognition is switched off.
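
Since the actual decision logic is left to the system implementor, the following is only a toy sketch of the kind of choice being delegated, not Murphy's real policy language or API.

    // Toy illustration of a "voice focus" decision: given the current
    // context, decide which registered client (if any) should receive the
    // recognized command. Entirely hypothetical.

    interface SystemContext {
      phoneCallActive: boolean;   // all audio goes to the phone app during a call
      foregroundApp: string;      // the application currently shown on screen
    }

    function resolveVoiceFocus(
      candidates: string[],       // apps that registered the recognized command
      ctx: SystemContext
    ): string | null {
      if (ctx.phoneCallActive) {
        return null;              // recognition is switched off during calls
      }
      // Prefer the app the driver is already looking at; otherwise fall back
      // to the first registered candidate.
      if (candidates.includes(ctx.foregroundApp)) {
        return ctx.foregroundApp;
      }
      return candidates[0] ?? null;
    }

    console.log(resolveVoiceFocus(["navigation", "media-player"],
      { phoneCallActive: false, foregroundApp: "media-player" }));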

The Winthorpe architecture is modular; there can be multiple speech-recognition plugins installed, and there are plugins for disambiguation and for speech synthesis. Currently the plugins include only one open-source recognition engine, Carnegie Mellon University's Pocketsphinx. There are two open-source speech synthesis plugins, one based on Emacspeak and one based on Festival. The Winthorpe team has written demo extensions for media players and for simple web searching.
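
A hypothetical sketch of such a plugin split, with invented interfaces that do not reflect Winthorpe's real plugin API, might look like this:

    // Sketch of a modular speech pipeline of the kind described, with
    // interchangeable recognition, disambiguation, and synthesis plugins.
    // The interfaces are invented for illustration only.

    interface Recognizer {
      // Turn an audio buffer into one or more candidate transcriptions.
      recognize(audio: ArrayBuffer): string[];
    }

    interface Disambiguator {
      // Pick the best candidate, e.g. by matching against registered commands.
      pick(candidates: string[], knownCommands: string[]): string | null;
    }

    interface Synthesizer {
      speak(text: string): void;
    }

    class SimpleDisambiguator implements Disambiguator {
      pick(candidates: string[], knownCommands: string[]): string | null {
        return candidates.find((c) => knownCommands.includes(c)) ?? null;
      }
    }

    // Usage: choose the candidate that matches a registered command.
    const choice = new SimpleDisambiguator().pick(
      ["play music", "play musing"], ["play music", "next track"]);
    console.log(choice);  // "play music"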

In addition to registering for callbacks to specific commands, applications can also make use of some special Winthorpe tokens, Uimonen said. One is the wildcard operator * for free-form input: an application can use it to have Winthorpe deliver the raw audio input rather than processing it as speech, which might be useful for recording notes or calls. Another is a "dictionary switch" command, which tells Winthorpe to match speech input against a special dictionary rather than the general-purpose one. This can dramatically improve recognition quality, he said; for instance, if one knows that the speech input will be numeric, switching to a "digits" dictionary will reduce the error rate.
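
Purely as an illustration of those two tokens (the registration syntax below is invented, not Winthorpe's), a client's declarations might look like this:

    // Hypothetical illustration of the wildcard and dictionary-switch tokens.

    interface Registration {
      pattern: string;            // e.g. "take a note *" or "dial <number>"
      dictionary?: "general" | "digits";
      wantRawAudio?: boolean;
    }

    const registrations: Registration[] = [
      // Wildcard: everything after "take a note" is delivered as raw audio so
      // the app can record it instead of having it decoded as speech.
      { pattern: "take a note *", wantRawAudio: true },

      // Dictionary switch: the number is matched only against a digits
      // dictionary, which should cut the recognition error rate substantially.
      { pattern: "dial <number>", dictionary: "digits" },
    ];

    console.log(registrations.map((r) => r.pattern).join("\n"));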

Speech output is considerably simpler than speech recognition, Uimonen said. Winthorpe supports selecting from among multiple installed "voices" and multiple languages, and it includes commands to adjust the voice's rate and pitch.
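
A hypothetical request shape covering those knobs (the field names are invented) might be:

    // Hypothetical shape of a synthesis request: which installed voice to
    // use, the language, and rate/pitch adjustments.

    interface SynthesisRequest {
      text: string;
      voice?: string;     // one of the installed voices, e.g. "english-male-1"
      language?: string;  // BCP 47 tag such as "en-US" or "ja-JP"
      rate?: number;      // 1.0 = normal speed
      pitch?: number;     // 1.0 = normal pitch
    }

    const prompt: SynthesisRequest = {
      text: "Turn left in 200 meters",
      language: "en-US",
      rate: 1.1,
      pitch: 0.9,
    };

    console.log(`Would speak: "${prompt.text}" (${prompt.language})`);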

One of the weaknesses of the system is how few open-source speech projects there are, he said. Pocketsphinx is the only open-source recognition engine supported simply because few others are available, although he said the project is working with the Julius engine, which is designed for Japanese. Of the two synthesis engines, Festival is noticeably weaker than Emacspeak. He added that most existing IVI systems use a proprietary speech recognition back-end.

Future work for the project includes Julius integration, improvements to the Murphy integration, the ability to reconfigure the speech-decoding pipeline on the fly, and tools to better pronounce unrecognized words.

Together, the AGL and Tizen IVI projects appear to be making progress on multiple fronts. While some of the work (such as Winthorpe) is of interest primarily to developers, the details of the project indicate that the team is trying to improve on the status quo found in other IVI systems. And other new pieces, such as Modello, indicate that polished, end-user code is within reach for the first time, which is good news for those who are interested in seeing an open-source IVI platform reach the market.

[The author would like to thank The Linux Foundation for travel assistance to attend ALS 2014.]



A speech framework and a GUI for automotive systems

Posted Jul 10, 2014 18:41 UTC (Thu) by jospoortvliet (subscriber, #33164) [Link]

Does anyone have a clue as to how this relates to Simon (the speech recognition project)? Is it lower level or doing the same?

A speech framework and a GUI for automotive systems

Posted Jul 11, 2014 22:12 UTC (Fri) by Jandar (subscriber, #85683) [Link]

Simon can use Sphinx as a backend. How Pocketsphinx relates to Sphinx I don't know.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds