The open source speech recognition project Simon unveiled version 0.4.0 on December 30, 2012, after two years of development. The new release boasts some significant architectural changes, so the project advises users not to replace existing versions on production systems. But the changes make Simon noticeably easier to work with, which will please new users. Conversing freely with one's Linux PC is still a ways off, but speech recognition with free software is no longer the exclusive domain of laboratory research.
"Speech recognition" can encompass a range of different projects, such as dictation (e.g., transcribing audio content) or detecting stress in a human voice. Simon is designed to function as a voice interface to the desktop computer; it listens to live audio input, picks out keywords intended as commands, and pipes them to other applications.
Beginning with the 0.3.0 series released in 2010, Simon has based its command-recognition framework on the idea of separate "scenarios" for each application or use case. Scenarios can be as specific as the developer wishes to make them; a general web-browsing scenario for Firefox may be designed to handle only opening links and scrolling through pages, but another could be tailored to work with Gmail functionality and keyboard shortcuts. Simon 0.4.0 builds on this approach by adding context awareness: it will activate and deactivate different scenarios depending on which applications the user has open and which have focus. The scenarios still need to be manually installed beforehand, though, so there is little risk Simon will start erasing your hard drives if you happen to walk by and utter the word "partition."
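The context-awareness idea can be illustrated with a small sketch. This is not Simon's implementation; the `Scenario` class, its `needs_focus` flag, and the `active_scenarios` function are hypothetical names invented here to show how a set of installed scenarios might be filtered by which applications are open and which one has focus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical stand-in for an installed scenario (not Simon's data model)."""
    name: str
    application: str            # application the scenario is written for
    needs_focus: bool = False   # True: listen only while that application has focus


def active_scenarios(installed, running, focused):
    """Return the names of the scenarios that should be listening right now."""
    active = []
    for scenario in installed:
        if scenario.application not in running:
            continue  # the application is not open at all
        if scenario.needs_focus and scenario.application != focused:
            continue  # the application is open, but in the background
        active.append(scenario.name)
    return active


# Illustrative data only; the focus requirements are assumptions.
installed = [
    Scenario("Firefox browsing", "firefox", needs_focus=True),
    Scenario("Amarok playback", "amarok"),
    Scenario("Marble navigation", "marble", needs_focus=True),
]

print(active_scenarios(installed, running={"firefox", "amarok"}, focused="firefox"))
```

In this sketch a music-player scenario stays active in the background, while the browser scenario stops listening as soon as the browser loses focus.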
Simon can use any of several back-ends to perform the speech-recognition part of the puzzle. Earlier releases relied on either the BSD-licensed Julius or the better — but non-free licensed — Hidden Markov Model Toolkit (HTK). Version 0.4.0 adds support for another free software recognition toolkit, CMU Sphinx.
The Sphinx engine is highly regarded for its quality, and provides functions that Julius does not, such as the ability to create one's own acoustic speech model. An acoustic model is the statistical representation of the sounds that correspond to the parts of speech that the engine is trying to recognize; it depends on both a "corpus" of audio samples of the speaker or speakers and on a grammar model for the language being spoken. Free sources for acoustic speech models have historically been hard to come by, because most were created by proprietary projects or had no clear licensing at all.
Luckily this situation is changing; the Voxforge project collects GPL-licensed speech models and enables users to create and upload their own. Like a lot of less-well-known free data projects, it could always use more contributions, but it is possible to download decent base models for a variety of languages. Simon 0.4.0 introduces a new internal format for its speech base models, but it is Voxforge compatible, and the English Voxforge model is included in the download. Simon 0.4.0 also includes tools allowing users to create and upload their own speech models to Voxforge.
Despite being voice controlled, Simon comes with a graphical front-end for setting up the framework, managing scenarios, and working with speech models. The front-end is KDE-based, and building Simon pulls in a lot of KDE package dependencies. Packages for 0.4.0 have yet to appear, but compiling from source is straightforward. It is important to have CMU Sphinx installed beforehand in order to build a completely free Simon framework, though. Simon's modularity means the build script will simply compile Simon without Sphinx support if the engine is not found.
At first run, the Simon setup window will walk users through the process of installing speech models and scenarios, as well as testing microphone input settings and related details. Speech models and scenarios are tracked using the Get Hot New Stuff (GHNS) system, so the available options can be searched through and installed directly within Simon itself. The scenarios currently available include general desktop utilities like window management and cursor control, applications like Firefox, Marble, and Amarok, and a smattering of individual tasks like taking a screenshot. Installing them is easy, and Simon's interface allows each to be activated or deactivated with a single click.
Arguably the biggest hurdle is finding the scenario one wants; scenarios are language-dependent, and only English, Dutch, and German ones appear to be published. In addition, there are frequently several options for each application with essentially the same description. Some descriptions are detailed enough to indicate that they were built with a specific acoustic model (Voxforge or HTK), but some are clearly old enough that they may have compatibility problems (such as the OpenOffice.org scenarios that come from the Simon 0.3.0 days). Some, like the Firefox scenario, also require installing other software (e.g., a Firefox add-on).
The main Simon window shows which scenarios are active and which acoustic speech models are loaded, and it displays the microphone volume level and the most recently recognized spoken words. The latter two items are useful for debugging. By default, the setup wizard steers the user toward a generic Voxforge speech model, but to really get good results the user needs to devote some time to training Simon. Most of the scenarios come with a bundled "training text" for this purpose: a list of words that the scenario is listening for. At any time, the user can click on Simon's "Start training" button and record new samples of the important words. These recordings are ingested by the speech recognition engine and added to a user-specific speech model. Simon layers this user-specific model over the base model, hopefully improving the results.
The training interface is painless and provides a lot of hand-holding for new users. This is good news, since it is clear that at least a few training sessions are to be expected before Simon 0.4.0 is usable for daily tasks — even for those of us with perfect elocution. There are simply a lot of variables in human speech, and even more when one throws in the vagaries of cheap PC sound cards and microphones. The trainer prompts the user to speak each of the keywords, reports instantly whether the speaker's voice is too loud or too soft to be useful, and does the rest of the computation in the background.
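The "too loud or too soft" check described above can be sketched in a few lines. The following is an illustration only, not Simon's code: the function name `check_level` and both thresholds are assumptions, and samples are assumed to be floats normalized to the range -1.0 to 1.0.

```python
import math


def check_level(samples, clip_peak=0.99, min_rms=0.05):
    """Classify a recording as 'too loud', 'too soft', or 'ok'.

    clip_peak and min_rms are illustrative thresholds, not values used by Simon.
    """
    if not samples:
        return "too soft"  # nothing was recorded at all
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if peak >= clip_peak:
        return "too loud"  # the signal is at or near clipping
    if rms < min_rms:
        return "too soft"  # average energy is too low to be useful
    return "ok"


# A tenth of a second of a 440 Hz tone at half of full scale, sampled at 16 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
print(check_level(tone))
```

Checking the peak catches clipping from an over-driven microphone, while the RMS value catches recordings that are simply too quiet.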
The nicest thing about Simon 0.4.0, though, is that it moves speech control out of the "theoretical only" realm, where experienced researchers and laboratory conditions are required, and at least makes it possible for everyday users to get started. There is still a long way to go before speech control can offer a constant user interface option as it is depicted in Star Trek or (perhaps more troublingly) in 2001. But the scenario-specific set of commands makes Simon more usable than other open source speech recognition tools, and Simon's built-in training interface makes the necessary grunt work (no pun intended) of tailoring the speech model to one's actual voice about as painless as it can be.
The research into speech recognition will continue, of course. But Simon's new-found modularity will make it easier to incorporate theoretical advances into the desktop application without rewriting from scratch. For users, the next important stage is some development work on new scenarios to hook more applications into Simon. The trickiest part of the stack, though, is likely to remain training the speech recognition engine to recognize the specific user's voice. No amount of software will eliminate that step; it simply takes a good microphone and some patience.
In some circles, installing custom or aftermarket firmware like CyanogenMod on a $200 phone is enough to garner street cred, while in others, such minor trifles are fit only to be scoffed at. For those who do not flinch at danger, there is Magic Lantern, a GPL-licensed replacement firmware for high-end Canon digital SLR cameras. The current release is version 2.3, which offers a wealth of improvements for shooting video, plus a growing list of enhancements for still photographers.
Magic Lantern regularly makes releases for a fixed list of Canon models, at the moment including most of the models from the EOS 600D and up. The supported list focuses on cameras using Canon's DIGIC 4 chip and newer models. Recent DIGIC chips include an embedded ARM core which makes writing custom software possible, and the cameras can load and run firmware from an inserted memory card without overwriting the existing firmware. Consequently, projects like Magic Lantern and CHDK (which targets point-and-shoot models) can provide firmware that adds new functionality with minimal risk of bricking the camera — or of voiding the warranty and losing out on Canon's much-loved hardware service offerings. There is still risk involved, however, particularly for new camera models.
Magic Lantern was initially focused on improving video recording functionality. The first model supported by the project was the EOS 5D Mark II, a camera which started a minor revolution by allowing high-quality HD recording in a compact form. But for some budding filmmakers, the stock firmware simply left out too much. Magic Lantern added usability features like crop marks in the preview window, more precise control over ISO speed, white balance, and shutter speed, and a number of miscellaneous add-ons like on-screen sound meters for the audio input.
The current development work is focused on the EOS 5D Mark III, for which the third alpha release was unveiled on January 6. Installation requires unpacking the build onto a supported Compact Flash or SD card, making the card bootable, and loading it into the camera. The download package includes the firmware image plus several folders full of auxiliary files such as the focusing-screen overlays. Normally, the card can be set to automatically boot the camera into Magic Lantern, but this feature has not been enabled in the pre-release builds for the EOS 5D Mark III.
The 5D Mark III release is still incomplete in other areas as well; a good portion of the features enabled for other camera models are still unimplemented for the 5D Mark III. The issue is that some Magic Lantern features (for example, changes to live preview and information display) can work without touching any of the camera's persistent settings, but others require altering properties saved in onboard memory. The team has simply encountered too many unsolved problems with accessing and setting the 5D Mark III's stored settings. Developer a1ex reported that the stability test froze the camera and required a cold reboot and clearing all of the camera settings to restore functionality. For a piece of hardware with a four-digit price tag, some caution is understandable.
Still, there is a long list of features which are enabled in the 5D Mark III builds of 2.3. As is to be expected in light of the project's emphasis on digital film-making, most are related to video, but not all of them are so esoteric that a semester of cinematography class is required. The gradual exposure function, for example, allows the user to switch from one exposure setting to another while still filming; Magic Lantern will smoothly transition through the intermediate shutter and ISO speed settings, so that the change fades in (so to speak), instead of hitting all at once.
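One way to picture a gradual exposure change is as a ramp between two settings. The sketch below is an assumption about how such a ramp could be computed, not Magic Lantern's algorithm: it interpolates geometrically, so that each step changes the exposure by the same fraction of a stop. The function name `exposure_ramp` is hypothetical.

```python
def exposure_ramp(start, end, steps):
    """Return `steps` values from start to end, inclusive, evenly spaced in stops.

    Geometric interpolation is this sketch's choice; it works for any strictly
    positive setting, such as an ISO speed or a shutter time in seconds.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if start <= 0 or end <= 0:
        raise ValueError("settings must be positive")
    ratio = end / start
    return [start * ratio ** (i / (steps - 1)) for i in range(steps)]


# Ramp the ISO speed from 100 to 1600 in five steps.
for iso in exposure_ramp(100, 1600, 5):
    print(round(iso))
```

Going from ISO 100 to ISO 1600 is a four-stop change, so five steps move the setting one stop at a time rather than jumping all at once.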
But there are more unusual features, too. The HDR video mode, for example, shoots twice as many frames as normal, alternating the exposure of each: one set to properly expose the highlights, and one set to properly expose the shadows. Combining the results into a single video stream is not easy, though, and needs to be done in post-production software. So far no tool exists for Linux users, although there is a script using the open source VirtualDub and Enfuse applications.
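The first step of that post-production work, separating the alternating frames and pairing them up, can be sketched as follows. This is only an illustration: the function name `pair_alternating_frames` is hypothetical, frames are represented by plain labels, and which frame of each pair carries which exposure is an assumption. The actual blending of each pair is left to a tool such as Enfuse.

```python
def pair_alternating_frames(frames):
    """Group an alternating-exposure sequence into (first, second) pairs.

    Even-numbered frames are assumed to share one exposure and odd-numbered
    frames the other; a trailing frame without a partner is dropped.
    """
    frames = list(frames)
    first_exposure = frames[0::2]
    second_exposure = frames[1::2]
    return list(zip(first_exposure, second_exposure))


# Labels stand in for image data: "h" for highlight frames, "s" for shadow frames.
recorded = ["h0", "s0", "h1", "s1", "h2"]
print(pair_alternating_frames(recorded))
```

Each pair yields one output frame once blended, which is why the camera has to record twice as many frames as the finished video contains.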
The majority of the Magic Lantern features enabled for the 5D Mark III at the moment are of the display or composition aid variety, though. But this is not to say that they are merely cosmetic; some offer important enhancements. For instance, the "display gain" feature brightens the live preview window so that items in frame are visible even if it is pitch black outside. That allows the user to compose a decent-looking foreground when doing night shooting or astrophotography, which is a nearly impossible task otherwise.
As a still photographer, I am more interested in some of Magic Lantern 2.3's features that are not yet available on the 5D Mark III. To be honest, though, there are so many features these days that nearly every user will find some of them useful given a random subset. That is a testament to the development team's creativity. More important, of course, is that such aftermarket firmware allows the camera owners to do more (and better) creative work. To Canon's credit, the company has not cracked down on Magic Lantern or CHDK; in fact, the company adeptly steps around the issue of whether using either project is a warranty violation. Those users with camera models supported by stable builds of 2.3 should consider giving Magic Lantern a try, but should do so with open eyes. With a well-tested model, there is relatively little risk of doing damage to one's camera, but there is virtually no recourse should something go horribly wrong. Perhaps the best advice is to say cowboy up, but do your reading first.
Version 12 of the XBMC media-playback application is currently in the final stages of development; release candidate 3 was released on January 3. There are multiple enhancements to the codebase, but one of the biggest stories is that XBMC v12 will officially add support for Android. An Android port naturally makes XBMC available on tablets and handsets, but, just as importantly, it enables running on numerous set-top boxes, "smart TVs," and the increasingly-popular smart TV dongle — device classes currently dominated by proprietary applications produced by entertainment companies.
Binary builds of XBMC v12 RC3 are available for download from xbmc.org. The Android build is an .apk package that is installable on any device on which the user has enabled installation of non-Play Store software. The project site says that XBMC will eventually come to the Play Store, but not during the pre-release phase. The XBMC wiki has an Android hardware page outlining which devices have tested well with which media types; as one might expect, there is a significantly higher hardware threshold required to enable 1080p video playback.
The target platform for the initial Android release is set-top boxes, in particular the Pivos XIOS DS, which is a compact ARM Cortex-A9 device that the team used as the reference development platform. The project offers a few guidelines for assessing the suitability of other devices, including a note that, practically speaking, any Android device that does not have the NEON-compatible coprocessor (or does not have it enabled) will probably be unable to play back HD video. Nevertheless, there are unsupported NEON-free builds linked to from the Android hardware wiki page. The final caveat is that thus far the porting effort has not addressed power consumption, so users of battery-powered mobile devices may find XBMC to be quite draining, although the project assures users that this, too, will be addressed in the future. Users of wall-powered set-top boxes, of course, may not find high power consumption as problematic.
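For readers wondering how to tell whether a device has NEON enabled, here is a minimal sketch. It assumes the device exposes a 32-bit ARM Linux style /proc/cpuinfo with a "Features" line that lists "neon"; it is not the check that XBMC itself performs, and the function name `has_neon` and the sample text are hypothetical.

```python
def has_neon(cpuinfo_text):
    """Return True if any 'Features' line in the cpuinfo text lists 'neon'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            if "neon" in value.split():
                return True
    return False


# Illustrative sample in the style of a 32-bit ARM /proc/cpuinfo.
SAMPLE = """\
Processor       : ARMv7 Processor rev 10 (v7l)
Features        : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
"""

print(has_neon(SAMPLE))
```

On a real device, the text would come from reading /proc/cpuinfo, for example in a terminal emulator or over adb.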
I tested the new release on a Nook Tablet running CyanogenMod 7 (CM7), and the battery-draining issue is indeed no joke. The device boasts a 4000 mAh battery, which XBMC managed to drain completely in a little over 3 hours, even though video playback only accounted for a small portion of the time. Granted, CM7 is an unofficial port for this particular device and comes with its own share of power consumption problems. Still, it is clear that there is considerable room for improvement. Nevertheless, even on year-old hardware and a less-than-up-to-date version of Android, XBMC runs remarkably well.
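Those figures imply a rough average current draw, worked out below. The calculation assumes the battery started fully charged and actually delivers its rated 4000 mAh, and it rounds "a little over 3 hours" down to 3.0, so the result is an upper bound on the average. The function name `average_current_ma` is hypothetical.

```python
def average_current_ma(capacity_mah, hours):
    """Average current in milliamps needed to drain the given capacity in `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return capacity_mah / hours


# 4000 mAh drained in roughly three hours.
print(f"{average_current_ma(4000, 3.0):.0f} mA")
```

That works out to roughly 1.3 A on average, sustained even though video was playing for only a small part of the time.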
Feature-wise, the good news is that the Android port is nothing short of the full XBMC experience — this is not a "light" or "mobile" version of the software. All of the media formats, network protocols, and add-ons supported in desktop XBMC are available in the Android edition. NFS access was missing from some of the early betas of XBMC v12, but as of now, there are no major gaps in player functionality. Video playback from standard-definition web sources was smooth, and a significantly better experience than accessing the same sites through either the stock Android browser or Firefox. Audio playback rarely stress-tests modern devices, so it gets less attention in reviews, but all of the audio add-ons tested worked like a charm as well.
There are, however, still hiccups to be encountered in individual plug-ins. To some degree this is unavoidable; a huge subset of the video playback add-ons, for example, are "screen scraper"-style hacks to retrieve content from specific Web-based video services, such as the many cable and broadcast TV channels that offer a subset of their programming online. The authors of these add-ons must rewrite their page parsing code every time the target site alters its layout, but one of XBMC's strengths is that add-ons are installable from within the XBMC interface, and updates to restore service can be pushed out quickly.
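The fragility of screen scraping is easy to demonstrate. The sketch below is not taken from any real XBMC add-on; the `VideoLinkParser` class, the `find_video_links` function, the "episode" class name, and both HTML snippets are made up for illustration. It collects video links by looking for a particular CSS class, which works only until the site renames that class.

```python
from html.parser import HTMLParser


class VideoLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries the expected CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self.css_class in classes and attr.get("href"):
            self.links.append(attr["href"])


def find_video_links(html, css_class="episode"):
    parser = VideoLinkParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page markup before and after a site redesign.
OLD_LAYOUT = '<ul><li><a class="episode" href="/watch/101">Pilot</a></li></ul>'
NEW_LAYOUT = '<div><a class="video-card" href="/watch/101">Pilot</a></div>'

print(find_video_links(OLD_LAYOUT))  # the link is found
print(find_video_links(NEW_LAYOUT))  # nothing is found after the redesign
```

The video itself has not moved, but the scraper finds nothing until its author updates the class name, which is exactly the kind of fix that has to be pushed out as an add-on update.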
But reliance on third-party add-on developers has its downside; there are other add-ons available for desktop Linux XBMC that do not seem to work for the Android build, such as the D-Bus based notifications, some of which may never work because of platform limitations. Still others offer functionality that depends on external factors, such as the MythBox add-on, which allows XBMC to play back content from a MythTV back-end. That add-on only supports MythTV 0.24, however, which is two releases out of date.
A far more significant problem with XBMC v12 on Android is navigating the user interface. XBMC has long had navigation "trapdoors": spots where it is possible to navigate into a menu or tool, but it is either impossible to navigate back out, or it is only possible to navigate back out through different means (for example, menus where the left-arrow key allows you to enter a screen, but the screen can only be exited by hitting Escape). These trapdoors are usability warts under the best of circumstances, but on an Android device they can literally leave the user stranded if the device does not have a hardware keyboard. Android phones might have a keyboard; tablets will not. Some set-top boxes come with wireless keyboards, although they are largely looked down on, and there is always the possibility of pairing Bluetooth keyboards. But users seem to loathe putting down the directional remote with its single-thumb driveability.
Trapdoors are not the only interface difficulty, however. Many of XBMC's screens and onscreen controls assume the presence of either a traditional pointer or a touchscreen. Jumping directly to a specific point in the timeline of a song or video, for instance, requires a pointing device to be at least marginally accurate. There may not be a one-size-fits-all solution, considering the variety of content types XBMC plays (and the variety of caching/streaming challenges that accompany them), but some more work will probably be required to optimize for the Android set-top box, which is often touch-free (and may be pointer-free as well).
But the bigger question that XBMC needs to answer for potential Android users is how it offers an improvement over getting at the same content through other applications. Quite simply, the answer it gives is "it depends" — entirely on the type of content. Consuming Internet-delivered video and audio is significantly better through XBMC than it is through a browser. The difference is not quite as stark when compared to a dedicated Android application for a particular service (such as Grooveshark). And XBMC is far less compelling for content that requires more manual searching and browsing.
Take podcasts, for example. XBMC supports managing podcasts, but its interface for subscribing and listening to them is no better than any other on the market. In fact, when coupled with the difficulties of using the UI without a keyboard, it may actually be slightly worse. The same is true for watching or listening to files from local storage — there is no compelling advantage to using XBMC for this task over the stock Android tools, and in some places the interface makes the task more difficult.
As a result, XBMC for Android works well as an Internet content front-end, where a set-top box must compete against the rapidly growing stack of commercial streaming boxes from Roku, Netgear, and everyone else at the big consumer electronics shows. Some of these commercial products also offer an interface into the owner's local music and video collection (typically through UPnP/DLNA). XBMC can match that experience, although with a large enough collection no DLNA solution is particularly pleasant — all eventually fall back on scrolling through page after page of track titles.
Where XBMC has a clear advantage is that it will always be able to offer access to more online content than these proprietary competitors, because the community writes its own add-ons and updates them without the need to call in lawyers and negotiate complex multi-year distribution deals. This is probably where XBMC will make the biggest splash, if and when users of commercial Android set-top boxes can install XBMC through the Google Play store. The do-it-yourself crowd will probably find a desktop Linux-based XBMC set-top box both easier to build and more flexible — but the average consumer may very well discover a new world through seeing XBMC available as a one-click installation option.
The application may also end up being a handy option on handheld Android devices (once the power-consumption issues are fixed). There will probably be more and better options for podcasts and locally stored content, but XBMC's unified front-end to a wealth of Internet-delivered services is likely to be a hit even on phones. If nothing else, it saves users the trouble of scrolling through dozens and dozens of extra application launchers.
Page editor: Jonathan Corbet
Copyright © 2013, Eklektix, Inc.