LCE11: UMMS, an audio/video abstraction layer

November 2, 2011

This article was contributed by Nathan Willis

Geoffroy van Cutsem from Intel presented work on the Unified Multimedia Service (UMMS) on the first day of LinuxCon Europe. UMMS is a high-level abstraction layer for audio/video operations, which is meant to provide an API for application developers that is independent of both the playback engine used on the back end and the output target. Van Cutsem described it as analogous to CUPS for printing or SANE for scanners.

UMMS was initially developed for the MeeGo Smart TV user experience (UX) and, although it will be developed from this point forward as a framework for the Tizen successor to MeeGo Smart TV, it could also be useful on desktop systems and other Linux environments. At present, the design covers media playback and basic media capture features, although more advanced features needed for the smart TV platform are on the roadmap.

As most end-users know, there is no shortage of video playback engines available for Linux: GStreamer, FFmpeg, Xine, MPlayer, and so on. However, they provide no uniform API: front-end applications must be specifically written to tie in to each back-end. Most application projects choose just one, and those that choose several take on a major duplication of effort. For any particular engine there also need to be language-specific bindings and, in most cases, the application is still responsible for low-level details like constructing its own GStreamer pipelines.

Video capture is similarly fragmented. It requires the application to manage the capture hardware and, in the case of TV tuners, to know the format (DVB, ATSC, QAM, IPTV, etc.) and manage the frequency tables and other tuning details. Commercial smart TV and set-top box OEMs also face a challenge when implementing video-on-demand (VOD) applications and playback engines for protected content streams like Blu-ray in their products, because content-production companies demand that such code be isolated from GPL and LGPL modules.

UMMS is an attempt to solve all of these problems at once by constructing a consistent playback and capture API. It provides a D-Bus service, so it is both language-independent and capable of providing license isolation. There are playback, recording, and time-shifting functions, as well as methods to query media properties. Applications can access media by URI, without regard to whether the source is local or remote, whether it uses a protected VOD playback engine or GPLed media framework, or the output format used — including the availability of hardware acceleration for the file in question. The latter capability is intended to future-proof UMMS so that it can provide applications with transparent access to new advances in video processing; Van Cutsem cited processing video as OpenGL textures as one example.
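
As a rough illustration of the idea, and not of the actual UMMS interface, the following Python sketch models a service that chooses a playback engine from the URI so that the application never names one. Every class, method, and URI scheme here is hypothetical (the "vod" scheme in particular is made up); the real service is reached over D-Bus, and its API is documented in the project's own source tree.

```python
from urllib.parse import urlparse


class Engine:
    """Hypothetical back end; stands in for GStreamer, a VOD engine, etc."""

    def __init__(self, name, schemes):
        self.name = name
        self.schemes = set(schemes)

    def can_play(self, uri):
        return urlparse(uri).scheme in self.schemes


class MediaService:
    """Hypothetical front end: applications hand over a URI and nothing else."""

    def __init__(self, engines):
        self.engines = list(engines)

    def engine_for(self, uri):
        # The first engine that claims the URI wins. A real service would
        # keep this choice internal; it is returned here for inspection.
        for engine in self.engines:
            if engine.can_play(uri):
                return engine
        raise LookupError(f"no engine can play {uri!r}")


service = MediaService([
    Engine("protected-vod", ["vod"]),              # made-up scheme
    Engine("gstreamer", ["file", "http", "dvb"]),  # assumed scheme list
])
```

Because the application only ever passes a URI, swapping or adding an engine changes the list handed to the service and nothing in the caller.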

Initial API

The project is currently hosted in the MeeGo build service, but Van Cutsem said it will be migrating to Gitorious soon. Unfortunately, online resources for the work are still scarce at the moment; there is a draft version of the requirements document on the MeeGo wiki, but the best documentation of the API itself is contained inside the spec/ directory of the source repository.

The API provides a way to create and manipulate "MediaPlayer" objects. Two types are available: "attended" and "unattended" MediaPlayer objects. For attended objects, the application must remain active during execution (as in most video playback scenarios), and can manipulate the video. With unattended objects, the application registers an event with UMMS, then shuts down. The canonical unattended example is scheduling a DVR recording: the application provides a "time to execute" to the UMMS service, along with an input URI and a destination file name.
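
The unattended case can be pictured with a minimal Python sketch, assuming a service that simply keeps registered jobs ordered by start time. The names UnattendedJob and Scheduler, their fields, and the example URIs are invented for illustration and do not come from the UMMS specification.

```python
from dataclasses import dataclass, field
from datetime import datetime
import heapq


@dataclass(order=True)
class UnattendedJob:
    start: datetime                            # the "time to execute"
    source_uri: str = field(compare=False)     # input URI
    destination: str = field(compare=False)    # destination file name


class Scheduler:
    """Hypothetical service side: holds jobs after the application exits."""

    def __init__(self):
        self._queue = []

    def register(self, job):
        heapq.heappush(self._queue, job)

    def due(self, now):
        """Pop and return every job whose start time has arrived."""
        ready = []
        while self._queue and self._queue[0].start <= now:
            ready.append(heapq.heappop(self._queue))
        return ready
```

An application would call register() once and then be free to shut down; the service would poll due() (or sleep until the earliest start time) and begin each recording itself.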

The code from the MeeGo build service stands at version 0.0.1, and implements sample applications of each type. There is a media player using GStreamer as the back-end framework, and a video recorder that can schedule recordings from a DVB video source. UMMS is licensed under the LGPL v2.1.

Each MediaPlayer object supports methods to report the codec used, the height and width, the playback rate, whether the content is seekable, whether it is allowed (by the copyright holder) to be displayed at full-screen resolution, and the presence and location of all audio, video, and subtitle tracks. Although much of the focus is on video content, UMMS supports audio-only media just as easily. Applications can use UMMS to query or set the playback position, adjust volume or playback speed, and do basic fast-forward/reverse scrubbing.
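
To make the shape of such an object concrete, here is a small Python stand-in, assuming a player that exposes its properties as plain fields and clamps the values an application sets. The field and method names are illustrative only; the real method names live in the UMMS spec and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class MediaPlayer:
    """Hypothetical stand-in; field names are illustrative, not UMMS's."""
    codec: str
    width: int
    height: int
    duration: float          # seconds
    seekable: bool = True
    position: float = 0.0    # seconds
    volume: int = 50         # percent
    rate: float = 1.0        # 1.0 is normal speed

    def set_position(self, seconds):
        if not self.seekable:
            raise RuntimeError("content is not seekable")
        # Clamp to the bounds of the stream.
        self.position = min(max(seconds, 0.0), self.duration)

    def set_volume(self, percent):
        # Clamp to the 0-100 range.
        self.volume = min(max(percent, 0), 100)
```

Checking seekable before moving the position mirrors the article's point that the application queries what the content permits before acting on it.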

UMMS defines a "target" as the output destination of any MediaPlayer stream. For PC usage, this would be either an X window or an OpenGL (or other hardware acceleration) pipeline. For direct connection to TVs, there are other considerations that mandate handling HDMI and other output signals differently — non-square pixels, overscan, and so on. But a target could also be a UI element inside of another application, for example a <video> object on a web page, in the case of a browser.

Although UMMS is designed to abstract away many of the details of a media file from the application, it may not always be possible. In a discussion on the meego-dev list in March, developer Dominig ar Foll explained that some content sources will still demand that the application inspect codec settings, bit rates, buffer depths, and other specifics — in order to manage hardware resources on the device. For example, some sources are expected to provide multiple video tracks using different codecs all in a single multiplexed stream, allowing the application to choose between them. The plan is for UMMS itself to also support automatically selecting the codec in such a situation, based on a pre-defined policy — such as whether an unoccupied hardware decoder for one codec is available, or whether hardware-decoding of a codec would consume less power than software-decoding of another.
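
One way such a policy could be expressed is sketched below in Python, assuming each track reports whether an unoccupied hardware decoder exists for its codec and an estimated power cost. The Track fields, the milliwatt figures, and the ordering rule are assumptions made for the example, not the policy UMMS will ship.

```python
from dataclasses import dataclass


@dataclass
class Track:
    codec: str
    hw_decoder_free: bool   # is an unoccupied hardware decoder available?
    power_mw: int           # assumed decode cost in milliwatts


def pick_track(tracks):
    """Prefer tracks with a free hardware decoder, then the lowest power cost."""
    if not tracks:
        raise ValueError("stream has no video tracks")
    # False sorts before True, so negate the availability flag.
    return min(tracks, key=lambda t: (not t.hw_decoder_free, t.power_mw))
```

Encoding the policy as a sort key keeps it in one place, so a different device could swap in another rule (for example, favoring resolution over power) without touching the applications.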

Extending the concept

Beyond the basic playback and scheduled recordings already set out in the reference applications, the plan is to extend UMMS to cover a few other TV-specific features, starting with additional functions for DVR applications. There will need to be methods to work with electronic program guides (EPG), as well as to support time-shifting and conditional-access restrictions (think pay-per-view and VOD features, where the content provider might want to ensure that a media file is not watched multiple times).

Smart TVs must also implement industry standards like parental controls and channel locks. In some countries, this is not merely an issue of conforming to expectations, but of adhering to mandatory regulations. Finally, as in the codec-selection example above, one of the goals of UMMS is to provide a framework for managing hardware resources for access by multiple applications. Thus it will need to be able to report status back to applications when there are no tuners or decoders currently available, as well as distinguish between the capabilities of various playback and capture resources, and prioritize requests based on policies.
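
The resource-management goal can be sketched as a small arbiter, here in Python, assuming a fixed pool of tuners and a simple "higher priority wins" policy. TunerPool, its methods, and the priority numbers are hypothetical; the article does not describe how UMMS will implement this.

```python
class TunerPool:
    """Hypothetical arbiter for a fixed number of tuners."""

    def __init__(self, count):
        self.count = count
        self.holders = {}            # application name -> priority

    def request(self, app, priority):
        """Return (granted, evicted_app). Higher priority wins."""
        if app in self.holders:
            # Already holding a tuner: just record the new priority.
            self.holders[app] = priority
            return True, None
        if len(self.holders) < self.count:
            self.holders[app] = priority
            return True, None
        if not self.holders:
            # No tuners exist at all.
            return False, None
        # All tuners busy: find the lowest-priority current holder.
        victim = min(self.holders, key=self.holders.get)
        if self.holders[victim] < priority:
            del self.holders[victim]
            self.holders[app] = priority
            return True, victim
        # Report the failure so the application can react.
        return False, None

    def release(self, app):
        self.holders.pop(app, None)
```

Returning an explicit "not granted" result, instead of blocking, matches the requirement that applications be told when no tuner or decoder is currently available.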

Although UMMS is designed to meet the needs of the smart TV UX, both Van Cutsem and the developers on the MeeGo lists emphasized that it will hopefully provide useful functionality to other form factors as well, in-vehicle systems and tablets in particular. But it fills in a gap in PC-based Linux systems, too. The ability to abstract away the playback engine would simplify development of desktop media players, especially those wanting to use hardware video decoding. TV capture cards are more of a specialty item, at least for now. However, the open source DVRs MythTV and Freevo, where development has been slow, could probably benefit from an abstraction layer like UMMS as well.

To be sure, a portion of the free software community may always bristle at the prospect of building a framework that explicitly enables proprietary playback engines and applications, but UMMS is not substantially different from CUPS or other system frameworks in that regard. In this case, supporting the needs of set-top box makers who are beholden to the content industry pays dividends for open source, too, by clearly defining an API layer Linux has been missing for too long.

Van Cutsem ended his talk promising that more details would be coming online soon — UMMS happened to land at an awkward point in the transition between MeeGo and Tizen, after all. The MeeGo build system, lists, and wiki are slated to be taken offline in one year, but the Tizen project infrastructure has not yet rolled out. "Stay tuned" seems like the appropriate message.

[The author would like to thank the Linux Foundation for assisting with his travel to LinuxCon Europe 2011.]



LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 5:51 UTC (Thu) by josh (subscriber, #17465) [Link]

So, this new framework ignores the existing work in this area as insufficient rather than contributing to them (https://www.xkcd.com/927/), claims to solve all problems at once, explicitly supports anti-features that shouldn't exist (like DRM), and tries to abstract away copylefted libraries so that proprietary software can benefit from it. Clearly we've got a winner here.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 15:35 UTC (Thu) by rossburton (subscriber, #7254) [Link]

You can disagree with DRM as much as you want, but when the use-cases include DVD, Blu-ray and scrambled DVB-T you can't really leave DRM out...

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 20:33 UTC (Thu) by josh (subscriber, #17465) [Link]

There's a difference between bypassing DRM (as in libdvdcss and the ongoing work on Blu-Ray) and accepting DRM.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 21:51 UTC (Thu) by rossburton (subscriber, #7254) [Link]

Good luck selling that in a box with the Blu-ray and DVD logos on the side.

UMMS -- and by extension MeeGo TV and Tizen TV -- is designed for use in real products for use by real people. The spiritual ancestor of UMMS is being deployed in a number of set-top boxes right now.

Anyway, UMMS's DRM support mainly involves hitting the right hardware registers so that the hardware decoders kick in.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 5, 2011 13:21 UTC (Sat) by tpm (subscriber, #56271) [Link]

Why is another abstraction framework required to accommodate DRM/Blu-ray/DVD?

Other people seem to have no problem doing these things (DRM, DVD) with existing frameworks (in a commercial context, conforming to the various licenses etc.).

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 5, 2011 13:16 UTC (Sat) by jospoortvliet (subscriber, #33164) [Link]

From this comment I take it the other criticism (mainly 'duplicating existing tools instead of contributing to them') is valid? Or are there good technical reasons why it was decided to start all over instead of contribute to a project like Phonon or QtMultimedia?

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 5, 2011 14:14 UTC (Sat) by rossburton (subscriber, #7254) [Link]

Part of the design is DBus for license separation -- to use say Phonon with a bluray decoder you'll probably need a license exception (like GStreamer has) and I suspect Dolby approval would be impossible.

As I said earlier, the design is mostly based on real world legal and technical concerns for the target use-case (dvb, dvd, bluray, dolby) that no other projects cope with.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 6, 2011 23:36 UTC (Sun) by idupree (subscriber, #71169) [Link]

License-separation is an annoying concern from a technical perspective. But I wonder if the separate process can be useful for stability/security too. (I suspect stability: easier and security: hard, but I'm just guessing.)

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 6:39 UTC (Thu) by clicea (subscriber, #75492) [Link]

One word: Phonon?

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 15:36 UTC (Thu) by yokem_55 (subscriber, #10498) [Link]

Nah, that's a dirty QT/KDE thing. Can't have that.

I know it's gotten a lot of complaints, but since the primary backend was switched to vlc, Phonon has worked perfectly for playback purposes of nearly everything under the sun.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 9:18 UTC (Thu) by juliank (subscriber, #45896) [Link]

GStreamer itself is a framework, not a video playback engine. It is modular and implements codecs and other stuff via plugins. GStreamer is an abstraction layer in itself, and UMMS only adds another more restricted layer on top of it.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 4, 2011 9:56 UTC (Fri) by xav (guest, #18536) [Link]

Agreed. I fail to see how applying another indirection layer (with its inevitable share of bugs) will make things better, especially for embedded systems.
Moreover, GStreamer is already a layer above libav (ffmpeg) for most playback cases.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 9:53 UTC (Thu) by hadess (subscriber, #24252) [Link]

> The ability to abstract away the playback engine would simplify
> development of desktop media players

No, it would not. It would add an abstraction of bugs and lowest common denominator functionality.

Totem-xine. Phonon. No thank you.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 15:48 UTC (Thu) by aginnes (subscriber, #81011) [Link]

Given that Intel announced that they're dropping their TV chips on Oct 12th, is this effort going to continue?

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 16:17 UTC (Thu) by dashesy (subscriber, #74652) [Link]

I do not know if this has something to do with "IPP Unified Media Classes (UMC)".
If UMMS comes from the same background and provides highly optimized libraries that can dispatch not only according to the CPU capabilities, but also other decoder/encoder chips available, it sure will be a great product. I hope it is not yet another level of abstraction. Phonon is cross platform and QT API is just wonderful, if I need abstraction.

LCE11: UMMS, an audio/video abstraction layer

Posted Nov 3, 2011 21:54 UTC (Thu) by rossburton (subscriber, #7254) [Link]

It's nothing to do with IPP. UMMS's original reference platform is the CE41xx "Sodaville" board, which has hardware encoding/decoding for pretty much every format used in the real world.

The software paths can use GStreamer, which has its own processor optimization magic based on liboil.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds