Development

Matterhorn brings integrated open source video distribution

September 8, 2010

This article was contributed by Nathan Willis

The Opencast project unveiled the 1.0 release of Matterhorn at the end of August. Matterhorn is an integrated video recording, processing, and distribution platform. Its primary goal is to let educational institutions set up a streamlined process for recording video in classrooms and releasing it in web or podcast form, but anyone who captures video on a recurring basis may find it useful.

The 1.0 code is available for download both as source and as pre-packaged "all in one" binaries for Linux servers. Matterhorn is written in Java, implemented on top of the Apache Felix service framework (which is included in the binary distribution). The all-in-one distribution is packaged and configured to be run from /opt/matterhorn on an Apache server. There are pre-install scripts for RPM-based and Debian-based systems that install a handful of third-party utilities, but otherwise the binary distribution is meant to be run out of the box, serving up its administration interface on http://localhost:8080.

Alternatively, Matterhorn can be installed from source, using the Apache Maven build system. In addition, the Matterhorn platform includes several components — from program scheduling and video transcoding to feed publishing — which can be installed and run on separate servers. This design allows institutions to deploy multiple "worker" nodes to scale up a Matterhorn system for increased video processing volume. Separate instructions for multiple-server installation are available on the project's documentation site.

The documentation is quite thorough, on top of which the project has taken pains to make the system friendly for non-IT-experts to manage; in addition to coming "pre-configured" in the binary distribution, almost all user-facing configuration options are exposed in the web interface of the administration server. Lower-level options like database settings and paths to system executables still need to be edited in text configuration files, but an inexperienced developer can unpack the binary distribution and run it relatively painlessly. For those evaluating the system for their own institution, the documentation even provides a detailed list of the recommended hardware for video capture, encoding, and media distribution.

A bird's eye view of Matterhorn

Matterhorn 1.0 is built to be an end-to-end video deployment system, with a workflow that consists of four basic components. In the video capture and administration component, users can upload video files that they have produced elsewhere, or schedule automated capture from pre-defined video resources, such as lecture hall cameras. The list of configured video capture resources is maintained inside Matterhorn, so that teachers can schedule a recording by date, time, and room, including recurring classes.

The ingest and processing component takes uploaded video content and processes it in a variety of ways. It can be automatically transcoded for several resolutions and codecs, automatically segmented on scene transitions, watermarked or branded with video overlays, or (in the case of presentation slide videos) scanned with optical character recognition (OCR) to extract text. OCRed slide text can then be used as captioning for screen readers. Caption text can also be added for any video content by human transcribers. The system also allows each video to be marked as "hold for review" so that an administrator can examine it for quality or wait to publish it until it has been submitted to a transcriber and fully captioned.

The distribution management component consists of publishing tools, including local storage and web delivery, uploading to public off-site services (e.g., YouTube), DVD-ready output, and RSS feeds designed to integrate with other content management systems (CMSes). Because of the educational focus, the supported CMSes include Sakai, Blackboard, Moodle, and other academic courseware suites. The engagement tools component includes a web video player, the search interface, and annotation tools for the end user.

The recommended hardware configuration is modest, consisting of an MPEG encoder board, an ITX motherboard-based machine, and SATA storage. A separate VGA capture unit is listed as well, presumably to capture overhead projector content separately from live video. The result is roughly equivalent to a standard desktop-caliber PC in price. Intel's low-power Atom chip was initially the target, but had to be dropped from the recommendation in order to keep up with full-rate video.

Accessibility all around

Perhaps the most striking thing about Matterhorn is its integrated accessibility support. Support for human transcription is built in, OCR (via the open source OCRopus library) is built in, and the development trunk includes work to enable automatic speech-to-text transcription through CMU Sphinx. Seven captioning formats are supported, to enable compatibility with the widest range of video players and hosting services. The embeddable web video player itself is accessible with keyboard and screen reader commands.

Few other open source video delivery systems come close. The hybrid open/closed source Kaltura video system, for example, allows for end-user annotations that could be used to store transcriptions, but it is not part of the automated workflow. A commercial plugin is available for human transcription, but it is tied to a specific, paid transcription service.

Of course, as accessibility experts will always tell you, making video accessible for sight-impaired users actually increases accessibility for sighted users as well. Matterhorn's caption text is fully searchable, via both the site search engine and within the embedded video player. The video player will display updated captions as a user scrolls back and forth in the timeline, making it easier to locate points of interest in a video.
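As a rough illustration of why time-coded captions help with both search and scrubbing, the sketch below stores captions as (start time, text) pairs and supports both kinds of lookup. The data layout, the sample captions, and the function names are hypothetical; none of this is taken from Matterhorn's code.

```python
from bisect import bisect_right

# Hypothetical caption track: (start time in seconds, caption text), sorted by time.
CAPTIONS = [
    (0.0, "Welcome to the lecture"),
    (12.5, "Today we cover sorting algorithms"),
    (47.0, "Quicksort picks a pivot element"),
    (95.0, "Questions before we finish"),
]


def caption_at(captions, t):
    """Return the caption shown at time t, or None before the first caption."""
    starts = [start for start, _ in captions]
    i = bisect_right(starts, t)  # number of captions that start at or before t
    return captions[i - 1][1] if i > 0 else None


def search_captions(captions, query):
    """Return the start times of captions containing query (case-insensitive)."""
    q = query.lower()
    return [start for start, text in captions if q in text.lower()]
```

A player could call something like caption_at() whenever the playback position changes, and use the start times returned by search_captions() as jump targets in the timeline.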

Educational use

To educators, the lecture scheduling and courseware distribution features are important. If the institution has dedicated video recording hardware in each classroom, setting up a recurring recording is as straightforward as selecting its weekly times in a web calendar widget. The administration interface maintains video recordings in "series" that share basic metadata. For a recurring lecture series, this means that each new recording is automatically tagged with the correct metadata when it is captured, before automatic processing and publishing to the eventual feed or output system.
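The idea of a "series" that stamps shared metadata onto each recurring recording can be sketched in a few lines. This is only an illustration under assumed names: the schedule_series() function, the metadata keys, and the sample course are hypothetical and do not reflect Matterhorn's actual data model.

```python
from datetime import datetime, timedelta


def schedule_series(series_meta, first_start, duration_minutes, weeks):
    """Build one recording entry per week, each inheriting the series metadata."""
    recordings = []
    for week in range(weeks):
        start = first_start + timedelta(weeks=week)
        entry = dict(series_meta)  # copy, so entries stay independent of each other
        entry["start"] = start
        entry["end"] = start + timedelta(minutes=duration_minutes)
        recordings.append(entry)
    return recordings


# Hypothetical series: a 50-minute lecture every Monday at 10:00 for three weeks.
series = {"series": "Physics 101", "room": "Hall B", "creator": "Dr. Example"}
recordings = schedule_series(series, datetime(2010, 9, 13, 10, 0), 50, weeks=3)
```

Each entry carries the series title, room, and creator, so nothing needs to be tagged by hand after capture.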

From the student's perspective, one of the more interesting features is the two-pane video player, which allows two time-synchronized videos to play together: a camera on the speaker, and a second track showing presentation slides. Whether a specific recording represents slides or a live speaker is flagged when the video is uploaded; slide tracks are automatically queued for OCR scanning. Matterhorn also attempts to automatically detect abrupt changes in a slide video — which generally signal slide transitions — and creates thumbnails of each slide to serve as video bookmarks.
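The article does not describe how Matterhorn detects slide changes, so the following is a generic frame-differencing sketch rather than the project's method. It assumes each frame is a flat list of grayscale pixel values and flags a transition when the mean absolute difference from the previous frame exceeds a threshold; the threshold value and the tiny sample frames are made up for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel absolute difference between two equal-sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_transitions(frames, threshold=30.0):
    """Return indices of frames that differ sharply from their predecessor."""
    return [
        i
        for i in range(1, len(frames))
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold
    ]


# Hypothetical 4-pixel "frames": two distinct slides plus a little sensor noise.
slide_1 = [10, 10, 200, 200]
slide_2 = [200, 200, 10, 10]
frames = [slide_1, slide_1, [11, 10, 199, 200], slide_2, slide_2]
transitions = detect_transitions(frames)
```

The frame at each returned index would be a natural candidate for a thumbnail bookmark, since it is the first frame of a new slide.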

Integration with Moodle, Sakai, and other courseware projects is almost a given. The Matterhorn video player is embeddable, so a course instructor or administrator can simply drop it into a page, but the RSS output allows a Matterhorn video series to be automatically synchronized with Moodle or Sakai course materials, or made available to other applications. The documentation mentions Apple's iTunes U educational service, but any podcast application will work.

Outlook

The turn-key design of Matterhorn will likely appeal to schools, who may not wish to expend time training dozens or hundreds of teachers to record and encode video on their own. That said, there is nothing in Matterhorn's design that is unique to the education market. Any group, community, or project that produces video content regularly could benefit from the ability to automate the process to some degree, not to mention the niceties of free slide OCR and thumbnailing — dozens of open source conferences and events could benefit from those features alone.

Considering how often web video issues are in the news these days, it is genuinely surprising that there are not more open source video workflow and distribution projects out there. The aforementioned Kaltura is fairly well known, although only a subset of its products are open source (the familiar "Community Edition" play). But Kaltura is actually more focused on entities building stand-alone video delivery sites, such as their own branded "channels." Hence, a great deal of emphasis is placed on search engine optimization and web advertising frameworks in Kaltura's feature set.

The Plumi project is closer in scope to Matterhorn, providing a decent video workflow from upload to distribution, but it works entirely within Plone sites. One of Matterhorn's advantages over Plumi is its CMS-neutral output capabilities.

Now that version 1.0 is out the door, it will be interesting to see how the education and non-education markets respond. The Opencast project Planet feed shows several real-world universities using Matterhorn for video distribution, including many in Europe and North America that are listed as sponsoring partners in the project.

Plans are well underway for the next development cycle, which may integrate the automated speech-to-text transcription feature, and will definitely feature the OpenCaps web-based caption editing tool. From a fully open source video distribution toolchain, to a fully accessible web video player, Matterhorn offers much that schools — and open source projects — could find educational.


Brief items

bzr 2.2.0 released

Version 2.2.0 of the bzr distributed source code management system is available. "This is primarily a bugfix and polish release over the 2.1 series, with a large number of bugs fixed (>120), and some performance improvements."


Cairo 1.10.0 available

Version 1.10.0 of the Cairo graphics library has finally been released. "One of the more interesting departures for cairo for this release is the inclusion of a tracing utility, cairo-trace. cairo-trace generates a human-readable, replayable, compact representation of the sequences of drawing commands made by an application. This can be used to inspecting applications to understand issues and as a means for profiling real-world usage of cairo." The profiling feature has evidently been used to improve performance in a number of areas. There is also improved printing support, better 16-bit buffer support, and better use of hardware acceleration.


Firefox and SeaMonkey updates released

The Mozilla project has released Firefox 3.6.9 and 3.5.12 and SeaMonkey 2.0.7. These updates fix a relatively long list of scary security problems; the Firefox 3.6.9 update also adds support for X-Frame-Options, which can be used by web sites to prevent their content from being trapped inside another site's frames.
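For site operators wondering what using X-Frame-Options looks like in practice, here is a minimal sketch of a WSGI application that sends the header with every response. The application itself is a made-up example; only the header name and its DENY and SAMEORIGIN values come from the header's definition.

```python
def app(environ, start_response):
    """Minimal WSGI app that asks browsers not to render its pages in frames."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # DENY forbids all framing; SAMEORIGIN would still allow same-site frames.
        ("X-Frame-Options", "DENY"),
    ]
    start_response("200 OK", headers)
    return [b"<p>This page should not be framed.</p>"]
```

For a quick local experiment, the app can be served with the standard library's wsgiref.simple_server.make_server(); browsers that understand the header will then refuse to display the page inside another site's frame.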


Thunderbird 3.1.3 and 3.0.7 security updates now available

Mozilla has released Thunderbird 3.1.3 and Thunderbird 3.0.7 with security and stability updates. See the release notes for details (3.1.3 and 3.0.7).


GDB 7.2 released

Version 7.2 of the GDB debugger is out. New features include support for the D language, some C++ improvements, better Python support, better tracepoint support, and more; see the announcement for the details.


Mozilla Labs Gaming launches

The Mozilla Labs Gaming project has announced its existence. "Modern Open Web technologies introduced a complete stack of technologies such as Open Video, audio, WebGL, touch events, device orientation, geo location, and fast JavaScript engines which make it possible to build complex (and not so complex) games on the Web. With these technologies being delivered through modern browsers today, the time is ripe for pushing the platform. And what better way than through games?" The project is starting with a competition to see who can build the best web-based game.


PostgreSQL 9.1alpha1 released

Most PostgreSQL users are waiting for the 9.0 release; meanwhile, the first alpha 9.1 release has just been announced. There are a number of enhancements in performance, statistics gathering, XML support, and more; see the announcement for the details.

Vignatti: X Census (for 1.9)

Tiago Vignatti has put together a report on the development of X.org 1.9. In the tradition of the kernel statistics reported on LWN, and the more recent GNOME census, he ranks developers and employers based on the number of changes made to various pieces of the X.org tree during the development of 1.9 (April 2 to August 20). The statistics are broken up along functional lines into several categories: X implementation, X input drivers, user space video drivers, Pixman, X11 conformance testing, and X documentation. "Of course lines of code and changeset are far from being a good metric to see actually how the development happened. But still, it does represents something."


Newsletters and articles

Development newsletters from the last week

Caml Weekly News (September 7)
PostgreSQL Weekly News (September 5)


Introduction to ClamAV's Low Level Virtual Machine

The ClamAV project has put up an introductory post showing how to make use of an interesting new feature: integration of an LLVM just-in-time compiler for malware-scanning bytecode. "Here's a case study to see how ClamAV bytecode can come in handy (this is an integer overflow vulnerability in a old version of OpenOffice CVE-2008-2238)".


Graesslin: Driver dilemma in KDE workspaces 4.5

Martin Graesslin looks at problems with the interaction between KWin and some graphics drivers. "Now that I have explained all our checks we did to ensure a smooth user experience, I want to explain how it could happen that there are regressions in 4.5. In 4.5 we introduced two new features which require OpenGL Shaders: the blur effect and the lanczos filter. Both are not hard requirements. Blur effect can easily be turned off by disabling the effect and the lanczos filter is controlled by the general effect level settings which is also used for Plasma and Oxygen animations. Both new features check for the required extensions and get only activated iff the driver claims support for it. So everything should be fine, shouldn't it? Apparently not when it comes to the free graphics drivers (please note and remember: we do not see such problems with the proprietary NVIDIA driver!)." (Thanks to Jos Poortvliet)


systemd for Administrators, Part II

Lennart Poettering has posted the second installment in his series on system administration with systemd. "In systemd we place every process that is spawned in a control group named after its service. Control groups (or cgroups) at their most basic are simply groups of processes that can be arranged in a hierarchy and labelled individually. When processes spawn other processes these children are automatically made members of the parents cgroup. Leaving a cgroup is not possible for unprivileged processes. Thus, cgroups can be used as an effective way to label processes after the service they belong to and be sure that the service cannot escape from the label, regardless how often it forks or renames itself. Furthermore this can be used to safely kill a service and all processes it created, again with no chance of escaping."
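On Linux, the cgroup membership described in the quote can be inspected through /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:path. The sketch below parses that format; the helper names and the sample text are illustrative assumptions, and the actual cgroup paths seen on a given system depend on the systemd version and its configuration.

```python
def parse_cgroup(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path may itself contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        controller_list = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), controller_list, path))
    return entries


def read_cgroup(pid="self"):
    """Read and parse the cgroup membership of a process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_cgroup(f.read())


# Made-up sample in the /proc/<pid>/cgroup format, for illustration only.
SAMPLE = "2:cpu,cpuacct:/\n1:name=systemd:/system/sshd.service\n"
parsed = parse_cgroup(SAMPLE)
```

Because children inherit the parent's cgroup, the path on the name=systemd line identifies the owning service no matter how many times the process has forked.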


Page editor: Jonathan Corbet