April 1, 2009
This article was contributed by Nathan Willis
Xiph.org achieved a milestone last
week, unveiling the
first public release of its new encoder for Theora video. The new encoder is codenamed
Thusnelda to distinguish it from previous work, and makes several big
improvements, including fixes to constant bitrate and variable bitrate
encoding.
Theora is derived from a video codec called VP3 created by On2
Technologies. On2 released the VP3 code to the public under an open
source license in 2001, and agreed to help Xiph.org develop Theora as its
successor. The specification for the Theora codec's format was finalized in
2004, but the reference encoder itself, the software that actually converts raw video into the Theora format, only reached 1.0 in November of 2008. Work on Thusnelda began shortly thereafter, spearheaded by Xiph.org's Christopher Montgomery, and bolstered by a grant
from Mozilla and the Wikimedia Foundation that allowed lead Theora
developer Tim Terriberry to focus on improving the encoder to coincide with
the built-in Theora support slated for Firefox 3.5.
What's new
The Thusnelda encoder is designated 1.1 alpha, and is available for download from Xiph.org in several forms: source code for the libtheora library, binaries of the ffmpeg2theora command-line conversion utility, and even a Mac OS X QuickTime component.
According to Xiph.org's Ralph Giles, the most noticeable improvement in
1.1 is proper rate control, particularly for fixed bit rate encoding, where
the user specifies either the number of bits per second desired in the
output (a common use case for streaming applications), or the desired file
size. "The 1.0 encoder relies a lot on heuristics, instead of trying
to optimize directly the trade-off between quality of the coded images and
the number of bits used to represent them," he said. "More
significantly, the fixed bitrate mode in the 1.0 reference encoder didn't
really work; it just guessed how to meet its target and often missed the
requested bitrate, sometimes by quite a bit, which was a problem for
streaming and fixed-size encodes."
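In libtheora's 1.x API, that choice shows up directly when the encoder is configured: the caller fills in a th_info structure and either sets a target bitrate or leaves it at zero and sets a quality level instead. The fragment below is only a minimal configuration sketch; the dimensions, frame rate, and bitrate are arbitrary examples, and frame submission, Ogg packaging, and error handling are all omitted.

    #include <theora/theoraenc.h>

    /* Minimal libtheora setup sketch: pick fixed-bitrate or
     * quality-based encoding. Values are arbitrary examples. */
    th_enc_ctx *make_encoder(int use_fixed_bitrate) {
        th_info ti;
        th_info_init(&ti);
        ti.frame_width  = 320;      /* padded frame size (multiple of 16) */
        ti.frame_height = 240;
        ti.pic_width    = 320;      /* visible picture size */
        ti.pic_height   = 240;
        ti.fps_numerator   = 25;    /* 25 frames per second */
        ti.fps_denominator = 1;
        ti.pixel_fmt = TH_PF_420;
        if (use_fixed_bitrate) {
            ti.target_bitrate = 400000;  /* bits per second: rate-controlled */
            ti.quality = 0;
        } else {
            ti.target_bitrate = 0;       /* zero disables rate control...    */
            ti.quality = 40;             /* ...and this 0-63 knob drives VBR */
        }
        th_enc_ctx *enc = th_encode_alloc(&ti);  /* NULL if parameters rejected */
        th_info_clear(&ti);
        return enc;
    }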
But Montgomery's work — supported for a year by his employer Red
Hat — also included extensive refactoring of the code, which should
result in improvements today and allow for easier changes moving
forward. "The older encoder was structured as a bunch of nearly
independent passes," Giles said. "[It] made something like 8
passes over each frame. This made some forms of decision making hard,
i.e. if an earlier decision caused you problems (higher bitrate) in a later
stage you were out of luck. The new encoder collapses most of the
passes."
The restructuring also allows Thusnelda to take advantage of features in
the Theora specification that had never been implemented before, such as
"4MV" macroblocks, a motion compensation scheme that adaptively chooses
whether to encode motion information for an entire segment of the picture,
for a sub-segment, or for none of the segment. "Theora always breaks
each image up into square blocks," Giles explained, "one of
those blocks then can be split into four motion vectors, or use an average,
and if any of those four don't need to be coded, the alpha encoder can skip
coding a corresponding motion vector. Making a change like that was too
difficult with the 1.0 codebase."
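The macroblock decision Giles describes boils down to a small cost comparison: code no vector, one vector for the whole macroblock, or four vectors for its individual blocks, whichever is cheapest at acceptable quality. The following standalone C sketch uses invented cost numbers and names purely for illustration; it is not libtheora code.

    #include <stdio.h>

    /* Hypothetical bit costs for one macroblock at comparable quality. */
    struct mb_costs {
        int skip;     /* code no motion vector at all            */
        int one_mv;   /* one vector for the whole macroblock     */
        int four_mv;  /* separate vector for each of four blocks */
    };

    /* Pick whichever option spends the fewest bits. */
    static const char *choose_mb_mode(struct mb_costs c) {
        if (c.skip <= c.one_mv && c.skip <= c.four_mv) return "skip";
        if (c.one_mv <= c.four_mv) return "1MV";
        return "4MV";
    }

    int main(void) {
        struct mb_costs quiet = {  2, 14, 40 };  /* static background      */
        struct mb_costs busy  = { 90, 55, 38 };  /* complex, uneven motion */
        printf("quiet macroblock -> %s\n", choose_mb_mode(quiet));
        printf("busy macroblock  -> %s\n", choose_mb_mode(busy));
        return 0;
    }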
Measuring success
Naturally, real-world performance and not a feature list is the primary
means of assessing an encoder. Theora has been the object of criticism in
years past, especially when compared against proprietary offerings such as
H.264. Reader comments on news stories at Slashdot often dismissed Theora
as a poor alternative, producing larger files than the competition for the
same subjective quality.
Codec testers are always at the mercy of the encoder, however, and as
noted above Theora's 1.0-series encoder had significant flaws, especially
with respect to constant bitrate encoding. In the oft-cited doom9.org 2005 codec
shootout, the Theora encoder fared poorly, missing the target file size because of its weak rate control, the very feature targeted in the 1.1 branch. Similarly, Eugenia Loli-Queru's 2007 Theora versus
H.264 test for OSNews repeatedly cited problems with the encoder that
made direct comparison close to impossible.
Both tests pre-date the November 2008 release of the final 1.0 encoder, to say nothing of the 1.1 alpha. Shortly after the Thusnelda alpha was released, Jan Schmidt posted the results of his personal
tests on his blog, indicating a 20% reduction in file size and 14%
reduction in encoding time over the 1.0 encoder. Those are significant
numbers, even without accounting for better rate control and other encoding
parameter improvements. As commenters to the blog pointed out, Schmidt's
test was not scientific, particularly as it involved re-encoding an H.264
file rather than a lossless original, and showed example still frames
rather than video results.
Video quality is ultimately a subjective, human-centric measure.
Although there are attempts to quantify video encoding quality, such as peak
signal-to-noise ratio (PSNR) and structural similarity index
(SSIM), they rarely replace evaluation by human viewers. Xiph's Gregory Maxwell said that Thusnelda improves on Theora's PSNR numbers, but that it would be a mistake to assume this equates to a subjective improvement for any particular use case.
To an extent the objective metric problem is
equal to the coding problem. If we had a perfect metric we could probably
make a perfect encoder (ignoring a lot of engineering details) ... If we
could objectively know what 'looks good' then we could make a coder which
uses that metric to decide what to code. Then the problem of coding
largely reduces to efficiently packing information, which is well
understood. So in any case, objective metrics are usually useful for
measuring the results of small changes which are mostly 'objective' in
nature; they aren't very useful for measuring perceptual changes, nor are
they useful for comparing dramatically different codecs.
Terriberry concurred, noting that none of the simple objective metrics
take any kind of temporal effects into account, and they are still less
trustworthy than the processing done in the brain. "Like most
things, it's a matter of knowing what the limitations of your tools
are. PSNR and SSIM are useful for monitoring day-to-day changes in the code
to identify regressions and optimize parameters. But for evaluating
fundamentally different approaches, there's currently no substitute for
using real humans."
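For reference, the PSNR figure both developers mention is just a logarithmic restatement of the mean squared error between original and decoded samples. The short C sketch below shows the arithmetic for 8-bit samples; it is a generic illustration, not code from any Xiph tool.

    #include <math.h>
    #include <stdio.h>

    /* PSNR in decibels for 8-bit samples: 10 * log10(255^2 / MSE). */
    static double psnr_8bit(const unsigned char *ref,
                            const unsigned char *dec, size_t n) {
        double mse = 0.0;
        for (size_t i = 0; i < n; i++) {
            double d = (double)ref[i] - (double)dec[i];
            mse += d * d;
        }
        mse /= (double)n;
        if (mse == 0.0)
            return INFINITY;  /* images are identical */
        return 10.0 * log10((255.0 * 255.0) / mse);
    }

    int main(void) {
        unsigned char ref[4] = { 10, 20, 30, 40 };  /* toy "original" samples */
        unsigned char dec[4] = { 12, 19, 33, 40 };  /* toy "decoded" samples  */
        printf("PSNR: %.2f dB\n", psnr_8bit(ref, dec, 4));
        return 0;
    }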
Theora also took hits from critics on subjective quality in the 2005 and 2007 tests, criticism to which Montgomery responded in 2007 with a page on his personal Web site. Although some subjective quality issues like
discernible blockiness are not the result of problems with the 1.0 encoder,
he argued, many of the most visible problems are, and he urged readers to
watch the progress made in the 1.1 series.
What's next
There are several improvements still to come before 1.1 is declared
final, according to the Theora team. Giles said the next major feature will be per-block quantizers, the matrices that control how coarsely a block's transform coefficients are reduced for output. "[Theora precursor] VP3 used a fixed set of quantizers, and the 'quality' knob was the only way you could change things. When VP3 became Theora, back in 2004, we added support for varying those quantizers both per video and per frame type. The 1.0 encoder was able to support alternate quantizer matrices, because you just switch them out, but there were some tuning issues."
"1.1alpha1 is still using the same set, but we expect that the
change soon," Giles said. The newly-restructured codebase makes it
easy to vary the quantizer used, not just on a per-file or per-frame basis,
but block-by-block. Terriberry added that the new code will support 4:2:2
and 4:4:4 pixel formats, which will allow higher color quality, and the
ability to use different quantization
matrices for different color channels and frames.
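As a rough picture of what those quantizers do (generic transform-coding arithmetic, not Theora's actual matrices): each transform coefficient is divided by a step size and rounded, so a larger step discards more detail but leaves smaller numbers to code. Varying the steps per block, as planned for 1.1, would let the encoder spend bits where the eye needs them most.

    #include <math.h>
    #include <stdio.h>

    /* Illustrative quantization of a few made-up transform coefficients.
     * The step sizes are invented; Theora's real matrices differ. */
    int main(void) {
        double coeff[4] = { 312.0, -47.5, 8.2, 1.4 };  /* example coefficients */
        int    qstep[4] = { 16, 16, 32, 32 };          /* example step sizes   */
        for (int i = 0; i < 4; i++) {
            long   level = lround(coeff[i] / qstep[i]); /* value actually coded     */
            double rec   = (double)level * qstep[i];    /* decoder's reconstruction */
            printf("coeff %7.1f -> level %3ld -> reconstructed %7.1f\n",
                   coeff[i], level, rec);
        }
        return 0;
    }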
Giles and Terriberry agreed that 1.1 final will be significantly better
than even the current alpha release once all of the changes are
incorporated. Terriberry noted that many of the remaining improvements are
"minor things" but that added together they will be substantial. "And
that's not even mentioning things like speed optimizations, which also have
real practical benefits."
"There are other things still on the docket as well — we're
not done yet!" added Montgomery. "However, we're finally to
the point of putting together a release solidly better than 1.0 in every
way, along with a much higher future ceiling."
Between now and then, the team is soliciting user input from real-world
encoding tests. "We put it out to show what we've been up to, and to
make it easier to give it a try," said Giles. "We're
interested in samples where it really does poorly, especially relative to
1.0, compatibility testing with current decoders, and general build and
integration issues which of course can only be found through people trying
your software in their own environments." He encouraged users to
submit concrete issues through the bug tracker, but to share other
experiences through the project mailing list, or simply to blog about
them for all to read.
Web video is poised to start changing dramatically once Firefox 3.5
ships with a built-in Theora decoder underlying the HTML5 video element.
That makes it all the more important to get the Theora encoder right.
Xiph.org does not have the full-time staff or resources of larger activist groups like the Free Software Foundation or Creative Commons; it has only software developers. Consequently, without the support of Red Hat, Mozilla, and the Wikimedia Foundation, it might not have been able to move this quickly. It remains to be seen whether the final build of Thusnelda will
beat Firefox 3.5 to release, but the progress made already is
encouraging.