GStreamer 1.26.0 released

[Posted March 12, 2025 by jzb]

Version 1.26.0 of the GStreamer cross-platform multimedia framework has been released. Notable changes in this release include support for the H.266 Versatile Video Coding (VVC) codec, Low Complexity Enhancement Video Coding (LCEVC) support, closed caption improvements, and JPEG XS image codec support.

another patent snake pit

Posted Mar 13, 2025 1:41 UTC (Thu) by rolexhamster (guest, #158445) [Link] (10 responses)

Great, a whole bunch of new patent-encumbered codecs. Yay.

The JPEG XS standard is basically a patented rehash of wavelets: techniques from ~30 years ago, which in turn are essentially special cases (read: glorified versions) of digital filters, stuff known for at least 50 years. The patent system at its best.

The "Low Complexity Enhancement Video Coding (LCEVC)" looks like the most insidious one. From the linked Wikipedia page:

... LCEVC leverages a base video codec (e.g., ..., AV1, ....) and employs an efficient low-complexity enhancement that adds up to two layers of encoded residuals

First of all, the MPEG cartel wants to add a patented wrapper around AV1, a codec explicitly designed to be patent-free? Where do I sign up?

Secondly, post-processing (such as handling residuals) is the job of the codec itself, which makes the LCEVC wrapper particularly egregious and redundant. It smells like a cynical attempt at carpet bombing the codec space with patents.

Looking at directions to challenge patents on LCEVC

Posted Mar 13, 2025 11:15 UTC (Thu) by farnz (subscriber, #17727) [Link] (6 responses)

The idea behind LCEVC is closely related to bitrate peeling; you have a base codec (such as AV1) at one bitrate, and one or two enhancement layers that increase the bitrate but also increase the video quality. This is an old enough idea that it can't be patent protected, so the patents must apply to the specifics of the enhancement layers.

For example, MPEG-2 (back in 1994) had its "scalability extension layers", where you had a base layer, and at most one extension layer that could improve one of resolution, frame rate or SNR (picture quality). The intended use case was for "hierarchical transmission" (which was added to DVB-T to support this) where a receiver in a poor signal area would just get the base layer, and a receiver in a good signal area would get both the base layer and the example layer, and thus be able to give you a better picture when you got good signal.

Nowadays, you'd use this with IP streaming, to give you more quality/bitrate options without blowing up your storage needs; instead of having (say) 15 options, each of which is entirely independent of any others, you'd hope to get closer to 30 options in the same storage, using 10 base layers. You might also use this if you have multicast to your viewers - multicast the base layer to everyone watching, and send enhancement layers only to those with good enough connections to support the higher bitrate.

The gotcha will be that an enhancement layer, by its nature, will not have the same class of content as the base layer; because the enhancement layer is only working on the difference between the base layer and the original content (and not the full content), it'll be doing something radically different to the base codec. And that, I suspect is where the patents come in - encoding to get maximum entropy from the enhancement layer is likely to involve new techniques that aren't present in older codecs.

And note that the MPEG-2 extension layers are slightly different, because they change the way you decode the base layer. LCEVC has the base layer fully decoded independently of the enhancement layers, and then applies the enhancement layers to the decoded picture before presentation to the viewer. This means that the MPEG-2 layers aren't quite enough to challenge most patents :-(

Looking at directions to challenge patents on LCEVC

Posted Mar 13, 2025 13:42 UTC (Thu) by rolexhamster (guest, #158445) [Link] (5 responses)

I can see the use case for separate streams at various bitrates, but it seems awfully complicated and inefficient to have a wrapper codec around a given codec. Residuals are very likely to end up as edges (high frequencies), which would be a pain to encode efficiently separately from the rest of the video content.

I bet it would be far more effective (and simpler) to have multiple AV1 (or H264, H265 etc) streams specifically coded for a set of bitrates. An AV1 bitstream at say 1000 kbits/sec would have higher visual quality than a combined stream comprised of AV1 at 500 kbits/sec + wrapper at 500 at kbits/sec.

Looking at directions to challenge patents on LCEVC

Posted Mar 13, 2025 15:18 UTC (Thu) by farnz (subscriber, #17727) [Link] (4 responses)

First, note that we already see that it's more compute and storage efficient to have a small number of H.264 base layers, plus extension layers to cover the gaps between them than to have a larger number of base layers; it's computationally cheaper to produce a 2 megabit per second (Mb/s) stream with three 1 Mb/s extension layers (thus covering 2 Mb/s, 3 Mb/s, 4 Mb/s and 5 Mb/s bitrates) than to produce 4 different H.264 streams at the different bitrates (and paying for 14 Mb/s of storage instead of 5 Mb/s). The downside is that the quality is a bit lower - with modern scalable encoders, each extension layer "wastes" about 10% of the bits it uses as compared to directly encoding at that bitrate (so with three extension layers to get 5 Mb/s, you've got a quality that's closer to a 4.5 Mb/s base layer encoding than to a 5 Mb/s encoding) - but it is worth it when you're looking at offering 30 quality tradeoff points instead of 10 for the same cost - since it means that you can "spread" across a wider range of bitrates. H.265 narrows the gap between doing a single layer encode, and an encode of a base layer plus several extension layers, so that you'd be looking at closer to a 4.8 Mb/s single layer encode than a 4.5 Mb/s one if you used 3 extension layers.

For prerecorded content, this is plenty good enough - you can cover all the bitrate points you want to cover, and use buffering to overcome variability in consumer connection speeds. But for live content, you start having problems with common connection types like WiFi and 3GPP cellular (LTE, 5G NR etc), because you've got more constraints on you:

Take someone using mobile data to watch live sports; you want to keep the latency from action being filmed to viewer seeing it low (so that they can keep up with a group chat including friends in the stadium), so your buffer size is on the order of half a second - much more than that, and messages from the friends in the stadium will trigger a notification on your phone before you've seen the corresponding action in the stream. For efficiency reasons, you want the stream you're viewing to have "switch points" (where you can switch to a different bitrate seamlessly) as infrequently as possible - Netflix uses every 10 seconds, for example - because that allows you to use inter-frame redundancy efficiently, and new viewers of the stream start at a lower bitrate so that they can predecode and discard up to 9.5 seconds before joining the live stream.

To make this work, you have to choose a bitrate for the base layer such that the lowest throughput over a half second period in any ten seconds is higher than the stream bitrate. But this then gives you a quality problem; if I'm in a train, going through a city, my lowest throughput over a half second period might be 500 kilobits per half second (1 Mb/s), with an average over a ten second period of 4,000 megabits delivered (400 Mb/s). You now have three options to exploit my high throughput mobile data link:

Increase latency. This gets you more buffering, and thus increases the averaging time for my throughput from half a second to (say) 10 seconds, and is what Netflix does for prerecorded video. But, for live sports and similar, it's not workable because of the group chat problem.
Let it glitch. When my mobile network drops to a slow speed for a short period, I miss the action. Again, not acceptable for live sports, since if I miss the key events completely because the stream glitched for a few seconds, I'm going to be upset.
Keep the quality and thus bitrate low. This is the current tradeoff for live sports - if I can only reliably receive 1 Mb/s, that's the stream speed I get, even though most of the time, I can receive 400 times faster. Here, though, you have a PR problem; how do you explain to customers that they're getting poor quality because their network blips occasionally as they're on the move, when every test they know how to run shows that they're getting far higher speeds?

That's where the enhancement layers idea comes from; you need a very different encoder to "normal" video (since the base layer has covered all the low frequency content) to do this efficiently, but you can then send the enhancement layers separately to the base layer, with different congestion behaviours; if the network blips in speed, you set things up so that the base layer gets through (quality reduces, but you can still see the important parts of the action), but the enhancement layers don't. Then, you can use enhancement layers to keep the quality up most of the time, and when there's a blip, the quality falls back to the base layer.

This gets you out of the hole; the enhancement layers can be turned off when they blip, resulting in a quality fallback, but you can now (e.g.) have a base layer at 1 Mb/s, with a 5 Mb/s and a 15 Mb/s enhancement layer. I get 1 Mb/s quality all the time, but whenever my network is good, I get the quality I'd expect from 19 Mb/s of base layer codec; if the network blips, instead of a glitch in the content where I simply miss 1 to 10 seconds of action outright (depending on what gets lost), I go down to 1 Mb/s quality for 1 to 10 seconds instead.

And that's also where I expect to see the patent minefield; the enhancement layers are going to need different codec design to the base layer - they're encoding a residual with very little (or no) low frequency content, they've always got the option of outputting the base layer's decoded output with no changes, they can have extra reference pictures (base layer, and combination of base layer with enhancement layer), and I'm sure there's less obvious things, too.

Looking at directions to challenge patents on LCEVC

Posted Mar 13, 2025 23:07 UTC (Thu) by lynxlynxlynx (guest, #90121) [Link] (1 responses)

Thanks for the very insightful comments! Are there any open extension layer implementations, formats?

Looking at directions to challenge patents on LCEVC

Posted Mar 14, 2025 8:17 UTC (Fri) by farnz (subscriber, #17727) [Link]

Looking at the AOM specs, AV1 (section 6.7.5 of the PDF) supports temporal and spatial enhancement layers (a spatial enhancement layer increases the resolution of a frame, but does not change frame rate or quality at a lower resolution, a temporal layer inserts extra frames into the decoder output). It does not have an SNR improvement layer - it can't do a thing where the base layer is 1280x720p60, and the enhancement layers get you 1280x720p60 at a higher quality.

Looking at directions to challenge patents on LCEVC

Posted Mar 14, 2025 1:46 UTC (Fri) by rolexhamster (guest, #158445) [Link] (1 responses)

Thanks for the detailed explanation. It all looks very reasonable on the surface, but wrapper codecs such as LCEVC are still essentially a hack to (partially and inefficiently) address a very real problem.

The core issue seems to be the lack of built-in layered multi-bitrate encoding directly at the base codec level.

The AV1 bitstream format was frozen many years ago, so I presume this is only addressable by AV2?

Looking at directions to challenge patents on LCEVC

Posted Mar 14, 2025 8:22 UTC (Fri) by farnz (subscriber, #17727) [Link]

The missing bit from AV1 bitstream format is a quality extension layer; it can increase resolution (so base layer gets you 1280x720, extension layer + base layer gets you 1920x1080), and it can increase frame rate (base layer gets you 15 fps, extensions give you 30 fps or 60 fps), or a combination of the two, but not keep resolution and frame rate the same but increase picture quality.

With that, you could get very similar effects with more memory in the decoder - you decode extensions and base in parallel, and keep all the decoded pictures around. If you keep getting extensions, you keep using them. If you can't decode the extension layer due to packet loss, fall back to base layer until you can decode the extension layer. Then, you "just" need the encoder to be intelligent about the extension layer such that you can switch the extension layer off and on quickly on a frame-by-frame basis.

another patent snake pit

Posted Mar 13, 2025 22:27 UTC (Thu) by tpm (subscriber, #56271) [Link] (2 responses)

Multimedia frameworks unfortunately don't get to decide what codecs and technologies the rest of the world puts into their specs, hardware, and interoperability guidelines :)

another patent snake pit

Posted Mar 14, 2025 1:51 UTC (Fri) by rolexhamster (guest, #158445) [Link] (1 responses)

They can always resist. Refuse to support new patent-encumbered formats, to slow their adoption. This allows more room and time for proper open formats to gain in popularity.

another patent snake pit

Posted Mar 14, 2025 2:17 UTC (Fri) by pizza (subscriber, #46) [Link]

> They can always resist. Refuse to support new patent-encumbered formats, to slow their adoption. This allows more room and time for proper open formats to gain in popularity.

Please let us know how well that works out when your family wants to watch something on Netflix or Disney+.