LWN: Comments on "Giving Rust a chance for in-kernel codecs"

Giving Rust a chance for in-kernel codecs

dvdeug — Sun, 26 May 2024 21:58:40 +0000

Wuffs sucks even harder for dynamic allocations and generic code, since it doesn't permit them. An apples to apples comparison of SPARK/Ada to Wuffs, even within Wuffs' narrow design space, would be interesting.

Giving Rust a chance for in-kernel codecs

dwlsalmeida — Fri, 03 May 2024 14:45:36 +0000

My next step is to set up some infrastructure to test this continuously. If regressions can be quickly pinpointed and acted upon, we will be in a really good place to move forward with Rust in other fragile parts of the media code. This also applies to frameworks like videobuf2 and mem2mem.

I've started discussions on a new iteration of mem2mem[0]. It's clear to me that it can be written in Rust directly by providing a C API through cbindgen. The framework can switch to C when executing any callbacks implemented by the driver, while drivers themselves keep their C code intact, if they so desire.

[0] - https://lore.kernel.org/linux-media/3F80AC0D-DCAA-4EDE-BF...

Giving Rust a chance for in-kernel codecs

Gnurou — Fri, 03 May 2024 05:25:31 +0000

FWIW I think this approach (writing safer versions of C functions that could benefit from memory safety while not having too many dependencies to pull in) is an excellent and gentle way to introduce Rust to a new kernel subsystem. It allows apples-to-apples comparison of the C vs Rust code (notably in terms of memory footprint), lets maintainers ease into the new language through bite-sized code, and paves the way for the next layer of functions to be converted if there is appetite to go further.

Giving Rust a chance for in-kernel codecs

farnz — Tue, 30 Apr 2024 09:38:06 +0000

Note that even without the little quirks, stateless codecs will always need something to manage three chunks of state for them:

Position in the bitstream. Something needs to track how far through the bitstream you are, and avoid either skipping bits that the stateless codec needs to see, or sending it the same bits repeatedly when it doesn't need them again.
Picture reordering. Video codecs can predict the "current" picture from both past pictures, and future pictures. If you have pictures in output order 1 2 3 4, where pictures 2 and 3 can use picture 4 as a reference, the bitstream will contain pictures in the order 1 4 2 3, and something needs enough state to rearrange that into 1 2 3 4.
Reference picture buffering. Video codecs aren't allowed to refer to arbitrary pictures when they predict the current picture; they're only allowed to refer to "reference pictures". Every codec has rules for when a picture enters the set of reference pictures, and when it leaves, and the stateful wrapper has to ensure that the stateless decoder has the "right" set of reference pictures available to it.
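
As a rough illustration of the picture reordering point, here is a minimal Rust sketch of a reorder buffer. The `Picture` and `ReorderBuffer` types and the `reorder` helper are hypothetical names made up for this example, and it assumes each picture already carries its output position; real codecs derive that from bitstream fields and also bound the buffer size.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded picture, tagged with its position in output order.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Picture {
    pub output_index: u32,
}

/// Holds decoded pictures until they can be emitted in output order.
pub struct ReorderBuffer {
    pending: BTreeMap<u32, Picture>,
    next_output: u32,
}

impl ReorderBuffer {
    pub fn new(first_output: u32) -> Self {
        Self { pending: BTreeMap::new(), next_output: first_output }
    }

    /// Accepts one picture in bitstream (decode) order and returns every
    /// picture that has become ready, in output order.
    pub fn push(&mut self, pic: Picture) -> Vec<Picture> {
        self.pending.insert(pic.output_index, pic);
        let mut ready = Vec::new();
        while let Some(p) = self.pending.remove(&self.next_output) {
            ready.push(p);
            self.next_output += 1;
        }
        ready
    }
}

/// Convenience wrapper: feeds output indices in decode order (starting the
/// output sequence at 1, as in the example above) and collects output order.
pub fn reorder(decode_order: &[u32]) -> Vec<u32> {
    let mut buf = ReorderBuffer::new(1);
    let mut out = Vec::new();
    for &index in decode_order {
        for pic in buf.push(Picture { output_index: index }) {
            out.push(pic.output_index);
        }
    }
    out
}
```

Feeding the decode order 1 4 2 3 through `reorder` yields 1 2 3 4: picture 4 is held back until 2 and 3 have arrived.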

And then you get into more complicated things like stormer described, but also things like MPEG-2 having per-sequence and per-picture state, which needs to be available to the decoder with every slice it's decoding. The wrapper thus has to understand enough of the bitstream to know what state the decoder needs to be given with each slice it's decoding.

Giving Rust a chance for in-kernel codecs

leromarinvit — Tue, 30 Apr 2024 08:32:12 +0000

Thanks for the detailed explanation! Somehow, I misread the article and was left with the impression that v4l2 contains at least significant parts of full codec implementations, to support the parts that some decoders don't implement in hardware. That's what I commented on, but I see now that what looked like a strange design to me was simply a misunderstanding.

Like I said, so far I've never had the need to handle video in my own code, so I know next to nothing about how all the components work and interact.

Giving Rust a chance for in-kernel codecs

tialaramex — Mon, 29 Apr 2024 23:19:24 +0000

But it certainly _is_ possible to develop a language where a machine tool can follow along with the discrete maths needed to see that we're in bounds when we do this, or point to where our code doesn't do what we thought it did (e.g. it's mandatory for videos in our encoding to have width multiple of 16, and so we didn't check that but our code assumes it's true, leaving a gap for anybody to just lie)

We only need to: Develop that language (for conventional software codecs it exists, it's named WUFFS) and teach the handful of people who write new codecs how to use this purpose made language and then sit back and enjoy the high performance totally safe results.

Giving Rust a chance for in-kernel codecs

gspr — Mon, 29 Apr 2024 20:40:56 +0000

Since multimedia codecs are the focus of this thread, I think they constitute a type of code where explicit indexing very much remains necessary. It's often not possible to express things like "add the red value of this pixel plus c times the blue value of the pixel k rows directly above it" using iterators, let alone iterators where the compiler can automatically elide bounds checking.
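
To make the example concrete, here is a minimal Rust sketch of that kind of access pattern. The function name, the packed 3-bytes-per-pixel RGB layout, and the choice to leave the first `k` rows at zero are all assumptions for illustration, not anything taken from a real codec.

```rust
/// Hypothetical filter over a packed RGB image (3 bytes per pixel, row-major):
///   out(x, y) = red(x, y) + c * blue(x, y - k)   for rows y >= k.
/// Rows above row k have no pixel "k rows above" and are left at zero.
/// Returns None if `rgb` does not have exactly width * height * 3 bytes.
pub fn red_plus_blue_above(
    rgb: &[u8],
    width: usize,
    height: usize,
    k: usize,
    c: u32,
) -> Option<Vec<u32>> {
    // Validate the buffer size once, with overflow-checked arithmetic.
    let expected = width.checked_mul(height)?.checked_mul(3)?;
    if rgb.len() != expected {
        return None;
    }

    let mut out = vec![0u32; width * height];
    for y in k..height {
        for x in 0..width {
            // Two unrelated offsets into the same buffer: this is the part
            // that is awkward to express with a single iterator.
            let here = (y * width + x) * 3;
            let above = ((y - k) * width + x) * 3;
            let red = u32::from(rgb[here]);
            let blue = u32::from(rgb[above + 2]);
            out[y * width + x] = c.saturating_mul(blue).saturating_add(red);
        }
    }
    Some(out)
}
```

Every `rgb[...]` access here is bounds-checked at run time unless the optimizer can prove it redundant, which is the cost being discussed in this thread.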

Giving Rust a chance for in-kernel codecs

stormer — Mon, 29 Apr 2024 20:03:51 +0000

DRM/KMS drivers are clearly another option, and one that I keep an eye on for the future. For me, the V4L2 uAPI choice was made before my time, and it does work well enough. But in the DRM/KMS subsystem, the notion of scheduling work from various user processes onto a certain set of fixed-function cores is already solved. And this is exactly what modern stateless codecs are becoming. Raspberry Pi HEVC decoders process two tasks concurrently (entropy decoding and reconstruction). Mediatek VCODEC does the same, but with two independent cores. Rockchip 3588 has quad-core JPEG decoders, but also dual-core HEVC decoders where you can use both cores independently or bind them together to process 8K frames. In the V4L2 framework this is a true challenge and a huge gap, one for which DRM/KMS does have a solution.

Of course, the validation is still mandatory, but we can easily design a common command bitstream for this type of hardware, and then internally validate it and reconstruct a validated command that is simply the register layout in RAM, which the scheduler can then apply. It might be a steep dive, but what I'm saying is that DRM/KMS is a very viable solution for future codecs and may help fix all the legacy and shortcomings of the V4L2 uAPI, and hopefully support Vulkan Video on this type of hardware.

Giving Rust a chance for in-kernel codecs

stormer — Mon, 29 Apr 2024 19:50:24 +0000

I'm quite suspicious about what you mean by "having in kernel codecs". Stateless decoders refer to hardware that does not track the state of the decoding process. The benefit of such hardware is that it acts as a special processing core that can be scheduled onto different tasks (tasks being streams in this context) in a very flexible way. In contrast, the stateful kind of decoding hardware needs to maintain the state of each concurrent stream, and this is often limited by firmware and co-processor resources. The scheduling usually cannot be adapted to any third-party constraints (consider cgroups and other kinds of quotas).

All in all, the layer placed in front of these drivers through the V4L2 Stateless Decoder interface does not constitute an "in-kernel codec". The hardware implements the heavy processing; userspace implements the high-level decoding logic and parsing. The responsibility of such a Linux kernel driver is to ensure that the parsed parameters and state are valid and match the pre-allocated resource sizes. This isn't something Rust can solve or will ever solve; this is pure logic, and logic can be broken even in Rust. For each codec, specific stream parameters passed to the hardware imply specific auxiliary or image sizes. It is the responsibility of the Linux kernel to ensure that the hardware will not overflow these for a given decoder command. As this is a mix of code and hardware, Rust brings nothing here.

Though, in order to adapt to all kinds of hardware, we are forced to implement small bits of the codec spec. This is implemented in the form of different codec-specific libraries. For H.264 and HEVC, we transform and reorder reference lists to match each hardware requirement. For VP9 and AV1, we need to post-process some of the probability tables in order to combine bitstream probability updates with observations made during the decoding process. With just a quick read of these C helpers, you'll notice they are made of tons of C arrays which, if overrun, will overwrite each other silently. I do hope our implementation is right and safe in C, but real confidence could come from the guarantees offered by the Rust language and compiler.

Another study that Daniel has been doing is the inner part of the stateless hardware programming. This is not very specific at this point, but this kind of hardware has hundreds of variable-sized parameters packed into registers at different bit locations. What the study revealed is that this code often misses some integer sign and size considerations. This may lead to misprogramming of the hardware in corner cases, errors that would generally be prevented by the Rust compiler. I personally think we could do more than just safety, and improve how we program these registers with the Rust language, but at this stage this is purely a matter of choice and preference.
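
To illustrate the sign and size considerations, here is a minimal Rust sketch of packing fields into a 32-bit register word. The register layout (a 9-bit width field and a 6-bit signed QP delta) and the names `pack_field` and `build_reg` are hypothetical and not taken from any real driver. The range checks are ordinary run-time logic; what the language contributes is that mixing `u16`, `i8`, and `u32` requires explicit conversions, so each narrowing or sign change is visible in the code.

```rust
/// Packs `value` into bits [shift, shift + width) of a 32-bit register word.
/// Returns None if the field does not fit in the register or the value does
/// not fit in the field, instead of silently truncating.
pub fn pack_field(reg: u32, value: u32, shift: u32, width: u32) -> Option<u32> {
    if width == 0 || width > 32 || shift > 32 - width {
        return None;
    }
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    if value > mask {
        return None;
    }
    Some((reg & !(mask << shift)) | (value << shift))
}

/// Hypothetical register layout, for illustration only:
///   bits 0..=8   picture width in macroblocks (unsigned, 9 bits)
///   bits 9..=14  QP delta (signed, 6-bit two's complement, -32..=31)
pub fn build_reg(width_mbs: u16, qp_delta: i8) -> Option<u32> {
    // Widening u16 -> u32 is lossless, so `From` is available.
    let reg = pack_field(0, u32::from(width_mbs), 0, 9)?;

    // The signed value must be range-checked before it is encoded.
    if !(-32..=31).contains(&qp_delta) {
        return None;
    }
    // Reinterpret as two's complement, then keep only the low 6 bits.
    let encoded = u32::from(qp_delta as u8) & 0x3f;
    pack_field(reg, encoded, 9, 6)
}
```

With this layout, `build_reg(512, 0)` and `build_reg(1, 40)` both return `None` rather than writing a truncated value into a neighbouring field.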

Giving Rust a chance for in-kernel codecs

ocrete — Mon, 29 Apr 2024 19:11:30 +0000

What you are describing is more or less exactly how stateless drivers work. Userspace parses the bitstream and fills this information into C structures; the kernel then reads them and copies the information almost directly into the registers to program the hardware. Most of these drivers are similar to DRM drivers in that the kernel mostly deals with allocating buffers, lifetime, telling the hardware where it's allowed to read and write, etc.

And although the legacy v4l2 API allows one to mmap the buffers directly, the modern way is just to export them as a dmabuf which works like any other dmabuf.

Giving Rust a chance for in-kernel codecs

farnz — Mon, 29 Apr 2024 16:42:28 +0000

I think it's also worth calling out the DMA-BUF UAPI as a prerequisite for change. Without DMA-BUF, the only ways to move data between hardware blocks involve either copies into userspace memory, or having a subsystem-specific mechanism like the V4L2 media controller API to move data around in kernelspace within a single subsystem.

DMA-BUF enables you to export a handle to data to userspace, which can then pass it to a different kernel subsystem. This allows you to elide the copies, and export data from V4L2 to DRM, or from DRM to V4L2.

Giving Rust a chance for in-kernel codecs

flussence — Mon, 29 Apr 2024 16:20:08 +0000

I see what you're proposing, though that'd be an enormous amount of work. It'd be doing mostly the same thing as the DRM KMS/GBM/TTM/DRI3/etc transition, and although that has delivered solid improvements it's also been an ongoing effort for 20 years now.

With modern media hardware the main obstacle is becoming middle managers desperately trying to put ever more useless buzzword-chasing junk in the devices. Webcam utility plateaus at "looks okay in standard room illumination, provides the functionality of a point and click camera from 2005, does not eat cpu like a winmodem" and anything added past that is fluff and an extra point of failure the driver has to account for, even if nothing in userspace will ever take advantage of it. Sound cards went through the same blight at the turn of the century.

Giving Rust a chance for in-kernel codecs

farnz — Mon, 29 Apr 2024 16:00:02 +0000

There is an alternative option, in drivers/gpu/drm, where your in-kernel component handles device management (buffer handling, command submission etc), and you supply an open-source userspace component (doesn't have to be part of Mesa3D) that implements the majority of the codec driver.

Giving Rust a chance for in-kernel codecs

leromarinvit — Mon, 29 Apr 2024 14:44:36 +0000

I get the impression that the reason we have in-kernel codecs is an impedance mismatch between (modern) hardware and the v4l2 api. Handing the raw bitstream to the kernel and expecting decoded frames in return is a concept that made sense with decoders that handle everything in hardware, but it doesn't with stateless decoders that only implement the expensive parts.

From my perspective (as someone who has never worked with v4l2 or video codecs in general, so take this with a grain of salt) it would make more sense for the kernel to expose the operations the hardware actually implements, and let a user space library deal with mangling the raw bitstream into whatever the hardware accepts. What is to be gained by doing this in the kernel? Is transparently supporting stateless codecs in v4l2 really worth the downsides of having full blown media format parsers in the kernel?

Giving Rust a chance for in-kernel codecs

flussence — Mon, 29 Apr 2024 04:15:04 +0000

v4l2 has to do a lot of container format framing and bitstream marshalling, and the hardware usually just does the compression and decompression which is computationally heavy but rigid in its scope. Media formats are also sometimes link-layer formats, as in DVB MPEG-TS packets, so it also has to handle corrupted or missing data robustly. It'd be pretty bad if someone figured out a kernel 0-day that could be broadcast over DVB-T airspace.

It's almost exactly the same deal as encryption - you can have silicon that does RSA or SHA2 really fast, but it probably doesn't know what X.509 or ASN.1 are and so the responsibility for juggling them lies with the kernel (and historically those have also been a security headache).

Giving Rust a chance for in-kernel codecs

tialaramex — Sat, 27 Apr 2024 23:04:06 +0000

I think it might be beneficial to expand, given the context, on what exactly WUFFS is (and is not) doing here to pull this off.

The main trick is that WUFFS is only _checking_ a proof, so while a proof assistant is very complicated and barely plausible at the scale we need so it will be annoyingly slow, WUFFS is just checking a proof which is much faster. But how do we get a proof? Well, because it was conceived specifically for this purpose, to the exclusion of even basic features not needed for such work (e.g. it literally can't do "Hello, World") the language is designed so that mostly you are writing the things needed for a proof as you write code e.g. in WUFFS we specify not only the mechanical size of a data type when declaring a variable, like it's an 8-bit integer, but also specify the range, maybe this particular 8-bit integer must have values between 10 and 186 inclusive. WUFFS will ensure everywhere that this variable is assigned that the value is guaranteed to be between 10 and 186, and equally, when assigning from (or comparing against) this value, it will be known to be between 10 and 186 inclusive again.

You will need to write stuff that just wouldn't exist in other languages, to justify yourself, but it's still not violently dissimilar from the assertions or similar you might write in Rust or even C - it's just that they're always checked, whereas a C assertion is not checked in release code, and they're in mathematical terms, not imperative terms, if we say that x < 10 in this block of code we mean that *in every possible case* this is true, and WUFFS will refuse to compile the software unless it can see why that's true.

To this end, WUFFS knows a bunch more intro level discrete mathematics than a compiler often does and it can be told to apply specific rules it has been taught (which mathematicians have long proved true) to what it knows to achieve new knowledge e.g. if a < b and b < c, then a < c when writing such an assertion.

Because it isn't a prover or proof assistant, WUFFS won't try to figure out whether your code *could* be proved correct. It just checks the proof you've in effect written in the code itself.
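
For readers more familiar with Rust, the range-annotated integer can be loosely approximated with a newtype. This is not WUFFS syntax, and the important difference is that the sketch below checks the range at run time in its constructor, whereas WUFFS, as described above, refuses to compile unless the range is proven. The names `Ranged`, `Level`, and `bump` are made up for this example.

```rust
/// An 8-bit integer whose value is known to lie in MIN..=MAX.
/// The only way to build one is `new`, which checks the range once.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Ranged<const MIN: u8, const MAX: u8>(u8);

impl<const MIN: u8, const MAX: u8> Ranged<MIN, MAX> {
    /// Run-time check (the assumption in this sketch); WUFFS would
    /// instead require this to be provable at compile time.
    pub fn new(value: u8) -> Option<Self> {
        if (MIN..=MAX).contains(&value) {
            Some(Self(value))
        } else {
            None
        }
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

/// The example from the comment: a value between 10 and 186 inclusive.
pub type Level = Ranged<10, 186>;

/// Every `Level` is at most 186, so adding 69 cannot overflow a u8
/// (186 + 69 = 255). The type carries the fact; the reader, not the
/// compiler, has to do the arithmetic here.
pub fn bump(level: Level) -> u8 {
    level.get() + 69
}
```

The gap between this and WUFFS is exactly the last comment: Rust does not verify that `186 + 69` fits, it only makes the invariant harder to violate by accident.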

Giving Rust a chance for in-kernel codecs

ocrete — Sat, 27 Apr 2024 21:20:44 +0000

These are codecs which are largely implemented in hardware, so they need a driver. These are also not the codec accelerators that are part of the GPU that you see on desktop platforms, but they're independent hardware blocks on all the non-x86 chips.

Giving Rust a chance for in-kernel codecs

tialaramex — Sat, 27 Apr 2024 20:16:30 +0000

WUFFS-the-language is currently implemented as a transpiler, which produces C. It could in principle produce anything ("unsafe" Rust, Python, Java byte code) but since WUFFS isn't finished that's what it does today.

I don't know enough about the details of the work being done here to figure out whether WUFFS is the right tool for the job. Today WUFFS is an excellent (very fast yet entirely safe) way to produce codecs in software. It has no idea what a "string" is, which isn't a problem for this application space, and as you observed it doesn't emit bounds checks since it has necessarily checked your code can't have any bounds misses, so they would be redundant.

[If the only way to avoid bounds misses in your implementation is to check for them, which is probably a sign you've designed it wrong, you have to write them, and then WUFFS will see that your checks are sufficient and the code compiles, or maybe it won't and you just found a bug in your bounds checks...]

Giving Rust a chance for in-kernel codecs

Cyberax — Sat, 27 Apr 2024 17:03:27 +0000

> SPARK/Ada can do it

Not really. Ada sucks for anything that uses dynamic allocations or generic code. It only recently copy-pasted Rust's approach with borrowing, but it's still not nearly as advanced. Wuffs is probably the best practical tool for parsing.

Giving Rust a chance for in-kernel codecs

atnot — Sat, 27 Apr 2024 16:44:55 +0000

I'd say this is basically already the case for Rust with e.g. capable iterators that remove almost every case where you'd normally use indexing in languages without them. Of course that doesn't help you when you do, for whatever reason, need to index, but to be honest I think most of my codebases contain few if any instances of array indexing.
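
As a small illustration of the iterator style, here is a sketch of two helpers written without any explicit index. Both functions are hypothetical examples, not code from the kernel or from the article.

```rust
/// Sum of absolute differences between two rows of samples.
/// `zip` stops at the shorter row, so no access can go out of bounds.
pub fn sad(a: &[u8], b: &[u8]) -> u32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| u32::from(x.abs_diff(y)))
        .sum()
}

/// Averages the three channels of each packed RGB pixel.
/// `chunks_exact(3)` yields only complete pixels; a trailing partial
/// pixel is ignored rather than read past the end.
pub fn average_per_pixel(rgb: &[u8]) -> Vec<u8> {
    rgb.chunks_exact(3)
        .map(|px| {
            let sum: u16 = px.iter().map(|&v| u16::from(v)).sum();
            // sum <= 3 * 255, so sum / 3 always fits in a u8.
            (sum / 3) as u8
        })
        .collect()
}
```

Neither function contains a `[]` access, so there is no per-element bounds check for the compiler to elide in the first place.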

Giving Rust a chance for in-kernel codecs

dwlsalmeida — Sat, 27 Apr 2024 16:30:56 +0000

The Rust compiler will optimize away bound checks if it can prove that they are not needed through static analysis. You can also opt out of bound checks but that has to go into an unsafe{} block, see https://doc.rust-lang.org/std/vec/struct.Vec.html#method....
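
A minimal sketch of both options follows. The function names are made up for this example; `get_unchecked` is a real slice method in the standard library, used here as one instance of the opt-out. Whether the checks in the first function are actually removed depends on the optimizer and is not guaranteed by the language.

```rust
/// Safe version: the single assert up front tells the optimizer that
/// indices 0..4 are in bounds, so it can usually drop the four
/// per-access checks. If it cannot, the code is still correct.
pub fn sum_first_four(data: &[u8]) -> u32 {
    assert!(data.len() >= 4);
    u32::from(data[0]) + u32::from(data[1]) + u32::from(data[2]) + u32::from(data[3])
}

/// Opt-out version: no bounds checks are emitted for the accesses, and
/// the programmer takes responsibility for the indices being valid.
pub fn sum_first_four_unchecked(data: &[u8]) -> Option<u32> {
    if data.len() < 4 {
        return None;
    }
    let mut total = 0u32;
    for i in 0..4 {
        // SAFETY: i < 4 <= data.len(), as checked above.
        total += u32::from(unsafe { *data.get_unchecked(i) });
    }
    Some(total)
}
```

The `// SAFETY:` comment is a convention, not something the compiler checks; the argument it states is what a reviewer has to verify by hand.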

Giving Rust a chance for in-kernel codecs

walters — Sat, 27 Apr 2024 16:07:27 +0000

There's also https://github.com/google/wuffs in this space. I've only seen it referenced in this space before. I suspect the tradeoff boils down to the costs of introducing a 3rd programming language; bridging Rust and C is already hard enough.

Giving Rust a chance for in-kernel codecs

dvdeug — Sat, 27 Apr 2024 14:37:38 +0000

I wonder at what point we'll get a programming language in kernel that simply doesn't do bounds checks because it can prove they're not needed. SPARK/Ada can do it, and could be used; Coq and Idris are more powerful and advanced, but hardly kernel usable. It just seems like runtime bounds checks are a waste of time, when they can be made explicit under the control of the programmer, and the compiler can prove they're sufficient.

Giving Rust a chance for in-kernel codecs

atnot — Sat, 27 Apr 2024 13:02:24 +0000

> What is less clear to me is why we have in-kernel codecs in the first place. Is it faster to blit video to a Wayland server from the kernel, or do the codecs need low-level access to the GPU for acceleration?

The article touches on it pretty well I think, but the reason basically comes down to stateful vs stateless decoding hardware. In the olden days hardware media decoders were pretty simple as far as programmers were concerned. You just put the bytes from your file in one end and got pixel data out the other. And vice versa, which is e.g. how those high resolution IP cameras are so cheap.

However, among other things, implementing an entire complex codec including the file parsing logic this way is pretty inflexible and kind of wasteful when you have perfectly good CPU cores sitting there anyway. So in the newer stateless model, you instead favor implementing only the "hot loops" of the codec in hardware (some of which may even be shared by multiple codecs) and rely on the driver to pass in the required state. That requires a much deeper understanding of how the codec works, which can't really be fully offloaded to userspace because, similar to GPUs, the kernel still needs to validate that the potentially dangerous commands it's getting actually make sense before passing them to the hardware.

Giving Rust a chance for in-kernel codecs

gmatht — Sat, 27 Apr 2024 08:16:46 +0000

It seems to me that the reason to write in Rust would be quite obvious to anyone who hasn't lived under a rock for the last nine years. What is less clear to me is why we have in-kernel codecs in the first place. Is it faster to blit video to a Wayland server from the kernel, or do the codecs need low-level access to the GPU for acceleration?