
DebugFS on Rust

By Daroc Alden
October 22, 2025

Kangrejos

DebugFS is the kernel's anything-goes, no-rules interface: whenever a kernel developer needs quick access to internal details of the kernel to debug a problem, or to implement an experimental control interface, they can expose them via DebugFS. This is possible because DebugFS is not subject to the normal rules for user-space-interface stability, nor to the rules about exposing sensitive kernel information. Supporting DebugFS in Rust drivers is an important step toward being able to debug real drivers on real hardware. Matthew Maurer spoke at Kangrejos 2025 about his recently merged DebugFS bindings for Rust.

Maurer began with an overview of DebugFS, including the things that make implementing a Rust API tricky. DebugFS files should outlive the private data that they allow access to, in case someone holds a file descriptor open after the underlying object has gone away. Also, DebugFS directory entries can be removed at any time, or will be automatically removed when the parent directory entry is destroyed. "That will come back to haunt us." Finally, DebugFS directories have to be manually torn down; they aren't scoped to an individual kernel module.

All of this comes together to make a set of lifetime constraints that's difficult to faithfully model in Rust. At first, Maurer thought to implement a DebugFS file as a weak reference-counted pointer to a Rust trait object. That doesn't work for several reasons, including the fact that DebugFS files don't have a destruction callback. Also, DebugFS gives files one word of private data — normally used as a pointer to the object they are concerned with — but Rust pointers to trait objects are two words wide (one pointer to the object, and one to its virtual method table).
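The width mismatch described above can be checked in ordinary user-space Rust. The following is a minimal sketch, not kernel code; the helper names are made up for illustration, and the exact sizes reflect how current Rust compilers lay out pointers rather than a documented guarantee.

```rust
use std::fmt::Display;
use std::mem::size_of;

/// Size in bytes of a "thin" pointer to a concrete, sized type.
pub fn thin_pointer_size() -> usize {
    size_of::<&u32>()
}

/// Size in bytes of a pointer to a trait object: in practice one word for
/// the data pointer plus one word for the vtable pointer.
pub fn trait_object_pointer_size() -> usize {
    size_of::<&dyn Display>()
}

/// Boxing the trait object a second time adds a heap indirection, but the
/// outer pointer is thin again and would fit in one word of private data.
pub fn double_boxed_size() -> usize {
    size_of::<Box<Box<dyn Display>>>()
}
```

On a 64-bit target this would be expected to report 8, 16, and 8 bytes respectively: the trait-object pointer is the only one that does not fit in a single word.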

These problems aren't insurmountable — Maurer could have just added an additional pointer indirection — but that wouldn't be elegant. He wanted to find a solution that naturally fits with the lifecycle of a DebugFS directory entry, while only having one word of private data and minimal overhead. The design that Maurer ended up proposing was to have the directory entries reference-counted such that they are not destroyed until all of their child objects have been dropped, and the directory itself has been dropped. To accomplish this, two different interfaces would be exposed to Rust: a simple one for DebugFS directories with simple lifetimes, as well as a more complex, general one.

The simpler API, which Maurer called the "File API", has the DebugFS file actually own its associated data. Exposing some existing Rust data is as simple as wrapping it in a debugfs::File<T>; by default, the read and write operations for the file will convert the value to or from a string and read or update it as appropriate. The programmer can attach their own callbacks, instead, to implement custom behaviors. The downside is that there is no way to have multiple files reference the same data (without some internal reference-counted pointer), and it's not possible to conditionally provide a file based on whether some run-time value is true or false.

The more complex API, the "Scope API", allows multiple files to refer to the same data, to refer to multiple separate structures in any combination, to create files conditionally, etc. In turn, it can't delete individual subdirectories or files — the whole DebugFS directory needs to be released at once.

Maurer went through examples of how to use each API; while a bit complex, the use of the File API could be substantially simplified if Rust gains built-in in-place initialization. Neither API was terribly surprising, but the obscure contortions (read: cool hacks) required to make them work efficiently were considerably more interesting.

Pointer smuggling

As previously mentioned, DebugFS provides only a single word of private data for file structures, which is ordinarily a pointer to the underlying data for the DebugFS file, a property that Maurer wanted to preserve. But part of the utility of DebugFS is that the developer can override the file operations with arbitrary functions; that makes it easy to trigger actions in a driver in response to reads or writes to a DebugFS file. It would be possible to do this by making the user of DebugFS fill out a struct file_operations, but Maurer wanted a less verbose API. The ergonomic way to encode this in the Rust APIs is to allow the programmer to attach a function or closure to the debugfs::File object. Somehow, those function pointers need to make their way into the file_operations structure used by DebugFS. But Maurer also didn't want the API to need to allocate space for the structure at run time — he wanted the appropriate structure to be generated statically, at compile time, making the entire Rust DebugFS interface allocation-free.

Maurer's solution relies on the fact that, in Rust, every function and closure has its own unique type at compile time. This is done because it makes it easier for the Rust compiler to apply certain optimizations: a call through a Rust function pointer can often be lowered to a direct jump or a jump through a dispatch table, instead of a call through an actual pointer. This makes Rust function types unique zero-sized types: there is no actual data associated with them, because the type is enough for the compiler to determine the address of the function.
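The zero-sized, one-type-per-function property can be observed directly in user-space Rust. This is an illustrative sketch only; the function names (read_counter, read_other, and the helpers) are invented for the example and have nothing to do with the DebugFS bindings.

```rust
use std::any::TypeId;
use std::mem::size_of;

fn read_counter() -> u32 {
    7
}

fn read_other() -> u32 {
    9
}

/// Reports the size of whatever callable type `F` was inferred to be.
pub fn callable_size<F: Fn() -> u32>(_f: &F) -> usize {
    size_of::<F>()
}

/// Returns true only if both arguments have exactly the same type.
pub fn same_type<A: 'static, B: 'static>(_a: &A, _b: &B) -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

/// Sizes of: a function item, a non-capturing closure, a closure that
/// captures a u32 by value, and a plain function pointer.
pub fn callable_sizes() -> (usize, usize, usize, usize) {
    let captured = 5u32;
    (
        callable_size(&read_counter),
        callable_size(&|| 1u32),
        callable_size(&move || captured),
        callable_size(&(read_counter as fn() -> u32)),
    )
}
```

Two functions with identical signatures still have different types until they are cast to a function pointer, at which point the address becomes run-time data and the value is one word wide.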

The *_callback_file() functions in his new API, which take callbacks to implement the read and write operations on a file, don't actually store the provided function pointers anywhere. Instead, the type of the callback is passed as a generic argument to the code that fills out instances of the file_operations structure. When the Rust code is monomorphized during compilation, a different file_operations structure is generated for each file that uses a different set of callbacks. The generic code turns the type of the function back into a pointer to the actual function itself, and calls it. Since the conversion is done at compile time, the pointer to the callback never actually has to be stored anywhere outside the file_operations structure at run time. This trick effectively "smuggles" the function pointer through the type system, which lets Maurer pass off the work of constructing all of the needed file_operations structures to the compiler's monomorphization implementation and avoid allocating.
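The general shape of this technique can be sketched in user-space Rust. Everything below is an assumption made for illustration: FileOps is a one-field stand-in for struct file_operations, and Smuggle, Registered, and register() are hypothetical names, not the API or the code that was merged. The Copy bound follows the restriction suggested during the discussion reported in this article.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical stand-in for C's `struct file_operations`.
pub struct FileOps {
    pub read: unsafe fn(data: *const ()) -> String,
}

/// Carries the callback purely as a type parameter; never instantiated.
pub struct Smuggle<T, F>(PhantomData<(T, F)>);

impl<T, F> Smuggle<T, F>
where
    T: 'static,
    F: Fn(&T) -> String + Copy + 'static,
{
    /// Compile-time check: a callback that captures state is rejected.
    const ZERO_SIZED: () = assert!(size_of::<F>() == 0, "callback must not capture state");

    /// One statically generated table per (T, F) pair; nothing is allocated.
    pub const OPS: &'static FileOps = &FileOps { read: Self::trampoline };

    unsafe fn trampoline(data: *const ()) -> String {
        let () = Self::ZERO_SIZED; // force the check when this is monomorphized
        // SAFETY (assumed argument): F is zero-sized, is inhabited because
        // `register` received a value of it, and is Copy, so it cannot rely
        // on a particular address. A dangling, aligned pointer is then a
        // valid reference to it.
        let callback: &F = unsafe { NonNull::<F>::dangling().as_ref() };
        // SAFETY: `register` derived `data` from a `&'static T`.
        let value: &T = unsafe { &*data.cast::<T>() };
        callback(value)
    }
}

/// What a registered file keeps: one word of private data plus the ops table.
pub struct Registered {
    data: *const (),
    ops: &'static FileOps,
}

/// The callback value is dropped here; only its type survives, in `OPS`.
pub fn register<T, F>(data: &'static T, _callback: F) -> Registered
where
    T: 'static,
    F: Fn(&T) -> String + Copy + 'static,
{
    Registered {
        data: (data as *const T).cast::<()>(),
        ops: Smuggle::<T, F>::OPS,
    }
}

impl Registered {
    pub fn read(&self) -> String {
        // SAFETY: `data` and `ops` were created together by `register`.
        unsafe { (self.ops.read)(self.data) }
    }
}
```

Registering the same static value twice with different callbacks yields two different tables, each of which calls its own callback directly. Passing a closure that captures a variable fails to compile because of the size assertion.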

The reaction to this explanation was mixed. While everyone present agreed that it was clever, and permitted writing a nice API, there was some sentiment that it might be too clever. Gary Guo pointed out one potential problem with the (unsafe) code that Maurer wrote to turn a function type back into an actual function pointer: while it was correct for function types, attempting to use it with other zero-sized types could cause undefined behavior, because it didn't ensure that internal invariants of the type are checked.

There are some zero-sized types where the actual address of the value is important, Guo explained. For example, a programmer could create a zero-sized type representing that the data at a particular address is readable. Alice Ryhl suggested restricting the function to only operate on types that implement the Copy trait, since they can't have invariants that rely on having a stable address. Maurer replied that he wasn't worried in this case, because the function was intended as an internal implementation detail of the DebugFS interface, but agreed that in the general case requiring the type to implement the Copy trait would make sense. One of the assembled developers asked Pierre-Emmanuel Patry whether he anticipated supporting code like this to be a problem for gccrs; he did not think that it would impose any additional burden, since some parts of the standard library already rely on the behavior of function types.

Andreas Hindborg asked for more details on why smuggling a pointer through the type system like this was permitted — specifically, why Maurer had claimed that the type needed to be "inhabited" for the trick to work. Zero-sized types can either have one valid value (the typical case), or no valid values, Maurer explained. So, if someone tried to use his trick to create a pointer to a type that exists, but where constructing a value of the type is impossible, they could break Rust's type system — which is why the helper function is unsafe.
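The difference between the two kinds of zero-sized type can be made concrete with a small user-space sketch. Token and Never are made-up names for illustration; they are not types from the kernel's bindings.

```rust
use std::mem::size_of;

/// A zero-sized type with exactly one valid value (the typical case).
pub struct Token;

/// A zero-sized type with no valid values: it has no variants to construct.
pub enum Never {}

/// Both types occupy zero bytes, so size alone cannot tell them apart.
pub fn zst_sizes() -> (usize, usize) {
    (size_of::<Token>(), size_of::<Never>())
}

/// Safe code can never reach the `Some` arm, because no `Never` exists.
/// The empty match is how the compiler accepts that as exhaustive.
pub fn describe(value: Option<Never>) -> &'static str {
    match value {
        None => "no value can exist",
        Some(never) => match never {},
    }
}
```

A size check therefore is not sufficient on its own: conjuring a reference to Never out of thin air would hand safe code a value that the type system says is impossible, which is the hazard that keeps the helper unsafe.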

Hindborg asked whether the pointer-smuggling trick was documented anywhere. Maurer replied: "It's well documented in the code", to general laughter. Guo asked whether they could just change the DebugFS C structure to have two pointers, and avoid this whole workaround. Maurer passed the question off to Greg Kroah-Hartman, who answered that he didn't think they could, because it would impact the layout of the inode structure, which is widely used outside DebugFS. In his opinion, this was a case of "you optimized for fun" — the equivalent C code just allocates and eats the cost of an additional pointer indirection. But he didn't think there was anything wrong with odd techniques being used here; in many ways, it's what DebugFS is there for.

Ultimately, the pointer-smuggling solution did remain in the final patch set that was merged for the 6.18 kernel. The trick is unlikely to be adapted for use in wider contexts in the kernel's Rust bindings, though.


Index entries for this article
Kernel: Development tools/Rust
Conference: Kangrejos/2025


So do they just leak on module unload?

Posted Oct 23, 2025 7:26 UTC (Thu) by taladar (subscriber, #68407) [Link] (1 responses)

> Finally, DebugFS directories have to be manually torn down; they aren't scoped to an individual kernel module.

So what happens on module unload here? Do they just leak if the module doesn't implement cleanup? Can the same module clean them up if it is loaded again? Or a newer version of that module during development where the programmer initially forgets to implement cleanup?

Honestly, this seems like the exact kind of sloppy design common in C that Rust ownership is supposed to make harder to do accidentally.

So do they just leak on module unload?

Posted Oct 23, 2025 13:15 UTC (Thu) by daroc (editor, #160859) [Link]

As I understand it, they're tied to a specific directory name, not to a module. So any part of the kernel that asks to unload that directory will unload them. Therefore yes: a newer version of the same module can tear down old entries if it wants to.

On the Rust side of things, the files can persist until the File object is dropped or the scoped directory handle is dropped, depending on which API you use. Unless the driver takes special action to ensure those things leak, they should normally be dropped on module unload. So Rust drivers, at least, don't need to worry about that.

Rust functions are a bit more complicated than described

Posted Oct 23, 2025 21:17 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (2 responses)

> Maurer's solution relies on the fact that, in Rust, every function and closure has its own unique type at compile time. This is done because it makes it easier for LLVM to apply certain optimizations — a call through Rust function pointer can often be lowered to a direct jump or a jump through a dispatch table, instead of a call through an actual pointer. This makes Rust function types unique zero-sized types: there is no actual data associated with them, because the type is enough for the compiler to determine the address of the function.

Just to clarify things a bit, Rust actually has three different families of types that are relevant here:

* Function pointers (written as fn(...) -> ..., note the lowercase "f") - These are roughly equivalent to C function pointers, with the proviso that the pointee must be a safe function, or else the type must be prefixed with unsafe. In practice, these are not commonly used in Rust, because most contexts where you need them are better served by trait objects (dyn Trait, the Rust equivalent of C++ virtual method dispatch).
* Function item types - The type of a bare function name. For example, if foo() is declared somewhere in scope, then the expression "foo" has a function item type. These are zero-sized types as the article describes, but to be clear, LLVM does not "optimize" them. Instead, rustc emits a direct call to the underlying function at each call site (resolved during monomorphization, if necessary), and LLVM never even knows that we did any indirection in the first place.
* Closure types - For closures that do not capture anything, these are largely equivalent to function item types (the closure is emitted as if it was a top-level function, and callsites are emitted as direct calls). For capturing closures, these are anonymous structs that hold all the captured values, the closure is modified to take this struct as a hidden argument, and the callsites are again emitted as direct calls.

Function item types and closure types are unnameable within the Rust type system - that is, while you can write the name foo as the only value of the foo function item type, you cannot write the name of that type (e.g. in a function signature). To use these types, they must be bound to a generic parameter, and you must then indicate to the Rust compiler that said parameter is callable using the Fn traits (note capital "F"):

* FnOnce(...) -> ... is implemented for everything callable (all types mentioned above), but it only allows you to call it once (because the underlying type might be a closure that consumes a captured value, and can't be called again). This takes a receiver type of self, which would normally indicate that it must have a size known at compile time. But in a mad anomaly, there is a blanket Box<T: FnOnce + ?Sized> implementation, which somehow(???) moves an unsized object out of the Box and passes it through to the FnOnce trait as if this restriction did not exist.[1] So you can have an unsized FnOnce if it lives on the heap, at least (and I guess the standard library can just decide it doesn't want to follow the language's usual rules if they prove inconvenient).
* FnMut(...) -> ... is implemented for everything that you can call more than once, but requires you to have &mut to the callable (because it might be a closure that mutates one of its captured values). Unlike FnOnce, neither this trait nor Fn take a self receiver by value, so there's no issue with unsized types here.
* Fn(...) -> ... is implemented for all callables that don't mutate or consume state, including all function item types, closures that don't mutate their captures, and (safe) function pointers, as well as (shared) references to all of the above.

If the generic parameter is bound to a function item type or closure type, then in order for monomorphization to work, rustc must consider both the callsite and the callee's source code at the same time. That in turn implies that monomorphization has the opportunity to inline these calls, although I'm not sure whether rustc makes that decision by itself or somehow involves LLVM in the process. This can be done cross-crate, even when LTO is disabled, because (as mentioned) monomorphization has an overriding need to look at the source code anyway.

(There are also async functions, but those are mostly just syntactic sugar for regular functions that happen to return impl Future, and then all the really complicated stuff happens elsewhere in the type system.)

[1]: https://doc.rust-lang.org/src/alloc/boxed.rs.html#1967

Rust functions are a bit more complicated than described

Posted Oct 23, 2025 22:20 UTC (Thu) by daroc (editor, #160859) [Link] (1 responses)

Thank you; that's a helpful clarification. I knew about the distinction between function pointers and function types, but I had misremembered which part of the compiler used the type information to emit static calls. I'll add a correction.

Rust functions are a bit more complicated than described

Posted Oct 26, 2025 14:31 UTC (Sun) by pbonzini (subscriber, #60935) [Link]

For what it's worth, the same function type "rematerialization" is quite pervasive in QEMU's experimental Rust bindings.

The Rust wrappers for QEMU callbacks take the target Rust function as a type parameter, and a utility trait is added to Fn that causes a compiler error if that function is not zero-sized (https://lists.nongnu.org/archive/html/qemu-rust/2024-12/m...).


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds