Rust functions are a bit more complicated than described

Posted Oct 23, 2025 21:17 UTC (Thu) by NYKevin (subscriber, #129325)
Parent article: DebugFS on Rust

> Maurer's solution relies on the fact that, in Rust, every function and closure has its own unique type at compile time. This is done because it makes it easier for LLVM to apply certain optimizations — a call through Rust function pointer can often be lowered to a direct jump or a jump through a dispatch table, instead of a call through an actual pointer. This makes Rust function types unique zero-sized types: there is no actual data associated with them, because the type is enough for the compiler to determine the address of the function.

Just to clarify things a bit, Rust actually has three different families of types that are relevant here:

* Function pointers (written as fn(...) -> ..., note the lowercase "f") - These are roughly equivalent to C function pointers, with the proviso that the pointee must be a safe function, or else the type must be prefixed with unsafe. In practice, these are not commonly used in Rust, because most contexts where you need them are better served by trait objects (dyn Trait, the Rust equivalent of C++ virtual method dispatch).
* Function item types - The type of a bare function name. For example, if foo() is declared somewhere in scope, then the expression "foo" has a function item type. These are zero-sized types as the article describes, but to be clear, LLVM does not "optimize" them. Instead, rustc emits a direct call to the underlying function at each call site (resolved during monomorphization, if necessary), and LLVM never even knows that we did any indirection in the first place.
* Closure types - For closures that do not capture anything, these are largely equivalent to function item types (the closure is emitted as if it were a top-level function, and call sites are emitted as direct calls). For capturing closures, these are anonymous structs that hold all the captured values; the closure body takes this struct as a hidden argument, and call sites are again emitted as direct calls.
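One quick way to see the difference between the three families is to compare the sizes of their values. This is a minimal sketch: `foo` is a stand-in for the hypothetical function named in the text, and the pointer-sized result for the function pointer assumes a mainstream target where code pointers are as wide as usize.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical function standing in for the `foo` mentioned above.
fn foo() -> u32 {
    7
}

fn main() {
    // Function item type: zero-sized; the type alone identifies `foo`.
    let item = foo;
    assert_eq!(size_of_val(&item), 0);

    // Function pointer: coercing the item erases which function it was,
    // so the value has to carry an actual code address.
    let ptr: fn() -> u32 = foo;
    assert_eq!(size_of_val(&ptr), size_of::<usize>());

    // Non-capturing closure: zero-sized, just like a function item.
    let no_capture = || 7u32;
    assert_eq!(size_of_val(&no_capture), 0);

    // Capturing closure: an anonymous struct holding the captured u64.
    let x: u64 = 7;
    let capture = move || x as u32;
    assert_eq!(size_of_val(&capture), size_of::<u64>());

    // All four are called with the same syntax.
    assert_eq!(item() + ptr() + no_capture() + capture(), 28);
}
```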

Function item types and closure types are unnameable within the Rust type system - that is, while you can write the name foo as the only value of the foo function item type, you cannot write the name of that type (e.g. in a function signature). To use these types, they must be bound to a generic parameter, and you must then indicate to the Rust compiler that said parameter is callable using the Fn traits (note capital "F"):

* FnOnce(...) -> ... is implemented for everything callable (all types mentioned above), but it only allows you to call it once (because the underlying type might be a closure that consumes a captured value, and can't be called again). This takes a receiver type of self, which would normally indicate that it must have a size known at compile time. But in a mad anomaly, there is a blanket Box<T: FnOnce + ?Sized> implementation, which somehow(???) moves an unsized object out of the Box and passes it through to the FnOnce trait as if this restriction did not exist.[1] So you can have an unsized FnOnce if it lives on the heap, at least (and I guess the standard library can just decide it doesn't want to follow the language's usual rules if they prove inconvenient).
* FnMut(...) -> ... is implemented for everything that you can call more than once, but requires a &mut reference to the callable (because it might be a closure that mutates one of its captured values). Unlike FnOnce, neither this trait nor Fn takes a self receiver by value, so there's no issue with unsized types here.
* Fn(...) -> ... is implemented for all callables that don't mutate or consume state, including all function item types, closures that don't mutate their captures, and (safe) function pointers, as well as (shared) references to all of the above.
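A small sketch exercising each trait bound through a generic parameter, plus the heap-allocated unsized FnOnce case. The helper names (`run_once`, `run_twice`, `run_shared`, `add_one`) are hypothetical, chosen only for illustration.

```rust
// Hypothetical helpers, one per trait bound.
fn run_once<F: FnOnce() -> String>(f: F) -> String {
    f() // consumes `f`; it cannot be called again
}

fn run_twice<F: FnMut() -> u32>(mut f: F) -> (u32, u32) {
    (f(), f()) // each call needs `&mut f`, hence `mut f`
}

fn run_shared<F: Fn(u32) -> u32>(f: &F, x: u32) -> u32 {
    f(x) // a shared reference is enough
}

fn add_one(x: u32) -> u32 {
    x + 1
}

fn main() {
    // FnOnce only: the closure moves its captured String out when called.
    let s = String::from("consumed");
    assert_eq!(run_once(move || s), "consumed");

    // FnMut: the closure mutates its captured counter.
    let mut n = 0;
    assert_eq!(run_twice(|| { n += 1; n }), (1, 2));

    // Fn: function items, (safe) function pointers, non-mutating closures.
    let ptr: fn(u32) -> u32 = add_one;
    let k = 10;
    assert_eq!(run_shared(&add_one, 1), 2);
    assert_eq!(run_shared(&ptr, 1), 2);
    assert_eq!(run_shared(&|x: u32| x + k, 1), 11);

    // Box<dyn FnOnce>: an unsized closure on the heap, still callable by value.
    let boxed: Box<dyn FnOnce() -> String> = Box::new(|| String::from("boxed"));
    assert_eq!(boxed(), "boxed");
}
```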

If the generic parameter is bound to a function item type or closure type, then in order for monomorphization to work, rustc must consider both the callsite and the callee's source code at the same time. That in turn implies that monomorphization has the opportunity to inline these calls, although I'm not sure whether rustc makes that decision by itself or somehow involves LLVM in the process. This can be done cross-crate, even when LTO is disabled, because (as mentioned) monomorphization has an overriding need to look at the source code anyway.

(There are also async functions, but those are mostly just syntactic sugar for regular functions that happen to return impl Future, and then all the really complicated stuff happens elsewhere in the type system.)
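To make the "syntactic sugar" point concrete, here is a sketch contrasting an async fn with a hand-written fn that returns impl Future. Polling either one needs a waker; the no-op waker and the `poll_once` helper are hypothetical scaffolding for this illustration, not a real executor, and they only work here because neither future ever returns Pending.

```rust
use std::future::Future;
use std::pin::pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// The `async fn` form.
async fn add_async(a: u32, b: u32) -> u32 {
    a + b
}

// Roughly what it desugars to: an ordinary fn returning `impl Future`.
fn add_desugared(a: u32, b: u32) -> impl Future<Output = u32> {
    async move { a + b }
}

// A waker whose operations all do nothing (illustration only).
fn noop_raw_waker() -> RawWaker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    unsafe fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(ptr::null(), &VTABLE)
}

// Polls a future exactly once and reports the result.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    // SAFETY: every vtable entry is a no-op, which trivially upholds
    // the RawWaker contract.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let fut = pin!(fut);
    fut.poll(&mut cx)
}

fn main() {
    assert_eq!(poll_once(add_async(40, 2)), Poll::Ready(42));
    assert_eq!(poll_once(add_desugared(40, 2)), Poll::Ready(42));
}
```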

[1]: https://doc.rust-lang.org/src/alloc/boxed.rs.html#1967



Rust functions are a bit more complicated than described

Posted Oct 23, 2025 22:20 UTC (Thu) by daroc (editor, #160859)

Thank you; that's a helpful clarification. I knew about the distinction between function pointers and function types, but I had misremembered which part of the compiler used the type information to emit static calls. I'll add a correction.

Rust functions are a bit more complicated than described

Posted Oct 26, 2025 14:31 UTC (Sun) by pbonzini (subscriber, #60935)

For what it's worth, the same function type "rematerialization" is quite pervasive in QEMU's experimental Rust bindings.

The Rust wrappers for QEMU callbacks take the target Rust function as a type parameter, and a utility trait layered on top of Fn causes a compile-time error if that function is not zero-sized (https://lists.nongnu.org/archive/html/qemu-rust/2024-12/m...).
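A minimal sketch of that general pattern, assuming a single callback signature Fn(i32) -> i32 for brevity. `ZstCallback`, `call_from_type`, `trampoline`, and `as_c_callback` are hypothetical names; this is not QEMU's actual code, only an illustration of the idea. The callable is recreated from its type alone, and an associated constant turns a callable with a nonzero size into a compile-time error.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical helper trait, implemented for every `Fn(i32) -> i32`.
trait ZstCallback: Fn(i32) -> i32 + Sized {
    // Evaluated after monomorphization: instantiating `call_from_type`
    // for a callable with a nonzero size fails to compile.
    const ASSERT_ZERO_SIZED: () = assert!(size_of::<Self>() == 0, "callback must be zero-sized");

    // Calls the callable without holding a value of it: the type alone
    // identifies the code to run.
    fn call_from_type(x: i32) -> i32 {
        let () = Self::ASSERT_ZERO_SIZED;
        // SAFETY (assumption of this sketch): `Self` is zero-sized, so a
        // dangling, non-null, well-aligned pointer is valid to reference.
        // This is only reasonable for ZSTs without invariants, such as
        // function items and non-capturing closures.
        let f: &Self = unsafe { NonNull::<Self>::dangling().as_ref() };
        f(x)
    }
}

impl<F: Fn(i32) -> i32> ZstCallback for F {}

// One C-ABI trampoline per callback type: no data pointer is needed,
// because `F` itself encodes which function to call.
extern "C" fn trampoline<F: ZstCallback>(x: i32) -> i32 {
    F::call_from_type(x)
}

// Takes the callable only to infer its type.
fn as_c_callback<F: ZstCallback>(_f: F) -> extern "C" fn(i32) -> i32 {
    trampoline::<F>
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    let cb = as_c_callback(double);
    assert_eq!(cb(21), 42);

    let cb = as_c_callback(|x: i32| x + 1);
    assert_eq!(cb(1), 2);

    // A capturing closure is not zero-sized, so this would be rejected
    // at compile time:
    // let y = 5;
    // let _ = as_c_callback(move |x: i32| x + y);
}
```

The design point is that `trampoline::<F>` is a plain function pointer with no accompanying data, which is what a C-style callback slot expects; a capturing closure would carry state with nowhere to live, so it is rejected instead of silently misbehaving.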

Rust functions are a bit more complicated than described

Posted Nov 25, 2025 9:55 UTC (Tue) by ras (subscriber, #33059)

> But in a mad anomaly, there is a blanket Box<T: FnOnce + ?Sized> implementation

I hadn't noticed FnOnce taking self rather than &mut self. But Box<T: FnOnce + ?Sized> follows the same sort of twisted logic that Box<[u8]> uses. In both cases, there will never be an actual instance of the unsized type. The instance had to be sized, because Box has to pass its size to the memory allocator. You can't call Box::<[u8]>::new() for that reason.

A variable of type Box<[u8]> will always hold an instance whose type was Box<[u8; N]>, which is a sized type. The [u8; N] value was put into the memory allocated by the Box by the compiler intrinsic box_new(), and it knows to store N along with the array. Later the &**self somehow constructs a slice from the information stored by box_new(). There is some magic going on in the &**, because how did it know the value it is dereferencing was created by box_new()? But apart from that it seems pedestrian.

Similarly, the variable may have type Box<T: FnOnce + ?Sized>, but the thing stored in the box isn't ?Sized - it has a very definite size, and just like the &**self, FnOnce::call_once() could know what it was, if it cared. I don't think the rules about self being sized are being bent too much, at least not more than &**self bends them.
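One way to check where the size information for the unsized value actually lives is to measure the Box values themselves. A minimal sketch, assuming a mainstream target: a Box of a sized type is one pointer wide, while Box<[u8]> and Box<dyn FnOnce() -> String> are two pointers wide, because the unsizing coercion stores the slice length, or a vtable pointer, alongside the data pointer inside the Box. The vtable records the concrete closure's size and alignment, which is what size_of_val reads.

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    // The allocation is made for a sized [u8; 3]; the Box is a thin pointer.
    let sized: Box<[u8; 3]> = Box::new([1, 2, 3]);
    assert_eq!(size_of::<Box<[u8; 3]>>(), size_of::<usize>());

    // Unsizing coercion: the length becomes pointer metadata in the Box.
    let slice: Box<[u8]> = sized;
    assert_eq!(size_of::<Box<[u8]>>(), 2 * size_of::<usize>());
    assert_eq!(slice.len(), 3);

    // For a boxed closure the metadata is a vtable pointer; the vtable
    // knows the concrete closure's size (here: one captured String).
    let s = String::from("moved out");
    let boxed: Box<dyn FnOnce() -> String> = Box::new(move || s);
    assert_eq!(size_of::<Box<dyn FnOnce() -> String>>(), 2 * size_of::<usize>());
    assert_eq!(size_of_val(&*boxed), size_of::<String>());
    assert_eq!(boxed(), "moved out");
}
```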

The quirk I did notice is that Box is defined as a tuple struct with two fields, viz: Box<...>(Unique<T>, Allocator). So presumably &**self in deref() is being interpreted as &**self.0. I don't know why &**self is acceptable. The allocator will be a zero-sized type, but if I use a ZST in that way, I'm still not allowed to dereference the tuple with * in my code.

It's all very magic.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.