Splitting implementation and interface in C++

Posted Feb 26, 2025 12:26 UTC (Wed) by farnz (subscriber, #17727)
In reply to: Splitting implementation and interface in C++ by ras
Parent article: Rewriting essential Linux packages in Rust

The problem is that shipping an updated .so in the way C or C++ do it runs the risk of invoking UB from the "safe" subset of Rust, and one of Rust's promises is that invoking UB requires you to use "unsafe Rust". Thus, just copying the C way of doing things isn't acceptable, because it can take a safe program and cause it to invoke UB.

For example, if you go deep into how Vec::shrink_to_fit is implemented internally, you find a set of tiny inline functions, each of which guarantees that an operation is safe, leading down to a monomorphic unsafe shrink_unchecked function that actually does the shrinking.

Because these are all shipped together, it's OK to rearrange where the various checks live; it would be acceptable to move a check out of shrink_unchecked into its callers, for example. But, in the example you describe, you've separated the callers (which are inlined into your binary) from the main body of code (in the shared object), and now we have a problem with updating the shared object; if you move a check from the main body into the callers, you now must know somehow that the callers are out-of-date and need recompiling before you can update the shared object safely.
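A minimal sketch of that split, using a hypothetical Buf type rather than the real standard library code: the safe, inlined shrink carries the check, and the out-of-line shrink_unchecked relies on it. The backing Vec<u8> is only there so the sketch stays safe to run.

```rust
/// Hypothetical stand-in for a growable buffer; not the real `Vec`/`RawVec`.
pub struct Buf {
    // The length of this Vec plays the role of the buffer's "capacity".
    data: Vec<u8>,
}

impl Buf {
    pub fn new(cap: usize) -> Self {
        Buf { data: vec![0; cap] }
    }

    pub fn capacity(&self) -> usize {
        self.data.len()
    }

    /// Safe entry point. Being `#[inline]`, it is the part that would be
    /// compiled into every caller's binary.
    #[inline]
    pub fn shrink(&mut self, new_cap: usize) {
        // The check lives in the inlined caller in this version.
        assert!(new_cap <= self.capacity(), "tried to grow via shrink");
        // SAFETY: `new_cap <= capacity` was checked on the line above.
        unsafe { self.shrink_unchecked(new_cap) }
    }

    /// Out-of-line body: the part that would live in the shared object.
    ///
    /// # Safety
    /// `new_cap` must not exceed the current capacity.
    #[inline(never)]
    unsafe fn shrink_unchecked(&mut self, new_cap: usize) {
        // No check here; correctness depends on every inlined caller.
        self.data.truncate(new_cap);
    }
}
```

Moving the assert! from shrink into shrink_unchecked (or back again) is a harmless refactor while both are compiled together; it only becomes a hazard once the two halves can come from different builds.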

C and C++ implementations handle this by saying that you must just know that your change (and a security fix is a change to existing stuff, breaking your rule that "new stuff can be added, but existing stuff can't be changed") is one that needs a recompile of dependents, and it's on you to get this right else you face UB for your mistakes. Rust is trying to build a world where you only face UB if you explicitly indicate to the compiler that you know that UB's a risk here, not one where a "trivial" cp new/libfoo.so.1 /usr/lib/libfoo.so.1 can create UB.

Splitting implementation and interface in C++

Posted Feb 26, 2025 13:13 UTC (Wed) by Wol (subscriber, #4433) (10 responses)

> Because these are all shipped together, it's OK to rearrange where the various checks live;

But if you've explicitly declared an interface, surely that means rearranging the checks across the interface is unsafe in and of itself, so the compiler won't do it ...

Cheers,
Wol

Rearranging across the interface

Posted Feb 26, 2025 14:15 UTC (Wed) by farnz (subscriber, #17727) (9 responses)

That's why I chose that particular example; the explicitly declared interface is:


impl<T, A: Allocator> Vec<T, A> {
    #[inline]
    pub fn shrink_to_fit(&mut self);
}

The compiler could stop you changing shrink_to_fit quite easily, because it's an external interface, but it uses a RawVec<T, A> as an implementation detail, which uses a heavily unsafe RawVecInner<A> as a monomorphic implementation detail. The current implementation of Vec::shrink_to_fit checks to see if the capacity is greater than the length, and if it is, calls the inline function RawVec::shrink_to_fit(self.buf, length). In turn, RawVec::shrink_to_fit simply calls the inline function RawVecInner::shrink_to_fit(self.inner, cap, T::LAYOUT) (which is a manual monomorphization so that RawVecInner is only generic over the allocator chosen, not the type in the vector). Following that, RawVecInner::shrink_to_fit arranges to panic if it can't shrink, and calls the inline function RawVecInner::shrink(&mut self, cap, layout). This then panics if you're trying to grow via a call to shrink, then calls the unsafe function RawVecInner::shrink_unchecked.
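The shape of that chain can be modelled in a few lines. This is a simplified sketch with hypothetical names (MiniVec, Raw, Inner); it tracks sizes instead of owning a real allocation, ignores the allocator parameter, and is not the standard library source.

```rust
use std::alloc::Layout;
use std::marker::PhantomData;

/// Innermost layer: not generic over `T`; it only sees the element layout
/// it is handed. (Model only: it records sizes, it does not allocate.)
pub struct Inner {
    cap: usize,
    bytes: usize,
}

impl Inner {
    #[inline]
    fn shrink_to_fit(&mut self, cap: usize, elem: Layout) {
        // The layer described above turns a failed shrink into a panic
        // here; this model cannot fail, so it just forwards.
        self.shrink(cap, elem);
    }

    #[inline]
    fn shrink(&mut self, cap: usize, elem: Layout) {
        assert!(cap <= self.cap, "tried to shrink to a larger capacity");
        // SAFETY: `cap <= self.cap` was checked on the line above.
        unsafe { self.shrink_unchecked(cap, elem) }
    }

    /// # Safety
    /// `cap` must not exceed the current capacity, and `elem` must be the
    /// layout of the element type this buffer was created for.
    #[inline(never)]
    unsafe fn shrink_unchecked(&mut self, cap: usize, elem: Layout) {
        self.cap = cap;
        self.bytes = cap * elem.size();
    }
}

/// Middle layer: generic over `T`, but only to supply `T`'s layout.
pub struct Raw<T> {
    inner: Inner,
    _elem: PhantomData<T>,
}

impl<T> Raw<T> {
    #[inline]
    fn shrink_to_fit(&mut self, cap: usize) {
        // The "manual monomorphization": erase `T` down to its layout.
        self.inner.shrink_to_fit(cap, Layout::new::<T>());
    }
}

/// Outer layer: the public, safe interface.
pub struct MiniVec<T> {
    buf: Raw<T>,
    len: usize,
}

impl<T> MiniVec<T> {
    pub fn with_capacity_and_len(cap: usize, len: usize) -> Self {
        assert!(len <= cap);
        let bytes = cap * Layout::new::<T>().size();
        MiniVec { buf: Raw { inner: Inner { cap, bytes }, _elem: PhantomData }, len }
    }

    pub fn capacity(&self) -> usize {
        self.buf.inner.cap
    }

    pub fn allocated_bytes(&self) -> usize {
        self.buf.inner.bytes
    }

    #[inline]
    pub fn shrink_to_fit(&mut self) {
        // Only call down when there is spare capacity to give back.
        if self.capacity() > self.len {
            self.buf.shrink_to_fit(self.len);
        }
    }
}
```

Everything except shrink_unchecked is marked #[inline], so in the dynamic linking scenario under discussion only that last function would be a candidate for living in the shared object.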

There are a lot of layers of inline function here, each doing one thing well and calling the next layer. But it would not be unreasonable to change things so that RawVecInner::shrink_unchecked does the capacity check that's currently in RawVecInner::shrink, and then have a later release move the capacity check back to RawVecInner::shrink. The reason they're split the way they are today is that LLVM's optimizer is capable of collapsing all of the checks in the inline functions into a single check-and-branch, but not of optimizing RawVecInner::shrink_unchecked on the assumption that the check will pass. Keeping the checks in the inline functions means that LLVM optimizes them all down to a single check-and-branch-to-cold-path, followed by the happy path code if all checks pass.

And note that the reason that this is split into so many tiny inline functions is that there's other callers in Vec that call different sequences of inline functions - rather than duplicate checks, they've been split into other functions so that you can call at the "right" point after your function-specific checks.

But, going back to the "compiler shouldn't do it"; why should it know that moving a check in one direction inside RawVecInner (which is an implementation detail) is not OK, but moving it in the other direction is OK? For this particular call chain, only RawVecInner::shrink_unchecked is going to be in the shared object, because the remaining layers (which are critical to the safety of this specific operation) are inlined.

Rearranging across the interface

Posted Feb 26, 2025 15:45 UTC (Wed) by Wol (subscriber, #4433) (8 responses)

So what you're saying is, if you the programmer move the checks across the interface boundary, the compiler has no way of knowing you've done it?

Hmmm ...

That is an edge case, but equally, you do want the compiler to catch it, and I can see why it wouldn't ... but if you're building a library I find it hard to see why you the programmer would want to do it - surely you'd either have both sides of the interface in a single crate, or you're explicitly moving stuff between a library and an application ... not good ...

Cheers,
Wol

Rearranging across the interface

Posted Feb 26, 2025 16:32 UTC (Wed) by farnz (subscriber, #17727) (4 responses)

No; I'm saying that if the compiler doesn't even know that this is an interface boundary, why would it bother detecting that you've moved code across the boundary in a fashion that's safe when statically linked, but not when dynamically linked?

Put concretely, in the private module raw_vec.rs (none of which is exposed as an interface boundary), I move a check from shrink_unchecked to shrink; how is the compiler supposed to know that this is not a safe movement to make, given that shrink is the only caller of shrink_unchecked? Further, how is it supposed to know that moving a check from shrink to shrink_unchecked is safe? And, just to make it lovely and hard, how is it supposed to distinguish "this check is safe to move freely" from "this check must not move"?
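To make the hazard concrete, here is a toy model with two hypothetical releases, v1 and v2. The out-of-line body is passed in as a function pointer to stand in for "whichever shared object happens to be installed", and the case that would be UB in real code is represented by an ordinary enum value so that the sketch stays safe to run.

```rust
/// Result of one modelled call.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Outcome {
    Rejected,  // a check caught the bad request
    Shrunk,    // the request was valid and was carried out
    WouldBeUb, // no check ran on a bad request; real code would hit UB here
}

/// Stand-in for the out-of-line function exported by the shared object.
pub type Body = fn(usize, usize) -> Outcome;

/// Release 1: the check lives in the out-of-line body.
pub mod v1 {
    use super::{Body, Outcome};

    // Inlined into the application binary at build time.
    pub fn inlined_shrink(old_cap: usize, new_cap: usize, body: Body) -> Outcome {
        body(old_cap, new_cap)
    }

    // Lives in the shared object.
    pub fn body_shrink_unchecked(old_cap: usize, new_cap: usize) -> Outcome {
        if new_cap > old_cap { Outcome::Rejected } else { Outcome::Shrunk }
    }
}

/// Release 2: the same check has moved into the inlined caller.
pub mod v2 {
    use super::{Body, Outcome};

    pub fn inlined_shrink(old_cap: usize, new_cap: usize, body: Body) -> Outcome {
        if new_cap > old_cap {
            return Outcome::Rejected;
        }
        body(old_cap, new_cap)
    }

    pub fn body_shrink_unchecked(old_cap: usize, new_cap: usize) -> Outcome {
        // No check any more: this body trusts its (v2) callers.
        if new_cap > old_cap { Outcome::WouldBeUb } else { Outcome::Shrunk }
    }
}
```

Each release is fine on its own, and a v2 caller with a v1 body merely checks twice. The only bad pairing is a v1 caller (already baked into an old binary) with a v2 body (a freshly copied shared object), where v1::inlined_shrink(4, 8, v2::body_shrink_unchecked) yields WouldBeUb. Nothing in either source tree ever contained that combination.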

And note that "checks" and "security fixes" look exactly the same to the compiler; some code has changed. How is the compiler supposed to distinguish a "good" change from a "bad" change?

Rearranging across the interface

Posted Feb 26, 2025 21:43 UTC (Wed) by Wol (subscriber, #4433) (3 responses)

> No; I'm saying that if the compiler doesn't even know that this is an interface boundary, why would it bother detecting that you've moved code across the boundary in a fashion that's safe when statically linked, but not when dynamically linked?

Because if the whole aim of this is to create a dynamic library, the compiler NEEDS to know this is an interface boundary, no?

Cheers,
Wol

Rearranging across the interface

Posted Feb 27, 2025 10:44 UTC (Thu) by farnz (subscriber, #17727) (2 responses)

But then you're getting into a mess around defining what is, and is not, a safe code change inside the ABI boundary. If you do make the internals of a crate (not the exported interface) the ABI boundary, you're now in a position where the compiler has to make a judgement call - "is this change inside the internals of a library a bad change, or a good change?".

Note that when making this judgement call, it can't just look at things like "is this moving a check across an internal boundary", since some moves across an internal boundary are safe, nor can you condition it on removing a check from inside the boundary (since I may remove an internal check that is guaranteed to be true since all the inline functions that can call this have always done an equivalent check, and I'm no longer expecting more inline functions without the check).

Rearranging across the interface

Posted Feb 27, 2025 14:07 UTC (Thu) by Wol (subscriber, #4433) (1 responses)

But if the crate IS the compilation object (as it would be if it's a library, no?) then surely the external boundary is the external declaration - what the library provides to all and sundry - and then there's no problem with any internal moves?

If an external application cannot see the boundary, then it's not a boundary! So you'd need to include the definition of all the Ts in Vec<T> you wanted to export, but the idea is that the crate presents a frozen interface to the outside world, and what goes on inside the crate is none of the caller's business. So internal boundaries aren't boundaries.

Cheers,
Wol

Rearranging across the interface

Posted Feb 27, 2025 14:12 UTC (Thu) by farnz (subscriber, #17727)

No, for performance reasons. We inline parts of our libraries (even in C, where the inlined parts go in the .h file) into their callers because the result of doing so is a massive performance boost from the optimizer - which can do things like reason "hey, len can't be zero here, so I can eliminate the code that handles the empty list case completely".
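A small illustration of the kind of reasoning meant here, with hypothetical function names. Whether the optimizer actually removes the branch depends on the compiler version and flags, so treat the comments as what inlining makes possible rather than a guarantee.

```rust
/// Library side: a tiny helper that has to handle the empty case in general.
#[inline]
pub fn first_or(items: &[u32], fallback: u32) -> u32 {
    if items.is_empty() {
        fallback // empty-list path
    } else {
        items[0]
    }
}

/// Application side: the list cannot be empty at the call site.
pub fn first_after_push(items: &mut Vec<u32>, x: u32) -> u32 {
    items.push(x);
    // After the push, `items.len() >= 1`. Once `first_or` is inlined here,
    // the optimizer can see that and may drop the empty-list path entirely.
    // If `first_or` were an opaque call into a shared object instead, the
    // check inside it would have to stay.
    first_or(items.as_slice(), 0)
}
```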

To get the sort of boundary you're describing, we do static linking and carefully hand-crafted interfaces for plugins. That's the state of play today, for everything from assembly through C to Agda and Idris; the goal, however, is to dynamically link, which means that we need to go deeper. And then we have a problem, because the moment you go deeper, your boundaries stop applying, thanks to inlining.

Rearranging across the interface

Posted Feb 26, 2025 18:04 UTC (Wed) by excors (subscriber, #95769) (2 responses)

As I understand it, the issue is that "interface boundary" can mean either "API boundary" or "ABI boundary". `Vec<T, A>::shrink_to_fit` is an API boundary; it's publicly documented and safe and has stability guarantees. But in a hypothetical Rust ABI, that API couldn't be an ABI boundary, because it depends on generic type parameters that aren't known when the .so is compiled.

`RawVecInner<A>::shrink_to_fit` could be an ABI boundary, because that doesn't depend on `T` (and we'll ignore `A`), but it's currently not an API boundary. It can't be made into a public API because its safety depends on non-trivial preconditions (like being told the correct alignment of `T`) and that'd be terrible API design - preconditions should be as tightly scoped as possible, within a function or module or crate. So you'd have to invent a new category of interface boundary, which is both an internal API and an ABI, with stability guarantees (including for the non-machine-checkable safety preconditions) and with tooling to help you fulfil those guarantees, which sounds really hard.

Rearranging across the interface

Posted Feb 26, 2025 22:46 UTC (Wed) by Wol (subscriber, #4433) (1 responses)

> So you'd have to invent a new category of interface boundary, which is both an internal API and an ABI, with stability guarantees (including for the non-machine-checkable safety preconditions) and with tooling to help you fulfil those guarantees, which sounds really hard.

Like putting the equivalent of a C .h in the crate?

But I would have thought if the compiler can prove the preconditions as part of a monolithic compilation, surely it must be able to encode them in some sort of .h interface in a library crate?

Of course, if you get two libraries calling each other, then the compiler might have to inject glue code to rearrange the structures passed between the two :-)

Cheers,
Wol

Rearranging across the interface

Posted Feb 27, 2025 12:43 UTC (Thu) by farnz (subscriber, #17727)

The compiler doesn't prove anything for the danger cases; it relies on the human assertion that they've checked that this unsafe block is safe, given the code that they can see today.

The challenge is that we're talking about separating the unsafe block (in an inline function) from the unsafe fn it calls (in the shared object); this means that the human not only has to consider the unsafe code as it stands today, but all possible future and past variants on the unsafe code, otherwise Rust's safety promise is not upheld.

That's clearly an intractable problem; the question is about reducing it down to a tractable problem. There's three basic routes to make it tractable:

1. Ignore the problem, and rely on the humans being infallible and remembering to change a "version identifier" when making a change to both the unsafe fn and its inlined callers. This makes it far too likely that you'll breach Rust's safety promises for Rust to adopt this.
2. Ensure that there's an ABI change whenever anything in the body of the unsafe fn changes. This is problematic, because security fixes are likely to change the body, and those are the cases where you most want to be able to swap in a new shared object.
3. Build a reverse tree of inline functions in this library that call the unsafe fn, and ensure that if any of them changes, the unsafe fn's ABI changes. This means that you can change the unsafe fn freely as needed for a security fix, but if you change any inlined caller, or add a new inlined caller, you change the ABI of the unsafe fn and require a new shared object, even if this change was harmless.

There's room to be sophisticated with symbol versioning in all cases; for example, you can have a human assert that this version of the unsafe fn is compatible with the inlined callers from older versions (thus allowing a swap of a shared object), or in case 3 you can use it to allow new inlined callers to use a new shared object, while allowing the existing ones to use either old or new shared objects.
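As a rough sketch of the third route, assume a hypothetical build tool that can list the inline callers of an unsafe fn along with their source text. It could derive the exported symbol's version tag from those callers only. The abi_tag function, the InlineCaller type and the tag format are all invented for illustration; a real tool would presumably hash something sturdier than raw source text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One inline function that (directly or indirectly) calls the unsafe fn.
pub struct InlineCaller<'a> {
    pub name: &'a str,
    pub body_source: &'a str,
}

/// Derive a version tag for `unsafe_fn_name` from its inline callers.
/// The body of the unsafe fn itself is deliberately NOT an input, so a
/// fix inside it leaves the tag (and so the modelled ABI) unchanged.
pub fn abi_tag(unsafe_fn_name: &str, inline_callers: &[InlineCaller]) -> String {
    // Sort by name so the tag does not depend on declaration order.
    let mut sorted: Vec<&InlineCaller> = inline_callers.iter().collect();
    sorted.sort_by(|a, b| a.name.cmp(b.name));

    let mut hasher = DefaultHasher::new();
    for caller in sorted {
        caller.name.hash(&mut hasher);
        caller.body_source.hash(&mut hasher);
    }
    format!("{}@CALLERS_{:016x}", unsafe_fn_name, hasher.finish())
}
```

With this scheme, editing shrink_unchecked alone keeps the tag stable, while editing or adding any inline caller produces a new tag, which is the "even if this change was harmless" cost mentioned in route 3.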

In all cases, though, the trouble is preventing the human proofs of correctness being invalidated by creating new combinations of inline functions and out-of-line unsafe code that weren't present in any source version; you want the combinations to be ones that a human has approved.