LWN: Comments on "Rewriting essential Linux packages in Rust" https://lwn.net/Articles/1007907/ This is a special feed containing comments posted to the individual LWN article titled "Rewriting essential Linux packages in Rust". en-us Mon, 22 Sep 2025 01:40:50 +0000 Mon, 22 Sep 2025 01:40:50 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Writing clarity request https://lwn.net/Articles/1012879/ https://lwn.net/Articles/1012879/ sammythesnake <div class="FormattedComment"> My dad reached a certain point in life when he had to get glasses, not because his eyesight was getting worse, he insisted, but because his arms were getting shorter...<br> <p> I passed the same milestone myself some time ago, too :-/<br> </div> Tue, 04 Mar 2025 11:16:11 +0000 fix uniq -c https://lwn.net/Articles/1012552/ https://lwn.net/Articles/1012552/ jkingweb <div class="FormattedComment"> The trouble with writing a drop-in replacement for X while trying to also improve on X is that as X changes, you may cease to be a drop-in replacement because your enhancements become incompatible with new functionality. 
MariaDB ran into this problem.<br> </div> Sat, 01 Mar 2025 17:00:48 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1012259/ https://lwn.net/Articles/1012259/ mb <div class="FormattedComment"> The default for cargo install is to ignore the lock file:<br> <a href="https://doc.rust-lang.org/cargo/commands/cargo-install.html#dealing-with-the-lockfile">https://doc.rust-lang.org/cargo/commands/cargo-install.ht...</a><br> <p> There's no need to use a lock file or to use an online crates forge.<br> You can just use what is available in your distribution, with --offline.<br> <p> I don't really get why this would be a nontrivial task.<br> <p> <span class="QuotedText">&gt;I dislike just ignoring the</span><br> <p> Well, you can either use it or ignore it.<br> If you don't want to use it because you want to use your own packaged dependencies, there's only the option to ignore it, right?<br> </div> Thu, 27 Feb 2025 19:11:49 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1012050/ https://lwn.net/Articles/1012050/ Cyberax <div class="FormattedComment"> I checked a few Rust applications, and most of them do have lockfiles. So something will need to be done with that. I dislike just ignoring them, but then perhaps some form of a limited override should be OK?<br> <p> And yep, it's strictly better than C. It's just not a trivial task...<br> </div> Thu, 27 Feb 2025 18:41:28 +0000 Rearranging across the interface https://lwn.net/Articles/1012135/ https://lwn.net/Articles/1012135/ farnz No, for performance reasons. We inline parts of our libraries (even in C, where the inlined parts go in the <tt>.h</tt> file) into their callers because the result of doing so is a massive performance boost from the optimizer - which can do things like reason "hey, <tt>len</tt> can't be zero here, so I can eliminate the code that handles the empty list case completely".
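A minimal sketch of the kind of optimization described above, using hypothetical function names (not taken from any real library). Because `average` is marked `#[inline]`, its body is available at each call site; in `average_of_window` the slice length is fixed at 4, so after inlining the optimizer may treat the empty-slice branch as dead code. Whether it actually removes it is up to the compiler; the code only shows the shape that makes this possible. If `average` were reached through a dynamically linked symbol instead, that branch would have to stay.

```rust
/// Hypothetical library function. `#[inline]` makes its body available to
/// callers in other crates, so it can be optimized separately at each call site.
#[inline]
pub fn average(values: &[u64]) -> Option<u64> {
    if values.is_empty() {
        // Empty-list path: at any call site where the optimizer can prove
        // `values.len() != 0`, this branch is dead and may be removed.
        return None;
    }
    Some(values.iter().sum::<u64>() / values.len() as u64)
}

/// Hypothetical caller whose input can never be empty.
pub fn average_of_window(window: &[u64; 4]) -> u64 {
    // `window` always has length 4, so once `average` is inlined here the
    // `is_empty` test is known to be false and the `None` path can go.
    average(window).expect("a 4-element window is never empty")
}

fn main() {
    println!("{:?}", average(&[])); // None
    println!("{}", average_of_window(&[2, 4, 6, 8])); // 5
}
```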
<p>To get the sort of boundary you're describing, we do static linking and carefully hand-crafted interfaces for plugins. That's the state of play today, for everything from assembly through C to Agda and Idris; the goal, however, is to dynamically link, which means that we need to go deeper. And then we have a problem, because the moment you go deeper, your boundaries stop applying, thanks to inlining. Thu, 27 Feb 2025 14:12:18 +0000 Rearranging across the interface https://lwn.net/Articles/1012132/ https://lwn.net/Articles/1012132/ Wol <div class="FormattedComment"> But if the crate IS the compilation object (as it would be if it's a library, no?) then surely the external boundary is the external declaration - what the library provides to all and sundry - so there's no problem with any internal moves?<br> <p> If an external application cannot see the boundary, then it's not a boundary! So you'd need to include the definition of all the Ts in Vec&lt;T&gt; you wanted to export, but the idea is that the crate presents a frozen interface to the outside world, and what goes on inside the crate is none of the caller's business. So internal boundaries aren't boundaries.<br> <p> Cheers,<br> Wol<br> </div> Thu, 27 Feb 2025 14:07:03 +0000 Rearranging across the interface https://lwn.net/Articles/1012079/ https://lwn.net/Articles/1012079/ farnz The compiler doesn't prove anything for the danger cases; it relies on a human's assertion that they've checked that this <tt>unsafe</tt> block is safe, given the code that they can see today. <p>The challenge is that we're talking about separating the <tt>unsafe</tt> block (in an inline function) from the <tt>unsafe fn</tt> it calls (in the shared object); this means that the human not only has to consider the unsafe code as it stands today, but all possible future and past variants of the unsafe code, otherwise Rust's safety promise is not upheld.
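A small sketch of the split being described, with hypothetical names: a safe `#[inline]` wrapper performs the bounds check, and an `unsafe fn` that would sit in the shared object relies on it. The `SAFETY:` comment is the human assertion mentioned above; the compiler does not verify it, and a binary that has inlined `element` keeps its own copy of the check even if a later shared object changes what `element_unchecked` requires.

```rust
/// Out-of-line half: in the scenario above, this is the part that would be
/// compiled into the shared object.
///
/// # Safety
/// The caller must guarantee that `idx < data.len()`.
#[inline(never)]
pub unsafe fn element_unchecked(data: &[i32], idx: usize) -> i32 {
    // SAFETY: delegated entirely to the caller's promise documented above.
    unsafe { *data.get_unchecked(idx) }
}

/// Inline half: its body is copied into every caller's binary.
#[inline]
pub fn element(data: &[i32], idx: usize) -> Option<i32> {
    if idx < data.len() {
        // SAFETY: `idx < data.len()` was just checked, which is the
        // precondition `element_unchecked` documents *in this version*.
        // This comment is the human "proof"; nothing re-checks it if a
        // later shared object changes that precondition.
        Some(unsafe { element_unchecked(data, idx) })
    } else {
        None
    }
}

fn main() {
    let data = [10, 20, 30];
    println!("{:?}", element(&data, 1)); // Some(20)
    println!("{:?}", element(&data, 5)); // None
}
```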
<p>That's clearly an intractable problem; the question is how to reduce it to a tractable one. There are three basic routes to make it tractable: <ol> <li>Ignore the problem, and rely on the humans being infallible and remembering to change a "version identifier" when making a change to both the <tt>unsafe fn</tt> and its inlined callers. This makes it far too likely that you'll breach Rust's safety promises for Rust to adopt this. <li>Ensure that there's an ABI change whenever anything in the body of the <tt>unsafe fn</tt> changes. This is problematic, because security fixes are likely to change the body, and those are the cases where you most want to be able to swap in a new shared object. <li>Build a reverse tree of inline functions in this library that call the <tt>unsafe fn</tt>, and ensure that if any of them changes, the <tt>unsafe fn</tt>'s ABI changes. This means that you can change the <tt>unsafe fn</tt> freely as needed for a security fix, but if you change any inlined caller, or add a new inlined caller, you change the ABI of the <tt>unsafe fn</tt> and require a new shared object, even if this change was harmless. </ol> <p>There's room to be sophisticated with symbol versioning in all cases; for example, you can have a human assert that this version of the <tt>unsafe fn</tt> is compatible with the inlined callers from older versions (thus allowing a swap of a shared object), or in case 3 you can use it to allow new inlined callers to use a new shared object, while allowing the existing ones to use either old or new shared objects. <p>In all cases, though, the trouble is preventing the human proofs of correctness from being invalidated by creating new combinations of inline functions and out-of-line unsafe code that weren't present in any source version; you want the combinations to be ones that a human has approved.
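One way to picture route 3 is a toy model, not a description of how rustc or any linker works: derive the `unsafe fn`'s ABI tag from the source text of its inline callers only, so that editing the `unsafe fn`'s own body leaves the tag unchanged while editing any inline caller changes it. The hash (FNV-1a) and the string representation of callers are arbitrary choices for illustration, and the function names are hypothetical.

```rust
/// FNV-1a: a simple, deterministic hash, used here only as a fingerprint.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical ABI tag for an `unsafe fn`, derived from the source of every
/// inline function that calls it. The unsafe fn's own body is deliberately
/// *not* hashed, so a security fix to it leaves the tag (and ABI) unchanged.
fn abi_tag(inline_callers: &[&str]) -> u64 {
    let mut callers = inline_callers.to_vec();
    callers.sort(); // the tag should not depend on declaration order
    fnv1a(callers.join("\n").as_bytes())
}

/// A binary records the tag it was built against; the loader refuses to
/// pair it with a shared object that exports a different tag.
fn can_link(built_against: u64, shared_object: u64) -> bool {
    built_against == shared_object
}

fn main() {
    let v1 = abi_tag(&["fn shrink(c) { assert!(c <= cap); shrink_unchecked(c) }"]);
    // Release 2 only patches the body of shrink_unchecked: callers unchanged.
    let v2 = abi_tag(&["fn shrink(c) { assert!(c <= cap); shrink_unchecked(c) }"]);
    // Release 3 moves the check out of the inline caller: the tag changes.
    let v3 = abi_tag(&["fn shrink(c) { shrink_unchecked(c) }"]);
    println!("{}", can_link(v1, v2));
    println!("{}", can_link(v1, v3));
}
```

Here `can_link(v1, v2)` holds because only the out-of-line body changed, while `can_link(v1, v3)` fails (barring a hash collision) because an inline caller changed, so an old binary would be refused rather than paired with the new shared object. The cost noted in route 3 is visible too: any edit to a caller changes the tag, even a harmless one.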
Thu, 27 Feb 2025 12:43:36 +0000 Rearranging across the interface https://lwn.net/Articles/1012072/ https://lwn.net/Articles/1012072/ farnz But then you're getting into a mess around defining what is, and is not, a safe code change inside the ABI boundary. If you do make the internals of a crate (<em>not</em> the exported interface) the ABI boundary, you're now in a position where the compiler has to make a judgement call - "is this change inside the internals of a library a bad change, or a good change?". <p>Note that when making this judgement call, it can't just look at things like "is this moving a check across an internal boundary", since some moves across an internal boundary are safe, nor can you condition it on removing a check from inside the boundary (since I may remove an internal check that is guaranteed to be true since all the inline functions that can call this have always done an equivalent check, and I'm no longer expecting more inline functions without the check). Thu, 27 Feb 2025 10:44:15 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1012045/ https://lwn.net/Articles/1012045/ mb <div class="FormattedComment"> It's fully up to the builder/distributor what to do with application level locks.<br> (I almost never use them and I almost always provide them.)<br> <p> As 1.1.124 and 1.1.123 are semantically compatible versions you can just upgrade all packages to 1.1.124. 
And it's also likely that it would work with 1.1.123, too.<br> <p> This is really not different at all from C library dependencies with backward-compatible versions.<br> Except that typical C applications simply don't provide any lock information, so you're fully on your own.<br> Providing lock information is better than providing no lock information.<br> <p> </div> Thu, 27 Feb 2025 05:27:00 +0000 Rearranging across the interface https://lwn.net/Articles/1012021/ https://lwn.net/Articles/1012021/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; So you'd have to invent a new category of interface boundary, which is both an internal API and an ABI, with stability guarantees (including for the non-machine-checkable safety preconditions) and with tooling to help you fulfil those guarantees, which sounds really hard.</span><br> <p> Like putting the equivalent of a C .h in the crate?<br> <p> But I would have thought if the compiler can prove the preconditions as part of a monolithic compilation, surely it must be able to encode them in some sort of .h interface in a library crate?<br> <p> Of course, if you get two libraries calling each other, then the compiler might have to inject glue code to rearrange the structures passed between the two :-)<br> <p> Cheers,<br> Wol<br> </div> Wed, 26 Feb 2025 22:46:10 +0000 Rearranging across the interface https://lwn.net/Articles/1012008/ https://lwn.net/Articles/1012008/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; No; I'm saying that if the compiler doesn't even know that this is an interface boundary, why would it bother detecting that you've moved code across the boundary in a fashion that's safe when statically linked, but not when dynamically linked? 
</span><br> <p> Because if the whole aim of this is to create a dynamic library, the compiler NEEDS to know this is an interface boundary, no?<br> <p> Cheers,<br> Wol<br> </div> Wed, 26 Feb 2025 21:43:54 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1012005/ https://lwn.net/Articles/1012005/ Cyberax <div class="FormattedComment"> <span class="QuotedText">&gt; Only the top level application crate lock matters.</span><br> <p> Yes, that's what I mean. One app can lock somelibrary#1.1.123, and another can lock it at somelibrary#1.1.124. If this is packaged naïvely, you'll end up with two shared objects for `somelibrary`.<br> <p> </div> Wed, 26 Feb 2025 21:25:33 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1012004/ https://lwn.net/Articles/1012004/ mb <div class="FormattedComment"> <span class="QuotedText">&gt;if your dependencies are locked at a different time.</span><br> <p> Dependency locks are ignored.<br> Only the top-level application crate lock matters.<br> <p> <span class="QuotedText">&gt;slightly different versions of libraries</span><br> <p> No. It can include several *incompatible* versions of the libraries with a different major semantic version.<br> </div> Wed, 26 Feb 2025 21:19:52 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1012003/ https://lwn.net/Articles/1012003/ Cyberax <div class="FormattedComment"> There are several issues. Rust is very happy to use slightly different versions of libraries, if your dependencies are locked at a different time.<br> <p> This is fine for static linking, but you don't generally want to end up with 15 versions of the same shared library, each with a slightly different patch version. So you ideally should be able to control the versions so that distro-provided libraries are used as much as possible, overriding Cargo's resolution mechanism.
Ideally, making sure that you get CVE fixes.<br> <p> Doing it properly is not trivial.<br> </div> Wed, 26 Feb 2025 21:14:29 +0000 Rearranging across the interface https://lwn.net/Articles/1011983/ https://lwn.net/Articles/1011983/ excors <div class="FormattedComment"> As I understand it, the issue is that "interface boundary" can mean either "API boundary" or "ABI boundary". `Vec&lt;T, A&gt;::shrink_to_fit` is an API boundary; it's publicly documented and safe and has stability guarantees. But in a hypothetical Rust ABI, that API couldn't be an ABI boundary, because it depends on generic type parameters that aren't known when the .so is compiled.<br> <p> `RawVecInner&lt;A&gt;::shrink_to_fit` could be an ABI boundary, because that doesn't depend on `T` (and we'll ignore `A`), but it's currently not an API boundary. It can't be made into a public API because its safety depends on non-trivial preconditions (like being told the correct alignment of `T`) and that'd be terrible API design - preconditions should be as tightly scoped as possible, within a function or module or crate. So you'd have to invent a new category of interface boundary, which is both an internal API and an ABI, with stability guarantees (including for the non-machine-checkable safety preconditions) and with tooling to help you fulfil those guarantees, which sounds really hard.<br> </div> Wed, 26 Feb 2025 18:04:28 +0000 Rearranging across the interface https://lwn.net/Articles/1011961/ https://lwn.net/Articles/1011961/ farnz <p>No; I'm saying that if the compiler doesn't even know that this is an interface boundary, why would it bother detecting that you've moved code across the boundary in a fashion that's safe when statically linked, but not when dynamically linked? 
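To illustrate excors's distinction between an API boundary and a possible ABI boundary, here is a sketch in the same spirit as <tt>RawVecInner</tt>, though it is not its actual code and the names are hypothetical: the generic wrapper is the only part that knows `T`, and it hands a `std::alloc::Layout` to a non-generic function that could, in principle, be compiled once into a shared object. The non-generic half is only correct if it is given the true layout, which is exactly the kind of non-machine-checkable precondition excors describes.

```rust
use std::alloc::Layout;

/// Non-generic half, in the spirit of `RawVecInner`: it never sees `T`, only
/// a `Layout`, so it could be compiled once and exported from a `.so`.
/// Its result is only meaningful if `elem` really is the element layout,
/// a precondition the compiler cannot check across an ABI boundary.
fn array_bytes_inner(elem: Layout, count: usize) -> Option<usize> {
    // Each element occupies its size rounded up to its alignment.
    let stride = elem.pad_to_align().size();
    stride.checked_mul(count)
}

/// Generic half, monomorphized in every caller. It is the only code that
/// knows `T`, so it is responsible for passing the correct layout.
#[inline]
pub fn array_bytes<T>(count: usize) -> Option<usize> {
    array_bytes_inner(Layout::new::<T>(), count)
}

fn main() {
    println!("{:?}", array_bytes::<u32>(10)); // Some(40)
    println!("{:?}", array_bytes::<u64>(usize::MAX)); // None (overflow)
}
```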
<p>Put concretely, in the private module <tt>raw_vec.rs</tt> (none of which is exposed as an interface boundary), I move a check from <tt>shrink_unchecked</tt> to <tt>shrink</tt>; how is the compiler supposed to know that this is not a safe movement to make, given that <tt>shrink</tt> is the only caller of <tt>shrink_unchecked</tt>? Further, how is it supposed to know that moving a check from <tt>shrink</tt> to <tt>shrink_unchecked</tt> is safe? And, just to make it lovely and hard, how is it supposed to distinguish "this check is safe to move freely" from "this check must not move"? <p>And note that "checks" and "security fixes" look exactly the same to the compiler; some code has changed. How is the compiler supposed to distinguish a "good" change from a "bad" change? Wed, 26 Feb 2025 16:32:54 +0000 Rearranging across the interface https://lwn.net/Articles/1011958/ https://lwn.net/Articles/1011958/ Wol <div class="FormattedComment"> So what you're saying is, if you the programmer move the checks across the interface boundary, the compiler has no way of knowing you've done it?<br> <p> Hmm ...<br> <p> That is an edge case, but equally, you do want the compiler to catch it, and I can see why it wouldn't ... but if you're building a library I find it hard to see why you the programmer would want to do it - surely you'd either have both sides of the interface in a single crate, or you're explicitly moving stuff between a library and an application ... 
not good ...<br> <p> Cheers,<br> Wol<br> </div> Wed, 26 Feb 2025 15:45:51 +0000 Rearranging across the interface https://lwn.net/Articles/1011893/ https://lwn.net/Articles/1011893/ farnz That's why I chose that particular example; the explicitly declared interface is: <pre> <code>
impl&lt;T, A: Allocator&gt; Vec&lt;T, A&gt; {
    #[inline]
    pub fn shrink_to_fit(&amp;mut self);
}
</code> </pre> <p>The compiler could stop you changing <tt>shrink_to_fit</tt> quite easily, because it's an external interface, but it uses a <tt>RawVec&lt;T, A&gt;</tt> as an implementation detail, which uses a heavily unsafe <tt>RawVecInner&lt;A&gt;</tt> as a monomorphic implementation detail. The current implementation of <tt>Vec::shrink_to_fit</tt> checks whether the capacity is greater than the length, and if it is, calls the inline function <tt>RawVec::shrink_to_fit(self.buf, length)</tt>. In turn, <tt>RawVec::shrink_to_fit</tt> simply calls the inline function <tt>RawVecInner::shrink_to_fit(self.inner, cap, T::LAYOUT)</tt> (which is a manual monomorphization so that <tt>RawVecInner</tt> is only generic over the allocator chosen, not the type in the vector). Following that, <tt>RawVecInner::shrink_to_fit</tt> arranges to panic if it can't shrink, and calls the inline function <tt>RawVecInner::shrink(&amp;mut self, cap, layout)</tt>. This then panics if you're trying to grow via a call to <tt>shrink</tt>, then calls the unsafe function <tt>RawVecInner::shrink_unchecked</tt>. <p>There are a lot of layers of inline functions here, each doing one thing well and calling the next layer.
But it would not be unreasonable to change things so that <tt>RawVecInner::shrink_unchecked</tt> does the capacity check that's currently in <tt>RawVecInner::shrink</tt>, and then have a later release move the capacity check back to <tt>RawVecInner::shrink</tt>; the reason they're split the way they are today is that LLVM's optimizer is capable of collapsing all of the checks in the inline functions into a single check-and-branch, but not of optimizing <tt>RawVecInner::shrink_unchecked</tt> on the assumption that the check will pass, and splitting it this way means that LLVM correctly optimizes all the inline functions down to a single check-and-branch-to-cold-path, followed by the happy path code if all checks pass. <p>And note that the reason that this is split into so many tiny inline functions is that there are other callers in <tt>Vec</tt> that call different sequences of inline functions - rather than duplicate checks, they've been split into other functions so that you can call at the "right" point after your function-specific checks. <p>But, going back to the "compiler shouldn't do it"; why should it know that moving a check in one direction inside <tt>RawVecInner</tt> (which is an implementation detail) is not OK, but moving it in the other direction is OK? For this particular call chain, only <tt>RawVecInner::shrink_unchecked</tt> is going to be in the shared object, because the remaining layers (which are critical to the safety of this specific operation) are inlined.
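The layering described above can be modelled in a few lines. This is a toy, assuming a buffer that only tracks its capacity and never allocates; `ToyBuf` and its methods are hypothetical and much simpler than <tt>Vec</tt>/<tt>RawVec</tt>/<tt>RawVecInner</tt>, but the division of responsibility is the same: the inline layers carry the checks, and the out-of-line `unsafe fn` trusts them.

```rust
/// Toy stand-in for `RawVecInner`: it only tracks a capacity and never
/// allocates. Names and structure are illustrative, not the real std code.
pub struct ToyBuf {
    cap: usize,
}

impl ToyBuf {
    pub fn with_capacity(cap: usize) -> Self {
        ToyBuf { cap }
    }

    pub fn capacity(&self) -> usize {
        self.cap
    }

    /// Layer 1 (like `Vec::shrink_to_fit`): only act if capacity exceeds length.
    #[inline]
    pub fn shrink_to_fit(&mut self, len: usize) {
        if self.cap > len {
            self.shrink(len);
        }
    }

    /// Layer 2 (like `RawVecInner::shrink`): panic on an attempt to grow.
    /// Once inlined into layer 1, this assert is implied by `self.cap > len`,
    /// so an optimizer can fold the two tests into one branch.
    #[inline]
    fn shrink(&mut self, cap: usize) {
        assert!(cap <= self.cap, "tried to shrink to a larger capacity");
        // SAFETY: the assert above guarantees `cap <= self.cap`.
        unsafe { self.shrink_unchecked(cap) }
    }

    /// Layer 3 (like `RawVecInner::shrink_unchecked`): performs no checks.
    ///
    /// # Safety
    /// `cap` must not exceed the current capacity. In real code, this is
    /// where the allocator would be asked to shrink the buffer.
    #[inline(never)]
    unsafe fn shrink_unchecked(&mut self, cap: usize) {
        self.cap = cap;
    }
}

fn main() {
    let mut buf = ToyBuf::with_capacity(16);
    buf.shrink_to_fit(4);
    println!("{}", buf.capacity()); // 4
    buf.shrink_to_fit(10); // capacity 4 does not exceed 10: no-op
    println!("{}", buf.capacity()); // 4
}
```

The hazard is in the combination. Suppose one release moved the `assert!` from `shrink` into `shrink_unchecked`, and the next release moved it back. A binary built against the first release carries an inlined `shrink` with no assert; paired with the second release's shared object, whose `shrink_unchecked` also has no assert, nothing checks `cap` at all.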
Wed, 26 Feb 2025 14:15:46 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011892/ https://lwn.net/Articles/1011892/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; Because these are all shipped together, it's OK to rearrange where the various checks live;</span><br> <p> But if you've explicitly declared an interface, surely that means rearranging the checks across the interface is unsafe in and of itself, so the compiler won't do it ...<br> <p> Cheers,<br> Wol<br> </div> Wed, 26 Feb 2025 13:13:02 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011887/ https://lwn.net/Articles/1011887/ farnz The problem is that shipping an updated .so in the way C or C++ do it runs the risk of invoking UB from the "safe" subset of Rust, and one of Rust's promises is that invoking UB requires you to use "unsafe Rust". Thus, just copying the C way of doing things isn't acceptable, because it can take a safe program and cause it to invoke UB. <p>For example, if you go deep into how <a href="https://github.com/rust-lang/rust/blob/master/library/alloc/src/vec/mod.rs#L1415"><tt>Vec::shrink_to_fit</tt></a> is <a href="https://github.com/rust-lang/rust/blob/master/library/alloc/src/raw_vec.rs#L690">implemented internally</a>, you find a set of tiny inline functions that guarantee that an operation is safe, leading down to a monomorphic <a href="https://github.com/rust-lang/rust/blob/master/library/alloc/src/raw_vec.rs#L709">unsafe <tt>shrink_unchecked</tt></a> function that actually does the shrinking. <p>Because these are all shipped together, it's OK to rearrange where the various checks live; it would be acceptable to move a check out of <tt>shrink_unchecked</tt> into its callers, for example.
But, in the example you describe, you've separated the callers (which are inlined into your binary) from the main body of code (in the shared object), and now we have a problem with updating the shared object; if you move a check from the main body into the callers, you now must know somehow that the callers are out-of-date and need recompiling before you can update the shared object safely. <p>C and C++ implementations handle this by saying that you must just know that your change (and a security fix <em>is</em> a change to existing stuff, breaking your rule that "new stuff can be added, but existing stuff can't be changed") is one that needs a recompile of dependents, and it's on you to get this right or else you face UB for your mistakes. Rust is trying to build a world where you only face UB if you explicitly indicate to the compiler that you know that UB's a risk here, not one where a "trivial" <tt>cp new/libfoo.so.1 /usr/lib/libfoo.so.1</tt> can create UB. Wed, 26 Feb 2025 12:26:18 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011879/ https://lwn.net/Articles/1011879/ ras <div class="FormattedComment"> <span class="QuotedText">&gt; but is completely useless if you switch to a new .so for an already compiled application </span><br> <p> I expect it would be the same story as C or C++. It has the same traps - don't expect an inline function (or template in C++'s case) in a .h to be affected by distributing a new .so. Despite that limitation, shipping updated .so's to fix security problems happens all the time. The rule is always new stuff can be added, but existing stuff can't be changed. It would be the same deal with Rust, but would cover the ".h" section too, meaning you can add new exported types or monomorphized functions, but not change existing ones.<br> <p> Putting the .h section in the .so brings one advantage. 
There is no way for a C program to know if the .h it is compiled against matches the one the .so was compiled against. But a Rust program compiled against a .so could check that the types in the .h section match the ones it was compiled with, and reject the .so if they don't.<br> </div> Wed, 26 Feb 2025 11:36:50 +0000 Versioning of shared objects https://lwn.net/Articles/1011880/ https://lwn.net/Articles/1011880/ farnz Note that libc handles versioning well because the glibc maintainers do a lot of hard and non-trivial work to have things like compatibility symbols so that you can swap to a later glibc without breakage. You can do a similar level of work in Rust today to get dynamic linking working, and working well. <p>What C has that Rust doesn't is that it's fairly trivial to take a C library, build it into a <tt>.so</tt>, and have it work as long as upstream doesn't make a silently breaking change (which can result in UB, rather than a failure); it's also fairly simple to patch the build system so that the <tt>.so</tt> is versioned downstream of the library authors, who need not even know that their code is being used as a shared library. This is being worked on for Rust, but the goal in Rust is to ensure that any breaking changes upstream result in a failure to dynamically link, rather than a risk of UB.
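A toy model of the compatibility-symbol approach, with a hypothetical library symbol `foo_scale` and hypothetical versions `FOO_1.0`/`FOO_2.0`. glibc's real mechanism is ELF symbol versioning resolved by the dynamic linker, not a hash map; the sketch only shows the behaviour: old binaries keep resolving the old implementation, new binaries get the new one, and a binary asking for a version the shared object does not provide fails to load instead of silently calling something incompatible.

```rust
use std::collections::HashMap;

type ScaleFn = fn(i32) -> i32;

/// Toy model of a shared object's versioned symbol table (not a real
/// loader). Keys are "name@VERSION", echoing ELF symbol version notation.
struct SharedObject {
    symbols: HashMap<String, ScaleFn>,
}

impl SharedObject {
    /// Resolve an exact (name, version) pair, or fail to "link".
    fn resolve(&self, name: &str, version: &str) -> Result<ScaleFn, String> {
        let key = format!("{name}@{version}");
        self.symbols
            .get(key.as_str())
            .copied()
            .ok_or_else(|| format!("undefined symbol: {key}"))
    }
}

/// Original behaviour, kept as a compatibility symbol for old binaries.
fn scale_v1(x: i32) -> i32 {
    x.wrapping_mul(2)
}

/// Changed behaviour, only handed to binaries built against FOO_2.0.
fn scale_v2(x: i32) -> i32 {
    x.saturating_mul(2)
}

fn main() {
    let mut symbols: HashMap<String, ScaleFn> = HashMap::new();
    symbols.insert("foo_scale@FOO_1.0".to_string(), scale_v1);
    symbols.insert("foo_scale@FOO_2.0".to_string(), scale_v2);
    let lib = SharedObject { symbols };

    let old = lib.resolve("foo_scale", "FOO_1.0").unwrap();
    let newer = lib.resolve("foo_scale", "FOO_2.0").unwrap();
    println!("{} {}", old(i32::MAX), newer(i32::MAX)); // -2 2147483647
    // A binary needing a version this .so doesn't provide fails to load.
    println!("{:?}", lib.resolve("foo_scale", "FOO_3.0"));
}
```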
Wed, 26 Feb 2025 10:53:40 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011869/ https://lwn.net/Articles/1011869/ taladar <div class="FormattedComment"> That might help if you have the updated .so at compile time but is completely useless if you switch to a new .so for an already compiled application or other library using the updated one.<br> </div> Wed, 26 Feb 2025 09:19:02 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011828/ https://lwn.net/Articles/1011828/ ras <div class="FormattedComment"> <span class="QuotedText">&gt; the answer is usually "because the tooling makes it painful to split it up",</span><br> <p> Or "it's faster because we can optimise".<br> <p> My favourite counterexample to this is LVM / DM vs ZFS, possibly because I'm a recent user of ZFS. LVM / DM / a traditional file system give you a similar outcome to ZFS, albeit with a more clunky interface because "some assembly is required". The zfs CLI is nice. However, by every other metric I can think of, the LVM / DM / ... stack wins. The stack is faster, the modular code is much easier to understand, it has fewer bugs (I'm tempted to say far fewer), and you have more ways of doing the same thing.<br> <p> This is surprising to me. I would have predicted the monolithic style to win on speed at least, and the modular stack is also easier to extend (which is evidence against "it's easier to develop that way").<br> <p> I guess there is a size when the code base becomes too much for one person. At that point it should become modular, with each module maintained by different people and sporting an interface that requires screaming and yelling to change.
But by that time it's already a ball of mud, and I guess the tooling is a convenient thing to blame for not doing the work required to split it up.<br> </div> Tue, 25 Feb 2025 21:51:38 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011825/ https://lwn.net/Articles/1011825/ ras <div class="FormattedComment"> <span class="QuotedText">&gt; One big problem with this approach is versioning.</span><br> <p> I don't get the problem. libc.so.X.Y already handles versioning pretty well.<br> <p> Putting the .h's in the .elf does solve one problem that bites me on occasion - the .h's don't match the .so I'm linking against. It would be nice to see that nit disappear.<br> </div> Tue, 25 Feb 2025 21:29:02 +0000 Reasons for speedup? https://lwn.net/Articles/1011817/ https://lwn.net/Articles/1011817/ Cyberax <div class="FormattedComment"> <span class="QuotedText">&gt; But there's one straightforward answer that applies to nearly any rewrite-it-in-Rust project: Rust is really, really good at emitting noalias to the IR layer, whereas C is only mediocre at it</span><br> <p> Case in point: <a href="https://trifectatech.org/blog/zlib-rs-is-faster-than-c/">https://trifectatech.org/blog/zlib-rs-is-faster-than-c/</a> - a zlib port to Rust is outperforming C.<br> </div> Tue, 25 Feb 2025 20:10:48 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011716/ https://lwn.net/Articles/1011716/ mathstuf <div class="FormattedComment"> FWIW, CMake does this by enforcing that `PRIVATE TYPE CXX_MODULES` files are never imported from `PUBLIC TYPE CXX_MODULES` sources, so at least it can help enforce the "don't expose private bits in public interface units" part.<br> </div> Tue, 25 Feb 2025 11:57:24 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011709/ https://lwn.net/Articles/1011709/ farnz Note, though, that once you have tooling that makes splitting it up easy, the dividing line is rarely as simple as "interfaces"
and "implementations". You're more likely to have splits like "VFS interface and implementation", "ext4 interface and implementation" etc. Tue, 25 Feb 2025 11:24:55 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011708/ https://lwn.net/Articles/1011708/ taladar <div class="FormattedComment"> Whenever a project is a large single library or repository the answer is usually "because the tooling makes it painful to split it up", not "there is a good reason to have a humongous pile of code".<br> </div> Tue, 25 Feb 2025 11:16:41 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011702/ https://lwn.net/Articles/1011702/ farnz When you provide <tt>T</tt>, it's monomorphized for you. The hard case is when I want to provide <tt>Foo&lt;T&gt;</tt>, and allow you to provide an arbitrary <tt>T</tt> that I haven't thought about up-front. <p>There are, currently, two reasons why Rust does not have a stable ABI, both being worked on by experts in the field (often overlapping with people solving this problem for Swift and for C++ modules): <ol> <li>Rust explicitly allows the memory layout of most data structures to vary between compilations, which lets it do things like completely elide fields that never change at runtime, along with all the code to modify them. This is obviously not compatible with a stable ABI for that structure. <li>The generics problem. This is a generally hard problem, and no-one has a great solution; there are tricks that reduce the scale of the problem (which are also needed for static linking, because of the compile time and binary size issues), and <a href="https://rust-lang.github.io/compiler-team/working-groups/polymorphization/">Rust has had a working group</a> looking into how many of the tricks can be machine-applied to naïve code, as opposed to requiring a human to split the code into generic and monomorphic parts.
</ol> <p>There is, however, <a href="https://github.com/rust-lang/rfcs/pull/3435">serious work going into a <tt>#[export]</tt> style</a> of ABI marker that allows you to mark the bits (or an entire crate) as intended to have a stable ABI, and errors if the compiler can't support that. This will, inevitably, be a restricted subset of the full capabilities of Rust (since macros, generics, and other forms of compile-time code creation can't be supported in an exported ABI), but it's being <a href="https://rust-lang.github.io/rust-project-goals/2025h1/safe-linking.html">actively thought about as a research project</a> with a goal of allowing as much code as possible to be dynamically linked while not sacrificing any of the safety promises that Rust makes today using static linking. Tue, 25 Feb 2025 10:43:26 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011701/ https://lwn.net/Articles/1011701/ farnz Sure, but I'm shipping full source anyway, and I can check for people exporting parts of the internal partition in CI, just as I'd have to have similar checks in place to stop people moving code from the interface module to the implementation module. <p>Remember that the goal here is one module, nicely structured for ease of maintenance, and thus split across multiple module units, with an internal module partition to make the stuff that's for internal use only invisible from outside the module, rather than multiple modules. Tue, 25 Feb 2025 10:01:02 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011697/ https://lwn.net/Articles/1011697/ Wol <div class="FormattedComment"> Hmm ... pile of musings here ...<br> <p> But if you had stuff that was specifically meant to be a library, why can't you declare "I want to monomorphise these Vec&lt;T&gt;s". 
Any others are assumed to be internal and might generate a warning to that effect, but they're not exported.<br> <p> And then you add rules about how the external interface is laid out, so any Rust compiler is guaranteed to create the same export interface. Again, if the programmer wants to lay it out differently, easy enough, they can declare an over-ride.<br> <p> And then lastly, the .o or whatever the Rust equivalent is, contains these declarations to enable a compiler of another program to pull them in and create the correct linkings.<br> <p> Okay, it's more work having to declare your interface, but I guess you could pull the same soname tricks as C - extending your interface and exports is okay, but changing it triggers a soname bump.<br> <p> Cheers,<br> Wol<br> </div> Tue, 25 Feb 2025 08:06:53 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011696/ https://lwn.net/Articles/1011696/ mb <div class="FormattedComment"> <span class="QuotedText">&gt; That information would be roughly equivalent to what's put in a .h file now. But where would you put it? </span><br> <p> Put it into a my-public-interface crate and generate the docs for it? I don't see the problem.<br> </div> Tue, 25 Feb 2025 06:48:30 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011692/ https://lwn.net/Articles/1011692/ Cyberax <div class="FormattedComment"> One big problem with this approach is versioning. It's less of a problem for internal libraries in monolithic projects like systemd or uutils, but it will become a problem if the package is exposed via system-level package managers. 
And if you're in a monolithic project, then there's no need to embed the pre-instantiated type information into .so files; you can just store it in an ".h" file.<br> </div> Tue, 25 Feb 2025 03:50:15 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011688/ https://lwn.net/Articles/1011688/ ras <div class="FormattedComment"> <span class="QuotedText">&gt; All Rust installs ship a tool that extracts the public interface of your crate and puts it into a nice html document for review:</span><br> cargo doc<br> <p> I had said earlier I wanted the ability to say to the compiler "not here, at this boundary a conventional linker is all you need". I also said C++ gives you the ability to do that, by splitting stuff into .h and .cpp. @farnz said "but that's the old way, Boost for example doesn't do that". That's true, but the point is the people who wrote Boost made the decision to adopt the way Rust does it. C++'s std makes a different decision. @taladar said C++ compiles are slow. I'm guessing that's because the packages / libraries he is working with adopt this newfangled way, and everything gets recompiled all the time. He's blaming C++ for that, but I'd argue the fault lies at least as much with the package authors for making that choice.<br> <p> @farnz then said "oh but it's hard to think about what has to be monomorphized and what isn't, and besides redeclaring everything in .h is verbose and a lot of work". I don't have much sympathy for the first part - I did it all the time when I wrote C++. The second is true: the information in .h is redundant. A modern language shouldn't make you type the same thing twice without good reason.<br> <p> Those language differences were swirling around in my head when I wrote: "Or perhaps the required information gets folded into the .o". It was a thought bubble. But your "cargo doc" comment illustrated its key point nicely.
Rust could add something to the language that says "this source is to be exported (made available) to people who want to link against my pre-compiled library", in the same way "cargo doc" exports stuff. That information would be roughly equivalent to what's put in a .h file now. But where would you put it? The thought bubble was to place it in a section of the ELF object that holds the compiled code. Call it, say, a ".h" section. Then when someone wants to compile against your library, they give that .o / .so / .a to both the compile phase (which looks for the equivalent of the .h sections) and the link phase (which just wants the compiled code for the non-monomorphized stuff, which - if the programmer has done their job - should be the bulk of it).<br> <p> The ultimate goal is to allow the programmer to decide what needs to be monomorphized and what can be pre-compiled. And to have Rust tell the programmer when they've mucked that boundary up. I guess they would get an error message like: "This type / function / macro has to be exported to the .h, because it depends on type T the caller is passing in".
Right now Rust programmers don't have that option, and that leads to the trade-offs I mentioned.<br> </div> Tue, 25 Feb 2025 01:27:35 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011685/ https://lwn.net/Articles/1011685/ mathstuf <div class="FormattedComment"> <span class="QuotedText">&gt; Why create that extra workload when I can have a single module with an internal module partition such that it's very obvious when I change the interface without changing the implementation to match, and where I have one thing to release instead of two?</span><br> <p> If the partition is imported into the interface for any reason, it must be shipped as well.<br> </div> Tue, 25 Feb 2025 00:00:41 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011667/ https://lwn.net/Articles/1011667/ farnz To express it slightly differently, why is the Linux kernel a single module with multiple internal partitions, rather than separate modules for the VFS interface, MM interface, network interface etc, along with implementation modules that implement each of those interfaces? If the benefits are as big as you're claiming, surely it would make sense to separate out the interfaces and implementations into separate modules, rather than just have separate partitions internally? 
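In Rust, the closest analogue to a single module with an internal partition is a single crate with a private module. A minimal sketch, assuming a hypothetical library in which the inline module mylib stands in for the crate root (in a real crate the helper would usually be a pub(crate) item in a private module): code outside mylib cannot reach internal::checksum at all, and for a library crate cargo doc documents only verify unless --document-private-items is passed.

```rust
// Hypothetical library, flattened into one file: the inline module `mylib`
// stands in for the crate root of a real library crate.
mod mylib {
    // Internal "partition": usable inside `mylib`, invisible to its users.
    // In a real crate this would be `mod internal` with `pub(crate)` items.
    mod internal {
        /// Implementation detail: sums the bytes (a deliberately trivial checksum).
        pub(super) fn checksum(data: &[u8]) -> u32 {
            data.iter().map(|&b| u32::from(b)).sum()
        }
    }

    /// Public interface: in a library build, the only item here that
    /// `cargo doc` documents by default.
    pub fn verify(data: &[u8], expected: u32) -> bool {
        internal::checksum(data) == expected
    }
}

fn main() {
    // Calling `mylib::internal::checksum` here would fail to compile:
    // the module `internal` is private to `mylib`.
    println!("{}", mylib::verify(b"abc", 294)); // 97 + 98 + 99 = 294
}
```

Changing the signature of checksum without updating verify is a compile error inside the one crate, which is the "obvious when the interface and implementation drift apart" property, without a second artifact to release.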
Mon, 24 Feb 2025 17:39:26 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011663/ https://lwn.net/Articles/1011663/ mb <div class="FormattedComment"> <span class="QuotedText">&gt;The amount of code that needs to be reviewed for assessing validity of a change, perhaps?</span><br> <p> All Rust installs ship a tool that extracts the public interface of your crate and puts it into a nice html document for review:<br> cargo doc<br> <p> This is much better than manually typing in the redundant code for the public interface declarations.<br> It's easy to navigate and includes all details of your public interfaces.<br> </div> Mon, 24 Feb 2025 17:32:12 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011661/ https://lwn.net/Articles/1011661/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; Seriously? The only reason to split interface and implementation you is license considerations?</span><br> <p> That's not what he said. He said "extra work". Which in reality usually means "push the work down the road until I get to it". Which also often in reality means "I'll never get to it".<br> <p> It wouldn't get done in commercial circles either, if secrecy didn't have a (at least nominal) value. <br> <p> Time pressure usually turns out to be an extremely important consideration.<br> <p> Cheers,<br> Wol<br> </div> Mon, 24 Feb 2025 17:24:13 +0000 Splitting implementation and interface in C++ https://lwn.net/Articles/1011658/ https://lwn.net/Articles/1011658/ farnz Yes. The extra work of splitting something into an interface module and an implementation module is significant, and does not reduce the complexity of reasoning about a change, nor the amount of code that has to be reviewed for assessing validity of a change. 
<p>This is distinct from splitting the implementation up inside a single C++ module; having multiple module units, one for the exported interface and many for the internal implementation, makes a lot of sense, but having two separate C++ modules, one of which exports an unimplemented interface, and the other of which exports an implementation of that interface, is a mess, since it means I have to make sure that the two separate modules are kept in sync manually. <p>Why create that extra workload when I can have a single module with an internal module partition such that it's very obvious when I change the interface without changing the implementation to match, and where I have one thing to release instead of two? Mon, 24 Feb 2025 17:22:38 +0000
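Without a language feature of the kind ras and Wol describe, the boundary between "monomorphize in the caller" and "compile once in the library" can only be approximated by hand. A minimal sketch, with hypothetical names (report, report_inner, largest, largest_i32, largest_f64): a thin generic shim converts its argument to a concrete type and calls a non-generic core, a pattern std itself uses in functions such as std::fs::read, so only the shim is instantiated per caller; concrete wrappers such as largest_i32 play the role of Wol's proposed "monomorphise these" declaration. It only shows where code gets instantiated; it does not give the library a stable ABI, which is what the #[export] work mentioned earlier in this thread is about.

```rust
use std::fmt::Display;

/// Thin generic shim: the only part each caller has to monomorphize.
pub fn report<T: Display>(label: &str, value: T) -> String {
    // Convert to a concrete type once, then hand off to the non-generic core.
    report_inner(label, &value.to_string())
}

/// Non-generic core: can be compiled once in the defining crate,
/// whatever `T` the callers use.
fn report_inner(label: &str, rendered: &str) -> String {
    format!("{}: {}", label, rendered)
}

/// Generic implementation: normally instantiated in each caller's crate.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

// Hypothetical "pre-instantiated" entry points: concrete, non-generic
// functions whose machine code can be emitted in the library itself.
pub fn largest_i32(items: &[i32]) -> Option<i32> {
    largest(items)
}

pub fn largest_f64(items: &[f64]) -> Option<f64> {
    largest(items)
}

fn main() {
    println!("{}", report("answer", 42)); // answer: 42
    println!("{:?}", largest_i32(&[3, 9, 4])); // Some(9)
    println!("{:?}", largest_f64(&[])); // None
}
```

The trade-off is the one debated above: the programmer, not the compiler, decides which instantiations exist, and small non-generic functions may still be inlined into callers by the optimizer.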