Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 20:06 UTC (Wed) by ballombe (subscriber, #9523)
In reply to: Rewriting the GNU Coreutils in Rust by ccchips
Parent article: Rewriting the GNU Coreutils in Rust

Welcome to static linking!

Rewriting the GNU Coreutils in Rust

Posted Jun 11, 2021 8:33 UTC (Fri) by gspr (guest, #91542) (5 responses)

Tangentially related: Are there any efforts, no matter how far they are from real-world applicability, to explore ways in which Rust could be made more amenable to dynamic linking between Rust libraries? (It already plays well with dynamic linking to e.g. C libraries.)

Rewriting the GNU Coreutils in Rust

Posted Jun 11, 2021 9:44 UTC (Fri) by farnz (subscriber, #17727) (4 responses)

There's some talk, but no efforts that I know of - see this blog entry on how Swift achieves dynamic linking when Rust doesn't, which explains one route that could be taken to dynamic linking in Rust without losing too much of Rust's benefits.

Ultimately, though, we're at the "needs an innovation" stage - dynamic linking is built around the idea that compilation takes a single independent unit of source and turns it into complete object code with unresolved symbols. Linking then resolves all the symbols to get an executable binary. In that model, dynamic linking is a big win; there's a clear divide between the single units of source and the symbol resolution.

Modern languages (C++, Rust, Swift, Go and others) are not as amenable to this model; units of source code are not independent any more. Generics and vtables both make inlining, and specialisation of the inlined code, a huge win, so we want either huge units of source (entire programs, say) or a link phase that does significant compilation work, possibly changing the units that it has on disk.

Note that Rust is, in a very technical and useless sense, dynamically linked on Linux; while all the Rust code forms a single object, that object is dynamically linked with the C code it depends on at runtime, including the VDSO and libc. It's just that what we want is to somehow get the shared code between two Rust objects into a Rust shared object that's reused at runtime, and this is a deeply challenging problem to get right, especially in a language that aims to have cheap abstractions like Vec<_> working well.
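
To make the generics point concrete, here is a minimal sketch (the function names are made up for illustration, they are not from any real crate). The generic function is compiled separately for every type it is used with, in whichever crate calls it, so a library cannot simply ship one finished symbol for it; the trait-object version compiles to a single body, but pays for that with vtable calls and lost inlining.

```rust
use std::fmt::Display;

// Generic version: the compiler emits a separate copy of this function for
// every concrete T it is called with (monomorphization), so the machine code
// only exists once a caller has picked T.
fn describe_generic<T: Display>(items: &[T]) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}

// Trait-object version: one compiled body, with each element's formatting
// reached through a vtable at run time.
fn describe_dyn(items: &[&dyn Display]) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    let ints = [1, 2, 3];
    let words = ["a", "b"];
    // Two instantiations: one for i32 elements, one for &str elements.
    println!("{}", describe_generic(&ints));
    println!("{}", describe_generic(&words));

    // A single function body handles mixed types via dynamic dispatch.
    let mixed: [&dyn Display; 3] = [&1, &"b", &2.5];
    println!("{}", describe_dyn(&mixed));
}
```

Only something shaped like describe_dyn fits the classic "compile once, resolve symbols later" model as-is; Vec<_> and friends are firmly in the describe_generic camp.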

Rewriting the GNU Coreutils in Rust

Posted Jun 11, 2021 10:22 UTC (Fri) by gspr (guest, #91542)

Thanks. That's a very interesting blog entry!

I agree with the points you bring up, and recognize them as real and significant hurdles. I am not a computer scientist, but it does seem at least plausible to me that, as you suggest, innovation can potentially occur that works around the hurdles.

Regarding your point about Rust dynamically linking to C/other libraries: Yes, absolutely. And this is a huge selling point of Rust for me. I had Rust-Rust dynamic linking in mind though.

Rewriting the GNU Coreutils in Rust

Posted Jun 12, 2021 23:49 UTC (Sat) by khim (subscriber, #9252) (2 responses)

The biggest problem with dynamic linking IMO is just the fact that people don't bother to create stable ABIs. Not even with C libraries.

And if you do want to create such an ABI, you may as well go and create a C ABI between two Rust modules.

That being said, it would be interesting to develop some kind of tool to make it possible to load plugins into programs (but then… maybe these should go into separate processes anyway?).

Anyway: ABI stability and dynamic linking should be considered one problem, not two, because in my observation ABI-unstable dynamic libraries create more problems than they solve.

If you have two versions of Boost deeply embedded into two separate libraries, they can work fine; if both link Boost dynamically, this becomes a problem.
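
As a rough sketch of what "a C ABI between two Rust modules" can look like (all names here are hypothetical, and both sides live in one file just so it runs): only C-compatible types cross the boundary, so a slice goes over as pointer plus length, and the caller wraps it back into something safe.

```rust
// "Library" side: a function with the C calling convention. Only
// C-compatible types appear in the signature, so &[u32] becomes
// a raw pointer plus a length.
pub extern "C" fn sum_u32(data: *const u32, len: usize) -> u64 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller promises that `data` points to `len` readable u32s.
    let values = unsafe { std::slice::from_raw_parts(data, len) };
    values.iter().map(|&v| u64::from(v)).sum()
}

// "Caller" side: a safe wrapper that hides the raw-pointer interface again.
fn sum(values: &[u32]) -> u64 {
    sum_u32(values.as_ptr(), values.len())
}

fn main() {
    println!("{}", sum(&[1, 2, 3]));
}
```

For a real shared object the exported function would also need an unmangled symbol name and the crate would be built as a cdylib; that is left out here so the sketch compiles as a plain program.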

Rewriting the GNU Coreutils in Rust

Posted Jun 13, 2021 8:21 UTC (Sun) by farnz (subscriber, #17727) (1 responses)

However, in C-land (which works well with the FORTRAN-style "compile first, then link" model), an ABI-unstable dynamic library still appears to work OK; as long as you don't accidentally remove or break an exported symbol, you get the disk space and memory savings of dynamic linking, and you get an apparent ability to rebuild the shared object with newer (bug-fixed) code and swap it in for the old version.

There are still two orthogonal reasons for dynamic linking:

1. To have code that's used in many binaries exist once on the system, saving disk space and memory via CoW.
2. To allow you to replace a shared object with a newer version in order to improve performance and fix bugs present in the older version.

Without an explicitly stable ABI, you can't reliably get #2 - at any time, a surprise ABI change can mean that the two shared objects are no longer equivalent from the perspective of your existing binaries. But you still get #1 - less memory used by running programs because the shared code is all in shared objects that are CoW and read-only.

Getting the first just requires the tooling to support it, and is a matter of coming up with a workable design for doing #1 without compromising on modern features like generics. Getting the second also needs developer buy-in; no amount of tooling helps if you keep a symbol unchanged, but completely alter its semantics. And the first is a requirement for the second; I can't build a stable dynamic link ABI if the tooling won't let me build a dynamic library.

Rewriting the GNU Coreutils in Rust

Posted Jun 13, 2021 9:38 UTC (Sun) by khim (subscriber, #9252)

If you don't have stable ABIs then the savings from CoW rarely materialize, simply because different programs end up with different DSOs anyway.

Sure, you may still get that benefit for OS-supplied programs, but in today's world, where even low-end systems have gigabytes of RAM, it's not very important.

And where it is important (embedded, e.g.) you would get bigger savings if you just combined multiple binaries into one. BusyBox was written in C, yet it's usually compiled as one binary anyway.

Thus solving the dynamic linking problem without solving the ABI stability problem incurs high costs yet doesn't buy you much.

And the fact that you can't solve ABI stability without developers' buy-in makes the other issues easier to handle, too.

Think of Windows, the platform with the most third-party plugins implemented as dynamic libraries (Android and iOS have more apps today, but very few of them use or support third-party plugins): global variables cannot be shared, there is no common libc (thus you need to keep track of different versions of malloc/free), and so on.

Similarly in Rust: if we don't make the ability to build any module as a shared library the goal, then we can significantly simplify the task, and from a practical point of view there would be little difference.
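
The malloc/free bookkeeping mentioned for Windows plugins usually ends up as a "whoever allocates also frees" rule in the API. A minimal sketch of that rule in Rust, assuming a hypothetical plugin that hands out strings (greeting_new and greeting_free are invented names, and both sides sit in one file so it runs as-is):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Plugin side: allocates the result with the plugin's own allocator.
pub extern "C" fn greeting_new(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: the caller promises `name` is a valid NUL-terminated string.
    let name = unsafe { CStr::from_ptr(name) }.to_string_lossy();
    match CString::new(format!("hello, {}", name)) {
        Ok(text) => text.into_raw(), // ownership passes to the caller
        Err(_) => std::ptr::null_mut(),
    }
}

// Plugin side: the only correct way to release what greeting_new returned.
pub extern "C" fn greeting_free(text: *mut c_char) {
    if text.is_null() {
        return;
    }
    // SAFETY: `text` came from greeting_new and has not been freed before.
    unsafe { drop(CString::from_raw(text)) };
}

// Host side: copies the data out, then hands the buffer back to the plugin
// instead of freeing it with its own allocator.
fn greet(name: &str) -> Option<String> {
    let c_name = CString::new(name).ok()?;
    let raw = greeting_new(c_name.as_ptr());
    if raw.is_null() {
        return None;
    }
    // SAFETY: `raw` is a valid NUL-terminated string until greeting_free.
    let text = unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned();
    greeting_free(raw);
    Some(text)
}

fn main() {
    println!("{:?}", greet("world"));
}
```

The host never calls its own free on the pointer, so it does not matter whether the two sides share a libc or an allocator.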