LWN: Comments on "Oxidizing Ubuntu: adopting Rust utilities by default" https://lwn.net/Articles/1014002/ This is a special feed containing comments posted to the individual LWN article titled "Oxidizing Ubuntu: adopting Rust utilities by default". en-us Fri, 19 Sep 2025 23:16:53 +0000 Fri, 19 Sep 2025 23:16:53 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Manpages are important https://lwn.net/Articles/1015658/ https://lwn.net/Articles/1015658/ raindog308 <div class="FormattedComment"> manpages? <br> <p> “See the info page for more details.”<br> <p> </div> Thu, 27 Mar 2025 23:49:27 +0000 /bin/true bloat, and /bin/cat https://lwn.net/Articles/1015255/ https://lwn.net/Articles/1015255/ taladar <div class="FormattedComment"> On the other hand Rust's small dependencies have regular "unmaintained" notifications while the large dependency probably has a good percentage of code that nobody looked at in years. In fact I think I still have a Qt Widget bug open from 10 years ago somewhere that has been migrated through 2-3 different issue trackers by now.<br> </div> Mon, 24 Mar 2025 10:13:36 +0000 /bin/true bloat, and /bin/cat https://lwn.net/Articles/1015223/ https://lwn.net/Articles/1015223/ farnz I suspect, though, that this is more "vibes" than reality; sure, a big dependency with several maintainers looks healthier from the outside, but in practice, it's not that rare for a big dependency to internally be several small fiefdoms, each of which has just one maintainer. You thus have something that actually hasn't seen maintenance for years, but it "looks" maintained because it's part of a bigger thing where the other bits <em>are</em> well-maintained. Sun, 23 Mar 2025 17:11:27 +0000 /bin/true bloat, and /bin/cat https://lwn.net/Articles/1015214/ https://lwn.net/Articles/1015214/ surajm <div class="FormattedComment"> I think the benefit of the c++ approach is that maintenance for the libraries is generally less concerning. Group ownership of libraries feels a lot safer than tons of tenuously owned and maintained libraries. You don't need to put everything in a single repo or dependency to make this work of course but if you do put everything in one repo then that ownership structure is forced. And to be clear there are many examples of the above approach in the rust ecosystem as well. It's just difficult to ensure all of your deps originate from such entities as there are such deep layers of transitive dependencies, which is again less likely in the c++ ecosystem.<br> <p> I hope to see this situation improve over time as larger organizations continue to adopt rust and place more strict rules on allowable dependencies.<br> </div> Sun, 23 Mar 2025 15:53:26 +0000 Weakened license protection https://lwn.net/Articles/1015202/ https://lwn.net/Articles/1015202/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; Unless the car manufacturers are modifying the code, the only requirement for GPL compliance is documenting that they got it from upstream; </span><br> <p> "6. Conveying Non-Source Forms.<br> <p> You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways:<br> <p> a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange."<br> <p> Have you read the GPL? 
Have you understood it? I haven't quoted the entirety of section 6, but if you are a business there is a hell of a lot more than just "documenting you got it from upstream". You - CORPORATELY - are on the hook for making sure your customer can get the source. And that is expensive administrative hassle companies would much rather avoid.<br> <p> There are "sort of" getouts, 6c, and 6e, but they're not aimed at corporates, and they still come with grief companies don't want. I've only just noticed 6e, but unless the company controls that location, they're probably not complying with it, and if they do control it it's more hassle that again they don't want.<br> <p> Cheers,<br> Wol<br> </div> Sun, 23 Mar 2025 07:09:53 +0000 Weakened license protection https://lwn.net/Articles/1015194/ https://lwn.net/Articles/1015194/ himi <div class="FormattedComment"> That doesn't make much sense, though - the GPL in this case applies specifically to the coreutils code and derivatives, not to any higher level aggregation. Unless the car manufacturers are modifying the code, the only requirement for GPL compliance is documenting that they got it from upstream; given the MIT license requires copyright attribution to persist, the practical difference is zero - a little bit of text listing copyright attributions and pointing at the upstream source, or a little bit of text that only lists copyright attributions.<br> <p> Unless they're actually modifying the code, of course. Which . . . well, for coreutils? I'd have to assume that's just going to be compilation support for whatever platform they're using, in which case it'd make far more sense to submit patches upstream than to maintain their own fork in-house, and the same logic would apply whether they're using GNU coreutils or uutils.<br> <p> It sounds like either the companies in question don't actually understand the way the GPL works (which shouldn't be an issue if they have competent lawyers), or they're pulling an Apple and avoiding any GPLed code on ideological grounds. <br> </div> Sun, 23 Mar 2025 01:48:30 +0000 Weakened license protection https://lwn.net/Articles/1015151/ https://lwn.net/Articles/1015151/ Wol <div class="FormattedComment"> <span class="QuotedText">&gt; &gt; I kinda doubt a company will take these utils, close source them, and resell them without redistributing sources. It would bring only marginal benefit.</span><br> <p> What it DOES bring them is a big reduction in pain. If I can ship a product, based on a publicly available tree, without all the hassle of tracking, responding to requests, etc etc, then that's a big attraction.<br> <p> And regardless of whether you're an engineer, a programmer, an analyst, people at the sharp end like to collaborate. It's bean counters who all too often don't see the benefit of collaboration, but they do see the cost of getting sued.<br> <p> What we need is a GPL-lite, that contains all the downstream protections, and rather than saying "you have to share the source" replaces it with "you must develop in public, and tell your customers where to find it". Basically, it has to be publicly readable, 3rd-party hosted, and advertised to upstream and downstream alike.<br> <p> At the end of the day, engineers want to share, but they don't want all the GPL Administrative Hassle that comes with the GPL. All bean counters can see is the cost. The GPL is making the wrong person pay! There's a good chance I will push my changes upstream because I can see the benefit. 
If I don't, upstream may (or may not) mine my repository because they see a benefit. And any customer who wants the source may have a bit of grief working out exactly which source they've got, but they have got it (and if I can't tell them, that may well be a cost to me). (Programming in Excel, it's costing me dear at the moment!)<br> <p> Cheers,<br> Wol<br> </div> Sat, 22 Mar 2025 09:18:45 +0000 resource usage concerns https://lwn.net/Articles/1015132/ https://lwn.net/Articles/1015132/ tialaramex <div class="FormattedComment"> Re-reading your original comment I observe that it is pretty emphatic about VLAs and yet I somehow ended up thinking only about the ordinary cases (the VLAs probably existed by the time I started getting paid to write C but I think I was still writing more or less C89 well into this century).<br> <p> So that's on me. It did cause me to go find out what the current status is of (formally supported rather than as a hack) VLA-like Rust objects (ie a runtime sized object lives on the stack) and it seems like they're not close.<br> </div> Fri, 21 Mar 2025 23:29:52 +0000 resource usage concerns https://lwn.net/Articles/1015125/ https://lwn.net/Articles/1015125/ wahern <div class="FormattedComment"> Yes, tweaking that example program, it seems a statically sized array, even if declared within a runtime conditional block at the end of the routine (after the VLA loop), is indeed allocated at the start. Not surprising (I didn't disbelieve in that respect), but now I wonder if, in the days before GCC and clang implemented stack probing, that behavior posed a security issue for functions that attempted to conditionally use a [non-VLA] stack allocation based on its own stack size check. Perhaps still something to keep in mind for some other compilers, both C and non-C.<br> <p> For posterity: I've been using gcc version 14.2.0 (MacPorts gcc14 14.2.0_3+stdlib_flag) on an ARM M1 with these test cases. (__builtin_stack_address was too convenient, but not supported by the installed Apple clang toolchain, though it seems it is supported by the latest upstream clang release.)<br> <p> </div> Fri, 21 Mar 2025 22:42:57 +0000 Weakened license protection https://lwn.net/Articles/1015126/ https://lwn.net/Articles/1015126/ ndiddy <div class="FormattedComment"> <span class="QuotedText">&gt; I kinda doubt a company will take these utils, close source them, and resell them without redistributing sources. It would bring only marginal benefit.</span><br> <p> There was a podcast interview here: <a href="https://youtu.be/5qTyyMyU2hQ?t=1270">https://youtu.be/5qTyyMyU2hQ?t=1270</a> with the lead uutils maintainer where he brought up that some car manufacturers had already started using uutils in their products instead of the GNU core utils because it means they don't have to comply with the GPL. From a corporate standpoint, when you have one set of tools where you have to comply with the GPL, and then a drop-in replacement for them where you don't, of course you'll use the tools that don't require GPL compliance.<br> </div> Fri, 21 Mar 2025 22:39:06 +0000 resource usage concerns https://lwn.net/Articles/1015098/ https://lwn.net/Articles/1015098/ anton <blockquote> Stack allocations always last until the function returns. That's not a Rust limitation, it's just how the stack works (at least, in any language that has a call stack). </blockquote> No. In every language with guaranteed tail-call optimization, all stack allocation ends at the latest before the tail-call. 
In Prolog, compilers sort the variables by lifetime, and stack-deallocate before every call, such that only live variables consume stack memory on the call. This reduces the memory consumption for recursive predicates that are not tail-recursive; I expect that there are other implementations of languages where recursion is important that use the same technique. Fri, 21 Mar 2025 17:18:34 +0000 resource usage concerns https://lwn.net/Articles/1015094/ https://lwn.net/Articles/1015094/ farnz <blockquote> If you count deallocation as access then we are back to square one with tracing-GC based language never having any leaks while real-world Java programs wasting gigabytes for stuff they would never need. </blockquote> <p>This is why I said <q>The details around "deallocation" are, IMO, the hard chunk of defining "will not be accessed in the future"</q>. <p>At one extreme, all allocated memory is leaked, because there's always a time when it's still allocated but has not yet been deallocated; at the other extreme, no allocated memory is leaked because all memory is implicitly freed by program exit. <p>You need to find a definition of "deallocated" that is useful for the case you're considering; for example, memory is considered deallocated for leak purposes if you use the language level facility<sup>&dagger;</sup> to release it before the next language level memory allocation, function return, or function call. That way, you've allowed for RAII (you say that destructors run just before function return), but you've ensured that any route by which memory usage can grow unbounded is considered to be a leak, as are bounded leaks where you're "merely" late deallocating (such as your tracing GC example of having gigabytes allocated but never used). <p><sup>&dagger;</sup> A language level facility could be something like C's <tt>free</tt> for heap objects and leaving the scope for stack objects, but for a language like Python, you could define it as "set the last reference to <tt>None</tt> or a different object, and ensure that there are no cyclical references" to get a useful definition. Fri, 21 Mar 2025 16:35:29 +0000 resource usage concerns https://lwn.net/Articles/1015056/ https://lwn.net/Articles/1015056/ tialaramex <div class="FormattedComment"> Ah, a VLA, yes it makes sense that the VLA has to actually allocate. I hadn't considered VLAs in what I wrote.<br> <p> I assume, since your example is a VLA, that if you write a conventional C89 array or any other type it is not in fact creating and destroying the allocation.<br> </div> Fri, 21 Mar 2025 13:39:11 +0000 resource usage concerns https://lwn.net/Articles/1015029/ https://lwn.net/Articles/1015029/ khim <font class="QuotedText">&gt; but also deallocations count as accesses</font> <p>If you count deallocation as access then we are back to square one with tracing-GC based language never having any leaks while real-world Java programs wasting gigabytes for stuff they would never need.</p> <p>Not a very useful definition.</p> <p>But if you look at the issue of memory leaks from a layman's perspective, more precisely, a <b>CFO perspective</b>, then the situation is much simpler: we don't care about <b>bounded</b> memory leaks at all. They don't raise our bill of materials unpredictably.</p> <p>What we <b>do</b> care about are <b>unbounded leaks</b>: situations when the ratio between memory spent on “useful work” and “memory leaks” goes to zero.</p> <p>And it's <b>much easier</b> to define what an unbounded memory leak is. 
Imagine that your program runs alongside an oracle that tells it whether a certain object will be touched in the future or not (without counting destructors/deallocators). Count the amount of memory it needs. Now run the real program with the same inputs. How much memory does that run need? The smaller the ratio the better, and if it's not bounded by anything then you have an unbounded memory leak.</p> <p>P.S. Note that most real world programs use more memory than they, theoretically, need. Tracing GC based ones are especially egregious since they usually need at least 2x more than the theoretical minimum (a simple, naïve mark-and-sweep algorithm simply requires 2x more to even be usable, while modern approaches can work with less but their efficiency becomes drastically reduced). But as long as the ratio is bounded (you need to pay for 16GiB of memory if you plan to process 1GiB files or something like that) the CFO can easily adjust the bill of materials. An unbounded leak, on the other hand, means you have <b>no idea</b> how much you would need to pay. And that is the critical difference.</p> Fri, 21 Mar 2025 13:27:17 +0000 /bin/true bloat, and /bin/cat https://lwn.net/Articles/1015025/ https://lwn.net/Articles/1015025/ excors <div class="FormattedComment"> One of the significant concerns about number of dependencies is the vulnerability to supply chain attacks, and I think small dependencies actually make that worse, even if the number remains constant.<br> <p> In C++, if I want something very simple like a circular buffer class, I might find it as part of Boost. That's a huge dependency for such a little feature, which does have some drawbacks. But because it's huge I can be confident there are many developers working on the project. There are review processes, and if one developer tries to slip in something naughty then there's a reasonable chance another developer will spot it before it's released. Security researchers will be running their tools over it. If a vulnerability is reported, there are responsible maintainers who will respond promptly.<br> <p> If I want the same in Rust, I'll probably find a library that is just one random guy on GitHub. A lot of the code has probably been reviewed by exactly zero other people. There is nothing to mitigate against that developer being malicious, or having their GitHub account compromised, or carelessly accepting a pull request from another random user. They might ignore a vulnerability report for months. They're lacking all the processes and shared responsibility that comes from being in a large project.<br> <p> I'd agree the huge dependencies will probably have more accidental vulnerabilities, because the sheer quantity of code will outweigh the improved review processes - but Rust's memory safety should already mitigate a lot of that risk, compared to C/C++. That means deliberate backdoors are a relatively greater risk, even before attackers realise there aren't enough buffer overflows and use-after-frees left for them to exploit and they'll have to shift towards more supply chain attacks.<br> </div> Fri, 21 Mar 2025 11:26:41 +0000 /bin/true bloat, and /bin/cat https://lwn.net/Articles/1015019/ https://lwn.net/Articles/1015019/ taladar <div class="FormattedComment"> I will never understand the people who complain about number of dependencies without taking into account size of dependencies. 
Sure, C or C++ have a lower number but that is mostly because each dependency is artificially inflated to a huge size because the build tooling is so bad that nobody wants to split them up into separate libraries.<br> <p> I'd much rather have a hundred small Rust dependencies than one Qt or openssl that comes with hundreds of critical bugs and security holes that do not even affect the part of it I am using but I have to deal with the related upgrades and CVEs anyway.<br> </div> Fri, 21 Mar 2025 08:36:40 +0000 More robust oxidizr behavior? https://lwn.net/Articles/1015010/ https://lwn.net/Articles/1015010/ raven667 <div class="FormattedComment"> PATH exists for the interactive user convenience, but robust scripts don't operate in the same environment and it's reasonable to either sanitize $PATH to a known quantity or skip relying on it at all and hardcode all paths to system binaries that rarely change on the platform/version you support. Scripts have to take a whole bunch of defensive measures like pervasive quoting, using quoted arrays for arguments, explicit exit checking/set -e and other techniques that aren't at all like someone using a shell interactively. The two use cases regularly conflict in their wants and needs, which is why stuff gets reimplemented in perl or Python sometimes and things like suid shell scripts are impossible. <br> </div> Fri, 21 Mar 2025 02:47:29 +0000 Weakened license protection https://lwn.net/Articles/1015007/ https://lwn.net/Articles/1015007/ jwakely <div class="FormattedComment"> <span class="QuotedText">&gt;I don't believe that the GNU project has a problem in principle with Rust, the language. The fact that a Rust frontend for gcc is in the works seems to suggest otherwise.</span><br> <p> The GNU project doesn't control GCC, so I don't think you can draw any conclusions about GNU's view on Rust from the existence of gccrs.<br> </div> Thu, 20 Mar 2025 23:22:25 +0000 resource usage concerns https://lwn.net/Articles/1015003/ https://lwn.net/Articles/1015003/ wahern <p>The stack does shrink. Example program:</p> <blockquote><pre>
#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

__attribute__((noinline))
static void showfp(unsigned n, intptr_t otop) {
	intptr_t top = (intptr_t)__builtin_stack_address();
	printf("n:%u off:%td\n", n, (otop &gt; top)? otop - top : top - otop);
}

int main(void) {
	unsigned n = 0;
	intptr_t top = (intptr_t)__builtin_stack_address();
	while (1 == scanf("%u", &amp;n)) {
		char buf[n];
		memset_s(buf, n, 0, n);
		showfp(n, top);
	}
	return 0;
}
</pre></blockquote> <p>For `echo 200 100 5 | ./a.out` I get:</p> <blockquote><pre>
n:200 off:272
n:100 off:176
n:5 off:80
</pre></blockquote> <p>As the size of successive stack allocations decreases, so does the frame size.</p> Thu, 20 Mar 2025 22:39:00 +0000 Integrate Ash/Dash? https://lwn.net/Articles/1015001/ https://lwn.net/Articles/1015001/ gmatht <div class="FormattedComment"> I understand that Busy Box has an integrated shell. Presumably a uutils integrated shell could call the integrated utilities with a simple function call, which should be pretty fast?<br> </div> Thu, 20 Mar 2025 22:24:16 +0000 More robust oxidizr behavior? https://lwn.net/Articles/1014995/ https://lwn.net/Articles/1014995/ jrtc27 <div class="FormattedComment"> Yes, distributions don't want the archive to be a Wild West of diverting each other, and in an ideal world coreutils would cooperate with other providers of the same tools. 
But absent that cooperation, dpkg-divert is at least more robust than just moving the files out of the way with no package manager knowledge, and does not require inherent cooperation from the package for dpkg-divert to be used, unlike alternatives, so it's no worse in that regard than just moving the files directly.<br> </div> Thu, 20 Mar 2025 19:50:45 +0000 More robust oxidizr behavior? https://lwn.net/Articles/1014994/ https://lwn.net/Articles/1014994/ jrtc27 <div class="FormattedComment"> If coreutils gets an upgrade then the diverted files remain diverted and the Rust versions don't get overwritten. That's the whole point of dpkg-divert.<br> </div> Thu, 20 Mar 2025 19:48:12 +0000 OIL RIG https://lwn.net/Articles/1014984/ https://lwn.net/Articles/1014984/ jorgegv <div class="FormattedComment"> Niiiiice one... :-D<br> </div> Thu, 20 Mar 2025 16:24:32 +0000 bummer the uutils licence is MIT and not GPL https://lwn.net/Articles/1014980/ https://lwn.net/Articles/1014980/ patrick_g <div class="FormattedComment"> Lots of information here: <br> <a href="https://fosdem.org/2025/schedule/event/fosdem-2025-6196-rewriting-the-future-of-the-linux-essential-packages-in-rust-/">https://fosdem.org/2025/schedule/event/fosdem-2025-6196-r...</a><br> <p> The license issue is addressed during the talk.<br> <p> <p> </div> Thu, 20 Mar 2025 15:59:53 +0000 Ubuntu going downhill... https://lwn.net/Articles/1014976/ https://lwn.net/Articles/1014976/ pj <div class="FormattedComment"> The coming chaos around this makes me glad I recently ditched Ubuntu (after ~15 years of dedicated use), though I did it because of the banner popups advertising Ubuntu Pro whenever I used apt.<br> </div> Thu, 20 Mar 2025 15:27:41 +0000 resource usage concerns https://lwn.net/Articles/1014878/ https://lwn.net/Articles/1014878/ tialaramex <div class="FormattedComment"> IIUC, although the _scope_ ends, the _allocation_ does not. The loop re-uses the allocation, and that's how some of the GC'd languages get that design mistake where they re-assign a single variable for each iteration rather than destroying that variable and conjuring a new one into existence with the same name. In C, because it doesn't have RAII or GC, the behaviour looks like it could be either and so it's harder to realise that one of these approaches is wrong.<br> <p> [If only one language had that mistake, or, even if several did this but they don't regard it as a mistake and fix it, that would be a different matter but in fact this mistake has happened several times and been fixed in IIRC at least C# and Go]<br> </div> Thu, 20 Mar 2025 13:40:29 +0000 bummer the uutils licence is MIT and not GPL https://lwn.net/Articles/1014874/ https://lwn.net/Articles/1014874/ h7KdD8Z <div class="FormattedComment"> Real bummed to learn that the uutils project is MIT licenced and not GPL. Anyone have any background on that decision? Curious to read more about the justification. <br> </div> Thu, 20 Mar 2025 12:56:57 +0000 /bin/true (was Performance concerns when heavily used in scripts ?) https://lwn.net/Articles/1014873/ https://lwn.net/Articles/1014873/ MortenSickel <div class="FormattedComment"> Sorry, reading a bit further down, I realised that also in bash, true is a shell builtin. Running /usr/bin/true definitely returns text on --help and --version, but not --usage.<br> </div> Thu, 20 Mar 2025 12:54:06 +0000 /bin/true (was Performance concerns when heavily used in scripts ?) 
https://lwn.net/Articles/1014871/ https://lwn.net/Articles/1014871/ MortenSickel <div class="FormattedComment"> Just tried true --help, true --version and true --usage on my rocky linux 9 box. No output from either.<br> <p> <p> </div> Thu, 20 Mar 2025 12:51:35 +0000 resource usage concerns https://lwn.net/Articles/1014854/ https://lwn.net/Articles/1014854/ farnz You need a good definition of accessed for this, too - it's not just "accesses" in the sense of reads and writes, but also deallocations count as accesses (otherwise all memory is leaked by this definition, since there's a period between the last read/write and the deallocation, even if the program is careful to keep this small). It also needs to focus on the "right" set of accesses - you want, for example, to not always count <tt>main</tt>'s stack as leaked since it's not freed until the end of the program, but you also don't want to count something as "not leaked" just because it happens that RAII will free it before the end of the program. <p>The details around "deallocation" are, IMO, the hard chunk of defining "will not be accessed in the future". We wouldn't consider <tt>let mut foo = Foo::new(); foo.do_the_thing(); /* 1 */ drop(foo); </tt> as having a leak just because at point <tt>/* 1 */</tt> there's an allocated object that will not be accessed again, but you might want to define the program as having a leak if, at <tt>/* 1 */</tt>, it spawned a thread that did all the rest of the program's work apart from freeing <tt>foo</tt>. Thu, 20 Mar 2025 12:12:13 +0000 /bin/true bloat, and /bin/cat https://lwn.net/Articles/1014860/ https://lwn.net/Articles/1014860/ chris_se <div class="FormattedComment"> <span class="QuotedText">&gt; Anyway, this particular rust project explicitly opted into the dependency hell pattern, and thus IMO it is too much of a dependency chain vulnerability for something that I'd run :-(</span><br> <p> Yes, that's my main issue with the current state of affairs w.r.t. Rust. I rather like the language itself, but I'm utterly baffled that many Rust people saw what was going on with npm and thought "sure, let's do more of that". (Ok, it's not quite as bad yet as leftpad, but still...)<br> </div> Thu, 20 Mar 2025 11:25:45 +0000 resource usage concerns https://lwn.net/Articles/1014852/ https://lwn.net/Articles/1014852/ taladar <div class="FormattedComment"> Technically most applications have a few pieces of data that won't be accessed in the future but are still kept around, e.g. if you have an object that stores all your command line options including the listen IP and port and no restart mechanism those values likely won't be needed after the initial bind but will still be kept around.<br> <p> A more elaborate example might be a work queue where a priority field is only used on enqueuing but still kept around until the task has been processed to completion.<br> <p> Mostly that falls under your "is large enough to care about" but in general it is just a trade-off between being worth restructuring your entire application data structures to be able to free pieces you won't need independently and the amount of extra memory used.<br> </div> Thu, 20 Mar 2025 09:13:00 +0000 Weakened license protection https://lwn.net/Articles/1014851/ https://lwn.net/Articles/1014851/ taladar <div class="FormattedComment"> Also, if using GNU means always being a whole patent expiry behind everyone else they might as well shut down the project now.<br> </div> Thu, 20 Mar 2025 09:06:52 +0000 More robust oxidizr behavior? 
https://lwn.net/Articles/1014849/ https://lwn.net/Articles/1014849/ riking NixOS takes the position that <code>env</code>, <code>sh</code>, and <code>ld-linux</code> are in fact the only absolute paths to binaries you get: <pre>
$ ls /bin
sh
$ ls /usr/bin
env
$ ls /lib64
ld-linux-x86-64.so.2
</pre> Thu, 20 Mar 2025 08:33:50 +0000 resource usage concerns https://lwn.net/Articles/1014844/ https://lwn.net/Articles/1014844/ NYKevin <div class="FormattedComment"> A cache with a bad policy might not be a memory leak even under the uncomputable definition. For example, it might be the case that every element is eventually accessed, but most of them are only accessed incredibly rarely (so rarely that it would be cheaper to evict them and recreate them as needed).<br> <p> The definition I use (at my day job as an SRE) is even more pragmatic: A program is leaking memory if, when you graph its memory usage over the last (e.g.) 12 hours, it's roughly a straight line going up and to the right. But that requires you to actually have real monitoring, which some people apparently don't.<br> </div> Thu, 20 Mar 2025 04:19:46 +0000 resource usage concerns https://lwn.net/Articles/1014843/ https://lwn.net/Articles/1014843/ NYKevin <div class="FormattedComment"> Yes, sure, you can allocate and deallocate multiple separate blocks per function, but the point is that you *cannot* point to an arbitrary stack allocation and say "just deallocate that right now, without touching anything else." The physical structure of the stack is incapable of representing such an operation.<br> </div> Thu, 20 Mar 2025 04:13:32 +0000 resource usage concerns https://lwn.net/Articles/1014840/ https://lwn.net/Articles/1014840/ wahern <div class="FormattedComment"> <span class="QuotedText">&gt; Stack allocations always last until the function returns. That's not a Rust limitation, it's just how the stack works (at least, in any language that has a call stack)</span><br> <p> That's not how C works. Automatic variables, *including* VLAs, are scoped to blocks. If they weren't, then you'd have problems with loops and stack overflow. Allocations using the common "alloca" builtin do last for the entire function, but VLAs were deliberately given different semantics.<br> <p> </div> Thu, 20 Mar 2025 01:06:19 +0000 resource usage concerns https://lwn.net/Articles/1014818/ https://lwn.net/Articles/1014818/ NYKevin <div class="FormattedComment"> Just to clarify for those less familiar with Rust:<br> <p> Rust is a systems language with manual memory management, just like C, but drenched in a thick layer of syntactic sugar (to automatically free things when you're done using them) and static analysis (to detect when you free something before you're done using it). It does not make arbitrary decisions about when to deallocate things (contrast with a GC'd language, which does make such decisions). If the compiler did not deallocate something for you, it means that you have (knowingly or not) asked the compiler to keep that thing alive.<br> <p> In most cases, if something no longer needs to exist, you can std::mem::drop() it, or just return from whichever scope owns the allocation. drop() is a safe function, meaning the compiler will not let you use anything that has potentially been dropped (by either means), and in fact drop() is really just a convenience function that takes ownership, does nothing, and immediately returns. 
You can't drop static variables or anything that you don't own.<br> <p> There are objects with "more complicated" ownership models than that (a simple example being Rc/Arc), but those objects still have some notion of dropping (you can std::mem::drop() any variable, but if that variable is participating in some shared ownership chicanery, the shared allocation might outlive it).<br> <p> There is also one other catch: Stack allocations always last until the function returns. That's not a Rust limitation, it's just how the stack works (at least, in any language that has a call stack). If a stack variable is moved from (or dropped early, but that's equivalent to moving it), what actually happens is that the variable's contents are memcpy'd into the new location, the drop flags are updated to indicate that the variable is now uninitialized garbage (and must not be dropped or otherwise used again), and the variable binding is deleted from the current namespace (so you can't use it again). But the stack allocation is still physically occupied until the function returns. This is rarely a problem because we usually allocate large objects on the heap (plus, the optimizer can do all sorts of things with the physical stack layout anyway).<br> </div> Wed, 19 Mar 2025 23:05:16 +0000 resource usage concerns https://lwn.net/Articles/1014838/ https://lwn.net/Articles/1014838/ excors <div class="FormattedComment"> <span class="QuotedText">&gt; ... what a ‘memory leak’ really is. If you try to define it formally you’ll probably end up with ‘allocated memory with no reachable references’</span><br> <p> I think a more useful formal definition is "allocated memory that will not be accessed in the future". (Formal definitions are happy to rely on oracles that can see the future). "Unreachable" is just an approximation with the (very useful) property of being a computable function, so that's what practical GCs use.<br> <p> But there are many variations of "reachable": referenced by another allocated object (cycles won't be collected), reachable from a root set (cycles will be collected), reachable from some integer on the stack that happens to look like a pointer (conservative vs precise), reachable even if you ignore weak references, etc. Those details are quality-of-implementation issues, they're not a fundamental part of what a memory leak is.<br> <p> "Not accessed in the future" is much more fundamental. It's uncomputable in general, but a human (or sophisticated algorithm) can sometimes determine that a reachable object will never be used, and I think it's fair to call that a memory leak. Then you can say e.g. "A cache with a bad policy is another name for a memory leak" - it doesn't matter that the cache contents are technically reachable (<a href="https://devblogs.microsoft.com/oldnewthing/20060502-07/?p=31333">https://devblogs.microsoft.com/oldnewthing/20060502-07/?p...</a>)<br> <p> (<a href="https://inside.java/2024/11/22/mark-scavenge-gc/">https://inside.java/2024/11/22/mark-scavenge-gc/</a> expresses the same idea: "An object is said to be live if it will be accessed at some time in the future execution of the mutator" and "GCs typically approximate liveness using pointer reachability". And e.g. 
<a href="https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf">https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf</a> implements a "liveness-based oracle" in Java, by recording every allocation and memory access and then replaying the program, to test how a GC implementation compares against a theoretically optimal freeing of memory.)<br> <p> Informally you'd add "...and is large enough and long-lived enough to care about" to the definition of memory leak, but that's very subjective. Neither GC nor RAII can completely save you from wasting memory on non-live objects, so you'll always end up having to profile and debug to find the ones worth caring about. (They'll save you a lot of effort compared to manual memory management, though.)<br> </div> Wed, 19 Mar 2025 22:08:35 +0000 Weakened license protection https://lwn.net/Articles/1014834/ https://lwn.net/Articles/1014834/ jmalcolm <div class="FormattedComment"> I guess the "safe" language in the current GCC suite is Ada (GNAT).<br> </div> Wed, 19 Mar 2025 21:38:17 +0000