Rewriting essential Linux packages in Rust
Most Linux systems depend on a suite of core utilities that the GNU Project began developing decades ago and that are, of course, written in C. At FOSDEM 2025, Sylvestre Ledru made the case in his main-stage talk that modern systems require safer, more maintainable tools. Over the past few years, Ledru has led the charge of rewriting the GNU Core Utilities (coreutils) in Rust as the MIT-licensed uutils project. The goal is to offer what he said are more secure and more performant drop-in replacements for the tools Linux users depend on. At FOSDEM, Ledru announced that the uutils project is setting its sights even higher.
Ledru has been a Debian developer for more than 20 years, and is a contributor to LLVM/Clang as well as other projects. He is a director at Mozilla, but said that the work he would be talking about is unrelated to his day job.
Bread, woodworking... or Rust?
Ledru said that learning Rust was a project he started during the COVID lockdown. Some people chose to learn to bake, others took up woodworking, and he took up rebuilding all the LEGO kits in the house with his son. But he wanted a project in the evening that would help him learn Rust. "I have been surrounded by upstream Rust developers inside the Paris office of Mozilla, so I wanted to learn it myself." He didn't want to take on a side project that would just sit on his hard drive; he wanted to do something with impact. "So I was thinking, what about reimplementing the coreutils in Rust?"
Well before COVID, Ledru had worked on rebuilding the Debian archive using Clang, a project that is documented at clang.debian.net. Ledru said that he had been inspired by Chris Lattner's work on Clang. One of the core fundamentals of Clang, he said, is the philosophy that "if you have different behaviors than GCC, it's a bug."
Next, he asked the audience who knew which programs were in GNU coreutils. Most people knew at least one, and many knew at least five. But he was pretty sure that no one in the audience knew everything in the list of coreutils, unless they happened to be an upstream developer on the project. Ledru said that pr, used to format text for printing "on actual paper", is one of his favorite coreutils programs. To start on the project he selected "all the fancy ones" like chmod, chown, ls, and mkdir: the commands that people on Linux and macOS use almost every day for their work.
Near full completion
Now, five years later, Ledru said that he has more gray hairs, and the project has Rust replacements for all of the more than 100 commands in coreutils. The project has more than 530 contributors, and more than 18,000 stars on GitHub, "[they're] meaningless, but it's one of the metrics we have".
A more meaningful measurement of the project's success is how well it fares against the GNU coreutils test suite. He said that the project only passed 139 tests when it started testing in 2021, and it was now close to 500 out of 617. "I should have worked last weekend to be at 500, but I didn't have the time," Ledru said. (According to the JSON file of results, it crossed 500 tests passed on February 4, with 42 skipped and 75 failed.) He displayed a slide with a graph of test suite runs from April 2021 to late January 2025, shown below.
Most of the tests that the project still fails have pull requests to fix the problems, or may be things that "nobody cares about" as well as some "weird ordering" problems. For example, if a user tries to use rm in a directory without the appropriate permissions, the output may be slightly different. "GNU is going to show something first, and we show it at the end, and it can make a small difference".
More and more of the programs are passing all of the GNU tests, but that may not mean the Rust version is fully compatible with the GNU implementation because the test suite itself has some limitations. Ledru said that the project has been contributing to the GNU coreutils project when it finds things that are not tested.
He said that the Rust coreutils are now "production ready", and that they support Linux, FreeBSD, NetBSD, OpenBSD, Illumos, Redox, Android, macOS, and Windows. There are also Wasm versions. According to Ledru, the Rust coreutils are used by the Debian-based Apertis distribution for electronic devices and by the Spectacles smartglasses, and Microsoft is using the project for Visual Studio Code for the web. The Serpent OS distribution is using them by default, he said. He noted that, since the project is open source, it is probably in use elsewhere without his knowledge, and asked any company using them to let him know. "I'm always excited to know when people are using our code".
Why Rust
Ledru's talk was immediately after Miguel Ojeda's keynote on Rust for Linux. Ledru said that Ojeda had already covered some of the reasons for using Rust, but that he would offer his point of view as well.
As part of the release-management team for Firefox, he was involved when the browser started shipping with Rust code. That made him "completely biased" about the suitability of Rust for the browser use case. He pointed out that Chrome, the Linux kernel, and Microsoft Windows are starting to include Rust. "I'm going to state the obvious, that Rust is very good for security, for parallelism, for performance".
The idea to replace GNU coreutils with Rust versions was not about security, though, because the GNU versions were already quite secure. "They did an amazing job. They almost don't have any security issues in their code base." And it's not about the licensing, he said. "I'm not interested in that debate."
One of the reasons that Ledru liked Rust for this project, he said, is that it's very portable. He is "almost certain" that code he writes in Rust is going to work well on everything from Android to Windows. That is very surprising, he said, given the complexity of "everything we do", but the cost of porting a change to an operating system is small "thanks to the ecosystem and quality of Rust".
Ledru cited laziness as another reason for using Rust. "So if there is a crate or library doing that work, I'm going to use it. I'm not going to implement it [myself]." There are between 200 and 300 dependencies in the uutils project. He said that he understood there is always a supply-chain-attack risk, "but that's a risk we are willing to take". There is more and more tooling around to help mitigate the risk, he said.
He is thinking about "what we are going to leave to the next generation". Developers starting out don't want to use COBOL, Fortran, or C, he said. They want to work with fancy stuff like Rust, Swift, Go, or Kotlin. It is a good investment to start planning for the future now and transition to new languages for Linux and the computer ecosystem in general.
Demo time
Even though the goal is for the Rust coreutils to be drop-in replacements, Ledru said, "we take the liberty at times to differentiate ourselves from the GNU implementation". Here he showed a demo of the cp command with a --progress option that is not available with the standard GNU version of cp. He said it was available with the mv command too, and invited the audience to ask if there were other places the project should add it. "In Rust, it's pretty easy to add that".
He also walked through a demo that compared the Rust implementation of sort to GNU's. He used the hyperfine command-line benchmarking tool to run a test ten times, sorting a text file containing all of Shakespeare's works, to see which implementation was faster. The first time he performed the test, he used a debug build of the Rust version of sort. In that demo, Rust's version was 1.45x faster than the GNU version. Then he ran the test again using a non-debug build, which showed the Rust version performing the test six times faster than GNU's implementation.
Currently, the project has continuous integration (CI) and build systems for most of its supported platforms, with almost 88% of the code covered by a test suite. "If you are above 80, you usually are very happy. Here we are even happier with nearly 90%". Despite trying to demonstrate how the Rust implementation was better than GNU's, Ledru stressed that there is a friendly collaboration between the projects and that they have been sending bug reports and patches upstream for GNU coreutils.
What's next
Rewriting more than 100 essential Unix and Linux utilities in a new language is an impressive achievement. Some, upon nearing completion of such a project, might stand back and admire the work and think about calling it a day. Ledru, apparently, is not one of those people.
He displayed a slide with the familiar adage "the best time to plant a tree is 20 years ago, the second-best time is now", and talked a bit about the age of the Unix core utilities. The original Unix utilities will be 55 years old in four or five months, he said. Despite their age, the utilities (albeit newer implementations of them) live on and continue to evolve. Ledru pointed out that the GNU project continues to add new options to the coreutils and introduce new algorithms "and we are going to do the same".
In parallel, Ledru said that the project had started working on rewrites of GNU Findutils and GNU Diffutils. Those have been less of a focus, he said, but people have been doing great work on implementing those and improving their performance on the bfs test suite used to test find. Now, Ledru said, "I'd like to do the same with most of the essential packages on Debian and Ubuntu". There isn't a better time than now to start that task, he said. What are the essential packages? He displayed the command he uses to find them:
$ dpkg-query -Wf '${Package;-40}${Essential}\n' | grep yes$
The list includes procps (/proc file system utilities), util-linux (miscellaneous system utilities), acl (access-control-list utilities), bsdutils (standard BSD utilities) and several others.
Ledru then published a blog post from the stage formally announcing the plan to rewrite other parts of the modern Linux stack in Rust. He said that many people were already contributing to the project, and that uutils was reusing the work that had already been done for the coreutils in Rust.
There are many, many functions we have in the coreutils that can be used for other programs as part of that world. So when you need to mount a file system or when you need to look at permissions, we already have all those functions.
He said that the project has a lot of low-hanging fruit and good first bugs for contributors to get started if they would like to learn Rust. "It's the way that I learned, and that's why we have so many contributors". Projects like rewriting and reinventing the wheel might sound crazy, he said, but he thought it would work because there is an appetite from the community.
As I was saying earlier, I'm getting older. We all do, but the new generation is not going to want to do C. And paving the way to do Rust is also a good opportunity for them to be involved in the open-source ecosystem.
The old tools are still using old build-pipeline systems, and some do not have CI. They are still using mailing lists to send patches. "I apologize for the Linux developers here, I still think that using mailing lists to do a patch review sucks". With GitLab and GitHub, he said, there was an opportunity to have a proper pipeline with CI, quality tools, and so on, which is one of the advantages that uutils has over the old projects. "To be clear, that works for them, so I'm happy, but we can do better as a community."
Finally, he wanted to mention again that uutils is not a Mozilla project and has no company behind it, no interest from big tech. "I'm doing it as a passion because I care about Debian and Linux in general, it's really a community effort, there is no structure behind it". He then opened the floor for questions with a little bit of time remaining.
Questions
The first question was about portability, and whether Ledru was tracking the other efforts toward additional Rust compilers to extend uutils coverage. Ledru said "we will when they are ready". Another audience member asked if there were any plans to develop a shell, like Zsh. Ledru said no; he wanted to replace existing utilities written in C, not to rewrite a shell, and there was "no need in that space".
The question that followed was about Ledru's stance on utilities written in Rust that are part of the "rewrite in Rust" trend, like ripgrep, bat, and others. He said that they are amazing projects, and he uses ripgrep daily, but they are not drop-in replacements.
The final question was about packaging. A member of the audience observed that it is really hard to package Rust programs due to dependency management, and wanted to know what Ledru's experience with that was and how to work around it in the future. Ledru agreed that it was a big deal and that Rust was "quite hard to package", but he thought it was going to stabilize.
[I was unable to attend FOSDEM in person, but watched the talk as it live-streamed on Saturday. Many thanks to the video team for their work in live-streaming all FOSDEM sessions.]
Index entries for this article
Conference: FOSDEM/2025
Posted Feb 12, 2025 14:35 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link] (25 responses)
That would get us a shell with full access to the whole Rust library ecosystem.
And the thing I keep running into in shell programming is they always start as little bits of glue, but then as they grow (particularly in test environments, where I use them the most) the first thing you need is real error handling - and Rust is first in class there.
Posted Feb 12, 2025 14:51 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (23 responses)
It sure sounds cool, but in practice, an interactive shell is not a good environment for general-purpose code.
Posted Feb 12, 2025 15:07 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link] (22 responses)
Posted Feb 12, 2025 15:12 UTC (Wed)
by dskoll (subscriber, #1630)
[Link] (19 responses)
Yes, because it's so (too?) convenient and guaranteed to exist on pretty much any UNIX-like system.
It's a tough habit to break, but I generally switch to a real scripting language (Perl is my favorite... don't judge) if a shell script gets longer than about 15 lines.
Posted Feb 12, 2025 16:45 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (8 responses)
A long time ago when I began with unix, my default shell was tcsh. I didn't understand why some of my scripts didn't work the same way on the command line. I thought this had to do with interactive vs non-interactive. When I realized that the shell was different, I switched to bash and lost a lot of stuff (it was 1.14 by then), but I started to write working scripts. When 2.05b arrived, scripting abilities improved a lot! I once tried to switch to zsh because it was compatible with bourne, but admins hated it due to the way it displays completion and moves the screen up. And one difficulty was that you quickly get used to features that again don't work in your portable scripts. So I went back to bash, the (in)famous bug-again-shell that everyone loves to hate. But in the end, scripts written with it work basically everywhere. And if you need to get rid of it because you end up with a real bourne (like dash), that's not that dramatic, you spend a few hours getting rid of bashisms and that's done. Or the user installs bash or ksh, since their compatible subset is quite large.
That's why people continue to write (sometimes long) shell scripts: that's easy, portable and transferable to someone else to take over them. And they come with basically 0 dependencies (aside external programs) so that makes them future-proof contrary to many interpreted languages whose interpreter breaks compatibility, or whose libraries love to change API all the time by adding a new argument to a function...
Posted Feb 13, 2025 9:09 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (7 responses)
Posted Feb 13, 2025 12:50 UTC (Thu)
by pizza (subscriber, #46)
[Link] (6 responses)
This turned out to be the root cause of an infuriating intermittent bug I was trying to figure out -- nobody could give me a reproducible test case. Turns out they were using something other than English locales.
Posted Feb 14, 2025 15:35 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (5 responses)
Posted Feb 14, 2025 23:51 UTC (Fri)
by himi (subscriber, #340)
[Link] (3 responses)
A few simple experiments with bash suggest "08" used in an arithmetic expansion will error ('bash: 08: value too great for base (error token is "08")') - pretty much what I'd expect, though it'd take me by surprise the first time it happened (at least the error is nice and informative). Are you talking about something like iterating through 0-padded numeric months and hitting errors once you hit August?
Posted Feb 15, 2025 0:16 UTC (Sat)
by Wol (subscriber, #4433)
[Link]
Copy a column of date-times from one spreadsheet to another. Format the date-time as a time. Save as a csv. If you're not in America the csv will contain a "random" mix of times and date-times. But all the date-times will not be a valid American date, for example today - 15/02/2025 - will be a date-time, while fast-forward a fortnight to 01/03/2025 and it will be correctly converted to a time.
The fix is easy - explicitly format/force the text into a European date, and everything then works, but it's a pain in the arse. Especially if every time you hit the problem, you've forgotten the previous time ...
Cheers,
Posted Feb 17, 2025 13:42 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (1 responses)
The solution is easy once you know about it, you can simply prefix every variable in the arithmetic expression with 10# to force base 10, but you need to know about it.
Posted Feb 18, 2025 21:46 UTC (Tue)
by himi (subscriber, #340)
[Link]
I try to use unix timestamps ($(date +%s)) whenever I'm dealing with times and dates of any sort in shell scripts, and then use $(date -d @...) to convert to something human readable - not particularly portable, but I'm fortunate enough to only really deal with Linux systems (aside from an occasional Mac, which has caused me serious headaches at times).
I do sometimes use sub-second precision (e.g. $(date +%s.%N)) for higher resolution timing, which then necessitates using bc -l or similar for any calculations. That recently bit me, in combination with scoping issues - I was mixing second and sub-second precision and the (perfectly logical) variable name 'now', and getting errors from arithmetic expansions when a sub-second precision value was still active where I thought it should be second precision . . . Fixing that just required fixing the scoping and naming issues ('local' is one of my favourite keywords), but it took *far* too long to spot the bug.
And to bring this back on topic, that's one monstrosity of a shell script that *absolutely* needs to be rewritten - though probably not in Rust, sadly, since I'm the only person in my team who'd have any hope of working on a Rust version . . .
Posted Feb 17, 2025 21:12 UTC (Mon)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 13, 2025 2:21 UTC (Thu)
by interalia (subscriber, #26615)
[Link]
Posted Feb 13, 2025 6:01 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (8 responses)
Error handling is a real pain in shell (just like in... C) but "set -e" and "trap EXIT" mitigate a bit.
The truth is: the shell's ability to "glue" processes, files and pipes is still unmatched. Why? Simply because that's exactly what it was designed for. Everyone who has ever used Python's "subprocess" knows that. Also, "shellcheck" is incredibly good; saved me hours and hours of code reviews thanks to: "please run shellcheck, it will tell you what's wrong with this line". Last but not least: the shell is a decent... functional language! You can put a function in a variable which is amazing and very useful for a language that limited otherwise.
Posted Feb 13, 2025 7:13 UTC (Thu)
by himi (subscriber, #340)
[Link] (7 responses)
I would like to second, third and fourth this (and then some) - shellcheck is amazing, it makes creating maintainable code in shell massively more practical.
The biggest issue I have with shellcheck is actually that it's /too/ good - it dramatically raises the threshold where my gut starts screaming at me that it's essential to rewrite this thing (some random hack grown into a monstrosity) in a "real" language, leaving me with constant cost/benefit anxiety. I've caught myself more times than I'd like to admit thinking "This really needs to be redone in Python . . . . but the incremental cost of adding X feature to the shell code is less than the cost of completely rewriting 1000+ lines of shell code from scratch, so . . . "
Without shellcheck making such a massive improvement in the quality, consistency and coherence of my shell code that would never have become a thing for me - it truly is both a blessing and a curse.
Posted Feb 13, 2025 9:11 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (6 responses)
Posted Feb 13, 2025 10:01 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Feb 13, 2025 12:48 UTC (Thu)
by pizza (subscriber, #46)
[Link]
"It's only temporary if it doesn't work"
Posted Feb 13, 2025 12:48 UTC (Thu)
by pizza (subscriber, #46)
[Link]
...And given how perl goes through great pains to be "bugward" compatible indefinitely.
(I have literally decades-old perl scripts that still JustWork(tm). During the python2->python3 migration debacle I decided to just rewrite a lot of p2 (and early p3) into perl instead, and it hasn't needed to be touched since then. Ran faster, too)
Posted Feb 13, 2025 18:43 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Feb 13, 2025 22:53 UTC (Thu)
by himi (subscriber, #340)
[Link] (1 responses)
It may have been moderately portable when it really was a quick hack, but even then it was probably only used in exactly one location and probably wasn't very useful anywhere else. The growth pattern that turned it into a monstrosity certainly didn't increase portability - it made it into something with even more detailed ties to the environment it's used in.
Shell /is/ good for quick and dirty hacks that are moderately portable, far better than Python. But quick and dirty hacks tend to keep getting hacked on, and the point of the whole thread here is where you stop hacking on the shell code and rewrite it in something that's a bit more amenable to proper software engineering practises . . .
Posted Feb 14, 2025 2:05 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Note this is not so specific to shell. How many times have we looked at some piece of code and thought "This has grown organically, it should really be rewritten". Of course the bar tends to be much higher when a language switch is required, notably because the _incremental_ change possibilities are much more limited. But at a very high level it's the same https://wiki.c2.com/?PlanToThrowOneAway logic.
Posted Feb 12, 2025 15:25 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Feb 12, 2025 16:10 UTC (Wed)
by stijn (subscriber, #570)
[Link]
Posted Feb 12, 2025 21:20 UTC (Wed)
by walters (subscriber, #7396)
[Link]
But on the topic of scripts and testing - in bootc I started to try to use nushell for some of our integration tests which are *mostly* just forking external processes, but have about 10% where you really want things like arrays.
You can get a good sense of what this is like from e.g. this test:
A notable plus is that it's super easy to parse JSON from a subprocess without falling back to jq and thinking about its mini-language and then parsing that back into bash.
But a downside is we've been caught twice by subtle incompatible nushell changes:
I hope this won't happen too much more often...
---
If you want to approach this a different way and write a regular Rust program but fork subprocesses with as little ceremony and overhead as possible, I heartily recommend https://crates.io/crates/xshell
It's really cool because it makes use of Rust macros to do what you can't easily do in Go or Python - correctly quote subprocess arguments referring to Rust local variables.
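As a rough illustration of the quoting behavior described above (the xshell crate and its cmd! macro are real; the particular command and variable names are made up for the example):

use xshell::{cmd, Shell};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let sh = Shell::new()?;
    // A value containing a space: cmd! passes it to grep as a single,
    // correctly quoted argument, with no manual escaping needed.
    let pattern = "hello world";
    let output = cmd!(sh, "grep -r {pattern} .").read()?;
    println!("{output}");
    Ok(())
}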
Posted Feb 12, 2025 15:37 UTC (Wed)
by stijn (subscriber, #570)
[Link] (17 responses)
Posted Feb 12, 2025 19:05 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (2 responses)
Posted Feb 13, 2025 13:30 UTC (Thu)
by stijn (subscriber, #570)
[Link] (1 responses)
Posted Mar 1, 2025 17:00 UTC (Sat)
by jkingweb (subscriber, #113039)
[Link]
Posted Feb 12, 2025 19:06 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
> A field is the maximal string matched by the basic regular expression:
But note that this is describing how the -f flag works, not the -c flag (which is also the case for the man page you quote).
As far as I can tell, POSIX makes no allowance whatsoever for right-justifying the -c output, and in fact specifies the opposite:
> If the -c option is specified, the output file shall be empty or each line shall be of the form:
(%d means "a number in decimal," not "a number in decimal, but possibly with some whitespace in front of it.")
If a uniq implementation right-justifies its -c output, that is either a bug or a deliberate non-conformance to the standard. I would suggest reporting it upstream if you have not already done so.
[1]: https://pubs.opengroup.org/onlinepubs/9799919799/utilitie...
Posted Feb 13, 2025 13:55 UTC (Thu)
by stijn (subscriber, #570)
[Link]
Yep I know. It is a horror I noted in passing.
> You may dislike that interpretation of "field," but it is specified in the POSIX standard[1]:
I very much dislike that POSIX standard then. POSIX probably codified existing behaviour, but anyway it does not matter. Clearly existing options have to continue to carry the meaning that they have, I'm not tilting at that windmill.
Coreutils combined with utilities such as datamash can be quite powerful in composing filters, maps or computations in a streaming paradigm, but it does require having a meaningful definition of field. One workhorse in scientific computing is the dataframe encoded as tab-separated data with column headers. This format (bar column headers) is already half-embraced by a lot of unix utilities, I hope this progresses further. A good check is always 'does it work for the empty array / empty string', and this is required to make the tab-separated format usable.
> As far as I can tell, POSIX makes no allowance whatsoever for right-justifying the -c output, and in fact specifies the opposite:
In the savannah gnu git repo the code for uniq has 'if (count_occurrences) printf ("%7jd ", linecount + 1);'
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;...
I'm not interested in left-justification either - datamash -g 1 count 1 is a more usable alternative.
Posted Feb 15, 2025 22:59 UTC (Sat)
by pdewacht (subscriber, #47633)
[Link]
Posted Feb 16, 2025 16:33 UTC (Sun)
by ck (subscriber, #158559)
[Link] (3 responses)
Posted Feb 17, 2025 13:52 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (2 responses)
Posted Feb 17, 2025 13:55 UTC (Mon)
by pizza (subscriber, #46)
[Link] (1 responses)
I thought that was a bad thing?
Posted Feb 17, 2025 14:15 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Posted Feb 16, 2025 16:57 UTC (Sun)
by dskoll (subscriber, #1630)
[Link] (6 responses)
... | uniq -c | sed -e 's/^ *//'
You could wrap that in a shell function or script, I guess.
Posted Feb 17, 2025 9:30 UTC (Mon)
by stijn (subscriber, #570)
[Link] (5 responses)
The thought had occurred to me. Implicit in my point here is that this is a fudge, easily fixed by adding a new option to uniq that does the right thing. Shell programming with unix pipes can be an elegant and very concise way to mutate data in a functional way. Having to include fudges like the above (at the expense of a process) grates and creates an impression of crummy (shell) programming that should be completely unnecessary.
Posted Feb 17, 2025 11:18 UTC (Mon)
by mbunkus (subscriber, #87248)
[Link] (4 responses)
• humans: we need data formatted so that it is visually very clear where columns start & end. We also prefer to be able to determine at a glance when a number is bigger than another number. This means that columns must be aligned in the first place in order to satisfy the first requirement, and for numbers right-aligning satisfies the second. We (humans) might even profit from table borders.
You cannot satisfy all those requirements with a single format. Therefore I consider your argument to be completely wrong. The default output for uniq is to be easily readable by humans. That's a design choice. It's not a bug.
[1] Examples with bash:
[mosu@velvet ~]$ printf "moo\nmoo\ncow\n" | uniq -c | awk '{ sum += $1 } END { print sum }'
Posted Feb 17, 2025 18:46 UTC (Mon)
by stijn (subscriber, #570)
[Link] (2 responses)
- current default behaviour of uniq -c is poor for composing.
With this, we can have both a 'visually clear' format and a suitable-for-composing format. For compatibility the current format is of course the default in that scenario.
> For example, if you want to process it further via pipes then then awk & bash don't care at all about the right-aligned numbers[1], whereas other programs might.
I work a lot with dataframes, which are essentially mysql tables in tab-separated format with column headers, or equivalently a single table in a spreadsheet, or the things you might want to read with Python pandas or in R. Tab separated is preferred, as I've never encountered a need to escape embedded tab characters. In this wider ecosystem there is no automatic white-space scrubbing of data and there is a requirement that tables are well-formatted. Programs such as comm, join, datamash, shuf and a fair few more can be very handy in summarising, QC'ing or (even) manipulating this data. Hence I clamour for the ability (not necessarily as default) to have all tuple/table type data formatted as tab-separated tables, with or without column names. This should go well with unix composability of processes.
Posted Feb 17, 2025 19:18 UTC (Mon)
by mbunkus (subscriber, #87248)
[Link] (1 responses)
> It is quite puzzling that Richard Stallman let this program loose on the world as it violates usual Unix well-behavedness of textual interfaces.
And to that my argument was that first and foremost `uniq -c` was most likely designed to be easy to read by humans. By that metric it is very much well-behaved & doesn't violate anything. Furthermore, even with it being designed to be human-readable its output is actually useable as-is by a lot of other traditional Unix programs, making it arguably even less of a "wrong" that has to be "righted" (your choice of words, again from your first post). Apart from awk & bash which I mentioned earlier, "sort -n" works fine as well.
Posted Feb 17, 2025 20:15 UTC (Mon)
by stijn (subscriber, #570)
[Link]
Posted Feb 18, 2025 8:11 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
> • humans: we need data formatted so that it visually very clear where columns start & end. We also prefer to be able to determine at a glance when a number is bigger than another number. This means that columns must be aligned in the first place in order to satisfy the first requirement, and for numbers right-aligning satisfies the second. We (humans) might even profit from table borders.
Then why is it a static number of columns wide? I know to make it actually the right width, the entire output needs to be known so that you can't output anything until the whole thing is read, but if human viewing is most important, why not buffer and Do It Right™? Either the output is small and quick enough to not really matter or it is so large that…what human is really going to be looking at it directly anyways?
> • other programs: here it depends on what the other program is & what it expects as input. For example, if you want to process it further via pipes then then awk & bash don't care at all about the right-aligned numbers[1], whereas other programs might. If your goal isn't pipe-processing but e.g. copy-pasting into spreadsheets, then CSV-formatted data might be much better (though that would make processing in awk/bash much harder)
Sure, awk and bash do separator coalescing. But `cut` doesn't, so one needs to `sed` before `cut`, but not before `bash` and `awk`. Great. Yet another paper cut to remember in shell scripts. Given that `bash` and `awk` do support the mechanism that `cut` would understand and the general human-ness usefulness is of questionable quality…it really seems like an unnecessary quirk of a tool's output.
Posted Feb 12, 2025 16:46 UTC (Wed)
by excors (subscriber, #95769)
[Link] (13 responses)
This sounds unfortunately similar to Lord Farquaad from Shrek: "Some of you may die, but it's a sacrifice I am willing to make". The developers choosing to add convenient dependencies aren't going to be the ones suffering the costs of any such attacks, so I think there's some misalignment of incentives.
Hopefully any companies or large projects making use of this will be aware of the risk they're taking on, and will contribute to the tooling and code review and maintenance efforts needed to mitigate it. (Is there much activity around cargo-crev/cargo-vet nowadays, and is there anything better than that?)
Posted Feb 12, 2025 19:17 UTC (Wed)
by lunaryorn (subscriber, #111088)
[Link] (1 responses)
That said, I believe that adding cargo-vet to a Rust project is universally a good idea, even if dependencies are routinely exempted. At least it makes the cost of dependencies visible, and helps to separate the big and trustworthy crates (eg libc, rustix, Tokio, etc.) from small, little-known dependencies which need a closer look.
Posted Feb 13, 2025 9:17 UTC (Thu)
by taladar (subscriber, #68407)
[Link]
Posted Feb 12, 2025 19:43 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (9 responses)
Supply-chain attacks are a legitimate concern and I completely agree with your second paragraph. That said, there are benefits and I would like to see solutions around how to do it rather than treating having distributed dependencies as a design flaw.
I mean, we could say the same thing about dynamic linking (and some do I realize) or having any dependencies at all. I mean, I hope that the users of all software realize the risks that they are taking when trusting some random C library or C++ compiler to provide foundational functionality to the applications that they compile. Reflections on trusting trust I guess.
Posted Feb 12, 2025 19:54 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link]
Posted Feb 12, 2025 21:39 UTC (Wed)
by excors (subscriber, #95769)
[Link] (7 responses)
I think the same logic arguably does apply in that case too, and explains why developers have kept using C for so long. There is certainly _some_ incentive to prevent bugs in their code - pride, reputation, contracts, etc - but the developers aren't the people whose hospital is disrupted by ransomware exploiting a code execution vulnerability in some packet parsing code. If they felt those consequences directly, they might have been more eager to adopt memory-safe languages decades ago.
Many are adopting Rust now, but I think that's not primarily because of memory safety: it's because Rust is generally a much nicer language to work in than C/C++, with a helpful type system and good performance and modern package management and IDEs and all the other stuff that developers enjoy. The memory safety is a compelling bonus, but I believe very few people would use Rust if it wasn't better at everything else too.
Regarding third-party dependencies, C/C++'s lack of good package management means developers face a significant cost to pulling in dependencies: you've got to download tarballs and fight with incompatible build systems and keep on top of API changes and do it all differently for Windows and it's a big pain, so you're likely to limit yourself to a few widely-used libraries that are too large and/or too tricky to write yourself (and are therefore probably maintained by a team). For anything not so large/tricky, it's easier (though still a pain) to rewrite it yourself inside your application, or just remove the feature from your application because it's too hard to support.
It's a happy accident that the cost to developers (caused by poor package management) correlates with the cost to users (from the wider supply-chain attack surface), so the incentives are aligned and it works out okay in practice. But Rust greatly reduces the cost to developers of adding a tiny dependency (or 300) from some random guy on GitHub, without reducing the risk to users from each dependency, so it creates an imbalance that needs to be addressed somehow. Hopefully cargo-vet helps by both increasing the cost to developers (not much, but they're at least aware of the lengthening code review backlog every time they add a dependency) and reducing the risk to users (when any code reviews are actually performed).
Posted Feb 12, 2025 22:19 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (4 responses)
Assuming that big projects adopt cargo vet, this allows smaller projects to "ride on their coattails" and shrink the exemptions list by trusting Google, Mozilla, or other big names to provide audits.
Posted Feb 12, 2025 22:48 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link] (3 responses)
Couple that with tools that show you "you have x dependencies that haven't been sufficiently vetted" (or have had significant changes since then), and we could efficiently farm out the auditing.
Posted Feb 13, 2025 0:11 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Feb 14, 2025 23:23 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 13, 2025 10:14 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Posted Feb 13, 2025 9:20 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Feb 14, 2025 6:19 UTC (Fri)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 14, 2025 6:56 UTC (Fri)
by hsivonen (subscriber, #91034)
[Link]
Of course, being packaged for Debian involves less auditing than people generally believe, but it's somewhat inconsistent to believe that C code becomes trustworthy by distro packaging and not believe the same of Rust code.
(Furthermore, dependencies don’t tend to be random crates from random authors but there’s a cluster of popular dependencies by people who have had community name recognition for a while.)
Posted Feb 12, 2025 17:14 UTC (Wed)
by garyguo (subscriber, #173367)
[Link] (8 responses)
We had some scripts that invoke `nproc` to determine the parallelism to use, and we wanted to move this into containers. Then we realized that in containers with resource restrictions, `nproc` from coreutils still reports the total number of cores, not the number of CPUs made available to it by the cgroup. It turns out that nproc hasn't received meaningful updates since its inception and obviously doesn't know what to do with cgroups.
The fix? We simply replaced coreutils nproc with uutils nproc in the container. uutils nproc simply calls into Rust std's available_parallelism, which is cgroup-aware.
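For reference, a minimal sketch of the standard-library call mentioned above; on Linux, available_parallelism() takes cgroup CPU quotas into account, which is why it behaves differently from the old nproc:

use std::thread;

fn main() {
    // Returns the parallelism the process can actually use; on Linux this
    // accounts for cgroup quotas. Fall back to 1 if it cannot be determined.
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("{n}");
}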
Posted Feb 12, 2025 17:27 UTC (Wed)
by intelfx (subscriber, #130118)
[Link] (7 responses)
You might have been using a very old coreutils. I've been relying on this behavior of coreutils' nproc for a few years already, and as far as I can tell, it very much exists:
Posted Feb 12, 2025 18:40 UTC (Wed)
by jengelh (guest, #33263)
[Link]
Posted Feb 12, 2025 19:09 UTC (Wed)
by garyguo (subscriber, #173367)
[Link] (4 responses)
No, coreutils 9.5 still has the same issue.
`AllowedCPUs` restricts what CPUs can be seen, so nproc reports what you expected. This is true even with very old nproc. The difference is `CPUQuota`, which is what gets used for k8s resource limits.
$ systemd-run --pipe -p CPUQuota=200% nproc
$ systemd-run --pipe -p CPUQuota=200% uutils-nproc
Posted Feb 12, 2025 19:13 UTC (Wed)
by intelfx (subscriber, #130118)
[Link] (3 responses)
Posted Feb 12, 2025 19:36 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
I'm skeptical of that. It seems like you should get preempted more often and lose some performance to context switching (as compared to the approach of dividing CPUQuota by 100% to get the optimal number of threads, so that each thread has enough quota to run for the maximum allowable timeslice without preemption).
Posted Feb 12, 2025 19:37 UTC (Wed)
by intelfx (subscriber, #130118)
[Link]
Posted Feb 12, 2025 19:53 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
If your workload is bursty - a lot of work for a small fraction of the period used by CPUQuota, then idle for the rest of the period - you might want a thread per core so that when work comes in, you spread across all cores, get it done, and go back to idle. If your workload saturates all possible threads - so it'll be throttled by CPUQuota - you usually want just enough threads to allow you to saturate your quota, and no more (e.g. 3 threads for a quota of 250%); doing so means that you benefit from the cache effects of holding onto a single CPU core for longer, rather than being throttled most of the time and hitting a cooler cache when you can run.
And if you're I/O bound, chances are good that you can't make good use of a large number of threads anyway, because you're spending all your time stuck waiting for I/O on one thread or on 128 threads. You might as well, in this situation, just use 2 threads and save the complexity of synchronising many threads.
I thus believe it's rare to want thread per CPU core when you have a CPUQuota control group limit.
Posted Feb 13, 2025 0:08 UTC (Thu)
by sionescu (subscriber, #59410)
[Link]
Posted Feb 12, 2025 18:00 UTC (Wed)
by jreiser (subscriber, #11027)
[Link] (6 responses)
Why? (Algorithm strategy? Parallelism of CPU? Of I/O?) In what environment? (The same GNU code works in 16-, 32-, and 64-bit environments. 32-bit MS Windows offers only a cramped 2GiB of user address space, while the input and data structures fit easily in a minimal 64-bit machine with 4 GiB of RAM.)
Posted Feb 12, 2025 19:29 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
6x is a pretty large improvement, so I expect that they also made some substantive changes to how it works, which is probably easier in Rust-with-dependencies than it would be in C-without-dependencies. For example, Rayon provides a parallelized sorting implementation for data that fits in memory[1], whereas that would probably take tens or even hundreds of lines of C to write from scratch. Sorting a full file probably also benefits from Rayon's high-level API (par_iter() is much easier to use than pthread_create()). I don't know if they actually used Rayon, or for that matter if the GNU people did parallel sorting in their sort(1), but it's just one example of the kind of thing that is much easier in Rust-with-dependencies than in C-without-dependencies.
[1]: https://docs.rs/rayon/latest/rayon/slice/trait.ParallelSl...
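To make that concrete, here is a minimal sketch of a parallel sort using Rayon's par_sort(); it is only an illustration of what a dependency buys, not a claim about how uutils or GNU sort are actually implemented:

use rayon::prelude::*;

fn main() {
    // Read all lines from stdin, sort them in parallel, and print them.
    let mut lines: Vec<String> = std::io::stdin()
        .lines()
        .collect::<Result<_, _>>()
        .expect("failed to read stdin");
    // par_sort() splits the work across a thread pool; the sequential
    // equivalent is just lines.sort().
    lines.par_sort();
    for line in &lines {
        println!("{line}");
    }
}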
Posted Feb 12, 2025 20:53 UTC (Wed)
by roc (subscriber, #30627)
[Link]
For this reason and others, the performance ceiling for single-threaded Rust code is potentially quite a bit higher than for similar C or C++ code. A lot more compiler work is needed though.
Posted Feb 25, 2025 20:10 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Case in point: https://trifectatech.org/blog/zlib-rs-is-faster-than-c/ - zlib port into Rust is outperforming C.
Posted Feb 12, 2025 19:49 UTC (Wed)
by excors (subscriber, #95769)
[Link] (2 responses)
With LANG=C.UTF-8, coreutils spends most of its time in strcoll_l, and it sorts by what I presume is some Unicode collation algorithm.
As far as I can see, uutils has no locale support. It aborts if the input is not valid UTF-8 ("sort: invalid utf-8 sequence of 1 bytes from index 0"). It simply sorts by byte values (equivalent to sorting by codepoint), regardless of LANG.
So in this case it's only faster because it doesn't implement Unicode collation.
Posted Feb 12, 2025 20:08 UTC (Wed)
by excors (subscriber, #95769)
[Link] (1 responses)
Posted Feb 18, 2025 3:10 UTC (Tue)
by ehiggs (subscriber, #90713)
[Link]
Regardless, it should do Unicode canonicalization or it will mis-sort depending on how different runes are composed. This is fine for diacritic-free languages like English, but as soon as you get some diacritics then LANG=C's naïve handling of text breaks, in my experience.
Posted Feb 12, 2025 19:32 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (56 responses)
The author states that GNU utils already have sufficient security but also points out that they continue to evolve. The fact that it is so easy to accidentally write insecure code in C is still an issue. Rust is not perfect but tends to result in more secure code by default. I am sure we have all seen the "70% of vulnerabilities stem from memory allocation" stories from Microsoft, Google, and others.
Writing code in C that takes advantage of multiple cores is also tricky and makes the security problem worse. Rust again makes this a lot easier and offers greater assurance that code that builds is also correct. For this reason alone, I would expect performance to be quite a bit better for many kinds of tools.
Pulling from so many crates certainly creates packaging challenges but it also distributes the innovation. Tools in Rust are going to benefit from improvements in the overall ecosystem. Tools in C tend to be more fully stand-alone and only improve when you improve them.
Cross-platform consistency also appears to be a strength of the Rust ecosystem. The article mentions that it is easier to implement cross-platform behaviour (even complex behaviour). It has been my experience that the behaviour itself is more consistent as well.
My biggest concern is that these kinds of low-level tools in Rust complicate the bootstrap problem. In order to build these tools from source, I need to have built Rust first which also means building LLVM. I also need to have built any of the crates that they depend on and of course other parts of the toolchain like Cargo. And of course Rust and LLVM are not even available on some of the systems that you might want utilities like this to be available for. GCC certainly wins there. That said, if GCC gets its own Rust front-end, this problem will go away. Perhaps this addresses most of my point here actually as, if you can build GCC, I guess you can build the Rust front-end too.
Not that the bootstrap situation is perfect in C GNU land. The author of Chimera Linux explains his choice of using the BSD userland in his distro instead of GNU largely because of the chicken-and-egg nature of things like the GNU coreutils and util-linux.
One thing that was not touched on is executable size. I wonder how the Rust and C versions compare.
Posted Feb 13, 2025 8:22 UTC (Thu)
by joib (subscriber, #8541)
[Link] (55 responses)
There is some overhead due to the static linking Rust does (for Rust dependencies), but OTOH uutils uses (by default, or only optionally, not sure?) the "busybox trick" of generating only one binary, and all the /usr/bin/foo utilities are hardlinks to that binary, and the utility checks argv[0] to determine which functionality to invoke.
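A minimal sketch of the argv[0] dispatch described above (illustrative only, not the actual uutils code; the applet names are just examples):

use std::env;
use std::path::Path;
use std::process::ExitCode;

fn main() -> ExitCode {
    // argv[0] is the name the binary was invoked as; with hard links named
    // "true", "false", etc. all pointing at one binary, it selects the applet.
    let argv0 = env::args().next().unwrap_or_default();
    let applet = Path::new(&argv0)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("")
        .to_string();
    match applet.as_str() {
        "true" => ExitCode::SUCCESS,
        "false" => ExitCode::FAILURE,
        other => {
            eprintln!("unknown applet: {other}");
            ExitCode::FAILURE
        }
    }
}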
Posted Feb 21, 2025 9:40 UTC (Fri)
by ras (subscriber, #33059)
[Link] (54 responses)
On my debian laptop, which doesn't have a lot of things installed, there are over 2,200 programs dynamically linked to libc.
Rust's monomorphization has lots of benefits, but it would be nice if you could draw a line and say "not here, at this boundary a conventional linker is all you need". It wouldn't just make dynamic linking easier, it would speed up compile times. It's not as if the language doesn't support it in principle with things like dyn, but in practice the language positively begs you to cross the boundary at every opportunity.
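For readers unfamiliar with the trade-off being discussed, a small sketch of the two options (the function names are illustrative): the generic version is monomorphized per caller, while the dyn version is compiled once and pays for dynamic dispatch at the boundary instead:

// Monomorphized: a separate copy is generated for every iterator type.
fn total_generic<I: Iterator<Item = u64>>(items: I) -> u64 {
    items.sum()
}

// Compiled once: callers pass a trait object and pay for dynamic dispatch.
fn total_dyn(items: &mut dyn Iterator<Item = u64>) -> u64 {
    items.sum()
}

fn main() {
    let v = vec![1u64, 2, 3];
    println!("{}", total_generic(v.iter().copied()));
    println!("{}", total_dyn(&mut v.iter().copied()));
}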
Posted Feb 21, 2025 10:45 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (53 responses)
The issue right now is that the psABIs aren't that great for Rust and Rust-like languages; the crABI experiment is aiming to come up with a stable ABI that is suitable for expressing the things that you lose by saying "not here", such as Vec<u32>, since Vec is monomorphized over its type argument.
Posted Feb 21, 2025 11:33 UTC (Fri)
by ras (subscriber, #33059)
[Link] (52 responses)
That's a good illustration. You want the benefits of the type checking that monomorphization provides, but without it affecting the code generated. You can write that sort of code in Rust, but as Vec<u32> demonstrates it would be near impossible to write it without the compiler telling you "hey, you can't do that here", because type checking is usually bundled with implied monomorphization.
Posted Feb 21, 2025 12:11 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (51 responses)
crABI's role is to specify layouts such that knowing that Vec is two usize and either a thin or fat pointer in size depending on T means that you know how Vec<T> is laid out in memory for any known T, at the expense of preventing the compiler from applying optimizations that come from different layout.
For Vec, crABI is unlikely to give up much performance; it's just not big enough that completely removing 2 or 3 fields would make a difference. But it's possible that for other data types, this optimization will matter; hence crABI will be opt-in for quite a long time (although I can see it becoming the default for all items exported from a crate, eventually).
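As a very rough illustration of what "agreeing on a layout at the boundary" means, here is what one can already write by hand today with #[repr(C)]; the type and function names are made up, and crABI itself is still an experiment, so this is not its actual syntax or semantics:

use std::mem::ManuallyDrop;

// A hand-written, fixed layout for the parts of a Vec<u32>: a pointer plus
// two usize fields whose order and size are pinned by #[repr(C)], so
// separately compiled code can agree on them.
#[repr(C)]
pub struct RawVecU32 {
    pub ptr: *mut u32,
    pub len: usize,
    pub cap: usize,
}

pub fn into_raw(v: Vec<u32>) -> RawVecU32 {
    let mut v = ManuallyDrop::new(v); // keep the allocation alive
    RawVecU32 { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
}

// Safety: `r` must have been produced by into_raw() and not used since.
pub unsafe fn from_raw(r: RawVecU32) -> Vec<u32> {
    unsafe { Vec::from_raw_parts(r.ptr, r.len, r.cap) }
}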
Posted Feb 21, 2025 22:43 UTC (Fri)
by ras (subscriber, #33059)
[Link] (50 responses)
I'm not an expert, but I'm guessing there are 3 things Vec has to know. Yes the size of an element and the capacity are two. The current length doesn't count because it's not encoded in the type. Neither of those two are particularly problematic.
The 3rd is one you didn't mention: drop. That is more difficult. C++ would use a vtable, which alters the binary format of the element. The alternative solution is to embed a pointer to the item's drop in the Vec along with the capacity and element length. I don't know Rust well enough to know if it can do that. Well, I'm sure it can with some helper macros, but that's an admission the language doesn't support it well.
> For Vec, crABI is unlikely to give up much performance
Actually, the big speed gains from monomorphization come from small types. Consider Vec.get() for Vec<u32>. If get() doesn't know the length it's probably going to be implemented using memcpy(), but memcpy() is orders of magnitude slower than "mov %rax,nnn(%rbp,%rdx)". But if the u32 was a much bigger type the overhead imposed by implementing get() as a function that uses memcpy() rather than inline becomes tolerable.
That hints at the solution you see in C and C++: you need a mixture of monomorphization and shared libraries. C and C++ can do that with inline. For that to work you need separate .h, .c and .o (compiler output). Or perhaps the required information gets folded into the .o. In any case, the compiler gets two sorts of information from the library: the compiled code which can potentially be put in a dynamic library, and the parts the library author thinks need to be monomorphized for speed, like Vec.get().
Rust has effectively dropped that separation. No one uses binary libraries because it's too hard. The result is everything is monomorphized. But in reality, in a large library the bits that need to be monomorphized for speed are a very small percentage of the code. Given that most Rust code a typical Rust program uses comes from these large libraries (interestingly the kernel will be a notable exception), this blows up the compile time of most small programs enormously. It also thwarts the current Linux distributions' practice of fixing security holes in libraries by just shipping a new version of the .so. Now they have to ship every program that depends on it. As I said, on my laptop that means every time libc.so has a bug, I'd have to reinstall over 2,000 programs.
I consider this a major wart in the language. But as I said, it likely won't affect the kernel.
Posted Feb 22, 2025 10:16 UTC (Sat)
by farnz (subscriber, #17727)
[Link] (49 responses)
But note that the current length, size of a pointer to an element (since Vec does not store elements directly, but rather a pointer to a contiguous list of elements), and capacity are all part of Vec itself, and right now in Rust, the memory layout of the 3 or 4 items that make up a Vec (capacity, length, pointer to contiguous array of T, optional pointer to additional data for T like T's vtable) is arbitrary for each instantiation of Vec in each compilation unit.
You appear to be aiming for something slightly different to the goals of crABI and other stable Rust ABI efforts; you're talking about removing monomorphization completely, so that the compile unit for Vec can output code for all of Vec's functions, and rely on runtime polymorphic behaviour to get the correct implementation, because that's what C++ does. Instead, though, the Rust stable ABI efforts (including crABI) try to come up with ways to guarantee that the visible behaviour of two separate instantiations of Vec::<T>::push have the same visible ABI, and thus it becomes possible to rely on just choosing one instantiation of each. Then, for functions that are externally polymorphic, you'd use the trick used by File::open today, to give yourself a tiny generic shim (correct by inspection) that calls into a much larger chunk of monomorphic code.
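A minimal sketch of the File::open-style pattern mentioned above (the function name here is made up): only the thin outer shim is monomorphized per caller type, while the real work is compiled once:

use std::path::Path;

// Only this small shim is instantiated for each caller's path type.
pub fn read_config<P: AsRef<Path>>(path: P) -> std::io::Result<String> {
    // The bulk of the work lives in a single, non-generic function.
    fn inner(path: &Path) -> std::io::Result<String> {
        std::fs::read_to_string(path)
    }
    inner(path.as_ref())
}

fn main() -> std::io::Result<()> {
    // Callers can pass &str, String, PathBuf, and so on.
    let text = read_config("Cargo.toml")?;
    println!("{} bytes", text.len());
    Ok(())
}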
Posted Feb 22, 2025 18:30 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
In the C++ world, this is called PIMPL (pointer to implementation).
Posted Feb 23, 2025 1:47 UTC (Sun)
by ras (subscriber, #33059)
[Link] (47 responses)
As I understand it, Vec.drop() has to call the inner type's drop(). There are only two ways it can know what to call. If you don't know at compile time, the only option is to put a function pointer to the drop() function somewhere. If all you have is a traditional .so you're linking to, this is your only option. OO puts the pointer in the inner type. C++ does this via a vtable stored in each instance of the inner type. Alternatively you could store a pointer to T.drop() in each Vec<T> instance, which you can do because the code that instantiated the Vec<T> object knows T's concrete type.
If you know the type of T at compile time, monomorphization can create a special version of Drop<T> for Vec<T>. You need the source of Vec<T>.drop() to do that, but there are many ways of skinning that cat.
> you're talking about removing monomorphization completely,
No, I'm not. I'm saying there are various ways of doing it. Rust does it by recompiling everything, all the time, and as a consequence doesn't support pre-compiled libraries. That's an extreme. Both C and C++ allow programmer complete control over what is monomorphized, and what isn't.
Backtracking for a second, the requirement to avoid a function pointer is that the compiler have access to the source for Vec<T>.drop(). That's all you need. Rust achieves that by giving the compiler access to the entire source for Vec and recompiling it every time for every Vec<T>, but as I said that's extreme. C and C++ achieve it by distributing two things: the compiled output (.a or .so), and the .h. The .h contains the source code for Vec<T>.drop() using inline, or templates or whatever. This is not "removing monomorphization completely". This is giving control of what is monomorphized and what is not to the programmer.
Rust effectively does not give you that choice; it monomorphizes all the time. That is the root cause of its compile-time issues, and of its inability to support dynamic linking.
Posted Feb 23, 2025 12:55 UTC (Sun)
by farnz (subscriber, #17727)
[Link] (46 responses)
And once you start using C++20 modules, the distinction you're making between compiler output and types of compiler input goes away, too - the compiler does basically the same thing as Rust's compiler does.
Unlike C++, C does completely avoid this, by having no language-level support for generics. But if you use Rust without generics, you also don't get any monomorphization - monomorphization in Rust only takes place when you (a) have a type parameter, and (b) the code's behaviour changes when the type parameter's value changes. This is the same as C++, as far as I can see, bar the fact that C++ makes you do more manual work to use monomorphization correctly, so people are more likely to hand-write a buggy reimplementation of a generic than to share source.
Posted Feb 24, 2025 8:06 UTC (Mon)
by ras (subscriber, #33059)
[Link] (45 responses)
It is the same. What is not the same is that C++ (and C) splits the source into .h and .cpp; when you compile against a cpp library you recompile the .h and link with the pre-compiled .cpp files. It's so simple everyone does it that way, and it yields the two benefits I mentioned - fast compile times, and the ability to fix a library by shipping a single shared library rather than all the binaries that depend on it.
In Rust, if doing that is possible it must be very hard, because no one does it that way.
Posted Feb 24, 2025 10:35 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (42 responses)
C is different precisely because it doesn't have compile-time generics, but instead requires cut-and-paste programming, with associated bugs where you notice an issue with (e.g.) StringVector, but don't fix it for your DoubleVector as well. If you do make use of the macro system to get generics, then you have the same problem.
And C++ build times aren't fast compared to Rust; that's not a benefit of the C++ model in my experience, where it's often slower than Rust at development builds (since Rust does a good job of incremental compilation, splitting the files into codegen units in ways a human wouldn't), and no faster at release builds (where LLVM's optimizer dominates in both cases).
Posted Feb 24, 2025 13:05 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (41 responses)
While they don't *require* that, any PIMPL-like mechanisms or desire to hide what modules are used to implement some functionality will still end up with splitting sources between interface and implementation. Because module interfaces need to be provided to consumers, any desire to keep things from them will require using separate implementation units. Compilers may be able to *help* with this by only updating BMI files as needed, but this will require behavior like ninja's `restat = 1` which (AFAIK) `make` completely lacks to not recompile consumers anyways.
Posted Feb 24, 2025 13:29 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (40 responses)
Posted Feb 24, 2025 14:26 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
Note that Windows is still only exposing a C ABI, so the fact that it is Rust, C, C++, or Fortran behind the scenes isn't really visible to consumers.
Posted Feb 24, 2025 17:03 UTC (Mon)
by viro (subscriber, #7872)
[Link] (38 responses)
Posted Feb 24, 2025 17:06 UTC (Mon)
by viro (subscriber, #7872)
[Link]
Posted Feb 24, 2025 17:22 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (3 responses)
This is distinct from splitting the implementation up inside a single C++ module; having multiple module units, one for the exported interface and many for the internal implementation, makes a lot of sense, but having two separate C++ modules, one of which exports an unimplemented interface, and the other of which exports an implementation of that interface, is a mess, since it means I have to make sure that the two separate modules are kept in sync manually.
Why create that extra workload when I can have a single module with an internal module partition such that it's very obvious when I change the interface without changing the implementation to match, and where I have one thing to release instead of two?
Posted Feb 25, 2025 0:00 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
If the partition is imported into the interface for any reason, it must be shipped as well.
Posted Feb 25, 2025 10:01 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (1 responses)
Remember that the goal here is one module, nicely structured for ease of maintenance, and thus split across multiple module units, with an internal module partition to make the stuff that's for internal use only invisible from outside the module, rather than multiple modules.
Posted Feb 25, 2025 11:57 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 24, 2025 17:24 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
That's not what he said. He said "extra work". Which in reality usually means "push the work down the road until I get to it". Which also often in reality means "I'll never get to it".
It wouldn't get done in commercial circles either, if secrecy didn't have a (at least nominal) value.
Time pressure usually turns out to be an extremely important consideration.
Cheers,
Posted Feb 24, 2025 17:32 UTC (Mon)
by mb (subscriber, #50428)
[Link] (27 responses)
All Rust installs ship a tool that extracts the public interface of your crate and puts it into a nice html document for review: cargo doc.
This is much better than manually typing in the redundant code for the public interface declarations.
Posted Feb 25, 2025 1:27 UTC (Tue)
by ras (subscriber, #33059)
[Link] (26 responses)
I had said earlier I wanted the ability to say to the compiler "not here, at this boundary conventional linker is all you need". I also said C++ gives you the ability to do that, by splitting stuff into .h and .cpp. @farnz said "but that's the old way, Boost for example doesn't do that". That's true, but the point is that the people who wrote Boost made the decision to adopt the way Rust does it; C++'s std made a different decision. @taladar said C++ compiles are slow. I'm guessing that's because the packages / libraries he is working with adopt this newfangled way, and everything gets recompiled all the time. He's blaming C++ for that, but I'd argue the fault lies at least as much with the package authors for making that choice.
@farnz then said "oh, but it's hard to think about what has to be monomorphized and what isn't, and besides, redeclaring everything in .h is verbose and a lot of work". I don't have much sympathy for the first part - I did it all the time when I wrote C++. The second is true: the information in .h is redundant, and a modern language shouldn't make you type the same thing twice without good reason.
Those language differences were swirling around in my head when I wrote: "Or perhaps the required information gets folded into the .o". It was a thought bubble, but your "cargo doc" comment illustrates its key point nicely. Rust could add something to the language that says "this source is to be exported (made available) to people who want to link against my pre-compiled library", in the same way "cargo doc" exports stuff. That information would be roughly equivalent to what's put in a .h file now. But where would you put it? The thought bubble was to place it in a section of the ELF object that holds the compiled code - call it, say, a ".h" section. Then when someone wants to compile against your library, they give that .o / .so / .a to both the compile phase (which looks for the equivalent of the .h sections) and the link phase (which just wants the compiled code for the non-monomorphized stuff, which - if the programmer has done their job - should be the bulk of it).
The ultimate goal is to allow the programmer to decide what needs to be monomorphized and what can be pre-compiled, and to have Rust tell the programmer when they've mucked that boundary up. I guess it would get an error message like: "This type / function / macro has to be exported to the .h, because it depends on the type T the caller is passing in". Right now Rust programmers don't have that option, and that leads to the trade-offs I mentioned.
Posted Feb 25, 2025 3:50 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (24 responses)
Posted Feb 25, 2025 8:06 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (1 responses)
But if you had stuff that was specifically meant to be a library, why can't you declare "I want to monomorphise these Vec<T>s". Any others are assumed to be internal and might generate a warning to that effect, but they're not exported.
And then you add rules about how the external interface is laid out, so any Rust compiler is guaranteed to create the same export interface. Again, if the programmer wants to lay it out differently, easy enough, they can declare an over-ride.
And then lastly, the .o or whatever the Rust equivalent is, contains these declarations to enable a compiler of another program to pull them in and create the correct linkings.
Okay, it's more work having to declare your interface, but I guess you could pull the same soname tricks as C - extending your interface and exports is okay, but changing it triggers a soname bump.
Cheers,
Posted Feb 25, 2025 10:43 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
There are, currently, two reasons for Rust not to have a stable ABI, both being worked on by experts in the field (often overlapping with the people solving this problem for Swift and for C++ modules):
There is, however, serious work going into a #[export] style of ABI marker that allows you to mark the bits (or an entire crate) as intended to have a stable ABI, and errors if the compiler can't support that. This will, inevitably, be a restricted subset of the full capabilities of Rust (since macros, generics, and other forms of compile-time code creation can't be supported in an exported ABI), but it's being actively thought about as a research project with a goal of allowing as much code as possible to be dynamically linked while not sacrificing any of the safety promises that Rust makes today using static linking.
Posted Feb 25, 2025 21:29 UTC (Tue)
by ras (subscriber, #33059)
[Link] (21 responses)
I don't get the problem. libc.so.X.Y already handles versioning pretty well.
Putting the .h's in the .elf does solve one problem that bites me on occasion - the .h's don't match the .so I'm linking against. It would be nice to see that nit disappear.
Posted Feb 26, 2025 9:19 UTC (Wed)
by taladar (subscriber, #68407)
[Link] (13 responses)
Posted Feb 26, 2025 11:36 UTC (Wed)
by ras (subscriber, #33059)
[Link] (12 responses)
I expect it would be the same story as C or C++. It has the same traps - don't expect an inline function (or template, in C++'s case) in a .h to be affected by distributing a new .so. Despite that limitation, shipping updated .so's to fix security problems happens all the time. The rule is always that new stuff can be added, but existing stuff can't be changed. It would be the same deal with Rust, but would cover the ".h" section too, meaning you can add new exported types or monomorphized functions, but not change existing ones.
Putting the .h section in the .so brings one advantage. There is no way for a C program to know whether the .h it was compiled against matches the one the .so was compiled against. But a Rust program compiled against a .so could check that the types in the .h section match the ones it was compiled with, and reject the .so if they don't.
Posted Feb 26, 2025 12:26 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (11 responses)
For example, if you go deep into how Vec::shrink_to_fit is implemented internally, you find that you have a set of tiny inline functions that guarantee that an operation is safe that leads down to a monomorphic unsafe shrink_unchecked function that actually does the shrinking.
Because these are all shipped together, it's OK to rearrange where the various checks live; it would be acceptable to move a check out of shrink_unchecked into its callers, for example. But, in the example you describe, you've separated the callers (which are inlined into your binary) from the main body of code (in the shared object), and now we have a problem with updating the shared object; if you move a check from the main body into the callers, you now must know somehow that the callers are out-of-date and need recompiling before you can update the shared object safely.
C and C++ implementations handle this by saying that you must just know that your change (and a security fix is a change to existing stuff, breaking your rule that "new stuff can be added, but existing stuff can't be changed") is one that needs a recompile of dependents, and it's on you to get this right else you face UB for your mistakes. Rust is trying to build a world where you only face UB if you explicitly indicate to the compiler that you know that UB's a risk here, not one where a "trivial" cp new/libfoo.so.1 /usr/lib/libfoo.so.1 can create UB.
Posted Feb 26, 2025 13:13 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (10 responses)
But if you've explicitly declared an interface, surely that means rearranging the checks across the interface is unsafe in and of itself, so the compiler won't do it ...
Cheers,
Posted Feb 26, 2025 14:15 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (9 responses)
The compiler could stop you changing shrink_to_fit quite easily, because it's an external interface, but it uses a RawVec<T, A> as an implementation detail, which uses a heavily unsafe RawVecInner<A> as a monomorphic implementation detail. The current implementation of Vec::shrink_to_fit checks whether the capacity is greater than the length, and if it is, calls the inline function RawVec::shrink_to_fit(self.buf, length). In turn, RawVec::shrink_to_fit simply calls the inline function RawVecInner::shrink_to_fit(self.inner, cap, T::LAYOUT) (which is a manual monomorphization so that RawVecInner is only generic over the allocator chosen, not the type in the vector). Following that, RawVecInner::shrink_to_fit arranges to panic if it can't shrink, and calls the inline function RawVecInner::shrink(&mut self, cap, layout). This then panics if you're trying to grow via a call to shrink, then calls the unsafe function RawVecInner::shrink_unchecked.
There's a lot of layers of inline function here, each doing one thing well and calling the next layer. But it would not be unreasonable to change things so that RawVecInner::shrink_unchecked does the capacity check that's currently in RawVecInner::shrink, and then have a later release move the capacity check back to RawVecInner::shrink; the reason they're split the way they are today is that LLVM's optimizer is capable of collapsing all of the checks in the inline functions into a single check-and-branch, but not of optimizing RawVecInner::shrink_unchecked on the assumption that the check will pass, and doing all of this means that LLVM correctly optimizes all the inline functions down to a single check-and-branch-to-cold-path, followed by the happy path code if all checks pass.
And note that the reason that this is split into so many tiny inline functions is that there's other callers in Vec that call different sequences of inline functions - rather than duplicate checks, they've been split into other functions so that you can call at the "right" point after your function-specific checks.
But, going back to the "compiler shouldn't do it"; why should it know that moving a check in one direction inside RawVecInner (which is an implementation detail) is not OK, but moving it in the other direction is OK? For this particular call chain, only RawVecInner::shrink_unchecked is going to be in the shared object, because the remaining layers (which are critical to the safety of this specific operation) are inlined.
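A much-simplified sketch of the shape being described (illustrative only; the real RawVec/RawVecInner code in std is considerably more involved):

pub struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    // Safe, #[inline] entry point: validates, then calls the core.
    #[inline]
    pub fn shrink_to(&mut self, new_len: usize) {
        assert!(new_len <= self.data.len(), "cannot grow via shrink_to");
        // SAFETY: the check above establishes the core's precondition.
        unsafe { self.shrink_unchecked(new_len) }
    }

    // The monomorphic core that would live in the shared object. It
    // assumes new_len <= len; whether that check lives here or in the
    // inlined caller is exactly the kind of change that is harmless when
    // everything is compiled together, but dangerous once the caller is
    // inlined into one binary and the core ships in another.
    unsafe fn shrink_unchecked(&mut self, new_len: usize) {
        unsafe { self.data.set_len(new_len) };
    }
}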
Posted Feb 26, 2025 15:45 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (8 responses)
Hmmm ...
That is an edge case, but equally, you do want the compiler to catch it, and I can see why it wouldn't ... but if you're building a library, I find it hard to see why you, the programmer, would want to do it - surely you'd either have both sides of the interface in a single crate, or you're explicitly moving stuff between a library and an application ... not good ...
Cheers,
Posted Feb 26, 2025 16:32 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (4 responses)
No; I'm saying that if the compiler doesn't even know that this is an interface boundary, why would it bother detecting that you've moved code across the boundary in a fashion that's safe when statically linked, but not when dynamically linked?
Put concretely, in the private module raw_vec.rs (none of which is exposed as an interface boundary), I move a check from shrink_unchecked to shrink; how is the compiler supposed to know that this is not a safe movement to make, given that shrink is the only caller of shrink_unchecked? Further, how is it supposed to know that moving a check from shrink to shrink_unchecked is safe? And, just to make it lovely and hard, how is it supposed to distinguish "this check is safe to move freely" from "this check must not move"?
And note that "checks" and "security fixes" look exactly the same to the compiler; some code has changed. How is the compiler supposed to distinguish a "good" change from a "bad" change?
Posted Feb 26, 2025 21:43 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (3 responses)
Because if the whole aim of this is to create a dynamic library, the compiler NEEDS to know this is an interface boundary, no?
Cheers,
Posted Feb 27, 2025 10:44 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (2 responses)
Note that when making this judgement call, it can't just look at things like "is this moving a check across an internal boundary", since some moves across an internal boundary are safe, nor can you condition it on removing a check from inside the boundary (since I may remove an internal check that is guaranteed to be true since all the inline functions that can call this have always done an equivalent check, and I'm no longer expecting more inline functions without the check).
Posted Feb 27, 2025 14:07 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (1 responses)
If an external application cannot see the boundary, then it's not a boundary! So you'd need to include the definition of all the Ts in Vec<T> you wanted to export, but the idea is that the crate presents a frozen interface to the outside world, and what goes on inside the crate is none of the caller's business. So internal boundaries aren't boundaries.
Cheers,
Posted Feb 27, 2025 14:12 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
To get the sort of boundary you're describing, we do static linking and carefully hand-crafted interfaces for plugins. That's the state of play today, for everything from assembly through C to Agda and Idris; the goal, however, is to dynamically link, which means that we need to go deeper. And then we have a problem, because the moment you go deeper, your boundaries stop applying, thanks to inlining.
Posted Feb 26, 2025 18:04 UTC (Wed)
by excors (subscriber, #95769)
[Link] (2 responses)
`RawVecInner<A>::shrink_to_fit` could be an ABI boundary, because that doesn't depend on `T` (and we'll ignore `A`), but it's currently not an API boundary. It can't be made into a public API because its safety depends on non-trivial preconditions (like being told the correct alignment of `T`) and that'd be terrible API design - preconditions should be as tightly scoped as possible, within a function or module or crate. So you'd have to invent a new category of interface boundary, which is both an internal API and an ABI, with stability guarantees (including for the non-machine-checkable safety preconditions) and with tooling to help you fulfil those guarantees, which sounds really hard.
Posted Feb 26, 2025 22:46 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
Like putting the equivalent of a C .h in the crate?
But I would have thought if the compiler can prove the preconditions as part of a monolithic compilation, surely it must be able to encode them in some sort of .h interface in a library crate?
Of course, if you get two libraries calling each other, then the compiler might have to inject glue code to rearrange the structures passed between the two :-)
Cheers,
Posted Feb 27, 2025 12:43 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
The challenge is that we're talking about separating the unsafe block (in an inline function) from the unsafe fn it calls (in the shared object); this means that the human not only has to consider the unsafe code as it stands today, but all possible future and past variants on the unsafe code, otherwise Rust's safety promise is not upheld.
That's clearly an intractable problem; the question is about reducing it down to a tractable problem. There are three basic routes to make it tractable:
There's room to be sophisticated with symbol versioning in all cases; for example, you can have a human assert that this version of the unsafe fn is compatible with the inlined callers from older versions (thus allowing a swap of a shared object), or in case 3 you can use it to allow new inlined callers to use a new shared object, while allowing the existing ones to use either old or new shared objects.
In all cases, though, the trouble is preventing the human proofs of correctness being invalidated by creating new combinations of inline functions and out-of-line unsafe code that weren't present in any source version; you want the combinations to be ones that a human has approved.
Posted Feb 26, 2025 10:53 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
What C has that Rust doesn't is that it's fairly trivial to take a C library, build it into a .so, and have it work as long as upstream doesn't make a silently breaking change (which can result in UB, rather than a failure); it's also fairly simple to patch the build system so that the .so is versioned downstream of the library authors, so that they are ignorant of the use as a shared library. This is being worked on for Rust, but the goal in Rust is to ensure that any breaking changes upstream result in a failure to dynamically link, rather than a risk of UB.
Posted Feb 26, 2025 21:14 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
This is fine for static linking, but you don't generally want to end up with 15 versions of the same shared library with a slightly different patch version. So you ideally should be able to control the versions so that distro-provided libraries are used as much as possible, overriding Cargo's resolution mechanism. Ideally, making sure that you get CVE fixes.
Doing it properly is not trivial.
Posted Feb 26, 2025 21:19 UTC (Wed)
by mb (subscriber, #50428)
[Link] (4 responses)
Dependency locks are ignored.
>slightly different versions of libraries
No. It can include several *incompatible* versions of the libraries with a different major semantic version.
Posted Feb 26, 2025 21:25 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Yes, that's what I mean. One app can lock somelibrary#1.1.123, and another one at somelibrary#1.1.124. If this is packaged naïvely, you'll end up with two shared objects for `somelibrary`.
Posted Feb 27, 2025 5:27 UTC (Thu)
by mb (subscriber, #50428)
[Link] (2 responses)
As 1.1.124 and 1.1.123 are semantically compatible versions you can just upgrade all packages to 1.1.124. And it's also likely that it would work with 1.1.123, too.
This is really not different at all from C library dependencies with backward compatible versions.
Posted Feb 27, 2025 18:41 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
And yep, it's strictly better than C. It's just not a trivial task...
Posted Feb 27, 2025 19:11 UTC (Thu)
by mb (subscriber, #50428)
[Link]
There's no need to use a lock file or to use an online crates forge.
I don't really get it why this would be a nontrivial task.
>I dislike just ignoring the
Well, you can either use it or ignore it.
Posted Feb 25, 2025 6:48 UTC (Tue)
by mb (subscriber, #50428)
[Link]
Put it into a my-public-interface crate and generate the docs for it? I don't see the problem.
Posted Feb 24, 2025 17:39 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (3 responses)
Posted Feb 25, 2025 11:16 UTC (Tue)
by taladar (subscriber, #68407)
[Link] (2 responses)
Posted Feb 25, 2025 11:24 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
Posted Feb 25, 2025 21:51 UTC (Tue)
by ras (subscriber, #33059)
[Link]
Or "it's faster because we can optimise".
My favourite counter-example to this is LVM / DM vs ZFS, possibly because I'm a recent user of ZFS. LVM / DM / a traditional file system give you a similar outcome to ZFS, albeit with a more clunky interface because "some assembly is required". The zfs CLI is nice. However, by every other metric I can think of, the LVM / DM / ... stack wins. The stack is faster, the modular code is much easier to understand, it has fewer bugs (I'm tempted to say far fewer), and you have more ways of doing the same thing.
This is surprising to me. I would have predicted the monolithic style to win on speed at least, and to be easier to extend (which is evidence against "it's easier to develop that way").
I guess there is a size at which the code base becomes too much for one person. At that point it should become modular, with each module maintained by different people, sporting an interface that requires screaming and yelling to change. But by that time it's already a ball of mud, and I guess the tooling is a convenient thing to blame for not doing the work required to split it up.
Posted Feb 24, 2025 11:27 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
Posted Feb 24, 2025 11:55 UTC (Mon)
by excors (subscriber, #95769)
[Link]
Not everyone - even disregarding the cases where templates force you to put all your code in the .h file, there are plenty of libraries that choose to put all their code in the .h file so users don't have to fight with C/C++ build systems or package managers. You just download the code and #include it and it works.
E.g. https://github.com/nothings/stb explains "The idea behind single-header file libraries is that they're easy to distribute and deploy because all the code is contained in a single file" and "[these libraries] are only better [than other open source libraries] in that they're easier to integrate, easier to use, and easier to release", and it's pretty popular as a result of that. There's a big list of header-only libraries at https://github.com/p-ranav/awesome-hpp . It's a good way to make your library more attractive to developers, because package management in the C/C++ world is so bad.
Posted Feb 12, 2025 20:41 UTC (Wed)
by da4089 (subscriber, #1195)
[Link] (15 responses)
As stated, these 55 year old tools are remarkably bug free, have very few dependencies, and continue to work as designed. Their maintenance burden is not high.
But instead of putting effort into discovering new, better approaches to tools (and operating systems), we spend the innovative potential of a generation getting back to where we started, even losing some features in the process.
We did this already through the 90’s and 00’s to extract our toolset from corporate stasis and sharding. Is doing it again to replace C with Rust really the best use of our effort’s potential?
I don’t mean this as a personal attack, nor as any a criticism of the quality and dedication involved. But it feels to me like an inward turning of the collective vision that quite dismays me.
Posted Feb 12, 2025 21:04 UTC (Wed)
by kleptog (subscriber, #1183)
[Link] (4 responses)
They're replicating the results of those 55 years of development in a few years, so what "innovative potential" is being lost here? Now you have a sound base to continue building on, except every year of effort put into the Rust version is equivalent to multiple years of effort on the C version.
I don't understand this idea that these tools should have minimal dependencies. Why should "sort" implement its own sorting algorithm instead of using the battle-tested UltraSpeedParallelSuperSorting crate that is used by everyone else because it sorts blazingly fast?
We can build great programs because we are standing on the shoulders of powerful libraries. Re-use is almost always better than writing it yourself.
Let's face it: C programmers aim for minimal dependencies because dependencies in C programs are really difficult. In more modern languages, dependencies are one line in your package configuration and you're done.
Posted Feb 12, 2025 23:56 UTC (Wed)
by dvdeug (guest, #10998)
[Link] (3 responses)
But they're not, for many reasons.
Many programmers have used a cool new library and discovered that it was no longer supported after a few years and wouldn't build with newer versions of other dependencies, and had to replace it. One can recall the left-pad catastrophe, where one author withdrew a 17-line package from npm and thousands of programs stopped building. One can also remember the endless chase of libgtk versions.
You lose control over supported platforms. E.g. the Python cryptography package started using Rust, and now no longer works on many of the platforms it used to. Coreutils compiles on HP-UX 11 on hppa and ia64. Even if Rust supported those, you're still depending on every dependency you use supporting them. Good luck with your x86 box when your dependency pivots to using Oxide, the programming language that's going to replace Rust.
The bug envelope changes. If sort implements its own sorting algorithm, then if it breaks, you look at the sort code, if necessary checking that it works on the same libc and gcc version. If it uses the UltraSpeedParallelSuperSorting crate, then you have to look at that crate, and any crates it uses. Coreutils supports a variety of C libraries and C compilers (even with the restriction to C99); it's easy to prove that any new bugs are due to your changes. You can say that this crate is used by "everyone else", but you'll end up using a crate that does what you need but not everyone else needs, or using a crate in a way that other people aren't, triggering bugs that nobody else is meeting.
And that's just the benign bugs. The Jia Tan attack was made possible by the fact that liblzma (from xz) was linked into sshd. To make it worse, it wasn't even code that ssh used; it was unused code pulled in by another dependency to support systemd. Every dependency adds security risk, and when dependencies are built in this fashion themselves, they bring in more security risks with their dependencies.
If you're writing a KDE or GNOME program, both come with an array of libraries you can solidly depend on, no problem. A major frontend program can pull in a key dependency without issue; most people get Gimp or Darktable through their distro or Flatpak anyway. (But note that this idea that dependencies have no cost amplifies the problem. Installing a C library that depends on nothing, or on a few standard dependencies, is low cost. Installing a dependency that depends on a host of other dependencies amplifies most of the problems above.)
But coreutils? Programs that are frequently run as root, are called from shell scripts all over the system, and thus are security critical? Programs that are critical to running on every system? Be it Rust or C, they should carefully consider every dependency.
Posted Feb 13, 2025 9:41 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (1 responses)
Your own internal version is very likely to be orders of magnitude less tested, optimized, maintained,... than even a moderately used third party dependency. It is also less likely to see the same scrutiny from security researchers or the same amount of tracking from security tracking systems like CVE or GHSA.
The number of programmers who know how it works will be lower. When it isn't touched for a while, nobody (including the original author) will know how it works or how to change it. You will have more trouble finding people to add cross-platform support, because they will need to use exactly your program, not just any one of the programs using it.
Also, what you say about the bug envelope doesn't work at all. If there is a bug in the sorting algorithm you wrote, you have to find it and invest time that will only benefit that one project. If the bug is in a dependency you have a good chance someone else already found it and even if not, your effort to find and fix it benefits everyone using that library.
You look at it from the perspective of comparing a Rust dependency that is maybe 5 years old with some internal version in some 50-year-old C tool, and there the cross-platform support will likely come out on top, because cross-platform used to be so much more important in the bad old days, when nothing was standardized and the hardware and OS landscapes were more fractured in terms of what baseline features you could expect. But honestly, reusable and shared code is the better choice in the long term.
Posted Feb 21, 2025 2:02 UTC (Fri)
by dvdeug (guest, #10998)
[Link]
If we're talking about coreutils, no, it's not. That's the context of this.
> When it isn't touched for a while nobody (including the original author) will know how it works or how to change it.
Good code is understandable. Not to mention, code that does what you need is sometimes easier to understand than interfacing with a powerful library with many features and options.
> Also, what you say about the bug envelope doesn't work at all. If there is a bug in the sorting algorithm you wrote, you have to find it and invest time that will only benefit that one project. If the bug is in a dependency you have a good chance someone else already found it and even if not, your effort to find and fix it benefits everyone using that library.
If there's anyone else using that library. If everyone else hasn't already moved on to the next big thing. If you didn't pick the library that was better, but never hit critical mass and the maintainer dropped it. If you aren't calling with a set of options that nobody else is using.
There are libraries like libz and libpng where there's little reason not to include them. But so many libraries, crates, packages, etc. are created, uploaded, maybe even get a few releases, then get dropped. If you're writing coreutils in Rust and hope to replace the existing coreutils, you need to plan for a 50-year existence, most of which is going to be spent making minor changes to programs that work.
Cross platform is a choice. But Debian won't switch until it supports all release platforms, of which there are nine, and failing to support GNU/Hurd or several other semiactive ports is going to lose you some support. Sell to your audience.
> reusable and shared code is the better choice in the long term.
Making a tight program that runs where you need it to, and runs even when other stuff is broken, is incredibly useful. We have busybox with zero dependencies because coreutils today has too many dependencies in some cases. Reusable code is good, but again, Jia Tan got in because ssh was being linked against a library which linked against a library that ssh didn't need. You've never had to deal with an upgrade mess caused by an incompatible ABI break at the base of a tree of dependencies, or had to change a program to work with the new version of a library because the old version has an entirely different API but isn't getting bug fixes any more?
Again, you're writing a GNOME program, go wild with GNOME libraries. You're writing a program that needs a solid solver for 3-SAT, link in z3. You get a choice between writing 5 lines of code using a hash-map that's fast enough, or linking in a custom data structure that is optimized... if it's code for mass distribution and hopefully a long life, save yourself some headache down the road and write the 5 lines of code.
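For what it's worth, the "5 lines of code using a hash-map" really is about that size in Rust; a toy sketch that tallies duplicate input lines (a sort | uniq -c style count, using only the standard library):

use std::collections::HashMap;
use std::io::{self, BufRead};

fn main() {
    // Count how many times each input line occurs.
    let mut counts: HashMap<String, u64> = HashMap::new();
    for line in io::stdin().lock().lines() {
        *counts.entry(line.unwrap()).or_insert(0) += 1;
    }
    for (line, n) in &counts {
        println!("{n} {line}");
    }
}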
Posted Feb 15, 2025 5:53 UTC (Sat)
by marcH (subscriber, #57642)
[Link]
> But they're not, for many reasons.
Indeed not.
https://cacm.acm.org/practice/surviving-software-dependen...
To be fair, it is much worse with C/C++ because then you have both the technical _and_ the less technical issues.
Posted Feb 15, 2025 6:06 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (9 responses)
Imagine someone invents some new plumbing material that never dissolves harmful pollutants, is cheaper to manufacture, easy and cheap to work with, does not burst when freezing and lasts forever. But in the daily life of the people using water, it makes absolutely no difference[*]. So, no innovation? I think plumbers and house builders would have a very different opinion.
[*] never having to call a plumber and not having cancer is huge but makes no "obvious" difference. And we can rather rarely tell where we got cancer from.
Posted Feb 15, 2025 12:04 UTC (Sat)
by pizza (subscriber, #46)
[Link] (8 responses)
And the production facilities and distribution are essentially free, because this magical material is produced by self-replicating spherical unicorns that feed on greenhouse gasses.
But even if all these properties were somehow true, it still makes zero economical sense to preemptively rip out and replace the existing plumbing in literally billions of structures.
Posted Feb 16, 2025 19:27 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (6 responses)
This is where software analogies always break down: the Economy.
- With software, replacing the existing plumbing in ONE house is barely cheaper than replacing it in ALL houses.
The previous point was not about the economy at all. It was about "innovation". While it's not visible to users at all, there is "behind the scenes" innovation in uutils because it's a production-grade, potentially mass-deployed Rust project. It's admittedly less rocket science and more engineering but the latter matters too.
Posted Feb 16, 2025 22:41 UTC (Sun)
by pizza (subscriber, #46)
[Link] (5 responses)
That would be true if everyone, and every device, ran identical software in lock-step. Heck, even when the software components _are_ identical, they're often (usually!) integrated in different (if not completely bespoke) manners.
In the longer run, integration (and subsequent maintenance) is usually more work than writing the original software. (not unlike plumbing!)
> The previous point was not about the economy at all. It was about "innovation". While it's not visible to users at all, there is "behind the scenes" innovation in uutils because it's a production-grade, potentially mass-deployed Rust project.
"innovation" implies something new, not a 100% re-implementation (and necessarily quirk-for-quirk compatible) of existing (and well-maintained!) code.. Would you still be calling uutils "innovative" if it was written in oh, Ada, Java, Python, or PHP?
Posted Feb 16, 2025 23:23 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (4 responses)
> That would be true if everyone, and every device, ran identical software in lock-step.
This is more and more irrelevant... I just gave a reminder that software marginal costs are negligible, which limits software analogies. How many devices are actually running what software does not matter: marginal software costs are still negligible and the economic aspect of software analogies still fails.
> "innovation" implies something new,...
Obviously.
> ... not a 100% re-implementation (and necessarily quirk-for-quirk compatible) of existing (and well-maintained!) code
That's part of your definition. My definition is "not in mass production yet" and it does not require cars to fly.
> Would you still be calling uutils "innovative" if it was written in oh, Ada, Java, Python, or PHP?
No because:
So I suspect uutils in Java would be a non-event.
Here's another analogy: designing and selling an electric car with a brand new, solid state battery design today would be "innovative" because it would help the solid state battery company scale up. That would be innovative even if the car company did nothing innovative besides trusting, partnering with and ultimately helping the innovative battery company doing all the innovative research and even when customers see "only" more range and less fire risk (= the battery equivalent of memory corruption?)
Once a few other Rust rewrites like uutils will be in "mass production" stage (e.g.: enabled by default in some distributions), then I agree the next ones won't be as "innovative" - and will get less press coverage. But there are several domains: first graphical application, first kernel driver, etc. Each domain has different engineering challenges requiring different "innovative" solutions.
Posted Feb 17, 2025 14:32 UTC (Mon)
by pizza (subscriber, #46)
[Link] (3 responses)
Just because *you* dismiss it as irrelevant doesn't make it so. The world is far larger than you.
"Marginal software costs" refer to the production of additional identical _copies_. If said copies are not identical, that margin rapidly grows. The computing world is anything but homogeneous.
Or are you seriously arguing that a binary driver written for a specific Windows version will enable that hardware to work on macOS, all currently-supported Linux LTS distributions, and one of the countless RTOSes running on the dozen-ish major microcontroller architectures? Heck, even if that driver was provided in source form, it's going to be a ton of work to enable the others -- and it's most likely to be simpler, faster, and cheaper to start over from scratch for each one.
> 2) None of these languages are recent in 2025, they have all been in production for decades none of them needs any sort uutils-like project to help them get battle-hardened and "upgraded" to mass production.
So... the "innovation" here is "Figuring out how to make Rust good enough to compete with stuff that's existed for decades?"
> Here's another analogy: designing and selling an electric car with a brand new, solid state battery design today would be "innovative" because it would help the solid state battery company scale up.
"helping the battery company scale up" is an example of longer-term planning and/or investment, not innovation.
(The battery technology itself, including specific manufacturing processes, may be innovative)
Posted Feb 17, 2025 17:49 UTC (Mon)
by marcH (subscriber, #57642)
[Link] (2 responses)
It's irrelevant to "Is uutils innovation?" which is the very specific disagreement in this subthread. Feel free to discuss software marginal costs or any other very interesting but unrelated question anywhere else.
> The world is far larger than you.
Thanks, that's very useful to know.
> So... the "innovation" here is "Figuring out how to make Rust good enough to compete with stuff that's existed for decades?"
Yes. It's a very important innovation milestone towards getting rid of memory corruption and 70% of vulnerabilities in operating systems, browsers (the other operating system) and more. You're entitled to your definition of "innovation" but mine does not stop outside the lab.
It's just a word definition; let's agree to disagree.
> "helping the battery company scale up" is an example of longer-term planning and/or investment, not innovation.
Not for the very first customer, no. It's most often a partnership and a huge bet.
Posted Feb 17, 2025 17:57 UTC (Mon)
by marcH (subscriber, #57642)
[Link]
Simply because many "innovations" die once they leave the lab.
This is especially true with software which gets created and dies orders of magnitude more than in other fields (darn, and now I'm close to making your economic digression relevant...)
Posted Feb 19, 2025 7:00 UTC (Wed)
by marcH (subscriber, #57642)
[Link]
> Yes. It's a very important innovation milestone towards getting rid of memory corruption and 70% of vulnerabilities in ...
From https://lwn.net/Articles/1008721/
> Ultimately, Poettering said that he is "happy to play ball" but does not want systemd to be the ones to solve the problems that need to be solved in order to use Rust.
That's the boring, last part of innovation: making it "mainstream". Rust is getting very close but not quite there yet. If ever?
Posted Feb 20, 2025 7:07 UTC (Thu)
by ssmith32 (subscriber, #72404)
[Link]
Yeah, production/packaging and distribution is essentially free for the software in question ("small" command line tools).
You never have the "lone maintainer" problem for the production and distribution of large pipes and fittings, at least produced at the scales we're discussing, because you simply _can't_.
It may not feel that way for the lone maintainer, for sure, but an objective comparison of the costs demonstrates otherwise.
Posted Feb 12, 2025 23:05 UTC (Wed)
by Nikratio (subscriber, #71966)
[Link] (4 responses)
Posted Feb 13, 2025 23:17 UTC (Thu)
by intgr (subscriber, #39733)
[Link] (3 responses)
Posted Feb 15, 2025 6:25 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (2 responses)
Is zsh "compatible enough" with POSIX? Does not have to be 100% for the above. If not, is there some other, fancy new shell that is significantly better than bash and that I should try? I'd like not to lose too much readline "compatibility" either. readline key strokes are close to universal so why should I learn new, different ones? Except for new features of course. It's painful enough to switch between readline and vi commands already.
Posted Feb 15, 2025 23:33 UTC (Sat)
by intgr (subscriber, #39733)
[Link]
My understanding is yes, zsh is a proper POSIX shell and even implements some bashisms.
Posted Feb 17, 2025 13:50 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
There are also some subtleties, particularly around subshells and undefined variables, that make it unwise to test a command destined for a bash script in zsh and expect it to behave exactly the same way under bash.
Posted Feb 13, 2025 9:10 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link] (4 responses)
aaa.c:2: error: ...
it will be very much appreciated.
And a "sort -I" for in-place file sorting, too...
Posted Feb 13, 2025 9:19 UTC (Thu)
by alx.manpages (subscriber, #145117)
[Link]
intro.2
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.g...>
Something similar could be written to implement your sorterrors program. It's such a niche thing, and different programs format errors slightly differently, so I don't think it would make much sense to add a sort(1) option. But it would make sense to write your own script if it's useful to you.
Posted Feb 13, 2025 9:21 UTC (Thu)
by alx.manpages (subscriber, #145117)
[Link]
<https://manpages.debian.org/bookworm/moreutils/sponge.1.e...>
Posted Feb 13, 2025 15:05 UTC (Thu)
by khim (subscriber, #9252)
[Link] (1 responses)
How would that flag differ from already existing
Posted Feb 13, 2025 16:22 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link]
Posted Feb 13, 2025 20:16 UTC (Thu)
by antiphase (subscriber, #111993)
[Link] (9 responses)
Posted Feb 14, 2025 15:04 UTC (Fri)
by jzb (editor, #7867)
[Link] (8 responses)
IIRC "performant" was the term used by the speaker. Sometimes we do paraphrase and I admit there are some "words" I avoid using if at all possible (e.g., "impactful") that I also have a bias against, but, generally, I prefer to stick close to the way a speaker talks. This is in no small part because there've been times when I was the person speaking and being quoted, and I want as much fidelity as possible when what I'm saying is reported on. Also, I've traditionally seen "performant" used this way rather than simply "performs a task as standard"—and at least one online dictionary agrees with the usage. So I'm curious where it's defined otherwise. I'd look in my OED, but all the volumes are currently in boxes. I hope to solve that by next week... :)
Posted Feb 20, 2025 7:15 UTC (Thu)
by ssmith32 (subscriber, #72404)
[Link] (7 responses)
Posted Feb 20, 2025 7:58 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (2 responses)
The concise? The standard? Or the complete?
I'm not sure how big the complete is, but I'd love a copy. Only snag is, it would probably take up a bookcase all by itself :-)
Cheers,
Posted Feb 20, 2025 10:31 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Posted Feb 20, 2025 13:32 UTC (Thu)
by jzb (editor, #7867)
[Link]
Posted Feb 20, 2025 13:29 UTC (Thu)
by jzb (editor, #7867)
[Link]
Posted Feb 20, 2025 18:02 UTC (Thu)
by sfeam (subscriber, #2841)
[Link] (2 responses)
Posted Feb 21, 2025 8:50 UTC (Fri)
by geert (subscriber, #98403)
[Link]
Posted Mar 4, 2025 11:16 UTC (Tue)
by sammythesnake (guest, #17693)
[Link]
I passed the same milestone myself some time ago, too :-/
Posted Feb 14, 2025 2:01 UTC (Fri)
by gnu (guest, #65)
[Link] (3 responses)
Posted Feb 14, 2025 2:52 UTC (Fri)
by mjg59 (subscriber, #23239)
[Link] (2 responses)
Posted Feb 14, 2025 11:23 UTC (Fri)
by gnu (guest, #65)
[Link] (1 responses)
Posted Feb 14, 2025 17:54 UTC (Fri)
by mjg59 (subscriber, #23239)
[Link]
Posted Feb 17, 2025 5:43 UTC (Mon)
by jsakkine (subscriber, #80603)
[Link]
Posted Feb 20, 2025 20:00 UTC (Thu)
by Alterego (guest, #55989)
[Link] (1 responses)
The GPL "obligation" to provide source code is a very good thing, and IMO a key to Linux's success.
Posted Feb 20, 2025 20:15 UTC (Thu)
by mjg59 (subscriber, #23239)
[Link]
A shell in Rust could be really cool
In bioinformatics there are many pipelines where the units are processes/scripts/programs. They might be sequence aligners, statistics packages, clustering programs and a whole lot more. Typically these do not come as libraries attached to a language but as separate programs. These days they are usually orchestrated with tools such as Nextflow, Snakemake, Cromwell and others. There is still a fair bit of use for shell scripting when manipulating the outputs (which are usually big dataframes). It gets tiresome very quickly to write a python script with parameters that loads a large dataframe into memory when all you want to do is some stream data processing. The downside is the marshalling/unmarshalling of data. I wouldn't mind something like Apache Arrow formatted data that I could compose with unix pipes in a shell-like environment. It might exist, I haven't checked as there is more than just one obstacle.
A shell in Rust could be really cool
https://github.com/containers/bootc/blob/main/tests/boote...
- https://github.com/containers/bootc/pull/911/commits/6f80...
- https://github.com/containers/bootc/pull/937/commits/28e8...
Sounds like a good project; itches scratched and flowers blooming. uniq -c is one of the most irritating commands I know, as the first (count) field is right-justified without any option available to avoid this white-space padding. It is quite puzzling that Richard Stallman let this program loose on the world, as it violates the usual Unix well-behavedness of textual interfaces. I will make my way over to uutils to see if it has righted this wrong by providing such an option, or is open to such. Checking the manual page I see "A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.", which again makes me shudder as it seems to indicate a failure to respect empty fields in tab-separated data.
fix uniq -c
It would equally be possible to add new options for new behaviour, such as emitting well-formatted output, that is, output usable for further consumption by adhering to a standard table format such as tab-separated data. I'll enquire.
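As a purely hypothetical sketch of what such an option's output path might look like in Rust (the tab_separated flag is invented for illustration; it is not an existing uutils or GNU option):

// Emit one "count line" record; with the hypothetical tab-separated mode
// the count is not padded, so downstream tools see clean fields.
fn emit(count: u64, line: &str, tab_separated: bool) {
    if tab_separated {
        println!("{count}\t{line}");
    } else {
        // Roughly GNU uniq's traditional right-justified count field.
        println!("{count:7} {line}");
    }
}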
fix uniq -c
> [[:blank:]]*[^[:blank:]]*
> "%d %s", <number of duplicates>, <line>
fix uniq -c
While I generally like the push to abolish the old coreutils, this compatibility makes it a huge waste of time in my opinion.
fix uniq -c
That depends on the licence of the new project; when Linux embraced, extended and then extinguished proprietary UNIXes, that was a good thing because it moved from a world of gatekeepers to an open source world. When Microsoft attempted to embrace-extend-extinguish the open web with ActiveX, that was a bad thing because it would move us from an open world, to one in which you paid Microsoft as a gatekeeper for technology access.
Embrace-extend-extinguish
fix uniq -c
• other programs: here it depends on what the other program is & what it expects as input. For example, if you want to process it further via pipes then awk & bash don't care at all about the right-aligned numbers[1], whereas other programs might. If your goal isn't pipe-processing but e.g. copy-pasting into spreadsheets, then CSV-formatted data might be much better (though that would make processing in awk/bash much harder)
[mosu@velvet ~]$ printf "moo\nmoo\ncow\n" | uniq -c | ( while read line ; do set - $line ; echo $1 ; done )
2
1
[mosu@velvet ~]$
fix uniq -c
- let's add an option so that we can have the behaviour that I like.
fix uniq -c
I let a pointless/misdirected rhetorical flourish get the better of me again. I understand the history - and, recasting it from wrong/right into something to strive towards, I view it as beneficial to make outputs (tuples/tables) composable, preferably by default, or at least to add options to do that without assumptions about white-space padding. The table format (mysql dump, spreadsheet, dataframe) is pervasive and super useful. In the right setup, shell pipelines can do amazing things with them. Working in those kinds of table environments has made the various stripes of unix white-space padding and stripping jar with me.
Supply-chain-attack risk
One good feature of cargo vet in this respect is the ability to import trusted audits and maintain a registry of significant auditing entities; this enables a "big" entity who cares (like Google, for example), to publish a list of audits they've done that smaller projects can import.
Supply-chain-attack risk
That already exists, separately to cargo vet - it's provided by cargo crev, which aims to crowdsource review.
using coreutils in production
> We had some scripts that invoke `nproc` to determine the parallelism to use, and we want to move this into containers. Then we realized that in containers with resource restrictions, `nproc` from coreutils still report number of total cores, but not the amount of CPUs available to it by the cgroup. It turns out that the nproc haven't received meaningful updates since its inception and obviously doesn't know what to do with groups.
using coreutils in production
# pacman -Qo nproc
/usr/bin/nproc is owned by coreutils 9.6-2
# nproc
8
# systemd-run --pipe -p AllowedCPUs=0,1 nproc
Running as unit: run-p1960742-i1961042.service
2
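For what it's worth, a Rust implementation has a container-aware primitive in the standard library to build on; a minimal sketch (behaviour depends on the Rust version, but recent std takes the process's CPU affinity mask into account and, on Linux, attempts to honour cgroup CPU quotas):

use std::thread;

fn main() {
    // available_parallelism() consults the CPU affinity mask and, in recent
    // versions on Linux, cgroup CPU quotas, so inside a constrained container
    // it reports a usable figure rather than the machine's total core count.
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("{n}");
}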
using coreutils in production
Using 15-year old systems as a base to justify today's decisions... *sigh* that's a new low.
using coreutils in production
It depends on whether you've got bursty work, or saturate your threads.
Correct behaviour under a CPUQuota= setting
Reasons for speedup?
> the Rust version performing the test six times faster than GNU's implementation.
Great fit for Rust
You can say "not here, at this boundary conventional linker is all you need" already, by using the psABI (repr(C) and extern "C") instead of the unstable Rust ABI.
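A minimal sketch of what that looks like in practice (names invented for illustration): everything crossing the boundary is monomorphic, repr(C), and exported with the platform's C ABI, so an ordinary linker or dlopen() is sufficient.

// Build as a cdylib; everything crossing the boundary has a layout fixed
// by the platform C ABI rather than the unstable Rust ABI.
#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

// No generics and no Rust-specific layout: this symbol can live in a .so
// built by one compiler version and be called from a binary built by another.
#[no_mangle]
pub extern "C" fn point_length(p: Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}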
Improved dynamic linking ABIs
And the issue is that, under the current Rust ABI, the layout of Vec<T> is not specified until after monomorphization - you literally don't know what order the capacity, length, and data fields are in, since without knowing what T is, you don't know whether the data field is one or two pointers in size - and Rust is allowed to split up the two halves of a "fat pointer" (one that's two pointers in size), and to elide any parts of the Vec that aren't used (e.g. eliding length because it's always the same as capacity).
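Which is why code that wants to hand a Vec across a stable boundary today decomposes it into raw parts with an explicitly chosen layout; a sketch (illustrative, and the two sides must agree on the element type and the global allocator):

use std::mem::ManuallyDrop;

// A fixed-layout view of a Vec<u8>'s raw parts; unlike Vec itself, the
// field order and sizes here are pinned down by repr(C).
#[repr(C)]
pub struct RawParts {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

pub fn into_raw_parts(v: Vec<u8>) -> RawParts {
    // Don't run Vec's destructor; ownership crosses the boundary.
    let mut v = ManuallyDrop::new(v);
    RawParts { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
}

// SAFETY: `p` must come from into_raw_parts() in a build that uses the
// same global allocator, and must not be reassembled more than once.
pub unsafe fn from_raw_parts(p: RawParts) -> Vec<u8> {
    unsafe { Vec::from_raw_parts(p.ptr, p.len, p.cap) }
}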
Improved dynamic linking ABIs
Drop is fairly trivial to handle, since it's a function call to Vec's code for this particular instance of Vec<T>, and if you have the ability to handle function calls on Vec at all (including push, insert, get, indexing and pop, which also get affected by the type of T), you can handle drop that way, too. You fundamentally can't implement anything useful if your handling of something like Vec doesn't have different function behaviour for many functions, because of this.
Improved dynamic linking ABIs
Rust allows, as far as I can tell, exactly the same level of control over what's monomorphized and what's not as C++ does. I can find no difference in how Clang treats std::vector to how Rust treats std::vec::Vec, complete with similar behaviour around how std::vector::~vector is treated as compared to std::vec::Vec::drop.
Improved dynamic linking ABIs
C++ modules do not split the source like that, and there are many C++ projects (like Boost) which end up putting all the source in the .h file because it's the only way to guarantee that all use cases compile correctly. As a result, a lot of libraries in C++ land are in the same position as Rust, because the important parts of the code are in the .h file, and the precompiled object is tiny. Any bugfix requires a full rebuild of all dependents, as well as a new shared library file.
Improved dynamic linking ABIs
I suspect that, in practice, people splitting interfaces and implementations will be rare in open-source C++ modules; you have no reason to hide the implementation from consumers of the interface, since it's all open source, but it is extra work to separate them cleanly. You see something similar in Rust, where projects like Windows use Rust behind an ABI barrier (and only export the interface), but open-source Rust projects tend not to bother.
Splitting implementation and interface in C++
Yes. The extra work of splitting something into an interface module and an implementation module is significant, and it does not reduce the complexity of reasoning about a change, nor the amount of code that has to be reviewed to assess the validity of a change.
Splitting implementation and interface in C++
Sure, but I'm shipping full source anyway, and I can check for people exporting parts of the internal partition in CI, just as I'd have to have similar checks in place to stop people moving code from the interface module to the implementation module.
Splitting implementation and interface in C++
Wol
Splitting implementation and interface in C++
cargo doc
It's easy to navigate and includes all details of your public interfaces.
Splitting implementation and interface in C++
cargo doc
Splitting implementation and interface in C++
Wol
When you provide T, it's monomorphized for you. The hard case is when I want to provide Foo<T>, and allow you to provide an arbitrary T that I haven't thought about up-front.
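A sketch of that distinction (trait and function names invented): the generic version must be monomorphized for whatever T the caller eventually supplies, so its body effectively has to be available to the caller; the dyn version is compiled exactly once and can sit behind an ordinary function boundary.

pub trait Shape {
    fn area(&self) -> f64;
}

// Monomorphized separately for every concrete T a caller invents later.
pub fn total_area_generic<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Compiled once; calls go through a vtable at run time.
pub fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}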
Splitting implementation and interface in C++
The problem is that shipping an updated .so in the way C or C++ do it runs the risk of invoking UB from the "safe" subset of Rust, and one of Rust's promises is that invoking UB requires you to use "unsafe Rust". Thus, just copying the C way of doing things isn't acceptable, because it can take a safe program and cause it to invoke UB.
Splitting implementation and interface in C++
Wol
Rearranging across the interface
That's why I chose that particular example; the explicitly declared interface is:

impl<T, A: Allocator> Vec<T, A> {
    #[inline]
    pub fn shrink_to_fit(&mut self);
}
Rearranging across the interface
Wol
Rearranging across the interface
Wol
But then you're getting into a mess around defining what is, and is not, a safe code change inside the ABI boundary. If you do make the internals of a crate (not the exported interface) the ABI boundary, you're now in a position where the compiler has to make a judgement call - "is this change inside the internals of a library a bad change, or a good change?".
Rearranging across the interface
Wol
No, for performance reasons. We inline parts of our libraries (even in C, where the inlined parts go in the .h file) into their callers because the result of doing so is a massive performance boost from the optimizer - which can do things like reason "hey, len can't be zero here, so I can eliminate the code that handles the empty list case completely".
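For instance (a toy example, not anyone's real code), marking a small accessor #[inline] makes its body available to callers in other crates, much like putting it in a C header, which is what lets the optimizer prove facts such as "len is non-zero here" and delete the empty-case branch at the call site.

pub struct List {
    items: Vec<u32>,
}

impl List {
    // The body is available to downstream crates, like code in a .h file.
    #[inline]
    pub fn len(&self) -> usize {
        self.items.len()
    }

    // After inlining into a caller that has already checked len() > 0,
    // the None branch can be optimized away entirely.
    #[inline]
    pub fn first(&self) -> Option<u32> {
        self.items.first().copied()
    }
}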
Rearranging across the interface
Wol
The compiler doesn't prove anything for the danger cases; it relies on the human assertion that they've checked that this unsafe block is safe, given the code that they can see today.
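A small illustration of that division of labour (invented code): the compiler checks everything outside the unsafe block; the SAFETY comment is the human assertion, and it is only as good as the surrounding code it was written against.

fn first_byte(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: we just checked that the slice is non-empty, so reading the
    // first element through the raw pointer is in bounds. Nothing verifies
    // this claim again if the check above is ever refactored away.
    Some(unsafe { *v.as_ptr() })
}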
Rearranging across the interface
Note that libc handles versioning well because the glibc maintainers do a lot of hard and non-trivial work to have things like compatibility symbols so that you can swap to a later glibc without breakage. You can do a similar level of work in Rust today to get dynamic linking working, and working well.
Versioning of shared objects
Splitting implementation and interface in C++
Only the top-level application crate's lock file matters.
Splitting implementation and interface in C++
(I almost never use them and I almost always provide them.)
Except that typical C applications simply don't provide any lock information, so you're fully on your own.
Providing lock information is better than providing no lock information.
Splitting implementation and interface in C++
https://doc.rust-lang.org/cargo/commands/cargo-install.ht...
You can just use --offline with what is already available in your distribution.
If you don't want to use it because you want to use your own packaged dependencies, the only option is to ignore it, right?
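For what it's worth (the crate name here is assumed), cargo can be told to honour the lock file that ships with a package instead of re-resolving dependencies, and to stay off the network entirely:

$ cargo install --locked coreutils
$ cargo install --locked --offline coreutils

Without --locked, cargo install ignores the packaged Cargo.lock and resolves the newest compatible versions; --offline additionally restricts it to sources already available locally.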
Splitting implementation and interface in C++
To express it slightly differently, why is the Linux kernel a single module with multiple internal partitions, rather than separate modules for the VFS interface, MM interface, network interface etc, along with implementation modules that implement each of those interfaces? If the benefits are as big as you're claiming, surely it would make sense to separate out the interfaces and implementations into separate modules, rather than just have separate partitions internally?
Splitting implementation and interface in C++
Note, though, that once you have tooling that makes splitting it up easy, the dividing line is rarely as simple as "interfaces" and "implementations". You're more likely to have splits like "VFS interface and implementation", "ext4 interface and implementation" etc.
Splitting implementation and interface in C++
Improved dynamic linking ABIs
RIIR
https://medium.com/@sdboyer/so-you-want-to-write-a-packag...
etc.
RIIR
- The duplication cost is so negligible that a single guy doing some DIY work in his garage for fun (NOT for economic reasons) can (and here: does!) end up replacing the plumbing in all houses across the world.
RIIR
1) There would most likely be significant performance regressions (which obviously matter in "core" utils) instead of the improvements observed with Rust.
2) None of these languages is recent in 2025; they have all been in production for decades, and none of them needs any sort of uutils-like project to help it get battle-hardened and "upgraded" to mass production.
RIIR
Fish Shell 4 is written in Rust
sort -E
aaa.c:20: error: ...
bbb.c:10: error:...
sort -E
membarrier.2
umask.2
intro.3
printf.3
id_t.3type
useconds_t.3type
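If the goal is the grouping shown above (a guess at the intent, so treat this as a sketch), plain sort can already approximate it by keying on the suffix after the dot and then on the name:

$ ls | sort -t . -k 2,2 -k 1,1

That compares the section suffixes as plain strings, though, rather than with anything as clever as filevercmp.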
sort -E
-V
sort?
sort -E
https://github.com/coreutils/gnulib/blob/master/lib/filev...
Writing clarity request
As much as I like seeing Rust in action (and enjoy a passion project) - _that_ is a treasure worth reporting on 😁.
Writing clarity request
Wol
I just looked at the OED paper version online. The "complete" edition is 20 volumes, and currently has three "additional" volumes to bring it up to date.
Paper copy of the full OED
Writing clarity request
I have the "Compact Edition" of the full OED on the shelf behind me, two large volumes of tiny print. It came in a custom case with a built-in drawer contain a magnifying glass. Each page contains images of four pages from the full-size version, sort of like the output of psnup -4. Back when I got it the magnifying glass was an unnecessary extra, now these old eyes appreciate the thought. It's still my go-to reference if I'm sitting at my desk.
Writing clarity request
Great stuff
What a great project
GPL license does matter vs MIT license