Rewriting essential Linux packages in Rust
Most Linux systems depend on a suite of core utilities that the GNU Project began developing decades ago and that are, of course, written in C. At FOSDEM 2025, Sylvestre Ledru made the case in his main-stage talk that modern systems require safer, more maintainable tools. Over the past few years, Ledru has led the charge of rewriting the GNU Core Utilities (coreutils) in Rust as the MIT-licensed uutils project. The goal is to offer what he said are more secure and more performant drop-in replacements for the tools Linux users depend on. At FOSDEM, Ledru announced that the uutils project is setting its sights even higher.
Ledru has been a Debian developer for more than 20 years, and is a contributor to LLVM/Clang as well as other projects. He is a director at Mozilla, but said that the work he would be talking about is unrelated to his day job.
Bread, woodworking... or Rust?
Ledru said that learning Rust was a project he started during the COVID lockdown. Some people chose to learn to bake, others took up woodworking, and he took up rebuilding all the LEGO kits in the house with his son. But he wanted a project in the evening that would help him learn Rust. "I have been surrounded by upstream Rust developers inside the Paris office of Mozilla, so I wanted to learn it myself." He didn't want to take on a side project that would just sit on his hard drive; he wanted to do something with impact. "So I was thinking, what about reimplementing the coreutils in Rust?"
Well before COVID, Ledru had worked on rebuilding the Debian archive using Clang, a project that is documented at clang.debian.net. Ledru said that he had been inspired by Chris Lattner's work on Clang. One of the core fundamentals of Clang, he said, is the philosophy that "if you have different behaviors than GCC, it's a bug."
Next, he asked the audience who knew which programs were in GNU coreutils. Most people knew at least one, and many knew at least five. But he was pretty sure that no one in the audience knew everything in the list of coreutils, unless they happened to be an upstream developer on the project. Ledru said that pr, used to format text for printing "on actual paper", is one of his favorite coreutils programs. To start on the project he selected "all the fancy ones" like chmod, chown, ls, and mkdir: the commands that people on Linux and macOS use almost every day for their work.
Near full completion
Now, five years later, Ledru said that he has more gray hairs, and the project has Rust replacements for all of the more than 100 commands in coreutils. The project has more than 530 contributors, and more than 18,000 stars on GitHub, "[they're] meaningless, but it's one of the metrics we have".
A more meaningful measurement of the project's success is how well it fares against the GNU coreutils test suite. He said that the project only passed 139 tests when it started testing in 2021, and it was now close to 500 out of 617. "I should have worked last weekend to be at 500, but I didn't have the time," Ledru said. (According to the JSON file of results, it crossed 500 tests passed on February 4, with 42 skipped and 75 failed.) He displayed a slide with a graph of test suite runs from April 2021 to late January 2025, shown below.
Most of the tests that the project still fails have pull requests to fix the problems, or may be things that "nobody cares about" as well as some "weird ordering" problems. For example, if a user tries to use rm in a directory without the appropriate permissions, the output may be slightly different. "GNU is going to show something first, and we show it at the end, and it can make a small difference".
More and more of the programs are passing all of the GNU tests, but that may not mean the Rust version is fully compatible with the GNU implementation because the test suite itself has some limitations. Ledru said that the project has been contributing to the GNU coreutils project when it finds things that are not tested.
He said that the Rust coreutils are now "production ready", and that they support Linux, FreeBSD, NetBSD, OpenBSD, Illumos, Redox, Android, macOS, and Windows. There are also Wasm versions. According to Ledru, the Rust coreutils are used by the Debian-based Apertis distribution for electronic devices and by the Spectacles smartglasses, and Microsoft is using the project for Visual Studio Code for the web. The Serpent OS distribution is using them by default, he said. He noted that, since the project is open source, it is probably in use elsewhere without his knowledge, and asked any company using them to let him know. "I'm always excited to know when people are using our code".
Why Rust
Ledru's talk was immediately after Miguel Ojeda's keynote on Rust for Linux. Ledru said that Ojeda had already covered some of the reasons for using Rust, but that he would offer his point of view as well.
As part of the release-management team for Firefox, he was involved when the browser started shipping with Rust code. That made him "completely biased" about the suitability of Rust for the browser use case. He pointed out that Chrome, the Linux kernel, and Microsoft Windows are starting to include Rust. "I'm going to state the obvious, that Rust is very good for security, for parallelism, for performance".
The idea to replace GNU coreutils with Rust versions was not about security, though, because the GNU versions were already quite secure. "They did an amazing job. They almost don't have any security issues in their code base." And it's not about the licensing, he said. "I'm not interested in that debate."
One of the reasons that Ledru liked Rust for this project, he said, is that it's very portable. He is "almost certain" that code he writes in Rust is going to work well on everything from Android to Windows. That is very surprising, he said, given the complexity of "everything we do", but the cost of porting a change to an operating system is small "thanks to the ecosystem and quality of Rust".
Ledru cited laziness as another reason for using Rust. "So if there is a crate or library doing that work, I'm going to use it. I'm not going to implement it [myself]." There are between 200 and 300 dependencies in the uutils project. He said that he understood there is always a supply-chain-attack risk, "but that's a risk we are willing to take". There is more and more tooling around to help mitigate the risk, he said.
He is thinking about "what we are going to leave to the next generation". Developers starting out don't want to use COBOL, Fortran, or C, he said. They want to work with fancy stuff like Rust, Swift, Go, or Kotlin. It is a good investment to start planning for the future now and transition to new languages for Linux and the computer ecosystem in general.
Demo time
Even though the goal is for the Rust coreutils to be drop-in replacements, Ledru said, "we take the liberty at times to differentiate ourselves from the GNU implementation". Here he showed a demo of the cp command with a --progress option that is not available with the standard GNU version of cp. He said it was available with the mv command too, and invited the audience to ask if there were other places the project should add it. "In Rust, it's pretty easy to add that".
He also walked through a demo that compared the Rust implementation of sort to GNU's. He used the hyperfine command-line benchmarking tool to run a test ten times, sorting a text file containing all of Shakespeare's works, to see which implementation was faster. The first time he performed the test, he used a debug build of the Rust version of sort. In that demo, Rust's version was 1.45x faster than the GNU version. Then he ran the test again using a non-debug build, which showed the Rust version performing the test six times faster than GNU's implementation.
Currently, the project has continuous integration (CI) and build systems for most of its supported platforms, with almost 88% of the code covered by a test suite. "If you are above 80, you usually are very happy. Here we are even happier with nearly 90%". Despite trying to demonstrate how the Rust implementation was better than GNU's, Ledru stressed that there is a friendly collaboration between the projects and that they have been sending bug reports and patches upstream for GNU coreutils.
What's next
Rewriting more than 100 essential Unix and Linux utilities in a new language is an impressive achievement. Some, upon nearing completion of such a project, might stand back and admire the work and think about calling it a day. Ledru, apparently, is not one of those people.
He displayed a slide with the familiar adage "the best time to plant a tree is 20 years ago, the second-best time is now", and talked a bit about the age of the Unix core utilities. The original Unix utilities will be 55 years old in four or five months, he said. Despite their age, the utilities (albeit newer implementations of them) live on and continue to evolve. Ledru pointed out that the GNU project continues to add new options to the coreutils and introduce new algorithms "and we are going to do the same".
In parallel, Ledru said that the project had started working on rewrites of GNU Findutils and GNU Diffutils. Those have been less of a focus, he said, but people have been doing great work on implementing those and improving their performance on the bfs test suite used to test find. Now, Ledru said, "I'd like to do the same with most of the essential packages on Debian and Ubuntu". There isn't a better time than now to start that task, he said. What are the essential packages? He displayed the command he uses to find them:
$ dpkg-query -Wf '${Package;-40}${Essential}\n' | grep yes$
The list includes procps (/proc file system utilities), util-linux (miscellaneous system utilities), acl (access-control-list utilities), bsdutils (standard BSD utilities) and several others.
Ledru then published a blog post from the stage formally announcing the plan to rewrite other parts of the modern Linux stack in Rust. He said that many people were already contributing to the project, and that uutils was reusing the work that had already been done for the coreutils in Rust.
There are many, many functions we have in the coreutils that can be used for other programs as part of that world. So when you need to mount a file system or when you need to look at permissions, we already have all those functions.
He said that the project has a lot of low-hanging fruit and good first bugs for contributors to get started if they would like to learn Rust. "It's the way that I learned, and that's why we have so many contributors". Projects like rewriting and reinventing the wheel might sound crazy, he said, but he thought it would work because there is an appetite from the community.
As I was saying earlier, I'm getting older. We all do, but the new generation is not going to want to do C. And paving the way to do Rust is also a good opportunity for them to be involved in the open-source ecosystem.
The old tools are still using old build-pipeline systems, and some do not have CI. They are still using mailing lists to send patches. "I apologize for the Linux developers here, I still think that using mailing lists to do a patch review sucks". With GitLab and GitHub, he said, there was an opportunity to have a proper pipeline with CI, quality tools, and so on, which is one of the advantages that uutils has over the old projects. "To be clear, that works for them, so I'm happy, but we can do better as a community."
Finally, he wanted to mention again that uutils is not a Mozilla project and has no company behind it, no interest from big tech. "I'm doing it as a passion because I care about Debian and Linux in general, it's really a community effort, there is no structure behind it". He then opened the floor for questions with a little bit of time remaining.
Questions
The first question was about portability, and whether Ledru was tracking the other efforts toward additional Rust compilers to extend uutils coverage. Ledru said "we will when they are ready". Another audience member asked if there were any plans to develop a shell, like Zsh. Ledru said no; he wanted to replace existing utilities written in C, not to rewrite a shell, and there was "no need in that space".
The question that followed was about Ledru's stance on utilities written in Rust that are part of the "rewrite in Rust" trend, like ripgrep, bat, and others. He said that they are amazing projects, and he uses ripgrep daily, but they are not drop-in replacements.
The final question was about packaging. A member of the audience observed that it is really hard to package Rust programs due to dependency management, and wanted to know what Ledru's experience with that was and how to work around it in the future. Ledru agreed that it was a big deal and that Rust was "quite hard to package", but he thought it was going to stabilize.
[I was unable to attend FOSDEM in person, but watched the talk as it live-streamed on Saturday. Many thanks to the video team for their work in live-streaming all FOSDEM sessions.]
Index entries for this article
Conference: FOSDEM/2025
Posted Feb 12, 2025 14:35 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link] (25 responses)
That would get us a shell with full access to the whole Rust library ecosystem.
And the thing I keep running into in shell programming is they always start as little bits of glue, but then as they grow (particularly in test environments, where I use them the most) the first thing you need is real error handling - and Rust is first in class there.
Posted Feb 12, 2025 14:51 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (23 responses)
It sure sounds cool, but in practice, an interactive shell is not a good environment for general-purpose code.
Posted Feb 12, 2025 15:07 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link] (22 responses)
Posted Feb 12, 2025 15:12 UTC (Wed)
by dskoll (subscriber, #1630)
[Link] (19 responses)
Yes, because it's so (too?) convenient and guaranteed to exist on pretty much any UNIX-like system.
It's a tough habit to break, but I generally switch to a real scripting language (Perl is my favorite... don't judge) if a shell script gets longer than about 15 lines.
Posted Feb 12, 2025 16:45 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (8 responses)
A long time ago when I began with unix, my default shell was tcsh. I didn't understand why some of my scripts didn't work the same way on the command line. I thought this had to do with interactive vs non-interactive. When I realized that the shell was different, I switched to bash and lost a lot of stuff (it was 1.14 by then), but I started to write working scripts. When 2.05b arrived, scripting abilities improved a lot! I once tried to switch to zsh because it was compatible with bourne, but admins hated it due to the way it displays completion and moves the screen up. And one difficulty was that you quickly get used to features that again don't work in your portable scripts. So I went back to bash, the (in)famous bug-again-shell that everyone loves to hate. But in the end, scripts written with it work basically everywhere. And if you need to get rid of it because you end up with a real bourne (like dash), that's not that dramatic, you spend a few hours getting rid of bashisms and that's done. Or the user installs bash or ksh, since their compatible subset is quite large.
That's why people continue to write (sometimes long) shell scripts: that's easy, portable and transferable to someone else to take over them. And they come with basically 0 dependencies (aside external programs) so that makes them future-proof contrary to many interpreted languages whose interpreter breaks compatibility, or whose libraries love to change API all the time by adding a new argument to a function...
Posted Feb 13, 2025 9:09 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (7 responses)
Posted Feb 13, 2025 12:50 UTC (Thu)
by pizza (subscriber, #46)
[Link] (6 responses)
This turned out to be the root cause of an infuriating intermittent bug I was trying to figure out -- nobody could give me a reproducible test case. Turns out they were using something other than English locales.
Posted Feb 14, 2025 15:35 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (5 responses)
Posted Feb 14, 2025 23:51 UTC (Fri)
by himi (subscriber, #340)
[Link] (3 responses)
A few simple experiments with bash suggest "08" used in an arithmetic expansion will error ('bash: 08: value too great for base (error token is "08")') - pretty much what I'd expect, though it'd take me by surprise the first time it happened (at least the error is nice and informative). Are you talking about something like iterating through 0-padded numeric months and hitting errors once you hit August?
Posted Feb 15, 2025 0:16 UTC (Sat)
by Wol (subscriber, #4433)
[Link]
Copy a column of date-times from one spreadsheet to another. Format the date-time as a time. Save as a csv. If you're not in America the csv will contain a "random" mix of times and date-times. But all the date-times will not be a valid American date, for example today - 15/02/2025 - will be a date-time, while fast-forward a fortnight to 01/03/2025 and it will be correctly converted to a time.
The fix is easy - explicitly format/force the text into a European date, and everything then works, but it's a pain in the arse. Especially if every time you hit the problem, you've forgotten the previous time ...
Cheers,
Posted Feb 17, 2025 13:42 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (1 responses)
The solution is easy once you know about it, you can simply prefix every variable in the arithmetic expression with 10# to force base 10, but you need to know about it.
Posted Feb 18, 2025 21:46 UTC (Tue)
by himi (subscriber, #340)
[Link]
I try to use unix timestamps ($(date +%s)) whenever I'm dealing with times and dates of any sort in shell scripts, and then use $(date -d @...) to convert to something human readable - not particularly portable, but I'm fortunate enough to only really deal with Linux systems (aside from an occasional Mac, which has caused me serious headaches at times).
I do sometimes use sub-second precision (e.g. $(date +%s.%N)) for higher resolution timing, which then necessitates using bc -l or similar for any calculations. That recently bit me, in combination with scoping issues - I was mixing second and sub-second precision and the (perfectly logical) variable name 'now', and getting errors from arithmetic expansions when a sub-second precision value was still active where I thought it should be second precision . . . Fixing that just required fixing the scoping and naming issues ('local' is one of my favourite keywords), but it took *far* too long to spot the bug.
And to bring this back on topic, that's one monstrosity of a shell script that *absolutely* needs to be rewritten - though probably not in Rust, sadly, since I'm the only person in my team who'd have any hope of working on a Rust version . . .
Posted Feb 17, 2025 21:12 UTC (Mon)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 13, 2025 2:21 UTC (Thu)
by interalia (subscriber, #26615)
[Link]
Posted Feb 13, 2025 6:01 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (8 responses)
Error handling is a real pain in shell (just like in... C) but "set -e" and "trap EXIT" mitigate a bit.
The truth is: the shell's ability to "glue" processes, files and pipes is still unmatched. Why? Simply because that's exactly what it was designed for. Everyone who has ever used Python's "subprocess" knows that. Also, "shellcheck" is incredibly good; saved me hours and hours of code reviews thanks to: "please run shellcheck, it will tell you what's wrong with this line". Last but not least: the shell is a decent... functional language! You can put a function in a variable which is amazing and very useful for a language that limited otherwise.
Posted Feb 13, 2025 7:13 UTC (Thu)
by himi (subscriber, #340)
[Link] (7 responses)
I would like to second, third and fourth this (and then some) - shellcheck is amazing, it makes creating maintainable code in shell massively more practical.
The biggest issue I have with shellcheck is actually that it's /too/ good - it dramatically raises the threshold where my gut starts screaming at me that it's essential to rewrite this thing (some random hack grown into a monstrosity) in a "real" language, leaving me with constant cost/benefit anxiety. I've caught myself more times than I'd like to admit thinking "This really needs to be redone in Python . . . . but the incremental cost of adding X feature to the shell code is less than the cost of completely rewriting 1000+ lines of shell code from scratch, so . . . "
Without shellcheck making such a massive improvement in the quality, consistency and coherence of my shell code that would never have become a thing for me - it truly is both a blessing and a curse.
Posted Feb 13, 2025 9:11 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (6 responses)
Posted Feb 13, 2025 10:01 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Feb 13, 2025 12:48 UTC (Thu)
by pizza (subscriber, #46)
[Link]
"It's only temporary if it doesn't work"
Posted Feb 13, 2025 12:48 UTC (Thu)
by pizza (subscriber, #46)
[Link]
...And given how perl goes through great pains to be "bugward" compatible indefinitely.
(I have literally decades-old perl scripts that still JustWork(tm). During the python2->python3 migration debacle I decided to just rewrite a lot of p2 (and early p3) into perl instead, and it hasn't needed to be touched since then. Ran faster, too)
Posted Feb 13, 2025 18:43 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Feb 13, 2025 22:53 UTC (Thu)
by himi (subscriber, #340)
[Link] (1 responses)
It may have been moderately portable when it really was a quick hack, but even then it was probably only used in exactly one location and probably wasn't very useful anywhere else. The growth pattern that turned it into a monstrosity certainly didn't increase portability - it made it into something with even more detailed ties to the environment it's used in.
Shell /is/ good for quick and dirty hacks that are moderately portable, far better than Python. But quick and dirty hacks tend to keep getting hacked on, and the point of the whole thread here is where you stop hacking on the shell code and rewrite it in something that's a bit more amenable to proper software engineering practises . . .
Posted Feb 14, 2025 2:05 UTC (Fri)
by marcH (subscriber, #57642)
[Link]
Note this is not so specific to shell. How many times have we looked at some piece of code and thought "This has grown organically, it should really be rewritten". Of course the bar tends to be much higher when a language switch is required, notably because the _incremental_ change possibilities are much more limited. But at a very high level it's the same https://wiki.c2.com/?PlanToThrowOneAway logic.
Posted Feb 12, 2025 15:25 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Feb 12, 2025 16:10 UTC (Wed)
by stijn (subscriber, #570)
[Link]
Posted Feb 12, 2025 21:20 UTC (Wed)
by walters (subscriber, #7396)
[Link]
But on the topic of scripts and testing - in bootc I started to try to use nushell for some of our integration tests which are *mostly* just forking external processes, but have about 10% where you really want things like arrays.
You can get a good sense of what this is like from e.g. this test:
A notable plus is that it's super easy to parse JSON from a subprocess without falling back to jq and thinking about its mini-language and then parsing that back into bash.
But a downside is we've been caught twice by subtle incompatible nushell changes:
I hope this won't happen too much more often...
---
If you want to approach this a different way and write a regular Rust program but fork subprocesses with as little ceremony and overhead as possible, I heartily recommend https://crates.io/crates/xshell
It's really cool because it makes use of Rust macros to do what you can't easily do in Go or Python - correctly quote subprocess arguments referring to Rust local variables.
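As a rough illustration of the quoting behavior described above (the xshell crate and its cmd! macro are real; the particular command and variable names are made up for the example):

use xshell::{cmd, Shell};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let sh = Shell::new()?;
    // A value containing a space: cmd! passes it to grep as a single,
    // correctly quoted argument, with no manual escaping needed.
    let pattern = "hello world";
    let output = cmd!(sh, "grep -r {pattern} .").read()?;
    println!("{output}");
    Ok(())
}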
Posted Feb 12, 2025 15:37 UTC (Wed)
by stijn (subscriber, #570)
[Link] (17 responses)
Posted Feb 12, 2025 19:05 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (2 responses)
Posted Feb 13, 2025 13:30 UTC (Thu)
by stijn (subscriber, #570)
[Link] (1 responses)
Posted Mar 1, 2025 17:00 UTC (Sat)
by jkingweb (subscriber, #113039)
[Link]
Posted Feb 12, 2025 19:06 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
> A field is the maximal string matched by the basic regular expression:
But note that this is describing how the -f flag works, not the -c flag (which is also the case for the man page you quote).
As far as I can tell, POSIX makes no allowance whatsoever for right-justifying the -c output, and in fact specifies the opposite:
> If the -c option is specified, the output file shall be empty or each line shall be of the form:
(%d means "a number in decimal," not "a number in decimal, but possibly with some whitespace in front of it.")
If a uniq implementation right-justifies its -c output, that is either a bug or a deliberate non-conformance to the standard. I would suggest reporting it upstream if you have not already done so.
[1]: https://pubs.opengroup.org/onlinepubs/9799919799/utilitie...
Posted Feb 13, 2025 13:55 UTC (Thu)
by stijn (subscriber, #570)
[Link]
Yep I know. It is a horror I noted in passing.
> You may dislike that interpretation of "field," but it is specified in the POSIX standard[1]:
I very much dislike that POSIX standard then. POSIX probably codified existing behaviour, but anyway it does not matter. Clearly existing options have to continue to carry the meaning that they have, I'm not tilting at that windmill.
Coreutils combined with utilities such as datamash can be quite powerful in composing filters, maps or computations in a streaming paradigm, but it does require having a meaningful definition of field. One workhorse in scientific computing is the dataframe encoded as tab-separated data with column headers. This format (bar column headers) is already half-embraced by a lot of unix utilities, I hope this progresses further. A good check is always 'does it work for the empty array / empty string', and this is required to make the tab-separated format usable.
> As far as I can tell, POSIX makes no allowance whatsoever for right-justifying the -c output, and in fact specifies the opposite:
In the savannah gnu git repo the code for uniq has 'if (count_occurrences) printf ("%7jd ", linecount + 1);'
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;...
I'm not interested in left-justification either - datamash -g 1 count 1 is a more usable alternative.
Posted Feb 15, 2025 22:59 UTC (Sat)
by pdewacht (subscriber, #47633)
[Link]
Posted Feb 16, 2025 16:33 UTC (Sun)
by ck (subscriber, #158559)
[Link] (3 responses)
Posted Feb 17, 2025 13:52 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (2 responses)
Posted Feb 17, 2025 13:55 UTC (Mon)
by pizza (subscriber, #46)
[Link] (1 responses)
I thought that was a bad thing?
Posted Feb 17, 2025 14:15 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Posted Feb 16, 2025 16:57 UTC (Sun)
by dskoll (subscriber, #1630)
[Link] (6 responses)
... | uniq -c | sed -e 's/^ *//'
You could wrap that in a shell function or script, I guess.
Posted Feb 17, 2025 9:30 UTC (Mon)
by stijn (subscriber, #570)
[Link] (5 responses)
The thought had occurred to me. Implicit in my point here is that this is a fudge, easily fixed by adding a new option to uniq that does the right thing. Shell programming with unix pipes can be an elegant and very concise way to mutate data in a functional way. Having to include fudges like the above (at the expense of a process) grates and creates an impression of crummy (shell) programming that should be completely unnecessary.
Posted Feb 17, 2025 11:18 UTC (Mon)
by mbunkus (subscriber, #87248)
[Link] (4 responses)
• humans: we need data formatted so that it is visually very clear where columns start & end. We also prefer to be able to determine at a glance when a number is bigger than another number. This means that columns must be aligned in the first place in order to satisfy the first requirement, and for numbers right-aligning satisfies the second. We (humans) might even profit from table borders.
You cannot satisfy all those requirements with a single format. Therefore I consider your argument to be completely wrong. The default output for uniq is to be easily readable by humans. That's a design choice. It's not a bug.
[1] Examples with bash:
[mosu@velvet ~]$ printf "moo\nmoo\ncow\n" | uniq -c | awk '{ sum += $1 } END { print sum }'
Posted Feb 17, 2025 18:46 UTC (Mon)
by stijn (subscriber, #570)
[Link] (2 responses)
- current default behaviour of uniq -c is poor for composing.
With this, we can have both a 'visually clear' format and a suitable-for-composing format. For compatibility the current format is of course the default in that scenario.
> For example, if you want to process it further via pipes then then awk & bash don't care at all about the right-aligned numbers[1], whereas other programs might.
I work a lot with dataframes, which are essentially mysql tables in tab-separated format with column headers, or equivalently a single table in a spreadsheet, or the things you might want to read with Python pandas or in R. Tab separated is preferred, as I've never encountered a need to escape embedded tab characters. In this wider ecosystem there is no automatic white-space scrubbing of data and there is a requirement that tables are well-formatted. Programs such as comm, join, datamash, shuf and a fair few more can be very handy in summarising, QC'ing or (even) manipulating this data. Hence I clamour for the ability (not necessarily as default) to have all tuple/table type data formatted as tab-separated tables, with or without column names. This should go well with unix composability of processes.
Posted Feb 17, 2025 19:18 UTC (Mon)
by mbunkus (subscriber, #87248)
[Link] (1 responses)
> It is quite puzzling that Richard Stallman let this program loose on the world as it violates usual Unix well-behavedness of textual interfaces.
And to that my argument was that first and foremost `uniq -c` was most likely designed to be easy to read by humans. By that metric it is very much well-behaved & doesn't violate anything. Furthermore, even with it being designed to be human-readable its output is actually useable as-is by a lot of other traditional Unix programs, making it arguably even less of a "wrong" that has to be "righted" (your choice of words, again from your first post). Apart from awk & bash which I mentioned earlier, "sort -n" works fine as well.
Posted Feb 17, 2025 20:15 UTC (Mon)
by stijn (subscriber, #570)
[Link]
Posted Feb 18, 2025 8:11 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
> • humans: we need data formatted so that it visually very clear where columns start & end. We also prefer to be able to determine at a glance when a number is bigger than another number. This means that columns must be aligned in the first place in order to satisfy the first requirement, and for numbers right-aligning satisfies the second. We (humans) might even profit from table borders.
Then why is it a static number of columns wide? I know to make it actually the right width, the entire output needs to be known so that you can't output anything until the whole thing is read, but if human viewing is most important, why not buffer and Do It Right™? Either the output is small and quick enough to not really matter or it is so large that…what human is really going to be looking at it directly anyways?
> • other programs: here it depends on what the other program is & what it expects as input. For example, if you want to process it further via pipes then then awk & bash don't care at all about the right-aligned numbers[1], whereas other programs might. If your goal isn't pipe-processing but e.g. copy-pasting into spreadsheets, then CSV-formatted data might be much better (though that would make processing in awk/bash much harder)
Sure, awk and bash do separator coalescing. But `cut` doesn't, so one needs to `sed` before `cut`, but not before `bash` and `awk`. Great. Yet another paper cut to remember in shell scripts. Given that `bash` and `awk` do support the mechanism that `cut` would understand and the general human-ness usefulness is of questionable quality…it really seems like an unnecessary quirk of a tool's output.
Posted Feb 12, 2025 16:46 UTC (Wed)
by excors (subscriber, #95769)
[Link] (13 responses)
This sounds unfortunately similar to Lord Farquaad from Shrek: "Some of you may die, but it's a sacrifice I am willing to make". The developers choosing to add convenient dependencies aren't going to be the ones suffering the costs of any such attacks, so I think there's some misalignment of incentives.
Hopefully any companies or large projects making use of this will be aware of the risk they're taking on, and will contribute to the tooling and code review and maintenance efforts needed to mitigate it. (Is there much activity around cargo-crev/cargo-vet nowadays, and is there anything better than that?)
Posted Feb 12, 2025 19:17 UTC (Wed)
by lunaryorn (subscriber, #111088)
[Link] (1 responses)
That said, I believe that adding cargo-vet to a Rust project is universally a good idea, even if dependencies are routinely exempted. At least it makes the cost of dependencies visible, and helps to separate the big and trustworthy crates (eg libc, rustix, Tokio, etc.) from small, little-known dependencies which need a closer look.
Posted Feb 13, 2025 9:17 UTC (Thu)
by taladar (subscriber, #68407)
[Link]
Posted Feb 12, 2025 19:43 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (9 responses)
Supply-chain attacks are a legitimate concern and I completely agree with your second paragraph. That said, there are benefits and I would like to see solutions around how to do it rather than treating having distributed dependencies as a design flaw.
I mean, we could say the same thing about dynamic linking (and some do I realize) or having any dependencies at all. I mean, I hope that the users of all software realize the risks that they are taking when trusting some random C library or C++ compiler to provide foundational functionality to the applications that they compile. Reflections on trusting trust I guess.
Posted Feb 12, 2025 19:54 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link]
Posted Feb 12, 2025 21:39 UTC (Wed)
by excors (subscriber, #95769)
[Link] (7 responses)
I think the same logic arguably does apply in that case too, and explains why developers have kept using C for so long. There is certainly _some_ incentive to prevent bugs in their code - pride, reputation, contracts, etc - but the developers aren't the people whose hospital is disrupted by ransomware exploiting a code execution vulnerability in some packet parsing code. If they felt those consequences directly, they might have been more eager to adopt memory-safe languages decades ago.
Many are adopting Rust now, but I think that's not primarily because of memory safety: it's because Rust is generally a much nicer language to work in than C/C++, with a helpful type system and good performance and modern package management and IDEs and all the other stuff that developers enjoy. The memory safety is a compelling bonus, but I believe very few people would use Rust if it wasn't better at everything else too.
Regarding third-party dependencies, C/C++'s lack of good package management means developers face a significant cost to pulling in dependencies: you've got to download tarballs and fight with incompatible build systems and keep on top of API changes and do it all differently for Windows and it's a big pain, so you're likely to limit yourself to a few widely-used libraries that are too large and/or too tricky to write yourself (and are therefore probably maintained by a team). For anything not so large/tricky, it's easier (though still a pain) to rewrite it yourself inside your application, or just remove the feature from your application because it's too hard to support.
It's a happy accident that the cost to developers (caused by poor package management) correlates with the cost to users (from the wider supply-chain attack surface), so the incentives are aligned and it works out okay in practice. But Rust greatly reduces the cost to developers of adding a tiny dependency (or 300) from some random guy on GitHub, without reducing the risk to users from each dependency, so it creates an imbalance that needs to be addressed somehow. Hopefully cargo-vet helps by both increasing the cost to developers (not much, but they're at least aware of the lengthening code review backlog every time they add a dependency) and reducing the risk to users (when any code reviews are actually performed).
Posted Feb 12, 2025 22:19 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (4 responses)
Assuming that big projects adopt cargo vet, this allows smaller projects to "ride on their coattails" and shrink the exemptions list by trusting Google, Mozilla, or other big names to provide audits.
Posted Feb 12, 2025 22:48 UTC (Wed)
by koverstreet (✭ supporter ✭, #4296)
[Link] (3 responses)
Couple that with tools that show you "you have x dependencies that haven't been sufficiently vetted" (or have had significant changes since then), and we could efficiently farm out the auditing.
Posted Feb 13, 2025 0:11 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Feb 14, 2025 23:23 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 13, 2025 10:14 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Posted Feb 13, 2025 9:20 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (1 responses)
Posted Feb 14, 2025 6:19 UTC (Fri)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 14, 2025 6:56 UTC (Fri)
by hsivonen (subscriber, #91034)
[Link]
Of course, being packaged for Debian involves less auditing than people generally believe, but it's somewhat inconsistent to believe that C code becomes trustworthy by distro packaging and not believe the same of Rust code.
(Furthermore, dependencies don’t tend to be random crates from random authors but there’s a cluster of popular dependencies by people who have had community name recognition for a while.)
Posted Feb 12, 2025 17:14 UTC (Wed)
by garyguo (subscriber, #173367)
[Link] (8 responses)
We had some scripts that invoke `nproc` to determine the parallelism to use, and we wanted to move this into containers. Then we realized that in containers with resource restrictions, `nproc` from coreutils still reports the total number of cores, not the number of CPUs made available to it by the cgroup. It turns out that nproc hasn't received meaningful updates since its inception and obviously doesn't know what to do with cgroups.
The fix? We simply replaced coreutils nproc with uutils nproc in the container. uutils nproc simply calls into Rust std's available_parallelism, which is cgroup-aware.
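For reference, a minimal sketch of the standard-library call mentioned above; on Linux, available_parallelism() takes cgroup CPU quotas into account, which is why it behaves differently from the old nproc:

use std::thread;

fn main() {
    // Returns the parallelism the process can actually use; on Linux this
    // accounts for cgroup quotas. Fall back to 1 if it cannot be determined.
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("{n}");
}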
Posted Feb 12, 2025 17:27 UTC (Wed)
by intelfx (subscriber, #130118)
[Link] (7 responses)
You might have been using a very old coreutils. I've been relying on this behavior of coreutils' nproc for a few years already, and as far as I can tell, it very much exists:
Posted Feb 12, 2025 18:40 UTC (Wed)
by jengelh (guest, #33263)
[Link]
Posted Feb 12, 2025 19:09 UTC (Wed)
by garyguo (subscriber, #173367)
[Link] (4 responses)
No, coreutils 9.5 still has the same issue.
`AllowedCPUs` restricts what CPUs can be seen, so nproc reports what you expected. This is true even with very old nproc. The difference is `CPUQuota`, which is what gets used for k8s resource limits.
$ systemd-run --pipe -p CPUQuota=200% nproc
$ systemd-run --pipe -p CPUQuota=200% uutils-nproc
Posted Feb 12, 2025 19:13 UTC (Wed)
by intelfx (subscriber, #130118)
[Link] (3 responses)
Posted Feb 12, 2025 19:36 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
I'm skeptical of that. It seems like you should get preempted more often and lose some performance to context switching (as compared to the approach of dividing CPUQuota by 100% to get the optimal number of threads, so that each thread has enough quota to run for the maximum allowable timeslice without preemption).
Posted Feb 12, 2025 19:37 UTC (Wed)
by intelfx (subscriber, #130118)
[Link]
Posted Feb 12, 2025 19:53 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
If your workload is bursty - a lot of work for a small fraction of the period used by CPUQuota, then idle for the rest of the period - you might want a thread per core so that when work comes in, you spread across all cores, get it done, and go back to idle. If your workload saturates all possible threads - so it'll be throttled by CPUQuota - you usually want just enough threads to allow you to saturate your quota, and no more (e.g. 3 threads for a quota of 250%); doing so means that you benefit from the cache effects of holding onto a single CPU core for longer, rather than being throttled most of the time and hitting a cooler cache when you can run.
And if you're I/O bound, chances are good that you can't make good use of a large number of threads anyway, because you're spending all your time stuck waiting for I/O on one thread or on 128 threads. You might as well, in this situation, just use 2 threads and save the complexity of synchronising many threads.
I thus believe it's rare to want thread per CPU core when you have a CPUQuota control group limit.
Posted Feb 13, 2025 0:08 UTC (Thu)
by sionescu (subscriber, #59410)
[Link]
Posted Feb 12, 2025 18:00 UTC (Wed)
by jreiser (subscriber, #11027)
[Link] (6 responses)
Why? (Algorithm strategy? Parallelism of CPU? Of I/O?) In what environment? (The same GNU code works in 16-, 32-, and 64-bit environments. 32-bit MS Windows offers only a cramped 2GiB of user address space, while the input and data structures fit easily in a minimal 64-bit machine with 4 GiB of RAM.)
Posted Feb 12, 2025 19:29 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
6x is a pretty large improvement, so I expect that they also made some substantive changes to how it works, which is probably easier in Rust-with-dependencies than it would be in C-without-dependencies. For example, Rayon provides a parallelized sorting implementation for data that fits in memory[1], whereas that would probably take tens or even hundreds of lines of C to write from scratch. Sorting a full file probably also benefits from Rayon's high-level API (par_iter() is much easier to use than pthread_create()). I don't know if they actually used Rayon, or for that matter if the GNU people did parallel sorting in their sort(1), but it's just one example of the kind of thing that is much easier in Rust-with-dependencies than in C-without-dependencies.
[1]: https://docs.rs/rayon/latest/rayon/slice/trait.ParallelSl...
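To make that concrete, here is a minimal sketch of a parallel sort using Rayon's par_sort(); it is only an illustration of what a dependency buys, not a claim about how uutils or GNU sort are actually implemented:

use rayon::prelude::*;

fn main() {
    // Read all lines from stdin, sort them in parallel, and print them.
    let mut lines: Vec<String> = std::io::stdin()
        .lines()
        .collect::<Result<_, _>>()
        .expect("failed to read stdin");
    // par_sort() splits the work across a thread pool; the sequential
    // equivalent is just lines.sort().
    lines.par_sort();
    for line in &lines {
        println!("{line}");
    }
}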
Posted Feb 12, 2025 20:53 UTC (Wed)
by roc (subscriber, #30627)
[Link]
For this reason and others, the performance ceiling for single-threaded Rust code is potentially quite a bit higher than for similar C or C++ code. A lot more compiler work is needed though.
Posted Feb 25, 2025 20:10 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Case in point: https://trifectatech.org/blog/zlib-rs-is-faster-than-c/ - zlib port into Rust is outperforming C.
Posted Feb 12, 2025 19:49 UTC (Wed)
by excors (subscriber, #95769)
[Link] (2 responses)
With LANG=C.UTF-8, coreutils spends most of its time in strcoll_l, and it sorts by what I presume is some Unicode collation algorithm.
As far as I can see, uutils has no locale support. It aborts if the input is not valid UTF-8 ("sort: invalid utf-8 sequence of 1 bytes from index 0"). It simply sorts by byte values (equivalent to sorting by codepoint), regardless of LANG.
So in this case it's only faster because it doesn't implement Unicode collation.
Posted Feb 12, 2025 20:08 UTC (Wed)
by excors (subscriber, #95769)
[Link] (1 responses)
Posted Feb 18, 2025 3:10 UTC (Tue)
by ehiggs (subscriber, #90713)
[Link]
Regardless, it should do Unicode canonicalization or it will mis-sort depending on how different runes are composed. This is fine for diacritic-free languages like English, but as soon as you get some diacritics then LANG=C's naïve handling of text breaks, in my experience.
Posted Feb 12, 2025 19:32 UTC (Wed)
by jmalcolm (subscriber, #8876)
[Link] (56 responses)
The author states that GNU utils already have sufficient security but also points out that they continue to evolve. The fact that it is so easy to accidentally write insecure code in C is still an issue. Rust is not perfect but tends to result in more secure code by default. I am sure we have all seen the "70% of vulnerabilities stem from memory allocation" stories from Microsoft, Google, and others.
Writing code in C that takes advantage of multiple cores is also tricky and makes the security problem worse. Rust again makes this a lot easier and offers greater assurance that code that builds is also correct. For this reason alone, I would expect performance to be quite a bit better for many kinds of tools.
Pulling from so many crates certainly creates packaging challenges but it also distributes the innovation. Tools in Rust are going to benefit from improvements in the overall ecosystem. Tools in C tend to be more fully stand-alone and only improve when you improve them.
Cross-platform consistency also appears to be a strength of the Rust ecosystem. The article mentions that it is easier to implement cross-platform behaviour (even complex behaviour). It has been my experience that the behaviour itself is more consistent as well.
My biggest concern is that these kinds of low-level tools in Rust complicate the bootstrap problem. In order to build these tools from source, I need to have built Rust first which also means building LLVM. I also need to have built any of the crates that they depend on and of course other parts of the toolchain like Cargo. And of course Rust and LLVM are not even available on some of the systems that you might want utilities like this to be available for. GCC certainly wins there. That said, if GCC gets its own Rust front-end, this problem will go away. Perhaps this addresses most of my point here actually as, if you can build GCC, I guess you can build the Rust front-end too.
Not that the bootstrap situation is perfect in C GNU land. The author of Chimera Linux explains his choice of using the BSD userland in his distro instead of GNU largely because of the chicken-and-egg nature of things like the GNU coreutils and util-linux.
One thing that was not touched on is executable size. I wonder how the Rust and C versions compare.
Posted Feb 13, 2025 8:22 UTC (Thu)
by joib (subscriber, #8541)
[Link] (55 responses)
There is some overhead due to the static linking Rust does (for Rust dependencies), but OTOH uutils uses (by default, or only optionally, not sure?) the "busybox trick" of generating only one binary, and all the /usr/bin/foo utilities are hardlinks to that binary, and the utility checks argv[0] to determine which functionality to invoke.
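A minimal sketch of the argv[0] dispatch described above (illustrative only, not the actual uutils code; the applet names are just examples):

use std::env;
use std::path::Path;
use std::process::ExitCode;

fn main() -> ExitCode {
    // argv[0] is the name the binary was invoked as; with hard links named
    // "true", "false", etc. all pointing at one binary, it selects the applet.
    let argv0 = env::args().next().unwrap_or_default();
    let applet = Path::new(&argv0)
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("")
        .to_string();
    match applet.as_str() {
        "true" => ExitCode::SUCCESS,
        "false" => ExitCode::FAILURE,
        other => {
            eprintln!("unknown applet: {other}");
            ExitCode::FAILURE
        }
    }
}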
Posted Feb 21, 2025 9:40 UTC (Fri)
by ras (subscriber, #33059)
[Link] (54 responses)
On my debian laptop, which doesn't have a lot of things installed, there are over 2,200 programs dynamically linked to libc.
Rust's monomorphization has lots of benefits, but it would be nice if you could draw a line and say "not here, at this boundary a conventional linker is all you need". It wouldn't just make dynamic linking easier, it would speed up compile times. It's not as if the language doesn't support it in principle with things like dyn, but in practice the language positively begs you to cross the boundary at every opportunity.
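For readers unfamiliar with the trade-off being discussed, a small sketch of the two options (the function names are illustrative): the generic version is monomorphized per caller, while the dyn version is compiled once and pays for dynamic dispatch at the boundary instead:

// Monomorphized: a separate copy is generated for every iterator type.
fn total_generic<I: Iterator<Item = u64>>(items: I) -> u64 {
    items.sum()
}

// Compiled once: callers pass a trait object and pay for dynamic dispatch.
fn total_dyn(items: &mut dyn Iterator<Item = u64>) -> u64 {
    items.sum()
}

fn main() {
    let v = vec![1u64, 2, 3];
    println!("{}", total_generic(v.iter().copied()));
    println!("{}", total_dyn(&mut v.iter().copied()));
}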
Posted Feb 21, 2025 10:45 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (53 responses)
The issue right now is that the psABIs aren't that great for Rust and Rust-like languages; the crABI experiment is aiming to come up with a stable ABI that is suitable for expressing the things that you lose by saying "not here", such as Vec<u32>, since Vec is monomorphized over its type argument.
Posted Feb 21, 2025 11:33 UTC (Fri)
by ras (subscriber, #33059)
[Link] (52 responses)
That's a good illustration. You want the benefits of the type checking that monomorphization provides, but without it affecting the code generated. You can write that sort of code in Rust, but as Vec<u32> demonstrates it would be near impossible to write it without the compiler telling you "hey, you can't do that here", because type checking is usually bundled with implied monomorphization.
Posted Feb 21, 2025 12:11 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (51 responses)
crABI's role is to specify layouts such that knowing that Vec is two usize and either a thin or fat pointer in size depending on T means that you know how Vec<T> is laid out in memory for any known T, at the expense of preventing the compiler from applying optimizations that come from different layout.
For Vec, crABI is unlikely to give up much performance; it's just not big enough that completely removing 2 or 3 fields would make a difference. But it's possible that for other data types, this optimization will matter; hence crABI will be opt-in for quite a long time (although I can see it becoming the default for all items exported from a crate, eventually).
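As a very rough illustration of what "agreeing on a layout at the boundary" means, here is what one can already write by hand today with #[repr(C)]; the type and function names are made up, and crABI itself is still an experiment, so this is not its actual syntax or semantics:

use std::mem::ManuallyDrop;

// A hand-written, fixed layout for the parts of a Vec<u32>: a pointer plus
// two usize fields whose order and size are pinned by #[repr(C)], so
// separately compiled code can agree on them.
#[repr(C)]
pub struct RawVecU32 {
    pub ptr: *mut u32,
    pub len: usize,
    pub cap: usize,
}

pub fn into_raw(v: Vec<u32>) -> RawVecU32 {
    let mut v = ManuallyDrop::new(v); // keep the allocation alive
    RawVecU32 { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
}

// Safety: `r` must have been produced by into_raw() and not used since.
pub unsafe fn from_raw(r: RawVecU32) -> Vec<u32> {
    unsafe { Vec::from_raw_parts(r.ptr, r.len, r.cap) }
}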
Posted Feb 21, 2025 22:43 UTC (Fri)
by ras (subscriber, #33059)
[Link] (50 responses)
I'm not an expert, but I'm guessing there are 3 things Vec has to know. Yes the size of an element and the capacity are two. The current length doesn't count because it's not encoded in the type. Neither of those two are particularly problematic.
The 3rd is one you didn't mention: drop. That is more difficult. C++ would use a vtable, which alters the binary format of the element. The alternative solution is to embed a pointer to the item's drop in the Vec along with the capacity and element length. I don't know Rust well enough to know if it can do that. Well, I'm sure it can with some helper macros, but that's an admission the language doesn't support it well.
> For Vec, crABI is unlikely to give up much performance
Actually, the big speed gains from monomorphization come from small types. Consider Vec.get() for Vec<u32>. If get() doesn't know the length it's probably going to be implemented using memcpy(), but memcpy() is orders of magnitude slower than "mov %rax,nnn(%rbp,%rdx)". But if the u32 was a much bigger type the overhead imposed by implementing get() as a function that uses memcpy() rather than inline becomes tolerable.
That hints at the solution you see in C and C++: you need a mixture of monomorphization and shared libraries. C and C++ can do that with inline. For that to work you need separate .h, .c and .o (compiler output). Or perhaps the required information gets folded into the .o. In any case, the compiler gets two sorts of information from the library: the compiled code which can potentially be put in a dynamic library, and the parts the library author thinks need to be monomorphized for speed, like Vec.get().
Rust has effectively dropped that separation. No one uses binary libraries because it's too hard. The result is everything is monomorphized. But in reality, in a large library the bits that need to be monomorphized for speed are a very small percentage of the code. Given that most Rust code a typical Rust program uses comes from these large libraries (interestingly the kernel will be a notable exception), this blows up the compile time of most small programs enormously. It also thwarts the current Linux distributions' practice of fixing security holes in libraries by just shipping a new version of the .so. Now they have to ship every program that depends on it. As I said, on my laptop that means every time libc.so has a bug, I'd have to reinstall over 2,000 programs.
I consider this a major wart in the language. But as I said, it likely won't affect the kernel.
Posted Feb 22, 2025 10:16 UTC (Sat)
by farnz (subscriber, #17727)
[Link] (49 responses)
But note that the current length, size of a pointer to an element (since Vec does not store elements directly, but rather a pointer to a contiguous list of elements), and capacity are all part of Vec itself, and right now in Rust, the memory layout of the 3 or 4 items that make up a Vec (capacity, length, pointer to contiguous array of T, optional pointer to additional data for T like T's vtable) is arbitrary for each instantiation of Vec in each compilation unit.
You appear to be aiming for something slightly different to the goals of crABI and other stable Rust ABI efforts; you're talking about removing monomorphization completely, so that the compile unit for Vec can output code for all of Vec's functions, and rely on runtime polymorphic behaviour to get the correct implementation, because that's what C++ does. Instead, though, the Rust stable ABI efforts (including crABI) try to come up with ways to guarantee that the visible behaviour of two separate instantiations of Vec::<T>::push have the same visible ABI, and thus it becomes possible to rely on just choosing one instantiation of each. Then, for functions that are externally polymorphic, you'd use the trick used by File::open today, to give yourself a tiny generic shim (correct by inspection) that calls into a much larger chunk of monomorphic code.
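A minimal sketch of the File::open-style pattern mentioned above (the function name here is made up): only the thin outer shim is monomorphized per caller type, while the real work is compiled once:

use std::path::Path;

// Only this small shim is instantiated for each caller's path type.
pub fn read_config<P: AsRef<Path>>(path: P) -> std::io::Result<String> {
    // The bulk of the work lives in a single, non-generic function.
    fn inner(path: &Path) -> std::io::Result<String> {
        std::fs::read_to_string(path)
    }
    inner(path.as_ref())
}

fn main() -> std::io::Result<()> {
    // Callers can pass &str, String, PathBuf, and so on.
    let text = read_config("Cargo.toml")?;
    println!("{} bytes", text.len());
    Ok(())
}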
Posted Feb 22, 2025 18:30 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
In the C++ world, this is called PIMPL (pointer to implementation).
Posted Feb 23, 2025 1:47 UTC (Sun)
by ras (subscriber, #33059)
[Link] (47 responses)
As I understand it, Vec.drop() has to call the inner type's drop(). There are only two ways it can know what to call. If you don't know at compile time, the only option is to put a function pointer to the drop() function somewhere. If all you have is a traditional .so you're linking to, this is your only option. OO puts the pointer in the inner type. C++ does this via a vtable stored in each instance of the inner type. Alternatively you could store a pointer to T.drop() in each Vec<T> instance, which you can do because the code that instantiated the Vec<T> object knows T's concrete type.
If you know the type of T at compile time, monomorphization can create a special version of Drop<T> for Vec<T>. You need the source of Vec<T>.drop() to do that, but there are many ways of skinning that cat.
> you're talking about removing monomorphization completely,
No, I'm not. I'm saying there are various ways of doing it. Rust does it by recompiling everything, all the time, and as a consequence doesn't support pre-compiled libraries. That's an extreme. Both C and C++ allow programmer complete control over what is monomorphized, and what isn't.
Backtracking for a second, the requirement to avoid a function pointer is that the compiler have access to the source for Vec<T>.drop(). That's all you need. Rust achieves that by giving the compiler access to the entire source for Vec and recompiling it every time for every Vec<T>, but as I said that's extreme. C and C++ achieve it by distributing two things: the compiled output (.a or .so), and the .h. The .h contains the source code for Vec<T>.drop() using inline, or templates or whatever. This is not "removing monomorphization completely". This is giving control of what is monomorphized and what is not to the programmer.
Rust effectively does not give you that choice; it monomorphizes all the time. That is the root cause of its compile-time issues, and of its inability to support dynamic linking.
Posted Feb 23, 2025 12:55 UTC (Sun)
by farnz (subscriber, #17727)
[Link] (46 responses)
And once you start using C++20 modules, the distinction you're making between compiler output and types of compiler input goes away, too - the compiler does basically the same thing as Rust's compiler does.
Unlike C++, C does completely avoid this, by having no language-level support for generics. But if you use Rust without generics, you also don't get any monomorphization - monomorphization in Rust only takes place when you (a) have a type parameter, and (b) the code's behaviour changes when the type parameter's value changes. This is the same as C++, as far as I can see, bar the fact that C++ makes you do more manual work to use monomorphization correctly, so people are more likely to hand-write a buggy reimplementation of a generic than to share source.
Posted Feb 24, 2025 8:06 UTC (Mon)
by ras (subscriber, #33059)
[Link] (45 responses)
It is the same. What is not the same is that C++ (and C) splits the source into .h and .cpp; when you compile against a cpp library you recompile the .h and link with the pre-compiled .cpp files. It's so simple everyone does it that way, and it yields the two benefits I mentioned - fast compile times, and the ability to fix a library by shipping a single shared library rather than all the binaries that depend on it.
In Rust, if doing that is possible it must be very hard, because no one does it that way.
Posted Feb 24, 2025 10:35 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (42 responses)
C is different precisely because it doesn't have compile-time generics, but instead requires cut-and-paste programming, with associated bugs where you notice an issue with (e.g.) StringVector, but don't fix it for your DoubleVector as well. If you do make use of the macro system to get generics, then you have the same problem.
And C++ build times aren't fast compared to Rust; that's not a benefit of the C++ model in my experience, where it's often slower than Rust at development builds (since Rust does a good job of incremental compilation, splitting the files into codegen units in ways a human wouldn't), and no faster at release builds (where LLVM's optimizer dominates in both cases).
Posted Feb 24, 2025 13:05 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (41 responses)
While they don't *require* that, any PIMPL-like mechanisms or desire to hide what modules are used to implement some functionality will still end up with splitting sources between interface and implementation. Because module interfaces need to be provided to consumers, any desire to keep things from them will require using separate implementation units. Compilers may be able to *help* with this by only updating BMI files as needed, but this will require behavior like ninja's `restat = 1` which (AFAIK) `make` completely lacks to not recompile consumers anyways.
Posted Feb 24, 2025 13:29 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (40 responses)
Posted Feb 24, 2025 14:26 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link]
Note that Windows is still only exposing a C ABI, so the fact that it is Rust, C, C++, or Fortran behind the scenes isn't really visible to consumers.
Posted Feb 24, 2025 17:03 UTC (Mon)
by viro (subscriber, #7872)
[Link] (38 responses)
Posted Feb 24, 2025 17:06 UTC (Mon)
by viro (subscriber, #7872)
[Link]
Posted Feb 24, 2025 17:22 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (3 responses)
This is distinct from splitting the implementation up inside a single C++ module; having multiple module units, one for the exported interface and many for the internal implementation, makes a lot of sense, but having two separate C++ modules, one of which exports an unimplemented interface, and the other of which exports an implementation of that interface, is a mess, since it means I have to make sure that the two separate modules are kept in sync manually.
Why create that extra workload when I can have a single module with an internal module partition such that it's very obvious when I change the interface without changing the implementation to match, and where I have one thing to release instead of two?
Posted Feb 25, 2025 0:00 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
If the partition is imported into the interface for any reason, it must be shipped as well.
Posted Feb 25, 2025 10:01 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (1 responses)
Remember that the goal here is one module, nicely structured for ease of maintenance, and thus split across multiple module units, with an internal module partition to make the stuff that's for internal use only invisible from outside the module, rather than multiple modules.
Posted Feb 25, 2025 11:57 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 24, 2025 17:24 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
That's not what he said. He said "extra work". Which in reality usually means "push the work down the road until I get to it". Which also often in reality means "I'll never get to it".
It wouldn't get done in commercial circles either, if secrecy didn't have a (at least nominal) value.
Time pressure usually turns out to be an extremely important consideration.
Cheers,
Posted Feb 24, 2025 17:32 UTC (Mon)
by mb (subscriber, #50428)
[Link] (27 responses)
All Rust installs ship a tool that extracts the public interface of your crate and puts it into a nice html document for review: cargo doc.
This is much better than manually typing in the redundant code for the public interface declarations.
Posted Feb 25, 2025 1:27 UTC (Tue)
by ras (subscriber, #33059)
[Link] (26 responses)
I had said earlier I wanted the ability to say to the compiler "not here, at this boundary conventional linker is all you need". I also said C++ gives you the ability to do that, by splitting stuff into .h and .cpp. @farnz said "but that's the old way, Boost for example doesn't do that". That's true, but the point is that the people who wrote Boost made the decision to adopt the way Rust does it; C++'s std made a different decision. @taladar said C++ compiles are slow. I'm guessing that's because the packages / libraries he is working with adopt this newfangled way, and everything gets recompiled all the time. He's blaming C++ for that, but I'd argue the fault lies at least as much with the package authors for making that choice.
@farnz then said "oh, but it's hard to think about what has to be monomorphized and what isn't, and besides, redeclaring everything in .h is verbose and a lot of work". I don't have much sympathy for the first part - I did it all the time when I wrote C++. The second is true: the information in .h is redundant, and a modern language shouldn't make you type the same thing twice without good reason.
Those language differences were swirling around in my head when I wrote: "Or perhaps the required information gets folded into the .o". It was a thought bubble, but your "cargo doc" comment illustrates its key point nicely. Rust could add something to the language that says "this source is to be exported (made available) to people who want to link against my pre-compiled library", in the same way "cargo doc" exports stuff. That information would be roughly equivalent to what's put in a .h file now. But where would you put it? The thought bubble was to place it in a section of the ELF object that holds the compiled code - call it, say, a ".h" section. Then when someone wants to compile against your library, they give that .o / .so / .a to both the compile phase (which looks for the equivalent of the .h sections) and the link phase (which just wants the compiled code for the non-monomorphized stuff, which - if the programmer has done their job - should be the bulk of it).
The ultimate goal is to allow the programmer to decide what needs to be monomorphized and what can be pre-compiled, and to have Rust tell the programmer when they've mucked that boundary up. I guess it would get an error message like: "This type / function / macro has to be exported to the .h, because it depends on the type T the caller is passing in". Right now Rust programmers don't have that option, and that leads to the trade-offs I mentioned.
Posted Feb 25, 2025 3:50 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (24 responses)
Posted Feb 25, 2025 8:06 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (1 responses)
But if you had stuff that was specifically meant to be a library, why can't you declare "I want to monomorphise these Vec<T>s". Any others are assumed to be internal and might generate a warning to that effect, but they're not exported.
And then you add rules about how the external interface is laid out, so any Rust compiler is guaranteed to create the same export interface. Again, if the programmer wants to lay it out differently, easy enough, they can declare an over-ride.
And then lastly, the .o or whatever the Rust equivalent is, contains these declarations to enable a compiler of another program to pull them in and create the correct linkings.
Okay, it's more work having to declare your interface, but I guess you could pull the same soname tricks as C - extending your interface and exports is okay, but changing it triggers a soname bump.
Cheers,
Posted Feb 25, 2025 10:43 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
There are, currently, two reasons for Rust not to have a stable ABI, both being worked on by experts in the field (often overlapping with the people solving this problem for Swift and for C++ modules):
There is, however, serious work going into a #[export] style of ABI marker that allows you to mark the bits (or an entire crate) as intended to have a stable ABI, and errors if the compiler can't support that. This will, inevitably, be a restricted subset of the full capabilities of Rust (since macros, generics, and other forms of compile-time code creation can't be supported in an exported ABI), but it's being actively thought about as a research project with a goal of allowing as much code as possible to be dynamically linked while not sacrificing any of the safety promises that Rust makes today using static linking.
Posted Feb 25, 2025 21:29 UTC (Tue)
by ras (subscriber, #33059)
[Link] (21 responses)
I don't get the problem. libc.so.X.Y already handles versioning pretty well.
Putting the .h's in the .elf does solve one problem that bites me on occasion - the .h's don't match the .so I'm linking against. It would be nice to see that nit disappear.
Posted Feb 26, 2025 9:19 UTC (Wed)
by taladar (subscriber, #68407)
[Link] (13 responses)
Posted Feb 26, 2025 11:36 UTC (Wed)
by ras (subscriber, #33059)
[Link] (12 responses)
I expect it would be the same story as C or C++. It has the same traps - don't expect an inline function (or template, in C++'s case) in a .h to be affected by distributing a new .so. Despite that limitation, shipping updated .so's to fix security problems happens all the time. The rule is always that new stuff can be added, but existing stuff can't be changed. It would be the same deal with Rust, but would cover the ".h" section too, meaning you can add new exported types or monomorphized functions, but not change existing ones.
Putting the .h section in the .so brings one advantage. There is no way for a C program to know whether the .h it was compiled against matches the one the .so was compiled against. But a Rust program compiled against a .so could check that the types in the .h section match the ones it was compiled with, and reject the .so if they don't.
Posted Feb 26, 2025 12:26 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (11 responses)
For example, if you go deep into how Vec::shrink_to_fit is implemented internally, you find that you have a set of tiny inline functions that guarantee that an operation is safe that leads down to a monomorphic unsafe shrink_unchecked function that actually does the shrinking.
Because these are all shipped together, it's OK to rearrange where the various checks live; it would be acceptable to move a check out of shrink_unchecked into its callers, for example. But, in the example you describe, you've separated the callers (which are inlined into your binary) from the main body of code (in the shared object), and now we have a problem with updating the shared object; if you move a check from the main body into the callers, you now must know somehow that the callers are out-of-date and need recompiling before you can update the shared object safely.
C and C++ implementations handle this by saying that you must just know that your change (and a security fix is a change to existing stuff, breaking your rule that "new stuff can be added, but existing stuff can't be changed") is one that needs a recompile of dependents, and it's on you to get this right else you face UB for your mistakes. Rust is trying to build a world where you only face UB if you explicitly indicate to the compiler that you know that UB's a risk here, not one where a "trivial" cp new/libfoo.so.1 /usr/lib/libfoo.so.1 can create UB.
Posted Feb 26, 2025 13:13 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (10 responses)
But if you've explicitly declared an interface, surely that means rearranging the checks across the interface is unsafe in and of itself, so the compiler won't do it ...
Cheers,
Posted Feb 26, 2025 14:15 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (9 responses)
The compiler could stop you changing shrink_to_fit quite easily, because it's an external interface, but it uses a RawVec<T, A> as an implementation detail, which uses a heavily unsafe RawVecInner<A> as a monomorphic implementation detail. The current implementation of Vec::shrink_to_fit checks whether the capacity is greater than the length, and if it is, calls the inline function RawVec::shrink_to_fit(self.buf, length). In turn, RawVec::shrink_to_fit simply calls the inline function RawVecInner::shrink_to_fit(self.inner, cap, T::LAYOUT) (which is a manual monomorphization so that RawVecInner is only generic over the allocator chosen, not the type in the vector). Following that, RawVecInner::shrink_to_fit arranges to panic if it can't shrink, and calls the inline function RawVecInner::shrink(&mut self, cap, layout). This then panics if you're trying to grow via a call to shrink, then calls the unsafe function RawVecInner::shrink_unchecked.
There's a lot of layers of inline function here, each doing one thing well and calling the next layer. But it would not be unreasonable to change things so that RawVecInner::shrink_unchecked does the capacity check that's currently in RawVecInner::shrink, and then have a later release move the capacity check back to RawVecInner::shrink; the reason they're split the way they are today is that LLVM's optimizer is capable of collapsing all of the checks in the inline functions into a single check-and-branch, but not of optimizing RawVecInner::shrink_unchecked on the assumption that the check will pass, and doing all of this means that LLVM correctly optimizes all the inline functions down to a single check-and-branch-to-cold-path, followed by the happy path code if all checks pass.
And note that the reason that this is split into so many tiny inline functions is that there's other callers in Vec that call different sequences of inline functions - rather than duplicate checks, they've been split into other functions so that you can call at the "right" point after your function-specific checks.
But, going back to the "compiler shouldn't do it"; why should it know that moving a check in one direction inside RawVecInner (which is an implementation detail) is not OK, but moving it in the other direction is OK? For this particular call chain, only RawVecInner::shrink_unchecked is going to be in the shared object, because the remaining layers (which are critical to the safety of this specific operation) are inlined.
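A much-simplified sketch of the shape being described (illustrative only; the real RawVec/RawVecInner code in std is considerably more involved):

pub struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    // Safe, #[inline] entry point: validates, then calls the core.
    #[inline]
    pub fn shrink_to(&mut self, new_len: usize) {
        assert!(new_len <= self.data.len(), "cannot grow via shrink_to");
        // SAFETY: the check above establishes the core's precondition.
        unsafe { self.shrink_unchecked(new_len) }
    }

    // The monomorphic core that would live in the shared object. It
    // assumes new_len <= len; whether that check lives here or in the
    // inlined caller is exactly the kind of change that is harmless when
    // everything is compiled together, but dangerous once the caller is
    // inlined into one binary and the core ships in another.
    unsafe fn shrink_unchecked(&mut self, new_len: usize) {
        unsafe { self.data.set_len(new_len) };
    }
}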
Posted Feb 26, 2025 15:45 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (8 responses)
Hmmm ...
That is an edge case, but equally, you do want the compiler to catch it, and I can see why it wouldn't ... but if you're building a library, I find it hard to see why you, the programmer, would want to do it - surely you'd either have both sides of the interface in a single crate, or you're explicitly moving stuff between a library and an application ... not good ...
Cheers,
Posted Feb 26, 2025 16:32 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (4 responses)
No; I'm saying that if the compiler doesn't even know that this is an interface boundary, why would it bother detecting that you've moved code across the boundary in a fashion that's safe when statically linked, but not when dynamically linked?
Put concretely, in the private module raw_vec.rs (none of which is exposed as an interface boundary), I move a check from shrink_unchecked to shrink; how is the compiler supposed to know that this is not a safe movement to make, given that shrink is the only caller of shrink_unchecked? Further, how is it supposed to know that moving a check from shrink to shrink_unchecked is safe? And, just to make it lovely and hard, how is it supposed to distinguish "this check is safe to move freely" from "this check must not move"?
And note that "checks" and "security fixes" look exactly the same to the compiler; some code has changed. How is the compiler supposed to distinguish a "good" change from a "bad" change?
Posted Feb 26, 2025 21:43 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (3 responses)
Because if the whole aim of this is to create a dynamic library, the compiler NEEDS to know this is an interface boundary, no?
Cheers,
Posted Feb 27, 2025 10:44 UTC (Thu)
by farnz (subscriber, #17727)
[Link] (2 responses)
Note that when making this judgement call, it can't just look at things like "is this moving a check across an internal boundary", since some moves across an internal boundary are safe, nor can you condition it on removing a check from inside the boundary (since I may remove an internal check that is guaranteed to be true since all the inline functions that can call this have always done an equivalent check, and I'm no longer expecting more inline functions without the check).
Posted Feb 27, 2025 14:07 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (1 responses)
If an external application cannot see the boundary, then it's not a boundary! So you'd need to include the definition of all the Ts in Vec<T> you wanted to export, but the idea is that the crate presents a frozen interface to the outside world, and what goes on inside the crate is none of the caller's business. So internal boundaries aren't boundaries.
Cheers,
Posted Feb 27, 2025 14:12 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
To get the sort of boundary you're describing, we do static linking and carefully hand-crafted interfaces for plugins. That's the state of play today, for everything from assembly through C to Agda and Idris; the goal, however, is to dynamically link, which means that we need to go deeper. And then we have a problem, because the moment you go deeper, your boundaries stop applying, thanks to inlining.
Posted Feb 26, 2025 18:04 UTC (Wed)
by excors (subscriber, #95769)
[Link] (2 responses)
`RawVecInner<A>::shrink_to_fit` could be an ABI boundary, because that doesn't depend on `T` (and we'll ignore `A`), but it's currently not an API boundary. It can't be made into a public API because its safety depends on non-trivial preconditions (like being told the correct alignment of `T`) and that'd be terrible API design - preconditions should be as tightly scoped as possible, within a function or module or crate. So you'd have to invent a new category of interface boundary, which is both an internal API and an ABI, with stability guarantees (including for the non-machine-checkable safety preconditions) and with tooling to help you fulfil those guarantees, which sounds really hard.
Posted Feb 26, 2025 22:46 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
Like putting the equivalent of a C .h in the crate?
But I would have thought if the compiler can prove the preconditions as part of a monolithic compilation, surely it must be able to encode them in some sort of .h interface in a library crate?
Of course, if you get two libraries calling each other, then the compiler might have to inject glue code to rearrange the structures passed between the two :-)
Cheers,
Posted Feb 27, 2025 12:43 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
The challenge is that we're talking about separating the unsafe block (in an inline function) from the unsafe fn it calls (in the shared object); this means that the human not only has to consider the unsafe code as it stands today, but all possible future and past variants on the unsafe code, otherwise Rust's safety promise is not upheld.
That's clearly an intractable problem; the question is about reducing it down to a tractable problem. There are three basic routes to make it tractable:
There's room to be sophisticated with symbol versioning in all cases; for example, you can have a human assert that this version of the unsafe fn is compatible with the inlined callers from older versions (thus allowing a swap of a shared object), or in case 3 you can use it to allow new inlined callers to use a new shared object, while allowing the existing ones to use either old or new shared objects.
In all cases, though, the trouble is preventing the human proofs of correctness being invalidated by creating new combinations of inline functions and out-of-line unsafe code that weren't present in any source version; you want the combinations to be ones that a human has approved.
Posted Feb 26, 2025 10:53 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
What C has that Rust doesn't is that it's fairly trivial to take a C library, build it into a .so, and have it work as long as upstream doesn't make a silently breaking change (which can result in UB, rather than a failure); it's also fairly simple to patch the build system so that the .so is versioned downstream of the library authors, so that they are ignorant of the use as a shared library. This is being worked on for Rust, but the goal in Rust is to ensure that any breaking changes upstream result in a failure to dynamically link, rather than a risk of UB.
Posted Feb 26, 2025 21:14 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
This is fine for static linking, but you don't generally want to end up with 15 versions of the same shared library with a slightly different patch version. So you ideally should be able to control the versions so that distro-provided libraries are used as much as possible, overriding Cargo's resolution mechanism. Ideally, making sure that you get CVE fixes.
Doing it properly is not trivial.
Posted Feb 26, 2025 21:19 UTC (Wed)
by mb (subscriber, #50428)
[Link] (4 responses)
Dependency locks are ignored.
>slightly different versions of libraries
No. It can include several *incompatible* versions of the libraries with a different major semantic version.
Posted Feb 26, 2025 21:25 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Yes, that's what I mean. One app can lock somelibrary#1.1.123, and another one at somelibrary#1.1.124. If this is packaged naïvely, you'll end up with two shared objects for `somelibrary`.
Posted Feb 27, 2025 5:27 UTC (Thu)
by mb (subscriber, #50428)
[Link] (2 responses)
As 1.1.124 and 1.1.123 are semantically compatible versions you can just upgrade all packages to 1.1.124. And it's also likely that it would work with 1.1.123, too.
This is really not different at all from C library dependencies with backward compatible versions.
Posted Feb 27, 2025 18:41 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
And yep, it's strictly better than C. It's just not a trivial task...
Posted Feb 27, 2025 19:11 UTC (Thu)
by mb (subscriber, #50428)
[Link]
There's no need to use a lock file or to use an online crates forge.
I don't really get it why this would be a nontrivial task.
>I dislike just ignoring the
Well, you can either use it or ignore it.
Posted Feb 25, 2025 6:48 UTC (Tue)
by mb (subscriber, #50428)
[Link]
Put it into a my-public-interface crate and generate the docs for it? I don't see the problem.
Posted Feb 24, 2025 17:39 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (3 responses)
Posted Feb 25, 2025 11:16 UTC (Tue)
by taladar (subscriber, #68407)
[Link] (2 responses)
Posted Feb 25, 2025 11:24 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
Posted Feb 25, 2025 21:51 UTC (Tue)
by ras (subscriber, #33059)
[Link]
Or "it's faster because we can optimise".
My favourite counter-example to this is LVM / DM vs ZFS, possibly because I'm a recent user of ZFS. LVM / DM / a traditional file system give you a similar outcome to ZFS, albeit with a more clunky interface because "some assembly is required". The zfs CLI is nice. However, by every other metric I can think of, the LVM / DM / ... stack wins. The stack is faster, the modular code is much easier to understand, it has fewer bugs (I'm tempted to say far fewer), and you have more ways of doing the same thing.
This is surprising to me. I would have predicted the monolithic style to win on speed at least, and to be easier to extend (which is evidence against "it's easier to develop that way").
I guess there is a size at which the code base becomes too much for one person. At that point it should become modular, with each module maintained by different people, sporting an interface that requires screaming and yelling to change. But by that time it's already a ball of mud, and I guess the tooling is a convenient thing to blame for not doing the work required to split it up.
Posted Feb 24, 2025 11:27 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
Posted Feb 24, 2025 11:55 UTC (Mon)
by excors (subscriber, #95769)
[Link]
Not everyone - even disregarding the cases where templates force you to put all your code in the .h file, there are plenty of libraries that choose to put all their code in the .h file so users don't have to fight with C/C++ build systems or package managers. You just download the code and #include it and it works.
E.g. https://github.com/nothings/stb explains "The idea behind single-header file libraries is that they're easy to distribute and deploy because all the code is contained in a single file" and "[these libraries] are only better [than other open source libraries] in that they're easier to integrate, easier to use, and easier to release", and it's pretty popular as a result of that. There's a big list of header-only libraries at https://github.com/p-ranav/awesome-hpp . It's a good way to make your library more attractive to developers, because package management in the C/C++ world is so bad.
Posted Feb 12, 2025 20:41 UTC (Wed)
by da4089 (subscriber, #1195)
[Link] (15 responses)
As stated, these 55 year old tools are remarkably bug free, have very few dependencies, and continue to work as designed. Their maintenance burden is not high.
But instead of putting effort into discovering new, better approaches to tools (and operating systems), we spend the innovative potential of a generation getting back to where we started, even losing some features in the process.
We did this already through the 90’s and 00’s to extract our toolset from corporate stasis and sharding. Is doing it again to replace C with Rust really the best use of our effort’s potential?
I don’t mean this as a personal attack, nor as any a criticism of the quality and dedication involved. But it feels to me like an inward turning of the collective vision that quite dismays me.
Posted Feb 12, 2025 21:04 UTC (Wed)
by kleptog (subscriber, #1183)
[Link] (4 responses)
They're replicating the results of those 55 years of development in a few years, so what "innovative potential" is being lost here? Now you have a sound base to continue building on, except every year of effort put into the Rust version is equivalent to multiple years of effort on the C version.
I don't understand this idea that these tools should have minimal dependencies. Why should "sort" implement its own sorting algorithm instead of using the battle-tested UltraSpeedParallelSuperSorting crate that is used by everyone else because it sorts blazingly fast?
We can build great programs because we are standing on the shoulders of powerful libraries. Re-use is almost always better than writing it yourself.
Let's face it: C programmers aim for minimal dependencies because dependencies in C programs are really difficult. In more modern languages, dependencies are one line in your package configuration and you're done.
Posted Feb 12, 2025 23:56 UTC (Wed)
by dvdeug (guest, #10998)
[Link] (3 responses)
But they're not, for many reasons.
Many programmers have used a cool new library and discovered that it was no longer supported after a few years and wouldn't build with newer versions of other dependencies, and had to replace it. One can recall the left-pad catastrophe, where one author withdrew a 17-line package from npm and thousands of programs stopped building. One can also remember the endless chase of libgtk versions.
You lose control over supported platforms. E.g. the Python cryptography package started using Rust, and now no longer works on many of the platforms it used to. Coreutils compiles on HP-UX 11 on hppa and ia64. Even if Rust supported those, you're still depending on every dependency you use supporting them. Good luck with your x86 box when your dependency pivots to using Oxide, the programming language that's going to replace Rust.
The bug envelope changes. If sort implements its own sorting algorithm, then if it breaks, you look at the sort code, if necessary checking that it works on the same libc and gcc version. If it uses the UltraSpeedParallelSuperSorting crate, then you have to look at that crate, and any crates it uses. Coreutils supports a variety of C libraries and C compilers (even with the restriction to C99); it's easy to prove that any new bugs are due to your changes. You can say that this crate is used by "everyone else", but you'll end up using a crate that does what you need but not everyone else needs, or using a crate in a way that other people aren't, triggering bugs that nobody else is meeting.
And that's just the benign bugs. The Jia Tan attack was made possible by the fact that liblzma (from xz) was linked into sshd. To make it worse, it wasn't even code that ssh used; it was unused code pulled in by another dependency to support systemd. Every dependency adds security risk, and when dependencies are built in this fashion themselves, they bring in more security risks with their dependencies.
If you're writing a KDE or GNOME program, both come with an array of libraries you can solidly depend on, no problem. A major frontend program can pull in a key dependency without issue; most people get Gimp or Darktable through their distro or Flatpak anyway. (But note that this idea that dependencies have no cost amplifies the problem. Installing a C library that depends on nothing, or on a few standard dependencies, is low cost. Installing a dependency that depends on a host of other dependencies amplifies most of the problems above.)
But coreutils? Programs that are frequently run as root, are called from shell scripts all over the system, and thus are security critical? Programs that are critical to running on every system? Be it Rust or C, they should carefully consider every dependency.
Posted Feb 13, 2025 9:41 UTC (Thu)
by taladar (subscriber, #68407)
[Link] (1 responses)
Your own internal version is very likely to be orders of magnitude less tested, optimized, maintained,... than even a moderately used third party dependency. It is also less likely to see the same scrutiny from security researchers or the same amount of tracking from security tracking systems like CVE or GHSA.
The number of programmers who know how it works will be lower. When it isn't touched for a while, nobody (including the original author) will know how it works or how to change it. You will have more trouble finding people to add cross-platform support, because they will need to use exactly your program, not just any one of the programs using it.
Also, what you say about the bug envelope doesn't work at all. If there is a bug in the sorting algorithm you wrote, you have to find it and invest time that will only benefit that one project. If the bug is in a dependency you have a good chance someone else already found it and even if not, your effort to find and fix it benefits everyone using that library.
You look at it from the perspective of comparing a Rust dependency that is maybe 5 years old with some internal version in some 50-year-old C tool, and there the cross-platform support will likely come out on top, because cross-platform used to be so much more important in the bad old days, when nothing was standardized and the hardware and OS landscapes were more fractured in terms of what baseline features you could expect. But honestly, reusable and shared code is the better choice in the long term.
Posted Feb 21, 2025 2:02 UTC (Fri)
by dvdeug (guest, #10998)
[Link]
If we're talking about coreutils, no, it's not. That's the context of this.
> When it isn't touched for a while nobody (including the original author) will know how it works or how to change it.
Good code is understandable. Not to mention, code that does what you need is sometimes easier to understand than interfacing with a powerful library with many features and options.
> Also, what you say about the bug envelope doesn't work at all. If there is a bug in the sorting algorithm you wrote, you have to find it and invest time that will only benefit that one project. If the bug is in a dependency you have a good chance someone else already found it and even if not, your effort to find and fix it benefits everyone using that library.
If there's anyone else using that library. If everyone else hasn't already moved on to the next big thing. If you didn't pick the library that was better, but never hit critical mass and the maintainer dropped it. If you aren't calling with a set of options that nobody else is using.
There are libraries like libz and libpng where there's little reason not to include them. But so many libraries, crates, packages, etc. are created, uploaded, maybe even get a few releases, then get dropped. If you're writing coreutils in Rust and hope to replace the existing coreutils, you need to plan for a 50-year existence, most of which is going to be spent making minor changes to programs that work.
Cross platform is a choice. But Debian won't switch until it supports all release platforms, of which there are nine, and failing to support GNU/Hurd or several other semiactive ports is going to lose you some support. Sell to your audience.
> reusable and shared code is the better choice in the long term.
Making a tight program that runs where you need it to, and runs even when other stuff is broken, is incredibly useful. We have busybox with zero dependencies because coreutils today has too many dependencies in some cases. Reusable code is good, but again, Jia Tan got in because ssh was being linked against a library which linked against a library that ssh didn't need. You've never had to deal with an upgrade mess caused by an incompatible ABI break at the base of a tree of dependencies, or had to change a program to work with the new version of a library because the old version has an entirely different API but isn't getting bug fixes any more?
Again, you're writing a GNOME program, go wild with GNOME libraries. You're writing a program that needs a solid solver for 3-SAT, link in z3. You get a choice between writing 5 lines of code using a hash-map that's fast enough, or linking in a custom data structure that is optimized... if it's code for mass distribution and hopefully a long life, save yourself some headache down the road and write the 5 lines of code.
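For what it's worth, the "5 lines of code using a hash-map" really is about that size in Rust; a toy sketch that tallies duplicate input lines (a sort | uniq -c style count, using only the standard library):

use std::collections::HashMap;
use std::io::{self, BufRead};

fn main() {
    // Count how many times each input line occurs.
    let mut counts: HashMap<String, u64> = HashMap::new();
    for line in io::stdin().lock().lines() {
        *counts.entry(line.unwrap()).or_insert(0) += 1;
    }
    for (line, n) in &counts {
        println!("{n} {line}");
    }
}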
Posted Feb 15, 2025 5:53 UTC (Sat)
by marcH (subscriber, #57642)
[Link]
> But they're not, for many reasons.
Indeed not.
https://cacm.acm.org/practice/surviving-software-dependen...
To be fair, it is much worse with C/C++ because then you have both the technical _and_ the less technical issues.
Posted Feb 15, 2025 6:06 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (9 responses)
Imagine someone invents some new plumbing material that never dissolves harmful pollutants, is cheaper to manufacture, easy and cheap to work with, does not burst when freezing and lasts forever. But in the daily life of the people using water, it makes absolutely no difference[*]. So, no innovation? I think plumbers and house builders would have a very different opinion.
[*] never having to call a plumber and not having cancer is huge but makes no "obvious" difference. And we can rather rarely tell where we got cancer from.
Posted Feb 15, 2025 12:04 UTC (Sat)
by pizza (subscriber, #46)
[Link] (8 responses)
And the production facilities and distribution are essentially free, because this magical material is produced by self-replicating spherical unicorns that feed on greenhouse gasses.
But even if all these properties were somehow true, it still makes zero economical sense to preemptively rip out and replace the existing plumbing in literally billions of structures.
Posted Feb 16, 2025 19:27 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (6 responses)
This is where software analogies always break down: the Economy.
- With software, replacing the existing plumbing in ONE house is barely cheaper than replacing it in ALL houses.
The previous point was not about the economy at all. It was about "innovation". While it's not visible to users at all, there is "behind the scenes" innovation in uutils because it's a production-grade, potentially mass-deployed Rust project. It's admittedly less rocket science and more engineering but the latter matters too.
Posted Feb 16, 2025 22:41 UTC (Sun)
by pizza (subscriber, #46)
[Link] (5 responses)
That would be true if everyone, and every device, ran identical software in lock-step. Heck, even when the software components _are_ identical, they're often (usually!) integrated in different (if not completely bespoke) manners.
In the longer run, integration (and subsequent maintenance) is usually more work than writing the original software. (not unlike plumbing!)
> The previous point was not about the economy at all. It was about "innovation". While it's not visible to users at all, there is "behind the scenes" innovation in uutils because it's a production-grade, potentially mass-deployed Rust project.
"innovation" implies something new, not a 100% re-implementation (and necessarily quirk-for-quirk compatible) of existing (and well-maintained!) code.. Would you still be calling uutils "innovative" if it was written in oh, Ada, Java, Python, or PHP?
Posted Feb 16, 2025 23:23 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (4 responses)
> That would be true if everyone, and every device, ran identical software in lock-step.
This is more and more irrelevant... I just gave a reminder that software marginal costs are negligible, which limits software analogies. How many devices are actually running what software does not matter: marginal software costs are still negligible and the economic aspect of software analogies still fails.
> "innovation" implies something new,...
Obviously.
> ... not a 100% re-implementation (and necessarily quirk-for-quirk compatible) of existing (and well-maintained!) code
That's part of your definition. My definition is "not in mass production yet" and it does not require cars to fly.
> Would you still be calling uutils "innovative" if it was written in oh, Ada, Java, Python, or PHP?
No because:
So I suspect uutils in Java would be a non-event.
Here's another analogy: designing and selling an electric car with a brand new, solid state battery design today would be "innovative" because it would help the solid state battery company scale up. That would be innovative even if the car company did nothing innovative besides trusting, partnering with and ultimately helping the innovative battery company doing all the innovative research and even when customers see "only" more range and less fire risk (= the battery equivalent of memory corruption?)
Once a few other Rust rewrites like uutils will be in "mass production" stage (e.g.: enabled by default in some distributions), then I agree the next ones won't be as "innovative" - and will get less press coverage. But there are several domains: first graphical application, first kernel driver, etc. Each domain has different engineering challenges requiring different "innovative" solutions.
Posted Feb 17, 2025 14:32 UTC (Mon)
by pizza (subscriber, #46)
[Link] (3 responses)
Just because *you* dismiss it as irrelevant doesn't make it so. The world is far larger than you.
"Marginal software costs" refer to the production of additional identical _copies_. If said copies are not identical, that margin rapidly grows. The computing world is anything but homogeneous.
Or are you seriously arguing that a binary driver written for a specific Windows version will enable that hardware to work on macOS, all currently-supported Linux LTS distributions, and one of the countless RTOSes running on the dozen-ish major microcontroller architectures? Heck, even if that driver was provided in source form, it's going to be a ton of work to enable the others -- and it's most likely to be simpler, faster, and cheaper to start over from scratch for each one.
> 2) None of these languages are recent in 2025, they have all been in production for decades none of them needs any sort uutils-like project to help them get battle-hardened and "upgraded" to mass production.
So... the "innovation" here is "Figuring out how to make Rust good enough to compete with stuff that's existed for decades?"
> Here's another analogy: designing and selling an electric car with a brand new, solid state battery design today would be "innovative" because it would help the solid state battery company scale up.
"helping the battery company scale up" is an example of longer-term planning and/or investment, not innovation.
(The battery technology itself, including specific manufacturing processes, may be innovative)
Posted Feb 17, 2025 17:49 UTC (Mon)
by marcH (subscriber, #57642)
[Link] (2 responses)
It's irrelevant to "Is uutils innovation?" which is the very specific disagreement in this subthread. Feel free to discuss software marginal costs or any other very interesting but unrelated question anywhere else.
> The world is far larger than you.
Thanks, that's very useful to know.
> So... the "innovation" here is "Figuring out how to make Rust good enough to compete with stuff that's existed for decades?"
Yes. It's a very important innovation milestone towards getting rid of memory corruption and 70% of vulnerabilities in operating systems, browsers (the other operating system) and more. You're entitled to your definition of "innovation" but mine does not stop outside the lab.
It's just a word definition; let's agree to disagree.
> "helping the battery company scale up" is an example of longer-term planning and/or investment, not innovation.
Not for the very first customer, no. It's most often a partnership and a huge bet.
Posted Feb 17, 2025 17:57 UTC (Mon)
by marcH (subscriber, #57642)
[Link]
Simply because many "innovations" die once they leave the lab.
This is especially true with software which gets created and dies orders of magnitude more than in other fields (darn, and now I'm close to making your economic digression relevant...)
Posted Feb 19, 2025 7:00 UTC (Wed)
by marcH (subscriber, #57642)
[Link]
> Yes. It's a very important innovation milestone towards getting rid of memory corruption and 70% of vulnerabilities in ...
From https://lwn.net/Articles/1008721/
> Ultimately, Poettering said that he is "happy to play ball" but does not want systemd to be the ones to solve the problems that need to be solved in order to use Rust.
That's the boring, last part of innovation: making it "mainstream". Rust is getting very close but not quite there yet. If ever?
Posted Feb 20, 2025 7:07 UTC (Thu)
by ssmith32 (subscriber, #72404)
[Link]
Yeah, production/packaging and distribution is essentially free for the software in question ("small" command line tools).
You never have the "lone maintainer" problem for the production and distribution of large pipes and fittings, at least produced at the scales we're discussing, because you simply _can't_.
It may not feel that way for the lone maintainer, for sure, but an objective comparison of the costs demonstrates otherwise.
Posted Feb 12, 2025 23:05 UTC (Wed)
by Nikratio (subscriber, #71966)
[Link] (4 responses)
Posted Feb 13, 2025 23:17 UTC (Thu)
by intgr (subscriber, #39733)
[Link] (3 responses)
Posted Feb 15, 2025 6:25 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (2 responses)
Is zsh "compatible enough" with POSIX? Does not have to be 100% for the above. If not, is there some other, fancy new shell that is significantly better than bash and that I should try? I'd like not to lose too much readline "compatibility" either. readline key strokes are close to universal so why should I learn new, different ones? Except for new features of course. It's painful enough to switch between readline and vi commands already.
Posted Feb 15, 2025 23:33 UTC (Sat)
by intgr (subscriber, #39733)
[Link]
My understanding is yes, zsh is a proper POSIX shell and even implements some bashisms.
Posted Feb 17, 2025 13:50 UTC (Mon)
by taladar (subscriber, #68407)
[Link]
There are also some subtleties, particularly around subshells and undefined variables, that make it unwise to test a command destined for a bash script in zsh and expect it to behave exactly the same way under bash.
Posted Feb 13, 2025 9:10 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link] (4 responses)
aaa.c:2: error: ...
it will be very much appreciated.
And a "sort -I" for in-place file sorting, too...
Posted Feb 13, 2025 9:19 UTC (Thu)
by alx.manpages (subscriber, #145117)
[Link]
intro.2
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.g...>
Something similar could be written to implement your sorterrors program. It's such a niche thing, and different programs format errors slightly differently, so I don't think it would make much sense to add a sort(1) option. But it would make sense to write your own script if it's useful to you.
Posted Feb 13, 2025 9:21 UTC (Thu)
by alx.manpages (subscriber, #145117)
[Link]
<https://manpages.debian.org/bookworm/moreutils/sponge.1.e...>
Posted Feb 13, 2025 15:05 UTC (Thu)
by khim (subscriber, #9252)
[Link] (1 responses)
How would that flag differ from already existing
Posted Feb 13, 2025 16:22 UTC (Thu)
by adobriyan (subscriber, #30858)
[Link]
Posted Feb 13, 2025 20:16 UTC (Thu)
by antiphase (subscriber, #111993)
[Link] (9 responses)
Posted Feb 14, 2025 15:04 UTC (Fri)
by jzb (editor, #7867)
[Link] (8 responses)
IIRC "performant" was the term used by the speaker. Sometimes we do paraphrase and I admit there are some "words" I avoid using if at all possible (e.g., "impactful") that I also have a bias against, but, generally, I prefer to stick close to the way a speaker talks. This is in no small part because there've been times when I was the person speaking and being quoted, and I want as much fidelity as possible when what I'm saying is reported on. Also, I've traditionally seen "performant" used this way rather than simply "performs a task as standard"—and at least one online dictionary agrees with the usage. So I'm curious where it's defined otherwise. I'd look in my OED, but all the volumes are currently in boxes. I hope to solve that by next week... :)
Posted Feb 20, 2025 7:15 UTC (Thu)
by ssmith32 (subscriber, #72404)
[Link] (7 responses)
Posted Feb 20, 2025 7:58 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (2 responses)
The concise? The standard? Or the complete?
I'm not sure how big the complete is, but I'd love a copy. Only snag is, it would probably take up a bookcase all by itself :-)
Cheers,
Posted Feb 20, 2025 10:31 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
Posted Feb 20, 2025 13:32 UTC (Thu)
by jzb (editor, #7867)
[Link]
Posted Feb 20, 2025 13:29 UTC (Thu)
by jzb (editor, #7867)
[Link]
Posted Feb 20, 2025 18:02 UTC (Thu)
by sfeam (subscriber, #2841)
[Link] (2 responses)
Posted Feb 21, 2025 8:50 UTC (Fri)
by geert (subscriber, #98403)
[Link]
Posted Mar 4, 2025 11:16 UTC (Tue)
by sammythesnake (guest, #17693)
[Link]
I passed the same milestone myself some time ago, too :-/
Posted Feb 14, 2025 2:01 UTC (Fri)
by gnu (guest, #65)
[Link] (3 responses)
Posted Feb 14, 2025 2:52 UTC (Fri)
by mjg59 (subscriber, #23239)
[Link] (2 responses)
Posted Feb 14, 2025 11:23 UTC (Fri)
by gnu (guest, #65)
[Link] (1 responses)
Posted Feb 14, 2025 17:54 UTC (Fri)
by mjg59 (subscriber, #23239)
[Link]
Posted Feb 17, 2025 5:43 UTC (Mon)
by jsakkine (subscriber, #80603)
[Link]
Posted Feb 20, 2025 20:00 UTC (Thu)
by Alterego (guest, #55989)
[Link] (1 responses)
The GPL "obligation" to provide source code is a very good thing, and IMO a key to Linux's success.
Posted Feb 20, 2025 20:15 UTC (Thu)
by mjg59 (subscriber, #23239)
[Link]
A shell in Rust could be really cool
In bioinformatics there are many pipelines where the units are processes/scripts/programs. They might be sequence aligners, statistics packages, clustering programs and a whole lot more. Typically these do not come as libraries attached to a language but as separate programs. These days they are usually orchestrated with tools such as Nextflow, Snakemake, Cromwell and others. There is still a fair bit of use for shell scripting when manipulating the outputs (which are usually big dataframes). It gets tiresome very quickly to write a python script with parameters that loads a large dataframe into memory when all you want to do is some stream data processing. The downside is the marshalling/unmarshalling of data. I wouldn't mind something like Apache Arrow formatted data that I could compose with unix pipes in a shell-like environment. It might exist, I haven't checked as there is more than just one obstacle.
A shell in Rust could be really cool
https://github.com/containers/bootc/blob/main/tests/boote...
- https://github.com/containers/bootc/pull/911/commits/6f80...
- https://github.com/containers/bootc/pull/937/commits/28e8...
Sounds like a good project; itches scratched and flowers blooming. uniq -c is one of the most irritating commands I know, as the first (count) field is right-justified without any option available to avoid this white-space padding. It is quite puzzling that Richard Stallman let this program loose on the world, as it violates the usual Unix well-behavedness of textual interfaces. I will make my way over to uutils to see if it has righted this wrong by providing such an option, or is open to such. Checking the manual page I see "A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.", which again makes me shudder as it seems to indicate a failure to respect empty fields in tab-separated data.
fix uniq -c
It would equally be possible to add new options for new behaviour, such as emitting well-formatted output, that is, output usable for further consumption by adhering to a standard table format such as tab-separated data. I'll enquire.
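As a purely hypothetical sketch of what such an option's output path might look like in Rust (the tab_separated flag is invented for illustration; it is not an existing uutils or GNU option):

// Emit one "count line" record; with the hypothetical tab-separated mode
// the count is not padded, so downstream tools see clean fields.
fn emit(count: u64, line: &str, tab_separated: bool) {
    if tab_separated {
        println!("{count}\t{line}");
    } else {
        // Roughly GNU uniq's traditional right-justified count field.
        println!("{count:7} {line}");
    }
}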
fix uniq -c
> [[:blank:]]*[^[:blank:]]*
> "%d %s", <number of duplicates>, <line>
fix uniq -c
While I generally like the push to abolish the old coreutils, this compatibility makes it a huge waste of time in my opinion.
fix uniq -c
That depends on the licence of the new project; when Linux embraced, extended and then extinguished proprietary UNIXes, that was a good thing because it moved from a world of gatekeepers to an open source world. When Microsoft attempted to embrace-extend-extinguish the open web with ActiveX, that was a bad thing because it would move us from an open world, to one in which you paid Microsoft as a gatekeeper for technology access.
Embrace-extend-extinguish
fix uniq -c
• other programs: here it depends on what the other program is & what it expects as input. For example, if you want to process it further via pipes then awk & bash don't care at all about the right-aligned numbers[1], whereas other programs might. If your goal isn't pipe-processing but e.g. copy-pasting into spreadsheets, then CSV-formatted data might be much better (though that would make processing in awk/bash much harder)
[mosu@velvet ~]$ printf "moo\nmoo\ncow\n" | uniq -c | ( while read line ; do set - $line ; echo $1 ; done )
2
1
[mosu@velvet ~]$
fix uniq -c
- let's add an option so that we can have the behaviour that I like.
fix uniq -c
I let a pointless/misdirected rhetorical flourish get the better of me again. I understand the history - and, recasting it from wrong/right into something to strive towards, I view it as beneficial to make outputs (tuples/tables) composable, preferably by default, or at least to add options to do that without assumptions about white-space padding. The table format (mysql dump, spreadsheet, dataframe) is pervasive and super useful. In the right setup, shell pipelines can do amazing things with them. Working in those kinds of table environments has made the various stripes of unix white-space padding and stripping jar with me.
Supply-chain-attack risk
One good feature of cargo vet in this respect is the ability to import trusted audits and maintain a registry of significant auditing entities; this enables a "big" entity who cares (like Google, for example), to publish a list of audits they've done that smaller projects can import.
Supply-chain-attack risk
That already exists, separately to cargo vet - it's provided by cargo crev, which aims to crowdsource review.
using coreutils in production
> We had some scripts that invoke `nproc` to determine the parallelism to use, and we want to move this into containers. Then we realized that in containers with resource restrictions, `nproc` from coreutils still report number of total cores, but not the amount of CPUs available to it by the cgroup. It turns out that the nproc haven't received meaningful updates since its inception and obviously doesn't know what to do with groups.
using coreutils in production
# pacman -Qo nproc
/usr/bin/nproc is owned by coreutils 9.6-2
# nproc
8
# systemd-run --pipe -p AllowedCPUs=0,1 nproc
Running as unit: run-p1960742-i1961042.service
2
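For what it's worth, a Rust implementation has a container-aware primitive in the standard library to build on; a minimal sketch (behaviour depends on the Rust version, but recent std takes the process's CPU affinity mask into account and, on Linux, attempts to honour cgroup CPU quotas):

use std::thread;

fn main() {
    // available_parallelism() consults the CPU affinity mask and, in recent
    // versions on Linux, cgroup CPU quotas, so inside a constrained container
    // it reports a usable figure rather than the machine's total core count.
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("{n}");
}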
using coreutils in production
Using 15-year old systems as a base to justify today's decisions... *sigh* that's a new low.
using coreutils in production
It depends on whether you've got bursty work, or saturate your threads.
Correct behaviour under a CPUQuota= setting
Reasons for speedup?
> the Rust version performing the test six times faster than GNU's implementation.
Great fit for Rust
You can say "not here, at this boundary conventional linker is all you need" already, by using the psABI (repr(C) and extern "C") instead of the unstable Rust ABI.
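A minimal sketch of what that looks like in practice (names invented for illustration): everything crossing the boundary is monomorphic, repr(C), and exported with the platform's C ABI, so an ordinary linker or dlopen() is sufficient.

// Build as a cdylib; everything crossing the boundary has a layout fixed
// by the platform C ABI rather than the unstable Rust ABI.
#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

// No generics and no Rust-specific layout: this symbol can live in a .so
// built by one compiler version and be called from a binary built by another.
#[no_mangle]
pub extern "C" fn point_length(p: Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}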
Improved dynamic linking ABIs
And the issue is that, under the current Rust ABI, the layout of Vec<T> is not specified until after monomorphization - you literally don't know what order the capacity, length, and data fields are in, since without knowing what T is, you don't know whether the data field is one or two pointers in size - and Rust is allowed to split up the two halves of a "fat pointer" (one that's two pointers in size), and to elide any parts of the Vec that aren't used (e.g. eliding length because it's always the same as capacity).
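Which is why code that wants to hand a Vec across a stable boundary today decomposes it into raw parts with an explicitly chosen layout; a sketch (illustrative, and the two sides must agree on the element type and the global allocator):

use std::mem::ManuallyDrop;

// A fixed-layout view of a Vec<u8>'s raw parts; unlike Vec itself, the
// field order and sizes here are pinned down by repr(C).
#[repr(C)]
pub struct RawParts {
    pub ptr: *mut u8,
    pub len: usize,
    pub cap: usize,
}

pub fn into_raw_parts(v: Vec<u8>) -> RawParts {
    // Don't run Vec's destructor; ownership crosses the boundary.
    let mut v = ManuallyDrop::new(v);
    RawParts { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
}

// SAFETY: `p` must come from into_raw_parts() in a build that uses the
// same global allocator, and must not be reassembled more than once.
pub unsafe fn from_raw_parts(p: RawParts) -> Vec<u8> {
    unsafe { Vec::from_raw_parts(p.ptr, p.len, p.cap) }
}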
Improved dynamic linking ABIs
Drop is fairly trivial to handle, since it's a function call to Vec's code for this particular instance of Vec<T>, and if you have the ability to handle function calls on Vec at all (including push, insert, get, indexing and pop, which also get affected by the type of T), you can handle drop that way, too. You fundamentally can't implement anything useful if your handling of something like Vec doesn't have different function behaviour for many functions, because of this.
Improved dynamic linking ABIs
Rust allows, as far as I can tell, exactly the same level of control over what's monomorphized and what's not as C++ does. I can find no difference in how Clang treats std::vector to how Rust treats std::vec::Vec, complete with similar behaviour around how std::vector::~vector is treated as compared to std::vec::Vec::drop.
Improved dynamic linking ABIs
C++ modules do not split the source like that, and there are many C++ projects (like Boost) which end up putting all the source in the .h file because it's the only way to guarantee that all use cases compile correctly. As a result, a lot of libraries in C++ land are in the same position as Rust, because the important parts of the code are in the .h file, and the precompiled object is tiny. Any bugfix requires a full rebuild of all dependents, as well as a new shared library file.
Improved dynamic linking ABIs
I suspect that, in practice, people splitting interfaces and implementations will be rare in open-source C++ modules; you have no reason to hide the implementation from consumers of the interface, since it's all open source, but it is extra work to separate them cleanly. You see something similar in Rust, where projects like Windows use Rust behind an ABI barrier (and only export the interface), but open-source Rust projects tend not to bother.
Splitting implementation and interface in C++
Yes. The extra work of splitting something into an interface module and an implementation module is significant, and it does not reduce the complexity of reasoning about a change, nor the amount of code that has to be reviewed to assess the validity of a change.
Splitting implementation and interface in C++
Sure, but I'm shipping full source anyway, and I can check for people exporting parts of the internal partition in CI, just as I'd have to have similar checks in place to stop people moving code from the interface module to the implementation module.
Splitting implementation and interface in C++
Wol
Splitting implementation and interface in C++
cargo doc
It's easy to navigate and includes all details of your public interfaces.
Splitting implementation and interface in C++
cargo doc
Splitting implementation and interface in C++
Wol
When you provide T, it's monomorphized for you. The hard case is when I want to provide Foo<T>, and allow you to provide an arbitrary T that I haven't thought about up-front.
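A sketch of that distinction (trait and function names invented): the generic version must be monomorphized for whatever T the caller eventually supplies, so its body effectively has to be available to the caller; the dyn version is compiled exactly once and can sit behind an ordinary function boundary.

pub trait Shape {
    fn area(&self) -> f64;
}

// Monomorphized separately for every concrete T a caller invents later.
pub fn total_area_generic<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Compiled once; calls go through a vtable at run time.
pub fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}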
Splitting implementation and interface in C++
The problem is that shipping an updated .so in the way C or C++ do it runs the risk of invoking UB from the "safe" subset of Rust, and one of Rust's promises is that invoking UB requires you to use "unsafe Rust". Thus, just copying the C way of doing things isn't acceptable, because it can take a safe program and cause it to invoke UB.
Splitting implementation and interface in C++
Wol
Rearranging across the interface
That's why I chose that particular example; the explicitly declared interface is:

impl<T, A: Allocator> Vec<T, A> {
    #[inline]
    pub fn shrink_to_fit(&mut self);
}
Rearranging across the interface
Wol
Rearranging across the interface
Wol
But then you're getting into a mess around defining what is, and is not, a safe code change inside the ABI boundary. If you do make the internals of a crate (not the exported interface) the ABI boundary, you're now in a position where the compiler has to make a judgement call - "is this change inside the internals of a library a bad change, or a good change?".
Rearranging across the interface
Wol
No, for performance reasons. We inline parts of our libraries (even in C, where the inlined parts go in the .h file) into their callers because the result of doing so is a massive performance boost from the optimizer - which can do things like reason "hey, len can't be zero here, so I can eliminate the code that handles the empty list case completely".
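For instance (a toy example, not anyone's real code), marking a small accessor #[inline] makes its body available to callers in other crates, much like putting it in a C header, which is what lets the optimizer prove facts such as "len is non-zero here" and delete the empty-case branch at the call site.

pub struct List {
    items: Vec<u32>,
}

impl List {
    // The body is available to downstream crates, like code in a .h file.
    #[inline]
    pub fn len(&self) -> usize {
        self.items.len()
    }

    // After inlining into a caller that has already checked len() > 0,
    // the None branch can be optimized away entirely.
    #[inline]
    pub fn first(&self) -> Option<u32> {
        self.items.first().copied()
    }
}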
Rearranging across the interface
Wol
The compiler doesn't prove anything for the danger cases; it relies on the human assertion that they've checked that this unsafe block is safe, given the code that they can see today.
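A small illustration of that division of labour (invented code): the compiler checks everything outside the unsafe block; the SAFETY comment is the human assertion, and it is only as good as the surrounding code it was written against.

fn first_byte(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: we just checked that the slice is non-empty, so reading the
    // first element through the raw pointer is in bounds. Nothing verifies
    // this claim again if the check above is ever refactored away.
    Some(unsafe { *v.as_ptr() })
}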
Rearranging across the interface
Note that libc handles versioning well because the glibc maintainers do a lot of hard and non-trivial work to have things like compatibility symbols so that you can swap to a later glibc without breakage. You can do a similar level of work in Rust today to get dynamic linking working, and working well.
Versioning of shared objects
Splitting implementation and interface in C++
Only the top-level application crate's lock file matters.
Splitting implementation and interface in C++
(I almost never use them and I almost always provide them.)
Except that typical C applications simply don't provide any lock information, so you're fully on your own.
Providing lock information is better than providing no lock information.
Splitting implementation and interface in C++
https://doc.rust-lang.org/cargo/commands/cargo-install.ht...
You can just use --offline with what is already available in your distribution.
If you don't want to use it because you want to use your own packaged dependencies, the only option is to ignore it, right?
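For what it's worth (the crate name here is assumed), cargo can be told to honour the lock file that ships with a package instead of re-resolving dependencies, and to stay off the network entirely:

$ cargo install --locked coreutils
$ cargo install --locked --offline coreutils

Without --locked, cargo install ignores the packaged Cargo.lock and resolves the newest compatible versions; --offline additionally restricts it to sources already available locally.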
Splitting implementation and interface in C++
To express it slightly differently, why is the Linux kernel a single module with multiple internal partitions, rather than separate modules for the VFS interface, MM interface, network interface etc, along with implementation modules that implement each of those interfaces? If the benefits are as big as you're claiming, surely it would make sense to separate out the interfaces and implementations into separate modules, rather than just have separate partitions internally?
Splitting implementation and interface in C++
Note, though, that once you have tooling that makes splitting it up easy, the dividing line is rarely as simple as "interfaces" and "implementations". You're more likely to have splits like "VFS interface and implementation", "ext4 interface and implementation" etc.
Splitting implementation and interface in C++
Improved dynamic linking ABIs
RIIR
https://medium.com/@sdboyer/so-you-want-to-write-a-packag...
etc.
RIIR
- The duplication cost is so negligible that a single guy doing some DIY work in his garage for fun (NOT for economic reasons) can (and here: does!) end up replacing the plumbing in all houses across the world.
RIIR
1) There would most likely be significant performance regressions (which obviously matter in "core" utils) instead of the improvements observed with Rust.
2) None of these languages is recent in 2025; they have all been in production for decades, and none of them needs any sort of uutils-like project to help it get battle-hardened and "upgraded" to mass production.
RIIR
Fish Shell 4 is written in Rust
sort -E
aaa.c:20: error: ...
bbb.c:10: error:...
sort -E
membarrier.2
umask.2
intro.3
printf.3
id_t.3type
useconds_t.3type
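If the goal is the grouping shown above (a guess at the intent, so treat this as a sketch), plain sort can already approximate it by keying on the suffix after the dot and then on the name:

$ ls | sort -t . -k 2,2 -k 1,1

That compares the section suffixes as plain strings, though, rather than with anything as clever as filevercmp.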
sort -E
-V
sort?
sort -E
https://github.com/coreutils/gnulib/blob/master/lib/filev...
Writing clarity request
As much as I like seeing Rust in action (and enjoy a passion project) - _that_ is a treasure worth reporting on 😁.
Writing clarity request
Wol
I just looked at the OED paper version online. The "complete" edition is 20 volumes, and currently has three "additional" volumes to bring it up to date.
Paper copy of the full OED
Writing clarity request
I have the "Compact Edition" of the full OED on the shelf behind me, two large volumes of tiny print. It came in a custom case with a built-in drawer contain a magnifying glass. Each page contains images of four pages from the full-size version, sort of like the output of psnup -4. Back when I got it the magnifying glass was an unnecessary extra, now these old eyes appreciate the thought. It's still my go-to reference if I'm sitting at my desk.
Writing clarity request
Great stuff
What a great project
GPL license does matter vs MIT license