Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 7:38 UTC (Wed) by epa (subscriber, #39769)
Parent article: Rewriting the GNU Coreutils in Rust

Making them faster is nice, but I think the main value may be in de-crufting. If you read the coreutils source, it's full of error handling and workarounds for things like an interrupted system call that has to be retried. It's a long way from the clean style of K&R's book. Any real-world C code is, of course, but for the GNU utilities the difference seems even greater, perhaps because they are more diligent about handling all possible errors on all platforms. I'd hope that with Rust and its standard library, running on a much narrower set of platforms, a lot of this defensive code isn't needed.



Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 23:17 UTC (Wed) by Sesse (subscriber, #53779)

Most syscalls can still return EINTR, even on modern Linux. You may think of it as cruft, but you cannot just remove it.

Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 23:19 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)

You can wrap it in a reusable wrapper.
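
A minimal sketch of such a wrapper in Rust, assuming the operation is passed in as a closure returning `std::io::Result`. The name `retry_on_eintr` is hypothetical, not a standard-library function; Rust's standard library reports EINTR as `ErrorKind::Interrupted`.

```rust
use std::io::{self, ErrorKind};

/// Hypothetical helper: re-run `op` for as long as it fails with
/// `ErrorKind::Interrupted` (Rust's mapping of EINTR). Any other
/// result, success or error, is returned to the caller unchanged.
fn retry_on_eintr<T, F>(mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    loop {
        match op() {
            // Interrupted by a signal: simply try the same operation again.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            other => return other,
        }
    }
}
```

A call site then reads `retry_on_eintr(|| file.read(&mut buf))?` rather than an open-coded loop. It retries unconditionally, so it only fits call sites where restarting really is the desired behaviour.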

Rewriting the GNU Coreutils in Rust

Posted Jun 10, 2021 5:34 UTC (Thu) by epa (subscriber, #39769)

I agree; my point is that hopefully Rust and its standard library handle this kind of thing for you, so you don't need boilerplate at every call site.

Rewriting the GNU Coreutils in Rust

Posted Jun 10, 2021 7:50 UTC (Thu) by Sesse (subscriber, #53779)

That's assuming all you want to do is restart. Frequently, that's not what you want; the reason this behavior persists is so that you _can_ abort syscalls. (You can turn it off when installing the signal handler, by setting SA_RESTART in the sa_flags passed to sigaction(); that has nothing to do with the language.)

Say that you have some long-running computation that you want to be able to abort cleanly. Now further assume that your thread is stuck in a syscall, say waiting for a write to a file on NFS, or waiting for data from a socket; let's say a recvfrom(). You trap SIGINT, and in your handler you set a “please shut down” flag. (You can't do much else in a signal handler!) Now, if this didn't abort your recvfrom() with EINTR, how would you ever get to actually check that flag and shut down?
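
The pattern described above can be sketched in Rust as follows, under stated assumptions: Rust's standard library has no API for installing signal handlers, so the SIGINT handler itself (installed without SA_RESTART, e.g. via sigaction() from the `libc` crate) is not shown, and the test double `SignalledSource` only simulates a blocking read that a signal interrupts. `SHUTDOWN_REQUESTED`, `ReadOutcome`, and `read_or_shutdown` are hypothetical names.

```rust
use std::io::{self, ErrorKind, Read};
use std::sync::atomic::{AtomicBool, Ordering};

// Set by the (not shown) SIGINT handler; a lock-free atomic store is one of
// the few things a signal handler can safely do.
static SHUTDOWN_REQUESTED: AtomicBool = AtomicBool::new(false);

/// Result of one attempt to read more input.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Data(usize),
    ShutdownRequested,
}

/// Read once; on EINTR, check the flag and either stop or retry.
fn read_or_shutdown<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<ReadOutcome> {
    loop {
        match src.read(buf) {
            Ok(n) => return Ok(ReadOutcome::Data(n)),
            Err(e) if e.kind() == ErrorKind::Interrupted => {
                if SHUTDOWN_REQUESTED.load(Ordering::SeqCst) {
                    return Ok(ReadOutcome::ShutdownRequested);
                }
                // Interrupted by some other signal: retry the read.
            }
            Err(e) => return Err(e),
        }
    }
}

/// Test double standing in for a blocking socket: every call behaves as if
/// SIGINT arrived mid-syscall (handler sets the flag, the call fails with EINTR).
struct SignalledSource;

impl Read for SignalledSource {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        SHUTDOWN_REQUESTED.store(true, Ordering::SeqCst);
        Err(io::Error::from(ErrorKind::Interrupted))
    }
}
```

The interesting part is the `Interrupted` arm: the blocking call failing with EINTR is the only point at which the loop gets a chance to look at the flag.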

Rewriting the GNU Coreutils in Rust

Posted Jun 15, 2021 12:44 UTC (Tue) by epa (subscriber, #39769)

What you say is quite right in general, but I am talking about coreutils specifically. Those are short-lived command-line programs with no event loop. (The same goes, I would suggest, for any program where the programmer hasn't explicitly decided to allow for signal handling and interruptions. The default should be to retry automatically, simulating a world in which EINTR doesn't exist, but of course the standard library should include an interruptible variant of each operation, which you can call explicitly and whose error you then handle yourself.)

Rewriting the GNU Coreutils in Rust

Posted Jun 15, 2021 19:02 UTC (Tue) by nix (subscriber, #2304)

> Those are short-lived command line programs with no event loop

Many, perhaps most of the commands in coreutils are or can operate as filters. These can be arbitrarily long-lived and spend nearly all their time blocked on I/O.
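
As an illustration of such a filter, here is a sketch of a hypothetical `number_lines` function loosely modelled on `cat -n` (the exact output format is an assumption, not a copy of GNU's). It handles input one line at a time, so each `read_until` call may block for as long as the upstream process stays silent; the generic `BufRead`/`Write` bounds let it run against a pipe or, for testing, against in-memory buffers.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical filter: copy `input` to `output`, prefixing each line with
/// its 1-based number (right-aligned to width 6, then a tab). Works on raw
/// bytes, so input need not be valid UTF-8.
fn number_lines<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut line = Vec::new();
    let mut n: u64 = 0;
    loop {
        line.clear();
        // May block until the producer writes more; Ok(0) means end of input.
        if input.read_until(b'\n', &mut line)? == 0 {
            break;
        }
        n += 1;
        write!(output, "{:>6}\t", n)?;
        output.write_all(&line)?;
    }
    output.flush()
}
```

`BufRead::read_until` is documented to ignore `ErrorKind::Interrupted`, so the loop itself carries no EINTR handling.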

Rewriting the GNU Coreutils in Rust

Posted Jun 16, 2021 5:03 UTC (Wed) by epa (subscriber, #39769)

Right, but they are not event-driven, at least not in the classical implementation. Maybe the Rust implementation is much more parallelized, so there is cleanup work to do on being interrupted, and you do need to explicitly check for system calls returning EINTR and do something other than just retrying. I still think Rust's error-handling mechanisms will make that cleaner than in C.

Rewriting the GNU Coreutils in Rust

Posted Jun 16, 2021 14:46 UTC (Wed) by nix (subscriber, #2304)

Oh, definitely. The need to wrap everything in either if statements or EINTR-checking while loops makes robust C unbelievably ugly compared to Rust of corresponding robustness, where you usually just need a trailing ?.
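
To make the comparison concrete, here is a sketch of a hypothetical `cat`-like helper (the name and signature are illustrative assumptions). Each `?` returns the error to the caller, and `std::io::copy` is documented to retry reads and writes that fail with `ErrorKind::Interrupted`, so none of the EINTR loops a C version would need appear here.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical `cat`-like helper: copy each named file to `out` in order
/// and return the total number of bytes copied. Every fallible step ends
/// in `?`; `io::copy` retries internally on `ErrorKind::Interrupted`.
fn cat_files<P: AsRef<Path>, W: Write>(paths: &[P], out: &mut W) -> io::Result<u64> {
    let mut total = 0;
    for path in paths {
        let mut file = File::open(path)?; // e.g. NotFound goes straight to the caller
        total += io::copy(&mut file, out)?;
    }
    out.flush()?;
    Ok(total)
}
```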

Rewriting the GNU Coreutils in Rust

Posted Jun 10, 2021 15:45 UTC (Thu) by mathstuf (subscriber, #69389)

> Rust and its standard library handle this kind of thing for you

That's not good in a systems language. You want to get the errors as they occur. Now, the stdlib can provide combinators that say "restart on EINTR" around these calls to abstract at least that away, but Sesse's point is that you can't just hide it away in all cases.

Rewriting the GNU Coreutils in Rust

Posted Jun 11, 2021 20:09 UTC (Fri) by notriddle (subscriber, #130608)

That's what it does.

The documentation for Write says that a plain write call can return Interrupted. The wrapper that automatically retries is called write_all.
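
The difference can be seen with a test double. The hypothetical `FlakyWriter` below fails its first `write` with `ErrorKind::Interrupted` and afterwards accepts at most two bytes per call, mimicking an EINTR followed by short writes. A single `write` call surfaces the interruption to the caller, while the standard `Write::write_all` keeps calling `write` until the whole buffer is written, retrying on `Interrupted` and returning any other error.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical test double: the first `write` fails as if interrupted by a
/// signal (EINTR); later calls accept at most two bytes (short writes).
struct FlakyWriter {
    interrupted_once: bool,
    data: Vec<u8>,
}

impl Write for FlakyWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if !self.interrupted_once {
            self.interrupted_once = true;
            return Err(io::Error::from(ErrorKind::Interrupted));
        }
        let n = buf.len().min(2);
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Note that `write_all` exists mainly to loop over short writes; retrying on `Interrupted` is part of that same loop.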

Rewriting the GNU Coreutils in Rust

Posted Jun 14, 2021 8:04 UTC (Mon) by ceplm (subscriber, #41334)

You know you're quoting almost word for word from https://www.joelonsoftware.com/2000/04/06/things-you-shou... and not in a good way, right?

Rewriting the GNU Coreutils in Rust

Posted Jun 14, 2021 8:49 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

Rewriting the whole Linux stack in one go in Rust is stupid.

Rewriting small pieces of it one-by-one is perfectly fine. It's exactly the small gradual refactoring that we need.

For perspective, coreutils is around 70k lines of code in C. This is not a huge project by any means.

Rewriting the GNU Coreutils in Rust

Posted Jun 15, 2021 8:57 UTC (Tue) by epa (subscriber, #39769)

Yes, I'm well aware of that, and I was going to add a second paragraph repeating the conventional wisdom that rewriting from scratch isn't always a good idea, but since this is LWN I assumed an educated audience wouldn't need the reminder.

I agree with Joel's points, and you don't want to ditch battle-hardened code, but if the hardening was only added for running on AIX 0.9 in mixed-endian EBCDIC configuration, you can drop it as long as you explicitly drop support for that platform. A new implementation with narrower goals doesn't have to be a drop-in replacement for the old version in all cases. And there are times when rewriting is worthwhile; the GNU tools themselves are from-scratch rewrites of classic Unix utilities, and quite apart from licensing they fixed lots of technical problems like fixed line width.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds