
Rewriting the GNU Coreutils in Rust


Posted Jun 9, 2021 0:02 UTC (Wed) by wahern (subscriber, #37304)
In reply to: Rewriting the GNU Coreutils in Rust by Sesse
Parent article: Rewriting the GNU Coreutils in Rust

sendfile and especially copy_file_range would be much faster than using io_uring for copying files. Technically, many copy_file_range use cases could probably be abstracted by sendfile.
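For illustration, here is a minimal sketch of a file copy built on copy_file_range, using Python's os.copy_file_range wrapper (Linux, Python 3.8+). The function name copy_file, the 1 GiB request size, and the fallback to a plain read/write loop are assumptions made for this example, not something taken from coreutils or from the comment above.

```python
import os

def copy_file(src_path, dst_path, chunk=1 << 30):
    """Copy src_path to dst_path, preferring copy_file_range(2).

    Returns the number of bytes copied. `chunk` is an illustrative
    upper bound on how much is requested per call.
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            total = 0
            while True:
                try:
                    # The kernel moves the data; both file offsets advance.
                    n = os.copy_file_range(src, dst, chunk)
                except (OSError, AttributeError):
                    # Assumed fallback: the call is missing or refused
                    # (for example EXDEV or ENOSYS), so copy one buffer
                    # through user space instead.
                    buf = os.read(src, 1 << 20)
                    n = len(buf)
                    view = memoryview(buf)
                    while view:
                        view = view[os.write(dst, view):]
                if n == 0:  # end of file
                    return total
                total += n
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Whether this beats a read/write loop on a given file system is something to measure rather than assume.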

You can also use sendfile to copy files from disk to socket and vice-versa. With kernel-based TLS (aka kTLS) you can do this on TLS-encrypted TCP sockets.
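A rough sketch of the disk-to-socket direction, using Python's os.sendfile on Linux. The helper name send_file_over_socket is hypothetical, the socket is assumed to be connected and blocking, and kTLS setup is deliberately left out.

```python
import os
import socket

def send_file_over_socket(path, sock):
    """Send the whole file at `path` over a connected, blocking socket.

    Uses sendfile(2), so the payload is not copied through user space.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # sendfile may send less than requested, so loop until done.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break  # the file shrank underneath us
            offset += sent
    return offset
```

For example, given a connected TCP socket `conn`, `send_file_over_socket("/tmp/example.bin", conn)` would stream that (hypothetical) file to the peer.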

I suppose somebody will at some point add a sendfile operation to io_uring, though that wouldn't necessarily buy you performance, just user space convenience.



Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 2:30 UTC (Wed) by Paf (subscriber, #91811) [Link] (4 responses)

Fun fact, from a file system developer:
sendfile is very often *not* faster than read and write because many file system implementations use a 64 KB pipe, generating 64 KB I/Os. It’s often much better to do larger I/O from user space, even on SSDs.
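A sketch of the user-space alternative: a read/write loop with one large, reused buffer. The 8 MiB default and the name copy_read_write are assumptions chosen for illustration; the right size depends on the file system and the device, so benchmark before settling on one.

```python
def copy_read_write(src_path, dst_path, bufsize=8 * 1024 * 1024):
    """Copy a file with a plain read/write loop and a large buffer.

    Returns the number of bytes copied. `bufsize` is an assumed value,
    not a recommendation from any file system's documentation.
    """
    buf = bytearray(bufsize)
    view = memoryview(buf)
    total = 0
    # buffering=0 gives raw file objects, so each call maps to one syscall.
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            n = src.readinto(buf)
            if not n:  # 0 means end of file
                return total
            pending = view[:n]
            while pending:  # handle short writes
                written = dst.write(pending)
                pending = pending[written:]
            total += n
```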

Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 3:46 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (2 responses)

Now I'm wondering how sendfile's performance compares to, say, mmap'ing the file and handing the pointer directly to write (or perhaps fwrite, if we're doing this from userspace). I can't imagine that it could be too much worse than what you describe, seeing as most of the real work is taking place in the page cache, which is supposed to be good at this sort of thing...
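The mmap-then-write idea can be tried from user space with something like the sketch below. The name copy_via_mmap is hypothetical, and this says nothing about how it performs relative to sendfile; it only shows the mechanics of mapping the source read-only and handing the mapping to write.

```python
import mmap
import os

def copy_via_mmap(src_path, dst_path):
    """Copy a file by mapping the source and writing from the mapping.

    Returns the number of bytes written.
    """
    with open(src_path, "rb") as src, \
         open(dst_path, "wb", buffering=0) as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as m:
            view = memoryview(m)
            try:
                done = 0
                while done < size:
                    # write() reads straight out of the mapped pages
                    # and may be short, hence the loop.
                    done += dst.write(view[done:])
            finally:
                view.release()  # must happen before the mapping closes
        return done
```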

Would it be legal/advisable for sendfile to actually be implemented by doing that (with write, not fwrite), in cases where the file to read is mmap'able? Or does that break some sort of "don't alloc buffers in userspace without userspace explicitly asking for it" rule? I can't imagine that any reasonable implementation of malloc would care, but I'm sure there's some ridiculous use case where you go trawling through /proc/self/maps and do something clever with it.

Alternative #2: What if you could mmap file A, somehow tell the kernel that "this mmap now belongs to file B," and then unmap it? I'm guessing that you can't do the second step, though. AFAIK, there's currently no such thing as filesystem-to-memory-to-filesystem COW, so you'd have to fault everything into memory, at which point this is clearly a terrible idea (for large files, anyway).

Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 13:51 UTC (Wed) by Paf (subscriber, #91811) [Link] (1 responses)

Yeah, I don’t fully follow some of the details of your thought, but I think you’ve found a few of the issues. My impression (I’m a file system developer, but I don’t actually know this; it’s just an impression) is that sendfile is implemented this way to avoid allocating large buffers inside the kernel.

You can do things like mmap in the kernel; in general you can kind of do ... whatever. But a lot of it is ugly or weird (using syscall-type functionality within the kernel is prone to weird reentrancy issues), or requires allocating big blobs of memory in the kernel.

Things like COW for page cache memory would be ... weird. It gets into areas I’m not familiar with, but I would think faulting, etc., might be a problem.

(Also, I misremembered above - sendfile is *always* via a pipe.)

Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 13:58 UTC (Wed) by Sesse (subscriber, #53779) [Link]

You can use splice() to go directly from one file descriptor to another with no pipe in-between (sendfile is just a special case of splice), or use the new copy_file_range() call.

Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 8:39 UTC (Wed) by LtWorf (subscriber, #124958) [Link]

Yeah, I had tried adding a bunch of ifdefs to a project of mine to use sendfile, but after checking the benchmarks I scrapped it all, as it was slower than the regular read/write loop.

