
Rewriting the GNU Coreutils in Rust


Posted Jun 9, 2021 3:46 UTC (Wed) by NYKevin (subscriber, #129325)
In reply to: Rewriting the GNU Coreutils in Rust by Paf
Parent article: Rewriting the GNU Coreutils in Rust

Now I'm wondering how sendfile's performance compares to, say, mmap'ing the file and handing the pointer directly to write (or perhaps fwrite, if we're doing this from userspace). I can't imagine that it could be too much worse than what you describe, seeing as most of the real work is taking place in the page cache, which is supposed to be good at this sort of thing...
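To make the comparison concrete, here is a rough sketch of the two approaches in Python, which exposes both calls fairly directly. The function names `copy_with_mmap` and `copy_with_sendfile` are made up for illustration, and the sendfile variant assumes Linux, where the destination may be a regular file. It only shows the shape of each copy; it says nothing about which one is faster.

```python
import mmap
import os


def copy_with_mmap(src_path, dst_path):
    """Map the source file and hand the mapping straight to write()."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        written = 0
        with mmap.mmap(src.fileno(), size, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                # write() may be partial, so loop over the remainder.
                while written < size:
                    with view[written:] as chunk:
                        written += os.write(dst.fileno(), chunk)
        return written


def copy_with_sendfile(src_path, dst_path):
    """Let the kernel move the data with sendfile(); no userspace mapping."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(dst.fileno(), src.fileno(), offset, size - offset)
            if sent == 0:
                break  # source shrank underneath us
            offset += sent
        return offset
```

Timing the two on a large, already-cached file would be the obvious next step for anyone who wants actual numbers.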

Would it be legal/advisable for sendfile to actually be implemented by doing that (with write, not fwrite), in cases where the file to read is mmap'able? Or does that break some sort of "don't alloc buffers in userspace without userspace explicitly asking for it" rule? I can't imagine that any reasonable implementation of malloc would care, but I'm sure there's some ridiculous use case where you go trawling through /proc/self/maps and do something clever with it.

Alternative #2: What if you could mmap file A, somehow tell the kernel that "this mmap now belongs to file B," and then unmap it? I'm guessing that you can't do the second step, though. AFAIK, there's currently no such thing as filesystem-to-memory-to-filesystem COW, so you'd have to fault everything into memory, at which point this is clearly a terrible idea (for large files, anyway).



Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 13:51 UTC (Wed) by Paf (subscriber, #91811)

Yeah, I don’t fully follow some of the details of your thought, but I think you’ve found a few of the issues. My impression (I’m a file system developer, but I don’t actually know this; it’s just an impression) is that sendfile is implemented this way to avoid allocating large buffers inside the kernel.

You can do things like mmap in the kernel; in general you can kind of do ... whatever. But a lot of it is ugly or weird (using syscall type functionality within the kernel is prone to weird reentrancy issues), or requires allocating big blobs of memory in the kernel.

Things like COW for page cache memory would be ... weird. It gets into areas I’m not familiar with, but I would think faulting, etc., might be a problem.

(Also, I misremembered above - sendfile is *always* via a pipe.)

Rewriting the GNU Coreutils in Rust

Posted Jun 9, 2021 13:58 UTC (Wed) by Sesse (subscriber, #53779)

You can use splice() to go directly from one file descriptor to another with no pipe in-between (sendfile is just a special case of splice), or use the new copy_file_range() call.
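A minimal sketch of the copy_file_range() route, again in Python (`os.copy_file_range`, available since Python 3.8 on Linux). The function name `copy_file_range_or_fallback`, the 1 MiB chunk size, and the set of errno values treated as "not supported here" are all assumptions chosen for illustration. The fallback is a plain read/write loop, since the call can be refused depending on kernel version and filesystem.

```python
import errno
import os

# Errors taken to mean "copy_file_range is not usable for this pair of files".
# This set is an assumption for the sketch, not an exhaustive list.
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL, errno.EOPNOTSUPP}


def copy_file_range_or_fallback(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst in the kernel if possible, else with read()/write()."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        in_fd, out_fd = src.fileno(), dst.fileno()
        try:
            while True:
                # With no explicit offsets, both file offsets advance.
                n = os.copy_file_range(in_fd, out_fd, chunk)
                if n == 0:
                    return copied  # end of file
                copied += n
        except AttributeError:
            pass  # this Python build has no os.copy_file_range
        except OSError as exc:
            if exc.errno not in _FALLBACK_ERRNOS:
                raise
        # Fallback: continue from wherever the file offsets currently are.
        while True:
            buf = os.read(in_fd, chunk)
            if not buf:
                return copied
            done = 0
            while done < len(buf):
                done += os.write(out_fd, buf[done:])
            copied += done
```

Because the offsets are left to the kernel, a failure partway through still leaves both descriptors positioned correctly for the fallback loop to pick up.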

