LWN: Comments on "How useful should copy_file_range() be?" https://lwn.net/Articles/846403/ This is a special feed containing comments posted to the individual LWN article titled "How useful should copy_file_range() be?". Thu, 16 Oct 2025 09:24:27 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/862418/ https://lwn.net/Articles/862418/ flussence <div class="FormattedComment"> <font class="QuotedText">&gt;Is the screen a file?</font><br> <font class="QuotedText">&gt;Is the keyboard a file?</font><br> <p> You&#x27;re being facetious, but it&#x27;s occasionally very useful to be able to do things like check which port a monitor is plugged in on or framedump the console on a server using only ssh.<br> </div> Sat, 10 Jul 2021 23:16:44 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/862414/ https://lwn.net/Articles/862414/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; Or at least, perhaps, allow open, for a hierarchical namespace, but not read, and reveal all information via ioctl.</font><br> Windows NT tried that. It failed miserably.<br> </div> Sat, 10 Jul 2021 21:48:42 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/862387/ https://lwn.net/Articles/862387/ mpr22 <div class="FormattedComment"> Those things are not files, but allowing userspace to treat them as file-like for the simplest common use case (sequential I/O) is one of the things that contributed to Unix eating most of the rest of the server operating system industry&#x27;s breakfast, lunch, dinner, and face.<br> </div> Sat, 10 Jul 2021 13:54:25 +0000 How useful should copy_file_range() be? 
https://lwn.net/Articles/862375/ https://lwn.net/Articles/862375/ jaykrell <div class="FormattedComment"> The bug is that /proc exists at all.<br> This data should be retrieved through strongly typed special purpose function calls.<br> Or at least, perhaps, allow open, for a hierarchical namespace, but not read, and reveal all information via ioctl.<br> <p> Not everything is a file!<br> <p> In fact, most things are not a file.<br> Is the screen a file?<br> Is the keyboard a file?<br> Are sockets files? They have read/write, but how about seek and mmap?<br> <p> </div> Sat, 10 Jul 2021 12:57:39 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/847320/ https://lwn.net/Articles/847320/ zuzzurro <div class="FormattedComment"> Given the amount of confusion that is visible in the thread where kernel programmers are trying to find the best way to fix this, I find it sad that people have tried to just throw the issue off to userland.<br> </div> Thu, 25 Feb 2021 11:12:21 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/847179/ https://lwn.net/Articles/847179/ jsmith45 <div class="FormattedComment"> Very true. The problem is that there is almost certainly a bunch of programs out there that will break if the file type of /proc pseudofiles changes to anything but a normal file.<br> </div> Tue, 23 Feb 2021 19:02:38 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846894/ https://lwn.net/Articles/846894/ markh <div class="FormattedComment"> The bug is that the kernel is reporting it as a regular file (in st_mode), but then does not satisfy the requirements for regular files. If the kernel does not want to satisfy those requirements, all that is needed is to report it as a type of file that does not have the requirements that it cannot satisfy. 
It is not reasonable to expect userspace programs to somehow guess that a file reported as a regular file cannot be relied upon to behave as such.<br> </div> Sat, 20 Feb 2021 17:01:45 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846888/ https://lwn.net/Articles/846888/ smurf <div class="FormattedComment"> Those fake kernel files don&#x27;t have POSIX semantics, period. Their reported size doesn&#x27;t correspond to the number of readable bytes, if they&#x27;re writeable you can&#x27;t just write &#x27;1&#x27; and then &#x27;23&#x27; when you intend to write &#x27;123&#x27;, and so on.<br> <p> So the bug is on the user. If you treat these things like ordinary files and expect all the posicky corner cases to work &quot;correctly&quot;, you&#x27;re SOL. These files will never have POSIX semantics. No, you can&#x27;t use libc to emulate it. Deal.<br> <p> Yes there should be a way to ask the kernel whether a file conforms to 100% posix. Well, we don&#x27;t have that. Deal.<br> <p> One possible workaround is to check the file size. If it&#x27;s smaller than pagesize*4 or so then it&#x27;s probably cheaper to copy its data the old-fashioned way anyway.<br> </div> Sat, 20 Feb 2021 14:45:12 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846878/ https://lwn.net/Articles/846878/ jengelh <div class="FormattedComment"> Looking at the POSIX description for the lseek(2) POSIX-C function, it is required to operate in terms of bytes.<br> <p> I am not getting that with the /proc/self/sched file (uses `seq_lseek`, which operates in terms of records). Is sys_llseek *meant* to be POSIX-compatible?<br> If yes, it&#x27;s a kernel bug.<br> If no, then it&#x27;s a libc bug because it failed to provide/emulate POSIX semantics on top of an (unposixy) kernel interface.<br> So, which is it, where should I file a bug?<br> </div> Sat, 20 Feb 2021 12:16:37 +0000 How useful should copy_file_range() be? 
https://lwn.net/Articles/846813/ https://lwn.net/Articles/846813/ the8472 <p>You're looking at the Linux implementation for <a rel="nofollow" href="https://doc.rust-lang.org/std/io/fn.copy.html">io::copy</a>, which can copy arbitrary readers to writers but specializes to various syscalls when types wrapping file descriptors of unknown type are passed, hence the complexity.</p> <p>There also is <a rel="nofollow" href="https://doc.rust-lang.org/std/fs/fn.copy.html">fs::copy</a>, which shares some code but has fewer cases to cover and does fewer checks before invoking <code>copy_file_range</code> since it's only meant to copy entire regular files.</p> <p>The relevant parts are <a rel="nofollow" href="https://github.com/rust-lang/rust/blob/7647d03c33339bd85a1665047b22ae7e800fee98/library/std/src/sys/unix/fs.rs#L1205-L1221">here</a> and <a rel="nofollow" href="https://github.com/rust-lang/rust/blob/7647d03c33339bd85a1665047b22ae7e800fee98/library/std/src/sys/unix/kernel_copy.rs#L549-L597">here</a>.</p> <p>Note that it doesn't use stat information to decide what to do in the <code>fs::copy</code> case; it just tries <code>copy_file_range</code> and then falls back in various cases, including 0 byte reads.</p> Fri, 19 Feb 2021 20:29:02 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846816/ https://lwn.net/Articles/846816/ fw I see, there is a different code path that reaches the <code>copy_file_range</code> system call. <p>On the other hand, it is still deeply nested within the library, and it is not immediately obvious whether the callers of the universal copy routines expect consistent file offsets on errors. For a function that is directly modeled on the system call (which seems to be the case for Go), predictable file offset behavior seems quite important.</p> Fri, 19 Feb 2021 18:55:50 +0000 How useful should copy_file_range() be? 
https://lwn.net/Articles/846799/ https://lwn.net/Articles/846799/ NYKevin <div class="FormattedComment"> If they are unwilling to flag the filesystem as a whole as FS_GENERATED_CONTENT, I can&#x27;t imagine they will be eager to flag *each individual file* with something else.<br> </div> Fri, 19 Feb 2021 17:33:20 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846741/ https://lwn.net/Articles/846741/ Deewiant <div class="FormattedComment"> The current code in Rust does seem to handle arbitrary fds too. It&#x27;s all here, but there&#x27;s a lot of code to read through (which supports your point that it&#x27;s difficult): <a href="https://github.com/rust-lang/rust/blob/7647d03c33339bd85a1665047b22ae7e800fee98/library/std/src/sys/unix/kernel_copy.rs">https://github.com/rust-lang/rust/blob/7647d03c33339bd85a...</a><br> <p> Looks like it starts by checking the underlying file type (with a stat() if necessary) and only tries copy_file_range on regular files of nonzero size (lines 122 and 168; and 284 and 469 for the logic leading to the stat() itself), while still falling back to other methods if copy_file_range only copied zero bytes (lines 175 and 563). Overall there seems to be a lot of logic around keeping track of what was actually written vs. what was reported by the syscalls.<br> </div> Fri, 19 Feb 2021 07:32:42 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846739/ https://lwn.net/Articles/846739/ fw The Go implementation works on file descriptors, not files. Full userspace emulation of <code>copy_file_range</code> is very hard: append-only output files, non-seekable input files that cannot restore the correct input read position after an output failure, other error conditions that are not recoverable because system calls fail during rollback. If it's difficult for the kernel, it's probably hard for userspace, too. 
<p>If you can just close the descriptors and report an error (because the function opened them locally), these issues do not apply.</p> Fri, 19 Feb 2021 07:14:21 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846727/ https://lwn.net/Articles/846727/ drinkcat <div class="FormattedComment"> Yeah... except for sysfs files that report a size of 4096 bytes, copy_file_range would appear to work on these.<br> </div> Fri, 19 Feb 2021 00:17:26 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846723/ https://lwn.net/Articles/846723/ JoeBuck <div class="FormattedComment"> Perhaps I&#x27;m missing something, but I thought that for the generated files, the call always transfers 0 bytes, and the Rust patch immediately falls back when it sees this. So how is seeking an issue?<br> <p> </div> Thu, 18 Feb 2021 23:49:51 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846713/ https://lwn.net/Articles/846713/ drinkcat <div class="FormattedComment"> That workaround should work in most cases. But another tricky thing with copy_file_range is that in case of partial writes, it&#x27;s supposed to be able to seek in the input file (which is not usually possible on generated files).<br> </div> Thu, 18 Feb 2021 22:37:17 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846678/ https://lwn.net/Articles/846678/ JoeBuck <div class="FormattedComment"> Looks like others can just port the Rust fix.<br> <p> </div> Thu, 18 Feb 2021 19:21:26 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846672/ https://lwn.net/Articles/846672/ smurf <div class="FormattedComment"> Some of these /proc files are quasi-seekable IIRC, in that they return an offset which indicates your read position but which does not correspond to the character count from beginning-of-file to wherever you were when you called lseek(fd, 0, SEEK_CUR).<br> </div> Thu, 18 Feb 2021 18:38:47 +0000 How useful should copy_file_range() be? 
https://lwn.net/Articles/846670/ https://lwn.net/Articles/846670/ Deewiant <div class="FormattedComment"> Rust&#x27;s standard library has been using copy_file_range for years. Though apparently a fix for these kinds of issues landed only six months ago: <a href="https://github.com/rust-lang/rust/commit/4ddedd521418d67e845ecb43dc02c09b0af53022">https://github.com/rust-lang/rust/commit/4ddedd521418d67e...</a><br> </div> Thu, 18 Feb 2021 18:24:37 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846662/ https://lwn.net/Articles/846662/ dezgeg <div class="FormattedComment"> There were objections to the FS_GENERATED_CONTENT flag due to &quot;madness and constant auditing&quot;... but how many such virtual filesystem types that need the flag actually exist, besides the 4 touched in the patchset? Is that really a number that one cannot count on one&#x27;s fingers?<br> </div> Thu, 18 Feb 2021 17:31:10 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846655/ https://lwn.net/Articles/846655/ matthias <div class="FormattedComment"> As I understand the man page, the offsets are adjusted by the number of bytes copied. So if you read n bytes and only write m&lt;n bytes, then both offsets are adjusted by m.<br> </div> Thu, 18 Feb 2021 16:43:17 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846654/ https://lwn.net/Articles/846654/ zuzzurro <div class="FormattedComment"> What if these &quot;non seekable&quot; files were simply flagged as named pipes?<br> </div> Thu, 18 Feb 2021 16:41:36 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846648/ https://lwn.net/Articles/846648/ zuzzurro <div class="FormattedComment"> Isn&#x27;t the problem caused by the fact that we have files in the system that pretend to be regular files but don&#x27;t really behave like them? 
If that&#x27;s the case, in the good old days people would create a new file type and flag them as such (p for named pipes, m for multiplexed files, s for sockets), so that userland would not assume that the normal file behaviour was available.<br> I guess it&#x27;s too late to do the same anymore...<br> </div> Thu, 18 Feb 2021 16:36:49 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846643/ https://lwn.net/Articles/846643/ dullfire <div class="FormattedComment"> Never mind, failed to re-read the man page before commenting. Seems the kernel already does this.<br> <p> So either the author was mistaken and it&#x27;s not actually an issue, or the kernel doesn&#x27;t update the read-side loff_t when there is a write failure, in which case the flag would just change this behavior.<br> </div> Thu, 18 Feb 2021 15:48:18 +0000 How useful should copy_file_range() be? https://lwn.net/Articles/846642/ https://lwn.net/Articles/846642/ dullfire <div class="FormattedComment"> I&#x27;m not sure adding the ability to report whether the short copy was due to the read or the write is really worthwhile.<br> <p> However, if that was really desired, I imagine a fairly simple way to do it is to add a flag, something like CFR_UPDATE_OFF_SIZE. Setting it would make the kernel update the loff_t&#x27;s (pointed to by off_in and off_out) with the bytes read and written, respectively. Userspace can then easily tell which side failed: if the two sides are equal, it was a read-side failure; if the read side is greater, it was a write failure. Note that it would be a logic error if the write side were greater than the read side.<br> </div> Thu, 18 Feb 2021 15:44:56 +0000
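The try-then-fall-back approach described in this thread (attempt copy_file_range, and switch to an ordinary read/write loop when it copies zero bytes or is not usable) can be sketched roughly as follows. This is a minimal Python illustration, not the Rust or Go code linked above; the name copy_fd, the chunk size, and the set of errno values that trigger the fallback are all assumptions made for the example.

```python
import errno
import os

# Assumption: errors after which retrying with read()/write() is reasonable.
# This set is illustrative, not an exhaustive or authoritative list.
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                    errno.EOPNOTSUPP, errno.EPERM}


def copy_fd(src_fd, dst_fd, chunk=1 << 16):
    """Copy from src_fd's current offset to EOF into dst_fd; return bytes copied."""
    total = 0
    # Fast path: let the kernel copy for as long as it keeps making progress.
    while hasattr(os, "copy_file_range"):
        try:
            n = os.copy_file_range(src_fd, dst_fd, chunk)
        except OSError as e:
            if e.errno in _FALLBACK_ERRNOS:
                break
            raise
        if n == 0:
            # Either real EOF or a generated file; read() below tells them apart.
            break
        total += n
    # Slow path: continues from wherever the descriptors' offsets now are.
    while True:
        buf = os.read(src_fd, chunk)
        if not buf:
            return total
        view = memoryview(buf)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            view = view[os.write(dst_fd, view):]
        total += len(buf)
```

A zero return is treated as "maybe EOF" rather than as success, so a generated file that reports zero bytes through copy_file_range still gets read the ordinary way. The sketch makes no attempt to roll back offsets after a failed write, which is the hard part raised elsewhere in the thread.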