LWN: Comments on "splice() and the ghost of set_fs()" https://lwn.net/Articles/896267/
This is a special feed containing comments posted to the individual LWN article titled "splice() and the ghost of set_fs()".
Sun, 09 Nov 2025 00:32:09 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/897418/ stem
<div class="FormattedComment">
<font class="QuotedText">&gt; One is that while set_fs() is active, various security measures (like SMEP and SMAP) are defeated.</font><br>
Are you sure?<br>
AFAIK, set_fs() has nothing to do with SM*P; it was abused wrt access_ok() - copy_*_user().<br>
</div>
Thu, 09 Jun 2022 17:06:19 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896531/ willy
<div class="FormattedComment">
That&#x27;s not the problem.<br>
<p>
The problem is twofold. One is that while set_fs() is active, various security measures (like SMEP and SMAP) are defeated. The other is that (on some architectures, and e.g. on a 4GB/4GB split x86-32) you may not actually be able to access userspace, because accessing userspace actually accesses kernel space. On x86-64, you can tell from the high bits of the pointer whether it&#x27;s userspace or kernel space, but that&#x27;s not true e.g. on SPARC or PA-RISC.<br>
</div>
Sat, 28 May 2022 03:56:23 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896527/ wahern
<p>This was exactly Linus' original reasoning wrt vmsplice:</p>
<blockquote><pre>
On Thu, 20 Apr 2006, Piet Delaney wrote:
&gt;
&gt; What about marking the pages Read-Only while it's being used by the
&gt; kernel

NO!

That's a huge mistake, and anybody that does it that way (FreeBSD) is totally incompetent.

[...]

That cost is _bigger_ than the cost of just copying the page in the first place.
</pre></blockquote>
<p>Source: <a href="https://lkml.org/lkml/2006/4/20/310">https://lkml.org/lkml/2006/4/20/310</a></p>
<p>See also <a href="https://lkml.org/lkml/2006/4/19/237">Linus' justifications of splice and tee</a> earlier in that thread. vmsplice is also briefly mentioned, and it's implicit from the context that much of the net value of vmsplice comes from combining with tee, as you mentioned earlier.</p>
Sat, 28 May 2022 02:37:26 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896496/ SLi
<div class="FormattedComment">
Is this something that is a problem only because RAII is not used (and couldn&#x27;t you _really_ get something like that working for the kernel), or do I misunderstand the original problem?<br>
<p>
It seems like a silly limitation to me; as far as I can tell, it&#x27;s 100% knowable at the time of the initial set_fs() call that you want to reset it exactly when the current function (or block) exits. It&#x27;s not even something that would require producing code that is slower than the correct C code.<br>
</div>
Fri, 27 May 2022 18:02:27 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896495/ Sesse
<div class="FormattedComment">
I believe the gift flag is seen as a mistake in retrospect.<br>
<p>
FreeBSD has CoW on send(), I believe, but of course that means you need to go through a rather expensive page fault when/if the data changes.<br>
</div>
Fri, 27 May 2022 17:44:53 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896494/ NYKevin
<div class="FormattedComment">
That still doesn&#x27;t explain why you need the silly GIFT flag. Why can&#x27;t the kernel just mark the offending pages as COW, like fork(2) does? You could do that without a special flag, because it should be transparent to userspace.
Indeed, you can do that even for write(2), if you really want to.<br>
</div>
Fri, 27 May 2022 17:42:32 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896466/ Sesse
<div class="FormattedComment">
vmsplice() is for sending the same data multiple times, I believe? E.g., pre-canned HTTP headers or small responses. vmsplice() once to get it from userspace into the kernel, then you can splice multiple times with copy.<br>
</div>
Fri, 27 May 2022 13:09:25 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896420/ gerdesj
<div class="FormattedComment">
&quot;But it is true that this type of episode makes the kernel&#x27;s &quot;no regressions&quot; rule look a bit more like just a guideline.&quot;<br>
<p>
A bit like the Constitution for the UKoGBnNI - <a href="https://en.wikipedia.org/wiki/Constitution_of_the_United_Kingdom">https://en.wikipedia.org/wiki/Constitution_of_the_United_...</a> - &quot;Unlike in most countries, no attempt has been made to codify such arrangements into a single document.&quot; <br>
<p>
... and yet somehow we struggle along. <br>
<p>
It might look a bit daft to conflate the Linux kernel&#x27;s governance with the UK&#x27;s legal system, but I think it is quite instructive. One of our cherished principles is the idea that you should be able to &quot;quietly enjoy&quot; your property - I can&#x27;t remember the exact term. I think a coal mine making a racket caused the formative judgment.<br>
<p>
So, the no-regressions thing can be seen in similar terms: Don&#x27;t break people&#x27;s code.<br>
<p>
However, we have to be practical and sometimes stuff has to change.<br>
</div>
Fri, 27 May 2022 01:03:21 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896415/ NYKevin
<div class="FormattedComment">
I think that depends on the use case.
When fd_in is a pipe, splice should be quite fast, because the man page says it just copies pointers to individual pages. If you use a naive sendfile-like implementation, suddenly you&#x27;re making real copies. Or at least, that&#x27;s what I was able to figure out from the man pages, anyway.<br>
<p>
Side note: Why are there so many syscalls that do almost, but not quite, entirely the same thing in this space? We&#x27;ve also got copy_file_range(2), which seems to be the same as splice(2) but both fds must be normal files. And then there&#x27;s vmsplice(2), which appears to be exactly the same as read(2)/write(2), but with an overly-complicated API, unless you pass SPLICE_F_GIFT, which looks to be the &quot;I&#x27;m doing something ridiculous, don&#x27;t judge me&quot; flag. And I imagine there&#x27;s also some io_uring equivalent to this madness, too. Why is there not a simple, all-purpose &quot;move data from here to here and don&#x27;t bother me about the details, just do whatever&#x27;s fastest or most reasonable&quot; syscall?<br>
<p>
* splice isn&#x27;t it, because splice requires one of the fds to be a pipe.<br>
* copy_file_range isn&#x27;t it, because it requires *both* of the fds to be normal files.<br>
* sendfile isn&#x27;t it, because it&#x27;s missing an offset argument for the output file, and the input file must not be a socket.<br>
* io_uring isn&#x27;t it, because it&#x27;s like five syscalls and a userspace buffer, not one fire-and-forget syscall.<br>
</div>
Thu, 26 May 2022 23:55:06 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896414/ josh
<div class="FormattedComment">
splice has to keep working; does it have to keep working *fast*?
Could it become a wrapper around sendfile-like semantics, and then just have specific cases where it can go faster?<br>
</div>
Thu, 26 May 2022 22:23:24 +0000

splice() and the ghost of set_fs() https://lwn.net/Articles/896406/ zx2c4
<div class="FormattedComment">
The ensuing performance discussion is somewhat interesting. Currently it&#x27;s taking place across a few threads. My tracking of it is:<br>
<p>
- I noticed the slowdown here:<br>
<a href="https://lore.kernel.org/lkml/Yoey+FOYO69lS5qP@zx2c4.com/">https://lore.kernel.org/lkml/Yoey+FOYO69lS5qP@zx2c4.com/</a><br>
- Jens confirmed it&#x27;s around 3%:<br>
<a href="https://lore.kernel.org/lkml/0a6ed6b9-0917-0d83-5c45-70ff58fad429@kernel.dk/">https://lore.kernel.org/lkml/0a6ed6b9-0917-0d83-5c45-70ff...</a><br>
- Relatedly, I had proposed doing the same thing to /dev/zero:<br>
<a href="https://lore.kernel.org/lkml/20220520135030.166831-1-Jason@zx2c4.com/">https://lore.kernel.org/lkml/20220520135030.166831-1-Jaso...</a><br>
- Jens liked the idea, but Al pointed out the performance issues, and later started figuring out why:<br>
<a href="https://lore.kernel.org/lkml/Yokmu7bQpg70Bp8R@zeniv-ca.linux.org.uk/">https://lore.kernel.org/lkml/Yokmu7bQpg70Bp8R@zeniv-ca.li...</a><br>
- Al references resurrecting a particularly relevant older thread on fsdevel:<br>
<a href="https://lore.kernel.org/linux-fsdevel/Yokl+uHTVWFxoQGn@zeniv-ca.linux.org.uk/">https://lore.kernel.org/linux-fsdevel/Yokl+uHTVWFxoQGn@ze...</a><br>
- This thread is now many messages deep. That&#x27;s where things are at now.<br>
- Looks like Al&#x27;s got some patches he&#x27;s playing with in<br>
<a href="https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/log/?h=new.iov_iter">https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs....</a><br>
<p>
Hopefully those close the performance gap and then all drivers get faster.<br>
</div>
Thu, 26 May 2022 21:09:35 +0000