splice() and the ghost of set_fs()
Posted May 26, 2022 22:23 UTC (Thu) by josh (subscriber, #17465)
Parent article: splice() and the ghost of set_fs()
Posted May 26, 2022 23:55 UTC (Thu) by NYKevin (subscriber, #129325)
Side note: Why are there so many syscalls that do almost, but not quite, entirely the same thing in this space? We've also got copy_file_range(2), which seems to be the same as splice(2) but both fds must be normal files. And then there's vmsplice(2), which appears to be exactly the same as read(2)/write(2), but with an overly-complicated API, unless you pass SPLICE_F_GIFT, which looks to be the "I'm doing something ridiculous, don't judge me" flag. And I imagine there's also some io_uring equivalent to this madness, too.

Why is there not a simple, all-purpose "move data from here to here and don't bother me about the details, just do whatever's fastest or most reasonable" syscall?
* splice isn't it, because splice requires one of the fds to be a pipe.
* copy_file_range isn't it, because it requires *both* of the fds to be normal files.
* sendfile isn't it, because it's missing an offset argument for the output file, and the input file must not be a socket.
* io_uring isn't it, because it's like five syscalls and a userspace buffer, not one fire-and-forget syscall.

Posted May 27, 2022 13:09 UTC (Fri) by Sesse (subscriber, #53779)

Posted May 27, 2022 17:42 UTC (Fri) by NYKevin (subscriber, #129325)

Posted May 27, 2022 17:44 UTC (Fri) by Sesse (subscriber, #53779)
FreeBSD has CoW on send(), I believe, but of course that means you need to go through a rather expensive page fault when/if the data changes.

Posted May 28, 2022 2:37 UTC (Sat) by wahern (subscriber, #37304)
This was exactly Linus' original reasoning wrt vmsplice:
On Thu, 20 Apr 2006, Piet Delaney wrote:
>
> What about marking the pages Read-Only while it's being used by the
> kernel
NO!
That's a huge mistake, and anybody that does it that way (FreeBSD) is totally incompetent.
[...]
That cost is _bigger_ than the cost of just copying the page in the first place.
Source: https://lkml.org/lkml/2006/4/20/310
See also Linus' justifications of splice and tee earlier in that thread. vmsplice is also briefly mentioned, and it's implicit from the context that much of the net value of vmsplice comes from combining with tee, as you mentioned earlier.
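
For what it's worth, the "just do whatever's fastest" behaviour asked about upthread can be approximated in userspace with a fallback chain. Below is a minimal Python sketch, assuming Linux and fds that are used at their current offsets. `copy_fd` and its helpers are hypothetical names, not an existing API, and both the chunk size and the order of attempts (copy_file_range(), then sendfile(), then plain read()/write()) are assumptions rather than measured choices.

```python
import os

CHUNK = 1 << 20  # bytes requested per call; arbitrary for this sketch


def _copy_file_range(src, dst, want):
    # In-kernel copy; expected to work when both fds are regular files.
    return os.copy_file_range(src, dst, want)


def _sendfile(src, dst, want):
    # offset=None (Linux only): read from src's current position and advance it.
    return os.sendfile(dst, src, None, want)


def _read_write(src, dst, want):
    # Last resort: bounce the data through a userspace buffer.
    data = os.read(src, want)
    view = memoryview(data)
    while view:
        view = view[os.write(dst, view):]  # handle short writes
    return len(data)


def copy_fd(src, dst, count):
    """Copy up to `count` bytes from fd `src` to fd `dst`; return bytes copied.

    Hypothetical helper: tries the cheaper calls first and drops any that
    fail with OSError, ending with a read()/write() loop.
    """
    strategies = [_sendfile, _read_write]
    if hasattr(os, "copy_file_range"):
        strategies.insert(0, _copy_file_range)

    copied = 0
    while copied < count:
        want = min(CHUNK, count - copied)
        try:
            n = strategies[0](src, dst, want)
        except OSError:
            if len(strategies) == 1:
                raise  # even read()/write() failed: a real error
            strategies.pop(0)  # this call can't handle these fds; try the next
            continue
        if n == 0:  # end of input
            break
        copied += n
    return copied
```

A fast path that fails is simply dropped for the rest of the copy, so a genuine I/O error only surfaces once the read()/write() loop hits it too. That keeps the sketch short, at the price of hiding which call actually did the work.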
