
The Linux "copy problem"

Posted May 29, 2019 19:57 UTC (Wed) by desbma (guest, #118820)
Parent article: The Linux "copy problem"

No mention of the sendfile system call?
It does not work in every case, but when it does, it is much more efficient than a chunk-based copy, and it sidesteps the "what is the optimal I/O size" dilemma.
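For readers unfamiliar with the call, here is a minimal sketch of a whole-file copy built on sendfile(2). The helper name `copy_with_sendfile`, the `MAX_CHUNK` constant, and the error-handling policy are assumptions made for illustration; nothing here comes from the article or from any existing tool. It assumes Linux 2.6.33 or later, where the destination descriptor may be a regular file, and it does not fall back to read/write when sendfile is unsupported for the given files.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: cap each request at the documented per-call limit of
 * sendfile(2) on Linux (0x7ffff000 bytes). */
#define MAX_CHUNK ((off_t)0x7ffff000)

/* Hypothetical helper: copy src_path to dst_path with sendfile(2).
 * Returns 0 on success, -1 on error with errno set. */
int copy_with_sendfile(const char *src_path, const char *dst_path)
{
    int result = -1;
    int out_fd = -1;
    int saved_errno;
    off_t remaining;
    struct stat st;

    int in_fd = open(src_path, O_RDONLY);
    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) < 0)
        goto done;

    out_fd = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out_fd < 0)
        goto done;

    /* No userspace buffer and no chunk size to tune: the kernel decides
     * how much to move per call, and we loop until everything is sent. */
    remaining = st.st_size;
    while (remaining > 0) {
        size_t chunk = remaining > MAX_CHUNK ? (size_t)MAX_CHUNK
                                             : (size_t)remaining;
        ssize_t sent = sendfile(out_fd, in_fd, NULL, chunk);
        if (sent < 0) {
            if (errno == EINTR)
                continue;
            goto done;
        }
        if (sent == 0)
            break; /* source shrank while copying; stop with a short copy */
        remaining -= sent;
    }
    result = 0;

done:
    saved_errno = errno;
    if (out_fd >= 0)
        close(out_fd);
    close(in_fd);
    errno = saved_errno;
    return result;
}
```

A caller would use it as `copy_with_sendfile("a.bin", "b.bin")`. A real tool would also want a read/write fallback for the cases where sendfile fails with EINVAL or ENOSYS.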



The Linux "copy problem"

Posted May 29, 2019 21:38 UTC (Wed) by ewen (subscriber, #4772) [Link]

The sendfile system call was my thought too. If a copy tool used sendfile, e.g. for copying within a file system, then the kernel would have much more information about the high-level goal with which to optimise the approach taken. And even between file systems it could potentially make use of file-system-internal knowledge (block layout, etc.) and buffer access.

If copy tools just do read/write in a loop, the kernel is left guessing the high-level intent (including how much to read ahead and whether to cache the data in the kernel buffers).

Ewen

The Linux "copy problem"

Posted May 30, 2019 13:22 UTC (Thu) by ecree (guest, #95790) [Link] (3 responses)

I was wondering about splice(). Wouldn't that be a more efficient way to do copies?

The Linux "copy problem"

Posted May 30, 2019 13:38 UTC (Thu) by desbma (guest, #118820) [Link] (2 responses)

According to `man splice`:
> splice() moves data between two file descriptors [...] where one of the file descriptors must refer to a pipe

sendfile does not have this restriction (it can also work if the destination fd is a socket), so it is the ideal candidate for copying files.

Apparently, someone proposed updating coreutils' cp to use sendfile in 2012, but it was rejected: https://lists.gnu.org/archive/html/coreutils/2012-10/msg0...

A while ago I did some benchmarks in Python to compare "read/write chunk" vs "sendfile" based copying, and sendfile led to a 30-50% speedup: https://github.com/desbma/pyfastcopy#performance

The Linux "copy problem"

Posted May 30, 2019 13:54 UTC (Thu) by ecree (guest, #95790) [Link] (1 responses)

> > one of the file descriptors must refer to a pipe
Yes, so you make two splice calls:

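int p[2];
pipe(p);                                      /* p[0] is the read end, p[1] the write end */
splice(fd_in, NULL, p[1], NULL, len, flags);  /* source file -> pipe */
splice(p[0], NULL, fd_out, NULL, len, flags); /* pipe -> destination file */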

I haven't actually tried this, but in theory it should enable the kernel to do a zero-copy copy where the underlying files support that. The pipe is really no more than a way to associate a userspace handle with a kernel buffer; see https://yarchive.net/comp/linux/splice.html for details.
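A slightly fuller sketch of the same idea follows, with the caveat that a single pair of calls moves at most one pipe's worth of data (64 KiB by default on Linux), so a real copy has to loop. The helper name `splice_copy` and its return convention are assumptions for illustration. `SPLICE_F_MOVE` is only a hint to the kernel, and `fd_out` must not have been opened with `O_APPEND`, since splice() rejects that.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: copy up to len bytes from fd_in to fd_out by
 * splicing through a pipe. Returns the number of bytes copied, or -1 on
 * error with errno set. */
ssize_t splice_copy(int fd_in, int fd_out, size_t len)
{
    int p[2];
    int saved_errno;
    ssize_t total = 0;
    ssize_t n, m, left;

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Source -> pipe: returns at most what fits in the pipe. */
        n = splice(fd_in, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of input */
        len -= (size_t)n;

        /* Pipe -> destination: drain everything just queued. */
        left = n;
        while (left > 0) {
            m = splice(p[0], NULL, fd_out, NULL, (size_t)left, SPLICE_F_MOVE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0) {
                if (m == 0)
                    errno = EIO; /* pipe unexpectedly empty */
                total = -1;
                goto out;
            }
            left -= m;
        }
        total += n;
    }

out:
    saved_errno = errno;
    close(p[0]);
    close(p[1]);
    errno = saved_errno;
    return total;
}
```

Counting the pipe() call, this costs at least three system calls per copy and two more for each additional pipe-sized piece.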

The Linux "copy problem"

Posted May 30, 2019 14:00 UTC (Thu) by desbma (guest, #118820) [Link]

> I haven't actually tried this, but in theory it should enable the kernel to do a zero-copy copy where the underlying files support that.

This is exactly what sendfile does, with a single system call, instead of 3 for your example.

