The Linux "copy problem"

The Linux "copy problem"

Posted May 29, 2019 20:20 UTC (Wed) by roc (subscriber, #30627)
Parent article: The Linux "copy problem"

> Mason pointed out that kernel developers are not ambassadors to go fix applications across the open-source world, however; "our job is to build the interfaces", so that is where the focus of the discussion should be.

It seems like it would be worth at least diving into `cp` and making sure it's as good as can be, since that gets used so much. Then you can point developers of those other applications at `cp` as an example of how to do things right.


The Linux "copy problem"

Posted May 29, 2019 21:05 UTC (Wed) by smfrench (subscriber, #124116) [Link] (7 responses)

In the presentation I listed seven options that could be added (e.g. to cp and rsync). Other copy tools (like robocopy for Windows) have these (as well as others that may be less important for us on Linux) and may be useful examples.

For example some options which other tools like robocopy let the user select:
- parallel i/o (especially for the uncached copy case)
- allow setting file size first (to reduce the number of metadata updates during the copy operation)
- allow calling the copy system call (copy_file_range API) for file systems which support it
- allow copying additional metadata (e.g. xattrs and ACLs)
- allow choosing larger I/O sizes (overriding the block size). For some filesystems, I/O larger than 1MB can be much faster than small I/O (some tools default to 4K or smaller, which can be more than 10 times slower)

And then following up on other discussions at the summit:
- allow options like encryption or compression (which could be supported over SMB3 for example and probably other filesystems).
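
As a rough illustration of three of the options above (setting the file size first, using larger I/O, and calling copy_file_range where it works), here is a minimal Python sketch. It is not how cp, rsync or robocopy implement any of this; the function name `copy_file` and the 4MB `CHUNK` size are assumptions chosen for the example. `os.copy_file_range` is Python's wrapper for the system call (Linux, Python 3.8+), and the sketch falls back to plain pread/pwrite if the call is missing or fails.

```python
import os

# Assumed I/O size for the sketch: well above the 4K default mentioned above.
CHUNK = 4 * 1024 * 1024


def copy_file(src_path, dst_path, chunk=CHUNK):
    """Copy src_path to dst_path; return the number of bytes copied."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        sfd, dfd = src.fileno(), dst.fileno()
        size = os.fstat(sfd).st_size
        # Set the final size up front so the file is not grown
        # (with a metadata update) on every write.
        os.ftruncate(dfd, size)
        use_cfr = hasattr(os, "copy_file_range")
        copied = 0
        while copied < size:
            want = min(chunk, size - copied)
            if use_cfr:
                try:
                    # In-kernel copy; the filesystem may offload or reflink it.
                    n = os.copy_file_range(sfd, dfd, want, copied, copied)
                except OSError:
                    # Not supported here: retry this range in userspace.
                    use_cfr = False
                    continue
            else:
                buf = os.pread(sfd, want, copied)
                n = os.pwrite(dfd, buf, copied)
            if n == 0:  # source shrank while we were copying
                break
            copied += n
        if copied < size:
            os.ftruncate(dfd, copied)
        return copied
```

Calling `copy_file("a.img", "b.img")` (hypothetical file names) copies in 4MB steps; passing a different `chunk` is the equivalent of an "override the block size" option.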

The Linux "copy problem"

Posted May 29, 2019 23:05 UTC (Wed) by roc (subscriber, #30627) [Link]

That makes sense, but you also want the default to be as good as can be.

The Linux "copy problem"

Posted May 30, 2019 16:20 UTC (Thu) by boutcher (subscriber, #7730) [Link]

I had to laugh that you brought up OS/2

The Linux "copy problem"

Posted Jun 1, 2019 1:40 UTC (Sat) by tarkasteve (subscriber, #94934) [Link] (4 responses)

I'd also humbly suggest `xcp`:

https://crates.io/crates/xcp

* Uses copy_file_range() where possible, falls back to userspace if not.
* Supports sparse files (with lseek; I wasn't aware of fiemap, is there any advantage to one over the other?)
* Partially parallel (recursive read is separate from copy operations; I have a todo for parallel copy as it seems to have advantages on nvme drives).
* Optional progress bar.
* Written in Rust
* Cross platform (well, Linux + other unix-like OSs; Windows may work, I've never managed to get Rust to work on it).
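
For readers unfamiliar with the lseek approach to sparse files mentioned in the list, here is a minimal Python sketch of the general technique. It is not xcp's actual code (xcp is written in Rust), and the names `data_extents` and `sparse_copy` are made up for the example. It walks the source with SEEK_DATA/SEEK_HOLE and copies only the regions that hold data, leaving the rest of the destination as holes.

```python
import errno
import os


def data_extents(fd, size):
    """Yield (offset, length) for each region of fd that holds data."""
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # only a hole remains after pos
                return
            raise
        # There is always an implicit hole at end of file.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end - start
        pos = end


def sparse_copy(src_path, dst_path, chunk=1024 * 1024):
    """Copy src_path to dst_path, skipping holes; return the file size."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        sfd, dfd = src.fileno(), dst.fileno()
        size = os.fstat(sfd).st_size
        # Extend dst to full size; ranges never written stay as holes.
        os.ftruncate(dfd, size)
        for off, length in data_extents(sfd, size):
            done = 0
            while done < length:
                buf = os.pread(sfd, min(chunk, length - done), off + done)
                if not buf:
                    break
                done += os.pwrite(dfd, buf, off + done)
        return size
```

On filesystems that do not track holes, the kernel reports the whole file as one data extent, so the sketch degrades to an ordinary full copy.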

It doesn't support much in the way of permissions/ACLs ATM, it's still an intermittent WIP.

I did look at using O_DIRECT, but I get EINVAL. The open manpage lists a whole series of caveats and warnings about using it, including a disparaging quote from Linus.

Thanks for the discussion/article, it's given me some things to look into.

The Linux "copy problem"

Posted Jun 1, 2019 13:04 UTC (Sat) by desbma (guest, #118820) [Link] (1 responses)

Thanks for the link.

It joins the list of great little tools that have taken inspiration from classic Unix command line tools, but rewritten them in Rust with many improvements along the way: grep -> ripgrep, find -> fd, hexdump -> hexyl, cat -> bat, du -> diskus, cloc -> tokei...

I'll be sure to look into xcp, and probably open a few issues along the way :)

The Linux "copy problem"

Posted Jun 2, 2019 3:02 UTC (Sun) by scientes (guest, #83068) [Link]

I myself was using inotail until I reported the problem (tail -f didn't support inotify) to coreutils and it was actually fixed.

The Linux "copy problem"

Posted Jun 2, 2019 5:12 UTC (Sun) by tarkasteve (subscriber, #94934) [Link] (1 responses)

So inspired by all this, I've updated xcp with the ability to do parallel copies (at the per-file level). The results are fairly good; I'm seeing 30%-60% speed-ups depending on caching.

The Linux "copy problem"

Posted Jun 10, 2019 21:58 UTC (Mon) by smfrench (subscriber, #124116) [Link]

This is great news - looking forward to trying it. I am also very excited about the work Andreas at Red Hat did enabling GCM crypto for SMB3.1.1 mounts, which can more than double performance when copying files to a server over encrypted mounts (in conjunction with two cifs.ko client patches that I recently merged into for-next that enable GCM on the client).

The Linux "copy problem"

Posted May 30, 2019 7:29 UTC (Thu) by k3ninho (subscriber, #50375) [Link]

I had a range of responses to this 'build the interfaces' thing.

How people use those interfaces is something that needs guidance, always: you form a symbiotic loop in reliance on each other.

Conway's Law hasn't gone away and its implications about interacting with the users of your interfaces still stand. I think that Chris Mason's comment here is misguided and that we need to help people work together and communicate better. I think we need to balance this view of APIs with criticism of xattrs being difficult to copy and racy (with security implications).

K3n.

The Linux "copy problem"

Posted May 30, 2019 16:00 UTC (Thu) by KAMiKAZOW (guest, #107958) [Link] (5 responses)

> It seems like it would be worth at least diving into `cp` and making sure it's as good as can be, since that gets used so much.

I find it insane that cp was fixed a decade ago by NASA and in all that time neither they nor anybody else thought about upstreaming the changes.

The Linux "copy problem"

Posted May 30, 2019 16:17 UTC (Thu) by desbma (guest, #118820) [Link] (4 responses)

Unfortunately, there are many other examples like this.

For example zlib, one of the most widely used software libraries in the world, has several forks (Intel, Cloudflare, zlib-ng...) with optimizations that improve compression/decompression speed.

Yet the changes have never been merged back in zlib, and everybody still uses the historic version, and happily wastes CPU cycles (including when your browser decompresses this very page).

The Linux "copy problem"

Posted Jun 2, 2019 3:07 UTC (Sun) by scientes (guest, #83068) [Link] (3 responses)

> (including when your browser decompresses this very page).

Compression is disabled for https sites due to various attacks on the file size information leak.

The Linux "copy problem"

Posted Jun 2, 2019 10:27 UTC (Sun) by desbma (guest, #118820) [Link] (2 responses)

My browser (and curl) both disagree with you:
curl -v --compressed 'https://lwn.net/' > /dev/null 2>&1 | grep gzip
> Accept-Encoding: deflate, gzip
< Content-Encoding: gzip

The Linux "copy problem"

Posted Jun 2, 2019 12:20 UTC (Sun) by Jandar (subscriber, #85683) [Link] (1 responses)

> curl -v --compressed 'https://lwn.net/' > /dev/null 2>&1 | grep gzip

This command is obviously without any output.

$ curl -v --compressed 'https://lwn.net/' > /dev/null 2>&1 | wc
0 0 0

Perhaps you meant: curl -v --compressed 'https://lwn.net/' 2>&1 > /dev/null | grep gzip
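
The ordering difference can be reproduced without the network. The Python sketch below runs both forms through `sh` (assumed here to be a POSIX shell), using a made-up stand-in command `CMD` that prints one line to stdout and one to stderr, much as `curl -v` sends the body to stdout and the headers to stderr. A POSIX shell connects the pipe first and then applies redirections left to right, so `2>&1` only reaches the pipe if it comes before `> /dev/null`.

```python
import subprocess

# Hypothetical stand-in for `curl -v`: body on stdout, headers on stderr.
CMD = "{ echo body; echo headers >&2; }"


def piped(redirections):
    """Run CMD with the given redirections in sh; return what reaches the pipe."""
    script = f"{CMD} {redirections} | cat"
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.stdout


# stdout goes to /dev/null, then stderr is pointed at the same place.
print(repr(piped("> /dev/null 2>&1")))   # ''
# stderr is pointed at the pipe, then stdout alone goes to /dev/null.
print(repr(piped("2>&1 > /dev/null")))   # 'headers\n'
```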

The Linux "copy problem"

Posted Jun 2, 2019 12:45 UTC (Sun) by desbma (guest, #118820) [Link]

You are right, I'm using ZSH and didn't realize that line was not portable across other shells.

curl -v --compressed 'https://lwn.net/' -o /dev/null 2>&1 | grep gzip

also works
