A look at rsync performance

Posted Aug 19, 2010 10:53 UTC (Thu) by ewen (subscriber, #4772)
In reply to: A look at rsync performance by Liefting
Parent article: A look at rsync performance

If you have a directory with, say, 100 * 2GB files in it, and another directory which has 96 of those files, and a few older ones, then using your tar pipeline requires transferring 200GB of data -- but using rsync only requires transferring the 4 missing files, 8GB of data. I know which I'd prefer. (And the tar technique still leaves you having to figure out which files no longer belong and remove them.)
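To make that arithmetic concrete, here is a minimal sketch of the "what needs copying, what needs removing" decision. It is only an illustration under stated assumptions: the file names are hypothetical, files are compared by name alone, and this is not how rsync itself decides (rsync by default also looks at size and modification time).

```python
def plan_sync(src, dst):
    """Return (to_copy, to_delete, bytes_to_copy) for two name -> size mappings.

    Assumption for this sketch: a file with the same name on both sides is
    treated as identical. Real tools check more than the name.
    """
    # Files missing from the destination have to be transferred in full.
    to_copy = sorted(name for name in src if name not in dst)
    # Files present only in the destination are stale and need removing.
    to_delete = sorted(name for name in dst if name not in src)
    bytes_to_copy = sum(src[name] for name in to_copy)
    return to_copy, to_delete, bytes_to_copy


GB = 10**9

# Hypothetical names: 100 source files of 2GB each.
src = {f"file{i:03d}": 2 * GB for i in range(100)}
# The destination already has 96 of them, plus a few older leftovers.
dst = {f"file{i:03d}": 2 * GB for i in range(96)}
dst.update({f"old{i}": 2 * GB for i in range(3)})

to_copy, to_delete, nbytes = plan_sync(src, dst)
print(len(to_copy), "files to copy,", nbytes // GB, "GB")          # 4 files, 8 GB
print(len(to_delete), "stale files to delete")                     # 3 files
print("a full copy would send", sum(src.values()) // GB, "GB")     # 200 GB
```

The stale-file list is the part the tar pipeline leaves for you to work out by hand.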

There are a bunch of reasons for using rsync as shorthand for "make these two directories the same", even without needing the rsync algorithm to synchronise changes within an individual file. And it seems to me that adding a special case for "whole new file" to the rsync program, one that copied with maximum efficiency, would be valuable, which I think was one of the points of the original article.

Ewen

PS: I use "tar -cpf - . | (cd /dest && tar -xpf -)" for a bunch of safety reasons, and to preserve at least some permissions. With GNU tar that'll copy most things; with traditional unix tar, less so, but it gets closer than most tools on traditional unix. (GNU cp has an "-a" extension which will also preserve most things.)

PPS: For the later questioner, using a tar pipeline historically had better performance because it scheduled two processes which kept more I/O in flight. I've not looked recently to see if that's still the case, and given the performance numbers in the article it may not be (eg, the kernel's readahead may do just as well, if not better).


A look at rsync performance

Posted Aug 19, 2010 22:53 UTC (Thu) by Comet (subscriber, #11646)

-W, --whole-file copy files whole (without rsync algorithm)

A look at rsync performance

Posted Aug 20, 2010 4:29 UTC (Fri) by jcvw (subscriber, #50475)

I tried that. It doesn't help. The amount of user and system time is still incredibly high (compared to a simple cp). rsync doesn't use a tight read/write loop, like cp does, but (even in local cases) uses two processes, a socket and lots of select system calls. The -W doesn't change anything there (unfortunately).
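For comparison, this is roughly what a tight read/write loop looks like. It is a minimal sketch in Python rather than cp's actual code; the function name `copy_tight_loop` and the 1MB default buffer size are assumptions made for illustration only.

```python
import os


def copy_tight_loop(src_path, dst_path, bufsize=1024 * 1024):
    """Copy src_path to dst_path with a plain read()/write() loop.

    One process, no socket, no select(): read a buffer, write it out,
    repeat until end of file. Returns the number of bytes copied.
    """
    total = 0
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                chunk = os.read(src_fd, bufsize)
                if not chunk:  # empty read means end of file
                    break
                view = memoryview(chunk)
                # write() may accept fewer bytes than offered, so loop
                # until the whole buffer has been written.
                while view:
                    written = os.write(dst_fd, view)
                    view = view[written:]
                    total += written
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
    return total
```

Each buffer costs two system calls here, which is the baseline the two-process design is being compared against.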

A look at rsync performance

Posted Aug 20, 2010 4:46 UTC (Fri) by dlang (✭ supporter ✭, #313)

my first reaction on reading this is that the processes are stalling, and the behavior you describe to improve performance sounds like it's on the same tack.

did you try tweaking the buffer sizes that rsync uses to see if larger buffers may smooth things out a bit?

I would still expect it to take significantly more CPU time than a plain cp, and I'm not bothered by that (it's really designed for a different job that it excels at; this is a degenerate corner case for it), but the throughput should be higher.

while the approach is less efficient than a read()/write() in a tight loop, if it can keep the buffers between the processes full it should be able to do fairly well (as your testing shows when you tweak things so that the processes aren't waiting for each other much), so perhaps larger buffers can avoid the stalls.
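To illustrate the buffering idea, here is a rough sketch of a reader and a writer joined by a bounded buffer. It is not rsync's design: it swaps in two threads and a `queue.Queue` for rsync's two processes and socket, and the names `copy_pipelined`, `bufsize` and `depth` are hypothetical. The point is only that a deeper queue gives each side more slack before it has to wait for the other.

```python
import queue
import threading


def copy_pipelined(src_path, dst_path, bufsize=256 * 1024, depth=16):
    """Copy a file using a reader thread and a writer joined by a bounded queue.

    `depth` is how many buffers may be in flight. With a small depth the
    reader stalls as soon as the writer falls behind (and vice versa);
    a larger depth absorbs more of that jitter.
    """
    buffers = queue.Queue(maxsize=depth)
    errors = []

    def reader():
        try:
            with open(src_path, "rb") as src:
                while True:
                    chunk = src.read(bufsize)
                    if not chunk:
                        break
                    buffers.put(chunk)  # blocks (stalls) while the queue is full
        except OSError as exc:
            errors.append(exc)
        finally:
            buffers.put(None)  # sentinel: tells the writer there is no more data

    # daemon=True so a failure on the writer side cannot leave the
    # interpreter hanging on a reader blocked in put().
    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    total = 0
    with open(dst_path, "wb") as dst:
        while True:
            chunk = buffers.get()  # blocks (stalls) while the queue is empty
            if chunk is None:
                break
            dst.write(chunk)
            total += len(chunk)
    worker.join()
    if errors:
        raise errors[0]
    return total
```

Timing this with different `depth` and `bufsize` values against a plain loop would be one way to see whether stalls, rather than raw CPU cost, are what limits throughput.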

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds