Recently I bought a shiny new disk for my Fedora 10-based MythTV system. I had to copy some 700GiB of video files from the old disk to the new one. I am used to rsync for this type of job, as the rsync command and its accompanying options flow right from my fingers to the keyboard. However, I was not happy with what I saw: the performance was nothing to write home about, with the files being copied at about 37MiB/s. Both disks can handle about three times that speed, at least on the outer cylinders. That makes a lot of difference: an expected wait of just over two hours turned into a six-hour ordeal. Note that both SATA disks were local to the system; no network was involved.
Wanting to know what happened, I created a small test to see what was going on: copying a 10GiB file from one disk to the other. I made sure that the ext4 file systems involved were completely fresh, so fragmentation could not play a part (a new mkfs after each test). I also made sure that the test file systems were created on the outermost (and fastest) cylinders of the disks. Simply reading the source file could be done at 106MiB/s, and writing a 10GiB file to the destination file system could be done at 134MiB/s.
The copy programs under test were rsync, cpio, cp, and cat. Of course I took care that the cache could not interfere, by flushing the cache before each test and waiting for the dirty buffers to be flushed to the destination disk after the test command completed. For example, where SRC and DEST are variables holding the name of the source file (in the current directory) and the name of the destination directory:
sync # flush dirty buffers to disk
echo 3 > /proc/sys/vm/drop_caches # discard caches
time sh -c "cp $SRC $DEST; sync" # measure cp and sync time
The echo command to /proc/sys/vm/drop_caches forces the invalidation of all non-dirty buffers in the page cache. To also force dirty pages to be flushed, we first use the sync command. The copy command will copy the 10GiB file, but it will actually finish before the last blocks have been flushed to disk. That is why we time the combination of the cp command and the sync command, which forces flushing the dirty blocks to disk.
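For convenience, the whole measurement can be wrapped in a small helper; this is a sketch of the procedure above, not the exact script used for the tests (it must run as root, because of drop_caches):

measure() {
    sync                               # flush dirty buffers to disk
    echo 3 > /proc/sys/vm/drop_caches  # discard clean cached pages
    time sh -c "$1; sync"              # time the copy plus the final flush
}
measure "cp $SRC $DEST"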
The four commands tested were:
rsync $SRC $DEST
echo $SRC | cpio -p $DEST
cp $SRC $DEST
cat $SRC > $DEST/$SRC
The results for rsync, cpio, cp, and cat were:
  user     sys  elapsed  hog   MiB/s  test
  5.24   77.92   101.86  81%  100.53  cpio
  0.85   53.77   101.12  54%  101.27  cp
  1.73   59.47   100.84  60%  101.55  cat
139.69   93.50   280.40  83%   36.52  rsync
The observation that rsync was slow was indeed substantiated. Looking at the hog factor (the amount of CPU time used relative to the elapsed time), we can conclude that rsync is not so much disk-bound (as would be expected), but CPU-bound. That required some more scrutiny. The atop program showed that rsync uses three processes: one that does only disk reads, one that does only disk writes, and one (I assume) control process that uses little CPU time and does no disk I/O.
Using strace, it can be shown that cp only uses read() and write() system calls in a tight loop, while rsync uses two processes that talk to each other using reads and writes through a socket, sprinkled with loads of select() system calls. To simulate the multiple processes, I then used multiple cat processes strung together using pipes. That test does not show the bad performance that rsync demonstrates. To test the influence of using a socket, I also created a TCP service using xinetd that just starts cat with its output redirected to a file to simulate the "network traffic." The client side:
cat $SRC | nc localhost myservice
And the server side:
cat > $DEST
Even this setup outperforms rsync. It achieves the same disk bandwidth as cp with a far lower CPU load than rsync.
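For reference, the xinetd service could look roughly like the following; the service name, port, and destination path here are illustrative, not the exact configuration used in the test:

# /etc/xinetd.d/myservice -- a hypothetical reconstruction
service myservice
{
    type        = UNLISTED
    port        = 12345
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = root
    # wrapper script containing just: exec cat > /mnt/dest/testfile
    server      = /usr/local/bin/catserver
    disable     = no
}

With an entry like this, the client side would connect to port 12345 (or to a service name defined in /etc/services).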
The PIDs of the three rsync processes (1111 through 1113 in this example) can be found with a tool like pgrep; taskset can then pin each of them to its own core:
taskset -pc 0 1111 # force onto CPU0
taskset -pc 1 1112 # force onto CPU1
taskset -pc 2 1113 # force onto CPU2
By using taskset right after rsync was started, the throughput of rsync went up from 36.5MiB/s to 40MiB/s. Though a 10% improvement, it was still nowhere near cat's performance. When forcing the three rsync processes to run on the same CPU, performance went down to 32MiB/s.
rsync needs quite a lot of CPU power (both user and system time). Despite that, the on-demand frequency governor does not scale up the CPU frequency. We can force all cores to run at the highest frequency with:
for i in 0 1 2 3 ; do
echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
done
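To verify that the change took effect, the current governor and frequency of each core can be read back from sysfs:

grep . /sys/devices/system/cpu/cpu[0-3]/cpufreq/scaling_governor
grep . /sys/devices/system/cpu/cpu[0-3]/cpufreq/scaling_cur_freq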
If the CPU frequency is forced to the highest value (2.6GHz), the result for three rsyncs on a single core goes up to 62MiB/s. Combining this with the "spread the load" tactic using taskset, we even get up to 85MiB/s. That is still 15% less than the other copy programs, but more than a two-fold performance increase compared to the default situation.
The conclusion is that in the default situation, using cp over rsync will give you almost threefold better performance. However, a little tinkering with the scheduler (using taskset) and the cpufreq governor can get you a twofold performance improvement with rsync, but still only two-thirds that of cp.
Summarizing the results of the test with rsync:
Throughput  CPUs  Core frequency
22MiB/s     1-3   0.8GHz
23MiB/s     1     0.8GHz
34MiB/s     1     ondemand
37MiB/s     1-3   ondemand   << default
39MiB/s     3     0.8GHz
40MiB/s     3     ondemand
62MiB/s     1     2.6GHz
62MiB/s     1-3   2.6GHz
85MiB/s     3     2.6GHz
In this table, the second column shows how the rsyncs were distributed over the cores. "1" means the three rsyncs were forced onto one single CPU. "1-3" means the scheduler could do as it saw fit. And when the three rsyncs were each forced onto their own CPU, the table shows "3".
It is clear that the default settings are not the worst possible, but they are close to it.
The bad behavior can be seen using cpufreq_stats. After loading the module:
modprobe cpufreq_stats
it is possible to see how much time was spent at each frequency by which core. If we look at the results after the rsync command, we see for CPU2:
$ cat /sys/devices/system/cpu/cpu2/cpufreq/stats/time_in_state
2600000 423293
1900000 363
1400000 534
800000 6645805
The frequency (in kHz) is the first column, while the time (in 10ms units) is the second column. Since the module was loaded, CPU2 has spent most of its time at the lowest frequency, despite the fact that rsync really is quite CPU-intensive.
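To see at a glance how the time is distributed, the output can be turned into percentages; a quick sketch, assuming the two-column format shown above:

awk '{t[$1]=$2; s+=$2} END {for (f in t) printf "%5d MHz: %5.1f%%\n", f/1000, 100*t[f]/s}' \
    /sys/devices/system/cpu/cpu2/cpufreq/stats/time_in_state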
After all these results, I decided to give Arjan's patches a try. I compiled kernel version 2.6.35-rc3, which has the patches incorporated, and used that instead of the 2.6.27.41-170.2.117 kernel Fedora 10 was running when the original problem popped up. For comparison, I also ran the tests with a more recent kernel that does not incorporate Arjan's patches: 2.6.34.
I could immediately see (in atop) that the three rsync processes were on separate processors most of the time. The newer kernels apparently are better at spreading the load. However, this is not a great help:
 FC10  2.6.34  2.6.35-rc3  CPUs  Frequency
MiB/s   MiB/s       MiB/s
23.12   28.85       28.07  1     0.8GHz
22.19   44.23       45.25  1-3   0.8GHz
38.62   43.39       43.75  3     0.8GHz
34.01   55.48       57.37  1     ondemand
36.52   44.85       45.08  1-3   ondemand   << default
39.73   43.65       44.30  3     ondemand
62.37   66.67       68.52  1     2.6GHz
62.15   92.34       91.84  1-3   2.6GHz
85.47   89.79       89.42  3     2.6GHz
The newer kernels are better at spreading the processes over the cores. However, this is hindering Arjan van de Ven's patch from doing its work. The patch does indeed work when all rsync processes run on a single CPU. But because the new kernel does a better job of spreading the processes over CPUs, Arjan's frequency increase does not occur. Arjan is working on an entirely new governor that may be better at raising the CPU's frequency when doing a lot of disk I/O.
A look at rsync performance
Posted Aug 19, 2010 1:46 UTC (Thu) by joey (subscriber, #328) [Link]
"Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred"

I think that means both the client and server sides checksum the file, even if rsync is running locally. Thus the CPU usage, etc.
Most of the reason to use rsync locally is its nice interface. It should be possible to have an rsync variant that omits the checksums and simply always overwrites the destination file, like cp, but with the rest of the rsync interface left intact. That should be much faster on some hardware.
For example, I have an ARM file server that I used to use to rsync data to an external USB disk. It turns out to be faster to run rsync on a faster (Intel) client, even though it has to get the data over NFS.
Since md4 tends to be 50% or so faster than md5, running rsync with --protocol=29 may also be a nice way to speed it up.
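e.g. something like (paths made up):

rsync --protocol=29 -a /mnt/src/ /mnt/dest/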
A look at rsync performance
Posted Aug 19, 2010 2:30 UTC (Thu) by Trelane (✭ supporter ✭, #56877) [Link]
I disagree rather strenuously. IMHO, the main reason to use rsync locally is if you're trying to copy over an update (e.g. you have a camcorder with a bunch of videos, some of which you've previously copied and some of which are new, or you're backing up a large dataset in which a number of files are updated or new since the previous backup).
"It should be possible to have a rsync varient that omits using the checksums, and simply overwrites the destination file always, like cp -- but with the rest of the rsync interface left intact"
Or you could use the right tool for the job, e.g. tar or cp. If the files are entirely new, there's no point in using rsync; there's no need to calculate any checksums (unless you're verifying the integrity of the copy perhaps).
A look at rsync performance
Posted Aug 19, 2010 2:33 UTC (Thu) by joey (subscriber, #328) [Link]
http://git.kitenet.net/?p=joey/home.git;a=blob_plain;f=bi...
It takes the same options as rsync, except the src and dest directories
must be specified as the first 2 parameters. And neither directory can be remote.
It operates by simply using rsync --dry-run to determine which files need to be updated, and then copying them to the dest directory using cp. rsync is run at the end to handle everything else.
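Roughly, the idea is something like this (a simplified sketch, not the actual script at the URL above; option handling is cut down and --out-format is assumed to be available):

#!/bin/sh
# Sketch: let rsync decide what needs copying, move the bulk data with
# plain cp, then let a final rsync handle metadata and everything else.
SRC=$1; DEST=$2; shift 2
rsync -a --dry-run --out-format='%n' "$@" "$SRC/" "$DEST/" |
while IFS= read -r f; do
    [ -f "$SRC/$f" ] || continue      # plain files only
    mkdir -p "$DEST/$(dirname "$f")"
    cp "$SRC/$f" "$DEST/$f"           # no per-file checksumming
done
rsync -a "$@" "$SRC/" "$DEST/"        # permissions, deletions, the rest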
Testing on my laptop, rsync takes 19 seconds to sync a directory containing a 260MB file; local-rsync takes 8 seconds. Roughly in line with the benchmarks in this article.
A look at rsync performance
Posted Sep 4, 2010 8:53 UTC (Sat) by llloic (subscriber, #5331) [Link]
Note that rsync already does the "overwrite the destination file always, like cp" behavior you describe, via the --whole-file option, which according to the man page is "the default when both the source and destination are specified as local paths".

As far as I understand, when rsync acts on local files, the only thing it does in addition to a "normal" cp is compute the whole-file checksum.
A look at rsync performance
Posted Aug 19, 2010 1:47 UTC (Thu) by glennc99 (guest, #6993) [Link]
I fail to see why this should surprise you. I suppose one could have two entirely different algorithms, and have the 'front end' logic detect that you should have run something else, and proceed to pretend to be that other program, but why bother?
A look at rsync performance
Posted Dec 1, 2015 20:33 UTC (Tue) by alankila (guest, #47141) [Link]
One case occurs with emulators and video players that need to either generate a frame by a deadline or skip that frame entirely and attempt to meet the next frame's deadline. Because meeting a deadline implies running out of work and thus sleeping for some time, ondemand has a habit of lowering the CPU speed until the realtime task is running right at the edge of feasibility, with little margin (e.g. the CPU is 80% busy and 20% idle). Once something happens, frame skipping begins; the CPU then sleeps close to 50% of the time because it is only rendering every second frame, and that causes ondemand to lower the CPU speed further.
This is one of the most miserable CPU governors ever invented.
A look at rsync performance
Posted Dec 1, 2015 20:48 UTC (Tue) by alankila (guest, #47141) [Link]
I hope ondemand has been reworked many times since something like 2008, which was the last time I really tried using Linux on a desktop system. But I remember that when I got an Android phone, one of the reasons it didn't seem to react very well was again ondemand, and one of the fixes in (was it?) Android 4.0 was to set the CPU speed to max as soon as the user's finger touches the screen.
I would suggest that the kernel offer a way for applications to signal that they are behaving poorly because they do not have enough CPU power available, or some way to request that no CPU power saving be used while they are running. Fancy algorithms trying to guess the right CPU speed just suck.
A look at rsync performance
Posted Dec 2, 2015 20:54 UTC (Wed) by flussence (subscriber, #85566) [Link]
The Android governor you mention is called "cpufreq-interactive" — it's been doing the rounds on phones for years, but a few weeks back I saw it posted in the kernel patches section on this site, so there's some hope our regular desktops might soon benefit from interactivity.
Closing comments
Posted Dec 2, 2015 21:35 UTC (Wed) by corbet (editor, #1) [Link]
That's good to hear...it's something I've contemplated occasionally. Comments on really old articles have a high probability of being spam, so it's tempting to turn them off.
A look at rsync performance
Posted Aug 19, 2010 7:58 UTC (Thu) by Darkmere (subscriber, #53695) [Link]
Read this as a very detailed bug report against the system as a whole, explaining in part what goes wrong, where it goes wrong, and something about how to fix parts of it.
A look at rsync performance
Posted Aug 19, 2010 8:52 UTC (Thu) by ewen (subscriber, #4772) [Link]
I think this demonstrates that while the rsync algorithm helps bring two large, only somewhat differing, files efficiently into sync, it performs suboptimally in the case of large directories that differ in the presence or absence of large files. That is something the rsync algorithm wasn't really designed to handle, but something the rsync program could be relatively easily enhanced to handle (as Joey's "local-rsync" further up in the comments shows).
Ewen
A look at rsync performance
Posted Aug 19, 2010 9:15 UTC (Thu) by Liefting (guest, #8466) [Link]
tar -cvf - * | (cd /someotherdir; tar -xvf -)
Or, when going through a network, tar in combination with nc:
receiver# nc -l 1234 | tar -xvf -
sender# tar -cvf - * | nc receiver 1234
I wonder how this stacks up against the other methods. At the very least it doesn't seem to be CPU-bound, but disk I/O bound or, in case of the network copy, network-bound.
A look at rsync performance
Posted Aug 19, 2010 9:55 UTC (Thu) by spaetz (subscriber, #32870) [Link]
At the risk of going off-topic: why would you do something like that instead of a simple "cp"? I honestly would be interested in learning why tar is better in that case.
A look at rsync performance
Posted Aug 19, 2010 10:35 UTC (Thu) by dafid_b (guest, #67424) [Link]
I don't think that is part of cp.
A look at rsync performance
Posted Aug 19, 2010 10:48 UTC (Thu) by spaetz (subscriber, #32870) [Link]
But perhaps that misses out on things that tar manages to preserve. Not sure. Just curious in any case.
A look at rsync performance
Posted Aug 19, 2010 15:08 UTC (Thu) by bronson (guest, #4806) [Link]
Nowadays cp -a works well everywhere (in my experience) so there's no need to resort to tar. It's just damage from the Unix wars.
A look at rsync performance
Posted Aug 19, 2010 19:31 UTC (Thu) by pj (subscriber, #4506) [Link]
tar cf - . | ssh user@remote "cd /dest/dir; tar xf -"
or
(ssh user@remote "cd /src/dir ; tar cf - . ") | (cd /dest/dir; tar xf -)
A look at rsync performance
Posted Aug 20, 2010 14:52 UTC (Fri) by spaetz (subscriber, #32870) [Link]
Only because scp opens a new ssh connection per file by default, while tar+ssh opens only one; that causes lots of overhead. If you reuse your ssh connection, scp will be fast as well:
http://www.debian-administration.org/articles/290
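The article covers OpenSSH connection multiplexing; the gist in ~/.ssh/config is something like:

Host *
    # reuse one master connection for subsequent sessions to the same host
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p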
A look at rsync performance
Posted Aug 24, 2010 20:01 UTC (Tue) by BackSeat (guest, #1886) [Link]
No need for all the "cd" commands:
tar -C /src/dir -cf - . | tar -C /dest/dir -xf -
A look at rsync performance
Posted Jul 30, 2018 8:30 UTC (Mon) by farnz (subscriber, #17727) [Link]
Why not? It does for me:
$ cd source
$ touch "i am a fish"
$ cd ..
$ mkdir dest
$ tar -C source/ -cf - . | tar -C dest/ -xf -
$ cd dest/
$ ls
i am a fish
$ ls -l
total 0
-rw-r--r-- 1 farnz users 0 30 Jul 09:28 i am a fish
A look at rsync performance
Posted Aug 25, 2010 3:09 UTC (Wed) by roelofs (guest, #2599) [Link]
"Nowadays cp -a works well everywhere (in my experience) so there's no need to resort to tar."

BSDs included? In my experience they've been mighty picky about the GNUisms (or "things that would have been GNUisms if someone else hadn't done them first") they're willing to implement. I remember being surprised by something along those lines just a couple of months ago, though I've forgotten the details already.
But perhaps cp -a came from BSD in the first place...
Greg
+1 Informative
Posted Aug 25, 2010 13:25 UTC (Wed) by dmarti (subscriber, #11625) [Link]
Just ssh-ed in to a FreeBSD 7.2 system -- `cp -a` works, and `-a` is in the man page.
A look at rsync performance
Posted Aug 19, 2010 10:56 UTC (Thu) by dafid_b (guest, #67424) [Link]
I would use the tar pipe of old, as I expect it to build a proper copy in the new location.
Having read the cp manual page again (and again), I fear my confidence in tar might be misplaced :(.
Anyone know of a tutorial for each of the cp options?
A look at rsync performance
Posted Aug 19, 2010 10:53 UTC (Thu) by ewen (subscriber, #4772) [Link]
There are a bunch of reasons for using rsync as shorthand for "make these two directories the same", even without needing the rsync algorithm to synchronise changes within an individual file. And it seems to me that adding a special case for "whole new file" into the rsync program, one that copies with maximum efficiency, would be valuable. Which I think was (one of) the points of the original article.
Ewen
PS: I use "tar -cpf - . | (cd /dest && tar -xpf -)" for a bunch of safety reasons, and to preserve at least some permissions. With GNU tar that'll copy most things; with traditional unix tar, less so, but it gets closer than most tools on traditional unix. (GNU cp has an "-a" extension which will also preserve most things.)
PPS: For the later questioner, using a tar pipeline historically had better performance because it scheduled two processes which kept more I/O in flight. I've not looked recently to see if that's still the case, and given the performance numbers in the article it may not be the case (eg, the kernel's readahead may do just as well, if not better).
A look at rsync performance
Posted Aug 20, 2010 4:46 UTC (Fri) by dlang (guest, #313) [Link]
Did you try tweaking the buffer sizes that rsync uses, to see whether larger buffers might smooth things out a bit?
I would still expect it to take significantly more CPU time than a plain cp; that doesn't bother me (rsync is really designed for a different job, one that it excels at, and this is a degenerate corner case for it), but the throughput should be higher.
While the approach is less efficient than read()/write() in a tight loop, if rsync can keep the buffers between its processes full, it should be able to do fairly well (as your testing shows when you tweak things so that the processes aren't waiting for each other much), so perhaps larger buffers can avoid the stalls.
A look at rsync performance
Posted Aug 26, 2010 14:44 UTC (Thu) by ariveira (guest, #57833) [Link]
Recently, pax (the POSIX archiver) was brought to my attention:
pax -rw . /someotherdir
Not to mention its awesome -s option.
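For example, -s rewrites path names on the fly with an ed-style substitution (directory names made up):

pax -rw -s '/^videos/archive/' . /someotherdir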
A look at rsync performance
Posted Aug 19, 2010 11:25 UTC (Thu) by jengelh (guest, #33263) [Link]
http://tinyurl.com/35v8o6p
xcp -m|-s foo bar
A look at rsync performance
Posted Aug 21, 2010 9:32 UTC (Sat) by shlomif (guest, #11299) [Link]
In this module-authors@perl.org thread, the CPAN admins complained that rsync does the equivalent of a "find . -type f" over the network every time, and that, as a result, reducing the number of files in the CPAN would yield a real benefit.
A look at rsync performance
Posted Aug 26, 2010 11:31 UTC (Thu) by chojrak11 (guest, #52056) [Link]
"Note that -a does not preserve hardlinks, because finding multiply-linked files is expensive. You must separately specify -H."
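So a copy that also preserves hard links would be something like (paths made up):

rsync -aH /src/dir/ /dest/dir/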