
A look at rsync performance


August 18, 2010

This article was contributed by JC van Winkel

The problem

Recently I bought a shiny new disk for my Fedora 10-based MythTV system. I had to copy some 700GiB of video files from the old disk to the new one. I am used to rsync for this type of job, as the rsync command and its accompanying options flow right from my fingers to the keyboard. However, I was not happy with what I saw: the performance was nothing to write home about, with the files being copied at about 37MiB/s. Both disks can handle about three times that speed, at least on the outer cylinders. That makes a lot of difference: an expected wait of just over two hours turned into a six-hour ordeal. Note that both SATA disks were local to the system and no network was involved.
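To put rough numbers on that (my arithmetic, not from the article): 700GiB is 716,800MiB, so

    716800MiB / 100MiB/s ≈  7200s ≈ 2 hours
    716800MiB /  37MiB/s ≈ 19400s ≈ 5.4 hours

with the slower inner cylinders stretching the latter toward six hours.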

Measuring

Wanting to know what was happening, I created a small test: copying a 10GiB file from one disk to the other. I made sure that the ext4 file systems involved were completely fresh, so fragmentation could not play a part (a new mkfs after each test). I also made sure that the test file systems were created on the outermost (and fastest) cylinders of the disks. Simply reading the source file could be done at 106MiB/s, and writing a 10GiB file to the destination file system could be done at 134MiB/s.
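The article does not say how those baseline figures were measured; one plausible way to get them, using dd (the file and directory names here are assumptions):

    sync; echo 3 > /proc/sys/vm/drop_caches
    dd if=$SRC of=/dev/null bs=1M                                       # sequential read speed
    dd if=/dev/zero of=$DEST/testfile bs=1M count=10240 conv=fdatasync  # write speed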

The copy programs under test were rsync, cpio, cp, and cat. Of course I took care that the cache could not interfere, by flushing the cache before each test and waiting for the dirty buffers to be flushed to the destination disk after the test command completed. For example, with SRC and DEST being variables holding the name of the source file in the current directory and the name of the destination directory:

    sync                               # flush dirty buffers to disk
    echo 3 > /proc/sys/vm/drop_caches  # discard caches
    time sh -c "cp $SRC $DEST; sync"   # measure cp and sync time

The echo command to /proc/sys/vm/drop_caches forces the invalidation of all non-dirty buffers in the page cache. To also force dirty pages to be flushed, we first use the sync command. The copy command will copy the 10GiB file, but it will actually finish before the last blocks have been flushed to disk. That is why we time the combination of the cp command and the sync command, which forces flushing the dirty blocks to disk.
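Since the same flush-then-time sequence is needed for every command under test, it can be wrapped in a small helper function (my own sketch, not from the original test scripts):

    # run one timed copy test with cold caches; needs root for drop_caches
    benchmark() {
        sync                               # flush dirty buffers to disk
        echo 3 > /proc/sys/vm/drop_caches  # discard the clean caches
        time sh -c "$*; sync"              # time the command plus the final flush
    }
    benchmark cp $SRC $DEST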

The four commands tested were:

    rsync $SRC $DEST
    echo $SRC | cpio -p $DEST
    cp  $SRC $DEST
    cat $SRC > $DEST/$SRC

The results for rsync, cpio, cp, and cat were:

    user     sys     elapsed   hog   MiB/s    test
    5.24     77.92   101.86    81%   100.53   cpio
    0.85     53.77   101.12    54%   101.27   cp
    1.73     59.47   100.84    60%   101.55   cat
    139.69   93.50   280.40    83%   36.52    rsync

The observation that rsync was slow was indeed substantiated. Looking at the hog factor (the amount of CPU time used relative to the elapsed time), we can conclude that rsync is not so much disk-bound (as would be expected), but CPU-bound. That required some more scrutiny. The atop program showed that rsync uses three processes: one that does only disk reads, one that does only disk writes, and one (I assume) control process that uses little CPU time and does no disk I/O.
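For rsync, for example, the hog factor and the throughput work out from the table as:

    (139.69 + 93.50) / 280.40 = 233.19 / 280.40 ≈ 83%
    10GiB / 280.40s = 10240MiB / 280.40s ≈ 36.5MiB/s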

Using strace, it can be shown that cp only uses read() and write() system calls in a tight loop, while rsync uses two processes that talk to each other through a socket, sprinkled with loads of select() system calls. To simulate the multiple processes, I strung multiple cat processes together using pipes. That test does not show the bad performance that rsync demonstrates. To test the influence of using a socket, I also created a TCP service using xinetd that just starts cat with its output redirected to a file, to simulate the "network traffic." The client side:

    cat $SRC | nc localhost myservice

And the server side:

    cat > $DEST
Even this setup outperforms rsync. It achieves the same disk bandwidth as cp with a far lower CPU load than rsync.
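For reference, the chain of cat processes was presumably something like "cat $SRC | cat | cat > $DEST/$SRC", and the xinetd service could have been set up roughly like this (a sketch; the service name, port, and destination path are invented):

    #!/bin/sh
    # /usr/local/bin/catsink: hypothetical helper started by xinetd,
    # dumping whatever arrives on the socket into a file
    exec cat > /mnt/new/testfile

    # /etc/xinetd.d/myservice
    service myservice
    {
        type        = UNLISTED
        port        = 12345
        socket_type = stream
        protocol    = tcp
        wait        = no
        user        = nobody
        server      = /usr/local/bin/catsink
    }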

The kernel plays a role too

On my 4-core AMD Athlon II X4 620 system, all three processes seem to run on the same CPU most of the time. But with help from the taskset command, it is possible to force processes onto specific sets of processors (cores). Supposing the three rsync processes have PIDs 1111, 1112, and 1113, they can each be forced onto their own core with:

    taskset -pc 0 1111   # force on CPU0
    taskset -pc 1 1112   # force on CPU1
    taskset -pc 2 1113   # force on CPU2

By using taskset right after rsync was started, the throughput of rsync went up from 36.5MiB/s to 40MiB/s. Though that is a 10% improvement, it was still nowhere near cat's performance. When forcing the three rsync processes to run on the same CPU, performance went down to 32MiB/s.
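Finding and pinning the PIDs by hand is tedious; a small loop (my sketch, assuming the rsync processes are the only ones pgrep matches) does the same thing:

    cpu=0
    for pid in $(pgrep -x rsync); do
        taskset -pc $cpu $pid   # pin this rsync process to its own core
        cpu=$((cpu + 1))
    done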

rsync needs quite a lot of CPU power (both user and system time). Despite that, the ondemand frequency governor does not scale up the CPU frequency. We can force all cores to run at the highest frequency with:

    for i in 0 1 2 3 ; do
      echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor
    done

With the CPU frequency forced to the highest value (2.6GHz), the result for three rsyncs on a single core goes up to 62MiB/s. Combining this with the "spread the load" tactic using taskset, we even get up to 85MiB/s. That is still 15% less than the other copy programs, but more than a two-fold performance increase compared to the default situation.
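While experimenting, the governor in effect and the frequency it is currently delivering can be read back per core from the same sysfs directory:

    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq   # in kHz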

The conclusion is that in the default situation, using cp over rsync will give you almost threefold better performance. However, a little tinkering with the scheduler (using taskset) and the cpufreq governor can get you a twofold performance improvement with rsync, but still only two-thirds that of cp.

Summarizing the results of the test with rsync:

    Throughput   CPUs   Core frequency
    22MiB/s      1-3    0.8GHz
    23MiB/s      1      0.8GHz
    34MiB/s      1      ondemand
    37MiB/s      1-3    ondemand   << default
    39MiB/s      3      0.8GHz
    40MiB/s      3      ondemand
    62MiB/s      1      2.6GHz
    62MiB/s      1-3    2.6GHz
    85MiB/s      3      2.6GHz

In this table, the second column shows how the rsyncs were distributed over the cores. "1" means the three rsyncs were all forced onto the same single CPU. "1-3" means the scheduler could do what it saw fit. And when the three rsyncs were each forced onto their own CPU, the table shows "3".

It is clear that the default settings are not the worst settings, but they are close to it.

The future

An LWN article described problems that the ondemand governor has in choosing the right CPU frequency for processes that do a lot of I/O and also need a lot of CPU power (like rsync). While the processor is waiting for the I/O to finish, the clock frequency is scaled down almost immediately. But when the disk request finishes and the process continues using the CPU, the ondemand governor waits too long before scaling the frequency back up again. Arjan van de Ven of the Intel Open Source Technology Centre has made changes to the ondemand governor so that it won't scale the CPU down until the CPU is really idle, and not just waiting for fast I/O.

The bad behavior can be seen using cpufreq_stats. After loading the module:

    modprobe cpufreq_stats

it is possible to see how much time each core spent at each frequency. If we look at the results after the rsync command, we see for CPU 2:

    $ cat /sys/devices/system/cpu/cpu2/cpufreq/stats/time_in_state
    2600000 423293
    1900000 363
    1400000 534
    800000 6645805
The frequency (in kHz) is the first column; the time (in 10ms units) is the second. Since the module was loaded, CPU2 has spent most of its time at the lowest frequency, despite the fact that rsync really is quite CPU-intensive.
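Converted to wall-clock time (the counters are in 10ms units), those numbers work out to:

    2600000 kHz (2.6GHz):  423293 * 10ms ≈  4233s ≈  1.2 hours
     800000 kHz (0.8GHz): 6645805 * 10ms ≈ 66458s ≈ 18.5 hours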

After all these results, I decided to give Arjan's patches a try. I compiled kernel version 2.6.35-rc3, which has the patches incorporated, and used that instead of the 2.6.27.41-170.2.117 kernel that Fedora 10 was running when the original problem popped up. For comparison, I also ran the tests with a more recent kernel that does not incorporate Arjan's patches: 2.6.34.

I could immediately see (in atop) that the three rsync processes were on separate processors most of the time. The newer kernels apparently are better at spreading the load. However, this is not a great help:

    FC10     2.6.34   2.6.35-rc3   CPUs   Frequency
    (MiB/s)  (MiB/s)  (MiB/s)
    23.12    28.85    28.07        1      0.8GHz
    22.19    44.23    45.25        1-3    0.8GHz
    38.62    43.39    43.75        3      0.8GHz
    34.01    55.48    57.37        1      ondemand
    36.52    44.85    45.08        1-3    ondemand   << default
    39.73    43.65    44.30        3      ondemand
    62.37    66.67    68.52        1      2.6GHz
    62.15    92.34    91.84        1-3    2.6GHz
    85.47    89.79    89.42        3      2.6GHz

Conclusions

One thing is clear: I should upgrade the kernel on my MythTV system. In general, the 2.6.34 and 2.6.35-rc3 kernels give better performance than the old 2.6.27 kernel. But, tinkering or not, rsync still cannot beat a simple cp that copies at over 100MiB/s. Indeed, rsync really needs a lot of CPU power for simple local copies. At the highest frequency, cp only needed 0.34+20.95 seconds of CPU time, compared with rsync's 70+55 seconds.

The newer kernels are better at spreading the processes over the cores. However, this is hindering Arjan van de Ven's patch from doing its work. The patch does indeed work when all rsync processes run on a single CPU. But because the new kernel does a better job of spreading the processes over CPUs, Arjan's frequency increase does not occur. Arjan is working on an entirely new governor that may be better at raising the CPU's frequency when doing a lot of disk I/O.



A look at rsync performance

Posted Aug 19, 2010 0:53 UTC (Thu) by coopj (subscriber, #1139) [Link]

rsync uses encryption. Depending on what algorithm you use it can make a big difference. I generally find even a 50 percent drop in throughput compared to cp. You don't seem to be able to turn off encryption altogether, but you can use an algorithm with lower CPU usage, like blowfish.

A look at rsync performance

Posted Aug 19, 2010 1:28 UTC (Thu) by jdub (subscriber, #27) [Link]

You're confusing 100% local rsync for rsync over SSH. :-)

A look at rsync performance

Posted Aug 19, 2010 1:46 UTC (Thu) by joey (subscriber, #328) [Link]

No encryption of course, but it *does* calculate rolling checksums. Quoth the man page:

    Note that rsync always verifies that each transferred file was
    correctly reconstructed on the receiving side by checking a
    whole-file checksum that is generated as the file is transferred

I think that means both the client and server sides checksum the file, even if rsync is running locally. Thus CPU usage, etc.

Most of the reason to use rsync locally is its nice interface. It should be possible to have an rsync variant that omits the checksums and simply always overwrites the destination file, like cp, but with the rest of the rsync interface left intact. That should be much faster on some hardware.

For example, I have an ARM file server that I used to use to rsync data to an external USB disk. It turns out to be faster to run rsync on a faster (Intel) client, even though it has to get the data over NFS.

Since md4 tends to be 50% or so faster than md5, running rsync with --protocol=29 may also be a nice way to speed it up.

A look at rsync performance

Posted Aug 19, 2010 2:30 UTC (Thu) by Trelane (✭ supporter ✭, #56877) [Link]

"Most of the reason to use rsync locally is its nice interface."

I disagree rather strenuously. IMHO, the main reason to use rsync locally is if you're trying to copy over an update (e.g. you have a camcorder with a bunch of videos that you've previously copied and some new ones that you've not, or you are backing up a large dataset in which a number of files are updated or new since the previous copy to the backup).

"It should be possible to have a rsync varient that omits using the checksums, and simply overwrites the destination file always, like cp -- but with the rest of the rsync interface left intact"

Or you could use the right tool for the job, e.g. tar or cp. If the files are entirely new, there's no point in using rsync; there's no need to calculate any checksums (unless you're verifying the integrity of the copy perhaps).

A look at rsync performance

Posted Aug 19, 2010 2:33 UTC (Thu) by joey (subscriber, #328) [Link]

I've written a simple rsync accelerator script, local-rsync:

http://git.kitenet.net/?p=joey/home.git;a=blob_plain;f=bi...

It takes the same options as rsync, except that the src and dest directories must be specified as the first two parameters, and neither directory can be remote.

It operates by simply using rsync --dry-run to determine which files need to be updated, and then copying them to the dest directory using cp. rsync is run at the end to handle everything else.

Testing on my laptop, rsync takes 19 seconds to sync a directory containing a 260MB file. local-rsync takes 8 seconds. Roughly in line with the benchmarks in this article.
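The linked script is truncated above; a minimal sketch of the same idea (my reconstruction from the description, not Joey's actual code) could look like:

    #!/bin/sh
    # local-rsync-style wrapper: use a dry run to find the files rsync would
    # transfer, copy those with plain cp, then let a real rsync pass handle
    # deletions, permissions, and everything else.
    SRC=$1; DEST=$2; shift 2
    rsync -a --dry-run --out-format='%n' "$@" "$SRC/" "$DEST/" |
    while read -r f; do
        # cp may fail for files in not-yet-created directories;
        # the final rsync pass picks those up
        [ -f "$SRC/$f" ] && cp "$SRC/$f" "$DEST/$f"
    done
    rsync -a "$@" "$SRC/" "$DEST/"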

A look at rsync performance

Posted Aug 19, 2010 15:16 UTC (Thu) by jcvw (subscriber, #50475) [Link]

Note that checksumming does not explain the excess use of system time, as the checksumming is done in user space.

A look at rsync performance

Posted Sep 4, 2010 8:53 UTC (Sat) by llloic (subscriber, #5331) [Link]

"It should be possible to have a rsync varient that omits using the checksums, and simply overwrites the destination file always, like cp -- but with the rest of the rsync interface left intact. That should be much faster on some hardware."

Note that rsync already does the "overwrite the destination file always, like cp" behavior you describe with the --whole-file option, which is "the default when both the source and destination are specified as local paths", quoting the man page.

As far as I understand, when rsync acts on local files, the only thing it does in addition to a "normal" cp is compute the whole-file checksum.

A look at rsync performance

Posted Aug 19, 2010 1:47 UTC (Thu) by glennc99 (guest, #6993) [Link]

So, if I understand you correctly, you've discovered that a tool optimized for bringing two large, mostly-similar directory structures into a completely similar state is not the best one to use when you're trying to copy a large tree into an empty target.

I fail to see why this should surprise you. I suppose one could have two entirely different algorithms, and have the 'front end' logic detect that you should have run something else, and proceed to pretend to be that other program, but why bother?

A look at rsync performance

Posted Aug 19, 2010 6:13 UTC (Thu) by tajyrink (subscriber, #2750) [Link]

I don't think that's the interesting part; how badly ondemand works is interesting.

A look at rsync performance

Posted Dec 1, 2015 20:33 UTC (Tue) by alankila (guest, #47141) [Link]

Ondemand is a governor that has many failure cases.

One case occurs with emulators and video players that need to either generate a frame by a deadline, or skip that frame entirely and attempt to meet the next frame's deadline. Because meeting a deadline implies running out of work and thus sleeping for some time, ondemand has a habit of lowering the CPU speed until the realtime task is running right at the edge of feasibility with little margin (e.g. the CPU is 80% consumed and 20% idle). Once something happens, frame skipping begins, which causes the CPU to sleep close to 50% of the time because it is now only rendering every second frame, and that causes ondemand to lower the CPU speed further.

This is one of the most miserable CPU governors ever invented.

A look at rsync performance

Posted Dec 1, 2015 20:48 UTC (Tue) by alankila (guest, #47141) [Link]

Addendum. I forgot to mention that I worked around this by setting the ondemand CPU threshold ratio quite low, to something like 30%. That is, if CPU usage was > 30% then it would heuristically find a higher CPU clock speed. The default settings were more like 90% or 70%, which were causing unnecessary problems for that particular type of realtime application.
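The tunable being described is presumably ondemand's up_threshold knob, which can be set through sysfs:

    # raise the clock as soon as load exceeds 30% (the default is much higher)
    echo 30 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold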

I hope ondemand has been reworked many times since something like 2008, which was the last time I really tried using Linux on a desktop system. But I remember that when I got an Android phone, one of the reasons why it didn't seem to react very well was again ondemand, and one of the fixes in (was it?) Android 4.0 was to set the CPU speed to max as soon as the user's finger touches the screen.

I would suggest that the kernel offer a way for applications to signal that they are behaving poorly because they do not have enough CPU power available, or some way to request that no CPU power saving be used while they are running. Fancy algorithms trying to determine CPU speed just suck.

A look at rsync performance

Posted Dec 2, 2015 20:54 UTC (Wed) by flussence (subscriber, #85566) [Link]

That's the same experience I've had too. In the end I gave up and yanked cpufreq out of my kernel config entirely.

The Android governor you mention is called "cpufreq-interactive" — it's been doing the rounds on phones for years, but a few weeks back I saw it posted in the kernel patches section on this site, so there's some hope our regular desktops might soon benefit from interactivity.

A look at rsync performance

Posted Dec 2, 2015 21:34 UTC (Wed) by bronson (guest, #4806) [Link]

I really like that LWN never closes comments on articles. Still things to discuss 5.5 years later.

Closing comments

Posted Dec 2, 2015 21:35 UTC (Wed) by corbet (editor, #1) [Link]

That's good to hear...it's something I've contemplated occasionally. Comments on really old articles have a high probability of being spam, so it's tempting to turn them off.

A look at rsync performance

Posted Aug 19, 2010 7:58 UTC (Thu) by Darkmere (subscriber, #53695) [Link]

No, I think the take-home of this isn't that "rsync, which does checksumming and read comparisons, is slower than tar/cp/cat", but rather that "the kernel does strange things around rsync and can be made to perform better".

Read this as a very detailed bug report against the system as a whole, explaining in part what goes wrong, where it goes wrong, and something about how to fix parts of it.

A look at rsync performance

Posted Aug 19, 2010 8:52 UTC (Thu) by ewen (subscriber, #4772) [Link]

If your "large, mostly similar directory structures" differ by some large files which are present in the source and not in the destination then you'd still want many of the things that rsync does (versus, eg, cp or tar), but you'd also prefer that the large new files were copied as quickly as possible; 85MB/s would be much nicer than 36MB/s, and full disk bandwidth would be better still. Based on my experiences I'd think that's actually a moderately common use case. The "destination directory is completely empty" then becomes a special case of "some large files to copy entirely".

I think this demonstrates that while the rsync algorithm helps bring two large, only somewhat differing files efficiently into sync, it performs suboptimally in the case of large directories that differ in the presence or absence of large files. That is something the rsync algorithm wasn't really designed to handle, but something the rsync program could be relatively easily enhanced to handle (as Joey's "local-rsync" further up in the comments shows).

Ewen

A look at rsync performance

Posted Aug 19, 2010 9:15 UTC (Thu) by Liefting (guest, #8466) [Link]

For bulk copies like this, I tend to use tar over a shell pipe:

tar -cvf - * | (cd /someotherdir; tar -xvf -)

Or, when going through a network, tar in combination with nc:

receiver# nc -l 1234 | tar -xvf -
sender# tar -cvf - * | nc receiver 1234

I wonder how this stacks up against the other methods. At the very least it doesn't seem to be CPU-bound, but disk I/O bound or, in case of the network copy, network-bound.

A look at rsync performance

Posted Aug 19, 2010 9:55 UTC (Thu) by spaetz (subscriber, #32870) [Link]

> tar -cvf - * | (cd /someotherdir; tar -xvf -)

At the danger of becoming off-topic. Why would you do something like that over a simple "cp"? I honestly would be interested in learning why tar is better in that case.

A look at rsync performance

Posted Aug 19, 2010 10:35 UTC (Thu) by dafid_b (guest, #67424) [Link]

When done as root, the tar pipe preserves ownership details etc.

I don't think that is part of cp.

A look at rsync performance

Posted Aug 19, 2010 10:48 UTC (Thu) by spaetz (subscriber, #32870) [Link]

I commonly use "cp -a", which includes --preserve=all: preserve the specified attributes (mode, ownership, timestamps, context, links, xattr, all).

but perhaps that is missing out things that tar manages to preserve. Not sure. Just curious in any case.

A look at rsync performance

Posted Aug 19, 2010 15:08 UTC (Thu) by bronson (guest, #4806) [Link]

It's a graybeard thing. 20 years ago, cp would screw up permissions, dates, ownership, symlinks, device files, etc. Different platforms would require different command-line options and then screw up different things. It was insane. Tar, on the other hand, pretty much got it right on every platform.

Nowadays cp -a works well everywhere (in my experience) so there's no need to resort to tar. It's just damage from the Unix wars.

A look at rsync performance

Posted Aug 19, 2010 19:31 UTC (Thu) by pj (subscriber, #4506) [Link]

One advantage is that it's easily modified to work over ssh:

tar cf - . | ssh user@remote "cd /dest/dir; tar xf -"

or

(ssh user@remote "cd /src/dir ; tar cf - . ") | (cd /dest/dir; tar xf -)

A look at rsync performance

Posted Aug 20, 2010 11:53 UTC (Fri) by NAR (subscriber, #1313) [Link]

A 'cp' command can also be easily modified to work over the network, just add an 's' to the front of 'cp' :-)

A look at rsync performance

Posted Aug 20, 2010 12:28 UTC (Fri) by dsommers (subscriber, #55274) [Link]

True ... but if you have a lot of files, especially smaller files, the tar path with ssh is way faster than scp. Try copying a git repository (~2-3MB) from one site to another site. My experience is that tar+ssh beats scp significantly.

A look at rsync performance

Posted Aug 20, 2010 14:52 UTC (Fri) by spaetz (subscriber, #32870) [Link]

> but if you have a lot of files, especially smaller files, the tar path with ssh is way faster than scp. Try copying a git repository (~2-3MB) from one site to another site. My experience is that tar+ssh beats scp significantly.

Only because you open a new ssh connection per file by default and tar+ssh opens only one, which causes lots of overhead. If you reuse your ssh connection, scp will be fast as well:
http://www.debian-administration.org/articles/290
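The technique the linked article describes is OpenSSH connection multiplexing; the usual ~/.ssh/config stanza (the ControlPath name is just an example) is:

    Host *
        ControlMaster auto
        ControlPath ~/.ssh/mux-%r@%h:%p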

A look at rsync performance

Posted Aug 21, 2010 2:05 UTC (Sat) by dmag (guest, #17775) [Link]

No, scp can't copy symlinks.

A look at rsync performance

Posted Aug 24, 2010 20:01 UTC (Tue) by BackSeat (guest, #1886) [Link]

No need for all the "cd" commands:
tar -C /src/dir -cf - . | tar -C /dest/dir -xf -

A look at rsync performance

Posted Jul 29, 2018 23:58 UTC (Sun) by bentpointer (guest, #126033) [Link]

I don't think the tar extract works with files with spaces in the name

A look at rsync performance

Posted Jul 30, 2018 8:30 UTC (Mon) by farnz (subscriber, #17727) [Link]

Why not? It does for me:

$ cd source
$ touch "i am a fish"
$ cd ..
$ mkdir dest
$ tar -C source/ -cf - . | tar -C dest/ -xf -
$ cd dest/
$ ls
i am a fish
$ ls -l
total 0
-rw-r--r--  1 farnz  users  0 30 Jul 09:28 i am a fish

A look at rsync performance

Posted Aug 19, 2010 21:08 UTC (Thu) by evgeny (subscriber, #774) [Link]

There is one thing tar and cp -a do differently which, depending on what you do, could be either a feature or a misfeature: tar tries to restore files according to the _literal_ names of the owner (if they exist; and they do by default). This can be overridden with the --numeric-owner flag. E.g. if you forget to specify this flag and untar a backup of a virtual container from the host, you'll end up with a mess of file ownerships. I was bitten by this once...

A look at rsync performance

Posted Aug 25, 2010 3:09 UTC (Wed) by roelofs (guest, #2599) [Link]

Nowadays cp -a works well everywhere (in my experience) so there's no need to resort to tar.

BSDs included? In my experience they've been mighty picky about the GNUisms (or "things that would have been GNUisms if someone else hadn't done them first") they're willing to implement. I remember being surprised by something along those lines just a couple of months ago, though I've forgotten the details already.

But perhaps cp -a came from BSD in the first place...

Greg

+1 Informative

Posted Aug 25, 2010 13:25 UTC (Wed) by dmarti (subscriber, #11625) [Link]

Just ssh-ed in to a FreeBSD 7.2 system -- `cp -a` works, and `-a` is in the man page.

A look at rsync performance

Posted Aug 19, 2010 10:50 UTC (Thu) by valhalla (subscriber, #56634) [Link]

cp -p preserves mode, ownership and timestamps, and the --preserve option can be used to do a finer selection of what should be preserved.

A look at rsync performance

Posted Aug 19, 2010 10:56 UTC (Thu) by dafid_b (guest, #67424) [Link]

OK - I read the fine cp manual page and now think that preserving ownership details etc. is part of cp. However, I am still confused, as many of the notes in the manual page refer to topics I know nothing about.

I would use the tar pipe of old as I expect it to build a proper copy in the new location.

Having read the cp manual page again (and again), I fear my confidence in tar might be misplaced :(.

Anyone know of a tutorial for each of the cp options?

A look at rsync performance

Posted Aug 21, 2010 6:16 UTC (Sat) by dirtyepic (subscriber, #30178) [Link]

for me, --exclude.

A look at rsync performance

Posted Aug 19, 2010 10:53 UTC (Thu) by ewen (subscriber, #4772) [Link]

If you have a directory with, say, 100 * 2GB files in it, and another directory which has 96 of those files plus a few older ones, then using your tar pipeline requires transferring 200GB of data, while using rsync only requires transferring 8GB. I know which I'd prefer. (And the tar technique still leaves you having to figure out which files no longer belong and remove them.)

There are a bunch of reasons for using rsync as shorthand for "make these two directories the same", even without needing the rsync algorithm to synchronise changes within an individual file. And it seems to me that adding a special case for "whole new file" into the rsync program, that copied with maximum efficiency, would be valuable. Which I think was (one of) the points of the original article.

Ewen

PS: I use "tar -cpf - . | (cd /dest && tar -xpf -)" for a bunch of safety reasons, and to preserve at least some permissions. With GNU tar that'll copy most things; with traditional unix tar, less so, but it gets closer than most tools on traditional unix. (GNU cp has an "-a" extension which will also preserve most things.)

PPS: For the later questioner, using a tar pipeline historically had better performance because it scheduled two processes which kept more I/O in flight. I've not looked recently to see if that's still the case, and given the performance numbers in the article it may not be the case (eg, the kernel's readahead may do just as well, if not better).

A look at rsync performance

Posted Aug 19, 2010 22:53 UTC (Thu) by Comet (subscriber, #11646) [Link]

-W, --whole-file copy files whole (without rsync algorithm)

A look at rsync performance

Posted Aug 20, 2010 4:29 UTC (Fri) by jcvw (subscriber, #50475) [Link]

I tried that. It doesn't help. The amount of user and system time is still incredibly high (compared to a simple cp). rsync doesn't use a tight read/write loop, like cp does, but (even in local cases) uses two processes, a socket and lots of select system calls. The -W doesn't change anything there (unfortunately).

A look at rsync performance

Posted Aug 20, 2010 4:46 UTC (Fri) by dlang (guest, #313) [Link]

my first reaction on reading this is that the processes are stalling, and the behavior you describe to improve performance sounds like it's on the same tack.

did you try tweaking the buffer sizes that rsync uses to see if larger buffers may smooth things out a bit?

I would still expect it to take significantly more CPU time than a plain cp, and I'm not bothered by that (it's really designed for a different job, one it excels at; this is a degenerate corner case for it), but the throughput should be higher.

while the approach is less efficient than the read()/write() in a tight loop, if it can keep the buffers between the processes full it should be able to do fairly well (as your testing shows when you tweak things so that the threads aren't waiting for each other much), so perhaps larger buffers can avoid the stalls.

A look at rsync performance

Posted Aug 19, 2010 11:55 UTC (Thu) by zmower (subscriber, #3005) [Link]

Using netcat looks tricky (and less secure) compared to :
sender$ tar cf - * | ssh user@receiver "tar -C $dir -xvf -"

A look at rsync performance

Posted Aug 23, 2010 23:49 UTC (Mon) by bronson (guest, #4806) [Link]

Yes, but it has WAY less overhead. I know there's a way to tell ssh to use a less CPU-heavy cipher but I always forget how.
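For the record, it is the -c option; at the time, arcfour was the usual low-CPU choice (it has long since been removed from OpenSSH as insecure), e.g.:

    tar cf - * | ssh -c arcfour user@receiver "tar -C $dir -xf -"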

A look at rsync performance

Posted Aug 20, 2010 12:01 UTC (Fri) by rvfh (subscriber, #31018) [Link]

Would this handle Ctrl-C better than cp? It's always a problem if you launch a 'cp -au', because the mtime of the file is only set once the copy is finished (for obvious reasons), so interrupting the copy leaves you with a broken file that's newer than anything else; thus you cannot recover unless you find out which file it is (using find is an option, but can be slow over a big number of files).
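Hunting down the interrupted file afterwards could look like this (a sketch using GNU find; /dest is a placeholder):

    # print the most recently modified file -- likely the one cp was interrupted on
    find /dest -type f -printf '%T@ %p\n' | sort -n | tail -n 1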

(sorry for going OT)

A look at rsync performance

Posted Aug 23, 2010 10:39 UTC (Mon) by error27 (subscriber, #8346) [Link]

That should be "nc -q 0 receiver 1234". I wouldn't trust netcat for data I cared about...

A look at rsync performance

Posted Aug 26, 2010 14:44 UTC (Thu) by ariveira (guest, #57833) [Link]

> For bulk copies like this, I tend to use tar over a shell pipe:
> tar -cvf - * | (cd /someotherdir; tar -xvf -)

Recently pax (the POSIX archiver) was brought to my attention:

pax -rw . /someotherdir

Not to mention its awesome -s option.

A look at rsync performance

Posted Sep 8, 2010 15:52 UTC (Wed) by daenzer (subscriber, #7050) [Link]

Indeed, this seems like an rsync performance bug that should get fixed.

A look at rsync performance

Posted Aug 19, 2010 15:19 UTC (Thu) by jcvw (subscriber, #50475) [Link]

The same performance problems occur when retrieving data from an rsync server, like so: rsync -a server::package/dir localdir. By using the performance frequency governor (on both client and server), throughput is more than doubled on a gigabit network.

A look at rsync performance

Posted Aug 19, 2010 11:25 UTC (Thu) by jengelh (guest, #33263) [Link]

Now what you could also have checked was mmap- and splice-based copying :-)

http://tinyurl.com/35v8o6p
xcp -m|-s foo bar

A look at rsync performance

Posted Aug 20, 2010 22:56 UTC (Fri) by jmorris42 (guest, #2203) [Link]

This isn't the only performance problem with rsync. I have a filesystem full of home directories: a 600G filesystem with about 500G filled. Doing an rsync between it and a backup server takes over 24 hours, even though less than a GB typically changes between backups. Worse, it consumes 1.6GB of RAM and, unless reniced, makes the performance of the file server stink, as it becomes entirely disk-bound with a fair amount of CPU use as well.

A look at rsync performance

Posted Aug 21, 2010 9:32 UTC (Sat) by shlomif (guest, #11299) [Link]

In this module-authors@perl.org thread, the CPAN admins have complained that rsync does an equivalent of "find . -type f" over the network every time, and, as a result, reducing the number of files in the CPAN will yield a good benefit.

A look at rsync performance

Posted Aug 23, 2010 19:39 UTC (Mon) by knobunc (subscriber, #4678) [Link]

What rsync flags are you using? I use -avP --delete to sync a multi-terabyte tree and it performs admirably.

A look at rsync performance

Posted Aug 26, 2010 11:31 UTC (Thu) by chojrak11 (guest, #52056) [Link]

-H for example can be a problem here. Again, quoting the man page:

    Note that -a does not preserve hardlinks, because finding
    multiply-linked files is expensive. You must separately specify -H.

