tar -cvf - * | (cd /someotherdir; tar -xvf -)
Or, when going through a network, tar in combination with nc:
receiver# nc -l 1234 | tar -xvf -
sender# tar -cvf - * | nc receiver 1234
I wonder how this stacks up against the other methods. At the very least it doesn't seem to be CPU-bound, but disk I/O bound or, in case of the network copy, network-bound.
A look at rsync performance
Posted Aug 19, 2010 9:55 UTC (Thu) by spaetz (subscriber, #32870)
At the risk of going off-topic: why would you do something like that rather than a simple "cp"? I would honestly be interested to learn why tar is better in that case.
Posted Aug 19, 2010 10:35 UTC (Thu) by dafid_b (guest, #67424)
I don't think that is part of cp.
Posted Aug 19, 2010 10:48 UTC (Thu) by spaetz (subscriber, #32870)
But perhaps that misses things that tar manages to preserve. Not sure; just curious in any case.
Posted Aug 19, 2010 15:08 UTC (Thu) by bronson (subscriber, #4806)
Nowadays cp -a works well everywhere (in my experience) so there's no need to resort to tar. It's just damage from the Unix wars.
Posted Aug 19, 2010 19:31 UTC (Thu) by pj (subscriber, #4506)
tar cf - . | ssh user@remote "cd /dest/dir; tar xf -"
(ssh user@remote "cd /src/dir ; tar cf - . ") | (cd /dest/dir; tar xf -)
Posted Aug 20, 2010 11:53 UTC (Fri) by NAR (subscriber, #1313)
Posted Aug 20, 2010 12:28 UTC (Fri) by dsommers (subscriber, #55274)
Posted Aug 20, 2010 14:52 UTC (Fri) by spaetz (subscriber, #32870)
Only because you open a new ssh connection per file by default, while tar+ssh opens only one; all that connection setup causes lots of overhead. If you reuse your ssh connection, scp will be fast as well:
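One way to reuse the connection (a sketch, assuming OpenSSH; the host name "receiver" is a placeholder, not from the thread) is connection multiplexing via ControlMaster in ~/.ssh/config:

```
# ~/.ssh/config (illustrative): funnel every ssh/scp session to
# "receiver" over one shared connection, kept alive for 10 minutes
# after the last session closes.
Host receiver
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 10m
```

With that in place, repeated scp invocations attach to the existing master connection instead of paying the full ssh handshake each time.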
Posted Aug 21, 2010 2:05 UTC (Sat) by dmag (subscriber, #17775)
Posted Aug 24, 2010 20:01 UTC (Tue) by BackSeat (subscriber, #1886)
tar -C /src/dir -cf - . | tar -C /dest/dir -xf -
Posted Aug 19, 2010 21:08 UTC (Thu) by evgeny (guest, #774)
Posted Aug 25, 2010 3:09 UTC (Wed) by roelofs (guest, #2599)
BSDs included? In my experience they've been mighty picky about the GNUisms (or "things that would have been GNUisms if someone else hadn't done them first") they're willing to implement. I remember being surprised by something along those lines just a couple of months ago, though I've forgotten the details already.
But perhaps cp -a came from BSD in the first place...
Posted Aug 25, 2010 13:25 UTC (Wed) by dmarti (subscriber, #11625)
Posted Aug 19, 2010 10:50 UTC (Thu) by valhalla (subscriber, #56634)
Posted Aug 19, 2010 10:56 UTC (Thu) by dafid_b (guest, #67424)
I would use the tar pipe of old as I expect it to build a proper copy in the new location.
Having read the cp manual page again (and again) I fear my confidence in tar might be misplaced :(.
Anyone know of a tutorial for each of the cp options?
Posted Aug 21, 2010 6:16 UTC (Sat) by dirtyepic (subscriber, #30178)
Posted Aug 19, 2010 10:53 UTC (Thu) by ewen (subscriber, #4772)
There are a bunch of reasons for using rsync as shorthand for "make these two directories the same", even without needing the rsync algorithm to synchronise changes within an individual file. And it seems to me that adding a special case to rsync for "whole new file", one that copies with maximum efficiency, would be valuable, which I think was (one of) the points of the original article.
PS: I use "tar -cpf - . | (cd /dest && tar -xpf -)" for a bunch of safety reasons, and to preserve at least some permissions. With GNU tar that'll copy most things; with traditional unix tar, less so, but it gets closer than most tools on traditional unix. (GNU cp has an "-a" extension which will also preserve most things.)
PPS: For the later questioner, using a tar pipeline historically had better performance because it scheduled two processes which kept more I/O in flight. I've not looked recently to see if that's still the case, and given the performance numbers in the article it may not be the case (eg, the kernel's readahead may do just as well, if not better).
Posted Aug 19, 2010 22:53 UTC (Thu) by Comet (subscriber, #11646)
Posted Aug 20, 2010 4:29 UTC (Fri) by jcvw (subscriber, #50475)
Posted Aug 20, 2010 4:46 UTC (Fri) by dlang (✭ supporter ✭, #313)
did you try tweaking the buffer sizes that rsync uses to see if larger buffers may smooth things out a bit?
I would still expect it to take significantly more CPU time than a plain cp; that doesn't bother me (rsync is really designed for a different job, one it excels at, and this is a degenerate corner case for it), but the throughput should be higher.
While the approach is less efficient than a tight read()/write() loop, if it can keep the buffers between the processes full it should do fairly well (as your testing shows when you tweak things so that the threads aren't waiting on each other much), so perhaps larger buffers can avoid the stalls.
Posted Aug 19, 2010 11:55 UTC (Thu) by zmower (subscriber, #3005)
Posted Aug 23, 2010 23:49 UTC (Mon) by bronson (subscriber, #4806)
Posted Aug 20, 2010 12:01 UTC (Fri) by rvfh (subscriber, #31018)
(sorry for going OT)
Posted Aug 23, 2010 10:39 UTC (Mon) by error27 (subscriber, #8346)
Posted Aug 26, 2010 14:44 UTC (Thu) by ariveira (guest, #57833)
Recently pax (the POSIX archiver) was brought to my attention:
pax -rw . /someotherdir
Not to mention its awesome -s option.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds