tar + rsync + untar. Any speed benefit over just rsync?
I often find myself sending folders with 10K–100K files to a remote machine (within the same on-campus network).

I was just wondering whether there are reasons to believe that

tar + rsync + untar

or simply

tar (from src to dest) + untar

could be faster in practice than

rsync

when transferring the files for the first time.

I am interested in an answer that addresses the above in two scenarios: using compression and not using it.

Update

I have just run some experiments moving 10,000 small files (total size = 50 MB), and tar+rsync+untar was consistently faster than running rsync directly (both without compression).
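For concreteness, here is a minimal sketch of the two pipelines I have in mind (the hostname remote and the directory srcdir are placeholders):

rsync -a srcdir/ remote:/dest/dir/

tar -C srcdir -cf - . | ssh remote 'tar -C /dest/dir -xf -'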
Tags: rsync, tar
Are you running rsync in daemon mode at the other end?
– JBRWilkinson
Feb 5 '12 at 20:16
Re. your ancillary question: tar cf - . | ssh remotehost 'cd /target/dir && tar xf -'
– Gilles
Feb 5 '12 at 22:57
Syncing small files individually through rsync or scp results in each file using at least one data packet of its own over the net. If the files are small and the packets are many, this results in increased protocol overhead. Add to that the fact that the rsync protocol also sends more than one data packet per file (transferring checksums, comparing...), and the protocol overhead quickly builds up. See Wikipedia on MTU size.
– Tatjana Heuser
Feb 6 '12 at 19:08
Thanks @TatjanaHeuser - if you add this to your answer and don't mind backing up the claim that rsync uses at least one packet per file, I would accept it.
– Amelio Vazquez-Reina
Feb 6 '12 at 19:22
I found an interesting read stating that with scp and rsync the delay is to be blamed on different reasons: scp behaves basically as I described, while rsync optimizes the network payload at the increased cost of building up large data structures for handling it. I've included that in my answer and will check on it this weekend.
– Tatjana Heuser
Feb 7 '12 at 10:00
7 Answers
When you send the same set of files, rsync is better suited because it will only send differences. tar will always send everything, and this is a waste of resources when a lot of the data is already there. The tar + rsync + untar approach loses this advantage in this case, as well as the advantage of keeping the folders in sync with rsync --delete.

If you copy the files for the first time, first packing, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync won't have to do any more work than tar anyway.

Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately, before it has counted all the files.
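(A quick, optional check of the version on each end — user@server is a placeholder — since incremental recursion is only used when both sides are new enough:

rsync --version | head -n 1
ssh user@server rsync --version | head -n 1
)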
Tip 2: If you use rsync over ssh, you may also use either tar+ssh

tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'

or just scp

scp -Cr srcdir user@server:destdir

General rule: keep it simple.
UPDATE:
I've created 59 MB of demo data
mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done
and tested the file transfer to a remote server (not on the same LAN) several times, using both methods:
time rsync -r tmp server:tmp2
real 0m11.520s
user 0m0.940s
sys 0m0.472s
time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)
real 0m15.026s
user 0m0.944s
sys 0m0.700s
while keeping separate logs of the ssh traffic packets sent:
wc -l rsync.log rsync+tar.log
36730 rsync.log
37962 rsync+tar.log
74692 total
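(The answer doesn't show how those per-packet logs were collected; one way to produce something comparable — interface name and host are placeholders — would be a capture like

sudo tcpdump -i eth0 -nn 'tcp port 22 and host server' > rsync.log

left running for the duration of each transfer.)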
In this case, I can't see any advantage in network traffic from using rsync+tar, which is expected when the default MTU is 1500 and the files are 10k in size. rsync+tar generated more traffic, was slower by 2-3 seconds, and left two garbage files that had to be cleaned up.

I did the same tests on two machines on the same LAN, and there rsync+tar gave much better times and much, much less network traffic. I assume this is because of jumbo frames.

Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble: you need double the space on each side for packing and unpacking, and there are a couple of other options, as I've already mentioned above.
Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast calledrsync
;)
– 0xC0000022L
Feb 8 '12 at 20:32
BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files
– Populus
Mar 27 '15 at 15:09
@Populus, you'll notice I'm using compression in my original reply. However, in the tests I added later it doesn't matter that much; data from urandom doesn't compress much... if at all.
– forcefsck
Mar 28 '15 at 14:08
rsync also does compression. Use the -z flag. If running over ssh, you can also use ssh's compression mode. My feeling is that repeated levels of compression are not useful; they will just burn cycles without significant result. I'd recommend experimenting with rsync compression. It seems quite effective. And I'd suggest skipping the use of tar or any other pre/post compression.

I usually use rsync as rsync -abvz --partial ....
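(The ssh compression mode mentioned above isn't shown; a minimal sketch, assuming OpenSSH, would be either of the following — used instead of -z, not together with it, to avoid compressing twice:

rsync -abv --partial -e 'ssh -C' srcdir/ user@server:destdir/

or an entry in ~/.ssh/config:

Host server
    Compression yes

where server, srcdir and destdir are placeholders.)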
Note that rsync by default skips compressing files with certain suffixes, including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list.
– Wildcard
Feb 2 '18 at 0:22
I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.
Environment: Source machine is an i7 desktop using an SSD drive. Destination machine is a Synology NAS DS413j on a gigabit LAN connection to the source machine.
The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.
The source files are my ~/.cache folder, which contains 1.2 GB of mostly very small files.
1a/ tar files from source machine over the network to a .tar file on remote machine
$ tar cf /mnt/backup/cache.tar ~/.cache
1b/ untar that tar file on the remote machine itself
$ ssh admin@nas_box
[admin@nas_box] $ tar xf cache.tar
2/ rsync files from source machine over the network to remote machine
$ mkdir /mnt/backup/cachetest
$ rsync -ah .cache /mnt/backup/cachetest
I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above, piping tar output via ssh to an untarring process on the receiver.
Timings:
1a - 33 seconds
1b - 1 minute 48 seconds
2 - 22 minutes
It's very clear that rsync performed amazingly poorly compared to the tar operation, which can presumably be attributed to the network performance characteristics mentioned above.
I'd recommend that anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice here. I'll come back to this post if it turns out I've been inaccurate in any part of my procedure.
Nick
Without using -z to have rsync do compression, this test seems incomplete.
– Wildcard
Feb 2 '18 at 0:23
Tar without its own z argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes, -z would be sensible.
– Neek
Mar 29 '18 at 3:15
Using rsync to send a tar archive as asked would actually be a waste of resources, since you'd add a verification layer to the process. Rsync would checksum the tar file for correctness, when you'd rather have the check on the individual files. (It doesn't help to know that a tar file which may already have been defective on the sending side shows the same effect on the receiving end.) If you're sending an archive, ssh/scp is all you need.

The one reason you might have to select sending an archive would be if the tar of your choice were able to preserve more of the filesystem specials, such as Access Control Lists or other metadata often stored in Extended Attributes (Solaris) or Resource Forks (MacOS). When dealing with such things, your main concern will be which tools are able to preserve all the information associated with the file on the source filesystem, provided the target filesystem has the capability to keep track of it as well.
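(As a sketch of such a metadata-preserving archive transfer — assuming GNU tar 1.27 or later built with ACL/xattr support, with placeholder paths and host — it could look something like:

tar --acls --xattrs -C /src/dir -cf - . | ssh user@server 'tar --acls --xattrs -C /dest/dir -xpf -'

Whether the attributes survive also depends on the receiving filesystem, as noted above.)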
When speed is your main concern, it depends a lot on the size of your files. In general, a multitude of tiny files will scale badly over rsync or scp, since they each waste individual network packets, where a tar file would include several of them within the data load of a single network packet. Even better if the tar file is compressed, since the small files will most likely compress better as a whole than individually.

As far as I know, both rsync and scp fail to optimize when sending entire single files, as in an initial transfer, having each file occupy an entire data frame with its entire protocol overhead (and wasting more on checking back and forth). However, Janecek states this to be true for scp only, detailing that rsync would optimize the network traffic but at the cost of building huge data structures in memory. See the article Efficient File Transfer, Janecek 2006. So according to him it's still true that both scp and rsync scale badly on small files, but for entirely different reasons. Guess I'll have to dig into the sources this weekend to find out.

For practical relevance, if you know you're sending mostly larger files, there won't be much of a difference in speed, and using rsync has the added benefit of being able to pick up where it left off when interrupted.
Postscriptum:
These days, rdist seems to be sinking into oblivion, but before the days of rsync it was a very capable tool and widely used (safely when used over ssh, unsafely otherwise). It would not perform as well as rsync, though, since it didn't optimize to transfer only content that had changed. Its main difference from rsync lies in the way it is configured and how the rules for updating files are spelled out.
Rsync doesn't add a verification layer. It only uses checksums to find differences in existing files, not to verify the result. Where the copy is fresh, no checksums are made. Where the copy is not fresh, checksums save you bandwidth.
– forcefsck
Feb 8 '12 at 20:25
For small directories (small as in used disk space), it depends on the overhead of checking the file information for the files being synced. On the one hand, rsync saves the time of transferring the unmodified files; on the other hand, it indeed has to transfer information about each file.

I don't know exactly the internals of rsync. Whether the file stats cause lag depends on how rsync transfers data — if file stats are transferred one by one, then the RTT may make tar+rsync+untar faster.

But if you have, say, 1 GiB of data, rsync will be way faster — well, unless your connection is really fast!
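(For your own data the simplest way to settle it is to time both approaches directly — a rough sketch, with host and dir as placeholders:

time rsync -a dir/ host:/tmp/test-rsync/

time sh -c 'tar -C dir -cf - . | ssh host "mkdir -p /tmp/test-tar && tar -C /tmp/test-tar -xf -"'

and compare the wall-clock numbers.)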
I had to move a few terabytes of data across the country, exactly once. As an experiment, I ran two of the transfers using rsync and ssh/tar to see how they compare.
The results:

rsync transferred the files at an average rate of 2.76 megabytes per second.

ssh/tar transferred the files at an average rate of 4.18 megabytes per second.
The details:

My data consists of millions of .gz compressed files, the average size of which is 10 megabytes but some are over a gigabyte. There is a directory structure but it is dwarfed by the size of the data inside the files. If I had almost anything else to do, I would only have used rsync, but in this case the ssh/tar is a functional solution.
My job with rsync consists of:

rsync --compress --stats --no-blocking-io --files-from=fileList.txt -av otherSystem:/the/other/dir/ dest/

where fileList.txt is a great long list of the relative pathnames of the files on the other side. (I noticed that the --compress is not productive for compressed files after I started, but I was not going to go back and restart.)
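(The answer doesn't say how fileList.txt was generated; one way to build such a list of relative pathnames — a sketch, using the same paths as above — would be:

ssh otherSystem 'cd /the/other/dir/ && find . -type f' > fileList.txt

since --files-from expects paths relative to the source directory given on the command line.)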
I started another with ssh and tar:

ssh otherSystem "cd /the/other/dir/; tar cf - ." | tar xvf -

You will observe this copies everything; sorry, this is not a 100% apples-to-apples comparison.
I should add that while I am using the internal company network, I have to go through an intermediary to get to the data source computer. The ping time from my target computer to the intermediary is 21 ms and from the intermediary to the data source is 26 ms. This was the same for both transfers.
The SSH connection through the intermediary is accomplished via this ~/.ssh/config entry:
Host otherSystem
Hostname dataSource.otherSide.com
User myUser
Port 22
ProxyCommand ssh -q -W %h:%p intermediary.otherSide.com
IdentityFile id_rsa.priv
Time this:
tar cf - ~/.cache | ssh admin@nas_box "(cd /destination ; tar xf -)"
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f30953%2ftar-rsync-untar-any-speed-benefit-over-just-rsync%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
7 Answers
7
active
oldest
votes
7 Answers
7
active
oldest
votes
active
oldest
votes
active
oldest
votes
When you send the same set of files, rsync
is better suited because it will only send differences. tar
will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar
loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete
.
If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync
doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync
won't have to do any task more than tar
anyway.
Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.
Tip2: If you use rsync
over ssh
, you may also use either tar+ssh
tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'
or just scp
scp -Cr srcdir user@server:destdir
General rule, keep it simple.
UPDATE:
I've created 59M demo data
mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done
and tested several times the file transfer to a remote server (not in the same lan), using both methods
time rsync -r tmp server:tmp2
real 0m11.520s
user 0m0.940s
sys 0m0.472s
time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)
real 0m15.026s
user 0m0.944s
sys 0m0.700s
while keeping separate logs from the ssh traffic packets sent
wc -l rsync.log rsync+tar.log
36730 rsync.log
37962 rsync+tar.log
74692 total
In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.
I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.
Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.
Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast calledrsync
;)
– 0xC0000022L
Feb 8 '12 at 20:32
2
BTW if you use the flagz
with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files
– Populus
Mar 27 '15 at 15:09
1
@Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.
– forcefsck
Mar 28 '15 at 14:08
add a comment |
When you send the same set of files, rsync
is better suited because it will only send differences. tar
will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar
loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete
.
If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync
doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync
won't have to do any task more than tar
anyway.
Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.
Tip2: If you use rsync
over ssh
, you may also use either tar+ssh
tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'
or just scp
scp -Cr srcdir user@server:destdir
General rule, keep it simple.
UPDATE:
I've created 59M demo data
mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done
and tested several times the file transfer to a remote server (not in the same lan), using both methods
time rsync -r tmp server:tmp2
real 0m11.520s
user 0m0.940s
sys 0m0.472s
time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)
real 0m15.026s
user 0m0.944s
sys 0m0.700s
while keeping separate logs from the ssh traffic packets sent
wc -l rsync.log rsync+tar.log
36730 rsync.log
37962 rsync+tar.log
74692 total
In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.
I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.
Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.
Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast calledrsync
;)
– 0xC0000022L
Feb 8 '12 at 20:32
2
BTW if you use the flagz
with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files
– Populus
Mar 27 '15 at 15:09
1
@Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.
– forcefsck
Mar 28 '15 at 14:08
add a comment |
When you send the same set of files, rsync
is better suited because it will only send differences. tar
will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar
loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete
.
If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync
doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync
won't have to do any task more than tar
anyway.
Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.
Tip2: If you use rsync
over ssh
, you may also use either tar+ssh
tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'
or just scp
scp -Cr srcdir user@server:destdir
General rule, keep it simple.
UPDATE:
I've created 59M demo data
mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done
and tested several times the file transfer to a remote server (not in the same lan), using both methods
time rsync -r tmp server:tmp2
real 0m11.520s
user 0m0.940s
sys 0m0.472s
time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)
real 0m15.026s
user 0m0.944s
sys 0m0.700s
while keeping separate logs from the ssh traffic packets sent
wc -l rsync.log rsync+tar.log
36730 rsync.log
37962 rsync+tar.log
74692 total
In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.
I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.
Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.
When you send the same set of files, rsync
is better suited because it will only send differences. tar
will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar
loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete
.
If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync
doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync
won't have to do any task more than tar
anyway.
Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.
Tip2: If you use rsync
over ssh
, you may also use either tar+ssh
tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'
or just scp
scp -Cr srcdir user@server:destdir
General rule, keep it simple.
UPDATE:
I've created 59M demo data
mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done
and tested several times the file transfer to a remote server (not in the same lan), using both methods
time rsync -r tmp server:tmp2
real 0m11.520s
user 0m0.940s
sys 0m0.472s
time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)
real 0m15.026s
user 0m0.944s
sys 0m0.700s
while keeping separate logs from the ssh traffic packets sent
wc -l rsync.log rsync+tar.log
36730 rsync.log
37962 rsync+tar.log
74692 total
In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.
I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.
Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.
edited Feb 8 '12 at 20:27
answered Feb 5 '12 at 21:44
forcefsckforcefsck
5,8862331
5,8862331
Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast calledrsync
;)
– 0xC0000022L
Feb 8 '12 at 20:32
2
BTW if you use the flagz
with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files
– Populus
Mar 27 '15 at 15:09
1
@Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.
– forcefsck
Mar 28 '15 at 14:08
add a comment |
Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast calledrsync
;)
– 0xC0000022L
Feb 8 '12 at 20:32
2
BTW if you use the flagz
with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files
– Populus
Mar 27 '15 at 15:09
1
@Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.
– forcefsck
Mar 28 '15 at 14:08
Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called
rsync
;)– 0xC0000022L
Feb 8 '12 at 20:32
Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called
rsync
;)– 0xC0000022L
Feb 8 '12 at 20:32
2
2
BTW if you use the flag
z
with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files– Populus
Mar 27 '15 at 15:09
BTW if you use the flag
z
with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files– Populus
Mar 27 '15 at 15:09
1
1
@Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.
– forcefsck
Mar 28 '15 at 14:08
@Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.
– forcefsck
Mar 28 '15 at 14:08
add a comment |
rsync
also does compression. Use the -z
flag. If running over ssh
, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync
compression. It seems quite effective. And I'd suggest skipping usage of tar
or any other pre/post compression.
I usually use rsync as rsync -abvz --partial...
.
Note thatrsync
by default skips compressing files with certain suffixes including.gz
and.tgz
and others; search thersync
man page for--skip-compress
for the full list.
– Wildcard
Feb 2 '18 at 0:22
add a comment |
rsync
also does compression. Use the -z
flag. If running over ssh
, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync
compression. It seems quite effective. And I'd suggest skipping usage of tar
or any other pre/post compression.
I usually use rsync as rsync -abvz --partial...
.
Note thatrsync
by default skips compressing files with certain suffixes including.gz
and.tgz
and others; search thersync
man page for--skip-compress
for the full list.
– Wildcard
Feb 2 '18 at 0:22
add a comment |
rsync
also does compression. Use the -z
flag. If running over ssh
, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync
compression. It seems quite effective. And I'd suggest skipping usage of tar
or any other pre/post compression.
I usually use rsync as rsync -abvz --partial...
.
rsync
also does compression. Use the -z
flag. If running over ssh
, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync
compression. It seems quite effective. And I'd suggest skipping usage of tar
or any other pre/post compression.
I usually use rsync as rsync -abvz --partial...
.
answered Feb 5 '12 at 21:27
Faheem MithaFaheem Mitha
23.6k1887138
23.6k1887138
Note thatrsync
by default skips compressing files with certain suffixes including.gz
and.tgz
and others; search thersync
man page for--skip-compress
for the full list.
– Wildcard
Feb 2 '18 at 0:22
add a comment |
Note thatrsync
by default skips compressing files with certain suffixes including.gz
and.tgz
and others; search thersync
man page for--skip-compress
for the full list.
– Wildcard
Feb 2 '18 at 0:22
Note that
rsync
by default skips compressing files with certain suffixes including .gz
and .tgz
and others; search the rsync
man page for --skip-compress
for the full list.– Wildcard
Feb 2 '18 at 0:22
Note that
rsync
by default skips compressing files with certain suffixes including .gz
and .tgz
and others; search the rsync
man page for --skip-compress
for the full list.– Wildcard
Feb 2 '18 at 0:22
add a comment |
I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.
Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.
The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.
The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.
1a/ tar files from source machine over the network to a .tar file on remote machine
$ tar cf /mnt/backup/cache.tar ~/.cache
1b/ untar that tar file on the remote machine itself
$ ssh admin@nas_box
[admin@nas_box] $ tar xf cache.tar
2/ rsync files from source machine over the network to remote machine
$ mkdir /mnt/backup/cachetest
$ rsync -ah .cache /mnt/backup/cachetest
I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.
Timings:
1a - 33 seconds
1b - 1 minutes 48 seconds
2 - 22 minutes
It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.
I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.
Nick
1
Without using-z
to have rsync do compression, this test seems incomplete.
– Wildcard
Feb 2 '18 at 0:23
1
Tar without its ownz
argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes,-z
would be sensible.
– Neek
Mar 29 '18 at 3:15
add a comment |
I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.
Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.
The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.
The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.
1a/ tar files from source machine over the network to a .tar file on remote machine
$ tar cf /mnt/backup/cache.tar ~/.cache
1b/ untar that tar file on the remote machine itself
$ ssh admin@nas_box
[admin@nas_box] $ tar xf cache.tar
2/ rsync files from source machine over the network to remote machine
$ mkdir /mnt/backup/cachetest
$ rsync -ah .cache /mnt/backup/cachetest
I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.
Timings:
1a - 33 seconds
1b - 1 minutes 48 seconds
2 - 22 minutes
It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.
I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.
Nick
1
Without using-z
to have rsync do compression, this test seems incomplete.
– Wildcard
Feb 2 '18 at 0:23
1
Tar without its ownz
argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes,-z
would be sensible.
– Neek
Mar 29 '18 at 3:15
add a comment |
I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.
Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.
The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.
The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.
1a/ tar files from source machine over the network to a .tar file on remote machine
$ tar cf /mnt/backup/cache.tar ~/.cache
1b/ untar that tar file on the remote machine itself
$ ssh admin@nas_box
[admin@nas_box] $ tar xf cache.tar
2/ rsync files from source machine over the network to remote machine
$ mkdir /mnt/backup/cachetest
$ rsync -ah .cache /mnt/backup/cachetest
I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.
Timings:
1a - 33 seconds
1b - 1 minutes 48 seconds
2 - 22 minutes
It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.
I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.
Nick
I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.
Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.
The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.
The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.
1a/ tar files from source machine over the network to a .tar file on remote machine
$ tar cf /mnt/backup/cache.tar ~/.cache
1b/ untar that tar file on the remote machine itself
$ ssh admin@nas_box
[admin@nas_box] $ tar xf cache.tar
2/ rsync files from source machine over the network to remote machine
$ mkdir /mnt/backup/cachetest
$ rsync -ah .cache /mnt/backup/cachetest
I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.
Timings:
1a - 33 seconds
1b - 1 minutes 48 seconds
2 - 22 minutes
It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.
I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.
Nick
answered Feb 3 '13 at 9:10
NeekNeek
15114
15114
1
Without using-z
to have rsync do compression, this test seems incomplete.
– Wildcard
Feb 2 '18 at 0:23
1
Tar without its ownz
argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes,-z
would be sensible.
– Neek
Mar 29 '18 at 3:15
add a comment |
1
Without using-z
to have rsync do compression, this test seems incomplete.
– Wildcard
Feb 2 '18 at 0:23
1
Tar without its ownz
argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes,-z
would be sensible.
– Neek
Mar 29 '18 at 3:15
1
1
Without using
-z
to have rsync do compression, this test seems incomplete.– Wildcard
Feb 2 '18 at 0:23
Without using
-z
to have rsync do compression, this test seems incomplete.– Wildcard
Feb 2 '18 at 0:23
1
1
Tar without its own
z
argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes, -z
would be sensible.– Neek
Mar 29 '18 at 3:15
Tar without its own
z
argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes, -z
would be sensible.– Neek
Mar 29 '18 at 3:15
add a comment |
Using rsync to send a tar archive as asked actually would be a waste or ressources, since you'd add a verification layer to the process. Rsync would checksum the tar file for correctness, when you'd rather have the check on the individual files. (It doesn't help to know that the tar file which may have been defective on the sending side already shows the same effect on the receiving end). If you're sending an archive, ssh/scp is all you need.
The one reason you might have to select sending an archive would be if the tar of your choice were able to preserve more of the filesystem specials, such as Access Control List or other Metadata often stored in Extended Attributes (Solaris) or Ressource Forks (MacOS). When dealing with such things, your main concern will be as to which tools are able to preserve all information that's associated with the file on the source filesystem, providing the target filesystem has the capability to keep track of them as well.
When speed is your main concern, it depends a lot on the size of your files. In general, a multitude of tiny files will scale badly over rsync or scp, since the'll all waste individual network packets each, where a tar file would include several of them within the data load of a single network packet. Even better if the tar file were compressed, since the small files would most likely compress better as a whole than individually.
As far as I know, both rsync and scp fail to optimize when sending entire single files as in an initial transfer, having each file occupy an entire data frame with its entire protocol overhead (and wasting more on checking forth and back). However Janecek states this to be true for scp only, detailling that rsync would optimize the network traffic but at the cost of building huge data structures in memory. See article Efficient File Transfer, Janecek 2006. So according to him it's still true that both scp and rsync scale badly on small files, but for entirely different reasons. Guess I'll have to dig into sources this weekend to find out.
For practical relevance, if you know you're sending mostly larger files, there won't be much of a difference in speed, and using rsync has the added benefit of being able to take up where it left when interrupted.
Postscriptum:
These days, rdist seems to sink into oblivition, but before the days of rsync, it was a very capable tool and widely used (safely when used over ssh, unsafe otherwise). I would not perform as good as rsync though since it didn't optimize to just transfer content that had changed. Its main difference to rsync lies in the way it is configured, and how the rules for updating files are spelled out.
Rsync doesn't add a verification layer. It only uses checksums to find differences on existing files, not to verify the result. In case where the copy is fresh, no checksums are made. In case where the copy is not fresh, checksums save you bandwidth.
– forcefsck
Feb 8 '12 at 20:25
add a comment |
Using rsync to send a tar archive as asked actually would be a waste or ressources, since you'd add a verification layer to the process. Rsync would checksum the tar file for correctness, when you'd rather have the check on the individual files. (It doesn't help to know that the tar file which may have been defective on the sending side already shows the same effect on the receiving end). If you're sending an archive, ssh/scp is all you need.
The one reason you might have to select sending an archive would be if the tar of your choice were able to preserve more of the filesystem specials, such as Access Control List or other Metadata often stored in Extended Attributes (Solaris) or Ressource Forks (MacOS). When dealing with such things, your main concern will be as to which tools are able to preserve all information that's associated with the file on the source filesystem, providing the target filesystem has the capability to keep track of them as well.
When speed is your main concern, it depends a lot on the size of your files. In general, a multitude of tiny files will scale badly over rsync or scp, since the'll all waste individual network packets each, where a tar file would include several of them within the data load of a single network packet. Even better if the tar file were compressed, since the small files would most likely compress better as a whole than individually.
As far as I know, both rsync and scp fail to optimize when sending entire single files as in an initial transfer, having each file occupy an entire data frame with its entire protocol overhead (and wasting more on checking forth and back). However Janecek states this to be true for scp only, detailling that rsync would optimize the network traffic but at the cost of building huge data structures in memory. See article Efficient File Transfer, Janecek 2006. So according to him it's still true that both scp and rsync scale badly on small files, but for entirely different reasons. Guess I'll have to dig into sources this weekend to find out.
For practical relevance, if you know you're sending mostly larger files, there won't be much of a difference in speed, and using rsync has the added benefit of being able to take up where it left when interrupted.
Postscriptum:
These days, rdist seems to sink into oblivition, but before the days of rsync, it was a very capable tool and widely used (safely when used over ssh, unsafe otherwise). I would not perform as good as rsync though since it didn't optimize to just transfer content that had changed. Its main difference to rsync lies in the way it is configured, and how the rules for updating files are spelled out.
Rsync doesn't add a verification layer. It only uses checksums to find differences on existing files, not to verify the result. In case where the copy is fresh, no checksums are made. In case where the copy is not fresh, checksums save you bandwidth.
– forcefsck
Feb 8 '12 at 20:25
add a comment |
Using rsync to send a tar archive as asked actually would be a waste or ressources, since you'd add a verification layer to the process. Rsync would checksum the tar file for correctness, when you'd rather have the check on the individual files. (It doesn't help to know that the tar file which may have been defective on the sending side already shows the same effect on the receiving end). If you're sending an archive, ssh/scp is all you need.
The one reason you might have to select sending an archive would be if the tar of your choice were able to preserve more of the filesystem specials, such as Access Control List or other Metadata often stored in Extended Attributes (Solaris) or Ressource Forks (MacOS). When dealing with such things, your main concern will be as to which tools are able to preserve all information that's associated with the file on the source filesystem, providing the target filesystem has the capability to keep track of them as well.
When speed is your main concern, it depends a lot on the size of your files. In general, a multitude of tiny files will scale badly over rsync or scp, since the'll all waste individual network packets each, where a tar file would include several of them within the data load of a single network packet. Even better if the tar file were compressed, since the small files would most likely compress better as a whole than individually.
As far as I know, both rsync and scp fail to optimize when sending entire single files as in an initial transfer, having each file occupy an entire data frame with its whole protocol overhead (and wasting more on checking back and forth). However, Janecek states this to be true for scp only, detailing that rsync would optimize the network traffic, but at the cost of building huge data structures in memory. See the article Efficient File Transfer, Janecek 2006. So according to him it's still true that both scp and rsync scale badly on small files, but for entirely different reasons. Guess I'll have to dig into the sources this weekend to find out.
As for practical relevance: if you know you're sending mostly larger files, there won't be much of a difference in speed, and using rsync has the added benefit of being able to pick up where it left off when interrupted.
Postscriptum:
These days, rdist seems to be sinking into oblivion, but before the days of rsync it was a very capable and widely used tool (safe when used over ssh, unsafe otherwise). It would not perform as well as rsync, though, since it didn't optimize to transfer only content that had changed. Its main difference from rsync lies in the way it is configured and how the rules for updating files are spelled out.
edited Feb 7 '12 at 9:57
answered Feb 6 '12 at 10:25
Tatjana Heuser
65644
Rsync doesn't add a verification layer. It only uses checksums to find differences in existing files, not to verify the result. When the copy is fresh, no checksums are made; when the copy is not fresh, checksums save you bandwidth.
– forcefsck
Feb 8 '12 at 20:25
For small directories (small as in used disk space), it depends on the overhead of checking the file information for the files being synced. On the one hand, rsync saves the time of transferring the unmodified files; on the other hand, it does have to transfer information about each file.
I don't know the internals of rsync exactly. Whether the file stats cause lag depends on how rsync transfers data — if file stats are transferred one by one, then the RTT may make tar+rsync+untar faster.
But if you have, say, 1 GiB of data, rsync will be way faster, well, unless your connection is really fast!
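If the per-file overhead of the delta algorithm is the worry on a first-time copy over a fast local network, one thing worth trying (my sketch, not part of the answer above) is rsync's --whole-file switch; the host and paths are placeholders:
# -a keeps permissions/times; -W/--whole-file skips the delta-transfer algorithm,
# so rsync streams each file whole instead of computing block checksums.
# "remotehost" and the directories are hypothetical.
rsync -a -W /source/dir/ remotehost:/target/dir/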
answered Feb 5 '12 at 19:28
njsg
9,29412126
I had to move a few terabytes of data across the country, exactly once. As an experiment, I ran two of the transfers using rsync and ssh/tar to see how they compare.
The results:
rsync transferred the files at an average rate of 2.76 megabytes per second.
ssh/tar transferred the files at an average rate of 4.18 megabytes per second.
The details:
My data consists of millions of .gz compressed files, the average size of which is 10 megabytes, but some are over a gigabyte. There is a directory structure, but it is dwarfed by the size of the data inside the files. For almost any other job I would simply have used rsync, but in this case ssh/tar is a functional solution.
My rsync job consists of:
rsync --compress --stats --no-blocking-io --files-from=fileList.txt -av otherSystem:/the/other/dir/ dest/
where fileList.txt is a great long list of the relative pathnames of the files on the other side. (I noticed after I started that --compress is not productive for already-compressed files, but I was not going to go back and restart.)
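As a hedged sketch of how a similar job could avoid recompressing already-compressed data: either drop --compress entirely, or use rsync's --skip-compress suffix list (recent rsync versions already skip common compressed suffixes by default). The file list, host and directories are the hypothetical ones from the command above:
# Keep compression for compressible files but skip suffixes that are already compressed.
# fileList.txt, otherSystem and the directories are placeholders from the example above.
rsync --stats --files-from=fileList.txt --compress --skip-compress=gz/tgz/zip -av otherSystem:/the/other/dir/ dest/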
I started another job with ssh and tar:
ssh otherSystem "cd /the/other/dir/; tar cf - ." | tar xvf -
You will observe that this copies everything; sorry, this is not a 100% apples-to-apples comparison.
I should add that while I am using the internal company network, I have to go through an intermediary to get to the data source computer. The ping time from my target computer to the intermediary is 21 ms and from the intermediary to the data source is 26 ms. This was the same for both transfers.
The SSH connection through the intermediary is accomplished via this ~/.ssh/config entry:
Host otherSystem
Hostname dataSource.otherSide.com
User myUser
Port 22
ProxyCommand ssh -q -W %h:%p intermediary.otherSide.com
IdentityFile id_rsa.priv
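For comparison, a sketch of the same hop using ProxyJump, which OpenSSH supports since version 7.3; the host names, user and key file are the hypothetical ones from the entry above:
Host otherSystem
Hostname dataSource.otherSide.com
User myUser
Port 22
# ProxyJump (OpenSSH 7.3+) replaces the "ssh -W" ProxyCommand shown above.
ProxyJump intermediary.otherSide.com
IdentityFile id_rsa.priv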
answered 2 hours ago
user1683793
1515
Time this:
tar cf - ~/.cache | ssh admin@nas_box "(cd /destination ; tar xf -)"
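To put numbers on it, one could time that pipeline against a plain rsync of the same tree (in bash, the time keyword measures the whole pipeline); admin@nas_box and /destination are the hypothetical names from the command above:
# Time the tar-over-ssh pipeline (bash's time keyword covers the whole pipeline).
time tar cf - ~/.cache | ssh admin@nas_box "(cd /destination ; tar xf -)"
# For comparison, time a plain recursive rsync of the same directory.
time rsync -a ~/.cache/ admin@nas_box:/destination/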
edited Mar 4 '13 at 23:58
jasonwryan
51.5k14136190
answered Mar 4 '13 at 23:33
user33553
1
3
Syncing smaller files individually through rsync or scp results in each file starting at least one data packet of its own over the net. If the files are small and the packets are many, this results in increased protocol overhead. Now factor in that the rsync protocol also uses more than one data packet per file (transferring checksums, comparing...), and the protocol overhead quickly builds up. See Wikipedia on MTU size
– Tatjana Heuser
Feb 6 '12 at 19:08
Thanks @TatjanaHeuser - if you add this to your answer and don't mind backing up the claim that rsync uses at least one packet per file, I would accept it.
– Amelio Vazquez-Reina
Feb 6 '12 at 19:22
1
I found an interesting read stating that with scp and rsync the delay is to be blamed on different reasons: scp behaves basically like I described, but rsync optimizes the network payload at the increased cost of building up large data structures for handling it. I've included that in my answer and will check on it this weekend.
– Tatjana Heuser
Feb 7 '12 at 10:00