
tar + rsync + untar. Any speed benefit over just rsync?


I often find myself sending folders with 10K-100K files to a remote machine (within the same network, on campus).



I was just wondering if there are reasons to believe that,



 tar + rsync + untar


Or simply



 tar (from src to dest) + untar


could be faster in practice than



rsync 


when transferring the files for the first time.



I am interested in an answer that addresses the above in two scenarios: using compression and not using it.



Update



I have just run some experiments moving 10,000 small files (total size = 50 MB), and tar+rsync+untar was consistently faster than running rsync directly (both without compression).
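Concretely, the two approaches being compared look roughly like this (a sketch; host and path names are placeholders):

# plain rsync, no compression
rsync -a /src/dir/ user@remote:/dest/dir/

# tar + rsync + untar: archive locally, ship one big file, unpack remotely
tar -C /src/dir -cf /tmp/payload.tar .
rsync /tmp/payload.tar user@remote:/tmp/
ssh user@remote 'mkdir -p /dest/dir && tar -C /dest/dir -xf /tmp/payload.tar && rm /tmp/payload.tar'
rm /tmp/payload.tar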



































  • Are you running rsync in daemon mode at the other end? – JBRWilkinson, Feb 5 '12 at 20:16

  • Re. your ancillary question: tar cf - . | ssh remotehost 'cd /target/dir && tar xf -' – Gilles, Feb 5 '12 at 22:57

  • Syncing small files individually through rsync or scp means each file goes out in at least one data packet of its own. If the files are small and the packets are many, this increases protocol overhead. Add to that the fact that the rsync protocol itself exchanges more than one packet per file (transferring checksums, comparing...), and the protocol overhead quickly builds up. See Wikipedia on MTU size. – Tatjana Heuser, Feb 6 '12 at 19:08

  • Thanks @TatjanaHeuser - if you add this to your answer and don't mind backing up the claim that rsync uses at least one packet per file, I would accept it. – Amelio Vazquez-Reina, Feb 6 '12 at 19:22

  • I found an interesting read stating that with scp and rsync the delay is due to different reasons: scp behaves basically as I described, but rsync optimizes the network payload at the increased cost of building up large data structures to handle that. I've included that in my answer and will check on it this weekend. – Tatjana Heuser, Feb 7 '12 at 10:00




















7 Answers
































When you send the same set of files, rsync is better suited because it will only send differences. tar will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete.



If you copy the files for the first time, first packaging them, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync doesn't have to do any more work than tar anyway.



Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately, before it has counted all the files.

Tip 2: If you use rsync over ssh, you may also use either tar+ssh



tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'


or just scp



scp -Cr srcdir user@server:destdir


General rule, keep it simple.



UPDATE:



I've created 59M of demo data



mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done


and tested the file transfer to a remote server (not on the same LAN) several times, using both methods



time rsync -r  tmp server:tmp2

real 0m11.520s
user 0m0.940s
sys 0m0.472s

time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)

real 0m15.026s
user 0m0.944s
sys 0m0.700s


while keeping separate logs of the ssh traffic packets sent



wc -l rsync.log rsync+tar.log 
36730 rsync.log
37962 rsync+tar.log
74692 total


In this case, I can't see any advantage in network traffic from using rsync+tar, which is expected when the default MTU is 1500 and the files are 10k in size. rsync+tar generated more traffic, was slower by 2-3 seconds, and left behind two garbage files that had to be cleaned up.

I did the same tests on two machines on the same LAN, and there rsync+tar gave much better times and far less network traffic. I assume this is because of jumbo frames.

Maybe rsync+tar would be better than plain rsync on a much larger data set, but frankly I don't think it's worth the trouble: you need double the space on each side for packing and unpacking, and there are a couple of other options, as I've already mentioned above.
































  • Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called rsync ;) – 0xC0000022L, Feb 8 '12 at 20:32

  • BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files – Populus, Mar 27 '15 at 15:09

  • @Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all. – forcefsck, Mar 28 '15 at 14:08



































rsync also does compression. Use the -z flag. If running over ssh, you can also use ssh's compression mode. My feeling is that repeated levels of compression are not useful; they just burn cycles without significant result. I'd recommend experimenting with rsync compression, which seems quite effective, and skipping tar or any other pre/post compression.



I usually use rsync as rsync -abvz --partial....
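A minimal sketch of that, assuming an OpenSSH client (source and destination paths are placeholders): let rsync do the compressing and explicitly turn ssh's off, so the data isn't compressed twice.

# -z compresses inside rsync itself; ssh-level compression is disabled to avoid double work
rsync -abvz --partial -e 'ssh -o Compression=no' /src/dir/ user@server:/dest/dir/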






























  • Note that rsync by default skips compressing files with certain suffixes including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list. – Wildcard, Feb 2 '18 at 0:22



































I had to back up my home directory to a NAS today and ran into this discussion, so I thought I'd add my results. Long story short, tar'ing over the network to the target file system is far faster in my environment than rsyncing to the same destination.

Environment: the source machine is an i7 desktop with an SSD; the destination is a Synology NAS DS413j on a gigabit LAN connection to the source machine.

The exact spec of the kit involved will naturally impact performance, and I don't know the details of my setup with regard to the quality of the network hardware at each end.

The source files are my ~/.cache folder, which contains 1.2 GB of mostly very small files.



1a/ tar files from source machine over the network to a .tar file on remote machine

$ tar cf /mnt/backup/cache.tar ~/.cache

1b/ untar that tar file on the remote machine itself

$ ssh admin@nas_box
[admin@nas_box] $ tar xf cache.tar

2/ rsync files from source machine over the network to remote machine

$ mkdir /mnt/backup/cachetest
$ rsync -ah .cache /mnt/backup/cachetest


I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above, piping tar output via ssh to an untarring process on the receiver.
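A sketch of that single-pipe variant with the paths used above (the destination path as seen from the NAS side is a placeholder, and GNU tar is assumed at both ends):

# archive, stream over ssh and unpack on the NAS in one step; no intermediate cache.tar
tar cf - -C ~ .cache | ssh admin@nas_box 'tar xf - -C /path/on/nas'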



Timings:



1a - 33 seconds

1b - 1 minute 48 seconds

2 - 22 minutes


It's very clear that rsync performed amazingly poorly compared to the tar operation, which can presumably be attributed to the network performance considerations mentioned above.

I'd recommend that anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice here. I'll come back to this post if it turns out I've been inaccurate in any part of my procedure.

Nick

























  • Without using -z to have rsync do compression, this test seems incomplete. – Wildcard, Feb 2 '18 at 0:23

  • Tar without its own z argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes, -z would be sensible. – Neek, Mar 29 '18 at 3:15



































Using rsync to send a tar archive as asked would actually be a waste of resources, since you'd add a verification layer to the process. Rsync would checksum the tar file for correctness, when you'd rather have the check on the individual files. (It doesn't help to know that a tar file which may already have been defective on the sending side shows the same defect on the receiving end.) If you're sending an archive, ssh/scp is all you need.



The one reason you might have to choose sending an archive would be if the tar of your choice were able to preserve more of the filesystem specials, such as Access Control Lists or other metadata often stored in extended attributes (Solaris) or resource forks (MacOS). When dealing with such things, your main concern will be which tools are able to preserve all the information associated with each file on the source filesystem, provided the target filesystem is able to keep track of it as well.
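For example, with GNU tar and a reasonably recent rsync, that extra metadata can be carried along explicitly (a sketch; support for these flags depends on the tar and rsync builds and on the filesystems at both ends, and the paths are placeholders):

# GNU tar: keep POSIX ACLs and extended attributes in the streamed archive
tar --acls --xattrs -C /src/dir -cf - . | ssh user@server 'tar --acls --xattrs -C /dest/dir -xf -'

# rsync equivalent: -A preserves ACLs, -X preserves extended attributes
rsync -aAX /src/dir/ user@server:/dest/dir/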



When speed is your main concern, it depends a lot on the size of your files. In general, a multitude of tiny files will scale badly over rsync or scp, since they'll each waste individual network packets, whereas a tar file would pack several of them into the data load of a single network packet. Even better if the tar file is compressed, since the small files would most likely compress better as a whole than individually.
As far as I know, both rsync and scp fail to optimize when sending entire single files, as in an initial transfer, having each file occupy an entire data frame with its entire protocol overhead (and wasting more on checking back and forth). However, Janecek states this to be true for scp only, detailing that rsync would optimize the network traffic but at the cost of building huge data structures in memory. See the article Efficient File Transfer, Janecek 2006. So according to him it's still true that both scp and rsync scale badly on small files, but for entirely different reasons. Guess I'll have to dig into the sources this weekend to find out.



Of practical relevance: if you know you're sending mostly larger files, there won't be much of a difference in speed, and using rsync has the added benefit of being able to pick up where it left off when interrupted.



Postscriptum:
These days, rdist seems to be sinking into oblivion, but before the days of rsync it was a very capable tool and widely used (safely when used over ssh, unsafe otherwise). It would not perform as well as rsync, though, since it didn't optimize to transfer only content that had changed. Its main difference from rsync lies in the way it is configured and how the rules for updating files are spelled out.
































  • Rsync doesn't add a verification layer. It only uses checksums to find differences on existing files, not to verify the result. In case where the copy is fresh, no checksums are made. In case where the copy is not fresh, checksums save you bandwidth. – forcefsck, Feb 8 '12 at 20:25

































For small directories (small in terms of disk space used), it depends on the overhead of checking the file information for the files being synced. On one hand, rsync saves the time of transferring unmodified files; on the other hand, it does have to transfer information about each file.

I don't know the internals of rsync exactly. Whether the file stats cause lag depends on how rsync transfers data — if file stats are transferred one by one, then the RTT may make tar+rsync+untar faster.

But if you have, say, 1 GiB of data, rsync will be way faster, unless of course your connection is really fast!
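A quick way to see which regime a directory falls into before choosing (a sketch; replace /src/dir):

# many small files -> per-file overhead dominates; few large files -> raw bandwidth dominates
find /src/dir -type f | wc -l    # number of files
du -sh /src/dir                  # total size on disk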



















































    I had to move a few terabytes of data across the country, exactly once. As an experiment, I ran two of the transfers using rsync and ssh/tar to see how they compare.



    The results:





    • rsync transferred the files at an average rate of 2.76 megabytes per second.

    • ssh/tar transferred the files at an average rate of 4.18 megabytes per second.


    The details:
    My data consists of millions of .gz compressed files, the average size of which is 10 megabytes, but some are over a gigabyte. There is a directory structure, but it is dwarfed by the size of the data inside the files. For almost any other job I would only have used rsync, but in this case ssh/tar is a functional solution.



    My job with rsync consists of:



    rsync --compress --stats --no-blocking-io --files-from=fileList.txt -av otherSystem:/the/other/dir/ dest/


    where fileList.txt is a great long list of the relative pathnames of the files on the other side. (I noticed after I started that --compress is not productive for already-compressed files, but I was not going to go back and restart.)



    I started another with ssh and tar that has:



    ssh otherSystem "cd /the/other/dir/;  tar cf - ." | tar xvf -


    You will observe that this copies everything; sorry, this is not a 100% apples-to-apples comparison.



    I should add that while I am using the internal company network, I have to go through an intermediary to get to the data source computer. The ping time from my target computer to the intermediary is 21 ms and from the intermediary to the data source is 26 ms. This was the same for both transfers.



    The SSH connection through the intermediary is accomplished via this ~/.ssh/config entry:



    Host otherSystem
    Hostname dataSource.otherSide.com
    User myUser
    Port 22
    ProxyCommand ssh -q -W %h:%p intermediary.otherSide.com
    IdentityFile id_rsa.priv


















































      Time this:



      tar cf - ~/.cache | ssh admin@nas_box "(cd /destination ; tar xf -)"
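      For a direct comparison with the rsync timings above, one might wrap it in time and, on a slow link, let ssh compress the stream (a sketch; in bash the time keyword covers the whole pipeline, and -C only helps when the network, not the CPU, is the bottleneck):

      # time the streamed-tar approach end to end; drop -C on a fast LAN
      time tar cf - ~/.cache | ssh -C admin@nas_box "(cd /destination ; tar xf -)"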































        Your Answer








        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "106"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: false,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f30953%2ftar-rsync-untar-any-speed-benefit-over-just-rsync%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        7 Answers
        7






        active

        oldest

        votes








        7 Answers
        7






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        24














        When you send the same set of files, rsync is better suited because it will only send differences. tar will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete.



        If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync won't have to do any task more than tar anyway.



        Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.



        Tip2: If you use rsync over ssh, you may also use either tar+ssh



        tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'


        or just scp



        scp -Cr srcdir user@server:destdir


        General rule, keep it simple.



        UPDATE:



        I've created 59M demo data



        mkdir tmp; cd tmp
        for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done


        and tested several times the file transfer to a remote server (not in the same lan), using both methods



        time rsync -r  tmp server:tmp2

        real 0m11.520s
        user 0m0.940s
        sys 0m0.472s

        time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)

        real 0m15.026s
        user 0m0.944s
        sys 0m0.700s


        while keeping separate logs from the ssh traffic packets sent



        wc -l rsync.log rsync+tar.log 
        36730 rsync.log
        37962 rsync+tar.log
        74692 total


        In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.



        I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.



        Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.






        share|improve this answer


























        • Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called rsync ;)

          – 0xC0000022L
          Feb 8 '12 at 20:32






        • 2





          BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files

          – Populus
          Mar 27 '15 at 15:09






        • 1





          @Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.

          – forcefsck
          Mar 28 '15 at 14:08


















        24














        When you send the same set of files, rsync is better suited because it will only send differences. tar will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete.



        If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync won't have to do any task more than tar anyway.



        Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.



        Tip2: If you use rsync over ssh, you may also use either tar+ssh



        tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'


        or just scp



        scp -Cr srcdir user@server:destdir


        General rule, keep it simple.



        UPDATE:



        I've created 59M demo data



        mkdir tmp; cd tmp
        for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done


        and tested several times the file transfer to a remote server (not in the same lan), using both methods



        time rsync -r  tmp server:tmp2

        real 0m11.520s
        user 0m0.940s
        sys 0m0.472s

        time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)

        real 0m15.026s
        user 0m0.944s
        sys 0m0.700s


        while keeping separate logs from the ssh traffic packets sent



        wc -l rsync.log rsync+tar.log 
        36730 rsync.log
        37962 rsync+tar.log
        74692 total


        In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.



        I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.



        Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.






        share|improve this answer


























        • Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called rsync ;)

          – 0xC0000022L
          Feb 8 '12 at 20:32






        • 2





          BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files

          – Populus
          Mar 27 '15 at 15:09






        • 1





          @Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.

          – forcefsck
          Mar 28 '15 at 14:08
















        24












        24








        24







        When you send the same set of files, rsync is better suited because it will only send differences. tar will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete.



        If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync won't have to do any task more than tar anyway.



        Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.



        Tip2: If you use rsync over ssh, you may also use either tar+ssh



        tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'


        or just scp



        scp -Cr srcdir user@server:destdir


        General rule, keep it simple.



        UPDATE:



        I've created 59M demo data



        mkdir tmp; cd tmp
        for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done


        and tested several times the file transfer to a remote server (not in the same lan), using both methods



        time rsync -r  tmp server:tmp2

        real 0m11.520s
        user 0m0.940s
        sys 0m0.472s

        time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)

        real 0m15.026s
        user 0m0.944s
        sys 0m0.700s


        while keeping separate logs from the ssh traffic packets sent



        wc -l rsync.log rsync+tar.log 
        36730 rsync.log
        37962 rsync+tar.log
        74692 total


        In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.



        I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.



        Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.






        share|improve this answer















        When you send the same set of files, rsync is better suited because it will only send differences. tar will always send everything and this is a waste of resources when a lot of the data are already there. The tar + rsync + untar loses this advantage in this case, as well as the advantage of keeping the folders in-sync with rsync --delete.



        If you copy the files for the first time, first packeting, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync won't have to do any task more than tar anyway.



        Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately before it counts all files.



        Tip2: If you use rsync over ssh, you may also use either tar+ssh



        tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'


        or just scp



        scp -Cr srcdir user@server:destdir


        General rule, keep it simple.



        UPDATE:



        I've created 59M demo data



        mkdir tmp; cd tmp
        for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done


        and tested several times the file transfer to a remote server (not in the same lan), using both methods



        time rsync -r  tmp server:tmp2

        real 0m11.520s
        user 0m0.940s
        sys 0m0.472s

        time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)

        real 0m15.026s
        user 0m0.944s
        sys 0m0.700s


        while keeping separate logs from the ssh traffic packets sent



        wc -l rsync.log rsync+tar.log 
        36730 rsync.log
        37962 rsync+tar.log
        74692 total


        In this case, I can't see any advantage in less network traffic by using rsync+tar, which is expected when the default mtu is 1500 and while the files are 10k size. rsync+tar had more traffic generated, was slower for 2-3 seconds and left two garbage files that had to be cleaned up.



        I did the same tests on two machines on the same lan, and there the rsync+tar did much better times and much much less network traffic. I assume cause of jumbo frames.



        Maybe rsync+tar would be better than just rsync on a much larger data set. But frankly I don't think it's worth the trouble, you need double space in each side for packing and unpacking, and there are a couple of other options as I've already mentioned above.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Feb 8 '12 at 20:27

























        answered Feb 5 '12 at 21:44









        forcefsckforcefsck

        5,8862331




        5,8862331













        • Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called rsync ;)

          – 0xC0000022L
          Feb 8 '12 at 20:32






        • 2





          BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files

          – Populus
          Mar 27 '15 at 15:09






        • 1





          @Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.

          – forcefsck
          Mar 28 '15 at 14:08





















        • Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called rsync ;)

          – 0xC0000022L
          Feb 8 '12 at 20:32






        • 2





          BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files

          – Populus
          Mar 27 '15 at 15:09






        • 1





          @Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.

          – forcefsck
          Mar 28 '15 at 14:08



















        Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called rsync ;)

        – 0xC0000022L
        Feb 8 '12 at 20:32





        Indeed. The "only what's needed" is an important aspect, although it can sometimes be unruly, that beast called rsync ;)

        – 0xC0000022L
        Feb 8 '12 at 20:32




        2




        2





        BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files

        – Populus
        Mar 27 '15 at 15:09





        BTW if you use the flag z with rsync it will compress the connection. With the amount of CPU power we have nowadays, the compression is trivial compared to the amount of bandwidth you save, which can be ~1/10 of uncompressed for text files

        – Populus
        Mar 27 '15 at 15:09




        1




        1





        @Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.

        – forcefsck
        Mar 28 '15 at 14:08







        @Populus, you'll notice I'm using compression on my original reply. However in the tests I added later it doesn't matter that much, data from urandom doesn't compress much... if at all.

        – forcefsck
        Mar 28 '15 at 14:08















        8














        rsync also does compression. Use the -z flag. If running over ssh, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync compression. It seems quite effective. And I'd suggest skipping usage of tar or any other pre/post compression.



        I usually use rsync as rsync -abvz --partial....






        share|improve this answer
























        • Note that rsync by default skips compressing files with certain suffixes including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list.

          – Wildcard
          Feb 2 '18 at 0:22


















        8














        rsync also does compression. Use the -z flag. If running over ssh, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync compression. It seems quite effective. And I'd suggest skipping usage of tar or any other pre/post compression.



        I usually use rsync as rsync -abvz --partial....






        share|improve this answer
























        • Note that rsync by default skips compressing files with certain suffixes including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list.

          – Wildcard
          Feb 2 '18 at 0:22
















        8












        8








        8







        rsync also does compression. Use the -z flag. If running over ssh, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync compression. It seems quite effective. And I'd suggest skipping usage of tar or any other pre/post compression.



        I usually use rsync as rsync -abvz --partial....






        share|improve this answer













        rsync also does compression. Use the -z flag. If running over ssh, you can also use ssh's compression mode. My feeling is that repeated levels of compression is not useful; it will just burn cycles without significant result. I'd recommend experimenting with rsync compression. It seems quite effective. And I'd suggest skipping usage of tar or any other pre/post compression.



        I usually use rsync as rsync -abvz --partial....







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Feb 5 '12 at 21:27









        Faheem MithaFaheem Mitha

        23.6k1887138




        23.6k1887138













        • Note that rsync by default skips compressing files with certain suffixes including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list.

          – Wildcard
          Feb 2 '18 at 0:22





















        • Note that rsync by default skips compressing files with certain suffixes including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list.

          – Wildcard
          Feb 2 '18 at 0:22



















        Note that rsync by default skips compressing files with certain suffixes including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list.

        – Wildcard
        Feb 2 '18 at 0:22







        Note that rsync by default skips compressing files with certain suffixes including .gz and .tgz and others; search the rsync man page for --skip-compress for the full list.

        – Wildcard
        Feb 2 '18 at 0:22













        5














        I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.



        Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.



        The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.



        The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.



        1a/ tar files from source machine over the network to a .tar file on remote machine

        $ tar cf /mnt/backup/cache.tar ~/.cache

        1b/ untar that tar file on the remote machine itself

        $ ssh admin@nas_box
        [admin@nas_box] $ tar xf cache.tar

        2/ rsync files from source machine over the network to remote machine

        $ mkdir /mnt/backup/cachetest
        $ rsync -ah .cache /mnt/backup/cachetest


        I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.



        Timings:



        1a - 33 seconds

        1b - 1 minutes 48 seconds

        2 - 22 minutes


        It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.



        I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.



        Nick






        share|improve this answer



















        • 1





          Without using -z to have rsync do compression, this test seems incomplete.

          – Wildcard
          Feb 2 '18 at 0:23






        • 1





          Tar without its own z argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes, -z would be sensible.

          – Neek
          Mar 29 '18 at 3:15


















        5














        I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.



        Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.



        The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.



        The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.



        1a/ tar files from source machine over the network to a .tar file on remote machine

        $ tar cf /mnt/backup/cache.tar ~/.cache

        1b/ untar that tar file on the remote machine itself

        $ ssh admin@nas_box
        [admin@nas_box] $ tar xf cache.tar

        2/ rsync files from source machine over the network to remote machine

        $ mkdir /mnt/backup/cachetest
        $ rsync -ah .cache /mnt/backup/cachetest


        I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.



        Timings:



        1a - 33 seconds

        1b - 1 minutes 48 seconds

        2 - 22 minutes


        It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.



        I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.



        Nick






        share|improve this answer



















        • 1





          Without using -z to have rsync do compression, this test seems incomplete.

          – Wildcard
          Feb 2 '18 at 0:23






        • 1





          Tar without its own z argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes, -z would be sensible.

          – Neek
          Mar 29 '18 at 3:15
















        5












        5








        5







        I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.



        Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.



        The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.



        The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.



        1a/ tar files from source machine over the network to a .tar file on remote machine

        $ tar cf /mnt/backup/cache.tar ~/.cache

        1b/ untar that tar file on the remote machine itself

        $ ssh admin@nas_box
        [admin@nas_box] $ tar xf cache.tar

        2/ rsync files from source machine over the network to remote machine

        $ mkdir /mnt/backup/cachetest
        $ rsync -ah .cache /mnt/backup/cachetest


        I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.



        Timings:



        1a - 33 seconds

        1b - 1 minutes 48 seconds

        2 - 22 minutes


        It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.



        I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.



        Nick






        share|improve this answer













        I had to back up my home directory to NAS today and ran into this discussion, thought I'd add my results. Long story short, tar'ing over the network to the target file system is way faster in my environment than rsyncing to the same destination.



        Environment: Source machine i7 desktop using SSD hard drive. Destination machine Synology NAS DS413j on a gigabit lan connection to the Source machine.



        The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.



        The source files are my ~/.cache folder which contains 1.2Gb of mostly very small files.



        1a/ tar files from source machine over the network to a .tar file on remote machine

        $ tar cf /mnt/backup/cache.tar ~/.cache

        1b/ untar that tar file on the remote machine itself

        $ ssh admin@nas_box
        [admin@nas_box] $ tar xf cache.tar

        2/ rsync files from source machine over the network to remote machine

        $ mkdir /mnt/backup/cachetest
        $ rsync -ah .cache /mnt/backup/cachetest


        I kept 1a and 1b as completely separate steps just to illustrate the task. For practical applications I'd recommend what Gilles posted above involving pipeing tar output via ssh to an untarring process on the receiver.



        Timings:



        1a - 33 seconds

        1b - 1 minutes 48 seconds

        2 - 22 minutes


        It's very clear that rsync performed amazingly poorly compared to a tar operation, which can presumably be attributed to both the network performance mentioned above.



        I'd recommend anyone who wants to back up large quantities of mostly small files, such as a home directory backup, use the tar approach. rsync seems a very poor choice. I'll come back to this post if it seems I've been inaccurate in any of my procedure.



        Nick







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Feb 3 '13 at 9:10









        NeekNeek

        15114




        15114








        • Without using -z to have rsync do compression, this test seems incomplete.
          – Wildcard, Feb 2 '18 at 0:23

        • Tar without its own z argument, as I used it, does not compress data (see unix.stackexchange.com/questions/127169/…), so as far as I can see using rsync without compression is a fair comparison. If I were passing the tar output through a compression library like bzip2 or gzip then yes, -z would be sensible.
          – Neek, Mar 29 '18 at 3:15
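
        If you did want to compare with compression enabled on both sides, a rough sketch (assuming a gzip-capable tar on both machines; note that rsync's -z only compresses data in transit when talking to a remote rsync over ssh or the daemon, not when copying onto a locally mounted share; /destination below is a placeholder):

        # compress the tar stream in transit
        $ tar czf - -C ~ .cache | ssh admin@nas_box 'tar xzf - -C /mnt/backup'

        # rsync with in-transit compression, over ssh rather than onto a local mount
        $ rsync -ahz ~/.cache admin@nas_box:/destination/cachetest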













        Using rsync to send a tar archive as asked would actually be a waste of resources, since you'd add a verification layer to the process. Rsync would checksum the tar file for correctness, when you'd rather have the check on the individual files. (It doesn't help to know that a tar file which may already have been defective on the sending side shows the same defect on the receiving end.) If you're sending an archive, ssh/scp is all you need.



        The one reason you might have to prefer sending an archive would be if the tar of your choice were able to preserve more of the filesystem specials, such as Access Control Lists or other metadata often stored in Extended Attributes (Solaris) or Resource Forks (macOS). When dealing with such things, your main concern will be which tools are able to preserve all the information associated with the file on the source filesystem, provided the target filesystem has the capability to keep track of it as well.
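
        As a rough illustration (assuming GNU tar 1.27 or later and a reasonably recent rsync; option support varies by platform and target filesystem, and host and /dest are placeholders):

        # GNU tar: carry POSIX ACLs and extended attributes along in the stream
        $ tar cf - --acls --xattrs somedir | ssh host 'tar xf - --acls --xattrs -C /dest'

        # rsync: -A preserves ACLs, -X preserves extended attributes
        $ rsync -aAX somedir/ host:/dest/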



        When speed is your main concern, it depends a lot on the size of your files. In general, a multitude of tiny files will scale badly over rsync or scp, since they'll each waste individual network packets, whereas a tar file would include several of them within the data load of a single network packet. Even better if the tar file were compressed, since the small files would most likely compress better as a whole than individually.

        As far as I know, both rsync and scp fail to optimize when sending entire single files, as in an initial transfer, having each file occupy an entire data frame with its entire protocol overhead (and wasting more on checking back and forth). However, Janecek states this to be true for scp only, detailing that rsync would optimize the network traffic but at the cost of building huge data structures in memory. See the article Efficient File Transfer, Janecek 2006. So according to him it's still true that both scp and rsync scale badly on small files, but for entirely different reasons. Guess I'll have to dig into the sources this weekend to find out.



        For practical relevance: if you know you're sending mostly larger files, there won't be much of a difference in speed, and using rsync has the added benefit of being able to take up where it left off when interrupted.
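
        A small sketch of that resume behaviour (src/, host and /dest/ are placeholders; -P is shorthand for --partial --progress, and --partial keeps partially transferred files so a re-run can reuse the data already on the receiver):

        $ rsync -aP src/ host:/dest/    # interrupted part-way through
        $ rsync -aP src/ host:/dest/    # run again: completed files are skipped, partial ones are reused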



        Postscriptum:
        These days, rdist seems to be sinking into oblivion, but before the days of rsync it was a very capable and widely used tool (safe when used over ssh, unsafe otherwise). It would not perform as well as rsync, though, since it didn't optimize to transfer only content that had changed. Its main difference to rsync lies in the way it is configured and in how the rules for updating files are spelled out.






        edited Feb 7 '12 at 9:57; answered Feb 6 '12 at 10:25 by Tatjana Heuser













        • Rsync doesn't add a verification layer. It only uses checksums to find differences on existing files, not to verify the result. In cases where the copy is fresh, no checksums are made; in cases where the copy is not fresh, checksums save you bandwidth.
          – forcefsck, Feb 8 '12 at 20:25
        For small directories (small as in disk space used), it depends on the overhead of checking the file information for the files being synced. On one hand, rsync saves the time of transferring unmodified files; on the other hand, it does have to transfer information about each file.

        I don't know the internals of rsync exactly. Whether the per-file stats cause lag depends on how rsync transfers that data: if the stats are transferred one by one, then the round-trip time may make tar+rsync+untar faster.

        But if you have, say, 1 GiB of data, rsync will be way faster, unless your connection is really fast!
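
        A quick way to see how much per-file bookkeeping rsync has to do before any data moves (a sketch; host and /dest are placeholders):

        # --dry-run transfers nothing; --stats reports the number of files and the size of the file list
        $ rsync -a --dry-run --stats somedir/ host:/dest/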






        answered Feb 5 '12 at 19:28 by njsg























                I had to move a few terabytes of data across the country, exactly once. As an experiment, I ran two of the transfers using rsync and ssh/tar to see how they compare.

                The results:

                • rsync transferred the files at an average rate of 2.76 megabytes per second.
                • ssh/tar transferred the files at an average rate of 4.18 megabytes per second.

                The details:
                My data consists of millions of .gz compressed files, averaging 10 megabytes each, though some are over a gigabyte. There is a directory structure, but it is dwarfed by the size of the data inside the files. For almost any other job I would have used rsync, but in this one-off case ssh/tar is a functional solution.



                My rsync job consists of:

                rsync --compress --stats --no-blocking-io --files-from=fileList.txt -av otherSystem:/the/other/dir/ dest/

                where fileList.txt is a long list of the relative pathnames of the files on the other side. (I noticed after I started that --compress is not productive for already-compressed files, but I was not going to go back and restart.)
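
                A possible tweak for a re-run, assuming a reasonably recent rsync: either drop --compress for this data set, or keep it and rely on --skip-compress, which takes a slash-separated suffix list and already includes gz by default in current versions:

                # no in-transit compression for data that is already gzip-compressed
                rsync --stats --no-blocking-io --files-from=fileList.txt -av otherSystem:/the/other/dir/ dest/

                # or keep -z but explicitly avoid recompressing these suffixes
                rsync -z --skip-compress=gz/tgz --stats --no-blocking-io --files-from=fileList.txt -av otherSystem:/the/other/dir/ dest/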



                I started another transfer with ssh and tar:

                ssh otherSystem "cd /the/other/dir/;  tar cf - ." | tar xvf -

                Note that this copies everything, so it is not a 100% apples-to-apples comparison.



                I should add that while I am using the internal company network, I have to go through an intermediary to get to the data source computer. The ping time from my target computer to the intermediary is 21 ms and from the intermediary to the data source is 26 ms. This was the same for both transfers.



                The SSH connection through the intermediary is accomplished via this ~/.ssh/config entry:

                Host otherSystem
                    Hostname dataSource.otherSide.com
                    User myUser
                    Port 22
                    ProxyCommand ssh -q -W %h:%p intermediary.otherSide.com
                    IdentityFile id_rsa.priv
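
                With OpenSSH 7.3 or newer, the same hop can be written more compactly with ProxyJump instead of ProxyCommand (a sketch, otherwise equivalent to the entry above):

                Host otherSystem
                    Hostname dataSource.otherSide.com
                    User myUser
                    Port 22
                    ProxyJump intermediary.otherSide.com
                    IdentityFile id_rsa.priv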





                answered 2 hours ago by user1683793























                        Time this:



                        tar cf - ~/.cache | ssh admin@nas_box "(cd /destination ; tar xf -)"
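
                        For example, wrapping the whole pipeline so that time measures both the archiving and the remote extraction (a sketch; /destination is whatever target directory you use):

                        $ time ( tar cf - ~/.cache | ssh admin@nas_box "(cd /destination ; tar xf -)" )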





                        edited Mar 4 '13 at 23:58 by jasonwryan; answered Mar 4 '13 at 23:33 by user33553





























