Given a list of files, some duplicates, some not, show checksum of only the duplicates


There must be an "easy" way to do this, but I can't figure out what it is.



Assume you have a plain text "file.txt" which has lines in this format (md5 sums followed by filenames):



5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
365a6d8b18cab348d92db610dfc46264 bar.txt
ae42d992bf622bdc425d37b04ec9c2d5 mini.txt
b8e9ff5502d5dbe38b3fd5e3363caacf tyrion.txt
5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt
542ed609dfc4d0cae44c4b7be6d66382 mba.txt
310ee92ebc69ed79c1837fc53983b7f8 mini luoma.txt
542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt


I would like to sort file.txt so that the output:

  1. Shows only the lines whose md5 sum appears more than once (i.e. the files are duplicates)

  2. Has a blank line between each "group" of duplicates.

So it would look like this:



542ed609dfc4d0cae44c4b7be6d66382 mba.txt
542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt

5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt


(In the real case, it could be 2 duplicates or 10, or more.)



I'm guessing there might be a ruby or python guru out there who can figure this one out, but I'm open to pretty much any practical solution out there.










Tags: shell-script, text-processing, python






asked 1 hour ago by TJ Luoma



























          2 Answers
            $ grep -f <(cut -d' ' -f1 file.txt | sort | uniq -d) file.txt \
            | awk 'last && last != $1 { printf "\n" }; { last=$1 ; print}'

            5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
            5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt

            542ed609dfc4d0cae44c4b7be6d66382 mba.txt
            542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt


            (Thanks to "cas" for the awk suggestion.)







            answered 47 mins ago by Ray Butterworth (edited 14 mins ago)
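For readers who prefer Python (the question is tagged python), the first stage of the pipeline above, `cut | sort | uniq -d` feeding `grep -f`, could be sketched roughly as follows. This is only an illustration and not part of the answer: the function name `duplicate_lines` is made up, and each line is assumed to start with the checksum followed by whitespace. One deliberate difference: `grep -f` matches a checksum anywhere on the line, whereas this sketch compares the first field exactly.

```python
from collections import Counter


def duplicate_lines(lines):
    """Return the lines whose first field (the checksum) occurs more than once.

    Input order is preserved, as it is with grep. Hypothetical helper,
    for illustration only.
    """
    # Drop trailing newlines and skip blank lines.
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    # Count how often each checksum appears (the job of sort | uniq -d).
    counts = Counter(line.split(maxsplit=1)[0] for line in lines)
    # Keep only lines whose checksum was seen at least twice (the job of grep -f).
    return [line for line in lines if counts[line.split(maxsplit=1)[0]] > 1]


if __name__ == "__main__":
    sample = [
        "5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt",
        "365a6d8b18cab348d92db610dfc46264 bar.txt",
        "5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt",
    ]
    for line in duplicate_lines(sample):
        print(line)
```

Because input order is kept, lines that share a checksum end up next to each other only if nothing with a different duplicated checksum sits between them in file.txt. Sorting the result (for example with `sorted()`) before adding blank lines between groups would probably be needed for arbitrary input.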











            • +1. and just pipe the output to awk 'last && last != $1 { printf "\n" }; { last=$1 ; print}'. This prints a blank line whenever the variable last is not empty and differs from the current $1. Feel free to add this to your answer :)

              – cas
              28 mins ago

            • @cas, thanks. I'd just come up with something similar, but not quite as nice: awk 'BEGIN { x="" }; { if (x != $1) print(""); print; x=$1 }'. It's amazing how much one forgets in only a few years, and even more amazing how quickly one relearns it.

              – Ray Butterworth
              11 mins ago
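The two awk one-liners in these comments differ only in how they treat the first line. The version from cas prints a separator only once `last` has been set, while the `BEGIN { x="" }` variant also prints a blank line before the very first group, because `x` is still empty when the first line is read. A small Python sketch of the guarded version, assuming the input is already grouped by checksum and contains no empty lines (the name `with_group_separators` is hypothetical):

```python
def with_group_separators(lines):
    """Yield lines, inserting an empty string whenever the checksum changes.

    Mirrors the awk logic: `last` starts out unset, so no separator is
    emitted before the first group. Assumes non-empty, pre-grouped lines.
    """
    last = None
    for line in lines:
        key = line.split(maxsplit=1)[0]  # first field, like $1 in awk
        if last is not None and key != last:
            yield ""  # blank line between groups
        last = key
        yield line


if __name__ == "__main__":
    grouped = [
        "5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt",
        "5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt",
        "542ed609dfc4d0cae44c4b7be6d66382 mba.txt",
        "542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt",
    ]
    print("\n".join(with_group_separators(grouped)))
```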














                With a Perl hash of arrays:



                $ perl -alne '
                    push @{ $h{$F[0]} }, $_
                  }{
                    for $k (sort keys %h) {
                      @a = @{ $h{$k} };
                      print join "\n", @a, "" if $#a > 0
                    }
                ' file.txt
                542ed609dfc4d0cae44c4b7be6d66382 mba.txt
                542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt

                5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
                5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt


                Note that this prints a trailing blank line after the last record. The sort is optional.






                answered 6 mins ago by steeldriver
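Since the question mentions Python, the same hash-of-arrays idea could be sketched with a dict of lists. This is an illustrative equivalent, not something the answerer provided. `group_duplicates` and `format_groups` are made-up names, and the input is assumed to be one `checksum filename` pair per line. Unlike the Perl one-liner, this sketch joins the groups with a blank line, so nothing is printed after the last group.

```python
from collections import defaultdict


def group_duplicates(lines):
    """Map each checksum to its lines, keeping only checksums seen 2+ times."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():  # ignore blank lines
            groups[line.split(maxsplit=1)[0]].append(line)
    return {key: items for key, items in groups.items() if len(items) > 1}


def format_groups(groups):
    """Render groups sorted by checksum, separated by one blank line."""
    return "\n\n".join("\n".join(groups[key]) for key in sorted(groups))


if __name__ == "__main__":
    # Sample data copied from the question.
    sample = [
        "5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt",
        "365a6d8b18cab348d92db610dfc46264 bar.txt",
        "ae42d992bf622bdc425d37b04ec9c2d5 mini.txt",
        "b8e9ff5502d5dbe38b3fd5e3363caacf tyrion.txt",
        "5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt",
        "542ed609dfc4d0cae44c4b7be6d66382 mba.txt",
        "310ee92ebc69ed79c1837fc53983b7f8 mini luoma.txt",
        "542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt",
    ]
    print(format_groups(group_duplicates(sample)))
```

With the sample data this prints the two groups in the order requested in the question (the `542e...` group first). As in the Perl version, the sort only fixes the order of the groups and could be dropped.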
























