Given a list of files, some duplicates, some not, show checksum of only the duplicates
There must be an "easy" way to do this, but I can't figure out what it is.
Assume you have a plain text "file.txt" which has lines in this format (md5 sums followed by filenames):
5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
365a6d8b18cab348d92db610dfc46264 bar.txt
ae42d992bf622bdc425d37b04ec9c2d5 mini.txt
b8e9ff5502d5dbe38b3fd5e3363caacf tyrion.txt
5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt
542ed609dfc4d0cae44c4b7be6d66382 mba.txt
310ee92ebc69ed79c1837fc53983b7f8 mini luoma.txt
542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt
I would like to sort file.txt and have the output:
- Only show me lines if the md5 sum indicates the files are duplicates
- Put a blank line between each "group" of duplicates.
so it would look like this:
542ed609dfc4d0cae44c4b7be6d66382 mba.txt
542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt

5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt
(In the real case, it could be 2 duplicates or 10, or more.)
I'm guessing there might be a ruby or python guru out there who can figure this one out, but I'm open to pretty much any practical solution out there.
shell-script text-processing python
asked 1 hour ago by TJ Luoma
2 Answers
$ grep -f <(cut -d' ' -f1 file.txt | sort | uniq -d) file.txt \
    | awk 'last && last != $1 { printf "\n" }; { last=$1 ; print}'
5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt

542ed609dfc4d0cae44c4b7be6d66382 mba.txt
542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt
(Thanks to "cas" for the awk suggestion.)
answered 47 mins ago (edited 14 mins ago) by Ray Butterworth
+1. And just pipe the output to awk 'last && last != $1 { printf "\n" }; { last=$1 ; print}'. This prints a blank line whenever the variable last is not empty and is different from the current $1. Feel free to add this to your answer :)
– cas, 28 mins ago
@cas, thanks. I'd just come up with something similar, but not quite as nice: awk 'BEGIN { x="" }; { if (x != $1) print(""); print; x=$1 }'. It's amazing how much one forgets in only a few years, and even more amazing how quickly one relearns it.
– Ray Butterworth, 11 mins ago
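Since the question also asked about Python: the awk separator above compares each line only with the one before it, so it relies on lines with the same checksum being adjacent, and grep keeps the lines in their original file order. Below is a minimal sketch of the same "blank line when the checksum changes" idea that sorts first, so equal checksums always end up next to each other. The function names and the SAMPLE data are illustrative assumptions, not part of the answer above.

```python
from itertools import groupby

# Illustrative input: duplicates are deliberately not adjacent.
SAMPLE = [
    "5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt",
    "542ed609dfc4d0cae44c4b7be6d66382 mba.txt",
    "365a6d8b18cab348d92db610dfc46264 bar.txt",
    "5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt",
    "542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt",
]


def checksum_of(record):
    """Return the first whitespace-separated field (the md5 sum)."""
    return record.split(None, 1)[0]


def duplicate_groups_sorted(lines):
    """Sort the lines, then keep only runs of two or more equal checksums."""
    # Accepts any iterable of lines, e.g. a list or an open file.
    records = sorted(line.rstrip("\n") for line in lines if line.strip())
    groups = []
    for _, run in groupby(records, key=checksum_of):
        run = list(run)
        if len(run) > 1:  # a checksum seen once is not a duplicate
            groups.append(run)
    return groups


def format_groups(groups):
    """One blank line between groups, none after the last one."""
    return "\n\n".join("\n".join(group) for group in groups)
```

To try it on the real list, something like `print(format_groups(duplicate_groups_sorted(open("file.txt"))))` should work, assuming file.txt is in the current directory.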
With a perl Hash of Arrays:
$ perl -alne '
    push @{ $h{$F[0]} }, $_
    }{
    for $k (sort keys %h) {
      @a = @{ $h{$k} };
      print join "\n", @a, "" if $#a > 0
    }
  ' file.txt
542ed609dfc4d0cae44c4b7be6d66382 mba.txt
542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt

5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt
5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt
Note that this prints a trailing blank line after the last record. The sort is optional.
answered 6 mins ago by steeldriver
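For readers who would rather use Python, as the question suggested, the hash-of-arrays approach might be sketched with a dict of lists. This is only a rough equivalent under stated assumptions: the names group_by_checksum and render, and the SAMPLE data, are made up for illustration. Unlike the perl one-liner, this version joins the groups with blank lines, so no trailing blank line is produced.

```python
from collections import defaultdict

# Illustrative input in the same "md5sum filename" format as file.txt.
SAMPLE = [
    "5ee434a2ebcf4c3c98ee07e9c1efddc0 foo.txt",
    "365a6d8b18cab348d92db610dfc46264 bar.txt",
    "5ee434a2ebcf4c3c98ee07e9c1efddc0 imac.txt",
    "542ed609dfc4d0cae44c4b7be6d66382 mba.txt",
    "310ee92ebc69ed79c1837fc53983b7f8 mini luoma.txt",
    "542ed609dfc4d0cae44c4b7be6d66382 tyrion final.txt",
]


def group_by_checksum(lines):
    """Collect lines per checksum and return only groups with 2+ entries."""
    by_sum = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore empty lines
        # Split once so file names containing spaces stay intact.
        checksum = line.split(None, 1)[0]
        by_sum[checksum].append(line)
    # Sorting the keys only fixes the group order; it is optional.
    return [by_sum[key] for key in sorted(by_sum) if len(by_sum[key]) > 1]


def render(groups):
    """Separate groups with a single blank line and no trailing one."""
    return "\n\n".join("\n".join(group) for group in groups)
```

Within each group the lines keep their order from the input, and any iterable of lines can be passed in, for example an open file handle for file.txt.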