How to merge two files to skip duplicate data
I have two large files (each more than 300,000 lines) that I want to combine in a specific way. Some rows in the two files measure the same thing: when columns 9, 14, 15, 16, and 17 of a row in File1 equal the same columns of a row in File2, I assume both rows describe the same measurement. In that case I want to output the row from File1 and skip the row from File2, so that the data is not duplicated. Otherwise, I want to output all rows from both files. The other columns of duplicate rows may differ between the files; File1 is more precise than File2, which is why I keep the File1 rows.
For example, columns 9, 14, 15, 16, and 17 of the first three lines of the two files below are equal, so those lines measure the same thing and I want to output the File1 lines and skip the File2 lines. For the fourth lines, column 14 differs between the two files, so I output both lines.
I used this script:
awk '!seen[$9,$14,$15,$16,$17]++' File1 File2 > output
It works well for small data sets. However, when I use it on the large data sets, it skips some rows that are not duplicates, and I don't know why. I would appreciate any help with a script I can use to merge the data sets.
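One thing worth checking: the `!seen[...]++` filter drops every later row whose key was already seen, including rows from the same file, not only File2 rows that match File1. Whether that explains the missing rows here is an assumption; the question does not say that keys repeat within a file. A minimal sketch to test it, where `count_repeated_keys` is a hypothetical helper name:

```shell
# Sketch: count the rows in one file whose key (columns 9, 14, 15, 16, 17)
# already appeared on an earlier row of that same file.
count_repeated_keys() {
    awk 'seen[$9, $14, $15, $16, $17]++ { n++ } END { print n + 0 }' "$1"
}
```

Running `count_repeated_keys File1` and `count_repeated_keys File2` prints one number per file. A non-zero result means the original command would also discard rows within that file.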
File1
28 208 48 198 1110 2.04 33 0.0 34.40 3.3 0.0 0 8.0 1985 1 1 SMO1 -9 -9 -9
24 102 26 99 2100 2.61 129 0.0 42.90 3.3 0.0 0 8.0 1985 1 1 EYA -9 -9 -9
89 294 26 106 1162 4.54 -115 0.0 70.80 3.3 0.0 0 8.0 1985 1 1 GYA -9 -9 -9
38 88 41 86 1100 3.50 155 0.0 56.30 3.8 0.0 0 10.0 1985 1 1 KOL1 -9 -9 -9
File2
24 102 22 100 1110 2.04 33 0.0 34.40 3.3 0.0 0 8.0 1985 1 1 SMO1 -9 -9 -9
24 102 26 99 2100 2.61 129 0.0 42.90 3.3 0.0 0 8.0 1985 1 1 EYA -9 -9 -9
24 102 26 106 1162 4.54 -115 0.0 70.80 3.3 0.0 0 8.0 1985 1 1 GYA -9 -9 -9
38 88 41 86 1100 3.50 155 0.0 56.30 3.8 0.0 0 10.0 2000 1 1 KOL1 -9 -9 -9
output
28 208 48 198 1110 2.04 33 0.0 34.40 3.3 0.0 0 8.0 1985 1 1 SMO1 -9 -9 -9
24 102 26 99 2100 2.61 129 0.0 42.90 3.3 0.0 0 8.0 1985 1 1 EYA -9 -9 -9
89 294 26 106 1162 4.54 -115 0.0 70.80 3.3 0.0 0 8.0 1985 1 1 GYA -9 -9 -9
38 88 41 86 1100 3.50 155 0.0 56.30 3.8 0.0 0 10.0 1985 1 1 KOL1 -9 -9 -9
38 88 41 86 1100 3.50 155 0.0 56.30 3.8 0.0 0 10.0 2000 1 1 KOL1 -9 -9 -9
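The rule described above (keep every File1 row, and add a File2 row only when its key does not occur anywhere in File1) could be sketched as follows. This is one possible approach rather than a confirmed fix, and it assumes whitespace-separated fields with identical formatting in both files. `merge_skip_dups` is a hypothetical function name.

```shell
# Sketch: print all of the first file, then only those rows of the second
# file whose key (columns 9, 14, 15, 16, 17) never occurs in the first file.
merge_skip_dups() {
    awk '
        # First file: remember the key and print the row unconditionally.
        NR == FNR { seen[$9, $14, $15, $16, $17]; print; next }
        # Second file: print only rows whose key was not recorded above.
        !(($9, $14, $15, $16, $17) in seen)
    ' "$1" "$2"
}
```

It would be called as `merge_skip_dups File1 File2 > output`. Unlike `!seen[...]++`, this version never drops a row because of another row in the same file, so rows that share a key within File1 (or within File2) are all kept.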
shell-script text-processing awk gawk join
edited 7 mins ago by Freddy
asked 58 mins ago by Esi