Match 4 columns and replace 1 in 2 filesMatching values within columnsCompare 2 delimited files and output...

How does the Moon's gravity affect Earth's oceans despite Earth's stronger gravitational pull?

What are the advantages of this gold finger shape?

What are these panels underneath the wing root of a A380?

How would armour (and combat) change if the fighter didn't need to actually wear it?

Setting up a Mathematical Institute of Refereeing?

What allows us to use imaginary numbers?

RAII wrapper for SQLite transactions

How do I answer an interview question about not meeting deadlines?

Escape Velocity - Won't the orbital path just become larger with higher initial velocity?

Is the Microsoft recommendation to use C# properties applicable to game development?

Why don't modern jet engines use forced exhaust mixing?

Insert or push_back to end of a std::vector?

What if a restaurant suddenly cannot accept credit cards, and the customer has no cash?

Does writing regular diary entries count as writing practice?

What is the spellcasting ability of a Barbarian Totem Warrior?

Visa on arrival to exit airport in Russia

Are there liquid fueled rocket boosters having coaxial fuel/oxidizer tanks?

Why is the battery jumpered to a resistor in this schematic?

What should we do with manuals from the 80s?

What would cause a nuclear power plant to break down after 2000 years, but not sooner?

Can anyone help me what's wrong here as i can prove 0 = 1?

100 Years of GCHQ - A quick afternoon puzzle!

Is there a word for returning to unpreparedness?

Is there a fallacy about "appeal to 'big words'"?

Match 4 columns and replace 1 in 2 files

Matching values within columnsCompare 2 delimited files and output differencesCompare two files and matched line send to new fileMerge and print matching and non matching values between a smaller file and a huge fileMatching two main columns at the same time between files, and paste supplementary columns into the output file when those main columns matchMatching 2 main columns between files; and paste other columns into the output file when those main columns match. Keep row size of 1st file intactMerging two files, one column at a timeExtract row if both column values appear in a single column from a separate fileMerging columns from 200+ big files into one tableReplacing matching entries in one column of a file by another column from a different file

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ margin-bottom:0;
}

I have 2 files, and column 1 of file 1 must replace with column 2 of file 2, after column 2,3,4-5 or 5-4 (cross-match) of file 1 match with the column 1,4,5-6 or 6-5 of file 2.

file 1

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

1:79137     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

1:79033      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

1:118630     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

1:533179     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

file 2

1       1:79033_A_G     0       79033   A       G

1       1:79137_A_T     0       79137   T       A

1       1:118630_C_T    0       118630  T       C

1       1:533179_A_G    0       533179  G       A

I need the output to look like this:

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

    1:79137_A_T     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

    1:79033_A_G      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

    1:118630_C_T     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

    1:533179_A_G     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

The files don't have the exact number of rows and the files are not tab-delimited. I tried the below code but it doesn't work, Can you correct my code?

awk 'NR==FNR{chr[$1]=$1;snp[$2]=$2;pos[$4]=$4;a1[$5]=$5;a2[$6]=$6;next} ($1 in chr)&&($4 in pos)&& ((($5 in a1) && ($6 in a2)) || (($6 in a1) && ($5 in a2))) {$2==snp[$2]}' file 2 file1

asked yesterday

cookiemonster

311 bronze badge

New contributor

Do you need to preserve the exact amount of whitespace between columns, or is it OK to collapse each gap down to a single space (which awk tends to do)?

– JigglyNaga
yesterday

It is OK to collapse each gap down.

– cookiemonster
yesterday

1

Sounds like you might be interested in our sister site: Bioinformatics.

– terdon♦
yesterday

add a comment |

I have 2 files, and column 1 of file 1 must replace with column 2 of file 2, after column 2,3,4-5 or 5-4 (cross-match) of file 1 match with the column 1,4,5-6 or 6-5 of file 2.

file 1

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

1:79137     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

1:79033      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

1:118630     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

1:533179     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

file 2

1       1:79033_A_G     0       79033   A       G

1       1:79137_A_T     0       79137   T       A

1       1:118630_C_T    0       118630  T       C

1       1:533179_A_G    0       533179  G       A

I need the output to look like this:

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

    1:79137_A_T     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

    1:79033_A_G      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

    1:118630_C_T     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

    1:533179_A_G     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

The files don't have the exact number of rows and the files are not tab-delimited. I tried the below code but it doesn't work, Can you correct my code?

awk 'NR==FNR{chr[$1]=$1;snp[$2]=$2;pos[$4]=$4;a1[$5]=$5;a2[$6]=$6;next} ($1 in chr)&&($4 in pos)&& ((($5 in a1) && ($6 in a2)) || (($6 in a1) && ($5 in a2))) {$2==snp[$2]}' file 2 file1

asked yesterday

cookiemonster

311 bronze badge

New contributor

Do you need to preserve the exact amount of whitespace between columns, or is it OK to collapse each gap down to a single space (which awk tends to do)?

– JigglyNaga
yesterday

It is OK to collapse each gap down.

– cookiemonster
yesterday

1

Sounds like you might be interested in our sister site: Bioinformatics.

– terdon♦
yesterday

add a comment |

I have 2 files, and column 1 of file 1 must replace with column 2 of file 2, after column 2,3,4-5 or 5-4 (cross-match) of file 1 match with the column 1,4,5-6 or 6-5 of file 2.

file 1

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

1:79137     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

1:79033      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

1:118630     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

1:533179     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

file 2

1       1:79033_A_G     0       79033   A       G

1       1:79137_A_T     0       79137   T       A

1       1:118630_C_T    0       118630  T       C

1       1:533179_A_G    0       533179  G       A

I need the output to look like this:

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

    1:79137_A_T     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

    1:79033_A_G      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

    1:118630_C_T     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

    1:533179_A_G     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

The files don't have the exact number of rows and the files are not tab-delimited. I tried the below code but it doesn't work, Can you correct my code?

awk 'NR==FNR{chr[$1]=$1;snp[$2]=$2;pos[$4]=$4;a1[$5]=$5;a2[$6]=$6;next} ($1 in chr)&&($4 in pos)&& ((($5 in a1) && ($6 in a2)) || (($6 in a1) && ($5 in a2))) {$2==snp[$2]}' file 2 file1

asked yesterday

cookiemonster

311 bronze badge

New contributor

I have 2 files, and column 1 of file 1 must replace with column 2 of file 2, after column 2,3,4-5 or 5-4 (cross-match) of file 1 match with the column 1,4,5-6 or 6-5 of file 2.

file 1

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

1:79137     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

1:79033      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

1:118630     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

1:533179     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

file 2

1       1:79033_A_G     0       79033   A       G

1       1:79137_A_T     0       79137   T       A

1       1:118630_C_T    0       118630  T       C

1       1:533179_A_G    0       533179  G       A

I need the output to look like this:

SNP     Chr     Pos     EA      NEA     EAF     Beta    SE      Pvalue  Neff

    1:79137_A_T     1       79137       A       T       0.25    -0.026  0.0073  4.0e-04 231420

    1:79033_A_G      1       79033        A       G       0.0047  -0.038  0.056   4.9e-01 225429

    1:118630_C_T     1       118630       C       T       0.99    -0.033  0.055   5.5e-01 226311

    1:533179_A_G     1       533179       A       G       1       -0.098  0.19    6.1e-01 185906

The files don't have the exact number of rows and the files are not tab-delimited. I tried the below code but it doesn't work, Can you correct my code?

awk 'NR==FNR{chr[$1]=$1;snp[$2]=$2;pos[$4]=$4;a1[$5]=$5;a2[$6]=$6;next} ($1 in chr)&&($4 in pos)&& ((($5 in a1) && ($6 in a2)) || (($6 in a1) && ($5 in a2))) {$2==snp[$2]}' file 2 file1

text-processing awk

asked yesterday

cookiemonster

311 bronze badge

New contributor

asked yesterday

cookiemonster

311 bronze badge

New contributor

asked yesterday

cookiemonster

311 bronze badge

New contributor

asked yesterday

cookiemonster

311 bronze badge

asked yesterday

cookiemonster

311 bronze badge

New contributor

Do you need to preserve the exact amount of whitespace between columns, or is it OK to collapse each gap down to a single space (which awk tends to do)?

– JigglyNaga
yesterday

It is OK to collapse each gap down.

– cookiemonster
yesterday

1

Sounds like you might be interested in our sister site: Bioinformatics.

– terdon♦
yesterday

add a comment |

Do you need to preserve the exact amount of whitespace between columns, or is it OK to collapse each gap down to a single space (which awk tends to do)?

– JigglyNaga
yesterday

It is OK to collapse each gap down.

– cookiemonster
yesterday

1

Sounds like you might be interested in our sister site: Bioinformatics.

– terdon♦
yesterday

Do you need to preserve the exact amount of whitespace between columns, or is it OK to collapse each gap down to a single space (which awk tends to do)?

– JigglyNaga
yesterday

It is OK to collapse each gap down.

– cookiemonster
yesterday

Sounds like you might be interested in our sister site: Bioinformatics.

– terdon♦
yesterday

add a comment |

2 Answers
2

active

oldest

votes

I'd do this in Perl because it has a sort function that lets us treat A T and T A as the same thing easily. For example:

$ perl -lane 'if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; }else{$var=join("", sort($F[2],$F[3],$F[4])); $F[1]=$name{$var} if $name{$var};print join "t", @F; } $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Or, slightly more legibly:

$ perl -lane 'if(!$k){

                $name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; 

              }

              else{

                $var=join("", sort($F[2],$F[3],$F[4])); 

                $F[1]=$name{$var} if $name{$var};

                print join "t", @F; 

             } 

            $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Explanation

perl -lane: the -a makes perl act like awk, automatically splitting its input into the array @F on whitespace. Since perl arrays start at 0, $F[0] will be the first field, $F[1] will be the second etc. Field N is $F[N-1]. The -n makes perl read its arguments as text files and apply the script given by -e to each line of them. The -l just removes trailing newlines from each input line and adds a newline to each print call.

$k++ if eof : this increments the the variable $k by 1 if we've reached the end of a file (eof). We can then use if(!$k) (if $k is not defined) as an equivalent to NR==FNR in awk.

if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1];}: if this is the first file,file2, sort fields 4,5 and 6, join them into a string and use that string as the key in the hash (associative array)name. Then, save the variant's name from file2 as the value associated with that key. The sorting lets us treatA TandT A` as equivalent.

else{ : if we're now reading the second file, file1.

$var=join("", sort($F[2],$F[3],$F[4])); : build the key. This time using fields 3, 4 and 5.

$F[1]=$name{$var} if $name{$var}; : set the 2nd field to the value stored in the name hash, if there is a value for this key. The if is needed to make sure we don't change the header or any other variants that might be present in file1 but not in file2.

print join "t", @F; : print the fields, including the change just made above.

edited yesterday

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

add a comment |

The join command will do the work of joining up matching lines from multiple files. But it has some requirements on its input files, so you'll need to make some temporary files along the way, with a few extra fields.

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }' < file1 | sort > 1.tmp

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}' < file2 | sort > 2.tmp



sed q file1

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp | sort -t % -n | awk -F %  '{print $2" "$3}'

Step by step

Preprocessing the first file:

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }'

Example output:

1 118630 C T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311%4

Those 3 fields, separated by %, are:

the "key" that has to be matched (input fields 2-5)

the original line, minus the first column (which is going to be replaced)

the original line number (so we can restore the file order after sort)

This output is piped through sort and into a temporary file, because join requires its inputs to have been sorted.

For the second file:

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}'

Example output:

1 118630 C T%1:118630_C_T

1 118630 T C%1:118630_C_T

As you specified that fields 5 and 6 should match either way round, a second line is printed with them swapped (provided that they aren't identical). The %-separated fields here are

the "key" to be matched

column 2

Again, the output is piped through sort and into another temporary file.

The sed q copies the first line, ie. the headers, from file1. (Unlike all the awk steps, it doesn't collapse the whitespace, so the columns are even less well aligned than in the input.)

Then comes the main "join" step:

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp

The -t % sets the separator to % (rather than whitespace). The -o argument produces the following three fields of output:

file 1, column 3: the line number

file 2, column 2: the original column 2 from file2

file 1, column 2: the rest of the line from file1

Example output line:

4%1:118630_C_T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

Then sort can restore the original file order (sort numerically, field separator %)

sort -t % -n

and a final awk can remove that line number and the %s

awk -F %  '{print $2" "$3}'

Final output line:

1:118630_C_T  1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

edited yesterday

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

cookiemonster is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f535726%2fmatch-4-columns-and-replace-1-in-2-files%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

I'd do this in Perl because it has a sort function that lets us treat A T and T A as the same thing easily. For example:

$ perl -lane 'if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; }else{$var=join("", sort($F[2],$F[3],$F[4])); $F[1]=$name{$var} if $name{$var};print join "t", @F; } $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Or, slightly more legibly:

$ perl -lane 'if(!$k){

                $name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; 

              }

              else{

                $var=join("", sort($F[2],$F[3],$F[4])); 

                $F[1]=$name{$var} if $name{$var};

                print join "t", @F; 

             } 

            $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Explanation

perl -lane: the -a makes perl act like awk, automatically splitting its input into the array @F on whitespace. Since perl arrays start at 0, $F[0] will be the first field, $F[1] will be the second etc. Field N is $F[N-1]. The -n makes perl read its arguments as text files and apply the script given by -e to each line of them. The -l just removes trailing newlines from each input line and adds a newline to each print call.

$k++ if eof : this increments the the variable $k by 1 if we've reached the end of a file (eof). We can then use if(!$k) (if $k is not defined) as an equivalent to NR==FNR in awk.

if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1];}: if this is the first file,file2, sort fields 4,5 and 6, join them into a string and use that string as the key in the hash (associative array)name. Then, save the variant's name from file2 as the value associated with that key. The sorting lets us treatA TandT A` as equivalent.

else{ : if we're now reading the second file, file1.

$var=join("", sort($F[2],$F[3],$F[4])); : build the key. This time using fields 3, 4 and 5.

$F[1]=$name{$var} if $name{$var}; : set the 2nd field to the value stored in the name hash, if there is a value for this key. The if is needed to make sure we don't change the header or any other variants that might be present in file1 but not in file2.

print join "t", @F; : print the fields, including the change just made above.

edited yesterday

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

add a comment |

I'd do this in Perl because it has a sort function that lets us treat A T and T A as the same thing easily. For example:

$ perl -lane 'if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; }else{$var=join("", sort($F[2],$F[3],$F[4])); $F[1]=$name{$var} if $name{$var};print join "t", @F; } $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Or, slightly more legibly:

$ perl -lane 'if(!$k){

                $name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; 

              }

              else{

                $var=join("", sort($F[2],$F[3],$F[4])); 

                $F[1]=$name{$var} if $name{$var};

                print join "t", @F; 

             } 

            $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Explanation

perl -lane: the -a makes perl act like awk, automatically splitting its input into the array @F on whitespace. Since perl arrays start at 0, $F[0] will be the first field, $F[1] will be the second etc. Field N is $F[N-1]. The -n makes perl read its arguments as text files and apply the script given by -e to each line of them. The -l just removes trailing newlines from each input line and adds a newline to each print call.

$k++ if eof : this increments the the variable $k by 1 if we've reached the end of a file (eof). We can then use if(!$k) (if $k is not defined) as an equivalent to NR==FNR in awk.

if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1];}: if this is the first file,file2, sort fields 4,5 and 6, join them into a string and use that string as the key in the hash (associative array)name. Then, save the variant's name from file2 as the value associated with that key. The sorting lets us treatA TandT A` as equivalent.

else{ : if we're now reading the second file, file1.

$var=join("", sort($F[2],$F[3],$F[4])); : build the key. This time using fields 3, 4 and 5.

$F[1]=$name{$var} if $name{$var}; : set the 2nd field to the value stored in the name hash, if there is a value for this key. The if is needed to make sure we don't change the header or any other variants that might be present in file1 but not in file2.

print join "t", @F; : print the fields, including the change just made above.

edited yesterday

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

add a comment |

I'd do this in Perl because it has a sort function that lets us treat A T and T A as the same thing easily. For example:

$ perl -lane 'if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; }else{$var=join("", sort($F[2],$F[3],$F[4])); $F[1]=$name{$var} if $name{$var};print join "t", @F; } $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Or, slightly more legibly:

$ perl -lane 'if(!$k){

                $name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; 

              }

              else{

                $var=join("", sort($F[2],$F[3],$F[4])); 

                $F[1]=$name{$var} if $name{$var};

                print join "t", @F; 

             } 

            $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Explanation

perl -lane: the -a makes perl act like awk, automatically splitting its input into the array @F on whitespace. Since perl arrays start at 0, $F[0] will be the first field, $F[1] will be the second etc. Field N is $F[N-1]. The -n makes perl read its arguments as text files and apply the script given by -e to each line of them. The -l just removes trailing newlines from each input line and adds a newline to each print call.

$k++ if eof : this increments the the variable $k by 1 if we've reached the end of a file (eof). We can then use if(!$k) (if $k is not defined) as an equivalent to NR==FNR in awk.

if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1];}: if this is the first file,file2, sort fields 4,5 and 6, join them into a string and use that string as the key in the hash (associative array)name. Then, save the variant's name from file2 as the value associated with that key. The sorting lets us treatA TandT A` as equivalent.

else{ : if we're now reading the second file, file1.

$var=join("", sort($F[2],$F[3],$F[4])); : build the key. This time using fields 3, 4 and 5.

$F[1]=$name{$var} if $name{$var}; : set the 2nd field to the value stored in the name hash, if there is a value for this key. The if is needed to make sure we don't change the header or any other variants that might be present in file1 but not in file2.

print join "t", @F; : print the fields, including the change just made above.

edited yesterday

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

I'd do this in Perl because it has a sort function that lets us treat A T and T A as the same thing easily. For example:

$ perl -lane 'if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; }else{$var=join("", sort($F[2],$F[3],$F[4])); $F[1]=$name{$var} if $name{$var};print join "t", @F; } $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Or, slightly more legibly:

$ perl -lane 'if(!$k){

                $name{join("",sort($F[3],$F[4],$F[5]))}=$F[1]; 

              }

              else{

                $var=join("", sort($F[2],$F[3],$F[4])); 

                $F[1]=$name{$var} if $name{$var};

                print join "t", @F; 

             } 

            $k++ if eof' file2 file1

SNP Chr Pos EA  NEA EAF Beta    SE  Pvalue  Neff

1:79137 1:79137_A_T 79137   A   T   0.25    -0.026  0.0073  4.0e-04 231420

1:79033 1:79033_A_G 79033   A   G   0.0047  -0.038  0.056   4.9e-01 225429

1:118630    1:118630_C_T    118630  C   T   0.99    -0.033  0.055   5.5e-01 226311

1:533179    1:533179_A_G    533179  A   G   1   -0.098  0.19    6.1e-01 185906

Explanation

perl -lane: the -a makes perl act like awk, automatically splitting its input into the array @F on whitespace. Since perl arrays start at 0, $F[0] will be the first field, $F[1] will be the second etc. Field N is $F[N-1]. The -n makes perl read its arguments as text files and apply the script given by -e to each line of them. The -l just removes trailing newlines from each input line and adds a newline to each print call.

$k++ if eof : this increments the the variable $k by 1 if we've reached the end of a file (eof). We can then use if(!$k) (if $k is not defined) as an equivalent to NR==FNR in awk.

if(!$k){$name{join("",sort($F[3],$F[4],$F[5]))}=$F[1];}: if this is the first file,file2, sort fields 4,5 and 6, join them into a string and use that string as the key in the hash (associative array)name. Then, save the variant's name from file2 as the value associated with that key. The sorting lets us treatA TandT A` as equivalent.

else{ : if we're now reading the second file, file1.

$var=join("", sort($F[2],$F[3],$F[4])); : build the key. This time using fields 3, 4 and 5.

$F[1]=$name{$var} if $name{$var}; : set the 2nd field to the value stored in the name hash, if there is a value for this key. The if is needed to make sure we don't change the header or any other variants that might be present in file1 but not in file2.

print join "t", @F; : print the fields, including the change just made above.

edited yesterday

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

edited yesterday

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

answered yesterday

terdon♦

140k34 gold badges287 silver badges466 bronze badges

add a comment |

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }' < file1 | sort > 1.tmp

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}' < file2 | sort > 2.tmp



sed q file1

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp | sort -t % -n | awk -F %  '{print $2" "$3}'

Step by step

Preprocessing the first file:

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }'

Example output:

1 118630 C T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311%4

Those 3 fields, separated by %, are:

the "key" that has to be matched (input fields 2-5)

the original line, minus the first column (which is going to be replaced)

the original line number (so we can restore the file order after sort)

This output is piped through sort and into a temporary file, because join requires its inputs to have been sorted.

For the second file:

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}'

Example output:

1 118630 C T%1:118630_C_T

1 118630 T C%1:118630_C_T

As you specified that fields 5 and 6 should match either way round, a second line is printed with them swapped (provided that they aren't identical). The %-separated fields here are

the "key" to be matched

column 2

Again, the output is piped through sort and into another temporary file.

The sed q copies the first line, ie. the headers, from file1. (Unlike all the awk steps, it doesn't collapse the whitespace, so the columns are even less well aligned than in the input.)

Then comes the main "join" step:

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp

The -t % sets the separator to % (rather than whitespace). The -o argument produces the following three fields of output:

file 1, column 3: the line number

file 2, column 2: the original column 2 from file2

file 1, column 2: the rest of the line from file1

Example output line:

4%1:118630_C_T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

Then sort can restore the original file order (sort numerically, field separator %)

sort -t % -n

and a final awk can remove that line number and the %s

awk -F %  '{print $2" "$3}'

Final output line:

1:118630_C_T  1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

edited yesterday

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

add a comment |

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }' < file1 | sort > 1.tmp

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}' < file2 | sort > 2.tmp



sed q file1

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp | sort -t % -n | awk -F %  '{print $2" "$3}'

Step by step

Preprocessing the first file:

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }'

Example output:

1 118630 C T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311%4

Those 3 fields, separated by %, are:

the "key" that has to be matched (input fields 2-5)

the original line, minus the first column (which is going to be replaced)

the original line number (so we can restore the file order after sort)

This output is piped through sort and into a temporary file, because join requires its inputs to have been sorted.

For the second file:

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}'

Example output:

1 118630 C T%1:118630_C_T

1 118630 T C%1:118630_C_T

As you specified that fields 5 and 6 should match either way round, a second line is printed with them swapped (provided that they aren't identical). The %-separated fields here are

the "key" to be matched

column 2

Again, the output is piped through sort and into another temporary file.

The sed q copies the first line, ie. the headers, from file1. (Unlike all the awk steps, it doesn't collapse the whitespace, so the columns are even less well aligned than in the input.)

Then comes the main "join" step:

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp

The -t % sets the separator to % (rather than whitespace). The -o argument produces the following three fields of output:

file 1, column 3: the line number

file 2, column 2: the original column 2 from file2

file 1, column 2: the rest of the line from file1

Example output line:

4%1:118630_C_T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

Then sort can restore the original file order (sort numerically, field separator %)

sort -t % -n

and a final awk can remove that line number and the %s

awk -F %  '{print $2" "$3}'

Final output line:

1:118630_C_T  1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

edited yesterday

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

add a comment |

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }' < file1 | sort > 1.tmp

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}' < file2 | sort > 2.tmp



sed q file1

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp | sort -t % -n | awk -F %  '{print $2" "$3}'

Step by step

Preprocessing the first file:

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }'

Example output:

1 118630 C T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311%4

Those 3 fields, separated by %, are:

the "key" that has to be matched (input fields 2-5)

the original line, minus the first column (which is going to be replaced)

the original line number (so we can restore the file order after sort)

This output is piped through sort and into a temporary file, because join requires its inputs to have been sorted.

For the second file:

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}'

Example output:

1 118630 C T%1:118630_C_T

1 118630 T C%1:118630_C_T

As you specified that fields 5 and 6 should match either way round, a second line is printed with them swapped (provided that they aren't identical). The %-separated fields here are

the "key" to be matched

column 2

Again, the output is piped through sort and into another temporary file.

The sed q copies the first line, ie. the headers, from file1. (Unlike all the awk steps, it doesn't collapse the whitespace, so the columns are even less well aligned than in the input.)

Then comes the main "join" step:

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp

The -t % sets the separator to % (rather than whitespace). The -o argument produces the following three fields of output:

file 1, column 3: the line number

file 2, column 2: the original column 2 from file2

file 1, column 2: the rest of the line from file1

Example output line:

4%1:118630_C_T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

Then sort can restore the original file order (sort numerically, field separator %)

sort -t % -n

and a final awk can remove that line number and the %s

awk -F %  '{print $2" "$3}'

Final output line:

1:118630_C_T  1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

edited yesterday

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }' < file1 | sort > 1.tmp

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}' < file2 | sort > 2.tmp



sed q file1

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp | sort -t % -n | awk -F %  '{print $2" "$3}'

Step by step

Preprocessing the first file:

awk '{$1="";print $2" "$3" "$4" "$5 "%" $0 "%" NR }'

Example output:

1 118630 C T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311%4

Those 3 fields, separated by %, are:

the "key" that has to be matched (input fields 2-5)

the original line, minus the first column (which is going to be replaced)

the original line number (so we can restore the file order after sort)

This output is piped through sort and into a temporary file, because join requires its inputs to have been sorted.

For the second file:

awk '{print $1" "$4" "$5" "$6"%"$2} $5 != $6 {print $1" "$4" "$6" "$5"%"$2}'

Example output:

1 118630 C T%1:118630_C_T

1 118630 T C%1:118630_C_T

As you specified that fields 5 and 6 should match either way round, a second line is printed with them swapped (provided that they aren't identical). The %-separated fields here are

the "key" to be matched

column 2

Again, the output is piped through sort and into another temporary file.

The sed q copies the first line, ie. the headers, from file1. (Unlike all the awk steps, it doesn't collapse the whitespace, so the columns are even less well aligned than in the input.)

Then comes the main "join" step:

join -t % -o 1.3 2.2 1.2  1.tmp 2.tmp

The -t % sets the separator to % (rather than whitespace). The -o argument produces the following three fields of output:

file 1, column 3: the line number

file 2, column 2: the original column 2 from file2

file 1, column 2: the rest of the line from file1

Example output line:

4%1:118630_C_T% 1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

Then sort can restore the original file order (sort numerically, field separator %)

sort -t % -n

and a final awk can remove that line number and the %s

awk -F %  '{print $2" "$3}'

Final output line:

1:118630_C_T  1 118630 C T 0.99 -0.033 0.055 5.5e-01 226311

edited yesterday

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

edited yesterday

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

answered yesterday

JigglyNaga

4,41210 silver badges37 bronze badges

add a comment |

cookiemonster is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

cookiemonster is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Unix & Linux Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Mdthbs

Match 4 columns and replace 1 in 2 filesMatching values within columnsCompare 2 delimited files and output...

2 Answers
2

Explanation

Step by step

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

Explanation

Explanation

Explanation

Explanation

Step by step

Step by step

Step by step

Step by step

Post as a guest

Popular posts from this blog

Hudson River Historic District Contents Geography History The district today Aesthetics Cultural...

The number designs the writing. Feandra Aversely Definition: The act of ingrafting a sprig or shoot of one...

Taj Mahal Inhaltsverzeichnis Aufbau | Geschichte | 350-Jahr-Feier | Heutige Bedeutung | Siehe auch |...

Match 4 columns and replace 1 in 2 filesMatching values within columnsCompare 2 delimited files and output...

2 Answers 2

Explanation

Step by step

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Explanation

Explanation

Explanation

Explanation

Step by step

Step by step

Step by step

Step by step

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Hudson River Historic District Contents Geography History The district today Aesthetics Cultural...

The number designs the writing. Feandra Aversely Definition: The act of ingrafting a sprig or shoot of one...

Taj Mahal Inhaltsverzeichnis Aufbau | Geschichte | 350-Jahr-Feier | Heutige Bedeutung | Siehe auch |...

2 Answers
2

2 Answers
2

2 Answers
2