


conditional replacing rows with a number










4















I have a directory containing nearly 11 million small files, named like this:



wa_filtering_DP15_good_pops_snps_file_1
wa_filtering_DP15_good_pops_snps_file_2
.
.
.
wa_filtering_DP15_good_pops_snps_file_11232111


Each file has only 2 rows and 315 columns, and looks like this:



1   0   0   0   0   0   0   0   0   0   1   2   1   
0 0 0 0 0 0 0 0 0 0 0 0 0


I want to go through each file and, for every column where both rows have the value 0, replace both values with 9, to get something like this:



1   9   9   9   9   9   9   9   9   9   1   2   1   
0 9 9 9 9 9 9 9 9 9 0 0 0


Can someone help me figure out how to do that?
Thanks



































  • With millions of small files, you're at risk of running out of inodes. Check with df /path/to/files versus df -i /path/to/files

    – glenn jackman
    Sep 20 '17 at 19:33











  • I have a suspicion you would be better off rearchitecting, perhaps just to set up a database, but there's not enough information here to diagnose the real situation. ;) Good luck.

    – Wildcard
    Sep 20 '17 at 21:08

























text-processing














asked Sep 20 '17 at 17:29 by Anna1364
edited Sep 20 '17 at 18:03 by Jeff Schaller























6 Answers


















1














Here is an awk solution.



awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/);
for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}
END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf "\n";
for (z=1;z<=NF;z++) printf ("%d ", ary2[z]); printf "\n"}' infile


Explanations:




  • split($0,ary1,/[ ]+/);: splits the first line into the array ary1, using one or more spaces as the delimiter.

  • getline x; split(x,ary2,/[ ]+/);: reads the second line into the variable x and splits it into the array ary2.

  • for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}: loops over every index i of ary1; if the sum of the two values is zero (!(0) evaluates to 1, so the if condition is true), both values are set to 9.

  • for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf "\n";: prints the final values of ary1, and then those of ary2 on the next line.





To apply this to all ~11 million files, write each result to FILENAME.out, where FILENAME is the name of the input file awk is currently reading.



awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/);
for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}
out=FILENAME".out";
for (r=1;r<=NF;r++) printf ("%d ", ary1[r])>out; printf "\n">out;
for (z=1;z<=NF;z++) printf ("%d ", ary2[z])>out; printf "\n">out;
close(out)}' wa_filtering_DP15_good_pops_snps_file_{1..11232111}































  • If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

    – Kusalananda
    Sep 20 '17 at 18:24











  • The OP has not mentioned this yet. I will update once they confirm whether there are negative values; it is just a matter of changing the condition to ary1[i]==0 && ary2[i]==0 : )

    – αғsнιη
    Sep 20 '17 at 18:25
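
The condition suggested in the comment above can be sketched as follows. This is only an illustration, assuming whitespace-separated numeric fields and exactly two rows per file; the function name replace_zero_pairs is made up here and is not part of the original answer.

```shell
# Hypothetical helper: replace a column with 9 only when BOTH rows are exactly 0.
# Reads a two-line file on stdin and writes the result to stdout.
replace_zero_pairs() {
    awk '
        NR == 1 { n = split($0, top, " ") }      # fields of the first row
        NR == 2 {
            split($0, bot, " ")                   # fields of the second row
            for (i = 1; i <= n; i++)
                if (top[i] == 0 && bot[i] == 0)   # explicit test, not a sum
                    top[i] = bot[i] = 9
            for (i = 1; i <= n; i++) printf "%s%s", top[i], (i < n ? " " : "\n")
            for (i = 1; i <= n; i++) printf "%s%s", bot[i], (i < n ? " " : "\n")
        }
    '
}

# -1 and 1 sum to zero but are left alone; only the 0/0 column changes.
printf '%s\n' '-1 0 2' '1 0 0' | replace_zero_pairs
```

With the sum-based test, the first column of this sample would also have been rewritten, which is the case the comment warns about.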





















1














For kicks, here's Ruby



ruby -e '
  data = File.readlines(ARGV.shift)
    .map {|line| line.split.map(&:to_i)}
    .transpose
    .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}
    .transpose
    .each {|row| puts row.join(" ")}
' file




1 9 9 9 9 9 9 9 9 9 1 2 1
0 9 9 9 9 9 9 9 9 9 0 0 0


To replace all the files:



ruby -e '
  require "tempfile"
  require "pathname"
  Pathname.new("/path/to/your/files/").each_child do |pathname|
    next unless pathname.file?
    temp = Tempfile.new(pathname.basename.to_s)
    filename = pathname.to_s
    File.readlines(filename)
      .map {|line| line.split.map(&:to_i)}
      .transpose
      .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}
      .transpose
      .each {|row| temp.puts row.join(" ")}
    temp.close
    File.link filename, filename+".bak"
    File.rename temp.path, filename
  end
'































  • You may not want the File.link step if you're running out of inodes.

    – glenn jackman
    Sep 20 '17 at 19:34



















1














This is an alternative approach, which might be slow for millions of files compared to pure awk solutions.



Using something like this, you can transpose rows to columns:



$ cat file1
1 0 0 0 0 0 0 0 0 0 1 2 1
0 0 0 0 0 0 0 0 0 0 0 0 0

$ paste -d'-' <(head -n1 file1 |tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')
1-0
0-0
0-0
0-0
0-0
0-0
0-0
0-0
0-0
0-0
1-0
2-0
1-0


You can then replace all 0-0 occurrences with 9-9 with a simple sed, and you can store the output in a temp variable:



$ f1=$(sed 's/0-0/9-9/g' <(paste -d'-' <(head -n1 file1|tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')))
$ echo "$f1"
1-0
9-9
9-9
9-9
9-9
9-9
9-9
9-9
9-9
9-9
1-0
2-0
1-0


You can now revert back from columns to rows like:



$ awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1")
1 9 9 9 9 9 9 9 9 9 1 2 1
0 9 9 9 9 9 9 9 9 9 0 0 0


You can also append >file1 to the end of the last awk command to overwrite file1 with the new contents.



The only thing left is to loop over all the files, which can be done with a bash loop:



for f in ./wa_filtering_DP15_good_pops_snps_file_*;do
f1=$(sed 's/0-0/9-9/g' <(paste -d'-' <(head -n1 "$f"|tr -s ' ' '\n') <(tail -n1 "$f" |tr -s ' ' '\n')))
awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1") #>"$f" #uncomment >"$f" to overwrite the files...
done
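
One possible caveat with the sed 's/0-0/9-9/g' step above: the pattern matches anywhere in the line, so if a value ever had more than one digit (an assumption; the sample data only shows single digits), a pair such as 10-0 would be turned into 19-9. A minimal sketch of an anchored variant, with a made-up function name:

```shell
# Hypothetical helper: read "top-bottom" pairs, one per line, and
# replace only the lines that are exactly 0-0.
replace_exact_zero_pairs() {
    sed 's/^0-0$/9-9/'
}

# The 10-0 line is left untouched because the pattern is anchored.
printf '%s\n' '1-0' '0-0' '10-0' | replace_exact_zero_pairs
```

Since each line holds exactly one pair, the g flag is not needed once the pattern is anchored at both ends.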




































    1














    With awk:



    NR == 1 {   # save the values from 1st line in array t
        split($0, t, FS);
    }

    NR == 2 {   # compare values from second line with those stored in array t
        for ( i = 1; i <= NF; ++i ) {
            # build l1 and l2 (line 1 and line 2) based on comparison
            if ($i == 0 && t[i] == 0) {
                l1 = (i == 1 ? 9 : l1 OFS 9 );
                l2 = (i == 1 ? 9 : l2 OFS 9 );
            } else {
                l1 = (i == 1 ? t[i] : l1 OFS t[i] );
                l2 = (i == 1 ? $i : l2 OFS $i );
            }
        }
    }

    END {   # output the two constructed lines
        print l1;
        print l2;
    }


    Running it on the example file:



    $ awk -f script.awk file
    1 9 9 9 9 9 9 9 9 9 1 2 1
    0 9 9 9 9 9 9 9 9 9 0 0 0


    Running on all files matching wa_filtering_DP15_good_pops_snps_file_* in current directory:



    mkdir modified

    for name in wa_filtering_DP15_good_pops_snps_file_*; do
    awk -f script.awk "$name" >"modified/$name.new"
    done


    This will create a new file for each input file, with the name of the original file and an extra .new suffix. The new files will be placed in the modified folder in the current directory.




    • I opted for creating new files so that the originals are left unmodified.

    • I opted to put the new files in a new directory, as having 22 million files in a single directory could make the filesystem a bit awkward to work with.


    In general, try not to create millions of files in a single directory. Instead either




    1. create many subdirectories and distribute the files in them, maybe based on a binning algorithm working on that last integer of the filename, or a hash, or

    2. create a single output file that aggregates all data, possibly with extra lines of text identifying what the following two lines refer to.
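
As a rough sketch of the first option, the subdirectory could be derived from the trailing integer of the filename. The function name bin_dir_for, the bin_NNN naming, and the default of 1000 buckets are all assumptions made for this illustration, and it assumes the trailing number has no leading zeros:

```shell
# Hypothetical helper: print the subdirectory for a file, derived from
# the trailing integer of its name modulo a bucket count (default 1000).
bin_dir_for() {
    name=$1
    buckets=${2:-1000}
    num=${name##*_}                 # text after the last underscore
    case $num in
        ''|*[!0-9]*) echo "unsorted"; return ;;   # no trailing integer
    esac
    printf 'bin_%03d\n' "$((num % buckets))"
}

bin_dir_for wa_filtering_DP15_good_pops_snps_file_11232111
```

A loop could then run mkdir -p on the printed directory and move each file into it, which keeps the number of entries per directory to roughly the total divided by the bucket count.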




    The following variant will run more efficiently on millions of files:



    FNR == 1 {   # save the values from 1st line in array t
        split($0, t, FS);
    }

    FNR == 2 {   # compare values from second line with those stored in array t
        for ( i = 1; i <= NF; ++i ) {
            # build l1 and l2 (line 1 and line 2) based on comparison
            if ($i == 0 && t[i] == 0) {
                l1 = (i == 1 ? 9 : l1 OFS 9 );
                l2 = (i == 1 ? 9 : l2 OFS 9 );
            } else {
                l1 = (i == 1 ? t[i] : l1 OFS t[i] );
                l2 = (i == 1 ? $i : l2 OFS $i );
            }
        }

        # create output filename based on input filename
        # and output the two lines
        f = "modified/" FILENAME ".new";
        print l1 >f;
        print l2 >f;
        close(f);   # release the file descriptor before the next input file
    }


    To run it:



    mkdir modified

    find . -maxdepth 1 -type f -name 'wa_filtering_DP15_good_pops_snps_file_*' \
        -exec awk -f script.awk {} +


    The new files will be generated in the modified folder as before, but this time only a fraction of awk processes will be started and the speed of processing will be greatly increased.
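
The difference in the number of processes started can be seen with a small demonstration. This is only a sketch: the function name count_invocations and the temporary directory are made up for illustration, and the inner sh -c 'echo run' stands in for awk.

```shell
# Hypothetical demo: count how many times the command is started
# for "-exec ... {} ;" versus "-exec ... {} +".
count_invocations() {
    # $1 is the directory to search, $2 is the terminator: ";" or "+"
    find "$1" -type f -exec sh -c 'echo run' sh {} "$2" | wc -l | tr -d ' '
}

demo_dir=$(mktemp -d)
for n in 1 2 3 4 5; do : > "$demo_dir/file_$n"; done

count_invocations "$demo_dir" ';'   # one run per file
count_invocations "$demo_dir" '+'   # files are batched into as few runs as possible
rm -rf "$demo_dir"
```

With the + form, find packs as many filenames into each command line as the system allows, so the count stays small even when the directory holds millions of files.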
































    • I simplified your for loop body to this: ` up = t[i]; low = $i; if (up == 0 && low == 0) { up = 9; low = 9; } if(i != 1) { up = OFS up; low = OFS low; } l1 = l1 up; l2 = l2 low; `. Clearer and cleaner, in my opinion. Plus, one if test less.

      – MiniMax
      Sep 21 '17 at 12:57













    • @MiniMax That may be a good modification, I agree. I will fix it as soon as I have spare time on my hands (busy ATM).

      – Kusalananda
      Sep 21 '17 at 13:00



















    0














    First variant:



    For single file:



    datamash -W transpose < input.txt | sed 's/0\t0/9\t9/' | datamash transpose


    For many files, do the same in a loop:



    for i in *; do datamash -W transpose < "$i" |
    sed 's/0\t0/9\t9/' |
    datamash transpose > "new_$i"; done


    This loop creates a new, changed file for each input file, with the prefix "new_" added. You can then remove all the old files and strip the "new_" prefix from the filenames.



    Second variant:



    This is a solution for a single file; for multiple files use a loop, as in the previous variant.



    tr '\n' '\t' < input.txt |
    awk '{
        num = NF / 2;
        for(up = 1; up <= NF; up++) {
            if(up <= num) {
                low = num + up;
                if(!$up && !$low) {
                    $up = 9;
                    $low = 9;
                }
            }

            printf "%s\t", $up;

            if(up % num == 0)
                print "";
        }
    }'


    Explanation





    1. tr '\n' '\t' < input.txt - joins the two lines together.


    2. awk


      • checks one element from the first line together with the corresponding element from the second line, like: 1 and 316, 2 and 317, 3 and 318, and so on.

      • if both elements are 0, it changes them to 9.

      • prints the fields in order: 1, 2, 3, 4 ... 628, 629, 630.

      • each time the field number is a multiple of the number of fields per line, it adds a newline.




    Input



    1   0   0   0   0   0   0   0   0   0   1   2   1
    0 0 0 0 0 0 0 0 0 0 0 0 0


    Output



    1   9   9   9   9   9   9   9   9   9   1   2   1
    0 9 9 9 9 9 9 9 9 9 0 0 0































    • 'for i in *' is probably not a valid solution for 11 million files. Use xargs instead.

      – RonJohn
      Sep 21 '17 at 0:59






    • @RonJohn * will only be a problem if the shell uses it to execute an external command. for i in * will not be a problem on 11 million files in itself.

      – Kusalananda
      Sep 21 '17 at 6:15











    • But the '*' will expand in the bash command buffer, certainly overflowing it.

      – RonJohn
      Sep 22 '17 at 15:40











    • @RonJohn Here is the answer to your question.

      – MiniMax
      Sep 22 '17 at 17:22
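
The point made in the comments above can be illustrated with a loop that uses only shell syntax. The argument-length limit applies when an external command is executed with the expanded names as arguments; in a for loop the names stay inside the shell. The function name count_with_loop is made up for this sketch.

```shell
# Hypothetical helper: count files matching a prefix using only shell
# syntax, so no argument list is passed to an external command.
count_with_loop() {
    count=0
    for f in "$1"*; do
        [ -e "$f" ] || continue     # the glob matched nothing
        count=$((count + 1))
    done
    echo "$count"
}

count_with_loop ./wa_filtering_DP15_good_pops_snps_file_
```

On most systems, getconf ARG_MAX reports the limit that would apply if the same glob were handed to an external command such as ls or awk. The shell still has to hold all the expanded names in memory, so the loop is not free, just not subject to that limit.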



















    0














    Probably not efficient enough for 11 million files but it's a different approach on the substitution. Takes one argument on the command line; the name of the directory where all of the files are stored. The name for the directory could be hard coded instead (see notes in code). The base name for the file is already hard coded without the number at the end (not required).



    #!/bin/bash

    # compare two rows in a file
    # when both are 0, change both to 9
    # otherwise keep original value

    ProgName=${0##*/}
    Pid=$$
    DBG_FNAME=""
    scriptUsage() {
    cat <<ENDUSE

    $ProgName </path/to/directory> [ [-d|--debug] || [-f|--filename] ]

    path/to/directory: Path to directory (NO trailing '/')
    -f|--filename: Print the each file name to stdout after complete
    -d|--debug: Run in debug mode (Implies filename option - SEE NOTE*)
    -h|--help: Print this help message

    NOTE: USING [-d|--debug] AUTOMATICALLY SETS [-f|--filename]
    You DO NOT need both together!

    ENDUSE
    }

    # check args
    #!# NOTE: you can delete from here to #!!# above 'WorkDir="$1"'
    [[ -z $1 ]] && { >&2 echo "MISSING file source directory!"; scriptUsage; exit 1; }
    [[ $1 == "-h" || $1 == "--help" ]] && { scriptUsage; exit 0; }
    [[ -d $1 ]] || { >&2 echo "Unable to locate directory [$1]"; exit 1; }
    if (( $# > 2 ))
    then
    DBG_FNAME=1
    >&2 echo "Running in debug mode from using ${2} & ${3} together!"
    echo "PID is: $Pid"
    sleep 2
    set -x
    else
    [[ $2 == "-f" || $2 == "--filename" ]] && DBG_FNAME=1
    [[ $2 == "-d" || $2 == "--debug" ]] && { echo "PID is: $Pid"; set -x; }
    fi
    #!!# to here #!!#
    # directory as arg[1] or change to hardcoded
    WorkDir="$1"

    # check for/remove trailing slash
    [[ ${WorkDir:(-1)} == / ]] && WorkDir=${WorkDir:0:((${#WorkDir}-1))}

    # given file root withOUT number ending
    WorkFile="${WorkDir}/wa_filtering_DP15_good_pops_snps_file_"


    ##== MAIN LOOP
    for file in ${WorkFile}*
    do
    # reset these after each file
    TopRow=""
    BotRow=""
    NewTop=""
    NewBot=""
    SKIPME=""

    # get top row of file
    TopRow=$(sed -n '1{p;q}' $file)
    # get bottom row of file
    BotRow=$(sed -n '2{p;q}' $file)

    ##-- EACH FILE LOOP
    for (( f=0; f<${#TopRow}; f++ ))
    do
    if [[ -n $SKIPME ]]
    then
    # SKIPME is -z by default so
    # this runs every other time through
    NewTop="${NewTop} "
    NewBot="${NewBot} "
    SKIPME=""
    elif (( $((${TopRow:${f}:1}+${BotRow:${f}:1})) == 0 ))
    then
    # 0+0=0 so change to 9
    NewTop="${NewTop}9"
    NewBot="${NewBot}9"
    SKIPME=1
    else
    # (1+0 or 0+1)!=0 so keep originals
    NewTop="${NewTop}${TopRow:${f}:1}"
    NewBot="${NewBot}${BotRow:${f}:1}"
    SKIPME=1
    fi
    done
    ##--

    # overwrite original file
    printf "%s\n%s" "$NewTop" "$NewBot" > $file

    # if -f|--filename given print file name
    [[ -n $DBG_FNAME ]] && echo "$file is complete"
    done
    ##==


    DOES EDIT FILES IN PLACE. Wouldn't be hard to have it make backups as it runs. Returns files exactly the way requested above.
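
A minimal sketch of how the backup step mentioned above might look, assuming a .bak suffix next to the original file. The function name write_with_backup is hypothetical and is not part of the script above:

```shell
# Hypothetical helper: copy FILE to FILE.bak, then overwrite FILE
# with the two new rows. Stops without writing if the copy fails.
write_with_backup() {
    file=$1 top=$2 bot=$3
    cp -p -- "$file" "$file.bak" || return 1
    printf '%s\n%s\n' "$top" "$bot" > "$file"
}
```

In the main loop this would take the place of the final printf, for example write_with_backup "$file" "$NewTop" "$NewBot". Keep in mind that a backup per file doubles the number of files, which matters if the filesystem is already short on inodes.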
































      answered Sep 20 '17 at 18:11 by αғsнιη
      edited Sep 28 '17 at 16:49













      • If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

        – Kusalananda
        Sep 20 '17 at 18:24











      • this one OP not mentioned yet, will update once he confirmed if has negative values, just simply change condition to ary1[i]==0 && ary2[i]==0 : )

        – αғsнιη
        Sep 20 '17 at 18:25



































      For kicks, here's Ruby



      ruby -e '
      data = File.readlines(ARGV.shift)
      .map {|line| line.split.map(&:to_i)}
      .transpose
      .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}
      .transpose
      .each {|row| puts row.join(" ")}
      ' file




      1 9 9 9 9 9 9 9 9 9 1 2 1
      0 9 9 9 9 9 9 9 9 9 0 0 0


      To replace all the files:



      ruby -e '
      require "tempfile"
      require "pathname"
      Pathname.new("/path/to/your/files/").each_child do |pathname|
        next unless pathname.file?
        # create the temp file next to the original so the rename stays on one filesystem
        temp = Tempfile.new(pathname.basename.to_s, pathname.dirname.to_s)
        filename = pathname.to_s
        File.readlines(filename)
          .map {|line| line.split.map(&:to_i)}
          .transpose
          .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}
          .transpose
          .each {|row| temp.puts row.join(" ")}
        temp.close
        File.link filename, filename+".bak"   # keep the original as a hard-linked backup
        File.rename temp.path, filename
      end
      '













      edited Sep 20 '17 at 19:28

























      answered Sep 20 '17 at 18:37









      glenn jackman













      • You may not want the File.link step if you're running out of inodes.

        – glenn jackman
        Sep 20 '17 at 19:34

































      This is an alternative approach, which may be slow for millions of files compared to the pure awk solutions.



      Using something like this, you can transpose rows to columns:



      $ cat file1
      1 0 0 0 0 0 0 0 0 0 1 2 1
      0 0 0 0 0 0 0 0 0 0 0 0 0

      $ paste -d'-' <(head -n1 file1 |tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')
      1-0
      0-0
      0-0
      0-0
      0-0
      0-0
      0-0
      0-0
      0-0
      0-0
      1-0
      2-0
      1-0


      You can then replace every 0-0 pair with 9-9 using a simple sed (anchored so that only whole lines match), and store the output in a temporary variable:



      $ f1=$(sed 's/^0-0$/9-9/' <(paste -d'-' <(head -n1 file1|tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')))
      $ echo "$f1"
      1-0
      9-9
      9-9
      9-9
      9-9
      9-9
      9-9
      9-9
      9-9
      9-9
      1-0
      2-0
      1-0
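One caveat with matching on joined pairs: the substring `0-0` also occurs inside pairs such as `10-0`. A minimal sketch of the difference, assuming values can have more than one digit (both function names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helpers; both read "upper-lower" pairs, one per line, on stdin.

# Substring match: also rewrites the tail of multi-digit values.
replace_unanchored() { sed 's/0-0/9-9/g'; }

# Whole-line match: only rewrites pairs that are exactly 0-0.
replace_anchored() { sed 's/^0-0$/9-9/'; }

printf '10-0\n0-0\n' | replace_unanchored   # prints 19-9 then 9-9
printf '10-0\n0-0\n' | replace_anchored     # prints 10-0 then 9-9
```

With single-digit data such as the example file, both forms give the same result.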


      You can now convert the columns back to rows like this:



      $ awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1")
      1 9 9 9 9 9 9 9 9 9 1 2 1
      0 9 9 9 9 9 9 9 9 9 0 0 0


      You can also append >file1 to the end of the last awk command to overwrite file1 with the new contents.



      The only thing left is to loop over all the files, which can be done with a bash loop:



      for f in ./wa_filtering_DP15_good_pops_snps_file_*;do
      f1=$(sed 's/^0-0$/9-9/' <(paste -d'-' <(head -n1 "$f"|tr -s ' ' '\n') <(tail -n1 "$f" |tr -s ' ' '\n')))
      awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1") #>"$f" #uncomment >"$f" to overwrite the files...
      done















          answered Sep 20 '17 at 20:12









          George Vasiliou














              With awk:



              NR == 1 {   # save the values from 1st line in array t
                  split($0, t, FS);
              }

              NR == 2 {   # compare values from second line with those stored in array t
                  for ( i = 1; i <= NF; ++i ) {
                      # build l1 and l2 (line 1 and line 2) based on comparison
                      if ($i == 0 && t[i] == 0) {
                          l1 = (i == 1 ? 9 : l1 OFS 9 );
                          l2 = (i == 1 ? 9 : l2 OFS 9 );
                      } else {
                          l1 = (i == 1 ? t[i] : l1 OFS t[i] );
                          l2 = (i == 1 ? $i : l2 OFS $i );
                      }
                  }
              }

              END {   # output the two constructed lines
                  print l1;
                  print l2;
              }


              Running it on the example file:



              $ awk -f script.awk file
              1 9 9 9 9 9 9 9 9 9 1 2 1
              0 9 9 9 9 9 9 9 9 9 0 0 0


              Running on all files matching wa_filtering_DP15_good_pops_snps_file_* in current directory:



              mkdir modified

              for name in wa_filtering_DP15_good_pops_snps_file_*; do
              awk -f script.awk "$name" >"modified/$name.new"
              done


              This will create a new file for each input file, with the name of the original file and an extra .new suffix. The new files will be placed in the modified folder in the current directory.




              • I opted for creating new files so that the originals are left unmodified.

              • I opted to put the new files in a new directory, as having 22 million files in a single directory could make the filesystem awkward to work with.


              In general, try not to create millions of files in a single directory. Instead, either


              1. create many subdirectories and distribute the files among them, maybe based on a binning algorithm working on the last integer of the filename, or on a hash, or

              2. create a single output file that aggregates all data, possibly with extra lines of text identifying what the following two lines refer to.
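The first option can be sketched as follows. The helper name `bin_for`, the bin count of 1000 and the `bin_` prefix are all assumptions chosen for illustration, and the trailing integer is assumed to have no leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: map a file name ending in "_<integer>" to a bin directory.
# The bin count (1000) and the "bin_" prefix are arbitrary choices for this sketch.
bin_for() {
    n=${1##*_}               # trailing integer after the last underscore
    echo "bin_$((n % 1000))"
}

bin_for wa_filtering_DP15_good_pops_snps_file_1234567   # bin_567
```

A processing loop could then create `modified/$(bin_for "$name")` with `mkdir -p` and write the output file there, which keeps each directory to roughly a thousandth of the total.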




              The following variant runs more efficiently on millions of files:



              FNR == 1 {   # save the values from 1st line in array t
                  split($0, t, FS);
              }

              FNR == 2 {   # compare values from second line with those stored in array t
                  for ( i = 1; i <= NF; ++i ) {
                      # build l1 and l2 (line 1 and line 2) based on comparison
                      if ($i == 0 && t[i] == 0) {
                          l1 = (i == 1 ? 9 : l1 OFS 9 );
                          l2 = (i == 1 ? 9 : l2 OFS 9 );
                      } else {
                          l1 = (i == 1 ? t[i] : l1 OFS t[i] );
                          l2 = (i == 1 ? $i : l2 OFS $i );
                      }
                  }

                  # create output filename based on input filename
                  # and output the two lines
                  f = "modified/" FILENAME ".new";
                  print l1 >f;
                  print l2 >f;
                  close(f);   # release the descriptor before moving on to the next file
              }


              To run it:



              mkdir modified

              find . -maxdepth 1 -type f -name 'wa_filtering_DP15_good_pops_snps_file_*' \
                  -exec awk -f script.awk {} +


              The new files will be generated in the modified folder as before, but this time far fewer awk processes will be started (each one handles a large batch of files), which greatly increases the processing speed.
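The single-output-file alternative mentioned earlier can be sketched in the same style. This is only an illustration: the function name `aggregate_pairs` and the `# <filename>` header line are assumptions, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: process every file given as an argument and write all
# results to standard output, each pair preceded by a "# <filename>" line.
aggregate_pairs() {
    awk '
        FNR == 1 { split($0, t, FS) }
        FNR == 2 {
            for (i = 1; i <= NF; ++i) {
                if ($i == 0 && t[i] == 0) {
                    l1 = (i == 1 ? 9 : l1 OFS 9)
                    l2 = (i == 1 ? 9 : l2 OFS 9)
                } else {
                    l1 = (i == 1 ? t[i] : l1 OFS t[i])
                    l2 = (i == 1 ? $i : l2 OFS $i)
                }
            }
            print "# " FILENAME
            print l1
            print l2
        }
    ' "$@"
}
```

Saved as a standalone awk script, the same program could be run through `find ... -exec awk -f ... {} +` with standard output redirected to one file, so that no per-file output files are created at all.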














              edited Sep 21 '17 at 11:35

























              answered Sep 20 '17 at 18:23









              Kusalananda













              • I simplified your for loop body to this: ` up = t[i]; low = $i; if (up == 0 && low == 0) { up = 9; low = 9; } if(i != 1) { up = OFS up; low = OFS low; } l1 = l1 up; l2 = l2 low; `. Clearer and cleaner, in my opinion. Plus, one if test less.

                – MiniMax
                Sep 21 '17 at 12:57













              • @MiniMax That may be a good modification, I agree. I will fix it as soon as I have spare time on my hands (busy ATM).

                – Kusalananda
                Sep 21 '17 at 13:00
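For reference, the loop body suggested in the comment above can be dropped into the single-file script like this. It is a sketch only; the wrapper function `simplified_pairs` is a made-up name, and it reads the two lines from standard input.

```shell
#!/bin/sh
# Hypothetical wrapper around the single-file awk script, using the
# simplified loop body from the comment.
simplified_pairs() {
    awk '
        NR == 1 { split($0, t, FS) }
        NR == 2 {
            for (i = 1; i <= NF; ++i) {
                up = t[i]; low = $i
                if (up == 0 && low == 0) { up = 9; low = 9 }
                if (i != 1) { up = OFS up; low = OFS low }
                l1 = l1 up; l2 = l2 low
            }
        }
        END { print l1; print l2 }
    '
}

printf '1 0 0 2\n0 0 0 0\n' | simplified_pairs
# 1 9 9 2
# 0 9 9 0
```

Note that this body appends to l1 and l2 rather than reinitializing them at i == 1, so in a multi-file variant keyed on FNR both variables would have to be reset to empty strings at the start of each file.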

































              First variant:



              For single file:



              datamash -W transpose < input.txt | sed 's/^0\t0$/9\t9/' | datamash transpose


              For many files, do the same in a loop:



              for i in *; do datamash -W transpose < "$i" |
              sed 's/^0\t0$/9\t9/' |
              datamash transpose > "new_$i"; done


              This loop creates a new, modified file for each input file, with the prefix "new_" added to its name. Afterwards you can remove all the old files and strip the "new_" prefix from the filenames.
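The clean-up step can be sketched as a small shell function. The name `strip_new_prefix` is made up for illustration, and the sketch assumes that overwriting each old file with its `new_` counterpart is what is wanted.

```shell
#!/bin/sh
# Hypothetical helper: for every new_* file in the given directory, replace the
# original with it by stripping the "new_" prefix from the name.
strip_new_prefix() {
    for f in "$1"/new_*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        base=${f##*/}                    # file name without the directory part
        mv -- "$f" "$1/${base#new_}"     # overwrites the old file of that name
    done
}
```

Because `mv` replaces the target, removing the old files and renaming the new ones happens in a single pass, for example with `strip_new_prefix .` in the data directory.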



              Second variant:



              This is a solution for a single file; for multiple files, use a loop as in the previous variant.



              tr '\n' '\t' < input.txt |
              awk '{
                  num = NF / 2;
                  for(up = 1; up <= NF; up++) {
                      if(up <= num) {
                          low = num + up;
                          if(!$up && !$low) {
                              $up = 9;
                              $low = 9;
                          }
                      }

                      printf "%s\t", $up;

                      if(up % num == 0)
                          print "";
                  }
              }'


              Explanation





              1. tr '\n' '\t' < input.txt joins the two lines into one tab-separated line.


              2. awk


                • checks each element of the first line together with the element in the same column of the second line. With 315 elements per line, these are fields 1 and 316, 2 and 317, 3 and 318, and so on.

                • if both elements are 0, changes both to 9.

                • prints the fields in order: 1, 2, 3, 4 ... 628, 629, 630.

                • each time the field number is a multiple of the number of elements per line, adds a newline.




              Input



              1   0   0   0   0   0   0   0   0   0   1   2   1
              0 0 0 0 0 0 0 0 0 0 0 0 0


              Output



              1   9   9   9   9   9   9   9   9   9   1   2   1
              0 9 9 9 9 9 9 9 9 9 0 0 0
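
              The same column-by-column comparison can also be sketched in awk alone, without joining the two lines first. This is a minimal sketch, not the code from the answer: the function name replace_zero_pairs is hypothetical, and it assumes the input has exactly two whitespace-separated rows with the same number of columns. The output is tab-separated.

```shell
# Hypothetical helper: reads a two-row table on stdin and writes the
# result to stdout, tab-separated.
replace_zero_pairs() {
    awk '
        NR == 1 {
            # Remember the fields of the first row.
            n = split($0, top)
            next
        }
        NR == 2 {
            l1 = l2 = ""
            for (i = 1; i <= n; i++) {
                up = top[i]; low = $i
                # Both values in this column are zero: replace both with 9.
                if (up == 0 && low == 0) { up = 9; low = 9 }
                sep = (i == 1) ? "" : "\t"
                l1 = l1 sep up
                l2 = l2 sep low
            }
            print l1
            print l2
        }
    '
}
```

              Usage would be replace_zero_pairs < input.txt > new_input.txt.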































              • 'for i in *' is probably not a valid solution for 11 million files. Use xargs instead.

                – RonJohn
                Sep 21 '17 at 0:59






              • @RonJohn * will only be a problem if the shell uses it to execute an external command. for i in * will not be a problem on 11 million files in itself.

                – Kusalananda
                Sep 21 '17 at 6:15











              • But the '*' will expand in the bash command buffer, certainly overflowing it.

                – RonJohn
                Sep 22 '17 at 15:40











              • @RonJohn Here is the answer to your question.

                – MiniMax
                Sep 22 '17 at 17:22
















              edited Sep 21 '17 at 21:10

              answered Sep 21 '17 at 0:02
              MiniMax






































              Probably not efficient enough for 11 million files, but it takes a different approach to the substitution. The script takes one argument on the command line: the name of the directory where all of the files are stored. The directory name could be hard-coded instead (see the notes in the code). The base name of the files is already hard-coded, without the number at the end (the number is not required).



              #!/bin/bash

              # compare two rows in a file
              # when both are 0, change both to 9
              # otherwise keep original value

              ProgName=${0##*/}
              Pid=$$
              DBG_FNAME=""
              scriptUsage() {
              cat <<ENDUSE

              $ProgName </path/to/directory> [ [-d|--debug] || [-f|--filename] ]

              path/to/directory: Path to directory (NO trailing '/')
              -f|--filename: Print each file name to stdout once it is complete
              -d|--debug: Run in debug mode (Implies filename option - SEE NOTE*)
              -h|--help: Print this help message

              NOTE: USING [-d|--debug] AUTOMATICALLY SETS [-f|--filename]
              You DO NOT need both together!

              ENDUSE
              }

              # check args
              #!# NOTE: you can delete from here to #!!# above 'WorkDir="$1"'
              [[ -z $1 ]] && { >&2 echo "MISSING file source directory!"; scriptUsage; exit 1; }
              [[ $1 == "-h" || $1 == "--help" ]] && { scriptUsage; exit 0; }
              [[ -d $1 ]] || { >&2 echo "Unable to locate directory [$1]"; exit 1; }
              if (( $# > 2 ))
              then
                  DBG_FNAME=1
                  >&2 echo "Running in debug mode from using ${2} & ${3} together!"
                  echo "PID is: $Pid"
                  sleep 2
                  set -x
              else
                  [[ $2 == "-f" || $2 == "--filename" ]] && DBG_FNAME=1
                  # debug implies the filename option, as the usage text says
                  [[ $2 == "-d" || $2 == "--debug" ]] && { DBG_FNAME=1; echo "PID is: $Pid"; set -x; }
              fi
              #!!# to here #!!#
              # directory as arg[1] or change to hardcoded
              WorkDir="$1"

              # check for/remove trailing slash
              WorkDir=${WorkDir%/}

              # given file root withOUT number ending
              WorkFile="${WorkDir}/wa_filtering_DP15_good_pops_snps_file_"


              ##== MAIN LOOP
              for file in "${WorkFile}"*
              do
                  # reset these after each file
                  TopRow=""
                  BotRow=""
                  NewTop=""
                  NewBot=""
                  SKIPME=""

                  # get top row of file
                  TopRow=$(sed -n '1{p;q}' "$file")
                  # get bottom row of file
                  BotRow=$(sed -n '2{p;q}' "$file")

                  ##-- EACH FILE LOOP
                  # walks the rows one character at a time, so it assumes
                  # single-character values separated by a single space
                  for (( f=0; f<${#TopRow}; f++ ))
                  do
                      if [[ -n $SKIPME ]]
                      then
                          # SKIPME is empty at the start, so this branch runs
                          # on every other character (the separator positions)
                          NewTop="${NewTop} "
                          NewBot="${NewBot} "
                          SKIPME=""
                      elif (( ${TopRow:f:1} + ${BotRow:f:1} == 0 ))
                      then
                          # 0+0=0 so change to 9
                          NewTop="${NewTop}9"
                          NewBot="${NewBot}9"
                          SKIPME=1
                      else
                          # (1+0 or 0+1)!=0 so keep originals
                          NewTop="${NewTop}${TopRow:${f}:1}"
                          NewBot="${NewBot}${BotRow:${f}:1}"
                          SKIPME=1
                      fi
                  done
                  ##--

                  # overwrite original file
                  printf "%s\n%s\n" "$NewTop" "$NewBot" > "$file"

                  # if -f|--filename given print file name
                  [[ -n $DBG_FNAME ]] && echo "$file is complete"
              done
              ##==


              The script EDITS FILES IN PLACE. It wouldn't be hard to have it make backups as it runs. It returns the files exactly as requested in the question.
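
              A backup step of the kind mentioned above might look roughly like this. It is a minimal sketch, not part of the script: the function name backup_then_write and the ".bak" suffix are assumptions made for illustration.

```shell
# Hypothetical helper: backup_then_write FILE TOP BOTTOM
# Copies FILE to FILE.bak, then overwrites FILE with the two rows.
backup_then_write() {
    file=$1 top=$2 bottom=$3
    # Keep a copy of the original; stop if the copy fails.
    cp -p -- "$file" "$file.bak" || return 1
    printf '%s\n%s\n' "$top" "$bottom" > "$file"
}
```

              In the main loop it would take the place of the printf line, for example backup_then_write "$file" "$NewTop" "$NewBot".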














                  edited 6 hours ago
                  Rui F Ribeiro

                  answered Oct 1 '17 at 2:47
                  EnterUserNameHere





























