conditional replacing rows with a number

I have a directory containing nearly 11 million small files, named like this:

wa_filtering_DP15_good_pops_snps_file_1

wa_filtering_DP15_good_pops_snps_file_2

.

.

.

wa_filtering_DP15_good_pops_snps_file_11232111

Each file has only 2 rows and 315 columns, and looks like this:

1   0   0   0   0   0   0   0   0   0   1   2   1   

0   0   0   0   0   0   0   0   0   0   0   0   0

I want to go through each file and, wherever both rows have a 0 in the same column, replace both values with 9, to get something like this:

1   9   9   9   9   9   9   9   9   9   1   2   1   

0   9   9   9   9   9   9   9   9   9   0   0   0

Can someone help me figure out how to do that?
Thanks

edited Sep 20 '17 at 18:03

Jeff Schaller♦

45.1k1164147

asked Sep 20 '17 at 17:29

Anna1364

456214

With millions of small files, you're at risk of running out of inodes. Check with df /path/to/files versus df -i /path/to/files

– glenn jackman
Sep 20 '17 at 19:33

I have a suspicion you would be better off rearchitecting, perhaps just to set up a database, but there's not enough information here to diagnose the real situation. ;) Good luck.

– Wildcard
Sep 20 '17 at 21:08

text-processing


6 Answers

Here is an awk solution.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/);
    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}
END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf "\n";
    for (z=1;z<=NF;z++) printf ("%d ", ary2[z]); printf "\n"}' infile

Explanations:

split($0,ary1,/[ ]+/);: splits the first line into the array ary1, using runs of one or more spaces as the delimiter.

getline x; split(x,ary2,/[ ]+/);: reads the second line into the variable x and splits it into the array ary2.

for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}: loops over every index i of ary1; if the sum of the two values in that column is zero (!(0) evaluates to 1, so the if condition is true), both values are set to 9. This assumes that no value is negative.

for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf "\n";: prints the final values of ary1, and then those of ary2 on the next line.

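To apply this to all ~11 million files, write each result to FILENAME.out, where FILENAME is the name of the input file awk is currently reading. The output has to be written in the main block, once per file, because an END block runs only once, after the last file. find hands the file names to awk in batches, since 11 million names do not fit on a single command line.

find . -maxdepth 1 -type f -name 'wa_filtering_DP15_good_pops_snps_file_*' ! -name '*.out' -exec awk '{
    split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/);
    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}
    out=FILENAME".out";
    for (r=1;r<=NF;r++) printf ("%d ", ary1[r]) >out; printf "\n" >out;
    for (z=1;z<=NF;z++) printf ("%d ", ary2[z]) >out; printf "\n" >out;
    close(out)
}' {} +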

edited Sep 28 '17 at 16:49

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

– Kusalananda♦
Sep 20 '17 at 18:24

The OP has not mentioned this yet; I will update once they confirm whether there are negative values. Until then, simply change the condition to ary1[i]==0 && ary2[i]==0 : )

– αғsнιη
Sep 20 '17 at 18:25
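
To make the point in these comments concrete, here is a minimal sketch of the explicit comparison wrapped in a shell function. The name zeros_to_nines is hypothetical and not part of the answer; the input is assumed to be a two-line, whitespace-separated file. A column such as 1 over -1 sums to zero, but this test leaves it alone.

```shell
#!/bin/sh
# Sketch only: zeros_to_nines is a hypothetical helper, not the answer's code.
# It reads a two-line file ($1) and prints both lines, turning a column
# into 9/9 only when BOTH values in that column compare equal to 0.
zeros_to_nines() {
    awk '
        NR == 1 { n = split($0, top, " ") }   # first row, split on whitespace
        NR == 2 {
            split($0, bot, " ")               # second row
            for (i = 1; i <= n; i++)
                if (top[i] == 0 && bot[i] == 0) top[i] = bot[i] = 9
            for (i = 1; i <= n; i++) printf "%s%s", top[i], (i < n ? " " : "\n")
            for (i = 1; i <= n; i++) printf "%s%s", bot[i], (i < n ? " " : "\n")
        }
    ' "$1"
}
```

Called as zeros_to_nines infile, it prints the two rewritten rows to stdout.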


For kicks, here's Ruby

ruby -e '

    data = File.readlines(ARGV.shift)

               .map {|line| line.split.map(&:to_i)}

               .transpose

               .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}

               .transpose

               .each {|row| puts row.join(" ")}

' file

1 9 9 9 9 9 9 9 9 9 1 2 1

0 9 9 9 9 9 9 9 9 9 0 0 0

To replace all the files:

ruby -e '

    require "tempfile"

    require "pathname"

    Pathname.new("/path/to/your/files/").each_child do |pathname|

        next unless pathname.file?

        temp = Tempfile.new(pathname.basename.to_s)

        filename = pathname.to_s

        File.readlines(filename)

            .map {|line| line.split.map(&:to_i)}

            .transpose

            .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}

            .transpose

            .each {|row| temp.puts row.join(" ")}

        temp.close

        File.link filename, filename+".bak"

        File.rename temp.path, filename

    end

'

edited Sep 20 '17 at 19:28

answered Sep 20 '17 at 18:37

glenn jackman

53.1k573114

You may not want the File.link step if you're running out of inodes.

– glenn jackman
Sep 20 '17 at 19:34


This is an alternative approach, which might be slow for millions of files compared to the pure awk solutions.

Using something like this, you can transpose rows to columns:

$ cat file1

1   0   0   0   0   0   0   0   0   0   1   2   1   

0   0   0   0   0   0   0   0   0   0   0   0   0



$ paste -d'-' <(head -n1 file1 |tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')

1-0

0-0

0-0

0-0

0-0

0-0

0-0

0-0

0-0

0-0

1-0

2-0

1-0

You can then replace all 0-0 occurrences with 9-9 using a simple sed, and store the output in a temporary variable:

$ f1=$(sed 's/0-0/9-9/g' <(paste -d'-' <(head -n1 file1|tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')))

$ echo "$f1"

1-0

9-9

9-9

9-9

9-9

9-9

9-9

9-9

9-9

9-9

1-0

2-0

1-0

You can now revert back from columns to rows like:

$ awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1")

1 9 9 9 9 9 9 9 9 9 1 2 1  

0 9 9 9 9 9 9 9 9 9 0 0 0

You can also append >file1 to the end of the last awk command to overwrite file1 with the new contents.

The only thing left is to loop over all the files, which can be done with a bash loop:

for f in ./wa_filtering_DP15_good_pops_snps_file_*;do

  f1=$(sed 's/0-0/9-9/g' <(paste -d'-' <(head -n1 "$f"|tr -s ' ' '\n') <(tail -n1 "$f" |tr -s ' ' '\n')))

  awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1") #>"$f" #uncomment >"$f" to overwrite the files...

done
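
If the files are overwritten, an interrupted run could leave one of them truncated. One way to reduce that risk, sketched here under assumptions, is to write to a temporary file next to the target and rename it only after the write succeeds. The helper name overwrite_from_stdin is hypothetical and not part of the answer.

```shell
#!/bin/sh
# Sketch only: overwrite_from_stdin is a hypothetical helper.
# It copies stdin to a temporary file in the same directory as the target
# ($1), then renames it over the target only if the copy succeeded.
# Note: the result carries mktemp's restrictive permissions, not the
# permissions of the file it replaces.
overwrite_from_stdin() {
    target=$1
    tmp_out=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp_out"; then
        mv -- "$tmp_out" "$target"
    else
        rm -f -- "$tmp_out"
        return 1
    fi
}
```

In the loop above, the awk command would then be piped into overwrite_from_stdin "$f" instead of being redirected with >"$f".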

answered Sep 20 '17 at 20:12

George Vasiliou

5,83531130

add a comment |

With awk:

NR == 1 {   # save the values from 1st line in array t

            split($0, t, FS);

        }



NR == 2 {   # compare values from second line with those stored in array t

            for ( i = 1; i <= NF; ++i ) {

                # build l1 and l2 (line 1 and line 2) based on comparison

                if ($i == 0 && t[i] == 0) {

                    l1 = (i == 1 ? 9    : l1 OFS 9    );

                    l2 = (i == 1 ? 9    : l2 OFS 9    );

                } else {

                    l1 = (i == 1 ? t[i] : l1 OFS t[i] );

                    l2 = (i == 1 ? $i   : l2 OFS $i   );

                }

            }

        }



END     {   # output the two constructed lines

            print l1;

            print l2;

        }

Running it on the example file:

$ awk -f script.awk file

1 9 9 9 9 9 9 9 9 9 1 2 1

0 9 9 9 9 9 9 9 9 9 0 0 0

Running on all files matching wa_filtering_DP15_good_pops_snps_file_* in current directory:

mkdir modified



for name in wa_filtering_DP15_good_pops_snps_file_*; do

    awk -f script.awk "$name" >"modified/$name.new"

done

This will create a new file for each input file, with the name of the original file and an extra .new suffix. The new files will be placed in the modified folder in the current directory.

I opted for creating new files so that the originals are left unmodified.

I opted to put the new files in a new directory, as having 22 million files in a single directory could make the filesystem a bit awkward to work with.

In general, try not to create millions of files in a single directory. Instead, either

- create many subdirectories and distribute the files in them, maybe based on a binning algorithm working on that last integer of the filename, or a hash, or

- create a single output file that aggregates all data, possibly with extra lines of text identifying what the following two lines refer to.
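
The first option could be sketched as follows. The function names bin_for and move_into_bin, the bin_NNN directory naming, and the bin count of 1000 are all assumptions made for illustration; file names are assumed to end in an underscore followed by an integer.

```shell
#!/bin/sh
# Sketch only: bin_for and move_into_bin are hypothetical helpers.

# Print the subdirectory name for a file whose name ends in _<integer>,
# using the integer modulo 1000 (an arbitrary choice of bin count).
bin_for() {
    num=${1##*_}                      # text after the last underscore
    case $num in
        ''|*[!0-9]*) return 1 ;;      # not a plain integer: refuse
    esac
    zeros=${num%%[!0]*}               # any leading zeros
    num=${num#"$zeros"}               # strip them so the value is not read as octal
    printf 'bin_%03d\n' "$(( ${num:-0} % 1000 ))"
}

# Move one file into its bin, creating the bin beside the file if needed.
move_into_bin() {
    bin=$(bin_for "$1") || return 1
    dir=$(dirname "$1")
    mkdir -p "$dir/$bin" && mv -- "$1" "$dir/$bin/"
}
```

With these settings, wa_filtering_DP15_good_pops_snps_file_11232111 would land in bin_111, and each bin would hold roughly one thousandth of the files.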

The following variant will run more efficiently on millions of files:

FNR == 1    {   # save the values from 1st line in array t

                split($0, t, FS);

            }



FNR == 2    {   # compare values from second line with those stored in array t

                for ( i = 1; i <= NF; ++i ) {

                    # build l1 and l2 (line 1 and line 2) based on comparison

                    if ($i == 0 && t[i] == 0) {

                        l1 = (i == 1 ? 9    : l1 OFS 9    );

                        l2 = (i == 1 ? 9    : l2 OFS 9    );

                    } else {

                        l1 = (i == 1 ? t[i] : l1 OFS t[i] );

                        l2 = (i == 1 ? $i   : l2 OFS $i   );

                    }

                }



                # create output filename based on input filename

                # and output the two lines

                f = "modified/" FILENAME ".new";

                print l1 >f;

                print l2 >f;

                close(f);   # release the file so awk does not run out of open files

            }

To run it:

mkdir modified



find . -maxdepth 1 -type f -name 'wa_filtering_DP15_good_pops_snps_file_*' \
    -exec awk -f script.awk {} +

The new files will be generated in the modified folder as before, but this time far fewer awk processes will be started, so processing will be much faster.
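
The same FNR-based pattern could also feed the single aggregate file suggested earlier in this answer. The following is only a sketch: the function name aggregate_pairs and the "# filename" header format are assumptions, and each input file is assumed to hold exactly two whitespace-separated rows.

```shell
#!/bin/sh
# Sketch only: aggregate_pairs is a hypothetical helper.
# For every file given as an argument it prints a "# <filename>" header
# followed by the two rewritten rows, all on stdout.
aggregate_pairs() {
    awk '
        FNR == 1 { n = split($0, t, " "); next }   # remember the first row
        FNR == 2 {
            l1 = l2 = ""
            for (i = 1; i <= n; i++) {
                if (t[i] == 0 && $i == 0) { up = low = 9 }
                else                      { up = t[i]; low = $i }
                l1 = l1 (i > 1 ? OFS : "") up
                l2 = l2 (i > 1 ? OFS : "") low
            }
            print "# " FILENAME
            print l1
            print l2
        }
    ' "$@"
}
```

It can be driven by find in the same way as above, with the combined output redirected to one file. Note that find may start more than one batch, so the redirection belongs on the find command as a whole.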

edited Sep 21 '17 at 11:35

answered Sep 20 '17 at 18:23

Kusalananda♦

142k18266442

I simplified your for loop body to this: ` up = t[i]; low = $i; if (up == 0 && low == 0) { up = 9; low = 9; } if(i != 1) { up = OFS up; low = OFS low; } l1 = l1 up; l2 = l2 low; `. Clearer and cleaner, in my opinion. Plus, one if test less.

– MiniMax
Sep 21 '17 at 12:57

@MiniMax That may be a good modification, I agree. I will fix it as soon as I have spare time on my hands (busy ATM).

– Kusalananda♦
Sep 21 '17 at 13:00


First variant:

For single file:

datamash -W transpose < input.txt | sed 's/0\t0/9\t9/' | datamash transpose

For many files do the same in the loop:

for i in *; do datamash -W transpose < "$i" |

sed 's/0\t0/9\t9/' |

datamash transpose > "new_$i"; done

This loop creates a new, changed file for each input file, with the prefix "new_" added. You can then remove all the old files and strip the "new_" prefix from the filenames.
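
The final renaming step might look like the sketch below. The function name promote_new_files is hypothetical; it assumes that every new_<name> file in the given directory was produced by the loop above and should replace <name>.

```shell
#!/bin/sh
# Sketch only: promote_new_files is a hypothetical helper.
# For every new_<name> in directory $1, rename it to <name>,
# replacing the old file of that name in a single step.
promote_new_files() {
    for new in "$1"/new_*; do
        [ -f "$new" ] || continue        # the glob matched nothing
        base=${new##*/}                  # strip the directory part
        mv -- "$new" "$1/${base#new_}"   # new_foo -> foo
    done
}
```

Because mv replaces the old file directly, no separate removal pass is needed. It is worth checking a few new_ files by hand before running it.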

Second variant:

This is a solution for a single file; for multiple files, use a loop as in the previous variant.

tr '\n' '\t' < input.txt |

awk '{

    num = NF / 2;

    for(up = 1; up <= NF; up++) {

        if(up <= num) {

            low = num + up;

            if(!$up && !$low) {

                $up = 9;    

                $low = 9;

            }

        }



        printf "%s\t", $up;



        if(up % num == 0) 

            print "";

    }

}'

Explanation

tr '\n' '\t' < input.txt - joins the two lines together.

awk
- checks one element from the first line and the corresponding element from the second line together: 1 and 316, 2 and 317, 3 and 318, and so on.
- if both elements are 0, changes them to 9.
- prints the fields in order: 1, 2, 3, 4 ... 628, 629, 630.
- each time the field number is a multiple of the number of elements in a line, adds a newline.

Input

1   0   0   0   0   0   0   0   0   0   1   2   1

0   0   0   0   0   0   0   0   0   0   0   0   0

Output

1   9   9   9   9   9   9   9   9   9   1   2   1

0   9   9   9   9   9   9   9   9   9   0   0   0

edited Sep 21 '17 at 21:10

answered Sep 21 '17 at 0:02

MiniMax

2,831819

'for i in *' is probably not a valid solution for 11 million files. Use xargs instead.

– RonJohn
Sep 21 '17 at 0:59


@RonJohn * will only be a problem if the shell uses it to execute an external command. for i in * will not be a problem on 11 million files in itself.

– Kusalananda♦
Sep 21 '17 at 6:15

But the '*' will expand in the bash command buffer, certainly overflowing it.

– RonJohn
Sep 22 '17 at 15:40

@RonJohn Here is the answer to your question.

– MiniMax
Sep 22 '17 at 17:22
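
As a hedged illustration of the distinction drawn in these comments: a for loop over * is expanded inside the shell itself, whereas names passed to an external command count against the kernel's argument limit (getconf ARG_MAX reports it). When an external command has to see every file, find can split the names into batches. The function name for_each_batch below is hypothetical.

```shell
#!/bin/sh
# Sketch only: for_each_batch is a hypothetical helper.
# Usage: for_each_batch DIR COMMAND [ARGS...]
# Runs COMMAND ARGS... FILES... for the regular files directly under DIR.
# find starts the command as many times as needed so that each batch of
# file names fits within the system's argument-length limit.
for_each_batch() {
    dir=$1
    shift
    find "$dir" -maxdepth 1 -type f -exec "$@" {} +
}
```

For example, for_each_batch . awk -f script.awk would run the awk script over every regular file in the current directory, in as few awk processes as the limit allows.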


Probably not efficient enough for 11 million files, but it's a different approach to the substitution. It takes one argument on the command line: the name of the directory where all of the files are stored. The name of the directory could be hard coded instead (see notes in code). The base name for the file is already hard coded without the number at the end (not required).

#!/bin/bash



# compare two rows in a file

# when both are 0, change both to 9

# otherwise keep original value



ProgName=${0##*/}

Pid=$$

DBG_FNAME=""

scriptUsage() {

cat <<ENDUSE



  $ProgName </path/to/directory> [ [-d|--debug] || [-f|--filename] ]



  path/to/directory:    Path to directory (NO trailing '/')

  -f|--filename:        Print each file name to stdout when complete

  -d|--debug:           Run in debug mode (Implies filename option - SEE NOTE*)

  -h|--help:            Print this help message



  NOTE:  USING [-d|--debug] AUTOMATICALLY SETS [-f|--filename]

         You DO NOT need both together!



ENDUSE

}



# check args

#!# NOTE: you can delete from here to #!!# above 'WorkDir="$1"'

[[ -z $1 ]] && { >&2 echo "MISSING file source directory!"; scriptUsage; exit 1; }

[[ $1 == "-h" || $1 == "--help" ]] && { scriptUsage; exit 0; }

[[ -d $1 ]] || { >&2 echo "Unable to locate directory [$1]"; exit 1; }

if (( $# > 2 ))

  then

    DBG_FNAME=1

    >&2 echo "Running in debug mode from using ${2} & ${3} together!"

    echo "PID is: $Pid"

    sleep 2

    set -x

  else

    [[ $2 == "-f" || $2 == "--filename" ]] && DBG_FNAME=1

    [[ $2 == "-d" || $2 == "--debug" ]] && { echo "PID is: $Pid"; set -x; }

fi

#!!# to here #!!#

# directory as arg[1] or change to hardcoded

  WorkDir="$1"



# check for/remove trailing slash

[[ ${WorkDir:(-1)} == / ]] && WorkDir=${WorkDir:0:((${#WorkDir}-1))}



# given file root withOUT number ending

  WorkFile="${WorkDir}/wa_filtering_DP15_good_pops_snps_file_"





##== MAIN LOOP

for file in ${WorkFile}*

  do

    # reset these after each file

    TopRow=""

    BotRow=""

    NewTop=""

    NewBot=""

    SKIPME=""



    # get top row of file

    TopRow=$(sed -n '1{p;q}' $file)

    # get bottom row of file

    BotRow=$(sed -n '2{p;q}' $file)



    ##-- EACH FILE LOOP

    for (( f=0; f<${#TopRow}; f++ ))

      do

        if [[ -n $SKIPME ]]

          then

            # SKIPME is -z by default so

            # this runs every other time through

            NewTop="${NewTop} "

            NewBot="${NewBot} "

            SKIPME=""

        elif (( $((${TopRow:${f}:1}+${BotRow:${f}:1})) == 0 ))

          then

            # 0+0=0 so change to 9

            NewTop="${NewTop}9"

            NewBot="${NewBot}9"

            SKIPME=1

        else

            # (1+0 or 0+1)!=0 so keep originals

            NewTop="${NewTop}${TopRow:${f}:1}"

            NewBot="${NewBot}${BotRow:${f}:1}"

            SKIPME=1

        fi

    done

    ##--



    # overwrite original file

    printf "%s\n%s\n" "$NewTop" "$NewBot" > "$file"



    # if -f|--filename given print file name

    [[ -n $DBG_FNAME ]] && echo "$file is complete"

done

##==

DOES EDIT FILES IN PLACE. Wouldn't be hard to have it make backups as it runs. Returns files exactly the way requested above.
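
A backup step along those lines could be sketched as below. The function name backup_once and the use of a separate backup directory are assumptions. A separate directory keeps the copies out of the ${WorkFile}* glob used by the main loop, although it still doubles the number of files and inodes in use.

```shell
#!/bin/sh
# Sketch only: backup_once is a hypothetical helper.
# Usage: backup_once FILE BACKUP_DIR
# Copies FILE into BACKUP_DIR unless a copy with that name already exists,
# so that a second run keeps the original backup instead of replacing it.
backup_once() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    dest=$dest_dir/${src##*/}
    [ -e "$dest" ] || cp -p -- "$src" "$dest"
}
```

In the script above, it would be called just before the line that overwrites the original file, for example backup_once "$file" "${WorkDir}/backup".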

edited 6 hours ago

Rui F Ribeiro

42.1k1484142

answered Oct 1 '17 at 2:47

EnterUserNameHere

10818

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f393455%2fconditional-replacing-rows-with-a-number%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

Here is awk solution.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z]); printf"n"}' infile

Explanations:

split($0,ary1,/[ ]+/);: reads and splits the first line into an array ary1 with one-or-more spaces delimiters between.

getline x; split(x,ary2,/[ ]+/);: reads the second line into variable x and split it into array ary2.

for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}: loop in array ary1 for each index in i if sum of both fields value were zero (!(0)will trigger if(1) as true condition) then set both fields value to 9.

for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n";: Now print final values of each array ary1 and in next line ary2.

To apply on all ~11 million files, just save changes in FILENAME.out format where FILENAME indicate current input fileName reading by awk.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r])>FILENAME".out"; printf"n">FILENAME".out"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z])>FILENAME".out"

}' wa_filtering_DP15_good_pops_snps_file_{1..11232111}

edited Sep 28 '17 at 16:49

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

– Kusalananda♦
Sep 20 '17 at 18:24

this one OP not mentioned yet, will update once he confirmed if has negative values, just simply change condition to ary1[i]==0 && ary2[i]==0 : )

– αғsнιη
Sep 20 '17 at 18:25

add a comment |

Here is awk solution.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z]); printf"n"}' infile

Explanations:

split($0,ary1,/[ ]+/);: reads and splits the first line into an array ary1 with one-or-more spaces delimiters between.

getline x; split(x,ary2,/[ ]+/);: reads the second line into variable x and split it into array ary2.

for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}: loop in array ary1 for each index in i if sum of both fields value were zero (!(0)will trigger if(1) as true condition) then set both fields value to 9.

for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n";: Now print final values of each array ary1 and in next line ary2.

To apply on all ~11 million files, just save changes in FILENAME.out format where FILENAME indicate current input fileName reading by awk.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r])>FILENAME".out"; printf"n">FILENAME".out"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z])>FILENAME".out"

}' wa_filtering_DP15_good_pops_snps_file_{1..11232111}

edited Sep 28 '17 at 16:49

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

– Kusalananda♦
Sep 20 '17 at 18:24

this one OP not mentioned yet, will update once he confirmed if has negative values, just simply change condition to ary1[i]==0 && ary2[i]==0 : )

– αғsнιη
Sep 20 '17 at 18:25

add a comment |

Here is awk solution.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z]); printf"n"}' infile

Explanations:

split($0,ary1,/[ ]+/);: reads and splits the first line into an array ary1 with one-or-more spaces delimiters between.

getline x; split(x,ary2,/[ ]+/);: reads the second line into variable x and split it into array ary2.

for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}: loop in array ary1 for each index in i if sum of both fields value were zero (!(0)will trigger if(1) as true condition) then set both fields value to 9.

for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n";: Now print final values of each array ary1 and in next line ary2.

To apply on all ~11 million files, just save changes in FILENAME.out format where FILENAME indicate current input fileName reading by awk.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r])>FILENAME".out"; printf"n">FILENAME".out"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z])>FILENAME".out"

}' wa_filtering_DP15_good_pops_snps_file_{1..11232111}

edited Sep 28 '17 at 16:49

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

Here is awk solution.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z]); printf"n"}' infile

Explanations:

split($0,ary1,/[ ]+/);: reads and splits the first line into an array ary1 with one-or-more spaces delimiters between.

getline x; split(x,ary2,/[ ]+/);: reads the second line into variable x and split it into array ary2.

for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}}: loop in array ary1 for each index in i if sum of both fields value were zero (!(0)will trigger if(1) as true condition) then set both fields value to 9.

for (r=1;r<=NF;r++) printf ("%d ", ary1[r]); printf"n";: Now print final values of each array ary1 and in next line ary2.

To apply on all ~11 million files, just save changes in FILENAME.out format where FILENAME indicate current input fileName reading by awk.

awk '{split($0,ary1,/[ ]+/); getline x; split(x,ary2,/[ ]+/); 

    for (i in ary1)if (!(ary1[i]+ary2[i])){ary1[i]=ary2[i]=9}} 

END{for (r=1;r<=NF;r++) printf ("%d ", ary1[r])>FILENAME".out"; printf"n">FILENAME".out"; 

    for (z=1;z<=NF;z++) printf ("%d ", ary2[z])>FILENAME".out"

}' wa_filtering_DP15_good_pops_snps_file_{1..11232111}

edited Sep 28 '17 at 16:49

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

edited Sep 28 '17 at 16:49

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

answered Sep 20 '17 at 18:11

αғsнιη

17.4k103070

If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

– Kusalananda♦
Sep 20 '17 at 18:24

this one OP not mentioned yet, will update once he confirmed if has negative values, just simply change condition to ary1[i]==0 && ary2[i]==0 : )

– αғsнιη
Sep 20 '17 at 18:25

add a comment |

If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

– Kusalananda♦
Sep 20 '17 at 18:24

this one OP not mentioned yet, will update once he confirmed if has negative values, just simply change condition to ary1[i]==0 && ary2[i]==0 : )

– αғsнιη
Sep 20 '17 at 18:25

If any number can be negative, then the sum of two numbers may be zero without any of them being zero...

– Kusalananda♦
Sep 20 '17 at 18:24

this one OP not mentioned yet, will update once he confirmed if has negative values, just simply change condition to ary1[i]==0 && ary2[i]==0 : )

– αғsнιη
Sep 20 '17 at 18:25

add a comment |

For kicks, here's Ruby

ruby -e '

    data = File.readlines(ARGV.shift)

               .map {|line| line.split.map(&:to_i)}

               .transpose

               .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}

               .transpose

               .each {|row| puts row.join(" ")}

' file

1 9 9 9 9 9 9 9 9 9 1 2 1

0 9 9 9 9 9 9 9 9 9 0 0 0

To replace all the files:

ruby -e '
    require "tempfile"
    require "pathname"
    Pathname.new("/path/to/your/files/").each_child do |pathname|
        next unless pathname.file?
        temp = Tempfile.new(pathname.basename.to_s)
        filename = pathname.to_s
        File.readlines(filename)
            .map {|line| line.split.map(&:to_i)}
            .transpose
            .map {|(a,b)| (a==0 && b==0) ? [9,9] : [a,b]}
            .transpose
            .each {|row| temp.puts row.join(" ")}
        temp.close
        # keep the original contents reachable as <file>.bak (hard link)
        File.link filename, filename+".bak"
        # move the rewritten copy into place
        File.rename temp.path, filename
    end
'

edited Sep 20 '17 at 19:28

answered Sep 20 '17 at 18:37

glenn jackman

53.1k573114

You may not want the File.link step if you're running out of inodes.

– glenn jackman
Sep 20 '17 at 19:34


This is an alternative approach, which might be slow for millions of files compared to pure awk solutions.

Using something like this, you can transpose rows to columns:

$ cat file1
1   0   0   0   0   0   0   0   0   0   1   2   1
0   0   0   0   0   0   0   0   0   0   0   0   0

$ paste -d'-' <(head -n1 file1 |tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')
1-0
0-0
0-0
0-0
0-0
0-0
0-0
0-0
0-0
0-0
1-0
2-0
1-0

You can then replace all 0-0 occurrences with 9-9 using a simple sed, and store the output in a temporary variable:

$ f1=$(sed 's/0-0/9-9/g' <(paste -d'-' <(head -n1 file1|tr -s ' ' '\n') <(tail -n1 file1 |tr -s ' ' '\n')))
$ echo "$f1"
1-0
9-9
9-9
9-9
9-9
9-9
9-9
9-9
9-9
9-9
1-0
2-0
1-0
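One caveat worth checking against your data: the unanchored pattern `0-0` also occurs inside a pair such as `10-0`, which `s/0-0/9-9/g` would turn into `19-9`. A hedged sketch of an anchored variant follows. It assumes non-negative integers as in the sample (with negative values the `-` delimiter itself becomes ambiguous), and the function name `mark_zero_pairs` is hypothetical.

```shell
# Hypothetical helper: reads "a-b" pairs, one per line, on stdin and
# rewrites only the lines that are exactly "0-0".
mark_zero_pairs() {
    sed 's/^0-0$/9-9/'
}

# The pair 10-0 is left alone; only the exact 0-0 line is changed.
printf '1-0\n0-0\n10-0\n' | mark_zero_pairs
```

Since every line holds exactly one pair, the `g` flag is not needed once the pattern is anchored.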

You can now convert the columns back to rows like this:

$ awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1")
1 9 9 9 9 9 9 9 9 9 1 2 1
0 9 9 9 9 9 9 9 9 9 0 0 0

You can also append >file1 to the end of the last awk command to overwrite file1 with the new contents.

The only thing left is to loop over all the files, which can be done with a bash loop:

for f in ./wa_filtering_DP15_good_pops_snps_file_*;do
  f1=$(sed 's/0-0/9-9/g' <(paste -d'-' <(head -n1 "$f"|tr -s ' ' '\n') <(tail -n1 "$f" |tr -s ' ' '\n')))
  awk -F'-' 'NR==FNR{printf "%s ",$1;p=1;next}p{printf "\n";p=0}{printf "%s ",$2}END{printf "\n"}' <(echo "$f1") <(echo "$f1") #>"$f" #uncomment >"$f" to overwrite the files...
done

answered Sep 20 '17 at 20:12

George Vasiliou

5,83531130


With awk:

NR == 1 {   # save the values from 1st line in array t
            split($0, t, FS);
        }

NR == 2 {   # compare values from second line with those stored in array t
            for ( i = 1; i <= NF; ++i ) {
                # build l1 and l2 (line 1 and line 2) based on comparison
                if ($i == 0 && t[i] == 0) {
                    l1 = (i == 1 ? 9    : l1 OFS 9    );
                    l2 = (i == 1 ? 9    : l2 OFS 9    );
                } else {
                    l1 = (i == 1 ? t[i] : l1 OFS t[i] );
                    l2 = (i == 1 ? $i   : l2 OFS $i   );
                }
            }
        }

END     {   # output the two constructed lines
            print l1;
            print l2;
        }

Running it on the example file:

$ awk -f script.awk file
1 9 9 9 9 9 9 9 9 9 1 2 1
0 9 9 9 9 9 9 9 9 9 0 0 0

Running on all files matching wa_filtering_DP15_good_pops_snps_file_* in current directory:

mkdir modified

for name in wa_filtering_DP15_good_pops_snps_file_*; do
    awk -f script.awk "$name" >"modified/$name.new"
done

This will create a new file for each input file, with the name of the original file and an extra .new suffix. The new files will be placed in the modified folder in the current directory.

I opted for creating new files so that the originals are left unmodified.

I opted to put the new files in a new directory, as having 22 million files in a single directory could make the filesystem awkward to work with.

In general, try not to create millions of files in a single directory. Instead, either

- create many subdirectories and distribute the files among them, maybe based on a binning algorithm working on the last integer of the filename, or a hash, or
- create a single output file that aggregates all data, possibly with extra lines of text identifying what the following two lines refer to.
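As a rough illustration of the first option, the sketch below derives a subdirectory name from the integer at the end of a filename. Everything in it is an assumption made for the example: the helper name `bin_for`, the `bin_N` directory naming, and the default of 1000 bins.

```shell
# Hypothetical helper: print a bin directory name for a file whose name
# ends in "_<integer>", e.g. wa_filtering_DP15_good_pops_snps_file_1234567.
# Usage: bin_for NAME [BINS]   (BINS defaults to 1000, an arbitrary choice)
bin_for() {
    name=$1
    bins=${2:-1000}
    num=${name##*_}                      # text after the last underscore
    case $num in
        ''|*[!0-9]*) echo "unbinned"; return ;;   # no trailing integer
    esac
    # strip leading zeros so the shell does not read the number as octal
    while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
        num=${num#0}
    done
    echo "bin_$(( num % bins ))"
}

# Sketch of how the loop above could use it (left commented out because it
# depends on script.awk and the real input files):
# for name in wa_filtering_DP15_good_pops_snps_file_*; do
#     dir="modified/$(bin_for "$name")"
#     mkdir -p "$dir"
#     awk -f script.awk "$name" >"$dir/$name.new"
# done
```

With 1000 bins and roughly 11 million inputs, each subdirectory would end up with about 11 thousand files.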

The following variant runs more efficiently on millions of files:

FNR == 1    {   # save the values from 1st line in array t
                split($0, t, FS);
            }

FNR == 2    {   # compare values from second line with those stored in array t
                for ( i = 1; i <= NF; ++i ) {
                    # build l1 and l2 (line 1 and line 2) based on comparison
                    if ($i == 0 && t[i] == 0) {
                        l1 = (i == 1 ? 9    : l1 OFS 9    );
                        l2 = (i == 1 ? 9    : l2 OFS 9    );
                    } else {
                        l1 = (i == 1 ? t[i] : l1 OFS t[i] );
                        l2 = (i == 1 ? $i   : l2 OFS $i   );
                    }
                }

                # create output filename based on input filename
                # and output the two lines
                f = "modified/" FILENAME ".new";
                print l1 >f;
                print l2 >f;
                # close the file so awk does not exhaust its open file descriptors
                close(f);
            }

To run it:

mkdir modified

find . -maxdepth 1 -type f -name 'wa_filtering_DP15_good_pops_snps_file_*' \
    -exec awk -f script.awk {} +

The new files will be generated in the modified folder as before, but this time far fewer awk processes will be started, which greatly speeds up processing.

edited Sep 21 '17 at 11:35

answered Sep 20 '17 at 18:23

Kusalananda♦

142k18266442

I simplified your for loop body to this: ` up = t[i]; low = $i; if (up == 0 && low == 0) { up = 9; low = 9; } if(i != 1) { up = OFS up; low = OFS low; } l1 = l1 up; l2 = l2 low; `. Clearer and cleaner, in my opinion. Plus, one if test less.

– MiniMax
Sep 21 '17 at 12:57

@MiniMax That may be a good modification, I agree. I will fix it as soon as I have spare time on my hands (busy ATM).

– Kusalananda♦
Sep 21 '17 at 13:00
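For reference, the loop body suggested in the comment above can be dropped into the single-file script like this. It is only a sketch: the shell wrapper `replace_zero_pairs` is a hypothetical name, and it reads the two-line file from stdin.

```shell
# Hypothetical wrapper: reads a two-line file on stdin and prints both
# lines with every column whose two values are both 0 replaced by 9.
replace_zero_pairs() {
    awk '
        NR == 1 { split($0, t); next }          # save the first line
        NR == 2 {
            for (i = 1; i <= NF; ++i) {
                up = t[i]; low = $i
                if (up == 0 && low == 0) { up = 9; low = 9 }
                if (i != 1) { up = OFS up; low = OFS low }
                l1 = l1 up
                l2 = l2 low
            }
        }
        END { print l1; print l2 }'
}

printf '1 0 0 2\n0 0 0 0\n' | replace_zero_pairs
```

The zero test runs before the separator is prepended, so `up` and `low` are still plain field values when they are compared.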


First variant:

For a single file:

datamash -W transpose < input.txt | sed 's/0\t0/9\t9/' | datamash transpose

For many files, do the same in a loop:

for i in *; do datamash -W transpose < "$i" |
sed 's/0\t0/9\t9/' |
datamash transpose > "new_$i"; done

This loop creates a new, changed file for each input file, with the prefix "new_" added. You can then remove all the old files and strip the "new_" prefix from the filenames.

Second variant:

This is a solution for a single file; for multiple files, use a loop as in the previous variant.

tr '\n' '\t' < input.txt |
awk '{
    num = NF / 2;
    for(up = 1; up <= NF; up++) {
        if(up <= num) {
            low = num + up;
            if(!$up && !$low) {
                $up = 9;
                $low = 9;
            }
        }

        printf "%s\t", $up;

        if(up % num == 0)
            print "";
    }
}'

Explanation

tr '\n' '\t' < input.txt - joins the two lines together.

awk
- checks one element from the first line and the corresponding element from the second line together, like: 1 and 316, 2 and 317, 3 and 318, and so on.
- if both elements are 0, it changes them to 9.
- prints the fields in order - 1, 2, 3, 4 ... 628, 629, 630.
- each time the element number is a multiple of the number of elements in a line, it adds a newline.

Input

1   0   0   0   0   0   0   0   0   0   1   2   1
0   0   0   0   0   0   0   0   0   0   0   0   0

Output

1   9   9   9   9   9   9   9   9   9   1   2   1
0   9   9   9   9   9   9   9   9   9   0   0   0

edited Sep 21 '17 at 21:10

answered Sep 21 '17 at 0:02

MiniMax

2,831819

'for i in *' is probably not a valid solution for 11 million files. Use xargs instead.

– RonJohn
Sep 21 '17 at 0:59

@RonJohn * will only be a problem if the shell uses it to execute an external command. for i in * will not be a problem on 11 million files in itself.

– Kusalananda♦
Sep 21 '17 at 6:15

But the '*' will expand in the bash command buffer, certainly overflowing it.

– RonJohn
Sep 22 '17 at 15:40

@RonJohn Here is the answer to your question.

– MiniMax
Sep 22 '17 at 17:22
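A small sketch of the distinction discussed in these comments: in a plain `for` loop the shell expands the glob itself and never passes the list to an external program, so the kernel's argument-size limit does not come into play. The helper name `count_files` is made up for the example.

```shell
# Hypothetical helper: count regular files in a directory with a plain
# for loop. The glob is expanded inside the shell; no external command
# ever receives the full list of names.
count_files() {
    dir=$1
    n=0
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# The limit applies when the expanded list is handed to an external
# command (for example `ls *`). Its size can be inspected with:
#   getconf ARG_MAX
```

When an external command does need to see every file, `find ... -exec cmd {} +` or `xargs` splits the list into batches that fit under that limit.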

add a comment |

First variant:

For single file:

datamash -W transpose < input.txt | sed 's/0t0/9t9/' | datamash transpose

For many files do the same in the loop:

for i in *; do datamash -W transpose < "$i" |

sed 's/0t0/9t9/' |

datamash transpose > "new_$i"; done

This loop will create the new, changed file for the each file, with the prefix "new_" added. Then you can remove all old files and remove prefix "new_" from filenames.

Second variant:

This is a solution for the single file, for multiple files use loop, as in the previous variant.

tr 'n' 't' < input.txt |

awk '{

    num = NF / 2;

    for(up = 1; up <= NF; up++) {

        if(up <= num) {

            low = num + up;

            if(!$up && !$low) {

                $up = 9;    

                $low = 9;

            }

        }



        printf "%st", $up;



        if(up % num == 0) 

            print "";

    }

}'

Explanation

tr 'n' 't' < input.txt - join two lines together.

awk
- checks the one element from the first line and the adjacent element from the second line simultaneously, like: 1 and 316, 2 and 317, 3 and 318, so on.
- if both elements are 0, it changes them to 9.
- print fields by the order - 1, 2, 3, 4 ... 628, 629, 630.
- Each time the element number is a multiple of the number of elements in the line, adds a new line.

Input

1   0   0   0   0   0   0   0   0   0   1   2   1

0   0   0   0   0   0   0   0   0   0   0   0   0

Output

1   9   9   9   9   9   9   9   9   9   1   2   1

0   9   9   9   9   9   9   9   9   9   0   0   0

edited Sep 21 '17 at 21:10

answered Sep 21 '17 at 0:02

MiniMax

2,831819

'for i in *' is probably not a valid solution for 11 million files. Use xargs instead.

– RonJohn
Sep 21 '17 at 0:59

1

@RonJohn * will only be a problem if the shell uses it to execute an external command. for i in * will not be a problem on 11 million files in itself.

– Kusalananda♦
Sep 21 '17 at 6:15

But the '*' will expand in the bash command buffer, certainly overflowing it.

– RonJohn
Sep 22 '17 at 15:40

@RonJohn Here is the answer to your question.

– MiniMax
Sep 22 '17 at 17:22

add a comment |

First variant:

For single file:

datamash -W transpose < input.txt | sed 's/0t0/9t9/' | datamash transpose

For many files do the same in the loop:

for i in *; do datamash -W transpose < "$i" |

sed 's/0t0/9t9/' |

datamash transpose > "new_$i"; done

This loop will create the new, changed file for the each file, with the prefix "new_" added. Then you can remove all old files and remove prefix "new_" from filenames.

Second variant:

This is a solution for the single file, for multiple files use loop, as in the previous variant.

tr 'n' 't' < input.txt |

awk '{

    num = NF / 2;

    for(up = 1; up <= NF; up++) {

        if(up <= num) {

            low = num + up;

            if(!$up && !$low) {

                $up = 9;    

                $low = 9;

            }

        }



        printf "%st", $up;



        if(up % num == 0) 

            print "";

    }

}'

Explanation

tr '\n' '\t' < input.txt - joins the two lines into one.

awk
- checks each element of the first line together with the element in the same column of the second line: fields 1 and 316, 2 and 317, 3 and 318, and so on.
- if both elements are 0, changes them to 9.
- prints the fields in order: 1, 2, 3, 4 ... 628, 629, 630.
- each time the field number is a multiple of the number of elements per line, starts a new line.

Input

1   0   0   0   0   0   0   0   0   0   1   2   1
0   0   0   0   0   0   0   0   0   0   0   0   0

Output

1   9   9   9   9   9   9   9   9   9   1   2   1
0   9   9   9   9   9   9   9   9   9   0   0   0
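To make the pairing rule concrete, the sketch below applies it to a small two-line input without joining the lines first. It is an alternative formulation for illustration, not the answer's code: the function name zeros_to_nines is made up, and it assumes exactly two whitespace-separated rows of equal length holding numeric values.

```shell
#!/bin/sh
# zeros_to_nines FILE: print FILE's two rows, tab-separated, with every
# column that is 0 in both rows changed to 9 in both rows.
zeros_to_nines() {
    awk '
    NR == 1 { n = NF; for (i = 1; i <= n; i++) top[i] = $i; next }
    NR == 2 {
        for (i = 1; i <= n; i++) {
            bot[i] = $i
            # "+ 0" forces a numeric comparison
            if (top[i] + 0 == 0 && bot[i] + 0 == 0) { top[i] = 9; bot[i] = 9 }
        }
        for (i = 1; i <= n; i++) printf "%s%s", top[i], (i < n ? "\t" : "\n")
        for (i = 1; i <= n; i++) printf "%s%s", bot[i], (i < n ? "\t" : "\n")
    }' "$1"
}
```

Holding the first row in an array keeps the column pairing explicit, at the cost of a slightly longer script than the tr-based pipeline.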

edited Sep 21 '17 at 21:10

answered Sep 21 '17 at 0:02

MiniMax



#!/bin/bash

# compare two rows in a file
# when both are 0, change both to 9
# otherwise keep original value

ProgName=${0##*/}
Pid=$$
DBG_FNAME=""
scriptUsage() {
cat <<ENDUSE

  $ProgName </path/to/directory> [ [-d|--debug] || [-f|--filename] ]

  path/to/directory:    Path to directory (NO trailing '/')
  -f|--filename:        Print each file name to stdout when it is complete
  -d|--debug:           Run in debug mode (Implies filename option - SEE NOTE*)
  -h|--help:            Print this help message

  NOTE:  USING [-d|--debug] AUTOMATICALLY SETS [-f|--filename]
         You DO NOT need both together!

ENDUSE
}

# check args
#!# NOTE: you can delete from here to #!!# above 'WorkDir="$1"'
[[ -z $1 ]] && { >&2 echo "MISSING file source directory!"; scriptUsage; exit 1; }
[[ $1 == "-h" || $1 == "--help" ]] && { scriptUsage; exit 0; }
[[ -d $1 ]] || { >&2 echo "Unable to locate directory [$1]"; exit 1; }
if (( $# > 2 ))
  then
    DBG_FNAME=1
    >&2 echo "Running in debug mode from using ${2} & ${3} together!"
    echo "PID is: $Pid"
    sleep 2
    set -x
  else
    [[ $2 == "-f" || $2 == "--filename" ]] && DBG_FNAME=1
    # debug implies the filename option, as the usage note says
    [[ $2 == "-d" || $2 == "--debug" ]] && { DBG_FNAME=1; echo "PID is: $Pid"; set -x; }
fi
#!!# to here #!!#
# directory as arg[1] or change to hardcoded
  WorkDir="$1"

# check for/remove trailing slash
[[ ${WorkDir:(-1)} == / ]] && WorkDir=${WorkDir:0:((${#WorkDir}-1))}

# given file root withOUT number ending
  WorkFile="${WorkDir}/wa_filtering_DP15_good_pops_snps_file_"


##== MAIN LOOP
for file in "${WorkFile}"*
  do
    # reset these after each file
    TopRow=""
    BotRow=""
    NewTop=""
    NewBot=""
    SKIPME=""

    # get top row of file
    TopRow=$(sed -n '1{p;q}' "$file")
    # get bottom row of file
    BotRow=$(sed -n '2{p;q}' "$file")

    ##-- EACH FILE LOOP
    # walks the rows one character at a time, so it expects
    # single-digit values separated by a single character
    for (( f=0; f<${#TopRow}; f++ ))
      do
        if [[ -n $SKIPME ]]
          then
            # SKIPME is -z by default so
            # this runs every other time through
            NewTop="${NewTop} "
            NewBot="${NewBot} "
            SKIPME=""
        elif (( $((${TopRow:${f}:1}+${BotRow:${f}:1})) == 0 ))
          then
            # 0+0=0 so change to 9
            NewTop="${NewTop}9"
            NewBot="${NewBot}9"
            SKIPME=1
        else
            # (1+0 or 0+1)!=0 so keep originals
            NewTop="${NewTop}${TopRow:${f}:1}"
            NewBot="${NewBot}${BotRow:${f}:1}"
            SKIPME=1
        fi
    done
    ##--

    # overwrite original file
    printf "%s\n%s" "$NewTop" "$NewBot" > "$file"

    # if -f|--filename given print file name
    [[ -n $DBG_FNAME ]] && echo "$file is complete"
done
##==

Note: this DOES EDIT FILES IN PLACE. It wouldn't be hard to have it make backups as it runs. It returns the files exactly as requested above.
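The backup step mentioned here could look roughly like the following sketch. It is not part of the posted script: write_with_backup and the .bak suffix are illustrative choices, and it assumes the two new rows are already held in variables, as NewTop and NewBot are in the script.

```shell
#!/bin/bash
# write_with_backup FILE TOP BOTTOM
# Copy FILE to FILE.bak, then overwrite FILE with the two rows.
# The write is skipped if the backup copy fails.
write_with_backup() {
    local file="$1" top="$2" bot="$3"
    cp -p -- "$file" "${file}.bak" || return 1
    printf '%s\n%s' "$top" "$bot" > "$file"
}
```

Calling it in place of the script's final printf line, for example write_with_backup "$file" "$NewTop" "$NewBot", would leave a .bak copy next to each processed file.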

edited 6 hours ago

Rui F Ribeiro


answered Oct 1 '17 at 2:47

EnterUserNameHere

