File Splitting Based on Column Name With `awk` Fails


I've tried:

  1. awk '{if (last != $1) close(last); print > $1; last = $1}' file

  2. awk -F$'\t' '{ print > ($1) }' file

  3. awk '{if (last != $1) close(last); print >> $1; last = $1}' file

to split a very large text file (33 GB) into multiple files named by the first column.

For smaller files everything works fine, but for the large file awk either stops near the end of a column type (commands 1 and 2) or fails to write newline characters for columns that have "." in them (command 3).

Example: it just stops before reaching the real end of the column of type "10":

10      69331427        1
10      69331428        1
10      69331429        1
10      69331430        1
10      69331431        1
10

EDIT: Closing the file seems to help:

'{print >> $1; close($1)}'

GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
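For context, a variant often suggested for this kind of split (it is not one of the commands above) is to sort the input on the first column first, so each output file is opened and closed exactly once regardless of how many distinct keys there are. A minimal sketch on a tiny tab-separated sample; the filename sample.txt and the sample data are illustrative, not from the original post:

```shell
# Create a small tab-separated sample (a stand-in for the 33 GB file).
printf '10\t69331427\t1\nX\t1\t2\n10\t69331428\t1\n' > sample.txt

# Sort so identical keys are adjacent, then write each group out,
# closing every output file as soon as its key changes.
sort -k1,1 sample.txt | awk '
    $1 != last { if (last != "") close(last); last = $1 }
    { print > $1 }
    END { if (last != "") close(last) }'
```

This keeps at most one output file open at a time, which sidesteps descriptor limits in any awk, at the cost of a sort pass over the input.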










  • Fixed the first one. Closing the file does seem to help. All three commands are accepted answers in different questions about the same problem. That's why I found it odd that it creates 99% of the files properly and then just cuts off at the end.

    – osowiecki
    7 hours ago

  • Btw, if you use awk '{print >> $1; close($1)}' testing, with testing containing AAEX03026070.1 1676 0 and AAEX03026070.1 1677 0 (tab separated), it will look different in Midnight Commander (mc) view mode (F3) in raw and parsed mode. I believe this might be a bug. The file has normal "\n" characters, but mc ignores them for some reason. This happens only when the first column is not a number and has a "." in it.

    – osowiecki
    7 hours ago

  • idk what "midnight commander" is, but if it's some kind of text editor, and if by "it" in your statement "it will look different..." you mean the input file, maybe that's an indication that your input file contains undesirable control characters?

    – Ed Morton
    4 hours ago

  • So it sounds like you have 2 different tools (awk and mc) that are both exhibiting unexpected behavior when operating on your input file. Look to your input file....

    – Ed Morton
    4 hours ago

  • As for the strange behavior of mc that pushed me to make this poor thread, look at this: 1. Parsed text: postimg.cc/bDZxBMfd 2. Same file in raw mode: postimg.cc/RJfcMHFB 3. File content in od: postimg.cc/87ThSsg2 Strange, huh? Lines not starting with "." behave normally, like (2).

    – osowiecki
    3 hours ago
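Following the suggestion in the comments to inspect the input for stray control characters, here is one way to do that with GNU cat and od. The file name testing and its single line are made up for illustration:

```shell
# A clean sample line, tab separated (like the data in the question).
printf 'AAEX03026070.1\t1676\t0\n' > testing

# GNU cat -A shows tabs as ^I and line ends as $; any other ^X or M-
# sequences in the output would reveal stray control characters.
cat -A testing

# od -c dumps every byte, making NULs (\0) and CRs (\r) visible.
od -c testing
```

On a clean file the cat -A output for this line is AAEX03026070.1^I1676^I0$; anything beyond ^I and $ would point at control characters in the input.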




















linux bash awk






edited 3 hours ago by Vasconcelos1914

asked 8 hours ago by osowiecki


















1 Answer
To "split a very large text file (33GB) into multiple files named by first column" using GNU awk on any UNIX box, all you need is this:

awk '{print > $1}' file

That's all. If you're running into problems, then it's something outside of your awk command that's causing it, e.g. maybe you're running out of space on your drive, or maybe your input file contains some weird control characters.

I don't know what you mean by "awk stops near the end of column type", nor "forgets to input newline characters for columns that have '.' in them", nor "it just stops before reaching real end of column of type '10'". That may partially be because there's nothing in your question to indicate what "column type" means to you.
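A quick way to see the answer's one-liner in action on a tiny sample (the sample data and filenames 10 and 11 are made up here; each output file is named after the first field of its lines):

```shell
# Small tab-separated sample: the first column is the destination filename.
printf '10\t69331427\t1\n10\t69331428\t1\n11\t5\t2\n' > file

# The answer's one-liner: every line is appended to a file named after $1.
awk '{print > $1}' file
```

After this runs, the file named 10 holds the two lines whose key is 10, and the file named 11 holds the remaining line, with tabs and newlines intact.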






  • For a very large file with thousands of unique values in the first column, this may cause awk to run out of available file descriptors. This is probably why they try to deal with that by explicitly closing the file when needed.

    – Kusalananda
    4 hours ago

  • @Kusalananda No, it won't. Gawk handles that internally, closing and re-opening files as necessary. It WOULD be a problem with some other awk, but the OP is using gawk 4.1.4.

    – Ed Morton
    4 hours ago

  • You appear to be correct. So it may be an unnecessary precaution on the user's side. However, I also read that "gawk's ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use close() on your files when you are done with them." (from the GNU awk manual) This may not be an issue on Linux, though.

    – Kusalananda
    4 hours ago

  • @EdMorton then you should write gawk instead of awk in your answer and mention that. But even with gawk, that is horribly dangerous, 1st because $1 may contain ../.., etc., and 2nd because gawk will accept NUL bytes in its input (which are invisible on a terminal) but silently ignore & truncate them when using them in a filename: printf 'foo\0bar baz' | gawk '{print $1 > $1}'.

    – mosvy
    4 hours ago

  • @mosvy no, there's no need to write gawk when you specifically state "using GNU awk...", and no, it's not horribly dangerous - you don't need to protect against /s and NULs and other things when the posted sample input simply has all integers in that field.

    – Ed Morton
    4 hours ago
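To gauge whether the descriptor concern raised above applies in a given case (it matters for awks that don't recycle descriptors internally), one can compare the number of distinct first-column keys against the per-process open-file limit. The sample data here is made up; on the real input, replace the printf with the actual file:

```shell
# Sample input (stand-in for the real data file).
printf '10\ta\n11\tb\n10\tc\n' > file

# Number of distinct first-column values = number of output files the
# split will create.
keys=$(cut -f1 file | sort -u | wc -l)

# Per-process open-file limit; if keys approaches this, a non-GNU awk
# would need explicit close() calls to avoid exhausting descriptors.
echo "distinct keys: $keys (fd limit: $(ulimit -n))"
```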














edited 4 hours ago

answered 4 hours ago by Ed Morton
















  • For a very large file with thousands of unique values in the first column, this may cause awk to run out of available file descriptors. This is probably why they try to deal with that by explicitly closing the file when needed.

    – Kusalananda
    4 hours ago











  • @Kusalananda No, it won't. Gawk handles that internally, closing and re-opening files as necessary. It WOULD be a problem with some other awk but the OP is using gawk 4.1.4.

    – Ed Morton
    4 hours ago













  • You appear to be correct. So it may be an unnecessary precaution on the user's side. However I also read that "gawk’s ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use close() on your files when you are done with them." (from here) This may not be an issue on Linux though.

    – Kusalananda
    4 hours ago













  • @EdMorton then you should write gawk instead of awk in your answer and mention that. But even with gawk, that is horribly dangerous, 1st because $1 may contain ../.., etc, and 2nd because gawk will accept NUL bytes in its input (which are invisible on terminal), but silently ignore & truncate them when using them in a filename printf 'foobar baz' | gawk '{print$1>$1}'.

    – mosvy
    4 hours ago













  • @mosvy no, there's no need to write gawk when you specifically state "using GNU awk..." and no it's not horribly dangerous - you don't need to protect against /s and NULs and other things when the posted sample input simply has all integers in that field.

    – Ed Morton
    4 hours ago



















  • For a very large file with thousands of unique values in the first column, this may cause awk to run out of available file descriptors. This is probably why they try to deal with that by explicitly closing the file when needed.

    – Kusalananda
    4 hours ago











  • @Kusalananda No, it won't. Gawk handles that internally, closing and re-opening files as necessary. It WOULD be a problem with some other awk but the OP is using gawk 4.1.4.

    – Ed Morton
    4 hours ago













  • You appear to be correct. So it may be an unnecessary precaution on the user's side. However I also read that "gawk’s ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use close() on your files when you are done with them." (from here) This may not be an issue on Linux though.

    – Kusalananda
    4 hours ago













  • @EdMorton then you should write gawk instead of awk in your answer and mention that. But even with gawk, that is horribly dangerous, 1st because $1 may contain ../.., etc, and 2nd because gawk will accept NUL bytes in its input (which are invisible on terminal), but silently ignore & truncate them when using them in a filename printf 'foobar baz' | gawk '{print$1>$1}'.

    – mosvy
    4 hours ago













  • @mosvy no, there's no need to write gawk when you specifically state "using GNU awk..." and no it's not horribly dangerous - you don't need to protect against /s and NULs and other things when the posted sample input simply has all integers in that field.

    – Ed Morton
    4 hours ago

















For a very large file with thousands of unique values in the first column, this may cause awk to run out of available file descriptors. This is probably why they try to deal with that by explicitly closing the file when needed.

– Kusalananda
4 hours ago





osowiecki is a new contributor. Be nice, and check out our Code of Conduct.
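On mosvy's point about untrusted field values: one hedged way to guard a `print > $1`-style split is to only derive a filename from the field when it matches the expected shape (all integers, per the question's sample input), and divert everything else. The filenames and quarantine file below are illustrative assumptions, not part of the original answer:

```shell
# Hypothetical input containing one well-formed key and one hostile one.
cd "$(mktemp -d)"
printf '%s\n' '1 ok' '../../etc x' > data.txt

# Only use $1 as a filename if it is a plain integer; anything else
# (path traversal, embedded separators, etc.) goes to a quarantine file.
awk '{
  if ($1 ~ /^[0-9]+$/) f = $1 ".txt"; else f = "suspect.txt"
  print >> f; close(f)
}' data.txt
```

This does not address the NUL-byte truncation gawk performs on filenames, but it keeps attacker-shaped keys like `../..` from ever reaching the filesystem as paths.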









