File splitting Based on Column Name With `awk` Fails
I've tried:

awk '{if (last != $1) close(last); print > $1; last = $1}' file
awk -F$'\t' '{ print > ($1) }' file
awk '{if (last != $1) close(last); print >> $1; last = $1}' file

to split a very large (33 GB) text file into multiple files named after the first column. For smaller files everything works fine, but for large files awk either stops near the end of a column's run of values (commands 1 and 2) or omits newline characters for rows whose first column contains a "." (command 3).

Example: it just stops before reaching the real end of the rows whose first column is "10":

10 69331427 1
10 69331428 1
10 69331429 1
10 69331430 1
10 69331431 1
10

EDIT: Closing the file seems to help:

'{print >> $1; close($1)}'

GNU Awk 4.1.4, API: 1.1 (GNU MPFR 4.0.1, GNU MP 6.1.2)
linux bash awk
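For reference, when rows sharing a first-column value are contiguous (the sample data above is sorted on column 1), a variant that closes each output only when the key changes avoids a close()/reopen on every single line. A minimal sketch, assuming grouped input; the demo data and scratch directory are illustrative, not from the question:

```shell
# Demo in a scratch directory: split 'file' on the first column, closing
# each output file as soon as the key changes. Assumes lines with the same
# first field are contiguous (e.g. the file is sorted on column 1).
cd "$(mktemp -d)"
printf '1 a\n1 b\n2 c\n' > file
awk 'last != "" && $1 != last { close(last) }
     { print >> $1; last = $1 }' file
```

Note the `>>`: after a close(), a later `>` would truncate the file if the same key ever reappeared, so append is the safe choice here.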
Fixed the first one. Closing the file does seem to help. All three commands are accepted answers in different questions about the same problem. That's why I found it odd that it creates 99% proper files and then just cuts off at the end.
– osowiecki
7 hours ago
Btw, if you use awk '{print >> $1; close($1)}' testing with "testing" containing "AAEX03026070.1 1676 0" and "AAEX03026070.1 1677 0" (tab separated), it will look different in midnight commander (mc) view mode (F3) in raw and parsed mode. I believe this might be a bug. The file has normal "\n" characters but mc ignores them for some reason. This happens only when the first column is not a number and contains a ".".
– osowiecki
7 hours ago
idk what "midnight commander" is, but if it's some kind of text editor, and if by "it" in your statement "it will look different..."
you mean the input file, maybe that's an indication that your input file contains undesirable control characters?
– Ed Morton
4 hours ago
So it sounds like you have 2 different tools (awk and mc) that are both exhibiting unexpected behavior when operating on your input file. Look to your input file....
– Ed Morton
4 hours ago
As for the strange behavior of mc that pushed me to make this poor thread, look at this: 1. Parsed text postimg.cc/bDZxBMfd 2. Same file in raw mode postimg.cc/RJfcMHFB 3. File content in od postimg.cc/87ThSsg2 Strange, huh? Lines not starting with "." behave normally, like (2).
– osowiecki
3 hours ago
edited 3 hours ago by Vasconcelos1914
asked 8 hours ago by osowiecki
1 Answer
All you need to "split a very large text file (33GB) into multiple files named by first column" using GNU awk on any UNIX box is this:

awk '{print > $1}' file

That's all. If you're running into problems, then something outside of your awk command is causing it, e.g. maybe you're running out of space on your drive, or maybe your input file contains some weird control characters.
I don't know what you mean by "awk stops near the end of column type", nor "forgets to input newline characters for columns that have '.' in them", nor "it just stops before reaching real end of column of type '10'". That may partially be because there's nothing in your question to indicate what "column type" means to you.
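One quick way to sanity-check a split like this (a sketch of my own, not part of the answer): every input line should land in exactly one output file, so the output files' combined line count must equal the input's. The sample data below is illustrative:

```shell
# Build a tiny input, split it with the one-liner from the answer, then
# compare line counts: sum over the split files == lines in the input.
cd "$(mktemp -d)"
printf '10 69331427 1\n10 69331428 1\n11 1 1\n' > file
awk '{ print > $1 }' file
cat 10 11 | wc -l    # should equal: wc -l < file
```

If the counts disagree, something (truncation, stray control characters, a full disk) mangled the split, which is exactly the kind of failure described in the question.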
For a very large file with thousands of unique values in the first column, this may cause awk to run out of available file descriptors. This is probably why they try to deal with that by explicitly closing the file when needed.
– Kusalananda♦
4 hours ago
@Kusalananda No, it won't. Gawk handles that internally, closing and re-opening files as necessary. It WOULD be a problem with some other awk, but the OP is using gawk 4.1.4.
– Ed Morton
4 hours ago
You appear to be correct. So it may be an unnecessary precaution on the user's side. However, I also read that "gawk's ability to do this depends upon the facilities of your operating system, so it may not always work. It is therefore both good practice and good portability advice to always use close() on your files when you are done with them." (from here) This may not be an issue on Linux though.
– Kusalananda♦
4 hours ago
@EdMorton then you should write gawk instead of awk in your answer and mention that. But even with gawk, that is horribly dangerous, 1st because $1 may contain ../.. etc., and 2nd because gawk will accept NUL bytes in its input (which are invisible on a terminal) but silently ignore and truncate them when using them in a filename: printf 'foo\0bar baz' | gawk '{print $1 > $1}'.
– mosvy
4 hours ago
@mosvy no, there's no need to write gawk when you specifically state "using GNU awk...", and no, it's not horribly dangerous: you don't need to protect against "/"s and NULs and other things when the posted sample input simply has all integers in that field.
– Ed Morton
4 hours ago
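For completeness, the concern about keys like ../.. can be addressed by sanitizing the field before using it as a filename. This is a hypothetical hardening sketch, not something proposed in the thread; the name variable and the character whitelist are my own choices:

```shell
# Map any character outside [A-Za-z0-9._-] to '_' and strip leading dots,
# so a key like '../evil' becomes '_evil' and cannot escape the directory.
cd "$(mktemp -d)"
printf '../evil 1\nok 2\n' > file
awk '{ name = $1
       gsub(/[^A-Za-z0-9._-]/, "_", name)
       sub(/^\.+/, "", name)
       print >> name; close(name) }' file
```

This is only worth the cost when the first column is untrusted; for data like the all-integer sample in the question it is unnecessary, as noted above.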
Thanks for contributing an answer to Unix & Linux Stack Exchange!