Unix file with newlines within quotes
I've got a weird CSV file with quotes within quotes, newlines and whatnot in a single column. I need to treat that column, embedded newlines included, as one field and replace the newlines with some delimiter.
I have 3 columns. The 3rd column contains HTML text full of double quotes and all sorts of special characters, but the double quotes are escaped with double quotes, like "<This ""is"" string>".
Input:
ID, Name, text
"1","abc","Line 1"
"2","def","Line2
""line2"",line2"
"3","ghi","line3"
Output:
ID, Name, text
"1","abc","Line 1"
"2","def","Line2 ""line2"",line2"
"3","ghi","line3"
bash text-processing csv
asked yesterday by Kumar; edited yesterday by fra-san
3 Answers
There's no real issue with your file. It has embedded newlines and double quotes, and a CSV parser would handle it properly. Escaping a double quote with another double quote (while double-quoting the whole field) is the proper way to escape embedded double quotes in a CSV file.
To replace the embedded newlines in your CSV file with a @ character, you could do this:
$ csvformat -M '@' file.csv | tr '\n@' '@\n'
1,abc,Line 1
2,def,"Line2@""line2"",line2"
3,ghi,line3
This uses csvformat from the csvkit toolbox. It's a proper CSV parser that is able to reformat CSV files.
The pipeline above first has csvformat write every newline that is not embedded in a field (i.e. every record terminator) as a @ character instead. Then I use tr to swap the remaining newlines and the @ characters with each other, ending up with a CSV file whose embedded newlines are @.
This relies on the fact that the original data in the file contains no @ characters.
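To see what the tr step does on its own, the sketch below feeds it a hand-written string standing in for the output of csvformat -M '@' on the sample data. That string is an assumption (modelled on the output shown above, so it has no header line): record terminators are @, while the newline embedded in the third field is still a real newline. The variable names are illustrative only.

```shell
#!/bin/sh
# Hand-written stand-in (an assumption) for what `csvformat -M '@' file.csv`
# prints for the sample data: each record ends in '@', while the newline
# embedded in field 3 of record 2 is still a real newline.
csv_with_at_terminators='1,abc,Line 1@2,def,"Line2
""line2"",line2"@3,ghi,line3@'

# Swap the two characters: the embedded newline becomes '@' and every
# '@' record terminator becomes a newline.
swapped=$(printf '%s' "$csv_with_at_terminators" | tr '\n@' '@\n')

# Same idea, but the embedded newline becomes a space instead of '@'.
spaced=$(printf '%s' "$csv_with_at_terminators" | tr '\n@' ' \n')

# $(...) strips the trailing newline left by the final '@', so add one back.
printf '%s\n' "$swapped"
printf '%s\n' "$spaced"
```

Because tr works character by character, it never needs to understand CSV; all the parsing is left to csvformat, which is why the @ must not already occur in the data.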
If you would rather have spaces than a marker of where the newlines originally were, use tr '\n@' ' \n' instead of the tr shown above:
$ csvformat -M '@' file.csv | tr '\n@' ' \n'
1,abc,Line 1
2,def,"Line2 ""line2"",line2"
3,ghi,line3
Note that this would make it very difficult to re-insert the original newlines if there are other spaces in the data, such as the one in the third field of the first line.
If you would prefer csvformat not to remove the unnecessary double quotes, use it with -U 1 (quote all fields):
$ csvformat -U 1 -M '@' file.csv | tr '\n@' ' \n'
"1","abc","Line 1"
"2","def","Line2 ""line2"",line2"
"3","ghi","line3"
answered yesterday by Kusalananda (edited yesterday)
You can try it with sed:
sed '
:A
2,$ {
/[^"]"$/! {
N
bA
}
s/\n//g
}
' infile
For each line from line 2 to the end, check whether the last character is a closing " (one not preceded by another ").
If not, append the next line with N and restart the loop.
At the end of the loop, remove each "\n".
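The loop above deletes the embedded newlines outright (s/\n//g), so "Line2" ends up glued to ""line2"" with no separator. If you want the space shown in the desired output, a minimal variant replaces each newline with a space instead. The function name join_quoted_lines is hypothetical; it only wraps the script so it can read either files or standard input.

```shell
#!/bin/sh
# join_quoted_lines (hypothetical name): the sed loop above with s/\n//g
# changed to s/\n/ /g, so each embedded newline becomes a space.
# Reads the files named as arguments, or standard input if there are none.
join_quoted_lines() {
    sed '
:A
2,$ {
/[^"]"$/! {
N
bA
}
s/\n/ /g
}
' "$@"
}
```

Usage: join_quoted_lines infile > outfile. Note that the end-of-line test only looks at the last two characters, so an embedded line that happens to end in a closing-looking quote would stop the loop early.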
answered yesterday by ctac_ (edited yesterday)
You can do this with GNU sed, making use of its extended regex support, as shown:
Command-line:
$ sed -Ee '
1b
/^("[^"]*"[^"]*)*$/!{
N;s/\n/ /;s/^/\n/;D
}
' input.csv
Results:
ID,Name,Text
"1","abc","Line 1"
"2","def","Line2 ""line2"",line2"
"3","ghi","line3"
Explanation:
- -E turns on extended regex mode.
- 1b passes the header line to stdout as is.
- /^("[^"]*"[^"]*)*$/ matches a line that is fully balanced with respect to double quotes. Hence, when we negate it, we get the unbalanced lines, i.e. those whose closing double quote we need to seek in the succeeding line(s).
- For such a line, we read in the next line and append it to the pattern space with N, then replace the embedded newline with a space. Prefixing a newline and running D restarts the cycle on the joined text without reading new input, so we repeat this process until the pattern space is balanced.
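The same balanced-quote idea can also be expressed in awk by counting double quotes instead of matching a regex; this is a swapped-in technique, not part of the answer above. A record is complete once it has seen an even number of " characters, since each escaped "" adds two and never changes the parity. The sketch assumes, as in the sample, that every field containing a quote or newline is itself quoted; the function name join_unbalanced and the awk variable sep are hypothetical. Unlike the sed versions it needs no special case for the header, because a line with no quotes is already balanced.

```shell
#!/bin/sh
# join_unbalanced (hypothetical name): joins physical lines into CSV records
# by counting double quotes. A record is complete once it has seen an even
# number of " characters; an escaped "" adds two, so it never changes parity.
# The awk variable sep is what replaces each embedded newline.
join_unbalanced() {
    awk -v sep=' ' '
    {
        # Append this physical line to any pending, still-open record.
        rec = (pending ? rec sep : "") $0
        # gsub with "&" leaves the line unchanged but returns the match count.
        quotes += gsub(/"/, "&")
        if (quotes % 2 == 0) {
            print rec          # quotes balanced: the record is complete
            quotes = 0
            pending = 0
        } else {
            pending = 1        # still inside a quoted field: keep reading
        }
    }
    # Emit a final record whose closing quote never arrived, unchanged.
    END { if (pending) print rec }
    ' "$@"
}
```

Usage: join_unbalanced input.csv > output.csv. Counting quotes keeps the whole check to one integer per record, at the cost of trusting that the file is well-formed CSV.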
With POSIX sed you would need to change the above somewhat:
$ sed -e '
1b
/^\("[^"]*"[^"]*\)*$/b
N;s/\n/ /;H;s/.*//;x;D
' input.csv
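For reference, here is the POSIX version wrapped in a hypothetical shell function (join_records_posix), written with the backslashes a basic regex needs for grouping, \( \), and for matching the embedded newline, \n. The H;s/.*//;x sequence stands in for GNU's s/^/\n/, presumably because a \n in the replacement part of s is not portable: H appends the joined line to the empty hold space after a newline, s/.*// empties the pattern space, and x swaps the two, so the pattern space becomes a newline followed by the joined line. D then strips that leading newline and restarts the cycle without reading new input, and the swap leaves the hold space empty for the next join.

```shell
#!/bin/sh
# join_records_posix (hypothetical name): the POSIX sed script, step by step:
#   1b                         print the header line untouched
#   /^\("[^"]*"[^"]*\)*$/b     balanced line: print it and end the cycle
#   N;s/\n/ /                  otherwise append the next line, newline -> space
#   H;s/.*//;x                 pattern space becomes newline + joined line
#   D                          drop that leading newline and restart the
#                              cycle on the joined line without new input
join_records_posix() {
    sed -e '
1b
/^\("[^"]*"[^"]*\)*$/b
N;s/\n/ /;H;s/.*//;x;D
' "$@"
}
```

Usage: join_records_posix input.csv > output.csv.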
answered 16 hours ago by Rakesh Sharma (edited 16 hours ago)