Is there an alternative to sed that supports unicode?
For example:
sed 's/\u0091//g' file1
Right now, I have to run hexdump to get the hex bytes and put them into sed, as follows:
$ echo -ne '\u9991' | hexdump -C
00000000 e9 a6 91 |...|
00000003
And then:
$ sed 's/\xe9\xa6\x91//g' file1
sed unicode hexdump
edited Apr 17 '15 at 18:03 by chaos
asked Apr 17 '15 at 8:38 by A-letubby
6 Answers
Just use that syntax:
sed 's/馑//g' file1
Or in the escaped form:
sed "s/$(echo -ne '\u9991')//g" file1
(Note that older versions of Bash and some other shells do not understand echo -e '\u9991', so check first.)
edited Oct 17 '16 at 16:52 by Flimm
answered Apr 17 '15 at 8:46 by chaos
Does sed count 馑 as one character or 3? That is, does echo 馑 | sed s/...// print anything?
– immibis
Apr 17 '15 at 11:22
@immibis Since sed has the g modifier, it replaces all occurrences, even when they follow each other. Also, sed should count it as one character: echo -ne "馑" | wc -m gives 1. If you count the bytes (wc -c), it returns 3. Did I understand your question correctly?
– chaos
Apr 17 '15 at 11:28
I meant: does . mean "one character" or "one byte"?
– immibis
Apr 17 '15 at 11:30
@immibis It matches one character, hence echo 馑 | sed s/...// gives me 馑 (nothing is replaced).
– chaos
Apr 17 '15 at 11:33
@chaos: It works under en_US.UTF-8, but doesn't under C.
– choroba
Apr 17 '15 at 12:28
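The locale point raised in the last comment can be checked directly. The sketch below is an illustration, not part of the answer above: it builds 馑 from its three UTF-8 bytes with printf octal escapes (so the shell needs no \u support) and runs sed under LC_ALL=C, where sed is assumed to treat every byte as one character. The variable and function names are made up for this example.

```shell
#!/usr/bin/env bash
# U+9991 (馑) spelled as its three UTF-8 bytes in octal: e9 a6 91.
jin=$(printf '\351\246\221')

# Count the bytes of the character; wc -c always counts bytes.
byte_count() {
  printf '%s' "$1" | wc -c
}

# Run the s/...// test from the comments with sed forced into the C locale.
# Assumption: in the C locale, sed's "." matches a single byte.
three_dots_c_locale() {
  printf '%s\n' "$1" | LC_ALL=C sed 's/...//'
}

byte_count "$jin"
three_dots_c_locale "$jin"
```

Under LC_ALL=C the three dots should consume all three bytes and leave an empty line, whereas under a UTF-8 locale such as en_US.UTF-8 the same substitution leaves 馑 untouched, as reported in the comments.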
Perl can do that:
echo 汉典“馑”字的基本解释 | perl -CS -pe 's/\N{U+9991}/Jin/g'
-CS turns on UTF-8 for standard input, output and error.
answered Apr 17 '15 at 8:50 by choroba
Perl can do almost anything.....
– wobbily_col
Apr 17 '15 at 10:49
A number of versions of sed support Unicode:
- Heirloom sed, which is based on "original Unix material".
- GNU sed, which is its own codebase.
- Plan 9 sed, which has been ported to Unix-like operating systems.
I couldn't find information on BSD sed, which I thought was strange, but I think the odds are good that it supports Unicode too. Unfortunately, there is no standard way to tell sed which encoding to use, so each one does this in its own way.
answered Apr 17 '15 at 12:54 by The Spooniest
Do they support UTF-16 with and without BOM?
– Bon Ami
Apr 17 '15 at 17:12
UTF-16 is pretty unusable in Unix-based OSes. It's also an abomination that should have never seen the light of day.
– Brian Bi
Apr 17 '15 at 19:11
Whether or not they support UTF-16 depends on the implementation, and I'm afraid I don't have that data. I doubt that Plan 9 sed does (the original OS is UTF-8 everywhere), but I can't be sure, and even if it doesn't, the others might.
– The Spooniest
Apr 17 '15 at 19:30
This works for me:
$ vim -nEs +'%s/\%u9991//g' +wq file1
It’s a drop more verbose than I’d like; here’s a full explanation:
-n: disable the vim swap file
-E: improved Ex mode
-s: silent mode
+'%s/\%u9991//g': execute the substitution command
+wq: save and exit
answered Apr 17 '18 at 18:21 by Aryeh Leib Taurog
I suppose this modifies file1 in-place, is that correct?
– gerrit
Jan 10 at 10:32
@gerrit that’s correct, and thanks for pointing it out.
– Aryeh Leib Taurog
Jan 10 at 19:21
Works for me with GNU sed (version 4.2.1):
$ echo -ne $'\u9991' | sed 's/\xe9\xa6\x91//g' | hexdump -C
$ echo -ne $'\u9991' | hexdump -C
00000000 e9 a6 91
(As another replacement for sed you could also use GNU awk, but it doesn't seem necessary.)
answered Apr 17 '15 at 10:16 by Janis
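The same byte-level substitution can be written without relying on sed's \x escapes, which are a GNU extension. A minimal sketch, assuming a POSIX printf and od; the helper names are illustrative only and do not come from the answer above:

```shell
# Delete every occurrence of a literal byte sequence ($1) from standard input.
# LC_ALL=C makes sed compare raw bytes. $1 must not contain "/" or regex
# metacharacters, since it is pasted into the pattern unescaped.
strip_bytes() {
  LC_ALL=C sed "s/$1//g"
}

# Print standard input as one run of hex digits (od stands in for hexdump).
hex_of() {
  od -An -tx1 | tr -d ' \n'
}

jin=$(printf '\351\246\221')          # UTF-8 bytes of U+9991 in octal
printf '%s' "$jin" | hex_of; echo     # hex bytes of the character itself
printf 'a%sb\n' "$jin" | strip_bytes "$jin"
```

Because the shell's printf produces the bytes, the sed script itself contains no escape sequences, so this form should also work with sed implementations that do not understand \xHH.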
With recent versions of Bash, just omit the quotes around the sed expression and you can use Bash's escaped strings ($'...'). Spaces within the sed expression, or parts of it that Bash might interpret as wildcards, can be quoted individually.
$ echo "饥馑荐臻" | sed s/$'\u9991'//g
饥荐臻
answered 36 mins ago by Dave Rove