This is the mail archive of the cygwin mailing list for the Cygwin project.
RE: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
- From: "Lavrentiev, Anton (NIH/NLM/NCBI) [C]" <lavr at ncbi dot nlm dot nih dot gov>
- To: "cygwin at cygwin dot com" <cygwin at cygwin dot com>
- Date: Tue, 25 Jun 2013 15:38:19 +0000
- Subject: RE: [BUG REPORT]sed -e 's/[B-D]/_/g' replaces unexpected characters
- References: <CA+nJC97He=j-O2FZ-Y2jJhYXEJn2o2EfC1wO39+2bZ=nj1f-zA at mail dot gmail dot com> <20130625152356 dot GD11958 at calimero dot vinschen dot de>
> Your locale is zh_CN.UTF-8. What you're expecting is only guaranteed
> in the C locale:
I'm not quite sure that applies here. I'm using US English Windows 7 with
LANG = 'en_US.UTF-8'
and I get the same result:
$ echo abcdeABCDE | sed -e 's/[B-D]/_/g'
ab__eA___E
BUT:
$ echo abcdeABCDE | LANG=C sed 's/[B-D]/_/g'
abcdeA___E
This is very weird, indeed.
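For anyone who needs the C-locale result regardless of the LANG setting, here is a minimal sketch of two ways to get it. It assumes a POSIX shell and sed. The variable names out_c and out_list are only for illustration. LC_ALL is used instead of LANG because it overrides LANG and the other LC_* variables for that one command.

```shell
# Force the C locale for this one sed invocation only.
# In the C locale the range B-D covers exactly B, C and D.
out_c=$(echo abcdeABCDE | LC_ALL=C sed -e 's/[B-D]/_/g')
echo "$out_c"      # prints abcdeA___E

# Alternative: list the characters explicitly, so no range
# (and therefore no collation order) is involved at all.
out_list=$(echo abcdeABCDE | sed -e 's/[BCD]/_/g')
echo "$out_list"   # prints abcdeA___E
```

The explicit list does not depend on the locale, so it should behave the same on Cygwin and Linux.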
OTOH, on Linux I have the same LANG setting, yet there it works
as expected:
> echo $LANG
en_US.UTF-8
> echo abcdeABCDE | sed -e 's/[B-D]/_/g'
abcdeA___E
I believe that the en_US UTF-8 representation of "abcdeABCDE" is
byte-for-byte identical to its ASCII representation.
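A quick way to check the claim about the byte representation, as a sketch assuming the POSIX od and tr utilities are available (the variable name bytes is only for illustration):

```shell
# Dump the test string as hex bytes and strip the whitespace od adds.
# Every value is below 0x80, so the UTF-8 and ASCII encodings coincide.
bytes=$(printf 'abcdeABCDE' | od -An -tx1 | tr -d ' \n')
echo "$bytes"   # prints 61626364654142434445
```

This only shows that the input bytes are plain ASCII. It says nothing about how the range B-D is interpreted, which is where the locale comes in.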
Anton Lavrentiev
Contractor NIH/NLM/NCBI