This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: Grepping Unicode files?
- From: Vince Rice <vrice at solidrocksystems dot com>
- To: cygwin at cygwin dot com
- Date: Thu, 14 May 2015 12:14:20 -0500
- Subject: Re: Grepping Unicode files?
- References: <3C280897-291A-4A8C-8C3F-46D1D9BEFCFE at solidrocksystems dot com> <746170827 dot 20150514185648 at yandex dot ru> <313678DD-A000-4F82-A015-836B882C09FC at solidrocksystems dot com> <5554D09B dot 3030209 at redhat dot com>
> On May 14, 2015, at 11:43 AM, Eric Blake <eblake@redhat.com> wrote:
>
> On 05/14/2015 10:32 AM, Vince Rice wrote:
>
> [...]
>>
>> Now, pardon my continued ignorance, but which of those variables needs to be set to UTF16 in order for grep to work? And I assume it (they?) should be set to en_US.UTF-16?
>
> None. UTF16 is not a valid locale. It is a valid encoding (wide
> character), but locales must operate on multi-byte sequences, not wide
> characters. So you HAVE to convert from wide character to multi-byte
> before you can do anything that requires a locale to work correctly.
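A small Python illustration of the point above (not from the thread): the same characters become different bytes in a wide-character encoding (UTF-16) and in a multi-byte encoding (UTF-8). A byte-oriented search for the ASCII/UTF-8 form of a word therefore misses it in UTF-16 data, because NUL bytes sit between the letters. The word "grep" is just an example value.

```python
# The same text, encoded two ways.
text = "grep"

utf16 = text.encode("utf-16-le")  # wide-character encoding: 2 bytes per BMP character
utf8 = text.encode("utf-8")       # multi-byte encoding: ASCII characters stay 1 byte

print(utf16)  # b'g\x00r\x00e\x00p\x00'
print(utf8)   # b'grep'

# A plain byte search for the UTF-8 pattern fails on the UTF-16 bytes,
# which is roughly what happens when grep scans a UTF-16 file.
print(utf8 in utf16)  # False
```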
Oh my, the rabbit hole gets deeper. I don't know the difference between wide character and multi-byte. A little searching appears to indicate that Unicode is a type of wide character, while multi-byte is... well, I still don't know what multi-byte is. :) But we're definitely out in the weeds of non-Cygwin-ness here, and my file is UTF16, so I can learn what multi-byte is and the difference later.
Bottom line...
>>
>> Thanks to everyone for your help. I think you've all confirmed this isn't Cygwin-specific, but I couldn't find anything even searching generically ("grep unicode" and now "grep utf16"). I did finally find an external reference to iconv, but if grep is supposed to handle this natively, I haven't been able to find much on how to do it.
>
> grep cannot handle UTF16 natively. iconv exists to do encoding
> transformations, so that the rest of the system can live in multi-byte
> world instead of worrying about wide-character encodings.
... grep can't handle UTF-16 files natively. Good to know. iconv it is.
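On the command line, the convert-then-search approach Eric describes looks like `iconv -f UTF-16 -t UTF-8 file.txt | grep pattern`. Below is a minimal Python sketch of the same idea, offered as an illustration rather than anything from the thread. The helper name `grep_utf16` is hypothetical, and the code assumes the file starts with a byte-order mark, as UTF-16 files saved on Windows typically do. Decoding the file to text plays the role of iconv, so the regular expression matches characters rather than raw UTF-16 bytes.

```python
import re


def grep_utf16(path, pattern):
    """Hypothetical helper: return the lines of a UTF-16 file that match pattern.

    Decoding with the "utf-16" codec stands in for
    `iconv -f UTF-16 -t UTF-8`; matching then happens on decoded text.
    """
    regex = re.compile(pattern)
    matches = []
    # The "utf-16" codec reads the byte-order mark, if present,
    # to decide between little- and big-endian.
    with open(path, encoding="utf-16") as f:
        for line in f:
            if regex.search(line):
                matches.append(line.rstrip("\r\n"))
    return matches
```

For example, `for line in grep_utf16("report.txt", r"error"): print(line)` would print the matching lines of a hypothetical UTF-16 file `report.txt`.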
Thanks again!
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple