This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: regex library fails git tests


On 07/22/2013 02:12 AM, Corinna Vinschen wrote:

>>> However, please note that this behaviour, while being provided by glibc
>>> and now by Cygwin, is *not* standards-compliant.  In the narrow sense
>>> the characters beyond 0x7f are still invalid ASCII chars, and other
>>> functions working with wchar_t strings won't be as forgiving when using
>>> invalid input.
>>>

> After some sleep, I think I now understand why the glibc devs made
> regcomp work this way.  This behaviour is backward compatible with
> non-locale-aware applications.  In the "C" locale, a char is just some
> arbitrary byte between 0 and 255.  So this pattern always worked before
> in the "C" locale, therefore it makes sense that it continues to work,
> even if it won't when using other locales/codesets.
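
For anyone who wants to see what their own libc does here, a minimal
sketch follows.  It assumes a POSIX <regex.h>; the helper names
try_match() and high_byte_probe() and the sample bytes are made up for
illustration and are not taken from the git test suite.  The probe
compiles a bracket expression covering bytes 0xc0 through 0xff in the
"C" locale and matches it against a string containing the byte 0xe9.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if subject matches pattern,
 * 0 if regexec() reports anything other than a match, and
 * -1 if regcomp() rejects the pattern outright. */
int try_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Probe the behaviour discussed above: a range of bytes beyond 0x7f,
 * used in the "C" locale.  The result is implementation-dependent,
 * so no particular return value is promised here. */
int high_byte_probe(void)
{
    setlocale(LC_ALL, "C");
    return try_match("[\xc0-\xff]", "caf\xe9");
}
```

An 8-bit-clean "C" locale, as described above for glibc, should give 1
from high_byte_probe(); a libc that treats those bytes as invalid may
give -1 or 0 instead.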

By the way, there is currently a big debate going on in the Austin Group
(the people responsible for POSIX) on whether the "C" locale must be
8-bit clean (the way glibc behaves) or whether it was intended to allow
UTF-8 encoding by default (the way musl libc wants to behave); and
resolution of the debate will require input from the C standards
committee.  There may be some interesting fallout, no matter which
solution is finally reached.  http://austingroupbugs.net/view.php?id=663

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


