default charset for implicit locale specification

Thomas Wolff towo@towo.net
Wed Jan 20 08:45:00 GMT 2010


Corinna Vinschen wrote:
> Hi,
>
> right now, if a user only specifies a language but not a charset, for
> instance LANG="es_MX", then Cygwin defaults to the current ANSI codepage
> via the GetACP() function call.
>
> While that matches the current system settings, it doesn't necessarily
> result in using a codepage which matches the current language.
>
> For instance, on a US system, this results in using CP1252, even
> for LANG="zh_TW".  CP1252 most certainly does not have the right characters
> for the Chinese language.  The right codepage would be 950 == Big5
> in this case.
>
> There *is* a way in Windows, which isn't even complicated, to fetch
> the default ANSI codepage for a given ISO compatible language code.
> Locally I'm running such a Cygwin version right now, which asks the
> system for the matching ANSI codepage and uses that, rather than the
> system default ANSI codepage.  It works quite nicely.
>
> So, here's the question:
>
> What do you think is the better default for an arbitrary locale
> without explicit charset?
>
> [ ] Stick to the default system ANSI codepage?
> [ ] Default to the matching ANSI codepage for the given language?
>
> And why?
>
>
> I think that the second option is the better one since that's
> what the user expects when setting the locale to some language.
> But I'd like to hear arguments.
>
I agree absolutely with the second option; it's basically what I had
already suggested:
http://cygwin.com/ml/cygwin/2009-09/msg00824.html
It's what all other systems do, too.
I wasn't aware, though, that Windows provides the mapping, so you don't
have to maintain your own table. That sounds perfect (assuming that the
Windows mapping doesn't include some weird non-standard things... who
knows).
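
To make the difference between the two options concrete, here is a
minimal sketch in C. It is only an illustration under my own
assumptions, not the code Corinna is running: the function name
implicit_codepage and the four-entry table are made up, and the table
merely stands in for the lookup that Windows would do itself
(presumably via something like GetLocaleInfo with
LOCALE_IDEFAULTANSICODEPAGE, but that is a guess on my part). A locale
without explicit charset gets the codepage matching its language, and
only unknown locales fall back to the system ANSI codepage:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately tiny table for illustration only.
   With the Windows-provided mapping this table would not be needed. */
struct lang_cp {
    const char *locale;   /* language_TERRITORY, no charset */
    unsigned codepage;    /* matching ANSI codepage */
};

static const struct lang_cp lang_cp_table[] = {
    { "es_MX", 1252 },    /* Western European */
    { "zh_TW",  950 },    /* Big5 */
    { "ja_JP",  932 },    /* Japanese */
    { "ru_RU", 1251 },    /* Cyrillic */
};

/* Pick the codepage for a locale given without explicit charset.
   system_acp is what GetACP() would have returned.
   Returns 0 if the locale names a charset itself (contains '.'),
   because that case is not the subject of this sketch. */
unsigned implicit_codepage(const char *locale, unsigned system_acp)
{
    assert(locale != NULL);

    if (strchr(locale, '.') != NULL)
        return 0;

    for (size_t i = 0;
         i < sizeof lang_cp_table / sizeof lang_cp_table[0]; i++) {
        if (strcmp(lang_cp_table[i].locale, locale) == 0)
            return lang_cp_table[i].codepage;
    }

    /* Unknown language: keep today's behaviour. */
    return system_acp;
}
```

With this, implicit_codepage("zh_TW", 1252) gives 950 on a US system
instead of CP1252, while a locale missing from the table still gets
whatever the system default was.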

Thomas


