CYGWIN=codepage? Or LC_CTYPE=foo?
Corinna Vinschen
corinna-cygwin@cygwin.com
Sun Apr 6 11:14:00 GMT 2008
On Apr 6 15:59, Kazuhiro Fujieda wrote:
> >>> On Thu, 03 Apr 2008 17:54:48 +0200
> >>> Corinna Vinschen said:
>
> > That means, in theory there's no reason anymore to keep the
> > CYGWIN=codepage setting in the environment. We could use the LC_CTYPE
> > setting, just as on other systems. Right now, we need the LC_CTYPE
> > set to "C-UTF-8" anyway when using the codepage:utf8 setting, otherwise
> > the wcstombs and mbstowcs conversions in newlib will be broken.
> >
> > But there's a problem. The newlib conversion functions don't know
> > anything about Windows codepages, and the Windows conversion functions
> > used in the Cygwin functions sys_wcstombs and sys_mbstowcs don't know
> > anything about LC_CTYPE.
>
> By the specification, LC_CTYPE is defined to control the character
> handling of C library functions, not of system calls. I believe the
> Cygwin DLL should use sys_wcstombs and sys_mbstowcs with
> CYGWIN=codepage, and not depend on userland functions.
Isn't that somewhat error-prone? Right now, if you define codepage:utf8
and don't define LC_CTYPE='C-UTF-8', you will probably still have
working file names most of the time, but you get screwed-up console
output because the strings sent by the application are incorrectly
evaluated by the console code. That's one reason I hoped that we wouldn't
need two places to define the language/codepage settings.
Another is that Cygwin is not using any function which really requires a
codepage. The codepage is needed for applications calling Windows ANSI
functions. But Cygwin doesn't call these functions, so the focus of
language and character set support has moved from the Cygwin->OS
interface to the application->Cygwin interface.
So, given my vague understanding of this language stuff, the conversion
from wide char to multibyte string *can* be based on the notion the
applications have of the language/codepage. Which sounds to me as if
using LANG/LC_CTYPE would also make sense for Cygwin's internal
conversions.
Does that make sense? I don't know. No. 5: "More input, please!"
> The Cygwin DLL, however, contains both system calls and userland
> functions. Controlling both through LC_CTYPE at the same time is not
> a bad idea.
>
> To achieve this, the functions related to character handling must
> know the mapping between locale names and Windows codepages. For
> example, if LC_CTYPE is set to de_DE@ISO-8859-15, they should know
> that it designates codepage 28605.
>
> The current implementations of mbstowcs and wcstombs do not work
> at all in this scenario. We must replace them with implementations
> based on MultiByteToWideChar and WideCharToMultiByte. The emulation
> will come at a small cost. The Cygwin DLL should also use
> sys_wcstombs and sys_mbstowcs in this scenario.
I would basically be fine with that. We just have to replace the
newlib functions _mbtowc_r and _wctomb_r; all other conversions are
based on these. What we also still need is a good conversion
function from LANG/LC_CTYPE to a Windows codepage.
And here's the problem: I don't think I understand this stuff well
enough. Does anybody have the time and inclination to come up with that?
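
To make the request concrete, here is a rough sketch of what such a
conversion function might look like. Everything in it is an assumption
for illustration only: the names locale_to_codepage and
effective_ctype_locale are hypothetical, the table is a small sample
rather than a complete list, and the parsing accepts the POSIX
"lang_TERRITORY.charset" form, the "C-charset" form mentioned above, and
the "lang@charset" spelling used in the quoted example. The lookup order
LC_ALL, LC_CTYPE, LANG follows the usual POSIX precedence. A return
value of 0 means "unknown, fall back to some default".

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* Sample mapping from charset names to Windows codepage numbers.
   Illustrative and deliberately incomplete. */
struct charset_cp { const char *charset; unsigned codepage; };

static const struct charset_cp charset_table[] = {
  { "UTF-8",       65001 },
  { "ISO-8859-1",  28591 },
  { "ISO-8859-2",  28592 },
  { "ISO-8859-15", 28605 },
  { "CP1252",      1252  },
  { "SJIS",        932   },
};

/* Hypothetical helper: return the Windows codepage for a locale string,
   or 0 if the charset is missing or not in the table. */
unsigned
locale_to_codepage (const char *locale)
{
  char charset[32];
  const char *start, *end;
  size_t len, i;

  if (!locale || !*locale)
    return 0;
  if ((start = strchr (locale, '.')) != NULL)
    start++;                          /* "de_DE.ISO-8859-15" */
  else if (!strncmp (locale, "C-", 2))
    start = locale + 2;               /* "C-UTF-8" */
  else if ((start = strchr (locale, '@')) != NULL)
    start++;                          /* "de_DE@ISO-8859-15" */
  else
    return 0;                         /* no charset part at all */

  /* Cut off a trailing "@modifier", if any. */
  end = strchr (start, '@');
  len = end ? (size_t) (end - start) : strlen (start);
  if (len == 0 || len >= sizeof charset)
    return 0;
  memcpy (charset, start, len);
  charset[len] = '\0';

  for (i = 0; i < sizeof charset_table / sizeof charset_table[0]; i++)
    if (!strcasecmp (charset, charset_table[i].charset))
      return charset_table[i].codepage;
  return 0;
}

/* Hypothetical helper: pick the locale string that governs LC_CTYPE,
   using the precedence LC_ALL, then LC_CTYPE, then LANG. */
const char *
effective_ctype_locale (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *v = getenv (vars[i]);
      if (v && *v)
        return v;
    }
  return "C";
}
```

The DLL could then call locale_to_codepage (effective_ctype_locale ())
once at startup and hand the result to the internal conversion
functions. A real version would need a much longer table and some
decision about charset aliases such as "UTF8" without the hyphen.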
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat