This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Console codepage setting via chcp?


Hi.

I have read the thread, but I don't quite understand what you want to decide now.

* Is there any objection to the following specification?

> - System objects will always be *initially* translated using UTF-8. This
>  includes file names, user names, and initial environment variables.
> - By setting the locale environment variables you can switch the charset
>  used to translate filenames on a per-process basis.
>  This would be only a stop-gap measure, to allow re-use of old archives
>  or scripts.  Those should be converted to UTF-8 ASAP.  Expect complaints.
> - The "C" locale's charset will be UTF-8.
> - There'll be language-neutral "C.<charset>" locales.
> - The user's ANSI codepage will remain the default charset for
> "language_TERRITORY" locales.
> - The console charset will be set according to LC_ALL/LC_CTYPE/LANG
>  at the time the application starts.
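
As a rough illustration of the rules quoted above (not actual Cygwin code), the charset selection could be sketched in Python like this. The function names and the "CP1252" placeholder for the user's ANSI codepage are assumptions made for the example, as is treating "POSIX" like "C"; the LC_ALL, LC_CTYPE, LANG order is the usual POSIX precedence.

```python
# Hypothetical placeholder for the user's ANSI codepage. A real
# implementation would ask Windows for it; CP1252 is only an example.
ASSUMED_ANSI_CHARSET = "CP1252"


def effective_ctype_locale(env):
    """Return the first non-empty value of LC_ALL, LC_CTYPE, LANG."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    # Nothing set: fall back to the "C" locale.
    return "C"


def charset_for(env, ansi_charset=ASSUMED_ANSI_CHARSET):
    """Pick a charset name following the rules in the quoted list."""
    locale_name = effective_ctype_locale(env)
    # Drop an optional "@modifier" suffix such as "@euro".
    base = locale_name.split("@", 1)[0]
    if "." in base:
        # Explicit charset, e.g. "C.KOI8-R" or "ja_JP.SJIS".
        return base.split(".", 1)[1]
    if base in ("C", "POSIX"):
        # Proposal: the "C" locale's charset is UTF-8.
        # Treating "POSIX" the same way is an assumption.
        return "UTF-8"
    # Plain "language_TERRITORY": default to the ANSI codepage.
    return ansi_charset


if __name__ == "__main__":
    print(charset_for({"LANG": "C.KOI8-R"}))
    print(charset_for({"LANG": "de_DE"}))
    print(charset_for({}))
```

With this reading, `LANG=C.KOI8-R tar xzf bla.tgz` would run tar with KOI8-R as the charset for that one process, while everything else stays on the default.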

* Is the only other open issue the one in the thread "Lone surrogates in UTF-8?"?
 (Is that thread available in the ML archive?)

2009/9/26 Corinna Vinschen <corinna-cygwin@cygwin.com>:
> On Sep 25 19:42, Andy Koppe wrote:
>> 2009/9/25 Corinna Vinschen:
>> >> - System objects will always be translated using UTF-8. This includes
>> >> file names, user names, and initial environment variables (and
>> >> probably more I'm not aware of).
>> >[...]
>> The downside, of course, is that non-ASCII filenames created in a
>> non-UTF8 locale won't show up correctly in Windows, and vice versa.
>> But that's the same on Linux if the global setting is UTF-8 while the
>> terminal is set to something else. And the stock answer to any
>> complaints will be: Use UTF-8!
>>
>> In any case, the DCxx scheme will ensure that things work correctly
>> within any particular locale.
>>
>> And I guess the ^N scheme can go (or be disabled)?
>
> Probably not.  I spent some more time thinking about the various
> scenarios (partly instead of sleeping) and it occurred to me that using
> UTF-8 exclusively is a nice dream.
>
> Still, what about your tar example given in
> http://cygwin.com/ml/cygwin-developers/2009-09/msg00043.html?
>
> If we stick to UTF-8 exclusively we *have* to create a convmv-like
> tool which converts "broken" filenames from the
> \016\377\x notation to the UTF-8 \c2\x or \c3\x notation.
>
> Or would it be better to allow to switch the charset using the locale
> environment variables, regardless, as you proposed:
>
>    $ LANG=C.KOI8-R tar xzf bla.tgz
>
> What's the right thing to do?  I'm still unsure.  With your proposal,
> it's at least the user's choice, and if some interoperability issue occurs
> and the user complains, we can point to the FAQ: "Use UTF-8, dumbass!"
>
>> > So, utilizing the initial setting of LC_ALL/ff. is as good
>> > as defaulting to UTF-8 and allowing to switch via a setcons tool.
>>
>> 'setcons' requires a wrapper script, whereas the variables don't
>> necessarily, as they can be set in the Windows environment. This would
>> allow programs to be invoked directly from a shortcut and still
>> pick up the user's setting.
>>
>> Also, one of the locale variables needs to be set anyway if one wants
>> to use something other than the default locale.
>>
>> > I have
>> > found an easy way to allow a setcons tool which only switches the charset
>> > used by Cygwin.  It doesn't affect the setting in cmd, or made by chcp.
>>
>> That's a good idea. I've come round to thinking that 'setcons' is
>> worth having in addition to the initial setting from the environment.
>
> Ok, let's use the environment variables for now.  Creating a setcons
> tool will be possible, but is low priority then.
>
>> >> - setlocale() will have no effects beyond what's expected in Linux.
>> >
>> > Well... probably.  I'm not saying yes without asking a lawyer first.
>>
>> :)  I put that a bit too probingly, didn't I?
>
> Yep :)
>
> So, the modified list alongside your proposal looks like this:
>
> - System objects will always be *initially* translated using UTF-8. This
>  includes file names, user names, and initial environment variables.
> - By setting the locale environment variables you can switch the charset
>  used to translate filenames on a per-process basis.
>  This would be only a stop-gap measure, to allow re-use of old archives
>  or scripts.  Those should be converted to UTF-8 ASAP.  Expect complaints.
> - The "C" locale's charset will be UTF-8.
> - There'll be language-neutral "C.<charset>" locales.
> - The user's ANSI codepage will remain the default charset for
> "language_TERRITORY" locales.
> - The console charset will be set according to LC_ALL/LC_CTYPE/LANG
>  at the time the application starts.
> - setlocale() will (probably) have no effects beyond what's expected in Linux.
>
> So which approach do we take, the one from
> http://cygwin.com/ml/cygwin-developers/2009-09/msg00050.html
> or the one above?  The implementation differs only marginally
> in complexity, since most of it is already there.
>
> Please vote.
>
>
> Corinna
>
> --
> Corinna Vinschen                  Please, send mails regarding Cygwin to
> Cygwin Project Co-Leader          cygwin AT cygwin DOT com
> Red Hat
-- 
IWAMURO Motnori <http://vmi.jp/>

