This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [ANNOUNCEMENT] Updated: libreadline7-7.0.3-3


On 07/27/2017 01:56 PM, Steven Penny wrote:
> On Thu, 27 Jul 2017 12:08:53, Eric Blake wrote:
>> I've got some time today to look at building readline, but for the life
>> of me, I can't figure out what I'm supposed to be debugging.  You have
>> so many emails saying "see this earlier URL" that I am lost in what you
>> are saying is wrong or how to reproduce it.
> 
> Thanks for this. Between your 2 emails, you've put a lot on the table.
> Instead of getting overwhelmed, I will just start my side of the convo by
> replaying the problem. Then if you need more from me I am happy to help.
> So, here is an example problem using 'LATIN SMALL LETTER O WITH DIAERESIS'
> (U+00F6):
> 
>    $ chcp.com 65001

I still don't know your environment (it's really hard to reproduce
issues if I don't know the steps to reproduce them).  This looks like a
bash prompt, but are you running bash inside mintty, or directly in a
cmd window?

When I first open a mintty window to get bash, I see:

$ chcp.com
Active code page: 437

and in that environment, typing <alt-1-4-8> displays nothing, but
hitting <enter> then displays:
-bash: $'\302\224': command not found

which maps to \xc2\x94; I can confirm that with 'od -tx1'.  Trying
<alt-0-2-4-6> gives a different character (¦), as \xc2\xa6.
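
As a cross-check of the byte values above, here is a minimal Python sketch (Python is an assumption; the original test used only bash and od). It converts the octal escapes that bash printed into hex and decodes both sequences as UTF-8. The variable names are illustrative.

```python
# Bytes that bash reported as $'\302\224' (octal escapes).
reported = bytes([0o302, 0o224])
print(reported.hex())  # hex form of the same two bytes

# Decoding as UTF-8 yields a single code point in the C1 control range,
# which would explain why nothing visible is displayed.
decoded = reported.decode("utf-8")
print(f"U+{ord(decoded):04X}", ord(decoded))

# The second sequence seen under mintty, \xc2\xa6.
broken_bar = b"\xc2\xa6".decode("utf-8")
print(f"U+{ord(broken_bar):04X}", broken_bar)
```

The first sequence decodes to U+0094, whose decimal value is 148, the same number typed on the keypad. That suggests, but does not prove, that the typed number was taken directly as a code point and then encoded as UTF-8.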

When I then do

$ chcp.com 65001
Active code page: 65001

I don't see any change in behavior.

But if I first open a cmd window, with NO bash in the mix, I see:

c:\cygwin\bin> chcp
Active code page: 437

where both <alt-1-4-8> and <alt-0-2-4-6> output ö, and where 'od -tx1'
confirms both sequences produce \xc3\xb6.

Then switching code pages:

c:\cygwin\bin> chcp 65001
Active code page: 65001

directly typing <alt-0-2-4-6> prints nothing, while 'od -tx1' still
shows that it received \xc3\xb6.

I have no idea how alt- sequences are mapped to code points (it is not
as trivial as a conversion of base to get either the Unicode code point
0xf6 or the UTF-8 encoding), but it appears that the input within
cmd is the same, while the choice of code page determines what the
output will be.  I also have no idea why the alt- sequences produce
different inputs under cmd than under mintty.  So knowing WHAT
environment you are using is VITAL to me understanding the results you
are seeing.

At any rate, I definitely know that U+00F6 is encoded as \xc3\xb6 in
UTF-8 (I confirmed that on Linux, with echo $'\xc3\xb6').  I _don't_
know what it is encoded as in Windows code page 437 or 65001.  But a
quick google later, and I see that for code page 437
(https://en.wikipedia.org/wiki/Code_page_437), ö is at codepoint 0x94
(decimal 148, octal 0224); meanwhile, 0xf6 is equal to decimal 246.  Aha
- maybe that explains the two alt- sequences under codepage 437: without
a leading zero, you are typing the decimal position which looks up the
character from the current code page; WITH a leading zero you are
directly requesting the decimal encoding of a Unicode character.  And
trying some other sequences, I note that õ ('LATIN SMALL LETTER O WITH
TILDE' (U+00F5)) is not part of code page 437; so there is nothing I can
type without a leading 0 to print one; conversely, trying <alt-0-2-4-5>
which requests the same Unicode character displays merely 'o'
(apparently U+006f), which, when you lack o-with-tilde, is a reasonable
fallback compared to printing nothing at all.
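
The numbers in the paragraph above can be checked with a short Python sketch, assuming Python's built-in "cp437" codec matches the Windows code page for these letters. The helper cp437_position is hypothetical, introduced only for illustration. The last step strips the accent by Unicode decomposition; that is just one plausible way to arrive at 'o', not necessarily what Windows does.

```python
import unicodedata

O_DIAERESIS = "\u00f6"  # ö
O_TILDE = "\u00f5"      # õ


def cp437_position(ch):
    """Return the CP437 byte value for ch, or None if CP437 lacks it."""
    try:
        return ch.encode("cp437")[0]
    except UnicodeEncodeError:
        return None


utf8_bytes = O_DIAERESIS.encode("utf-8")
print(utf8_bytes.hex())              # UTF-8 encoding of ö
print(cp437_position(O_DIAERESIS))   # position typed without a leading zero
print(ord(O_DIAERESIS))              # value typed with a leading zero
print(cp437_position(O_TILDE))       # None: õ is absent from CP437

# Illustrative fallback only: decompose õ and keep the base letter.
base = unicodedata.normalize("NFD", O_TILDE)[0]
print(base)
```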

Either way, the character requested by the alt-sequence in the cmd
window is then transformed by Cygwin into the appropriate UTF-8 input
for the tty stdin of the Cygwin child process.  Hmm; repeating those
sequences under 'od -tx1', when I try <alt-0-2-4-5>, I see something
interesting: the moment I press 5 (while still holding alt), the display
prints [G; then releasing alt prints o; the transcription is then

0000000 1b 1b 5b 47 c3 b5 0a

which is ESC ESC [ G (hmm - that's the ANSI terminal escape sequence for
moving to column 0), followed by the actual Unicode õ, before my ending
newline.  No idea why that is leaking through to Cygwin to pick up as
input.  Is windows trying to beep at me to tell me my Unicode request
doesn't exist in the current code page?  Except that beep is Ctrl-G
(U+0007).
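
To make the od transcription easier to read, this small Python sketch (again an assumption; the original used only od) turns the hex dump back into bytes and labels the control characters. The names table is illustrative and covers only the controls discussed here.

```python
# Hex bytes copied from the 'od -tx1' line above.
dump = "1b 1b 5b 47 c3 b5 0a"
raw = bytes.fromhex(dump)

# The stream is valid UTF-8, so decode it and name the control characters.
text = raw.decode("utf-8")
names = {0x1B: "ESC", 0x0A: "LF", 0x07: "BEL"}
tokens = [names.get(ord(ch), ch) for ch in text]
print(tokens)

# A beep would show up as byte 0x07, which is not in the stream.
print(0x07 in raw)
```

The decoded tokens are ESC, ESC, '[', 'G', õ, LF, so the 'G' here is a literal letter closing an escape sequence rather than the BEL control character.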

But when I look up code page 65001, Wikipedia redirects me to UTF-8.
So in that code page, presumably all ALT sequences represent themselves,
whether or not there is a leading 0?  No, experimentation shows
otherwise: <alt-2> shows nothing (and not the smiley face from codepage
437); while <alt-0-2> shows ^B (where ctrl-b really is code point 2). I
have no idea WHAT sequence would thus give you ö.


> Now you might say, why not just use codepage 437? Which is exactly what
> Corinna
> did say:
> 
> http://cygwin.com/ml/cygwin/2017-03/msg00193.html

Well, obviously, the code page matters to cmd; and I have no idea what
alt- sequences do (or are supposed to do) under mintty.  So there may
STILL be some lingering craziness on what Cygwin itself should do when
it recognizes an alt- sequence coming in (if cygwin translates from the
current code page to Unicode, where the current code page definitely
affects which character is desired); and that's _in addition_ to what
appears to be the craziness in bash when reconstructing the UTF-8
sequence for omega Ω as mentioned in my other mail.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


