This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: Need help with multibyte UTF-8 characters
I believe that Cygwin displays certain UTF-8 characters incorrectly. To
see the problem, first save the attached "utf-8_test.sed" text file to
your desktop. Then run "mintty," and set its options by right clicking
in its title bar, selecting "Options" and then "Text." On the Text page
set "Locale" to "en_US" and "Character set" to "UTF-8," and then
"Save." Now exit and restart mintty. Change directory to your desktop
and run the editor "vim" on the utf-8_test.sed file. Once inside vim do
a ":set fileencoding=utf-8". You should now see that vim correctly
displays most of the sample one-, two-, and three-byte UTF-8 character
encodings in the test file. Vim fails, however, on the three-byte
encodings for the "en" dash, the "em" dash, and the ellipsis, each of
which displays incorrectly as a filled-in rectangle. Now exit vim and
do a "less" or "cat" on the utf-8_test.sed file. You should see most of
the sample UTF-8 encoded characters displayed correctly, except once
again for the en dash, em dash, and ellipsis. So it looks like a
problem in the underlying Cygwin run-time libraries rather than in vim,
less, or cat. I haven't tested this on four-byte UTF-8 character
encodings, but I assume Cygwin will have similar problems.
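
As a sanity check before blaming the run-time libraries, it may help to confirm that the test file really contains the expected three-byte sequences. The following is only a rough sketch in Python, not part of the attachment. It assumes the three characters are the standard U+2013 (en dash), U+2014 (em dash), and U+2026 (horizontal ellipsis); the helper names utf8_hex and count_in_bytes are made up for illustration.

```python
# Assumption: the characters in the report are these standard code points.
SAMPLES = {
    "en dash": "\u2013",
    "em dash": "\u2014",
    "horizontal ellipsis": "\u2026",
}


def utf8_hex(ch):
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def count_in_bytes(data):
    """Count how often each sample's UTF-8 byte sequence occurs in raw bytes."""
    return {name: data.count(ch.encode("utf-8")) for name, ch in SAMPLES.items()}


if __name__ == "__main__":
    # Print the code point and the bytes a correct UTF-8 file should contain.
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {name}: {utf8_hex(ch)}")
```

To check the file itself, read it in binary mode and pass the bytes in, for example count_in_bytes(open("utf-8_test.sed", "rb").read()). If the counts are nonzero, the file is encoded as expected and the problem is somewhere on the display side; if they are zero, the file itself would be the first thing to look at.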
Attachment: utf-8_test.sed
Description: Text document
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple