This is the mail archive of the cygwin mailing list for the Cygwin project.
Re: bug in mbrtowc?
From the "Linux Programmer's Manual" (release 3.15 of the Linux man-pages):
"If the n bytes starting at s do not contain a complete multibyte
character, mbrtowc() returns (size_t) -2."
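
In other words, (size_t) -2 means "feed me more bytes", not "encoding
error" (that is (size_t) -1). A minimal sketch of byte-at-a-time decoding
along those lines follows. The helper names decode_bytewise and
use_utf8_locale are made up for illustration, and the fallback locale
names are assumptions about what happens to be installed:

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: select some UTF-8 locale. The names other than
   en_GB.UTF-8 are guesses at commonly installed locales.
   Returns 1 on success, 0 if none of them is available. */
int use_utf8_locale(void)
{
    static const char *const names[] = {
        "en_GB.UTF-8", "C.UTF-8", "en_US.UTF-8"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: feed `len` bytes to mbrtowc() one at a time,
   carrying the partial character in an explicit mbstate_t.
   Returns 0 and stores the character in *out once one is complete
   (any remaining bytes are ignored), or -1 on an encoding error or if
   the input ends in the middle of a character. */
int decode_bytewise(const char *bytes, size_t len, wchar_t *out)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);    /* initial conversion state */

    for (size_t i = 0; i < len; i++) {
        size_t ret = mbrtowc(out, bytes + i, 1, &state);
        if (ret == (size_t)-1)
            return -1;                  /* invalid sequence (EILSEQ) */
        if (ret == (size_t)-2)
            continue;                   /* incomplete: feed the next byte */
        return 0;                       /* character completed */
    }
    return -1;                          /* ran out of bytes mid-character */
}
```

With a UTF-8 locale in effect, calling decode_bytewise("\xe2\x94\x84", 3, &wc)
should see (size_t) -2 for the first two bytes and complete the character
on the third. Note that the return value is compared as a size_t rather
than printed with "%i".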
On Mon, Jul 27, 2009 at 6:56 PM, Andy Koppe wrote:
> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
> Here's an example:
>
> #include <stdio.h>
> #include <locale.h>
> #include <stdlib.h>
> #include <wchar.h>
>
> int main(void) {
>   wchar_t wc;
>   size_t ret;
>   mbstate_t s = { 0 };
>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>   printf("%i\n", mbrtowc(&wc, "\xe2", 1, 0));
>   printf("%i\n", mbrtowc(&wc, "\x94", 1, 0));
>   printf("%i\n", mbrtowc(&wc, "\x84", 1, 0));
>   printf("%x\n", wc);
>   return 0;
> }
>
> The sequence E2 94 84 should translate to U+2504. Instead, the second
> and third calls to mbrtowc report encoding errors. It does work
> correctly if the three bytes are passed to mbrtowc() in one go:
>
>   printf("%i\n", mbrtowc(&wc, "\xe2\x94\x84", 3, 0));
>
> Andy
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple