This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: "C" character set (again)


On Jan  8 12:12, Thomas Wolff wrote:
> Andy Koppe wrote:
> >There's an important distinction here between the C locale and the
> >default locale. The C locale is what you get if you don't call
> >setlocale at all, whereas the default locale is what you get if you
> >call setlocale(LC_FOO, "") and the relevant environment variables are
> >all unset or empty.
> >
> >The default locale uses UTF-8, and I most certainly agree that this
> >should stay as is. The charset of the filesystem and the console are
> >both controlled by the default locale (unless overridden in the
> >environment). They are independent of the C locale's charset or
> >whether an application calls setlocale.
> >
> >No, this is about the C locale only. Lots of people and programs make
> >assumptions about the C locale which may not be valid according to
> >POSIX, but which nevertheless hold true for Linux and most (if not
> >all) other Unices, including Cygwin 1.5. The most important assumption
> >is that the C locale is 8-bit clean.
> And byte-transparent, right?
> Which gets me back to this printf issue; actually your point here
> seems to support my arguments there, if only I had explicitly
> restricted them to the C locale.
> Could you agree that functions like sprintf should handle their char
> * arguments byte-transparently if acting in the C locale?

They do!  The problem occurs in the *format* string.  It's not about the
C locale but the underlying charset.  The important part is how printf
is implemented.  There are two approaches.

- The format string is treated as a single-byte string and all bytes !=
  '%' are just waved through.  On these systems the printf problem
  reported on the cygwin list will not show up, but the printf
  implementation is not entirely multibyte clean.  Examples are FreeBSD
  and Linux.

- The format string is treated as a multibyte string and each character
  is converted to a wide char.  The resulting wide char is checked
  against L'%'.  Every other character is waved through.  These
  implementations are multibyte clean, however, conversion errors will
  result in printf returning prematurely.  Examples are OpenBSD and
  newlib/Cygwin.

Since the only defined part of the C locale's charset is the ASCII
range, both behaviours are valid.
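
The two approaches can be sketched roughly as follows.  This is only an
illustration and not the actual code of any of the libraries named above:
the function names scan_bytewise and scan_multibyte are made up, and the
functions merely walk the format string and count '%' characters instead
of formatting anything.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch of the byte-wise approach: every byte other than
   '%' is simply passed through.  Returns the number of bytes accepted,
   which is always the full length of the format string. */
size_t scan_bytewise(const char *fmt, size_t *percents)
{
    size_t n = 0;

    *percents = 0;
    for (; fmt[n] != '\0'; n++)
        if (fmt[n] == '%')
            (*percents)++;
    return n;
}

/* Hypothetical sketch of the multibyte approach: each character is
   converted with mbrtowc() and the result is compared against L'%'.
   Scanning stops at the first conversion error, so the return value
   can be smaller than strlen(fmt). */
size_t scan_multibyte(const char *fmt, size_t *percents)
{
    mbstate_t st;
    size_t len = strlen(fmt);
    size_t pos = 0;

    memset(&st, 0, sizeof st);
    *percents = 0;
    while (pos < len) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, fmt + pos, len - pos, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            break;              /* invalid or incomplete sequence */
        if (r == 0)
            break;              /* NUL; not expected before len */
        if (wc == L'%')
            (*percents)++;
        pos += r;
    }
    return pos;
}
```

For a pure ASCII format string both functions accept the whole string.
For a format string containing a byte such as 0xfc, scan_bytewise still
accepts everything, while the result of scan_multibyte depends on
whether the current charset can convert that byte.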


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

