This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: With bad UTF-8, cygwin can create files it can't read

From: Warren Young <wyml at etr-usa dot com>
To: cygwin at cygwin dot com
Date: Wed, 1 Apr 2015 10:01:42 -0600
Subject: Re: With bad UTF-8, cygwin can create files it can't read
References: <CAOCY71AaRWGEFVcPqLKNEjqWEkELdfLD-KBvxMAQCi0wt2A5ZA at mail dot gmail dot com> <20150330110446 dot GK29875 at calimero dot vinschen dot de> <20150401133401 dot GV13285 at calimero dot vinschen dot de>

On Apr 1, 2015, at 7:34 AM, Corinna Vinschen <corinna-cygwin@cygwin.com> wrote:
> 
> As you probably know, Unicode values beyond the base plane (that is,
> everything > 0xffff in UTF-32 and > ef bf bf in UTF-8 notation)
> are represented as so-called surrogate pairs in UTF-16, two UTF-16
> values in the 0xd800 - 0xdfff range.
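The surrogate-pair rule Corinna describes can be illustrated with a short Python sketch. This is not Cygwin code, and the function name `to_surrogate_pair` is hypothetical. It applies the standard UTF-16 rule: subtract 0x10000 from the code point, put the top 10 bits of the remaining 20-bit value into a high surrogate (0xD800-0xDBFF) and the bottom 10 bits into a low surrogate (0xDC00-0xDFFF).

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    v = cp - 0x10000             # leaves a 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


cp = 0x1F600
high, low = to_surrogate_pair(cp)
print(f"U+{cp:X} -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00

# Cross-check against Python's own UTF-16 (big-endian) encoder.
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
assert chr(cp).encode("utf-16-be") == expected

# U+FFFF, the last base-plane code point, is ef bf bf in UTF-8.
assert "\uffff".encode("utf-8") == b"\xef\xbf\xbf"
```

Because each half of the pair is a lone value in 0xd800 - 0xdfff, any byte sequence that decodes to just one half is not a valid stand-alone character, which is where the "bad UTF-8" trouble in this thread comes from.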

I happened to have run across a similar strangeness in Unicode earlier today.  Does Cygwin cope with/care about Unicode normalization forms?

  http://goo.gl/jnsqhC

For example, will open(2) cope with any UTF-8 form of a string that you could pass in UTF-16 encoding to CreateFile()?

You could imagine, say, a web app getting a string from a user, then using that to access a file on disk.  A different browser given the “same” string could result in a different series of bytes passed to the Cygwin POSIX layer.
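To make the normalization question concrete, here is a small Python sketch. It is illustrative only and says nothing about what Cygwin's open(2) actually does; the helpers `utf8_forms` and `same_text` and the sample name "café" are hypothetical. It shows that one visible name has two different UTF-8 byte sequences under NFC and NFD, and that normalizing both sides to one form before comparing is one way an application could treat them as equal.

```python
import unicodedata


def utf8_forms(s: str) -> dict:
    """Return the UTF-8 bytes of s under the NFC and NFD normalization forms."""
    return {form: unicodedata.normalize(form, s).encode("utf-8")
            for form in ("NFC", "NFD")}


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (one possible policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical file name typed by a user; U+00E9 is the precomposed "é".
forms = utf8_forms("caf\u00e9")
print(forms["NFC"])                  # b'caf\xc3\xa9'
print(forms["NFD"])                  # b'cafe\xcc\x81' ("e" + U+0301 COMBINING ACUTE ACCENT)
print(forms["NFC"] == forms["NFD"])  # False: same text, different bytes

# Normalizing before comparison makes the two spellings match.
print(same_text("caf\u00e9", "cafe\u0301"))  # True
```

A file system that compares names byte-for-byte (or UTF-16 unit-for-unit) would see these as two distinct names unless something in the stack normalizes them first.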
--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

