This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Suffixes in non-western charsets


On Mon, 28 Jan 2008, Corinna Vinschen wrote:

> Hi,
>
> sorry for my ignorance, but I found that I have no idea how file
> suffixes are handled when working in a non-western charset environment.
> What I'm up to is this:
>
> When you're using a latin-character based charset like ASCII or
> ISO-8859-1, then the suffixes used for instance for executables or
> shortcuts are always the same.  An executable has ".exe" or ".com", a
> shortcut has ".lnk", a batch file ".bat" and so on.
>
> How does that work in non-Latin charsets such as Cyrillic, Chinese or
> Japanese?  Are these suffixes in some way translated into the non-Latin
> charset?  If so, how?
>
> Given that NTFS uses UTF-16, it would be possible to keep the latin
> characters part of the filename.  So, if I try to find out if a path
> name is a batch file, the comparison with L".bat" would still be valid.
> But, is it working this way?
>
> FAT uses the system OEM charset.  Many applications are still using
> single/multi-byte functions.  So, how does it work?  Are the suffixes
> fixed by using always the same byte value, regardless of the meaning of
> that byte value in the used charset?  Or are they translated to
> characters which have some similarity with the latin characters the
> suffixes are based on?  Would the "usual" comparison work after
> converting the filename to UTF-16 (as for L".bat")?
>
> Can anybody enlighten me here?

As far as I know, most 8-bit charsets share the 7-bit ASCII portion and
differ only in the upper 128 characters.  With the Windows-1251
(Cyrillic) charset, the suffixes fall in the ASCII subset and are thus
unchanged.  I have seen some other 8-bit charsets in use (Windows-1252,
Windows-1255), and there was no translation of suffixes there either.
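
The observation above can be checked with a short sketch.  It assumes
that Python's built-in cp1251, cp1252 and cp1255 codecs match the
corresponding Windows code pages; the helper name is made up for
illustration, and the suffix list is the one from the question.

```python
# Suffixes mentioned in the question.
SUFFIXES = [".exe", ".com", ".lnk", ".bat"]

# Assumption: these Python codecs mirror the Windows code pages
# of the same number.
CODEPAGES = ["cp1251", "cp1252", "cp1255"]


def suffix_bytes_unchanged(suffix, codepage):
    """Return True if the suffix encodes to the same bytes as in ASCII."""
    return suffix.encode(codepage) == suffix.encode("ascii")


# Check every suffix against every code page.
results = {
    (suffix, cp): suffix_bytes_unchanged(suffix, cp)
    for suffix in SUFFIXES
    for cp in CODEPAGES
}

if __name__ == "__main__":
    for (suffix, cp), same in sorted(results.items()):
        print(f"{suffix} in {cp}: {'unchanged' if same else 'DIFFERENT'}")
```

If every entry reports "unchanged", a byte-wise comparison of the suffix
behaves the same in those charsets as in plain ASCII.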

I'm not certain that the CJK charsets share this property.
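
One way to test this rather than guess is sketched below.  It assumes
that Python's cp932, cp936, cp949 and cp950 codecs are close enough to
the Windows CJK code pages; the function name and the sample file names
are invented for the example.  The file name is decoded to Unicode
first and the suffix is compared afterwards, which sidesteps the
question of what an individual byte means inside a multi-byte charset.

```python
# Assumption: these Python codecs approximate the Windows CJK code pages.
# The file names are hypothetical samples, one per code page.
SAMPLES = {
    "cp932": "テスト.bat",   # Japanese
    "cp936": "测试.bat",     # Simplified Chinese
    "cp949": "시험.bat",     # Korean
    "cp950": "測試.bat",     # Traditional Chinese
}


def has_suffix(raw_name, codepage, suffix):
    """Decode a byte-string file name, then compare the suffix
    case-insensitively on the Unicode text."""
    name = raw_name.decode(codepage)
    return name.lower().endswith(suffix.lower())


# Does ".bat" keep its ASCII byte values in each code page?
ascii_preserved = {
    cp: ".bat".encode(cp) == b".bat" for cp in SAMPLES
}

# Does the decoded comparison find the suffix in each sample name?
suffix_found = {
    cp: has_suffix(name.encode(cp), cp, ".bat")
    for cp, name in SAMPLES.items()
}

if __name__ == "__main__":
    for cp in SAMPLES:
        print(cp, ascii_preserved[cp], suffix_found[cp])
```

Comparing after decoding corresponds to the L".bat" comparison on
UTF-16 names proposed in the question.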
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_	    pechtcha@cs.nyu.edu | igor@watson.ibm.com
ZZZzz /,`.-'`'    -.  ;-;;,_		Igor Peshansky, Ph.D. (name changed!)
     |,4-  ) )-,_. ,\ (  `'-'		old name: Igor Pechtchanski
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"That which is hateful to you, do not do to your neighbor.  That is the whole
Torah; the rest is commentary.  Go and study it." -- Rabbi Hillel

