This is the mail archive of the cygwin@sourceware.cygnus.com mailing list for the Cygwin project.



Re: ASCII and BINARY files. Why?


Jim Balter wrote:
> 
> I can guarantee you that the text/binary split will *never* stop
> being a major headache.

This is probably true no matter what we do.

> The fact that cat throws away characters
> from files and stops dead at ^Z makes any hope of building robust
> systems on top of this thing hopeless.

I think it would be a good idea to

	- change the text->binary and binary->text translations
	  so that text->binary->text or binary->text->binary
	  translations leave the original intact

	- not treat ^Z in text files as EOF
	  (^Z at the console should be EOF iff `stty eof ^Z'.)
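To make the round-trip problem concrete, here is a rough C model of a DOS-style text-mode translation. It assumes the common behaviour (CR-LF becomes LF on read, ^Z ends the file on read, LF becomes CR-LF on write); the function names text_read, text_write and text_round_trip_ok are hypothetical, and this is a sketch rather than the actual Cygwin code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a DOS-style text-mode read: "\r\n" becomes "\n"
 * and a ^Z (0x1A) byte is treated as end of file.  out must hold at least
 * n bytes; the number of bytes stored is returned. */
size_t text_read(const char *in, size_t n, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\x1a')
            break;                      /* ^Z: everything after it is lost */
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
            continue;                   /* drop the CR of a CR-LF pair */
        out[o++] = in[i];
    }
    return o;
}

/* Hypothetical model of a text-mode write: every "\n" becomes "\r\n".
 * out must hold at least 2*n bytes. */
size_t text_write(const char *in, size_t n, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\n')
            out[o++] = '\r';
        out[o++] = in[i];
    }
    return o;
}

/* Returns 1 if reading the bytes in text mode and writing them back in
 * text mode reproduces them exactly, 0 otherwise. */
int text_round_trip_ok(const char *data, size_t n)
{
    char tmp[256], back[512];
    if (n > sizeof tmp)
        return 0;                       /* keep the sketch simple */
    size_t t = text_read(data, n, tmp);
    size_t b = text_write(tmp, t, back);
    return b == n && memcmp(back, data, n) == 0;
}
```

Under this model a file that already uses CR-LF survives the round trip, but a file with bare LF line endings gains a CR on every line, and anything after a ^Z simply disappears.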

> One solution would be to do away with the text/binary
> split and fix any program that cannot handle CR's within
> lines.  I'm not talking about throwing them away in filters,
> as with the current situation, but rather make sure that programs
> that *parse* lines can handle arbitrary whitespace.

There are many programs that treat "\n" as different from other whitespace.

Your suggestion amounts to requiring applications to check for
"\r\n" and treat it as equivalent to "\n", even if the file was
not opened in binary mode.  Philosophically, this seems like a bit of a
step backwards from the ANSI C approach of making end-of-line translation
the implementation's responsibility, not the program's.
Making these changes might help interoperability in other contexts
(e.g. when using network file systems shared by both DOS and Linux),
so it is arguable that they're a good idea anyway,
but I think there would be pragmatic problems:
I suspect that grepping the sources for fopen() and the other
file-opening calls would turn up far fewer hits than grepping for '\n',
so fixing how files are opened is much less work than fixing every
place that tests for a newline.
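Concretely, the check in question looks something like the following sketch, assuming lines are read with fgets() from a stream opened in binary mode; chomp_line is a hypothetical name. Without it, strcmp(line, "done") fails for a line that ended in CR-LF, and every program that compares or stores whole lines would need an equivalent.

```c
#include <string.h>

/* Hypothetical helper: strip the line terminator from a line read with
 * fgets() on a binary-mode stream, treating both "\n" and "\r\n" as end
 * of line.  A CR that is not followed by LF is left alone. */
void chomp_line(char *line)
{
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';             /* remove the LF */
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';         /* and the CR of a CR-LF pair */
    }
}
```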

As an example of the difference, in the C preprocessor,

	#define foo \<carriage return><newline>
	bar

is different from

	#define foo \<newline>
	bar

Now, in this particular case, it is implementation-defined what
constitutes the end of a line, and so the GNU C preprocessor could
define the end of a line as either "\r\n" or "\n".  However, the ANSI
standard requires that the implementation document this choice, and so
if this change were made, the documentation would need to be changed.
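To illustrate the choice, here is a hypothetical splice_lines() that deletes backslash-newline pairs with "end of line" defined either as "\n" alone or as "\n" or "\r\n". It is a sketch of that one translation step under those assumptions, not the GNU C preprocessor's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: delete each backslash that is immediately followed
 * by an end of line, together with that end of line.  If accept_crlf is 0,
 * only "\n" ends a line; otherwise "\r\n" is accepted as well.  out must
 * hold at least strlen(in) + 1 bytes; the length of the result is returned. */
size_t splice_lines(const char *in, char *out, int accept_crlf)
{
    size_t n = strlen(in);
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\\') {
            if (in[i + 1] == '\n') {
                i += 1;                 /* skip backslash and "\n" */
                continue;
            }
            if (accept_crlf && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 2;                 /* skip backslash and "\r\n" */
                continue;
            }
        }
        out[o++] = in[i];
    }
    out[o] = '\0';
    return o;
}
```

With accept_crlf equal to 0, the <carriage return><newline> version above is left unspliced (the backslash and the CR stay in the text); with a nonzero accept_crlf both versions become "#define foo bar".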

> This would all
> be POSIX compatible and viewable as bug fixes, and thus quite possibly
> mergeable back into the GNU sources.

I don't agree that it would be viewable as bug fixes.
Strictly speaking, the documentation of all these sources would have to
be changed to reflect the new behaviour.  Note that if this approach
were taken, and the changes were merged back into the GNU sources,
then it would affect the behaviour of the other versions (e.g. the Linux
version), not just the gnu-win32 versions.

Still, even though they're not bug fixes, such changes might be
mergeable back into the GNU sources as enhancements.

> There might be a few exceptions
> where the lines are defined as exactly the bytes up to a NL,

Why do you think there would only be "a few" exceptions like this?
I think that cases like this are very common.

So, I think the problem with your suggestion is that even though these
changes might well be worthy enhancements, the sheer number of changes
required would be overwhelming.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>   |  "I have always known that the pursuit
WWW: <http://www.cs.mu.oz.au/~fjh>   |  of excellence is a lethal habit"
PGP: finger fjh@128.250.37.3         |     -- the last words of T. S. Garp.
-
For help on using this list, send a message to
"gnu-win32-request@cygnus.com" with one line of text: "help".

