This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.



Re: Erroneous line endings (cat,gawk,text mount)


Roman Belenov wrote:
> I encountered that cygwin tools can generate file with strange line
> endings in certain situation. I have a file (name it foo.txt) with
> dos-style line endings in  text mounted directory. If I do
>     gawk {print;} <foo.txt >bar.txt
> or
>     cat foo.txt >bar.txt
> I get a copy of foo.txt. But if I do
>     cat foo.txt | gawk {print;} >bar.txt
> I get 0xd doubled in line separators (so lines are separated with 0xd
> 0xd 0xa in bar.txt).
>
> <disclaimer>
> This is just a bug report, I don't expect timely reaction of any
> kind.
> </disclaimer>
>
> --
>   With regards, Roman.

This is very interesting as I couldn't reproduce Roman's results at
all, although I did get some results that I didn't expect.  Details
follow.
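
For reference, the 0xd 0xd 0xa sequence in the report is what you get when a "\n" -> "\r\n" translation is applied to data that already ends in "\r\n". The sketch below only models that with plain awk on a system that does no translation of its own; it is an illustration under that assumption, not a diagnosis of what Cygwin does in the pipe case. The temporary directory and file names are made up for the example.

```shell
tmp=$(mktemp -d)

# Sample with DOS line endings, like foo.txt in the report.
printf '1\r\n2\r\n' > "$tmp/foo.txt"

# RS stays "\n", so each record keeps its trailing "\r";
# ORS = "\r\n" then appends a second "\r" in front of the "\n".
awk 'BEGIN { ORS = "\r\n" } { print }' "$tmp/foo.txt" > "$tmp/bar.txt"

# Each line in the dump should now end in \r \r \n.
od -An -c "$tmp/bar.txt"
```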

System: Win98SE
Cygwin: 1.3.22
Gawk:   3.1.2-2

$ echo "CYGWIN = $CYGWIN"
CYGWIN = tty

$ mount                                      # output wrapped at col 72
C:\Cygwin\usr\X11R6\lib\X11\fonts on /usr/X11R6/lib/X11/fonts type
system (binmode)
C:\Cygwin\bin on /usr/bin type system (binmode)
C:\Cygwin\lib on /usr/lib type system (binmode)
C:\Cygwin on / type system (binmode)
a: on /cygdrive/a type user (textmode)
c: on /cygdrive/c type user (binmode,noumount)
d: on /cygdrive/d type user (binmode,noumount)

$ cd /cygdrive/a

The following 3 commands give the output that I expect.

$ od -ba foo.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

$ cat foo.txt | od -ba      # same as above - as it should be: UUOC ;-)
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

$ cat foo.txt >bar.txt;od -ba bar.txt                     # as expected
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017


However, this doesn't:

$ awk 1 foo.txt | od -ba
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

For a text mount I'd expect "\n" -> "\r\n" translation on output, but
it doesn't seem to be happening.

Other gawk Windows ports normally translate "\r\n" -> "\n" on input and
"\n" -> "\r\n" on output, unless the BINMODE variable is used.  This is
so that gawk can work internally with "\n" as a line ending, but handle
the system's line endings correctly.  [See gawk manual]
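
The two halves of that translation can be sketched by hand. This assumes an awk that does no line-end translation of its own (for example on Linux), so the stripping and restoring are spelled out explicitly; the file names are only for illustration.

```shell
tmp=$(mktemp -d)
printf '1\r\n2\r\n3\r\n4\r\n5\r\n' > "$tmp/foo.txt"

# Input side: strip the trailing CR so the program sees "\n"-terminated lines.
awk '{ sub(/\r$/, ""); print }' "$tmp/foo.txt" > "$tmp/internal.txt"

# Output side: restore CR LF by setting the output record separator.
awk 'BEGIN { ORS = "\r\n" } { print }' "$tmp/internal.txt" > "$tmp/out.txt"

od -c "$tmp/internal.txt"
od -c "$tmp/out.txt"

# If both halves are applied, the round trip should reproduce the input.
cmp "$tmp/foo.txt" "$tmp/out.txt" && echo "round trip preserved the file"
```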

For the Cygwin port and a text mount I'd expect the same behaviour,
i.e., "\r\n" -> "\n" on input and "\n" -> "\r\n" on output, unless the
BINMODE variable was set.
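
For comparison, this is roughly how BINMODE is passed. The gawk manual documents the values 1 (reads), 2 (writes) and 3 (both); whether the Cygwin port honours it on a text mount is exactly what is in question here, so treat this as a sketch. On systems where gawk does no translation, or with an awk that has no BINMODE support, the variable has no effect and the file is copied unchanged.

```shell
tmp=$(mktemp -d)
printf '1\r\n2\r\n' > "$tmp/foo.txt"

# BINMODE=3 asks gawk for binary mode on both reads and writes.
# An awk without BINMODE support just sees an ordinary, unused variable.
awk -v BINMODE=3 '1' "$tmp/foo.txt" > "$tmp/bar.txt"

od -An -c "$tmp/bar.txt"
```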


Next I took a file on the text mount with unix line endings:

$ od -ba unixle.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

$ cat unixle.txt | od -ba                            # no surprise here
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

$ cat unixle.txt >bar.txt;od -ba bar.txt   # s/b "\r\n" endings surely?
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

$ awk 1 unixle.txt | od -ba                # s/b "\r\n" endings surely?
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

The results of the above two commands again seem odd to me, as I would
expect the output to be "\r\n" terminated in both cases.
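
Reading od dumps by eye gets tedious, so a small helper can classify the endings instead. This is a rough sketch: eol_style is a made-up name, and it only compares CR and LF counts, so anything other than pure "\n" or pure "\r\n" (including doubled CRs) is reported as mixed. Note that it reads the file through the same mount rules as any other tool.

```shell
# Hypothetical helper: print lf, crlf or mixed for the file named in $1.
eol_style() {
    lf=$(tr -cd '\n' < "$1" | wc -c | tr -d ' ')
    cr=$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    if [ "$cr" -eq 0 ]; then
        echo lf
    elif [ "$cr" -eq "$lf" ]; then
        echo crlf
    else
        echo mixed
    fi
}

tmp=$(mktemp -d)
printf '1\r\n2\r\n' > "$tmp/dos.txt"
printf '1\n2\n'     > "$tmp/unix.txt"
eol_style "$tmp/dos.txt"
eol_style "$tmp/unix.txt"
```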

I re-read the rules in the Cygwin manual about line end translation and
tried this:

$ od -ba a:foo.txt                                        # as expected
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

$ awk 1 a:foo.txt >bar.txt;od -ba bar.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

But:

$ awk 1 a:foo.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

As the manual says, if you use a path that includes a drive letter, the
file is treated as being on a text mount. But shouldn't we get the same
output without the drive letter, since /cygdrive/a is text mounted?

Interestingly (still on the text mounted /cygdrive/a):

$ od -ba unixle.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

$ awk 1 a:unixle.txt >bar.txt;od -ba bar.txt
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

$ awk 1 a:unixle.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

$ awk 1 unixle.txt >a:bar.txt;od -ba bar.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

These are as I would expect, given the manual's rules.


What about a bin mount, I wondered?  So:

$ cd ~

$ od -ba foo.txt
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

$ cat foo.txt | od -ba
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

$ cat foo.txt >bar.txt;od -ba bar.txt       # mmm should cat translate?
0000000 061 015 012 062 015 012 063 015 012 064 015 012 065 015 012
          1  cr  nl   2  cr  nl   3  cr  nl   4  cr  nl   5  cr  nl
0000017

$ awk 1 foo.txt | od -ba                            # well awk does ...
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

$ awk 1 foo.txt >bar.txt;od -ba bar.txt         # ... however you do it
0000000 061 012 062 012 063 012 064 012 065 012
          1  nl   2  nl   3  nl   4  nl   5  nl
0000012

$ # yes, I know the last two should work the same.


So it seems that with gawk on a bin mount we get line end translation
on output, but not on a text mount, unless you force Cygwin to do it by
using a drive letter in the file path.

Or am I missing something significant in the documentation?
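
To make the comparison easier to repeat on other mounts, the tests above could be wrapped up along these lines. It is only a sketch: count_eol and EOL_TEST_DIR are names invented for the example, the directory defaults to a fresh temporary one, and on a text mount the printf that creates the sample files may itself be subject to translation, so check them with od first.

```shell
# Hypothetical helper: report how many CR and LF bytes arrive on stdin.
count_eol() {
    od -An -v -tx1 | tr -s ' ' '\n' | awk '
        $1 == "0d" { cr++ }
        $1 == "0a" { lf++ }
        END { printf "cr=%d lf=%d\n", cr, lf }'
}

# Directory to test in; set EOL_TEST_DIR=/cygdrive/a (say) to try a mount.
dir=${EOL_TEST_DIR:-$(mktemp -d)}

printf '1\r\n2\r\n3\r\n' > "$dir/foo.txt"
printf '1\n2\n3\n'       > "$dir/unixle.txt"

for f in foo.txt unixle.txt; do
    printf '%-24s ' "cat $f |"
    cat "$dir/$f" | count_eol
    printf '%-24s ' "awk 1 $f |"
    awk 1 "$dir/$f" | count_eol
    awk 1 "$dir/$f" > "$dir/bar.txt"
    printf '%-24s ' "awk 1 $f >bar.txt"
    count_eol < "$dir/bar.txt"
done
```

Running it once in a text-mounted directory and once in a binary-mounted one gives two small tables that can be compared line by line.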

Peter S Tillier
"Who needs perl when you can write dc, sokoban,
arkanoid and an unlambda interpreter in sed?"



--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

