This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: read() returns data but not the number of read bytes


On Jul  1 16:02, Theodor.Kazhan@gmx.de wrote:
> Hi all,
> 
> The issue was observed in cygwin versions 1.7.30-1, 2.0.2-1 and
> 2.0.4-1 on Win7. Upon receiving 'HUGEFILE' form a debian system (scp
> user@server:logDownload_m/TEST.txt .), scp in cygwin hangs from time
> to time... 
> 
> Based on the observed occurrence behaviour of the issue and my
> experience in hunting ghosts, I - based on the new aspect described
> below - without much doubt assume a race condition between the cygwin
> read()/write() function implementations. I started investigating the
> cygwin-newlib implementation and gdb-ing it but I'm not yet that much
> into the details of read()/write()-implementation to identify possible
> race conditions.
> So any contribution / co-working offers / investigation proposals are
> highly appreciated as I can only progress slowly.
> 
> 
> As a starting point I'd have, based on: "scp and it's child process
> ssh share a pipe to receive the data of the downloaded file..." the
> following Q:
> 1) Is a child forked into a new, separate Win-process or is the fork
> simulated in cygwin, e.g. by threads?

Fork is starting a new process.

> 2) Is the pipe implementation in cygwin in such a setup using
> Win-methods to forward the data or is the pipe-functionality simulated
> in cygwin?

Pipes are using Windows named pipes w/ overlapped IO.

> 3) Can the action of providing data for reading from a pipe by the
> consumer process (i.e. indicating data availability and the actual
> copying of the data in a read()-call) be interrupted/impacted by a
> write()-call from a pipe-data-producing process?

I don't understand the question.  Writing to the pipe on one end will
trigger an event handle on the other end and the read call will check
if overlapped data is available in a call to GetOverlappedResult.
Writing to the pipe will not generate POSIX signals in the consumer
process, if that's what you mean.

> ---8<---
> size_t
> atomicio6(ssize_t (*f) (int, void *, size_t), int fd, void *_s, size_t n,
>     int (*cb)(void *, size_t), void *cb_arg)
> {
> [...]
> --->8---

A diff -up against the orignal code would be helpful for stuff like that.

> $ cat -n scp_20150701_123824.txt
>  . . .
>  50782  TK: atomicio.c: atomicio6: fd=8, event=1, res=49152, errno=011,            s[0]=B014565-N000001-, s[Q1-16]=B014565-N001024-B014566-N000001-, s[Q2-16]=B014566-N001024-B014567-N000001-, s[Q3-16]=B014567-N001024-B014568-N000001-, s[Q4-16]=B014568-N001024-
>  50783  TK: scp.c: sink: amt=65536, j=65536, i=238616576, size=1553121293, count=65536, buf=B014565-N000001-
>  50784  TK: dispatch.c: ssh_dispatch_run: type=94
>  50785  TK: channels.c: channel_input_data: data_len=16384, buf=B014569-N000001-
>  50786  TK: channels.c: channel_handle_wfd(2): write: len=16384, buf=B014569-N000001-
>  50787  TK: atomicio.c: atomicio6: fd=8, event=1, res=-0001, errno=011,            s[0]=B014569-N000001-, s[Q1-16]=B014569-N001024-B014566-N000001-, s[Q2-16]=, s[Q3-16]=, s[Q4-16]=

Hmm.  I inspected the read code for overlapped reads again and I don't
see anything wrong.  The question is why an EAGAIN is triggered.  The
only reason I can think of is if GetOverlappedResult returns FALSE
when there actually is data in the buffer.  For the underlying
functionality, see

https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/fhandler.cc;h=6f024da3288053359de36baf41fa095819d252cc;hb=HEAD#l1944

I'm now trying to reproduce this again, but I'm really not sure
where to look and, as I wrote, last time I was unable to reproduce
this despite running the testcase for hours.

It might be interesting to see if there's a certain bordercase possible:
GetOverlappedResult returning FALSE with bytes set to some non-0 value.
Locally I changed the debug output to

diff --git a/winsup/cygwin/fhandler.cc b/winsup/cygwin/fhandler.cc
index 6f024da..556c240 100644
--- a/winsup/cygwin/fhandler.cc
+++ b/winsup/cygwin/fhandler.cc
@@ -1993,10 +1993,14 @@ fhandler_base_overlapped::wait_overlapped (bool inres, bool writing, DWORD *byte
 	     overridden by the return of GetOverlappedResult which could detect
 	     that I/O completion occurred.  */
 	  CancelIo (h);
+	  *bytes = 0;
 	  wores = GetOverlappedResult (h, get_overlapped (), bytes, false);
 	  err = GetLastError ();
 	  ResetEvent (get_overlapped ()->hEvent);	/* Probably not needed but CYA */
-	  debug_printf ("wfres %u, wores %d, bytes %u", wfres, wores, *bytes);
+	  if (!wores && *bytes > 0)
+	    system_printf ("wfres %u, wores %d, bytes %u", wfres, wores, *bytes);
+	  else
+	    debug_printf ("wfres %u, wores %d, bytes %u", wfres, wores, *bytes);
 	  if (wores)
 	    res = overlapped_success;	/* operation succeeded */
 	  else if (wfres == WAIT_OBJECT_0 + 1)

I'm running the test with this change now.  If you see anything else
wrong in that code, please yell.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

Attachment: pgprQ4C7lfNxc.pgp
Description: PGP signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]