This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

difficult SIGCHLD / pipe hang


I am still trying to trace down a nasty ksh93 regression test failure.
The script that fails looks like this:

echo 0
print | cat >/dev/null &
wait $!

The problem appears to be timing-related.  What happens is that
the "cat" process and the "print" process are in a race to exit.
If "print" is fast enough, "cat" will get a SIGCHLD right around
the time its read of the pipe from "print" returns EOF.  This causes
the "cat" process to hang.

Note that both "cat" and "print" are built-ins in the version of ksh
being tested.  They are both extremely fast -- everything is running
in forked ksh processes, there is no exec() to run cat.

Here is a trace of the failure:

# process 246 is the "print" process -- it starts exit() here
  266  567270 [main] ksh 246 do_exit: do_exit (0)
  172  567442 [main] ksh 246 sig_dispatch_pending: pending_signals 0
  169  567611 [main] ksh 246 sig_dispatch_pending: no need to wake anything up
  168  567779 [main] ksh 246 set_sigcatchers: last 1, old 0, cur 0x6DE24420, cur 0x1
  145  567924 [main] ksh 246 void: 0x6DE24420 = signal (20, 0x1)
  141  568065 [main] ksh 246 sig_dispatch_pending: pending_signals 0
  176  568241 [main] ksh 246 sig_dispatch_pending: no need to wake anything up
  180  568421 [main] ksh 246 void: 0x0 = signal (1, 0x1)
  184  568605 [main] ksh 246 sig_dispatch_pending: pending_signals 0
  154  568759 [main] ksh 246 sig_dispatch_pending: no need to wake anything up
  141  568900 [main] ksh 246 void: 0x1 = signal (2, 0x1)
  155  569055 [main] ksh 246 sig_dispatch_pending: pending_signals 0
  134  569189 [main] ksh 246 sig_dispatch_pending: no need to wake anything up
  175  569364 [main] ksh 246 void: 0x1 = signal (3, 0x1)
  175  569539 [main] ksh 246 fhandler_base::close: closing '/dev/null' handle 0x178
  181  569720 [main] ksh 246 fhandler_base::close: closing '/dev/pipew' handle 0xD8
  190  569910 [main] ksh 246 fhandler_base::close: closing '/dev/pipew' handle 0xE8
# process 620 is "cat" -- it gets eof here
 5788  569923 [read_pipe] ksh 620 fhandler_base::read: returning 0, binary mode
  215  570125 [main] ksh 246 fhandler_base::close: closing '/var/run/utmp' handle 0x158
  250  570375 [main] ksh 246 fhandler_base::close: closing '/tmp/j2' handle 0x130
  189  570564 [main] ksh 246 sigproc_terminate: entering
  186  570750 [main] ksh 246 proc_terminate: nchildren 0, nzombies 0
# 620 unwinding in read()
  906  570829 [main] ksh 620 readv: 0 = readv (0, 0x22E090, 1), errno 0
  241  570991 [main] ksh 246 proc_terminate: leaving
  411  571402 [main] ksh 246 __to_clock_t: dwHighDateTime 0, dwLowDateTime 468750
  198  571600 [main] ksh 246 __to_clock_t: total 00000000 0000002E
  179  571779 [main] ksh 246 __to_clock_t: dwHighDateTime 0, dwLowDateTime 312500
  159  571938 [main] ksh 246 __to_clock_t: total 00000000 0000001F
 8805  580743 [main] ksh 246 _pinfo::exit: Calling ExitProcess 0
# 620 notices that a child died and sends itself SIGCHLD
# it is never heard from after that and gdb cannot be attached
# to the process to figure out what it thinks is going on...
16429  587258 [proc] ksh 620 proc_subproc: args: 2, 0
  343  587601 [proc] ksh 620 proc_subproc: pid 246[0] terminated, handle 0x4C, nchildren 1, nzombies 0
  233  587834 [proc] ksh 620 proc_subproc: zombifying [0], pid 246, handle 0x4C, nchildren 1
  218  588052 [proc] ksh 620 proc_subproc: returning 1
  265  588317 [proc] ksh 620 sig_send: pid 620, signal 20, its_me 1
  182  588499 [proc] ksh 620 sig_send: Not waiting for sigcomplete.  its_me 1 signal 20
   10  588509 [sig] ksh 620 wait_sig: awake
  345  588854 [sig] ksh 620 wait_sig: processing signal 20
  184  589038 [sig] ksh 620 wait_sig: Got signal 20
  163  589201 [sig] ksh 620 sig_handle: signal 20
  157  589358 [sig] ksh 620 sig_handle: signal 20, about to call 0x6DE24420
 -305  589053 [proc] ksh 620 sig_send: returning 0 from sending signal 20
  636  589689 [proc] ksh 620 wait_subproc: looping
# last message from 620
    3  589692 [sig] ksh 620 setup_handler: suspending mainthread

The "suspending mainthread" message is suspicious -- a test that does not hang
does not emit that message.  It goes something like this:

  402  527314 [proc] ksh 687 proc_subproc: args: 2, 0
  -11  527303 [main] ksh 687 geterrno_from_win_error: windows error 2 == errno 2
  312  527615 [proc] ksh 687 proc_subproc: pid 688[0] terminated, handle 0x100, nchildren 1, nzombies 0
  386  528001 [proc] ksh 687 proc_subproc: zombifying [0], pid 688, handle 0x100, nchildren 1
  239  528240 [proc] ksh 687 proc_subproc: returning 1
  255  528495 [proc] ksh 687 sig_send: pid 687, signal 20, its_me 1
  462  528957 [proc] ksh 687 sig_send: Not waiting for sigcomplete.  its_me 1 signal 20
   19  528976 [sig] ksh 687 wait_sig: awake
  414  529390 [sig] ksh 687 wait_sig: processing signal 20
  170  529560 [sig] ksh 687 wait_sig: Got signal 20
  180  529740 [sig] ksh 687 sig_handle: signal 20
    7  529747 [main] ksh 687 symlink_info::check: not a symlink
  346  530093 [main] ksh 687 symlink_info::check: 0 = symlink.check (q:\.hekimian.com\cygwin-shared\bin\cat.exe, 0x22D2C0) (0xA)
 -166  529927 [sig] ksh 687 sig_handle: signal 20, about to call 0x6DE24420
  598  530525 [main] ksh 687 path_conv::check: root_dir(q:\), this->path(q:\.hekimian.com\cygwin-shared\bin\cat.exe), set_has_acls(0)
  288  530813 [proc] ksh 687 sig_send: returning 0 from sending signal 20
  239  531052 [proc] ksh 687 wait_subproc: looping
# get this message instead of "suspending mainthread"
    7  531059 [sig] ksh 687 interruptible: pc 0x610DE709, h 0x61000000, interruptible 0, testvalid 0
  602  531661 [sig] ksh 687 interruptible: pc 0x6105FB7C, h 0x61000000, interruptible 0, testvalid 0
  514  532175 [sig] ksh 687 interruptible: pc 0x65B9063F, h 0x65B80000, interruptible 1, testvalid 0
  468  532643 [sig] ksh 687 proc_subproc: args: 3, 1
  310  532953 [sig] ksh 687 proc_subproc: clear waiting threads
  282  533235 [sig] ksh 687 proc_subproc: finished clearing
  263  533498 [sig] ksh 687 proc_subproc: returning 1
  271  533769 [sig] ksh 687 interrupt_setup: armed signal_arrived 0x224, res 1
  284  534053 [sig] ksh 687 setup_handler: signal successfully delivered
  463  534516 [sig] ksh 687 setup_handler: returning 1

Any expert advice appreciated -- the Cygwin code in this area is complex.
--
Joe Buehler


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]