This is the mail archive of the cygwin mailing list for the Cygwin project.



RE: Shells hang during script execution


>>Here's a description of a second hang condition we were encountering, along 
>>with a patch for it.
>>
>>
>>The application (pdksh in this case) does a read on a pipe, which eventually 
>>calls pipe.cc fhandler_pipe::read in Thread 1.  This creates a new cygthread 
>>with "read_pipe()" as the function.  Then it calls th->detach(read_state).
>>
>>When the hang occurs, the new thread gets terminated early, before
>>cygthread::stub() can call "callfunc()".  You see the error message
>>"erroneous thread activation".  I'm not sure what's causing the thread
>>to fail activation, but the result is, the read_state semaphore never
>>gets signalled.
>
>Sorry but this is another band-aid around a problem.  The real problem
>is that the code shouldn't get into the state that you are describing.
>That's why cygwin prints an error message - it is a serious problem.
>Making the code deal gracefully with a problem like this isn't going
>to solve the underlying issue.
>
>If you can figure out what's causing the erroneous thread activation
>then that will be the real culprit.
>
>cgf
>

OK, I believe I've tracked this down.

The problem occurs when we get into a read_pipe cygthread constructor (cygthread::cygthread()) with a NULL h and an ev that is signalled.  When this condition exists, a hang can occur as follows:

1) Creator thread calls detach().  This waits for read_state to be released twice.
2) read_pipe thread calls read_pipe, reads data, and releases the semaphore twice.
3) Creator thread goes to WFSO(*this, INFINITE) which returns immediately because ev was set when the thread was created.
4) Creator thread initiates another read_pipe cygthread to read more pipe data.

At this point, there's a race: if the Creator thread gets past the initialization part of the constructor, which sets __name(name), BEFORE the original read_pipe thread gets to the part of cygthread::stub() that sets info->__name = NULL, then you'll see the hang.  The new read_pipe thread will give the "erroneous thread activation" message, and the parent will be stuck waiting for data that will never arrive.

The only path that leaves an unused thread structure in a state where h==NULL and ev is signalled is cygthread::release().  So the fix is simple:

$ cat cygthread.cc.udiff
--- cygthread.cc.ORIG   2006-02-22 10:57:42.123931300 -0500
+++ cygthread.cc        2006-03-01 12:59:23.255023000 -0500
@@ -268,7 +268,12 @@
 cygthread::release (bool nuke_h)
 {
   if (nuke_h)
+    {
     h = NULL;
+
+    if (ev)
+      ResetEvent (ev);
+    }
 #ifdef DEBUGGING
   __oldname = __name;
   debug_printf ("released thread '%s'", __oldname);
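
For anyone who wants to see the stale-event effect in isolation, here is a rough, portable sketch.  It is not Cygwin code: ManualResetEvent, Slot and next_wait_returns_early are hypothetical names of mine, and I'm assuming ev behaves like an event that stays signalled until someone resets it.  It only shows that a slot released with its event still signalled makes the next wait return at once, and that resetting the event on release avoids that.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an event that stays signalled until it is
// explicitly reset (an assumption about how ev behaves).
class ManualResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signalled_ = true;
        }
        cv_.notify_all();
    }

    void reset() {
        std::lock_guard<std::mutex> lock(m_);
        signalled_ = false;
    }

    // Returns true if the event is (or becomes) signalled within the timeout.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return signalled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signalled_ = false;
};

// Hypothetical reusable thread slot, loosely modelled on the h/ev pair.
struct Slot {
    bool has_handle = false;  // stands in for h != NULL
    ManualResetEvent ev;

    // Mirrors the unpatched behaviour: h is cleared, ev is left alone.
    void release_without_reset() { has_handle = false; }

    // Mirrors the patched behaviour: h is cleared and ev is reset.
    void release_with_reset() {
        has_handle = false;
        ev.reset();
    }
};

// Simulates one use of a slot followed by a release, then reports whether
// the next user's wait returns early even though nothing new signalled ev.
bool next_wait_returns_early(bool reset_on_release) {
    Slot slot;
    slot.has_handle = true;
    slot.ev.set();  // the previous worker signalled the event

    if (reset_on_release)
        slot.release_with_reset();
    else
        slot.release_without_reset();

    // The next user waits on the reused slot; no one signals it again here.
    return slot.ev.wait_for(std::chrono::milliseconds(50));
}
```

In this model next_wait_returns_early(false) reports an early return, which corresponds to step 3 above, while next_wait_returns_early(true) waits out the timeout, which is the behaviour the patch is after.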



