This is the mail archive of the cygwin mailing list for the Cygwin project.



Race condition spawning childs/pipe stuff?


Hello,

I seem to be hitting a race condition when running large recursive build
processes (make).
Occasionally the build hangs with a spawned child (sh.exe) consuming
100% user CPU.
It seems the build command itself (the spawned make) has finished, but
the child/parent? shell doesn't exit.

When I kill sh.exe manually, the (recursive) build process continues and
finishes.
I suspect a race condition somewhere in the pipe handling.

I cannot reproduce the condition on demand.

Cygwin DLL: 1.5.19, API version: 0.138, build date: 2005-10-03 13:32

I attached gdb to the process and examined its threads:

----------- snip ----
$ ./gdb
GNU gdb 6.3.50.20050926
....
(gdb) attach 3048
Attaching to process 3048

[Switching to thread 3048.0xca8]
(gdb) info threads
* 3 thread 3048.0xca8  0x7c911231 in ntdll!DbgUiConnectToDbg () from /cygdrive/c/WINDOWS/system32/ntdll.dll
  2 thread 3048.0xd04  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
  1 thread 3048.0xf90  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll

(gdb) thread 1
[Switching to thread 1 (thread 3048.0xf90)]#0  0x7c91eb94 in ntdll!LdrAccessResource ()
   from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#1  0x7c91ea53 in ntdll!ZwYieldExecution () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#2  0x7c81e956 in SwitchToThread () from /cygdrive/c/WINDOWS/system32/kernel32.dll
#3  0x61054215 in low_priority_sleep (secs=0) at /netrel/src/cygwin-snapshot-20051003-1/winsup/cygwin/miscfuncs.cc:245
#4  0xfffffffe in ?? ()

(gdb) thread 2
[Switching to thread 2 (thread 3048.0xd04)]#0  0x7c91eb94 in ntdll!LdrAccessResource ()
   from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0  0x7c91eb94 in ntdll!LdrAccessResource () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#1  0x7c91e288 in ntdll!ZwReadFile () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#2  0x7c801875 in ReadFile () from /cygdrive/c/WINDOWS/system32/kernel32.dll
#3  0x0000074c in ?? ()

(gdb) thread 3
[Switching to thread 3 (thread 3048.0xca8)]#0  0x7c911231 in ntdll!DbgUiConnectToDbg ()
   from /cygdrive/c/WINDOWS/system32/ntdll.dll
(gdb) bt
#0  0x7c911231 in ntdll!DbgUiConnectToDbg () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#1  0x7c9607a8 in ntdll!KiIntSystemCall () from /cygdrive/c/WINDOWS/system32/ntdll.dll
#2  0x00000005 in ?? ()

(gdb) q
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: , Pid 3048

----------- snip ----

Thread 1 seems to be the one eating the CPU.
gdb doesn't reveal much more, so I used my favorite Win32 user-mode
debugger, OllyDbg:

----------- snip ----

Threads
Ident      Entry      Data block   Last error                   Status   Priority   User time     System time
00000388   7C96077B   7FFDD000     ERROR_SUCCESS (00000000)     Active   32 + 0       0.0000 s      0.0000 s
00000D04   7C810856   7FFDE000     ERROR_SUCCESS (00000000)     Active   32 + 0       0.0000 s      0.0000 s
00000F90   00000000   7FFDF000     ERROR_SUCCESS (00000000)     Active   32 + 0      52.8437 s     94.5156 s

----------- snip ----

As you can see, the main thread (0xf90) is consuming all the CPU.
I examined the call stack and used gdb's "l/info" commands to resolve
the symbols (I have the appropriate .dbg file).
I added the symbols manually as comments, in the form "(xxxx)":

----------- snip ----
Call stack of main thread
Address    Stack      Procedure                                                   Called from                                            Frame
0022DD84   7C91EA53   Includes ntdll.KiFastSystemCallRet                          ntdll.7C91EA51
0022DD88   7C81E956   ntdll.ZwYieldExecution                                      kernel32.7C81E950
0022DD8C   61054215   cygwin1.610F5138                                            cygwin1.61054210 (low_priority_sleep + 80)
0022DDAC   6106DF57   cygwin1.610541C0 (low_priority_sleep, miscfuncs.cc:230)     cygwin1.6106DF52 (_pinfo::sync_proc_pipe() + 34)
0022DDBC   61095984   cygwin1.6106DF30 (_pinfo::sync_proc_pipe(), pinfo.cc:977)   cygwin1.6109597F (spawn_guts(char const* ...) + 5263)
0022E99C   61095E35   ? cygwin1.610944F0 (spawn_guts(char const* ...), spawn.cc)  cygwin1.61095E30 (spawnve + 224)                       0022E998
0022E9CC   610188AB   cygwin1.61095D50 (spawnve)                                  cygwin1.610188A6 (execve + 38)                         0022E9C8

----------- snip ----

I searched the current cygwin sources and found following snippets ...

----- snip spawn.cc ----

static int __stdcall
spawn_guts (const char * prog_arg, const char *const *argv,
            const char *const envp[], int mode)
{

...
  /* If wr_proc_pipe doesn't exist then this process was not started by a cygwin
     process.  So, we need to wait around until the process we've just "execed"
     dies.  Use our own wait facility to wait for our own pid to exit (there
     is some minor special case code in proc_waiter and friends to accommodate
     this).

     If wr_proc_pipe exists, then it should be duplicated to the child.
     If the child has exited already, that's ok.  The parent will pick up
     on this fact when we exit.  dup_proc_pipe will close our end of the pipe.
     Note that wr_proc_pipe may also be == INVALID_HANDLE_VALUE.  That will make
     dup_proc_pipe essentially a no-op.  */
      if (!newargv.win16_exe && myself->wr_proc_pipe)
        {
          myself->sync_proc_pipe ();        /* Make sure that we own wr_proc_pipe
                                               just in case we've been previously
                                               execed. */
          myself.zap_cwd ();
          myself->dup_proc_pipe (pi.hProcess);
        }

----- snip pinfo.cc ----

void
_pinfo::sync_proc_pipe ()
{
  if (wr_proc_pipe && wr_proc_pipe != INVALID_HANDLE_VALUE)
    while (wr_proc_pipe_owner != GetCurrentProcessId ())
      low_priority_sleep (0);
}

---------------------------

It seems sync_proc_pipe is looping forever: the loop condition
"wr_proc_pipe_owner != GetCurrentProcessId ()" holds on entry and never
becomes false, so the loop is never left.
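
To illustrate the suspected failure mode, here is a minimal standalone
sketch. It is not Cygwin code. It assumes the wait in sync_proc_pipe
behaves like a yield-only spin loop; the names pipe_owner and
wait_for_ownership and the timeout parameter are hypothetical, introduced
only for illustration. A bounded variant of such a loop would report a
missing ownership handoff instead of spinning at 100% CPU:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the shared wr_proc_pipe_owner field.
std::atomic<unsigned long> pipe_owner{0};

// Bounded variant of the wait loop.  It yields on every iteration, much
// like a zero-length sleep, but gives up once `timeout` has elapsed.
// Returns true if ownership arrived, false if the wait timed out.
bool
wait_for_ownership (unsigned long my_pid, std::chrono::milliseconds timeout)
{
  assert (timeout.count () >= 0);
  const auto deadline = std::chrono::steady_clock::now () + timeout;
  while (pipe_owner.load () != my_pid)
    {
      if (std::chrono::steady_clock::now () >= deadline)
        return false;              // the handoff never happened
      std::this_thread::yield ();  // give up the rest of the time slice
    }
  return true;
}
```

If nothing ever stores the caller's id into pipe_owner, this function
returns false once the timeout expires. An unbounded loop in the same
situation keeps yielding forever, which would be consistent with the CPU
times shown for the main thread above.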

I have updated the Cygwin core several times, but the problem persists.

What gives?

Regards,

Robert Michelsen
--


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/

