This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: How to make child of failed fork exit cleanly? (solved)


On 03/05/2011 2:49 PM, Ryan Johnson wrote:
Very strangely, when every child dies (including those automatically respawned by Windows), the parent also seg faults when calling gcc_deregister_frame on the same dll! If even one child survives (even if many had previously crashed), then no error arises. Even more strangely, if I break into a first child which has a good layout (no previous failures, current fork will succeed) and delay it long enough that the parent times out, the parent still suffers the seg fault! What shared state is there that could cause this to happen?

Disabling dll finalization completely when in_forkee==1 gets rid of the above problem, but occasionally I'll get a new error in the child:

CloseHandle(pinfo_shared_handle<0x610031BF>) failed void pinfo::release():1040, Win32 error 6
110356 [main] fork 10556 fork: child -1 - died waiting for longjmp before initialization, retry 0, exit code 0x100, errno 11


Sometimes, when the child dies as above, the parent will again seg fault while deregistering a dll (but not always).

Eureka!

Turns out that the pinfo class constructor was empty, leaving its fields uninitialized. In particular, pinfo::destroy and pinfo::procinfo were highly likely to both contain non-zero garbage values. Later, a call to pinfo::init() is supposed to initialize both. However, as the fork error says, the child "died... before initialization," causing the parent to jump to cleanup and run pinfo::~pinfo ()... which tries to release() garbage. That's why the bug doesn't arise if even one child makes it past this point -- pinfo::init would then be called and the destructor would do the right thing.

The problem would have bitten folks off and on before, but my added fail-fast code path makes forks that were going to fail usually do so "before initialization."

The fix is easy, at least (pinfo.h):
-  pinfo () {}
+  pinfo () : procinfo(NULL), destroy(false) {}
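
To illustrate why the one-line change matters, here is a minimal standalone sketch, not the actual Cygwin code. The names pinfo_sketch, proc_record, release_calls and simulate_fork are hypothetical stand-ins; only the two fields procinfo and destroy and the init/release/destructor sequence come from the description above. With the initializing constructor, the destructor is harmless even when init() never runs:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-process record that pinfo points at.
struct proc_record { int pid; };

// Counts how often release() actually had something to release.
static int release_calls = 0;

// Simplified model of the pinfo class (assumption: the real class has
// many more members; only procinfo and destroy are modeled here).
class pinfo_sketch
{
  proc_record *procinfo;
  bool destroy;

public:
  // The fix: give both fields known values. With an empty constructor
  // they would hold whatever garbage happened to be on the stack.
  pinfo_sketch () : procinfo (NULL), destroy (false) {}

  // Normally called later; a child that dies "before initialization"
  // never gets this far.
  void init (int pid)
  {
    procinfo = new proc_record ();
    procinfo->pid = pid;
    destroy = true;
  }

  void release ()
  {
    if (!procinfo)
      return;                 // nothing was ever set up: safe no-op
    ++release_calls;
    if (destroy)
      delete procinfo;
    procinfo = NULL;
  }

  // Runs on every exit path, including the early-failure one.
  ~pinfo_sketch () { release (); }
};

// Models one fork attempt as seen from the parent's cleanup path.
bool simulate_fork (bool child_reaches_init)
{
  pinfo_sketch child;
  if (!child_reaches_init)
    return false;             // destructor runs on a never-initialized object
  child.init (1234);
  return true;
}
```

Calling simulate_fork (false) takes the early-failure path: the destructor sees procinfo == NULL and does nothing. With an empty constructor the same path would read indeterminate values, which is undefined behavior and matches the intermittent faults described above.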

At this point, the only thing left -- besides cleaning up my fork handling code changes to make a patch -- is to verify that it's ok to not run any dll finalizers in the child if the fork fails. Empirically it seems to do the right thing (child processes no longer fault), but I don't know enough about the code base to say with confidence that no corner cases exist.

Ryan

