This is the mail archive of the
cygwin-patches
mailing list for the Cygwin project.
Re: Improvements to fork handling (1/5)
On Sun, May 22, 2011 at 09:04:40AM -0400, Ryan Johnson wrote:
>On 21/05/2011 9:41 PM, Christopher Faylor wrote:
>> On Wed, May 11, 2011 at 02:31:23PM -0400, Ryan Johnson wrote:
>>> Hi all,
>>>
>>> This is the first of a series of patches, sent in separate emails as
>>> requested.
>>>
>>> The first patch allows a child which failed due to address space
>>> clobbers to report cleanly back to the parent. As a result, DLL_LINK
>>> which land wrong, DLL_LOAD whose space gets clobbered, and failure to
>>> replicate the cygheap, generate retries and dispense with the terminal
>>> spam. Handling of unexpected errors should not have changed. Further,
>>> the patch fixes several sources of access violations and crashes,
>>> including:
>>> - accessing invalid state after failing to notice that a
>>> statically-linked dll loaded at the wrong location
>>> - accessing invalid state while running dtors on a failed forkee. I
>>> follow cgf's approach of simply not running any dtors, based on the
>>> observation that dlls in the parent (gcc_s!) can store state about other
>>> dlls and crash trying to access that state in the child, even if they
>>> appeared to map properly in both processes.
>>> - attempting to generate a stack trace when somebody in the call chain
>>> used alloca(). This one is only sidestepped here, because we eliminate
>>> the access violations and api_fatal calls which would have triggered the
>>> problematic stack traces. I have a separate patch which allows offending
>>> functions to disable stack traces, if folks are interested, but it was
>>> kind of noisy so I left it out for now (cygwin uses alloca pretty
>>> liberally!).
>>>
>>> Ryan
>>> diff --git a/child_info.h b/child_info.h
>>> --- a/child_info.h
>>> +++ b/child_info.h
>>> @@ -92,6 +92,18 @@ public:
>>> void alloc_stack_hard_way (volatile char *);
>>> };
>>>
>>> +/* Several well-known problems can prevent us from patching up a
>>> + forkee; when such errors arise the child should exit cleanly (with
>>> + a failure code for the parent) rather than dumping stack. */
>>> +#define fork_api_fatal(fmt, args...) \
>>> + do \
>>> + { \
>>> + sigproc_printf (fmt,## args); \
>>> + fork_info->handle_failure (-1); \
>>> + } \
>>> + while(0)
>>> +
>>> +
>>> class fhandler_base;
>>>
>>> class cygheap_exec_info
>>> diff --git a/dll_init.cc b/dll_init.cc
>>> --- a/dll_init.cc
>>> +++ b/dll_init.cc
>>> @@ -19,6 +19,7 @@ details. */
>>> #include "dtable.h"
>>> #include "cygheap.h"
>>> #include "pinfo.h"
>>> +#include "child_info.h"
>>> #include "cygtls.h"
>>> #include "exception.h"
>>> #include<wchar.h>
>>> @@ -131,10 +132,16 @@ dll_list::alloc (HINSTANCE h, per_proces
>>> {
>>> if (!in_forkee)
>>> d->count++; /* Yes. Bump the usage count. */
>>> + else if (d->handle != h)
>>> + fork_api_fatal ("Location of %W changed from %p (parent) to %p (child)",
>>> + d->name, d->handle, h);
>> You seem to be guranteeing a failure in a condition which could conceivably work
>> ok for simple applications, i.e., if a dll loads in a different location that
>> is not necessarily going to cause a problem.
>By fork semantics the condition *is* a failure. If we try to relax the
>requirement we risk Bad Things happening, usually in hard-to-diagnose
>ways. The example I have right off is libgcc_s storing pointers to other
>dlls and seg faulting when it tries to access pointers which were valid
>in the parent but not in the child. I prefer a fail-fast solution over
>cross-fingers-and-hope-it-doesn't-happen-to-me.
When you add a failure case like this you are assuming that you
understand all of the parameters and that it will make things better. I
am not convinced that this won't cause previously working cases to fail.
It is not inconceivable that a DLL could be relocated into another
location and continue to work in a forked process. Yes, I know this
doesn't match the way fork is supposed to work but I'm not as concerned
about that as I am about Cygwin mailing list complaints about new
failures.
The reason I'm objecting to this is because I've considered, from time
to time, adding a similar test but have always avoided it because I
couldn't convince myself that it would help more than it would hurt. If
we are going to add tests, I'd prefer that the testing be done in
frok:parent when the child_copy happens for static and dynamic dlls,
maybe by adding a dll function which first checks that the data/bss can
be copied to the same location as the parent.
>As it is, I'm pretty nervous that Bad Things could happen at some point
>with windows dlls mapping to the wrong location (we detect that only
>when they clobber something cygwin needs), but we can hope that few apps
>which fork() are also heavy windows dlls.... and cross our fingers.
>
>>> d->p = p;
>>> }
>>> else
>>> {
>>> + if (in_forkee)
>>> + system_printf ("Unexpected dll loaded during fork: %W", name);
>>> +
>>> /* FIXME: Change this to new at some point. */
>>> d = (dll *) cmalloc (HEAP_2_DLL, sizeof (*d) + (namelen * sizeof (*name)));
>>>
>>> @@ -371,7 +378,6 @@ dll_list::load_after_fork (HANDLE parent
>>> preferred_block = reserve_at (d->name, (DWORD) h);
>>>
>>> }
>>> - in_forkee = false;
>>> }
>>>
>>> struct dllcrt0_info
>>> diff --git a/dll_init.h b/dll_init.h
>>> --- a/dll_init.h
>>> +++ b/dll_init.h
>>> @@ -57,7 +57,7 @@ struct dll
>>> int init ();
>>> void run_dtors ()
>>> {
>>> - if (has_dtors)
>>> + if (has_dtors&& !in_forkee)
>>> {
>>> has_dtors = 0;
>>> p.run_dtors ();
>> Isn't this already handled in dll_init.cc?
>Yes. I didn't notice you had checked that into CVS (you hadn't yet the
>time I did look). However, the above does have the advantage of residing
>in one location rather than 2+.
My change was dated 2011/05/04. That predates your patch by a week.
It's not an advantage to dive into the function multiple times when you
can short-circuit repeated attempts to call it once.
cgf