This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Incorrect order of static dtors in DLL CRT?


On Sun, Aug 03, 2008 at 11:58:21PM +0100, Dave Korn wrote:
>
>    Evening all,
>
>  I've learnt everything there is to know about pretty much everything that
>can possibly go wrong in the world of Dwarf-2 EH over the past little while,
>but for the purposes of this discussion only a few facts are germane:
>
>[ and if you don't want to know, page down a couple of times to the obvious
>break. ]
>
>
>
>-  Dwarf-2 EH tables are linked into the runtime exception handling
>mechanism at startup by using static ctors; the main exe and all the shared
>libs have one static .ctors entry each that points to a thunk that calls
>__register_frame_info with a pointer to that module's exception data.
>
>-  Similarly, the main exe and all the libs each have a .dtors entry that
>points to a thunk that loads a pointer to their exception data and calls
>__deregister_frame_info.
>
>-  When you throw an exception, the first bit of the stack it's going to
>need to unwind is its own frame in _Unwind_RaiseException, because it's
>got to work from there back up to the user code.
>_Unwind_RaiseException is part of shared libgcc, and so the information
>necessary for it to be able to unwind its way back to the user's code is
>part of the shared libgcc's EH tables.
>
>-  So: if you throw an exception during the final stages of cleanup, after
>the shared libgcc's dtors have run and deregistered the table with all the
>EH frame info for the shared libgcc dll, it isn't possible to unwind the
>stack and throwing fails; the application aborts.
>
>-  This shouldn't be a problem, since anything which might possibly throw at
>shutdown time - the exe, and other C++-using dlls (e.g.: shared libstdc++) -
>all depend on shared libgcc, so it'll be the last thing to get unloaded.
>
>-  Libstdc++ does, indeed, throw exceptions during shutdown.
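The registration rule in the bullets above can be modelled in a few lines. This is a minimal sketch, not libgcc's code: registry, register_module, deregister_module, is_registered and can_unwind_from are hypothetical names, and modules are identified by DLL name rather than by the pointer to EH data that __register_frame_info actually receives. It encodes the one rule that matters here: a throw needs both the thrower's tables and shared libgcc's tables (which cover _Unwind_RaiseException) to still be registered.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the EH frame registry kept by shared libgcc.
// The real calls take a pointer to a module's EH data; here a module is
// identified only by its DLL/EXE name.
static std::vector<std::string> registry;

// What each module's single .ctors thunk does at startup.
void register_module(const std::string& name) { registry.push_back(name); }

// What each module's single .dtors thunk does at shutdown.
void deregister_module(const std::string& name) {
  registry.erase(std::remove(registry.begin(), registry.end(), name),
                 registry.end());
}

bool is_registered(const std::string& name) {
  return std::find(registry.begin(), registry.end(), name) != registry.end();
}

// Unwinding starts inside _Unwind_RaiseException, which lives in shared
// libgcc, so libgcc's tables must be registered as well as the thrower's.
bool can_unwind_from(const std::string& name) {
  return is_registered("cyggcc_s.dll") && is_registered(name);
}
```

Registering cyggcc_s.dll, cygstdc++-6.dll and main.exe, then deregistering cyggcc_s.dll, leaves can_unwind_from("main.exe") false even though main.exe's own tables are still registered.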
>
>
>  That's how it ought to be, but that's not quite what happens.  I ran a
>testcase (27_io/objects/char/6.cc) under gdb, with breakpoints on
>__register_frame_info and __deregister_frame_info; whenever one hit, I checked
>the backtrace to see which module's static ctors or dtors were being called.
>What I saw was this:
>
>----------------------------<snip>----------------------------
>Breakpoint 11, 0x63546af3 in cyggcc_s!__register_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#0  0x63546af3 in cyggcc_s!__register_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#1  0x63541041 in ?? ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>
>Continuing.
>
>Breakpoint 11, 0x63546af3 in cyggcc_s!__register_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#0  0x63546af3 in cyggcc_s!__register_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#1  0x6c481041 in ?? ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cygstdc++-6.dll
>
>Continuing.
>
>Breakpoint 11, 0x63546af3 in cyggcc_s!__register_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#0  0x63546af3 in cyggcc_s!__register_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#1  0x00401091 in __gcc_register_frame ()
>----------------------------<snip>----------------------------
>
>  So, that's the static ctors for the shared libgcc, then libstdc++, then
>the main exe, all being called in the correct order of dependency.  Lots of
>detail snipped (full log available if wanted), but the main thing it would
>show you is that those backtraces originate in per_module::run_ctors for
>the two dlls, and in do_global_ctors for the main exe.
>
>  After main exits, though, we start to see the shutdown sequence:
>
>----------------------------<snip>----------------------------
>Breakpoint 13, 0x63547c63 in cyggcc_s!__deregister_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#0  0x63547c63 in cyggcc_s!__deregister_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#1  0x63541099 in ?? ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>
>Continuing.
>
>Breakpoint 13, 0x63547c63 in cyggcc_s!__deregister_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#0  0x63547c63 in cyggcc_s!__deregister_frame_info ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cyggcc_s.dll
>#1  0x6c481099 in ?? ()
>   from /win/i/FSF-Gcc/release/gcc4-4.3.0-1/inst/usr/bin/cygstdc++-6.dll
>
>Breakpoint 6, __static_initialization_and_destruction_0 (__initialize_p=0,
>    __priority=65535)
>    at /gnu/gcc/release/gcc4-4.3.0-1/src/gcc-4.3.0/libstdc++-v3/testsuite/27_io/objects/char/6.cc:60
>60	}
>----------------------------<snip>----------------------------
>
>  Ah, whoops.  That last breakpoint isn't like the others.  That's the
>static dtors for the main exe, sure enough, but it's not got as far as
>calling __deregister_frame_info for the main exe's EH tables, and it's not
>going to, because before it does that it's going to try and throw an
>exception.  And we saw libgcc deregistering the EH data for
>_Unwind_RaiseException just earlier, so it's going to blow up.
>
>  The actual cause of the exception, as it happens, is libstdc++: it
>instantiates a static object in the program's data space, which is used to
>throw an exception as part of the iostream cleanup.  I don't understand this
>mechanism or why it works the way it does, but that's OK; it's trying to
>throw, is all that I need to understand, and even if libstdc++ wasn't doing
>it, there's no reason in principle why the main exe couldn't be doing it
>anyway; it's supposed to work, even at this late stage of the proceedings.
>
>  So, the thing that went wrong there was that all the dtors got called in
>the completely wrong order.  They were called in the same sequence as the
>ctors - libgcc, then libstdc++, then main.exe.  That's the wrong order of
>course, it should have been the other way round, and then everything would
>have worked fine - main's dtors would have been called first, destroyed the
>static libstdc++ iostream object, thrown and caught the exception, then
>unregistered main's exception tables and exited; then libstdc++ would have
>unregistered its own EH tables, and finally libgcc would have deregistered the
>critical EH table containing _Unwind_RaiseException and everyone could have
>gone home happily.
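The argument above boils down to last-in, first-out: dtors should run in the reverse of the ctor order. Here is a hedged sketch that replays both orders through a simplified model; shutdown_succeeds and ctor_order are hypothetical, and the model assumes, as in the trace, that a module's static dtors throw before its .dtors thunk deregisters that module's tables.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the shutdown sequence: modules run their static
// dtors in dtor_order. The thrower's dtors throw before its own tables are
// deregistered; the throw only works if shared libgcc's tables (needed by
// _Unwind_RaiseException) are still registered at that moment.
bool shutdown_succeeds(const std::vector<std::string>& dtor_order,
                       const std::string& thrower) {
  // Every module registered its tables during startup.
  std::set<std::string> registered(dtor_order.begin(), dtor_order.end());
  for (const std::string& name : dtor_order) {
    if (name == thrower && registered.count("cyggcc_s.dll") == 0)
      return false;         // cannot unwind: the application would abort
    registered.erase(name);  // the module's .dtors thunk deregisters it
  }
  return true;
}

// Startup (ctor) order as seen in the gdb trace: dependencies first.
const std::vector<std::string> ctor_order = {"cyggcc_s.dll",
                                             "cygstdc++-6.dll", "main.exe"};
```

With the ctor order from the gdb trace, shutdown_succeeds(ctor_order, "main.exe") returns false; the same call on the reversed order returns true.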
>
>
>
>
>
>
>
>
>
>
>[  END OF LONG EXPOSITION   -   START OF PART TWO  -  IF YOU WANT TO GO GET
>A CUP OF TEA NOW MIGHT BE A GOOD TIME!  ]
>
>
>
>
>
>
>
>
>  I've identified two reasons why this happens.  The first is this snippet
>from dcrt0.cc:
>
  extern "C" void
  cygwin_exit (int n)
  {
    dll_global_dtors ();
    if (atexit_lock)
      atexit_lock.acquire ();
    exit (n);
  }
>
>  It calls dll_global_dtors, which, as the name suggests, invokes all the
>global dtors for the application's dlls - but this is too soon; it's before
>the application's dtors have been called.  They'll be called shortly, when
>this function hands off to exit() from newlib, which in turn runs the
>atexit() list, which in turn calls the main static dtors for the exe.
>
>  As far as I know the app should always be destroyed before the libs are
>destroyed and unloaded, since it's completely reasonable for the dtors of
>static objects the app defines to still be using those libs, and any
>resources the libs allocated.  So calling dll_global_dtors before exit is,
>AFAICT, just never going to be correct.
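Within a single scope, C++ already guarantees the rule argued for here: objects are destroyed in the reverse order of their construction, so something built later can safely use something built earlier from its destructor. The sketch below shows that with two hypothetical classes, LibraryResource and AppObject, standing in for a DLL's resource and an exe static. The language standard has no notion of DLLs, so across module boundaries it is up to the runtime to keep the same order.

```cpp
#include <string>
#include <vector>

// Records destructor activity so the order can be inspected afterwards.
static std::vector<std::string> trace;

// Hypothetical stand-in for state owned by a library DLL.
struct LibraryResource {
  bool alive = true;
  ~LibraryResource() {
    alive = false;
    trace.push_back("library destroyed");
  }
};

// Hypothetical stand-in for an exe static whose dtor still uses the library.
struct AppObject {
  LibraryResource& lib;
  explicit AppObject(LibraryResource& l) : lib(l) {}
  // Safe only because lib is still alive here; touching lib after its
  // destructor had run would be undefined behaviour.
  ~AppObject() {
    if (lib.alive) trace.push_back("app dtor used library");
  }
};

void run_scope() {
  LibraryResource lib;  // constructed first, like a DLL's statics
  AppObject app(lib);   // constructed second, like the exe's statics
}  // destroyed in reverse: app first, then lib
```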
>
>  That's ok!  Because it turns out you can just delete that line, and you
>still get saved by this snippet:
>
  void __stdcall
  do_exit (int status)
  {
    syscall_printf ("do_exit (%d), exit_state %d", status, exit_state);

  #ifdef NEWVFORK
      [ ... elided ... ]
  #endif

    lock_process until_exit (true);

    if (exit_state < ES_GLOBAL_DTORS)
      {
        exit_state = ES_GLOBAL_DTORS;
        dll_global_dtors ();
      }
>
>which gets called during proper shutdown, *after* the main.exe's dtors have
>run to completion.  It's also worth observing that dll_global_dtors is
>idempotent: it uses a runonce guard, so it doesn't matter if it gets
>over-called - but it /does/ matter if it gets called too early.
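The runonce guard explains why calling too early is harmful even though extra calls are harmless: the first call does all the work, and every later call, including the one in do_exit, is a no-op. A minimal sketch, modelling the guard in the same shape as dll_global_dtors in dll_init.cc; the Shutdown struct, its order vector and its member functions are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the shutdown steps and the runonce guard.
struct Shutdown {
  // Stands in for dll_global_dtors_recorded: true while DLL dtors are pending.
  bool dll_global_dtors_recorded = true;
  std::vector<std::string> order;  // which dtors ran, in sequence

  // Same shape as the guard in dll_init.cc: only the first call does work.
  void dll_global_dtors() {
    bool recorded = dll_global_dtors_recorded;
    dll_global_dtors_recorded = false;
    if (recorded) order.push_back("dll dtors");
  }

  // exit() running the atexit() list, which calls the exe's static dtors.
  void exe_static_dtors() { order.push_back("exe dtors"); }

  // do_exit, reached only after the exe's dtors have run to completion.
  void do_exit() { dll_global_dtors(); }
};
```

Calling dll_global_dtors before the exe's dtors, as cygwin_exit does, records the DLL dtors first and turns do_exit's call into a no-op; leaving it to do_exit records the exe's dtors first, and repeat calls change nothing.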

You should check the CVS history for this stuff.  I have been tweaking
it for years and I believe I tried to remove it from cygwin_exit at one
point.  I don't remember why I put it back.  If only there was some way
to record thoughts and observations in the code itself we wouldn't have
to guess...

>  There's still a problem though: the DLLs themselves are still dtor'd in
>the same order they were c'tor'd.  It works out OK here, because neither
>libgcc nor the libstdc++ DLL want to throw any exceptions from static dtors,
>or rather, libstdc++ does, but its static dtors are part of the main.exe, so
>they've already run.  But it's still wrong in theory, and if there were a
>third C++ library in the mix, say a user-written one that depended on
>libstdc++, it might still want to throw at static dtor time and it would
>fail.
>
>  The reason the DLLs' dtors are run in the wrong order is also simple
>enough: that's what the code says to do, here, in dll_init.cc:
>
  static bool dll_global_dtors_recorded;

  /* Run destructors for all DLLs on exit. */
  void
  dll_global_dtors ()
  {
    int recorded = dll_global_dtors_recorded;
    dll_global_dtors_recorded = false;
    if (recorded)
      for (dll *d = dlls.istart (DLL_ANY); d; d = dlls.inext ())
        d->p.run_dtors ();
  }
>
>  That's just walking the chain of dlls from start to finish, invoking the
>dtors.  The chain is in dependency order; it's walked the same way at

The chain isn't really in dependency order.  It's in load order, but walking
it in reverse order is probably the right thing to do.

>  I /think/ the right things to do are to remove the dll_global_dtors
>invocation from within cygwin_exit, and to reverse the order of iteration
>along the list of dlls within dll_global_dtors.
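If the list can only be walked forward with istart/inext, one way to reverse the iteration is to collect the entries first and then walk the collection backwards. This is a sketch under assumptions: the dll struct, its singly-linked next pointer and dll_global_dtors_reversed are hypothetical stand-ins (Cygwin's real dll_list interface differs), the runonce guard is omitted, and the run_dtors callback takes the place of d->p.run_dtors (). It also assumes DLL ctors ran in list (load) order, as the trace earlier in the message suggests, so reversing the list gives last-in, first-out relative to the ctors.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal DLL record in a singly-linked, load-order list.
struct dll {
  std::string name;
  dll* next;
};

// Walk the load-order list once to collect the entries, then run dtors
// from the most recently loaded DLL back to the first one loaded.
template <typename RunDtors>
void dll_global_dtors_reversed(dll* head, RunDtors run_dtors) {
  std::vector<dll*> in_load_order;
  for (dll* d = head; d; d = d->next)
    in_load_order.push_back(d);
  for (auto it = in_load_order.rbegin(); it != in_load_order.rend(); ++it)
    run_dtors(**it);
}
```

If the real list kept back links, walking it backwards directly would avoid the temporary vector.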

I don't know if the Cygwin test suite is even working these days, but it
seems like removing the code in question (after investigating CVS) and
doing a before/after test suite run is one way to see if there will be
problems.

It sure would be nice if we had someone who wanted to write more tests
for the test suite.  You don't even have to understand cygwin internals to
do this.  You just have to know how to write C code.

Hey.  It's been years since I made that plaintive observation.  I must be
slipping.

cgf

