This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: ASLR sometimes stops working on Vista with 1.7? [was: Re: Cygwin 1.7 release (was ...)]


Corinna Vinschen wrote:
> I can reproduce the "unable to remap" on W7RC by running `cygport
> automake1.11-1.11-10 compile'.

Uhhh...I'm glad to hear that? Or not...

>  The culprit in my case is always the
> same DLL, a run-time loaded perl DLL called Cwd.dll.  Even after
> rebaseall, it still doesn't work because the Windows Loader tries to
> load the DLL into an entirely different address.

You did reboot, right? IIRC Windows only calculates the new base
address(es) for dynbase-marked DLLs loaded after a boot. Also to "turn
off" dynbase for a particular DLL, you have to reboot after removing the
dynbase flag, so that the OS "forgets" about its randomized Image Base
computed for the current boot session, when it had the +dynbase flag.

> When examining the memory layout of the parent, it stands out that
> Cwd.dll was already loaded into another address than the DLLs base
> address.  The base addr of Cwd.dll is 0x6ee00000, the end address would
> be 0x6ee08000. 

Is that the ASLR-computed Image Base, or the one reported by objdump
from the file on disk? See below.

> There's no other DLL in this memory area according to
> the memory map.  Nevertheless the DLL has been loaded into the rather
> low address 0xa00000 in the parent.  When trying to map this DLL into
> the same address in the child, it fails.

Right.  ASLR isn't putting the DLL in the correct location *in the
parent*. (See below).

> When I rebase Cwd.dll to some other address like 0x65000000, then it
> works for me.
> 
> Probably the memory at 0x6ee00000 is actually used by some Windows DLL
> at that time.  The fact that the DLL got rebased already in the parent
> is not exactly surprising, just very annoying.
> 
> I don't think that this has anything to do with ASLR.  It's not the way
> ASLR is documented to work.  Setting or resetting the ASLR flag should
> have no effect from all I can tell.  If anything, setting the ASLR
> flag in the executable should make things worse in case of fork().

ASLR works on DLLs, not EXEs IIRC. peflagsall by default doesn't even
set +dynbase on EXEs.

echo "Note: peflagsall will NOT set the dynamicbase flag on executables, nor will"
echo "      it set the tsaware flag on dlls. If you must do this, use peflags itself"


At each boot, an entirely new random starting location is computed (call
it the "uber-base" address).  Then, as each dynbase-marked DLL is first
loaded into memory for this boot session, it is "rebased" to the next
available memory location starting at the "uber-base" address.  These
new base addresses are remembered for each DLL.  In effect, the OS
"pretends" that the Image Base is the new randomized Base address;
obviously for this initial load of any given dynbase-marked DLL, the
Base address will be equal to the (random) Image Base.  Since the new
(random) Image Base is hopefully NOT the same as the actual, on-disk
Image Base, the DLL will have had relocations updated even though a
Process Viewer will report that the in-memory DLL's Base addr == its
"Image Base" -- it's just that the (new) Image Base is the ASLR-directed
lie.
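
To make the bookkeeping described above concrete, here is a minimal Python sketch of that model. It is a toy, not Windows' actual algorithm: the class name BootSessionAslr, the seed parameter, the range the "uber-base" is drawn from, and the upward "next available slot" order are all assumptions made for illustration. The only hard number is the 64 KiB allocation granularity.

```python
import random

ALLOC_GRANULARITY = 0x10000  # 64 KiB, the Windows allocation granularity


def align_up(value, alignment=ALLOC_GRANULARITY):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


class BootSessionAslr:
    """Toy model: one random start address per boot, one remembered slot per DLL."""

    def __init__(self, seed=None):
        rng = random.Random(seed)
        # Hypothetical range for the per-boot "uber-base"; real values differ.
        self.next_free = align_up(rng.randrange(0x50000000, 0x70000000))
        self.slots = {}  # DLL name -> randomized Image Base for this boot

    def image_base(self, dll_name, size):
        """Return the slot for dll_name, assigning the next free one on first load."""
        if dll_name not in self.slots:
            self.slots[dll_name] = self.next_free
            self.next_free = align_up(self.next_free + size)
        return self.slots[dll_name]


boot = BootSessionAslr(seed=1)
cwd_base = boot.image_base("Cwd.dll", 0x8000)
posix_base = boot.image_base("Posix.dll", 0x8000)
# Every later load of Cwd.dll in this "boot session" gets the same slot back.
assert boot.image_base("Cwd.dll", 0x8000) == cwd_base
```

In this model a "reboot" is simply a new BootSessionAslr object, which forgets every slot and starts from a fresh random address.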

So, *after* a reboot following marking cygwin DLLs with dynbase, IF you
have *ever* successfully loaded Cwd.dll into memory, then that "slot" --
the new "random base address" for Cwd -- is forever [*] reserved for
Cwd.dll and no other (dynbase-marked) DLL, including windows ones,
should ever be allowed by the OS to conflict with that reserved slot. At
least, until you run out of ASLR-tracked memory addrs.

You can tell what the ASLR-reserved base address for a DLL is by looking
at a running process (e.g. in Sysinternals process viewer) that has
loaded the DLL, and look at what PV reports is the Image Base. (It ought
to match the reported 'Base' address, if ASLR is working).  However,
this 'Image Base' will be *different* from what objdump reports when
examining the DLL on disk.  That's your proof that ASLR has computed a
new (random) base address for this DLL, for this boot session: the
claimed Image Base of an in-memory DLL doesn't match the
objdump-reported Image Base of the DLL on disk.
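
The on-disk half of that comparison can be sketched in a few lines of Python, reading the same ImageBase field that objdump reports. The function names read_pe_image_base and was_relocated are hypothetical; the header offsets are the standard PE/COFF ones. The in-memory base still has to come from a process viewer; this sketch does not query a running process.

```python
import struct

DYNAMIC_BASE = 0x0040  # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE


def read_pe_image_base(data):
    """Return (image_base, has_dynamicbase) parsed from the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    opt = pe_offset + 4 + 20  # skip the PE signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32: 32-bit ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 64-bit ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    (dll_chars,) = struct.unpack_from("<H", data, opt + 70)
    return image_base, bool(dll_chars & DYNAMIC_BASE)


def was_relocated(data, in_memory_base):
    """True if the base seen in the running process differs from the on-disk one."""
    on_disk_base, _ = read_pe_image_base(data)
    return on_disk_base != in_memory_base
```

For a real DLL one would pass the file contents, e.g. read_pe_image_base(open(path, "rb").read()), and feed was_relocated the Base value copied by hand from the process viewer.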

[*] at least until the next reboot.

> This is entirely the good old fork() problem trying to get the memory
> layout of the child into the same shape as in the parent.

Right -- but the problem is that ASLR is not setting up the memory
layout the way that it promised to, in the parent.

> This is really a bad problem since it seems to have gotten even worse
> with W7.

Crap.

> I think I'm going to ask MSFT if there's any workaround for
> this problem.

If my understanding of ASLR is correct, then ASLR *ought* to have solved
this problem, except for systems with a LOT of dynbase-marked DLLs that
have been loaded during the same boot session, such that you "run out"
of ASLR-tracked addresses (the ASLR mappings are shared across all
processes and persist for the entire logon session, I think -- so
you could eventually run out).

But IMO it is not working, for some reason, with the perl DLLs.  Note
that it's not always Cwd.dll.  If you reboot, rename Cwd.dll to
something else, and keep going, a few things will happen:
 1) perl won't work quite right, because the Cwd.dll really is needed by
the scripts that 'use Cwd;'
 2) ignoring that, keep going. Eventually the remap problem will hit
another perl DLL. In my case, Posix.dll.

I don't think this is the fault of these DLLs, per se.  It's just that
perl has a lot of 'em, and when repetitively running the autotools it
always uses the same set.  Plus, you tend to have a TON of individual
perl processes that run, and each time, that perl is going to dlopen
those DLLs. This ought to be where ASLR shines...but one of those times,
the DLL gets LoadLibraried to the wrong location. This is fine, until
THAT process happens to fork.  Bang, you're dead.

Could it be possible that cygwin's dlopen (or fork) implementation is
doing something that occasionally defeats ASLR, such that eventually a
perl parent process [**] dlopen's Cwd.dll at the wrong memory location?

[**] obviously this perl "parent" process was itself invoked as a
fork/exec from, say, bash, but we've long since gotten past the exec()
for perl, if we're down to dlopening DLLs needed by virtue of 'use'
statements in a particular .pl script

Hmmm...what if it's a race condition in fork/exec during a chain of
perls? Let's take a look at what happens in autoreconf...(note that
this is all supposition. I hope it is accurate, and believe it is
reasonably so, but I haven't explicitly straced the process)

perl1 -- This one was fork/exec'ed by bash. So, after fork() you have a
child process whose memory looks just like the parent bash, with all of
its DLLs.  Then, you exec() perl, which eventually causes CreateProcess
to do its thing -- the windows loader loads cygwin1.dll,
cygperl5_10.dll, and various dependent DLLs (which do NOT include any of
the little perl DLLs like Cwd).  Eventually, we get to perl's main(),
and it parses some script. First thing it does is see some 'use'
statements that (may) force it to dlopen some "little" perl DLLs like
Cwd. This works as expected. dynbase and all.

Then, suppose the script that perl1 is interpreting (autoreconf, in this
case) has a fork(). Maybe to run an OS command, or one of the other
autotools like aclocal (here's an example from autoreconf-2.63):
    xsystem ("$aclocal $flags");
xsystem is an AutoM4te function that eventually calls "system (@_)",
where system() is obviously implemented by cygperl5_10.dll.  Now, aclocal
is also a perl script.

So, deep in the bowels of perl's system() implementation, first thing
that happens is a call to cygwin's fork() implementation -- cygwin
successfully reproduces the memory layout of the autoreconf perl1,
Cwd.dll and all. But, this is in fact a different process than perl1 --
call it perl2. Then, there's an exec() in perl's system()
implementation.  Cygwin eventually figures out that aclocal is a perl
script, with a #!/bin/perl shebang, and realizes it needs to
CreateProcess perl.exe with a command line containing the original
argv[0].  In this third process (perl3), only cygwin1.dll,
cygperl5_10.dll, and dependencies are initially loaded (by the windows
Runtime Loader). NOT Cwd.dll or any of the "little" DLLs that both perl1
and perl2 have.

Then some magic [A] happens, and perl2 goes away, and cygwin magically
connects perl1 and perl3 as parent and child.

Next, perl3 starts parsing the aclocal script...and eventually hits the
'use' statements, including 'use Cwd;'  THIS time, however, when it
dlopen's Cwd.dll the library is loaded into the "wrong" address in
perl3's virtual memory layout. Now, this is not an immediate problem,
because we're not (yet) trying to "match" any existing memory layout.
The question is -- WHY does it happen? I'm wondering if somewhere in the
magic [A] above, in the perl2 process, cygwin is marking the memory in
perl3 at the location of the dlopen'ed libraries in perl1/2 as used, and
hasn't yet gotten around to realizing that those locations are, in fact,
not used by perl3?  So that when dlopen() is called, the virtual memory
map of perl3 has the memory needed by Cwd.dll (that is, the
ASLR-computed "fake" Image Base) marked as used, such that when dlopen
eventually delegates to LoadLibrary, LoadLibrary has no choice but to
find somewhere else to put it.  If so, this would be a possible problem
for ANY dll dlopened by both parent (perl1) and child (perl3).  Given
the sporadic nature, I'm wondering about a race condition in the
fork/exec [A] magic...
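
The supposition above can be written down as a small Python model. This is only an illustration of the hypothesis, not of what cygwin or LoadLibrary really does: the function names and the single fallback_base parameter are made up, and the addresses are the ones quoted earlier in this thread (0x6ee00000, size 0x8000, low address 0xa00000).

```python
def overlaps(a_start, a_size, b_start, b_size):
    """True if the half-open ranges [a, a+size) and [b, b+size) intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def load_library(used_ranges, preferred_base, size, fallback_base):
    """Place a DLL at preferred_base if that range is free, else at fallback_base.

    used_ranges is a list of (start, size) tuples; the chosen range is appended.
    Simplification: the fallback address is assumed to be free.
    """
    blocked = any(overlaps(preferred_base, size, s, n) for s, n in used_ranges)
    base = fallback_base if blocked else preferred_base
    used_ranges.append((base, size))
    return base


CWD_BASE, CWD_SIZE, LOW_ADDR = 0x6EE00000, 0x8000, 0x00A00000

clean_perl3 = []                      # nothing reserved: Cwd.dll lands in its slot
stale_perl3 = [(CWD_BASE, CWD_SIZE)]  # supposed leftover reservation from perl2

print(hex(load_library(clean_perl3, CWD_BASE, CWD_SIZE, LOW_ADDR)))  # 0x6ee00000
print(hex(load_library(stale_perl3, CWD_BASE, CWD_SIZE, LOW_ADDR)))  # 0xa00000
```

If the slot is still marked as used when dlopen() runs, the load succeeds but at the fallback address, which matches the symptom: nothing fails until that process forks.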

In that case, it wouldn't be ASLR's fault. It (and the apparent
sensitivity of cygwin-1.7 to the issue as compared to cygwin-1.5) could
be explained by a combination of changes in cygwin-1.7's fork()
implementation, coupled with Vista/W7's CreateProcess not behaving in
exactly the way we expect given XP and older's behavior. [***]

Anyway, perl3 (the aclocal process) continues merrily until it, too,
hits a fork/exec -- maybe because of this line:
    xsystem ('cp', $src, $dest);
At this point, it's a standard remap problem: cygwin does the fork()
which creates perl4, and while manually loading all the DLLs currently
in use by perl3 and trying to ensure the memory map of perl3 and perl4
match, Cwd.dll is loaded -- perhaps into the "correct" (ASLR-directed)
Image Base.  But perl3 had Cwd in the "wrong" (low) address...and bang.
We never get to the exec("cp",...) step in perl4, and never try to
create the cp5 process.
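
The failing step can be sketched as a plain comparison of two memory maps. This is a simplified Python model, not cygwin's fork() code: fork_remap and its arguments are hypothetical names, and the cygwin1.dll address is only an illustrative value.

```python
def fork_remap(parent_modules, child_loader):
    """Return the DLLs whose address in the child differs from the parent's.

    parent_modules maps DLL name -> base address in the parent.
    child_loader(name) returns where the child's loader placed that DLL.
    An empty result means the child's layout matches and fork can proceed.
    """
    mismatches = {}
    for name, parent_base in parent_modules.items():
        child_base = child_loader(name)
        if child_base != parent_base:
            mismatches[name] = (parent_base, child_base)
    return mismatches


# Where the loader puts each DLL in a fresh child (the ASLR-directed slots).
aslr_slots = {"cygwin1.dll": 0x61000000, "Cwd.dll": 0x6EE00000}
# perl3 ended up with Cwd.dll at the "wrong" low address.
perl3 = {"cygwin1.dll": 0x61000000, "Cwd.dll": 0x00A00000}

bad = fork_remap(perl3, aslr_slots.__getitem__)
assert bad == {"Cwd.dll": (0x00A00000, 0x6EE00000)}
```

A non-empty result corresponds to the "unable to remap" failure: perl4 cannot be made to look like perl3, so the exec of cp never happens.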

[***] if this sounds reasonable, I'll take a look at the 1.7 changes in
fork() but it'll have to wait until Sunday or Monday 'cause of other
commitments. Unless somebody beats me to it.

--
Chuck


