This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: Intermittent failures retrieving process exit codes - snapshot test requested


On 01/01/2013 12:36 AM, Christopher Faylor wrote:
> On Mon, Dec 31, 2012 at 08:44:56PM -0500, Tom Honermann wrote:
>> I'm still seeing hangs in the latest code from CVS.  The stack traces
>> below are from WinDbg.
>
> I'm not asking you to build this yourself. I have no way to know how you are building this. Please just use the snapshots at
>
> http://cygwin.com/snapshots/

I was building it myself so that I could debug it without having to specify debug source paths and such. I believe my builds are not unconventional. I used options that disabled frame pointer omission so that the resulting binaries could be debugged with non-gcc debuggers.


$ mkdir build
$ cd build
$ ../src/configure \
    CFLAGS="-g" \
    CXXFLAGS="-g" \
    CFLAGS_FOR_TARGET="-g" \
    CXXFLAGS_FOR_TARGET="-g" \
    --enable-debugging \
    --prefix=$HOME/src/cygwin-latest/install -v
$ make
$ make install

>> I manually resolved the symbol references within
>> the cygwin1 module using the linker generated .map file.  Since the .map
>> file does not include static functions, some of these may be incorrect -
>> I didn't try and verify or correct for this.
>
> Thanks for trying, but the output below is garbled and not really useful. If you are not going to dive in and attempt to fix code yourself then all we normally need is a simple test case. WinDbg is not really appropriate for debugging Cygwin applications.

The output below is not garbled, but I didn't explain it clearly enough. Lines with frame numbers come directly from WinDbg. Since WinDbg cannot read gcc-generated debug info, the symbol names it reports within the cygwin1 module are incorrect. In those cases, I manually resolved the instruction pointer address by taking the RetAddr value from the prior frame and searching the linker-generated cygwin1.map file. I then pasted the mangled name on a line following the WinDbg line (the one with the incorrect symbol name) and, if the symbol is a C++ one, the unmangled name on an additional line.


For the stack fragment below, address 610f1553 == strtosigno+0x357 == __ZN4muto7acquireEm == muto::acquire(unsigned long). I did not translate offsets for the functions as I resolved them, nor did I try to verify that they are correct (i.e., that the return address does not belong to a static function that is not represented in the .map file).

  # ChildEBP RetAddr
00 00288bd0 758d0a91 ntdll!ZwWaitForSingleObject+0x15
01 00288c3c 76c11194 KERNELBASE!WaitForSingleObjectEx+0x98
02 00288c54 76c11148 kernel32!WaitForSingleObjectExImplementation+0x75
03 00288c68 610f1553 kernel32!WaitForSingleObject+0x12
04 00288cb8 6118e54d cygwin1!strtosigno+0x357
                              __ZN4muto7acquireEm
                              muto::acquire(unsigned long)
[snip]
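
The manual lookup described above amounts to a "nearest preceding symbol" search. The following is a minimal sketch of that idea, not the procedure actually used for the trace. The address of strtosigno is derived from the trace itself (0x610f1553 - 0x357), while the start address given for muto::acquire is hypothetical and would have to be read from cygwin1.map. The names `struct sym`, `exports_only`, `with_map`, and `nearest_symbol` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One symbol: its start address and its (possibly mangled) name. */
struct sym {
    uint32_t    addr;
    const char *name;
};

/* What a debugger limited to exported names might see. The address of
 * strtosigno follows from the trace: 0x610f1553 - 0x357 = 0x610f11fc. */
static const struct sym exports_only[] = {
    { 0x610f11fcu, "strtosigno" },
};

/* What a .map file might add. The start address of muto::acquire is
 * HYPOTHETICAL; the real value would come from cygwin1.map. */
static const struct sym with_map[] = {
    { 0x610f11fcu, "strtosigno" },
    { 0x610f1500u, "__ZN4muto7acquireEm" },  /* hypothetical address */
};

/* Return the symbol with the highest start address that is <= addr,
 * or NULL if addr lies below every entry. The table need not be sorted. */
const struct sym *nearest_symbol(const struct sym *tab, size_t n, uint32_t addr)
{
    const struct sym *best = NULL;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].addr <= addr && (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    }
    return best;
}
```

With the first table, 0x610f1553 resolves to strtosigno at offset 0x357, which matches the WinDbg line. With the second table it resolves to the muto::acquire entry. A static function missing from both tables would be silently attributed to whichever listed symbol precedes it, which is the caveat about static functions mentioned above.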

The reason for using WinDbg is that, from what I understand, gdb is unable to produce accurate stack traces when the call stack includes frames for functions that omit the frame pointer and do not have debug info that gdb can process. I believe many Microsoft-provided functions in ntdll, kernel32, kernelbase, etc. do omit the frame pointer and only provide debug info in the PDB format - which gdb is unable to use. Compiling Cygwin without frame pointer omission and using WinDbg therefore provides the most accurate stack trace. If I am incorrect about any of this, I would very much appreciate a correction and/or explanation.


I downloaded the latest snapshot (2012-12-31 18:44:57 UTC) and was able to reproduce several issues which are described below.

All of these issues occur when using ctrl-c to interrupt the infinite loop in the test case(s) I've been using to debug inconsistent exit codes. When ctrl-c is pressed, I've observed the following:

1) Programs are (generally) terminated as expected. cmd.exe prompts to "Terminate batch job" as expected.

2) An access violation occurs and a processor context is dumped to the console. I do not yet have stack traces for these cases.

3) One of the processes hangs.

Access violations occur in ~20% of test runs. Hangs occur in ~5% of test runs.

I did not provide a test case previously because I don't have an automated reproducer at present. All sources needed to reproduce the issues are below. The test case uses a .bat file to avoid dependencies on bash so as to minimally isolate the problem.

To reproduce the issues, copy test.bat, false-cygwin32.exe, and expect-false-execve-cygwin32.exe to a Cygwin bin directory and run test.bat from a cmd.exe console. Press ctrl-c to interrupt the test. Repeat until problems are observed. I have not been able to reproduce these symptoms when running the test via a MinTTY console.

I have been unable to get useful stack traces from hung processes using gdb. gdb reports that the debug information in cygwin1-20130102.dbg.bz2 does not match (CRC mismatch) the cygwin1.dll module in cygwin-inst-20130102.tar.bz2.


$ cat expect-false-execve.c
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    pid_t child_pid, wait_pid;
    int result, child_status;

    if (argc != 2) {
        fprintf(stderr, "expect-false: Missing or too many arguments\n");
        return 127;
    }

    child_pid = fork();
    if (child_pid == -1) {
        fprintf(stderr, "expect-false: fork failed. errno=%d\n", errno);
        return 127;
    } else if (child_pid == 0) {
        result = execlp(argv[1], argv[1], NULL);
        if (result == -1) {
            fprintf(stderr, "expect-false: execlp failed. errno=%d\n", errno);
        }
        _exit(127);
    }

    do {
        wait_pid = waitpid(child_pid, &child_status, 0);
    } while(
        (wait_pid == -1 && errno == EINTR) ||
        (wait_pid == child_pid && !(WIFEXITED(child_status) || WIFSIGNALED(child_status)))
    );
    if (wait_pid == -1) {
        fprintf(stderr, "expect-false: waitpid failed. errno=%d\n", errno);
        return 127;
    }
    if (!WIFEXITED(child_status)) {
        fprintf(stderr, "expect-false: child process did not exit normally\n");
        return 127;
    }
    if (WEXITSTATUS(child_status) != 1) {
        fprintf(stderr, "expect-false: unexpected exit code: %d\n", child_status);
    }

    return WEXITSTATUS(child_status);
}
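
When reading this program's diagnostics, note that the "unexpected exit code" message prints the raw status word from waitpid rather than WEXITSTATUS(child_status), so a child that exits with code 2 would typically be reported as 512. The helper below is a sketch of how the status could be decoded before printing. It is not part of the original test case, and the names `child_outcome`, `classify_status`, and `report_status` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/wait.h>

enum child_outcome { CHILD_EXITED, CHILD_SIGNALED, CHILD_OTHER };

/* Classify a status word filled in by waitpid(). On return, *detail holds
 * the exit code, the terminating signal number, or the raw status. */
enum child_outcome classify_status(int status, int *detail)
{
    assert(detail != NULL);
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return CHILD_SIGNALED;
    }
    *detail = status;
    return CHILD_OTHER;
}

/* Print a one-line description in the style of the test program's messages. */
void report_status(FILE *out, int status)
{
    int detail;

    switch (classify_status(status, &detail)) {
    case CHILD_EXITED:
        fprintf(out, "expect-false: child exited with code %d\n", detail);
        break;
    case CHILD_SIGNALED:
        fprintf(out, "expect-false: child terminated by signal %d\n", detail);
        break;
    default:
        fprintf(out, "expect-false: unexpected raw status 0x%x\n", (unsigned)detail);
        break;
    }
}
```

Calling report_status(stderr, child_status) after the waitpid loop would distinguish a normal exit from termination by a signal, which may help when tallying the outcomes seen after ctrl-c.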


$ cat false.c
#include <stdio.h>

int main() {
    printf("myfalse\n");
    return 1;
}


$ cat test.bat
@echo off
setlocal

set PATH=%CD%;%PATH%

:loop
echo test...
expect-false-execve-cygwin32.exe false-cygwin32
if not errorlevel 1 (
    echo exiting...
    exit /B 1
)
goto loop


$ gcc -o expect-false-execve-cygwin32.exe expect-false-execve.c
$ gcc -o false-cygwin32.exe false.c

From a cmd.exe console: (press ctrl-c once the test is running)
C:\...\cygwin\bin>test
test...
myfalse
test...
myfalse
...


Tom.



--
Problem reports:  http://cygwin.com/problems.html
FAQ:              http://cygwin.com/faq/
Documentation:    http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple

