This is the mail archive of the cygwin@cygwin.com mailing list for the Cygwin project.


Re: pthread_signal() references illegal memory address


Hello,

Attached, you will find a MUCH simplified version of
the problem I am having. I believe that it has all
the essential elements. I built the program with
  
   make

nothing more. I run it, and it continuously loops as
designed. I install it with cygrunsrv and it seems to
be installed, then I start it with cygrunsrv and it
always errs, so I am unable to tell you whether or
not it reproduces my problem. If it does, the program
should die rather than continuing to loop.

You might want to try it for yourself, as you surely
understand cygrunsrv better than I do.  This is
the best I can do. I would appreciate knowing whether
you are able to reproduce the problem.

Best regards,

Kern


On Mon, 2003-05-12 at 03:02, Igor Pechtchanski wrote:
> On 11 May 2003, Kern Sibbald wrote:
> 
> > Hello,
> >
> > In addition to the email you sent me last Thursday, which I
> > received just a few minutes later, I just now received another
> > copy apparently destined for david.postill@pobox.com, but
> > somehow it got routed to me, the long, slow way (3 days).
> > Well, anything that goes to or through the Blue Yonder is
> > likely to be a bit slow ... :-)
> >
> > By the way, I could not run my app. with cygrunsrv.  I
> > don't know why, probably because both cygrunsrv and my app
> > are trying to talk to the service manager, so for the moment
> > I give up on this.
> 
> Kern,
> 
> cygrunsrv expects to be the one to talk to the service manager.  If your
> program also does, there's an obvious conflict of interest.  I was
> suggesting making a small command-line testcase, running it with
> cygrunsrv, and seeing if it exhibits the same kind of behavior your main
> program does.  If it doesn't, move code from your main application until
> the behavior is replicated (or until all of the main application except
> the service manager code is present).  If you still can't replicate the
> problem, it's probably in your service manager interface code, and you
> won't need it anyway with cygrunsrv (and you would have by that point a
> service that runs with cygrunsrv).  If the behavior is replicated, look
> into the code that was added last -- that's probably your culprit.  If you
> can replicate the behavior in a small example, send it to the list.
> 
> > Best regards,
> > Kern
> >
> > PS: I sent this off list on purpose -- I suspect there may be a
> > bug in the list program, or more likely a bug at David Postill's
> > site.
> 
> I don't see how this rates a private e-mail, especially to me.  If there
> is a bug in the list software, the list should know about it.  If there is
> a bug at David Postill's site, he should know about it.  Please do not
> send private mail unless requested to do so.
> 	Igor
> P.S. I'm forwarding this whole e-mail to the list, as the below may be of
> interest to at least David Postill and possibly others.
> 
> > Here it is the email mentioned above with headers turned on:
> >
> > ============= Copy of email just received =================
> > Return-Path: <cygwin-owner@cygwin.com>
> > Received: from blueyonder.co.uk (pcow025o.blueyonder.co.uk
> >         [195.188.53.125]) by matou.sibbald.com (8.11.6/8.11.6) with ESMTP id
> >         h4BK6rf15398 for <kern@sibbald.com>; Sun, 11 May 2003 22:06:56 +0200
> > Received: from mail pickup service by blueyonder.co.uk with Microsoft
> >         SMTPSVC; Sun, 11 May 2003 19:18:26 +0100
> > Received: from pcol001m.blueyonder.net ([195.188.53.104]) by
> >         blueyonder.co.uk  with Microsoft SMTPSVC(5.5.1877.757.75);
> >         Fri, 9 May 2003 16:12:40 +0100
> > Received: from exim by pcol001m.blueyonder.net with relayed (Exim 4.12) id
> >         19E974-0005xX-00 for david.postill@blueyonder.co.uk;
> >         Fri, 09 May 2003 15:44:54 +0100
> > Received: from [212.24.65.71] (helo=mutt.eurobell.net) by
> >         pcol001m.blueyonder.net with smtp (Exim 4.12) id 19E973-0005xU-00 for
> >         david.postill@blueyonder.co.uk; Fri, 09 May 2003 15:44:41 +0100
> > Received: (qmail 13027 invoked from network); 8 May 2003 19:04:05 -0000
> > Received: from unknown (HELO kumquat.pobox.com) (64.119.218.72) by
> >         mailq1.blueyonder.co.uk with SMTP; 8 May 2003 19:04:05 -0000
> > Received: from kumquat.pobox.com (localhost.localdomain [127.0.0.1]) by
> >         kumquat.pobox.com (Postfix) with ESMTP id 986D659E98 for
> >         <david.postill@blueyonder.co.uk>; Thu,  8 May 2003 15:04:01 -0400 (EDT)
> > Delivered-To: david.postill@pobox.com
> > Received: from sources.redhat.com (sources.redhat.com [66.187.233.205]) by
> >         kumquat.pobox.com (Postfix) with SMTP id 655AC3E832 for
> >         <david.postill@pobox.com>; Thu,  8 May 2003 15:03:58 -0400 (EDT)
> > Received: (qmail 9121 invoked by alias); 8 May 2003 19:03:45 -0000
> > Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
> > Precedence: bulk
> > List-Unsubscribe:
> >         <mailto:cygwin-unsubscribe-david.postill=pobox.com@cygwin.com>
> > List-Subscribe: <mailto:cygwin-subscribe@cygwin.com>
> > List-Archive: <http://sources.redhat.com/ml/cygwin/>
> > List-Post: <mailto:cygwin@cygwin.com>
> > List-Help: <mailto:cygwin-help@cygwin.com>,
> >         <http://sources.redhat.com/ml/#faqs>
> > Sender: cygwin-owner@cygwin.com
> > Mail-Followup-To: cygwin@cygwin.com
> > Delivered-To: mailing list cygwin@cygwin.com
> > Received: (qmail 9114 invoked from network); 8 May 2003 19:03:45 -0000
> > Received: from unknown (HELO slinky.cs.nyu.edu) (128.122.20.14) by
> >         sources.redhat.com with SMTP; 8 May 2003 19:03:45 -0000
> > Received: from localhost (pechtcha@localhost) by slinky.cs.nyu.edu
> >         (8.11.7+Sun/8.11.7) with ESMTP id h48J3fM28152;
> >         Thu, 8 May 2003 15:03:42 -0400 (EDT)
> > X-Authentication-Warning: slinky.cs.nyu.edu: pechtcha owned process doing -bs
> > Date: Thu, 8 May 2003 15:03:41 -0400 (EDT)
> > From: Igor Pechtchanski <pechtcha@cs.nyu.edu>
> > Reply-To: cygwin@cygwin.com
> > To: Kern Sibbald <kern@sibbald.com>
> > Cc: cygwin@cygwin.com
> > Subject: Re: pthread_signal() references illegal memory address
> > In-Reply-To: <1052391117.6139.1146.camel@rufus>
> > Message-ID: <Pine.GSO.4.44.0305081501120.22924-100000@slinky.cs.nyu.edu>
> > Importance: Normal
> > MIME-Version: 1.0
> > Content-Type: TEXT/PLAIN; charset=US-ASCII
> > X-Annoyance-Filter-Junk-Probability: 0
> > X-Annoyance-Filter-Classification: Mail
> >
> > On 8 May 2003, Kern Sibbald wrote:
> >
> > > Hello,
> > >
> > > Please don't think I'm not interested in this if
> > > it takes a bit of time to get back to you ...
> > >
> > > See responses below:
> > >
> > >
> > > On Mon, 2003-05-05 at 19:30, Igor Pechtchanski wrote:
> > > > On 5 May 2003, Kern Sibbald wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > On Mon, 2003-05-05 at 18:38, Igor Pechtchanski wrote:
> > > > > > On 5 May 2003, Kern Sibbald wrote:
> > > > > > [snip]
> > > > > > > Anyway here is one:
> > > > > > >
> > > > > > > Running WinXP Home version.
> > > > > > >
> > > > > > > Using Cygwin 1.3.20
> > > > > > >
> > > > > > > When running my program with LocalSystem userid
> > > > > > > as a service, doing a pthread_kill(thread_id, SIGUSR2)
> > > > > > > causes some sort of memory fault referencing memory at 0x3a
> > > > > > > (or something like that because the program disappears
> > > > > > > poof).
> > > > > > >
> > > > > > > Running as a normal user works fine.
> > > > > >
> > > > > > What's the exact error message (I assume you get a popup box)?
> > > > >
> > > > > No, I get absolutely nothing. Poof and it is gone, well, the
> > > > > service manager knows it went away but not why.
> > > > >
> > > > > A friend ran the program on Win2K and he got:
> > > > >
> > > > >       Instruction at 0x0041276a referenced memory at 0x3c
> > > > >
> > > > > That appears to be somewhere in the cygwin1.dll.
> > > >
> > > > > Try checking the "Allow service to interact with the desktop" box,
> > > > > and you should see the error popup on your system too.
> > >
> > > My service always interacts with the desktop. It is capable of
> > > doing MessageBox(), and it always has an icon in the system tray
> > > with a menu that works.
> > >
> > > I get absolutely nothing in terms of output of any sort when
> > > the program crashes -- as I said, it goes poof.  This could
> > > be my own fault for trapping signals, but normally during
> > > signal handling there is a considerable amount of printout,
> > > ...
> > >
> > > > > > Is there a stacktrace file generated?
> > > > >
> > > > > If it is, I don't know where the system put it.
> > > >
> > > > > The system should put it in the directory from which the program is
> > > > > run.
> > >
> > > There is no stack dump or any other file in the directory from
> > > which the program (Bacula) executes.
> > >
> > > >
> > > > > > >   Did you try setting
> > > > > > > "error_start:c:/cygwin/bin/dumper.exe" in your CYGWIN
> > > > > > > environment variable?
> > >
> > > I doubt this would help much, maybe I am wrong, please see below.
> > >
> > > > >
> > > > > No, if you can tell me how to set the environment variable for
> > > > > a service, I'll try it, but since it is a service, I am unlikely
> > > > > to get any output.
> > > >
> > > > > "cygrunsrv --help", or "man cygrunsrv", or see /bin/ssh-host-config
> > > > > for an example.  You might also need the "Allow service to interact
> > > > > with desktop" bit.
> > >
> > > None of the above mentioned things exist on my system. In any case,
> > > I have no problem setting the program up as a service (it installs
> > > itself with allowing interaction with the desktop by default).
> > >
> > > > > > Did you try running the program from the command line in a
> > > > > > LocalSystem-owned shell?
> > > > >
> > > > > I ran it in an rxvt shell under my id and it does not crash.
> > > > > Tell me how to get a LocalSystem owned shell and I will try
> > > > > it.  This is XP Home, so I don't have access to a lot of the
> > > > > XP security dialogs.
> > > >
> > > > > "at <time> /interactive c:\cygwin\bin\bash.exe -i --login"
> > > > > (<time> should be current time plus however long you're willing to
> > > > > wait, at least one minute).  "at /?" for help.
> > > > [Note, this works on Win2k, don't know about XP Home].
> > >
> > > Yes, your trick works on WinXP Home too. So much for Windows
> > > security!
> > >
> > > The interesting thing is that when I run the program under
> > > a rxvt window with the bash shell with the LocalSystem account,
> > > it does NOT crash.  I also ran the program under
> > > a MS DOS shell and I get the same result: it does
> > > not crash.
> > >
> > > It crashes only if it is started by the service dialog.
> > >
> > > > > > Can you provide a simple testcase that
> > > > > > reproduces your problem?
> > > > >
> > > > > Probably not as my program is some 65K+ lines of code.
> > >
> > > > You could try a simple program that calls the offending function
> > > > (after creating some threads, most likely), and see if the problem
> > > > manifests...
> > >
> > > Well, I was considering doing so, since creating a thread
> > > and sending it a signal is a 10 line program. However, this
> > > problem requires the program to run as a service, and that
> > > is a considerable amount of code.
> >
> > Kern,
> >
> > That's what "cygrunsrv" is for!  It takes *any* command-line program and
> > turns it into a service. :-D  Try making a small command-line example
> > and run it as a service using cygrunsrv (you'll have to install the
> > cygrunsrv package).
> >         Igor
> >
> > > > > I've solved the problem for myself by doing the "signal"
> > > > > a different way, so it is not critical for me but it cost
> > > > > about 8 hours of debugging -- primarily due to the fact that
> > > > > it seems to be dependent on whether or not it is a service.
> > > > >
> > > > > Best regards,
> > > > > Kern
> > > >
> > > > > It's most likely dependent on the value of your CYGWIN variable or
> > > > > some permissions (as the service runs as LocalSystem).  Trying the
> > > > > program out from a LocalSystem-owned window (see above) should give
> > > > > you some idea of what's at fault.
> > >
> > > I agree with you, but my CYGWIN environment variable is not set.
> > >
> > > If you have any other ideas I'll try them, otherwise, I'll avoid
> > > using pthread_signal() under CYGWIN.
> > >
> > > Best regards,
> > > Kern
#include <stdio.h>
#include <stdlib.h>    /* exit() */
#include <signal.h>
#include <pthread.h>
#include <unistd.h>

static int hb_bsock;
static pthread_t heartbeat_id;
static int stop;


#ifndef _NSIG
#define BA_NSIG 100
#else
#define BA_NSIG _NSIG
#endif

static const char *sig_names[BA_NSIG+1];

typedef void (SIG_HANDLER)(int sig);
static SIG_HANDLER *exit_handler;

/* 
 * Handle signals here
 */
static void signal_handler(int sig)
{
   static int already_dead = 0;

   if (already_dead) {
      _exit(1);
   }
   /* Ignore certain signals */
   if (sig == SIGCHLD || sig == SIGUSR2) {
      return;
   }
   printf("Got signal %d. Exiting.\n", sig);
   already_dead = sig;
   exit(1);
}


void init_signals(void terminate(int sig))
{
   struct sigaction sighandle;
   struct sigaction sigignore;
   struct sigaction sigdefault;
   exit_handler = terminate;
   sig_names[0]         = "UNKNOWN SIGNAL";
   sig_names[SIGHUP]    = "Hangup";
   sig_names[SIGINT]    = "Interrupt";
   sig_names[SIGQUIT]   = "Quit";
   sig_names[SIGILL]    = "Illegal instruction";
   sig_names[SIGTRAP]   = "Trace/Breakpoint trap";
   sig_names[SIGABRT]   = "Abort";
#ifdef SIGEMT
   sig_names[SIGEMT]    = "EMT instruction (Emulation Trap)";
#endif
#ifdef SIGIOT
   sig_names[SIGIOT]    = "IOT trap";
#endif
   sig_names[SIGBUS]    = "BUS error";
   sig_names[SIGFPE]    = "Floating-point exception";
   sig_names[SIGKILL]   = "Kill, unblockable";
   sig_names[SIGUSR1]   = "User-defined signal 1";
   sig_names[SIGSEGV]   = "Segmentation violation";
   sig_names[SIGUSR2]   = "User-defined signal 2";
   sig_names[SIGPIPE]   = "Broken pipe";
   sig_names[SIGALRM]   = "Alarm clock";
   sig_names[SIGTERM]   = "Termination";
#ifdef SIGSTKFLT
   sig_names[SIGSTKFLT] = "Stack fault";
#endif
   sig_names[SIGCHLD]   = "Child status has changed";
   sig_names[SIGCONT]   = "Continue";
   sig_names[SIGSTOP]   = "Stop, unblockable";
   sig_names[SIGTSTP]   = "Keyboard stop";
   sig_names[SIGTTIN]   = "Background read from tty";
   sig_names[SIGTTOU]   = "Background write to tty";
   sig_names[SIGURG]    = "Urgent condition on socket";
   sig_names[SIGXCPU]   = "CPU limit exceeded";
   sig_names[SIGXFSZ]   = "File size limit exceeded";
   sig_names[SIGVTALRM] = "Virtual alarm clock";
   sig_names[SIGPROF]   = "Profiling alarm clock";
   sig_names[SIGWINCH]  = "Window size change";
   sig_names[SIGIO]     = "I/O now possible";
#ifdef SIGPWR
   sig_names[SIGPWR]    = "Power failure restart";
#endif
#ifdef SIGWAITING
   sig_names[SIGWAITING] = "No runnable lwp";
#endif
#ifdef SIGLWP
   sig_names[SIGLWP]    = "SIGLWP special signal used by thread library";
#endif
#ifdef SIGFREEZE
   sig_names[SIGFREEZE] = "Checkpoint Freeze";
#endif
#ifdef SIGTHAW
   sig_names[SIGTHAW]   = "Checkpoint Thaw";
#endif
#ifdef SIGCANCEL
   sig_names[SIGCANCEL] = "Thread Cancellation";
#endif
#ifdef SIGLOST
   sig_names[SIGLOST]   = "Resource Lost (e.g. record-lock lost)";
#endif


   /* Now set up signal handlers */
   sighandle.sa_flags = 0;
   sighandle.sa_handler = signal_handler;
   sigfillset(&sighandle.sa_mask);
   sigignore.sa_flags = 0;
   sigignore.sa_handler = SIG_IGN;	 
   sigfillset(&sigignore.sa_mask);
   sigdefault.sa_flags = 0;
   sigdefault.sa_handler = SIG_DFL;
   sigfillset(&sigdefault.sa_mask);


   sigaction(SIGPIPE,	&sigignore, NULL);
   sigaction(SIGCHLD,	&sighandle, NULL);
   sigaction(SIGCONT,	&sigignore, NULL);
   sigaction(SIGPROF,	&sigignore, NULL);
   sigaction(SIGWINCH,	&sigignore, NULL);
   sigaction(SIGIO,	&sighandle, NULL);     

   sigaction(SIGINT,	&sigdefault, NULL);    
   sigaction(SIGXCPU,	&sigdefault, NULL);
   sigaction(SIGXFSZ,	&sigdefault, NULL);

   sigaction(SIGHUP,	&sigignore, NULL);
   sigaction(SIGQUIT,	&sighandle, NULL);   
   sigaction(SIGILL,	&sighandle, NULL);    
   sigaction(SIGTRAP,	&sighandle, NULL);   
/* sigaction(SIGABRT,	&sighandle, NULL);   */
#ifdef SIGEMT
   sigaction(SIGEMT,	&sighandle, NULL);
#endif
#ifdef SIGIOT
/* sigaction(SIGIOT,	&sighandle, NULL);  used by debugger */
#endif
   sigaction(SIGBUS,	&sighandle, NULL);    
   sigaction(SIGFPE,	&sighandle, NULL);    
   sigaction(SIGKILL,	&sighandle, NULL);   /* no effect: SIGKILL cannot be caught */
   sigaction(SIGUSR1,	&sighandle, NULL);   
   sigaction(SIGSEGV,	&sighandle, NULL);   
   sigaction(SIGUSR2,	&sighandle, NULL);
   sigaction(SIGALRM,	&sighandle, NULL);   
   sigaction(SIGTERM,	&sighandle, NULL);   
#ifdef SIGSTKFLT
   sigaction(SIGSTKFLT, &sighandle, NULL); 
#endif
   sigaction(SIGSTOP,	&sighandle, NULL);   /* no effect: SIGSTOP cannot be caught */
   sigaction(SIGTSTP,	&sighandle, NULL);   
   sigaction(SIGTTIN,	&sighandle, NULL);   
   sigaction(SIGTTOU,	&sighandle, NULL);   
   sigaction(SIGURG,	&sighandle, NULL);    
   sigaction(SIGVTALRM, &sighandle, NULL); 
#ifdef SIGPWR
   sigaction(SIGPWR,	&sighandle, NULL);    
#endif
#ifdef SIGWAITING
   sigaction(SIGWAITING,&sighandle, NULL);
#endif
#ifdef SIGLWP
   sigaction(SIGLWP,	&sighandle, NULL);
#endif
#ifdef SIGFREEZE
   sigaction(SIGFREEZE, &sighandle, NULL);
#endif
#ifdef SIGTHAW
   sigaction(SIGTHAW,	&sighandle, NULL);
#endif
#ifdef SIGCANCEL
   sigaction(SIGCANCEL, &sighandle, NULL);
#endif
#ifdef SIGLOST
   sigaction(SIGLOST,	&sighandle, NULL);
#endif
}


static void *sd_heartbeat_thread(void *arg)
{
   pthread_detach(pthread_self());
   hb_bsock = 1;
   printf("HB thread started.\n");
   for ( ; !stop; ) {
      sleep(1000);   /* expected to return early when SIGUSR2 interrupts it */
   }
   hb_bsock = 0;
   return NULL;
}

void start_heartbeat_monitor()
{
   stop = 0;
   hb_bsock = 0;
   pthread_create(&heartbeat_id, NULL, sd_heartbeat_thread, NULL);
}

/* Terminate the heartbeat thread. Used for both SD and DIR */
void stop_heartbeat_monitor() 
{
   /* Wait for heartbeat thread to start */
   while (hb_bsock == 0) {
      printf("Waiting for hb thread to start.\n");
      sleep(1);
   }

   stop = 1;

   while (hb_bsock) {
      /* Cygwin 1.3.20 craps out on the following */
      printf("Send sig %d\n", SIGUSR2);
      pthread_kill(heartbeat_id, SIGUSR2);  /* make heartbeat thread go away */
      sleep(1);
   }
}

void terminate(int sig)
{
   printf("Terminate handler.\n");
   exit(1);
}

int main(int argc, char **argv)
{
   init_signals(terminate);
   for ( ;; ) {
      printf("Start...\n");
      start_heartbeat_monitor();
      sleep(1);
      printf("Stop...\n");
      stop_heartbeat_monitor();
      printf("Start and stop complete.\n");
   }
}

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
