This is the mail archive of the cygwin mailing list for the Cygwin project.


autossh broken with current openssh/cygwin


I'm not sure whether it is due to changes in openssh or changes in Cygwin,
but the current autossh package fails to work.  Instead of detecting
that the connection is alive, it repeatedly times out and recycles the
ssh process.  Here is a representative testcase:

$ AUTOSSH_FIRST_POLL=5 AUTOSSH_POLL=5 AUTOSSH_DEBUG=yes autossh -M 30000 -N dessent.net
autossh: PID 3204: short poll time: adjusting net timeouts to 2500
autossh: PID 3204: checking for grace period, tries = 0
autossh: PID 3204: starting ssh (count 1)
autossh: PID 3204: ssh child pid is 4160
autossh: PID 4160: execing /usr/bin/ssh
autossh: PID 3204: check on child 4160
autossh: PID 3204: set alarm for 5 secs
autossh: PID 3204: timeout on io poll, looping to accept again
autossh: PID 3204: too many loops without data
autossh: PID 3204: error on poll: Socket operation on non-socket
autossh: PID 3204: port down, restarting ssh
autossh: PID 3204: checking for grace period, tries = 0
autossh: PID 3204: starting ssh (count 2)
autossh: PID 4728: execing /usr/bin/ssh
autossh: PID 3204: ssh child pid is 4728
autossh: PID 3204: check on child 4728
autossh: PID 3204: set alarm for 5 secs
autossh: PID 3204: not what I sent: "booch autossh 3204 122720421
" : ""
autossh: PID 3204: too many loops without data
autossh: PID 3204: error on poll: Interrupted system call
autossh: PID 3204: port down, restarting ssh
autossh: PID 3204: checking for grace period, tries = 0
autossh: PID 3204: starting ssh (count 3)
autossh: PID 3204: ssh child pid is 5520
autossh: PID 5520: execing /usr/bin/ssh
autossh: PID 3204: check on child 5520
autossh: PID 3204: set alarm for 5 secs
autossh: PID 3204: not what I sent: "booch autossh 3204 840588297
" : ""

(This continues indefinitely.)  I have verified with netcat that the
30000/30001 port pair can in fact transfer data successfully.

I tried building autossh 1.4 from source, but that does not cure the
problem.  I stepped through it, and the problem seems to be in
conn_send_and_receive().  It calls poll(), sees that the write handle is
ready for writing, sends the test string, sets 'ntopoll' to 1, and
calls poll() a second time.  Here you would expect poll() to return 1
with fd 0 ready for reading after a brief pause, but it just times out,
and conn_send_and_receive() returns 1, which results in the error
"timeout on io poll, looping to accept again".  I think everything after
that is just cascading failure.  It seems to try to re-accept the data
channel, but I don't think this succeeds, as the channel never went away
to begin with.
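
For anyone who wants to reproduce the pattern outside autossh, here is a
minimal sketch of the sequence described above: poll for writability,
send a test string, then poll for readability and compare what comes
back.  It is not autossh's code.  The function name send_and_receive,
the string return values, and the use of a socketpair as a stand-in for
the forwarded port pair are all assumptions made for illustration, and
it relies on select.poll(), so it needs a POSIX-like Python (such as the
Cygwin one).

```python
import select
import socket


def send_and_receive(write_sock, read_sock, payload, timeout_ms):
    """Send payload on write_sock, then wait for it to arrive on read_sock.

    Hypothetical helper (not autossh's API).  Returns "ok", "timeout",
    or "mismatch".
    """
    poller = select.poll()

    # Step 1: wait until the write side is ready, then send the test string.
    poller.register(write_sock, select.POLLOUT)
    if not poller.poll(timeout_ms):
        return "timeout"
    write_sock.sendall(payload)
    poller.unregister(write_sock)

    # Step 2: poll only the read side; this is the wait that times out
    # in the failure described above.
    poller.register(read_sock, select.POLLIN)
    received = b""
    while len(received) < len(payload):
        if not poller.poll(timeout_ms):
            return "timeout"
        chunk = read_sock.recv(len(payload) - len(received))
        if not chunk:
            return "mismatch"  # peer closed before echoing everything
        received += chunk
    return "ok" if received == payload else "mismatch"


if __name__ == "__main__":
    # A socketpair stands in for the monitor-port loop: data written to
    # one end becomes readable on the other.
    tx, rx = socket.socketpair()
    print(send_and_receive(tx, rx, b"test string\n", 500))
```

If the second poll in a sketch like this returns promptly against the
real 30000/30001 pair, that would point toward autossh's own handling of
the descriptors rather than the tunnel itself.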

So, anyway, I can't tell whether this is a problem with the logic in
autossh, a problem with openssh, or a problem caused by a change in
Cygwin (I use the current snapshots).  The end result is that with the
default settings autossh recycles its ssh every 10 minutes, which just
fills up the logs.  I'm not sure when this regressed, but I know that
I've used autossh for quite a while without noticing this problem until
recently.

Brian

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/