ssh client is setting O_NONBLOCK on a pipe shared with other processes
The quick summary is that we invoke git from a parallel invocation of
"make". Git invokes ssh to pull stuff from a remote repo. Ssh sets
O_NONBLOCK on stdout and stderr if they do not refer to a tty. During
our build, stderr refers to a pipe that other jobs run by make (and
make itself) may also write to, and since this is a parallel build,
they may write to that pipe while ssh has it in non-blocking mode.

Make occasionally gets an unexpected EAGAIN error and fails the build
with the error message "make: write error".
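The failure mode described above can be reproduced in a single process. The following is a minimal sketch, not taken from ssh or make: the function name is hypothetical, a dup()'d descriptor stands in for ssh, and the original descriptor stands in for make. Nothing reads from the pipe, so it fills up and the "innocent" writer sees EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno of the first failed write(). */
int errno_after_shared_nonblock(void)
{
    int p[2];
    if (pipe(p) < 0)
        return errno;

    /* Stand-in for ssh: a second descriptor for the same pipe end. */
    int other = dup(p[1]);
    int flags = (other < 0) ? -1 : fcntl(other, F_GETFL);
    if (flags < 0 || fcntl(other, F_SETFL, flags | O_NONBLOCK) < 0) {
        int saved = errno;
        if (other >= 0)
            close(other);
        close(p[0]);
        close(p[1]);
        return saved;
    }

    /* Stand-in for make: it never asked for non-blocking mode, but it
     * writes through a descriptor that shares the same flags. */
    char buf[4096];
    memset(buf, 'x', sizeof buf);
    int err = 0;
    for (;;) {
        if (write(p[1], buf, sizeof buf) < 0) {
            err = errno;   /* the pipe is full and the write did not block */
            break;
        }
    }

    close(other);
    close(p[0]);
    close(p[1]);
    return err;
}
```

If the flag were private to the descriptor it was set on, the loop would block forever once the pipe filled instead of returning an error.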

We have a workaround, but it seems to me that this could cause
problems with other background uses of ssh too. Should ssh really be
setting O_NONBLOCK if it is running non-interactively?

For more details, please see the thread on the git mailing list at
https://www.spinics.net/lists/git/msg365902.html.

Thanks,
Doug
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
On Sun, 15 Sep 2019, Doug Graham wrote:

> The quick summary is that we invoke git from a parallel invocation of
> "make". Git invokes ssh to pull stuff from a remote repo. Ssh sets
> O_NONBLOCK on stdout and stderr if they do not refer to a tty. During
> our build, stderr refers to a pipe that other jobs run by make (and
> make itself) may also write to, and since this is a parallel build,
> they may write to that pipe while ssh has it in non-blocking mode.
>
> Make occasionally gets an unexpected EAGAIN error and fails the build
> with the error message "make: write error".
>
> We have a workaround, but it seems to me that this could cause
> problems with other background uses of ssh too. Should ssh really be
> setting O_NONBLOCK if it is running non-interactively?

ssh has to set NONBLOCK otherwise it can, well, block - there's
no way for ssh to know a priori how much data it can write to a fd.
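To illustrate the point about not knowing how much can be written: the sketch below (hypothetical function name, not ssh code) offers 1 MiB to an empty pipe in non-blocking mode. It assumes the pipe's capacity is smaller than 1 MiB, which holds for default pipes on common systems. The write accepts only what fits and returns a short count, where a blocking write would stall until a reader drained the pipe.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BIG (1 << 20)   /* 1 MiB, assumed to exceed the pipe's capacity */

/* Hypothetical demo: returns how many of BIG bytes one non-blocking
 * write() accepted, or -1 on a setup failure. */
long nonblocking_write_accepts(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    long result = -1;
    char *buf = calloc(1, BIG);
    int flags = fcntl(p[1], F_GETFL);

    if (buf != NULL && flags >= 0 &&
        fcntl(p[1], F_SETFL, flags | O_NONBLOCK) == 0) {
        /* Nobody is reading, so only the pipe's capacity can be taken. */
        result = (long)write(p[1], buf, BIG);
    }

    free(buf);
    close(p[0]);
    close(p[1]);
    return result;
}
```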

-d

Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
On Mon, 16 Sep 2019 at 01:18, Doug Graham <edougra@gmail.com> wrote:
[...]
> Make occasionally gets an unexpected EAGAIN error and fails the build
> with the error message "make: write error".

So the make process gets an EAGAIN on the write syscall and doesn't
retry? That sounds like a bug in whatever make you're using, since
that could potentially occur in other circumstances too.

(Same goes for EWOULDBLOCK, as well as EINTR if you don't have
restartable syscalls).

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.

Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
> ssh has to set NONBLOCK otherwise it can, well, block - there's
> no way for ssh to know a priori how much data it can write to a fd.

I don't know anything about how ssh is structured, but I think it must
be a bit more complicated than that. Ssh only sets O_NONBLOCK on an
fd if isatty(fd) returns false, so it's able to function with blocking input
and output if the relevant descriptor refers to a tty (probably the usual
case).


On Sun, Sep 15, 2019 at 10:20 PM Damien Miller <djm@mindrot.org> wrote:
> ssh has to set NONBLOCK otherwise it can, well, block - there's
> no way for ssh to know a priori how much data it can write to a fd.
>
> -d

Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
AFAIK failing to set nonblock on ttys is a blurry mixture of legacy
behaviour, hack and bug.

It's legacy behaviour because it dates back to ssh 1.x when the ssh
process interacted with at most a single tty. This is no longer the
case due to connection multiplexing.

It's a bug for the same reason - a ssh blocked on tty writes that
is not able to service IO to other multiplexed connections is not
behaving well.

It's a hack because it may provide something approximating human-
friendly behaviour when ssh is only interacting with a single tty.
In this case, if the tty is behind then stalling on writes provides
some backpressure to the peer. I'm not sure whether this has been
properly analysed or even whether it makes sense these days.

In any case, disabling nonblock for non-ttys is not what we want to do.

-d


On Sun, 15 Sep 2019, Doug Graham wrote:

> > ssh has to set NONBLOCK otherwise it can, well, block - there's
> > no way for ssh to know a priori how much data it can write to a fd.
>
> I don't know anything about how ssh is structured, but I think it must
> be a bit more complicated than that. Ssh only sets O_NONBLOCK on an
> fd if isatty(fd) returns false, so it's able to function with blocking input
> and output if the relevant descriptor refers to a tty (probably the usual
> case).

Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
> So the make process gets an EAGAIN on the write syscall and doesn't
> retry? That sounds like a bug in whatever make you're using, since
> that could potentially occur in other circumstances too.

What other circumstances? EAGAIN means that something put the
device into non-blocking mode, and normally, that should only happen
if the program calling write had itself previously set O_NONBLOCK. I don't
think programs that don't set O_NONBLOCK are required to handle
EAGAIN or short counts. They *may* need to deal with EINTR but signals
don't come out of nowhere either and many programs run in an environment
where EINTR is also unexpected.

We are using GNU make 3.81 but newer versions of gmake do the same thing:

void
close_stdout (void)
{
  int prev_fail = ferror (stdout);
  int fclose_fail = fclose (stdout);

  if (prev_fail || fclose_fail)
    {
      if (fclose_fail)
        error (NILF, _("write error: %s"), strerror (errno));
      else
        error (NILF, _("write error"));
      exit (EXIT_FAILURE);
    }
}



Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
> In any case, disabling nonblock for non-ttys is not what we want to do

Ok, well I don't know what the solution is then. I think the real
problem is probably that O_NONBLOCK applies to an "open file description"
rather than to a descriptor, but that behavior is apparently
mandated by POSIX. I think that when ssh sets O_NONBLOCK, that
should have no effect on its parent process or any other process.
So I guess the bug is in POSIX.
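The parent/child effect described here can be observed directly. This is a minimal sketch with a hypothetical function name: the child plays the role of ssh and sets O_NONBLOCK on an inherited pipe descriptor, and the parent then inspects its own descriptor with fcntl(F_GETFL).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the parent sees O_NONBLOCK after the
 * child set it, 0 if it does not, -1 on error. */
int child_changes_parent_flags(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child (stand-in for ssh): change the flag on its inherited copy. */
        int flags = fcntl(p[1], F_GETFL);
        if (flags < 0 || fcntl(p[1], F_SETFL, flags | O_NONBLOCK) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent (stand-in for make): wait, then look at its own descriptor. */
    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        int flags = fcntl(p[1], F_GETFL);
        if (flags >= 0)
            result = (flags & O_NONBLOCK) != 0;
    }

    close(p[0]);
    close(p[1]);
    return result;
}
```

The flag stays set after the child has exited, because it lives in the open file description that both processes reference.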


Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
On Sun, 15 Sep 2019, Doug Graham wrote:

> > In any case, disabling nonblock for non-ttys is not what we want to do
>
> Ok, well I don't know what the solution is then.

You can probably work around it by wrapping your ssh-executing make
commands in a go-between process, e.g.

    git [whatever] 2>&1 | cat

This way, ssh's stdout and stderr are a private pipe to cat, so that is
where its O_NONBLOCK lands, while the cat process owns the shared stdout
file descriptor and will (presumably) keep it in blocking mode.
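What the go-between contributes is just a copy loop between two separate open file descriptions. A minimal sketch of such a relay follows; relay() is a hypothetical helper, not how cat is actually implemented. Flags set on the input side by the writer do not touch the output descriptor.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical relay: copy everything from `in` to `out`.
 * Returns 0 at end of input, -1 on error. */
int relay(int in, int out)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0;               /* writer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* Push out the whole chunk, allowing for short writes. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```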

-d

Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
Doug Graham wrote:
> > So the make process gets an EAGAIN on the write syscall and doesn't
> > retry? That sounds like a bug in whatever make you're using, since
> > that could potentially occur in other circumstances too.
>
> What other circumstances? EAGAIN means that something put the
> device into non-blocking mode, and normally, that should only happen
> if the program calling write had itself previously set O_NONBLOCK.

Any program which makes assumptions about fds that have been passed to
other programs risks that those assumptions no longer hold.


> I don't think programs that don't set O_NONBLOCK are required to handle
> EAGAIN or short counts.

Please think about that some more.

Case in point; EAGAIN can come if you give your fd to another process
and continue using it yourself.

Short counts; It is documented behavior that read() and write() may
return short counts. It is not documented why, so you can not make
any assumptions.

With Linux, the kernel code of the particular device driver determines
what read() and write() calls return, and because the userspace API is
documented to allow short counts the different drivers may and do have
different semantics for return counts, sometimes the way that fits the
particular device, sometimes such that the kernel driver is much
simpler, thus more reliable.

I ignored short counts because it was convenient, until it caused me a problem. ;)

Now I write a looping function called wr(), rd(), do_write() or do_read().
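A looping writer of the kind mentioned here might look like the sketch below. Only the name do_write() comes from the message; the body is an assumption about what such a helper typically does: keep calling write() until every byte is accepted, and restart on EINTR.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of a looping writer: returns 0 once all `len` bytes have been
 * written, or -1 with errno set on the first real error. */
int do_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -1;          /* includes EAGAIN on a non-blocking fd */
        }
        p += n;                 /* short count: advance and keep going */
        len -= (size_t)n;
    }
    return 0;
}
```

This version treats EAGAIN as an error; waiting for the descriptor to become writable is a separate decision.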


> We are using GNU make 3.81 but newer versions of gmake do the same thing:

GNU programs are like other programs in that they aren't necessarily
correct.


//Peter

Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
> Case in point; EAGAIN can come if you give your fd to another process
> and continue using it yourself.

> Short counts; It is documented behavior that read() and write() may
> return short counts. It is not documented why, so you can not make
> any assumptions.

You might be right about short counts but if you're right about EAGAIN,
there are bugs everywhere. My first attempt at working around my
"make: write error" failure was to pipe make into cat or tee, eg:
"make | tee make.log". But that caused both cat and tee to fail with
EAGAIN. So they have the same "bug" as make. Also note that make is just
calling printf normally and then just before exiting, it calls
ferror(stdout) to see if any error occurred when it previously wrote to
stdout. ferror() is returning true. So now the bug has moved into the C
library.

Also note that EAGAIN is not a transient error like EINTR that will
probably go away on a retry. Retrying the write in a tight loop would
probably just burn some extra cpu and then fail anyway. You'd have to
call select() first, or put a delay in the loop. Are you suggesting that
every program that writes to stdout should implement such contortions?

Not to mention that if the error occurred in stdio or some other C
library routine, I don't think the calling program has any way of
telling how much output was sent successfully and how much should be
retried. I could write

    if (printf("hello world") < 0 && errno == EAGAIN)
        <what here?>

but can I safely assume that none of my string was written before the
error occurred?

Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
> On 16 Sep 2019, at 16:06, Doug Graham <edougra@gmail.com> wrote:
>
>> Case in point; EAGAIN can come if you give your fd to another process
>> and continue using it yourself.
>
>> Short counts; It is documented behavior that read() and write() may
>> return short counts. It is not documented why, so you can not make
>> any assumptions.
>
> You might be right about short counts but if you're right about EAGAIN,
> there are bugs everywhere. My first attempt at working around my
> "make: write error" failure was to pipe make into cat or tee, eg:
> "make | tee make.log". But that caused both cat and tee to fail with
> EAGAIN. So they have the same "bug" as make. Also note that make is just
> calling printf normally and then just before exiting, it calls
> ferror(stdout) to see if any error occurred when it previously wrote to
> stdout. ferror() is returning true. So now the bug has moved into the C
> library.

Dumb question: shouldn't whatever is calling fork() (here 'make' I believe)
be dup()'ing the FDs just in case the called program does something odd
with them like set O_NONBLOCK? That's what I've always done before
fork(), and I believe what the venerable Mr Stevens recommends.

--
Alex Bligh





Re: ssh client is setting O_NONBLOCK on a pipe shared with other processes
> Dumb question: shouldn't whatever is calling fork() (here 'make' I believe)
> be dup()'ing the FDs just in case the called program does something odd
> with them like set O_NONBLOCK?

Doesn't work. That's where I think POSIX is downright weird. Ssh
actually does dup descriptors 0, 1, and 2, and then only sets
O_NONBLOCK on the new descriptors:

    if (stdin_null_flag) {
        in = open(_PATH_DEVNULL, O_RDONLY);
    } else {
        in = dup(STDIN_FILENO);
    }
    out = dup(STDOUT_FILENO);
    err = dup(STDERR_FILENO);

    /* enable nonblocking unless tty */
    if (!isatty(in))
        set_nonblock(in);
    if (!isatty(out))
        set_nonblock(out);
    if (!isatty(err))
        set_nonblock(err);

The problem, I've learned, is that the original descriptor and the new
copy of it refer to the same POSIX "open file description", so setting
O_NONBLOCK on one affects the other as well. I find this quite
counter-intuitive and it's why I'm starting to believe that the bug is
in POSIX. I also find it odd that a child process can affect this flag
in the parent, but that again is because both the parent and child's
descriptors refer to the same POSIX "open file description".
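The dup() behaviour is easy to confirm. In this minimal sketch, set_nonblock_sketch() is a hypothetical stand-in written with plain fcntl() calls, not ssh's actual set_nonblock(), and the other function name is likewise made up. The flag is set only on the duplicate and then read back through the original descriptor.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a set_nonblock() helper. */
static int set_nonblock_sketch(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if the original descriptor reports O_NONBLOCK after the
 * flag was set on its dup()'d copy, 0 if not, -1 on error. */
int dup_shares_nonblock(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    int result = -1;
    int copy = dup(p[1]);
    if (copy >= 0 && set_nonblock_sketch(copy) == 0) {
        /* Query the ORIGINAL descriptor, which was never touched. */
        int flags = fcntl(p[1], F_GETFL);
        if (flags >= 0)
            result = (flags & O_NONBLOCK) != 0;
    }

    if (copy >= 0)
        close(copy);
    close(p[0]);
    close(p[1]);
    return result;
}
```

By contrast, descriptor flags such as FD_CLOEXEC (set with F_SETFD) are kept per descriptor, so dup() does isolate those.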