Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!wuarchive!cs.utexas.edu!uunet!paralogics!shaw
From: shaw@paralogics.UUCP (Guy Shaw)
Newsgroups: comp.unix.wizards
Subject: Re: Looking for tcsh binary which uses vfork
Summary: Let me rephrase that request
Keywords: tcsh, vfork, SunOS 4.0
Message-ID: <246@paralogics.UUCP>
Date: 8 Sep 89 21:22:54 GMT
References: <243@paralogics.UUCP> <10941@smoke.BRL.MIL>
Organization: Paralogics; Santa Monica, CA
Lines: 141


A short while ago I asked if there was some place that had a
version of tcsh that uses vfork().  Maybe I should rephrase that.
I would like to know if there is some site which has a version of
tcsh that solves the "csh pgrp problem", one way or another.

There really are two issues which I should keep separate:
   1)  I want to know how things work, just because;
   2)  I want a fixed tcsh.

I did get one reply.
In article <10941@smoke.BRL.MIL>, gwyn@brl.arpa (Doug Gwyn) writes:
> In article <243@paralogics.UUCP> shaw@paralogics.UUCP (Guy Shaw) writes:
> >>   Stopped (tty output)
> >> Presumably using vfork() forces things to happen in the right order.
> >The idea that using vfork() would cure this problem sounds reasonable to me.
>
> NO!  All that using vfork() instead of fork() does in this case is to
> change the multiprocess timing so that the real problem, a race condition
> involving process groups, is less evident.  Chris Torek recently posted
> the explanation and suggested fix (set the process group N+1 times in
> an N-process pipeline).

Thank you.  I did read Chris Torek's article.  I have read these articles
on the "csh pgrp problem" subject:

    <712@skye.ed.ac.uk>, richard@aiai.ed.ac.uk (Richard Tobin), 9 Aug 89
    <19000@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek), 11 Aug 89
    <1127@tukki.jyu.fi>, eloranta@tukki.jyu.fi (Jussi Eloranta), 11 Aug 89
    <920@legato.LEGATO.COM>, mojo@legato (Joseph Moran), 12 Aug 89
    <184@sunquest.UUCP>, terry@sunquest.UUCP (Terry Friedrichsen), 17 Aug 89
    <19143@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek), 18 Aug 89

and your reply has prompted me to go back and read them all again,
to see if I interpret them differently the second time.
Blimey!  This redistribution of knowledge is trickier than I thought.
[Dennis Moore, mangled a bit]

In article <19000@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> >Presumably using vfork() forces things to happen in the right order.
>
> This analysis is correct (congratulations: discovering this bug is
> rather tricky---the POSIX folks noticed it eventually, but it took
> quite a while).

Chris Torek didn't seem to be saying that vfork() caused incorrect
behavior, only that there is something better.

> The accepted solution is to set the terminal's process group k+1 times
> when there are k children in a pipeline (or k times with the current
> system): once in each child and once in the parent.  Setting the pgroup
> to whatever it is already is harmless, and this ensures that the pgroup
> is set by the time it needs to be.

Do you mean do a right-to-left series of TIOCSPGRP ioctl calls,
as well as setpgrp calls?  If I understand correctly, the basic idea
is that if you start up a pipeline, say "a | b | c", then you should
proceed from right to left.  So, starting with "c", you should set up
EVERYTHING as if "c" were the only thing you were going to run, without
trying to get too clever and take advantage of the fact that some of
the setup of "c" is going to be overridden in the next stage, right away.
You should not short-stroke any part of it, no matter how short-lived
some aspect of the setup of "c" will be. This includes the process group,
and the terminal process group.  Then, proceed to establish the pipeline,
"b | c", in the same way. Then, finally build "a | b | c".
This way, the shell never leaves a pipeline in a state that isn't
completely set up to run on its own, except for reading from a pipe
with no producer.  I take it that, the way things are now, process "a"
is the only one that bothers with a TIOCSPGRP.  Sorry if I misunderstand
this; I have no source.
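
To make the "once in each child and once in the parent" idea concrete,
here is a minimal sketch of the process-group half only.  It is not taken
from csh or tcsh source.  It assumes the POSIX setpgid() and getpgid()
calls rather than the 4.x BSD setpgrp(), and the name
spawn_pipeline_group and the "gate" pipe that stands in for exec are
made up for illustration.  A real shell would presumably also set the
terminal's process group at the two marked points (with SIGTTOU ignored);
that part is left out here.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_STAGES 16

/*
 * Hypothetical helper (not from csh/tcsh source): fork n children that
 * should all end up in one process group, led by the first child.
 * Returns how many children were observed in that group (n on success),
 * or -1 on error.
 */
int spawn_pipeline_group(int n)
{
    pid_t pids[MAX_STAGES];
    pid_t pgid = 0;          /* 0 until the first child has been forked */
    int gate[2];             /* children block on this until we are done */
    int spawned, in_group = 0, i;
    char c;

    if (n < 1 || n > MAX_STAGES || pipe(gate) < 0)
        return -1;

    for (spawned = 0; spawned < n; spawned++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Child: set the group itself, in case it runs first. */
            setpgid(0, pgid ? pgid : getpid());
            /* A real shell would also claim the terminal here. */
            close(gate[1]);
            if (read(gate[0], &c, 1) < 0)   /* stand-in for exec */
                _exit(1);
            _exit(0);
        }
        /* Parent: set the same group again, in case the parent runs first. */
        if (pgid == 0)
            pgid = pid;
        setpgid(pid, pgid);
        /* ... and a real shell would claim the terminal here as well. */
        pids[spawned] = pid;
    }

    /* Whichever side ran first, every child is in the group by now. */
    for (i = 0; i < spawned; i++)
        if (getpgid(pids[i]) == pgid)
            in_group++;

    close(gate[0]);
    close(gate[1]);          /* EOF on the gate releases the children */
    for (i = 0; i < spawned; i++)
        waitpid(pids[i], NULL, 0);

    return spawned == n ? in_group : -1;
}
```

Setting the group a second time is harmless, so the parent's check does
not depend on which side the scheduler happened to run first.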

> (Most of the mess would go away if process groups were allocated
> by the system, rather than by user code.
Yeah, what he said!


In article <920@legato.LEGATO.COM>, mojo@legato (Joseph Moran) writes:
> Unfortunately, the `simple' fix I know of is to continue to use vfork
> with csh...
> [ . . . ]
> >Presumably using vfork() forces things to happen in the right order.
>
> Exactly - when using vfork the child process gets to run first and
> "borrow the address space" of the parent until the child exec's or
> exit's.  After the child exec's or exit's, the parent gets to run after
> it gets its address space back from the child process.
>
> I think that the general lesson to be learned here is to not introduce
> "temporary hack system calls" because it can be hard to later get rid
> of them because some important program(s) either accidentally or
> consciencely depending on the (subtle effects of that) hack.

Well, what I got from these articles is that, although there is an
"accepted solution" *and* there is a "simple fix", which uses a
"temporary hack system call", the "simple fix" would work correctly.
When you (Doug Gwyn) say " ... the real problem, a race condition
involving process groups, is less evident", do you mean that there is still
a chance that vanilla csh from Sun will give me a "Stopped (tty output)"
message, but it just happens less often?

I was left with the impression that I had my choice between two
correct solutions: one that is "the right thing", but for some
reason, I shouldn't expect to see this solution implemented soon;
and a "simple fix" which nobody likes, but that is somehow simpler,
in the short run, than "the right thing".  So, it would be more
realistic to expect to see a version of tcsh available with the
"simple fix".

But wait.  Why is the "simple fix" simpler than the "accepted solution"?
Now that I reread these articles, I am guessing that, when Joseph Moran
says "the `simple' fix I know of is to continue to use vfork", he is
referring to how much more complicated it is to fix ALL programs
that rely on vfork() semantics in some way.  He later says,
"As time went on, we found more places that depended on the subtle
effects of vfork."  But I started getting the notion that vfork()
was the simpler fix, even when confining the discussion to fixing tcsh.

So much for trying to understand what is going on; I would like
a tcsh that has this problem fixed and I don't care how.

My personal experience is that tcsh on a Sun 3 runs into this problem
frequently, so this is not just a problem for armchair shell writers.
While running csh, I have been unable to cause this problem to
manifest itself, *even once*.  This is the ONLY thing that I have noticed
about tcsh that detracts from its record as an interactive shell
which is superior in every way to vanilla csh.
I DO NOT want to go back to using csh.

If the world must be divided between "scruffies" and "neats", then I am
a split personality.  As a "neat", I do prefer correct and satisfying
solutions; but a hack will do, in an emergency.  For instance, I prefer
constructivist mathematics, but I don't just dismiss existence proofs and
indirect proofs, especially when that is all there is, for now.

"I'll admit, it's not the most satisfying way to conquer the world,
but I'll take what I can get."  -- Dr. Destructo

-- 
Guy Shaw
Paralogics
paralogics!shaw@uunet.uu.net  or  uunet!paralogics!shaw