Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!ames!think!eplunix!das
From: das@eplunix.UUCP (David Steffens)
Newsgroups: comp.unix.wizards
Subject: Re: Looking for tcsh binary which uses vfork
Summary: process group race _not_ sufficient to explain "Stopped..." message
Keywords: tcsh, vfork, SunOS 4.0
Message-ID: <783@eplunix.UUCP>
Date: 14 Sep 89 17:14:45 GMT
References: <243@paralogics.UUCP> <10941@smoke.BRL.MIL> <246@paralogics.UUCP>
Organization: Eaton-Peabody Lab, Boston, MA
Lines: 40

In article <10941@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>In article <243@paralogics.UUCP> shaw@paralogics.UUCP (Guy Shaw) writes:
>>> Stopped (tty output)
>>> Presumably using vfork() forces things to happen in the right order.
>>The idea that using vfork() would cure this problem sounds reasonable to me.
>NO! All that using vfork() instead of fork() does in this case is to
>change the multiprocess timing so that the real problem, a race condition
>involving process groups, is less evident. Chris Torek recently posted
>the explanation and suggested fix (set the process group N+1 times in
>an N-process pipeline).

First of all, let me say that I haven't read any of the original articles.
Not because I didn't want to, but because I joined this discussion late
and the original articles had already been expunged from our news machine.

But some experimentation on my Sun4 running tcsh w/o vfork under SunOS4.0.3
leads me to believe that the above explanation is only partly correct.
The correct part is that vfork gets things to happen in the right order.
The incorrect part is that the race involves setting of process groups.
While it is true that the tty ends up in the wrong process group,
the _real_ race is over which process gets to run (and possibly finish) first.

Take a simple pipe: ``ls | more''. If the 1st process (ls) finishes _before_
the 2nd process (more) is completely set up, then one gets the
"Stopped (tty output)" message, otherwise not.
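
A toy model may help make that ordering concrete. The sketch below is
not taken from the tcsh source: struct job, job_register(), job_reap(),
job_done() and done_after_ls_exits() are all hypothetical names, and it
assumes a shell that treats a job as finished once every process it has
recorded so far has exited. It replays the ``ls | more'' case with no
real fork() at all, only the two possible orderings of bookkeeping events.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shell's per-job bookkeeping.
 * This is an illustration only, not tcsh's data structure. */
struct job {
    int registered;   /* processes the shell has recorded for this job */
    int exited;       /* recorded processes that have been reaped      */
};

/* Record a newly created process as belonging to the job. */
void job_register(struct job *j)
{
    j->registered++;
}

/* Note that one recorded process has exited. */
void job_reap(struct job *j)
{
    assert(j->exited < j->registered);
    j->exited++;
}

/* Assumed completion test: the job counts as done when every
 * process recorded SO FAR has exited. */
bool job_done(const struct job *j)
{
    return j->registered > 0 && j->exited == j->registered;
}

/* Replay "ls | more" and report whether the job looks finished at
 * the moment ls is reaped. The flag selects the ordering: was more
 * already recorded when ls exited, or not? */
bool done_after_ls_exits(bool more_registered_first)
{
    struct job j = {0, 0};

    job_register(&j);            /* ls is recorded                  */
    if (more_registered_first)
        job_register(&j);        /* more is recorded before ls ends */
    job_reap(&j);                /* ls exits and is reaped          */

    return job_done(&j);
}
```

Under that assumption, done_after_ls_exits(false) reports the job as
finished while ``more'' is still unrecorded, which is the point at which
a shell would take the tty back; done_after_ls_exits(true) does not.
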
In terms of the source, the palloc() needed to associate the ``more'' process
with the job has not been done by the time that the ``ls'' process finishes.
Then the wait loop in pwait() thinks the _whole_ job is done and steals
the tty out from under the ``more'' process. Now the ``more'' process
gets to run and set itself up, but too late! This isn't possible w/ vfork
because the child _always_ gets to run 1st.

As of this writing, I don't yet have a fix. Since I might be barking up
the wrong tree, I thought I'd check in with the wizards before spending
a lot of time on it. Is my analysis correct? Am I missing something?
-- 
{harvard,mit-eddie,think}!eplunix!das   David Allan Steffens
243 Charles St., Boston, MA 02114       Eaton-Peabody Laboratory
(617) 573-3748                          Mass. Eye & Ear Infirmary