Path: utzoo!attcan!uunet!dino!ux1.cso.uiuc.edu!brutus.cs.uiuc.edu!psuvax1!shire!schwartz
From: schwartz@shire.cs.psu.edu (Scott E. Schwartz)
Newsgroups: comp.sys.isis
Subject: failure detection?
Message-ID: 
Date: 2 Mar 90 15:50:01 GMT
Sender: news@cs.psu.edu (Usenet)
Organization: Penn State University Computer Science
Lines: 16

Hi all,

Playing with Isis, I've noticed something that surprised me. If I run
several instances of the grid demo, I can kill and restart individual
processes with no problem. But if I send SIGTSTP (i.e., control-Z) to
one of them, they all hang, seemingly forever, or at least until I
continue the stopped job. I expected Isis to eventually decide that the
stopped participant had failed and continue without it.

Worse, if I wait too long before allowing the stopped process to
continue, the system never seems to recover at all.

Have I misunderstood something crucial? This is on a sun4/260 under
4.0.3.

--
Scott Schwartz          schwartz@cs.psu.edu
"the same idea is applied today in the use of slide rules." -- Don Knuth