Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!rice!uw-beaver!Teknowledge.COM!polya!shelby!neon!rwpratt
From: rwpratt@Neon.Stanford.EDU (Robert W. Pratt)
Newsgroups: comp.arch
Subject: Re: Fault Tolerant Micros
Message-ID: <1990Jan18.185520.3682@Neon.Stanford.EDU>
Date: 18 Jan 90 18:55:20 GMT
References: <13910004@hpisod2.HP.COM>
Organization: Computer Science Department, Stanford University
Lines: 27

In article <13910004@hpisod2.HP.COM> dhepner@hpisod2.HP.COM (Dan Hepner) writes:
>From: lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay)
>>Live CPU recovery has become much less interesting since
>>multiprocessors came along. With the right software, a failed
>>processor does not imply a failed process. For example, Tandem
>>checkpoints each process regularly, so that a different processor can
>>do a prompt checkpoint-resumption.
>
>Tandem has apparently decided that this was not the correct model
>to implement fault tolerance, although I've searched for and not
>found yet an official statement on just how the S2 does do FT.

I think a more accurate statement would be that Tandem decided not to
use checkpointing for fault tolerance under UNIX System V.3, since that
would have called for radical (IMHO) changes to UNIX. Guardian
(Tandem's proprietary OS) still uses checkpointing.

Disclaimer: I did not work on the S2, and the above is exclusively my
opinion, not Tandem's or Stanford's.

Bob P.
-- 
Bob Pratt
INTERnet: pratt@jessica.stanford.edu (much more reliable)
          pratt_robert@comm.tandem.com (checked more, but flaky)
USMail:   2225 Sharon Rd. #323
          Menlo Park, CA. 94025