Path: utzoo!attcan!uunet!mailrus!tut.cis.ohio-state.edu!snorkelwacker!apple!sun-barr!newstop!exodus!cortex.Sun.COM!rtrauben
From: rtrauben@cortex.Sun.COM (Richard Trauben)
Newsgroups: comp.arch
Subject: Re: Re: Fault Tolerance [LONG]
Message-ID: <38@exodus.Eng.Sun.COM>
Date: 8 Feb 90 19:47:23 GMT
References: <1990Feb2.035201.21073@tandem.com> <13910014@hpisod2.HP.COM>
Sender: news@exodus.Eng.Sun.COM
Reply-To: rtrauben@cortex.EBay.Sun.COM (Richard Trauben)
Organization: Sun Microsystems, Inc. Mt. View, Ca.
Lines: 19

Dan Hepner responds to a thread about redundant mass-store and datacom
requests wrt rolling back to a checkpoint after a PE-pair failure:

>> The IO request atomicity can be addressed as part of the problem of
>> checkpoint atomicity. Once the atomic checkpoint mechanism is developed,
>> the initiation of IO requests can be incorporated, so that the initiation
>> of an IO request happens only at the time of a successful checkpoint.
>> From the recovery processor's point of view, either the checkpoint/
>> IO request happened or it didn't, and that is discernible.

A consequence of what you suggest is that a unique checkpoint must exist
for every packet in a duplex conversation (over a link) where there are
dependencies between talker and listener (debit/credit): as in one
checkpoint per TCP/IP or X.25 packet. While it works, I suspect it becomes
THE bottleneck in packet transmission rates and might lead to a very high
frequency of checkpoints per second.

Richard
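
The bottleneck concern can be put in rough numbers. The sketch below is only a back-of-the-envelope model, not anything taken from the thread: it assumes each packet must wait for its own checkpoint to complete before it is sent, with no overlap between the two. The function names and the 1 ms / 0.1 ms figures are hypothetical, chosen purely for illustration.

```python
def max_packet_rate(checkpoint_s, transmit_s):
    """Upper bound on packets/second when every packet is preceded by its
    own checkpoint and the two steps are strictly serialized (assumption)."""
    if checkpoint_s < 0 or transmit_s <= 0:
        raise ValueError("checkpoint_s must be >= 0 and transmit_s must be > 0")
    return 1.0 / (checkpoint_s + transmit_s)


def checkpoints_per_second(packet_rate, packets_per_checkpoint=1):
    """Checkpoint frequency implied by a given packet rate. With one
    checkpoint per packet this is simply the packet rate."""
    if packets_per_checkpoint < 1:
        raise ValueError("packets_per_checkpoint must be >= 1")
    return packet_rate / packets_per_checkpoint


if __name__ == "__main__":
    # Hypothetical costs: 1 ms per checkpoint, 0.1 ms to put a packet on the link.
    link_only = max_packet_rate(0.0, 0.0001)
    with_checkpoint = max_packet_rate(0.001, 0.0001)
    print("link alone:         %.0f packets/s" % link_only)
    print("checkpoint/packet:  %.0f packets/s" % with_checkpoint)
    print("checkpoints/s:      %.0f" % checkpoints_per_second(with_checkpoint))
```

Under these made-up costs the checkpoint, not the link, sets the ceiling: the achievable rate falls by roughly a factor of eleven, and the checkpoint frequency equals the packet rate. A real system might overlap checkpointing with transmission, which this model deliberately ignores.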