Path: utzoo!attcan!uunet!mailrus!tut.cis.ohio-state.edu!snorkelwacker!apple!sun-barr!newstop!exodus!cortex.Sun.COM!rtrauben
From: rtrauben@cortex.Sun.COM (Richard Trauben)
Newsgroups: comp.arch
Subject: Re: Re: Fault Tolerance [LONG]
Message-ID: <38@exodus.Eng.Sun.COM>
Date: 8 Feb 90 19:47:23 GMT
References: <1990Feb2.035201.21073@tandem.com> <13910014@hpisod2.HP.COM>
Sender: news@exodus.Eng.Sun.COM
Reply-To: rtrauben@cortex.EBay.Sun.COM (Richard Trauben)
Organization: Sun Microsystems, Inc.  Mt. View, Ca.
Lines: 19


Dan Hepner responds to a thread about redundant mass-store and datacom
requests, with respect to rolling back to a checkpoint after a PE-pair failure:

>> The IO request atomicity can be addressed as part of the problem of 
>> checkpoint atomicity. Once the atomic checkpoint mechanism is developed, 
>> the initiation of IO requests can be incorporated, so that the initiation 
>> of an IO request happens only at the time of a successful checkpoint.
>> From the recovery processor's point of view, either the checkpoint/
>> IO request happened or it didn't, and that is discernible.

A consequence of what you suggest is that a separate checkpoint must
exist for every packet in a duplex conversation (over a link) where there
are dependencies between talker and listener (debit/credit), i.e.
one checkpoint per TCP/IP or X.25 packet. While that works, I suspect
checkpointing becomes THE bottleneck on packet transmission rates and
might lead to a very high checkpoint rate.
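
To put rough shape on that concern, here is a back-of-envelope sketch in
Python. It assumes the strictly serialized case: packet N+1 cannot go out
until the checkpoint for packet N has completed. The function name and both
timing figures are illustrative assumptions, not measurements from any
system discussed in this thread.

```python
def max_packet_rate(checkpoint_s: float, wire_s: float) -> float:
    """Upper bound on packets/second when packet N+1 cannot be sent
    until packet N's checkpoint has completed (strictly serialized)."""
    if checkpoint_s < 0 or wire_s <= 0:
        raise ValueError("checkpoint time must be >= 0 and wire time > 0")
    return 1.0 / (checkpoint_s + wire_s)


# Hypothetical figures: 1 ms on the wire, 5 ms to complete a checkpoint.
wire_s = 0.001
checkpoint_s = 0.005

unchecked = max_packet_rate(0.0, wire_s)
checked = max_packet_rate(checkpoint_s, wire_s)

print(f"no checkpointing:      {unchecked:8.1f} packets/s")
print(f"checkpoint per packet: {checked:8.1f} packets/s "
      f"(= checkpoints/s), slowdown {unchecked / checked:.1f}x")
```

Under this model the checkpoint rate equals the packet rate, so whichever
of the two costs is larger sets the ceiling for the conversation.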

Richard