Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!agate!saturn!darrell@midgard.ucsc.edu
From: darrell@midgard.ucsc.edu (Darrell Long)
Newsgroups: comp.os.research
Subject: TR Available (replication, reliability, regeneration)
Message-ID: <5395@saturn.ucsc.edu>
Date: 7 Nov 88 21:37:04 GMT
Sender: usenet@saturn.ucsc.edu
Lines: 62
Approved: comp-os-research@jupiter.ucsc.edu

For those of you interested in replication...

The following technical report (UCSC-CRL-88-18) can be ordered from
UCSC.  Send your request to:

        Jennifer Madden
        Baskin Center for Computer Engineering and Information Sciences
        University of California
        Santa Cruz, CA 95064

     The Reliability of Regeneration-Based Replica Control Protocols

        Darrell D.E. Long      John L. Carroll      Kris Stewart

  Computer and Information Sciences        Computer Science Division
       University of California            San Diego State University
         Santa Cruz, CA 95064                 San Diego, CA 92182

                               ABSTRACT

The accessibility of vital information can be enhanced by replicating
the data on several sites, and employing a consistency control protocol
to manage the copies.  The most common measures of accessibility include
reliability, which is the probability that a replicated data object will
remain continuously accessible over a given time period, and
availability, which is the steady-state probability that the data object
is accessible at any given moment.

For many applications, the reliability of a system is a more important
measure of its performance than its availability.  These applications
are characterized by the property that interruptions of service are
intolerable and usually involve interaction with real-time processes,
such as process control or data gathering where the data will be lost if
it is not captured when it is available.

The reliability of a replicated data object depends on maintaining a
viable set of current replicas.
When storage is limited, it may not be feasible to simply replicate a
data object at enough sites to achieve the desired level of reliability.
If new replicas of a data object can be created faster than a system
failure can be repaired, better reliability can be achieved by creating
new replicas on other sites in response to changes in the system
configuration.  This technique, known as regeneration, approximates the
reliability provided by additional replicas for a modest increase in
storage costs.

Several strategies for replica maintenance are considered, and the
benefits of each are analyzed.  While the availability afforded by each
of the protocols is quite similar, the reliabilities vary greatly.
Formulas describing the reliability of the replicated data object are
presented, and closed-form solutions are given for the tractable cases.
Numerical solutions, validated by simulation results, are used to
analyze the trade-offs between reliability and storage costs.  With
estimates of the mean times to site failure and repair in a given
system, the numerical techniques presented here can be applied to
predict the minimum number of replicas required to provide the desired
level of reliability.
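
As a rough illustration of the two measures defined in the abstract, the
following Python sketch contrasts availability and reliability under a
simple textbook model.  It is not the model analyzed in the report.  It
assumes independent sites with exponentially distributed failure and
repair times, treats the object as accessible while at least one replica
is up, and ignores both repair and regeneration when computing
reliability.  All function names are hypothetical.

```python
import math


def site_availability(mttf, mttr):
    """Steady-state probability that one site is up (assumed model)."""
    return mttf / (mttf + mttr)


def site_reliability(t, mttf):
    """Probability that one site runs without failing for time t,
    assuming exponentially distributed times to failure."""
    return math.exp(-t / mttf)


def replicated_availability(mttf, mttr, n):
    """Availability of n independent replicas, assuming the object is
    accessible whenever at least one replica's site is up."""
    return 1.0 - (1.0 - site_availability(mttf, mttr)) ** n


def replicated_reliability(t, mttf, n):
    """Reliability of n independent replicas over time t, assuming no
    repair and no regeneration: the object is lost only once all n
    sites have failed."""
    return 1.0 - (1.0 - site_reliability(t, mttf)) ** n


def min_replicas(t, mttf, target, max_n=64):
    """Smallest n (up to max_n) whose no-repair reliability over time t
    meets the target, or None if no such n exists."""
    for n in range(1, max_n + 1):
        if replicated_reliability(t, mttf, n) >= target:
            return n
    return None
```

Under these assumptions, a site whose reliability over the period is 0.5
needs four replicas to reach a reliability of 0.9, so
min_replicas(100 * math.log(2), 100, 0.9) returns 4.  A protocol with
repair or regeneration would be expected to do better with fewer copies,
which is the trade-off the report examines.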