Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucbvax!agate!saturn!mcvax!tel2.vtt.fi!savela@uunet.UU.NET
From: mcvax!tel2.vtt.fi!savela@uunet.UU.NET (Markku Savela)
Newsgroups: comp.os.research
Subject: Re: References for Fault Tolerent, "safe" file system
Message-ID: <7640@saturn.ucsc.edu>
Date: 23 May 89 20:12:54 GMT
Sender: usenet@saturn.ucsc.edu
Organization: Technical Research Centre of Finland
Lines: 26
Approved: comp-os-research@jupiter.ucsc.edu

In article <7597@saturn.ucsc.edu>, moscom!adp@cs.rochester.edu (Alan Percy) writes:
>
> We where going to use dual hard disks and controllers.  The system
> would have the dual media and a driver that would write to both,
> but read from only one.  If a media failure was detected the
> backup disk would be read from.  The bad track on the primary would
> be reassigned and rewritten with data from the backup.

This method was an option in a PDP-11 based multiuser operating
system which we designed in the 70's at my previous employer.  One
additional detail has to be noted: once a media failure is detected,
no further attempts should be made on that disk.  The system should
revert to the backup only.  All kinds of havoc may result if the
failure is transient.

The "dual write" option wasn't very popular, although some sites
used it.  The trouble was just those transient errors (or someone
hitting "write protect" or "off line" accidentally).  The system
reverted fully to the backup and nobody noticed anything.  And,
naturally, nobody read the error messages from the console, and the
next time the system was booted, users had trashed disks, because
the primary disk, by then out of date, was again in use... :-(

I guess the backup disk should have had some mark that the primary
has been dropped, but we never got to implement that.
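
For what it is worth, the "mark on the backup" idea could be sketched
roughly as below.  This is only an illustration in Python, not the code
of the PDP-11 system described above: the names Disk, MirrorDriver,
MediaError and the "primary_dropped" label field are all made up for
the example, and the disks are plain in-memory dictionaries.  The
sketch assumes the backup has a small label area that survives a
reboot, and it does not handle a failure of the backup itself.

```python
class MediaError(Exception):
    """Raised by a disk when a read or write fails (hypothetical)."""


class Disk:
    """In-memory stand-in for a disk: block number -> data, plus a label area."""

    def __init__(self):
        self.blocks = {}
        self.label = {}        # assumed to survive reboots
        self.failing = False   # set True to simulate a (possibly transient) fault

    def read(self, n):
        if self.failing:
            raise MediaError("read failed")
        return self.blocks[n]

    def write(self, n, data):
        if self.failing:
            raise MediaError("write failed")
        self.blocks[n] = data


class MirrorDriver:
    """Writes to both disks, reads from the primary until it fails once."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        # At "boot", honour a mark left on the backup by an earlier session,
        # so a primary that missed writes is not silently put back in use.
        self.primary_dropped = backup.label.get("primary_dropped", False)

    def _drop_primary(self):
        # After the first failure, make no further attempts on the primary,
        # and record that fact on the backup.
        self.primary_dropped = True
        self.backup.label["primary_dropped"] = True

    def write(self, n, data):
        self.backup.write(n, data)
        if not self.primary_dropped:
            try:
                self.primary.write(n, data)
            except MediaError:
                self._drop_primary()

    def read(self, n):
        if not self.primary_dropped:
            try:
                return self.primary.read(n)
            except MediaError:
                self._drop_primary()
        return self.backup.read(n)
```

With this arrangement, a transient fault on the primary (set
failing = True for one write, then back to False) leaves the primary
dropped for good, and a new MirrorDriver built on the same two disks,
standing in for a reboot, keeps reading from the backup until someone
clears the mark after resynchronising the primary.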