Path: utzoo!attcan!uunet!aplcen!uakari.primate.wisc.edu!sdd.hp.com!ucsd!ucbvax!agate!darkstar!hplajw.hpl.hp.com
From: wilkes@hplajw.hpl.hp.com (John Wilkes)
Newsgroups: comp.os.research
Subject: Re: Why Machines Crash
Message-ID: <4843@darkstar.ucsc.edu>
Date: 2 Jul 90 17:08:57 GMT
Sender: usenet@darkstar.ucsc.edu
Organization: Hewlett-Packard Laboratories
Lines: 27
Approved: comp-os-research@jupiter.ucsc.edu

In article <4667@darkstar.ucsc.edu>, mis@Seiden.com (Mark Seiden) writes:
>
> marzullo@cs.cornell.edu (Keith Marzullo) writes:
>
> >In March of 1986 at the IBM Workshop on Fault-Tolerant Distributed Computing
> >Jim Gray talked about a study he did at Tandem about the reasons machines
> >crash (most likely: operator pushed reset). Does anyone have a reference to
> >a published version of this study?

You might also try:

%z InProceedings
%K Gray86a
%A Jim Gray
%T Why do computers stop and what can be done about it?
%C Proc. 5th Symp. on Reliability in Distrib. Software and Database Sys.
%D 1986
%P 3 11
%p IEEE Computer Society Press, catalog number 86CH2260--8
%x An analysis of the failure statistics of a commercially available
%x fault-tolerant system shows that administration and software
%x are the major contributors to failure.

john wilkes
(sorry for the arcane format)