Path: utzoo!attcan!uunet!aplcen!uakari.primate.wisc.edu!sdd.hp.com!ucsd!ucbvax!agate!darkstar!hplajw.hpl.hp.com
From: wilkes@hplajw.hpl.hp.com (John Wilkes)
Newsgroups: comp.os.research
Subject: Re: Why Machines Crash
Message-ID: <4843@darkstar.ucsc.edu>
Date: 2 Jul 90 17:08:57 GMT
Sender: usenet@darkstar.ucsc.edu
Organization: Hewlett-Packard Laboratories
Lines: 27
Approved: comp-os-research@jupiter.ucsc.edu


In article <4667@darkstar.ucsc.edu>, mis@Seiden.com (Mark Seiden) writes:
> 
> marzullo@cs.cornell.edu (Keith Marzullo) writes:
> 
> >In March of 1986 at the IBM Workshop on Fault-Tolerant Distributed Computing
> >Jim Gray talked about a study he did at Tandem about the reasons machines
> >crash (most likely: operator pushed reset). Does anyone have a reference to
> >a published version of this study?

You might also try:

%z InProceedings
%K Gray86a
%A Jim Gray
%T Why do computers stop and what can be done about it?
%C Proc. 5th Symp. on Reliability in Distrib. Software and Database Sys.
%D 1986
%P 3--11
%p IEEE Computer Society Press, catalog number 86CH2260--8
%x An analysis of the failure statistics of a commercially available
%x fault-tolerant system shows that administration and software
%x are the major contributors to failure.


john wilkes
(sorry for the arcane format)