Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!umich!samsung!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!sdsu!ncr-sd!ziggy!donnh
From: donnh@ziggy.SanDiego.NCR.COM (Donn Holtzman)
Newsgroups: comp.arch
Subject: Re: Fault Tolerant Micros
Message-ID: <2484@ncr-sd.SanDiego.NCR.COM>
Date: 6 Feb 90 01:17:52 GMT
References: <13910004@hpisod2.HP.COM> <13910010@hpisod2.HP.COM>
Sender: news@ncr-sd.SanDiego.NCR.COM
Reply-To: donnh@ziggy.SanDiego.NCR.COM (Donn Holtzman)
Organization: NCR Corporation, Rancho Bernardo
Lines: 37

In article <13910010@hpisod2.HP.COM> dhepner@hpisod2.HP.COM (Dan Hepner) writes:
>
>Ideally FT would exist completely in the hardware, and present
>a platform to the OS which looks like a non-FT machine.
>

From my perspective, one problem with a "HW only" solution to FT is
the issue of SW failures. As the gentleman from Tandem pointed out,
there is a class of faults (Heisenbugs) which are very timing- and
case-dependent. A loosely coupled approach, such as Tandem's, will
recover from many SW-based faults simply because the timing and load
characteristics are different on another processor. If the bug causes
your kernel to hang, TMR or pair-and-spare approaches won't succeed,
because the redundant hardware is running the same kernel and hangs
with it. Performance is certainly an issue, but one can trade
checkpointing overhead for recovery speed (at least in the OLTP arena).

On the other hand, the "HW only" approaches are easier to explain and
sell. They certainly are conceptually simpler, if not actually simpler
to implement correctly.

>The reality is that this can't be quite true. FT vendors will
>always be required to supply whatever kernel support that their
>idiosyncratic implementation requires, and an OS port to such a
>machine will always be more difficult than on a non-FT platform.
>

This is a good point. I would be surprised if Tandem didn't have to
make kernel changes to make their machine work.
But in this day of standards and narrow market windows, it was probably
easier to sell this approach to management than a large SW effort
(kernel changes or no kernel changes) to support a loosely coupled
approach.

Interesting stuff.

Donn Holtzman
NCR E&M San Diego
Donn.Holtzman@SanDiego.NCR.COM