Path: utzoo!attcan!uunet!cs.utexas.edu!samsung!xylogics!merk!alliant!linus!eachus
From: eachus@linus.mitre.org (Robert I. Eachus)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Parity Checking / ECC RAM on the A3000
Message-ID:
Date: 6 Jun 90 19:10:39 GMT
References: <1655@lpami.wimsey.bc.ca>
Sender: usenet@linus.mitre.org
Organization: The Mitre Corporation, Bedford, MA
Lines: 49
In-reply-to: lphillips@lpami.wimsey.bc.ca's message of 28 May 90 22:52:06 GMT

In article <1655@lpami.wimsey.bc.ca> lphillips@lpami.wimsey.bc.ca (Larry Phillips) writes:

> In Very Important Applications, I would go for ECC. In other situations, I
> would go for no checking at all. Parity is useless.

>> DRAMs these days are much more reliable than 10 years ago... even 5 years
>> ago.

> You rest my case. :-)

This is more in the nature of an agreement than a flame, but there are
circumstances where ECC is currently LESS reliable than no checking at
all, specifically when speed limits are being pushed. Modern DRAMs are
fairly well protected against cosmic-ray-induced errors and other
transients, but if you use ECC circuitry, the overall reliability of the
memory system has to include the possibility that the ECC circuitry
returns the wrong value or (much more commonly) does not assert the
correct value soon enough. On most ECC memory systems this is the most
frequent cause of uncorrected errors (even though it most frequently
occurs when a bit has been flipped). Worse, if such an error occurs when
the memory was read correctly, it is not even detected by most transient
fault counters. It takes a lot of extra logic to look for signal changes
on the bus immediately after the correct value is expected to be
asserted.

I used to work at Stratus Computer, and (thanks to duplexed ECC memory
boards) we could detect and count both types of faults. (It's only a
failure if the program sees bad data...) That is the minimum I would
recommend for life-critical systems.
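To make the duplexed-board idea concrete, here is a rough software sketch of how comparing two boards lets you count both kinds of fault. It is only an illustration under my own assumptions, not a description of how Stratus hardware worked: the names BoardRead, classify, and count_faults are invented for the example, and real hardware does this comparison in logic on the bus, not in a program.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BoardRead:
    """One board's view of a single read (hypothetical model)."""
    value: int           # word the board was driving when the bus was sampled
    ecc_corrected: bool  # True if this board's ECC logic reported a correction


def classify(a: BoardRead, b: BoardRead) -> str:
    """Classify one duplexed read by comparing the two boards."""
    any_correction = a.ecc_corrected or b.ecc_corrected
    if a.value == b.value:
        # Boards agree, so the program sees good data either way.
        return "corrected" if any_correction else "clean"
    if any_correction:
        # Disagreement while an ECC correction was in progress: the
        # corrected value was wrong or was not asserted in time.
        return "fault_during_correction"
    # Disagreement with no correction reported anywhere: the kind of
    # fault an ordinary transient-fault counter never sees.
    return "fault_on_clean_read"


def count_faults(reads) -> Counter:
    """Tally classifications over an iterable of (board_a, board_b) pairs."""
    return Counter(classify(a, b) for a, b in reads)
```

A single ECC board can only produce the "clean" and "corrected" tallies by itself. The two fault categories exist here only because there is a second copy of the data to compare against, which is the point of duplexing.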
We did see both kinds of faults, and I would imagine that without very
careful design, ECC memory today does not significantly improve
reliability. (Before the flames start... If a system has a transient
memory failure every ten months with ECC and every three months without,
that is not a significant difference. As a designer of fault-tolerant
systems, I would want to push it out to several years, preferably a
century or two. You can't do that with simple ECC.)

If you have a friend with an IBM compatible with parity, ask him when he
last had a parity error. My guess is that with 256K or 1 Meg parts, it
should be significantly less than 1 per Megabyte per year.

--
Robert I. Eachus

Amiga 3000 - The hardware makes it great, the software makes it awesome,
and the price will make it ubiquitous.
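A back-of-the-envelope way to see why ten months versus three months is "not a significant difference": assume transient failures arrive as a Poisson process (my assumption for the sketch, not a measured property of any memory system) and ask how likely at least one failure is within a year. The function name p_at_least_one is made up for this example.

```python
import math


def p_at_least_one(mtbf_months: float, horizon_months: float = 12.0) -> float:
    """Probability of one or more failures within the horizon,
    assuming a Poisson process with the given mean time between failures."""
    return 1.0 - math.exp(-horizon_months / mtbf_months)


for label, mtbf in [("no ECC, 3 months", 3.0),
                    ("ECC, 10 months", 10.0),
                    ("target, 100 years", 1200.0)]:
    print(f"{label:>18}: {p_at_least_one(mtbf):.3f}")
```

Under that assumption both the three-month and ten-month systems are more likely than not to fail within a year (roughly 98% and 70%), while a century-scale interval brings the yearly chance down to about 1%.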