Path: utzoo!yunexus!geac!syntron!jtsv16!uunet!seismo!sundc!pitstop!sun!amdcad!ames!mailrus!cornell!hal
From: hal@gvax.cs.cornell.edu (Hal Perkins)
Newsgroups: comp.sys.next
Subject: Re: NeXT Memory - No Error Checking or Parity !
Keywords: Memory,errors,parity
Message-ID: <22141@cornell.UUCP>
Date: 28 Oct 88 20:28:33 GMT
Article-I.D.: cornell.22141
References: <549@gt-eedsp.UUCP> <1807@desint.UUCP>
Sender: nobody@cornell.UUCP
Reply-To: hal@gvax.cs.cornell.edu (Hal Perkins)
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 35

In article <1807@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
>CDC made that mistake, too, on the old 6000 series machines. The way
>I heard it, somebody "discovered" that most of the parity errors on the
>3000 series were in the parity bits themselves. So dropping the parity
>bits would not only save money, but would cut pointless downtime.

The way I heard it, the parity bit was omitted on the 6000 series to save time: the clock would have had to be slower to generate and check parity. Apparently they assumed that if a memory module went bad, it would be obvious that there was a problem, and the operator or field engineer could run diagnostics.

It didn't work like that, though. I was operating a 6400 on a couple of occasions when a memory module failed. The machine would start acting weird, like it was having a nervous breakdown. Jobs would abort for no apparent reason and then work just fine when rerun; other jobs would appear to run correctly but produce different answers when rerun; parts of the operating system would abort or deadlock; and so on. We learned that these symptoms probably meant a hardware problem, but then we'd have to tell the engineers to rerun their last couple of days' work to be safe, since there could have been errors in their numbers before things got bad enough to be noticeable.
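For readers who haven't met parity, here is a minimal sketch of what that one omitted bit buys. This is illustrative Python, not CDC's (or anyone's) actual memory circuitry. It assumes simple even parity over a 60-bit word (the CDC 6000-series word size), and the helper names (`parity_bit`, `store`, `check`, `flip`) are hypothetical. A single flipped bit is caught on read, but parity can't say which bit went bad, and two flips in the same word cancel out and go unnoticed.

```python
# Minimal even-parity sketch (hypothetical helpers, illustration only).
WORD_BITS = 60  # CDC 6000-series word width, used here for flavor


def parity_bit(word: int) -> int:
    """Bit that makes the total number of 1s (data + parity) even."""
    return bin(word).count("1") % 2


def store(word: int) -> tuple[int, int]:
    """Simulate a write: keep the data word alongside its parity bit."""
    assert 0 <= word < (1 << WORD_BITS), "word must fit in WORD_BITS bits"
    return word, parity_bit(word)


def check(word: int, stored_parity: int) -> bool:
    """Simulate a read: True if parity still matches, False if an error is detected."""
    return parity_bit(word) == stored_parity


def flip(word: int, *bits: int) -> int:
    """Flip the given bit positions to model a failing memory cell."""
    for b in bits:
        assert 0 <= b < WORD_BITS, "bit position out of range"
        word ^= 1 << b
    return word


data, p = store(0o1234567012345670123)  # arbitrary value that fits in 60 bits

print(check(data, p))               # intact word: parity matches
print(check(flip(data, 7), p))      # one bad bit: parity mismatch, error detected
print(check(flip(data, 7, 42), p))  # two bad bits: parity matches, error missed
```

Detection alone is what lets the operator know to stop and call field service instead of silently producing wrong answers. Actually correcting the bad bit takes an error-correcting code with several check bits per word (for example a Hamming-style code), which costs more memory and logic.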
Later CDC machines, as well as Crays, have error-correcting memory, which is essential in huge memories if you want an acceptable MTBF.

Personally, it's fine with me if a workstation-class machine doesn't have ECC, but I would like to have parity so I know when something is wrong. I wouldn't want to be riding on an airplane designed on machines without any form of error detection.

Hal Perkins
hal@cs.cornell.edu
Cornell CS