Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!elroy.jpl.nasa.gov!sdd.hp.com!hplabs!hpl-opus!jewett
From: jewett@hpl-opus.hpl.hp.com (Bob Jewett)
Newsgroups: comp.arch
Subject: Re: parity is for farmers?
Message-ID: <73930001@hpl-opus.hpl.hp.com>
Date: 23 May 91 20:29:54 GMT
References: <1991May21.232331.24888@cs.umn.edu>
Organization: HP Labs, High Speed Electronics Dept., Palo Alto, CA
Lines: 36

> >Is memory so reliable today that
> >parity doesn't give enough benefit to bother with?
>
> It's marginal, and depends on circumstances.  Modern memory *is* pretty
> reliable, ...
> On well-broken-in hardware, parity errors are quite rare.  (Utzoo gets
> maybe one or two a year on 24MB of relatively old memory.)

On the ~20 systems in this department, we see DRAM error rates that vary
according to the type of memory chips used.  Systems that have 1Mb chips
seem to average about one error for every 400 megabyte-months of
operation.  That's one error on a 16MB system every 25 months.  Systems
that use 4Mb chips (i.e., all the new ones) have one error every 100
megabyte-months, or about four times a year for a 32MB system.

>> Does only ECC give a
>> strong enough guarantee - and that is too expensive...

> ECC is more painful in all the above ways, and tends to be used only for
> server-class machines where availability sells.  In practice, with current
> error rates, unless the application is one where crashes are utterly
> unacceptable, parity is amply sufficient.

Yes, it depends on what the costs of a crash are.  If you have spent a
couple of days working on an IC design and have neglected to write a
checkpoint version, a crash costs at least several hundred dollars.

All the new systems here have ECC.  We have had about 50 corrected
errors in the last year, not counting two bursts of errors on two
systems that had hardware problems.  In our situation, it would have
been unacceptable to have had that many more crashes.  ECC is required.
Parity is not sufficient.

Bob Jewett [Not an official statement, etc.]
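
The megabyte-months figures quoted in the post can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes (as the post's own numbers imply) that the error rate scales linearly with memory size and stays constant over time.

```python
# Hypothetical helpers, not from the original post.
# Assumption: errors arrive at a constant rate proportional to installed memory.

def months_between_errors(mb_months_per_error, system_mb):
    """Average months between errors for a system with system_mb megabytes."""
    return mb_months_per_error / system_mb


def errors_per_year(mb_months_per_error, system_mb):
    """Average number of errors per year for the same system."""
    return 12.0 * system_mb / mb_months_per_error


if __name__ == "__main__":
    # 1Mb chips: one error per 400 megabyte-months, 16MB system
    print(months_between_errors(400, 16))   # 25.0 months between errors
    # 4Mb chips: one error per 100 megabyte-months, 32MB system
    print(errors_per_year(100, 32))         # 3.84, i.e. roughly four a year
```

Under the same assumption, the 24MB Utzoo machine quoted above (one or two errors a year) would correspond to roughly 144 to 288 megabyte-months per error.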