Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!decwrl!shelby!portia.stanford.edu!shasta.stanford.edu!jackk
From: jackk@shasta.stanford.edu (Jack Kouloheris)
Newsgroups: comp.arch
Subject: Workstation Data Integrity
Message-ID: <1990Aug3.204358.330@portia.Stanford.EDU>
Date: 3 Aug 90 20:43:58 GMT
Sender: jackk@portia.Stanford.EDU (Jack Kouloheris)
Reply-To: jackk@shasta.stanford.edu (Jack Kouloheris)
Organization: Stanford University
Lines: 22

I'm a bit puzzled by the lack of any type of memory error detection/
correction on many workstations and high-end PCs. These workstations
are beginning to have memories that rival or exceed those of the
previous generation of minicomputers, which almost always used
some sort of ECC protection. Do manufacturers feel that it isn't
needed any more?

A 1Mbit DRAM chip may have a typical soft error rate of
.001-.005 PPM/KPOH/bit. Suppose we have a workstation with
16 Megabytes of memory (= approx 1.34 * 10^8 bits). At the .005
figure, this yields a memory system error rate of .671 errors/KPOH,
a non-negligible number. Servers may have even more memory than this,
and may be running continually, so some errors are bound to occur.
What happens if a bit flips, and then the data is paged out or written
to a file? The error is now permanent and can propagate.
Why does no one worry about this?

Some SUNs have parity checking on the memory system, but what does
the OS do when a parity error occurs, since correction is not
possible?

Jack
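
For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes that PPM/KPOH/bit means errors per million bits per thousand power-on hours, and that 16 Megabytes means 16 * 2^20 bytes. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope soft error rate for a DRAM array.
# Assumption: PPM/KPOH/bit = errors per million bits per
# thousand power-on hours (KPOH).

def errors_per_kpoh(mem_bytes, ppm_per_kpoh_per_bit):
    """Expected soft errors per 1000 power-on hours for the whole array."""
    bits = mem_bytes * 8
    return bits * ppm_per_kpoh_per_bit * 1e-6

def hours_between_errors(rate_per_kpoh):
    """Mean power-on hours between errors at the given rate."""
    return 1000.0 / rate_per_kpoh

# Assumption: 16 Megabytes = 16 * 2**20 bytes.
MEM_BYTES = 16 * 2**20

low = errors_per_kpoh(MEM_BYTES, 0.001)   # optimistic end of the range
high = errors_per_kpoh(MEM_BYTES, 0.005)  # pessimistic end of the range

print("bits: %d" % (MEM_BYTES * 8))
print("errors/KPOH: %.3f to %.3f" % (low, high))
print("mean hours between errors: %.0f to %.0f"
      % (hours_between_errors(high), hours_between_errors(low)))
```

Under those assumptions the range works out to roughly .134 to .671 errors/KPOH, so the .671 figure is the pessimistic end. That is about one soft error every 1500 to 7500 power-on hours, or somewhere between two and ten months on a machine that is never switched off.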