Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!mailrus!cornell!uw-beaver!mit-eddie!bu-cs!mirror!frog!cpoint!alien
From: alien@cpoint.UUCP (Alien Wells)
Newsgroups: comp.sys.next
Subject: Re: NeXT Memory - No Error Checking or Parity !
Keywords: Memory,errors,parity
Message-ID: <1429@cpoint.UUCP>
Date: 15 Dec 88 02:53:00 GMT
References: <549@gt-eedsp.UUCP> <8348@alice.UUCP>
Reply-To: alien@cpoint.UUCP (Alien Wells)
Organization: Clearpoint Research Corp., Hopkinton Mass.
Lines: 66

Disclaimer: I work for a company whose main business is producing
aftermarket memories. As such, I am exposed to the memory business - but
I cannot claim to be a memory expert.

Memory reliability is extremely important in a computer. With decreasing
cell sizes, it is becoming easier to have spurious bit errors, and the
larger memory sizes lead to increased probabilities of failures. Even
before joining Clearpoint, I considered the lack of parity to be a major
problem with the Macintosh. I am extremely surprised to see it repeated
by NeXT.

Some figures about memory reliability. Prof McEliece (Caltech), in a
paper called "The Reliability of Computer Memories" (Jan 1985 -
Scientific American), estimated the soft failure rate of a single memory
cell at 1 every 1,000,000 years. In a 1MB board with parity - this is an
MTBF of 43 days. TI estimates MTBF more optimistically (no surprise).
For their 64K DRAMs they estimate an MTBF of 33.4 days for an 8MB
system. AMD estimated a 16MB system would have an MTBF of 13 days.

These error rates and MTBFs are for 64K DRAMs. Since 1Mbit DRAMs are
considered to have twice as many errors per device, but 16 times the
bits, a system of the same size needs 1/16 as many chips and sees 1/8
as many errors. Multiply the above times by a factor of 8 to get MTBF
estimates for 1Mbit chips. Thus, the optimistic TI estimate would lead
to an extrapolation of about 267 days (between 8 and 9 months) MTBF for
soft errors for an 8MB system using 1Mbit memory chips. Prof McEliece's
figures would extrapolate to 43 days for an 8MB system (43 days for 1MB
is about 5.4 days for 8MB of 64K DRAMs, times 8).

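For anyone who wants to check the arithmetic above, here is a rough
sketch in Python. The function names are made up for illustration, and
it assumes failures are independent, so that system MTBF is just the
per-cell MTBF divided by the number of cells. Note that the 43 day
figure comes out when counting 8 bits per byte; counting a ninth parity
bit per byte gives closer to 39 days.

```python
DAYS_PER_YEAR = 365.25


def mtbf_days(cell_mtbf_years, n_bits):
    """System MTBF in days, assuming independent cell failures."""
    return cell_mtbf_years * DAYS_PER_YEAR / n_bits


def scale_64k_to_1mbit(mtbf_64k_days, error_ratio=2, density_ratio=16):
    """Rescale an MTBF quoted for 64K DRAMs to 1Mbit DRAMs.

    Same system size means density_ratio times fewer chips, each with
    error_ratio times the errors, so MTBF grows by 16 / 2 = 8.
    """
    return mtbf_64k_days * density_ratio / error_ratio


ONE_MB_BITS = 8 * 2**20          # data bits only
ONE_MB_PARITY_BITS = 9 * 2**20   # data bits plus one parity bit per byte

print(f"McEliece, 1MB, no parity bits:   {mtbf_days(1e6, ONE_MB_BITS):.1f} days")
print(f"McEliece, 1MB, with parity bits: {mtbf_days(1e6, ONE_MB_PARITY_BITS):.1f} days")
print(f"TI, 8MB, 1Mbit chips:            {scale_64k_to_1mbit(33.4):.1f} days")
print(f"AMD, 16MB, 1Mbit chips:          {scale_64k_to_1mbit(13):.1f} days")
```
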
TI estimates hard errors to be roughly 1/5 to 1/3 as likely as soft
errors. Any 'reasonable' memory or computer manufacturer will use a 72
hour burn-in to ensure infant mortality problems are found before
shipment, but I think that the above figures are a compelling argument
for a system-level approach to handle errors in the field.

The simplest thing to do is parity checking. However, more and more
vendors are using VLSI to incorporate Error Detection and Correction
(EDC) circuitry on their memory boards. Standard EDC will detect 2-bit
errors (and some, though not all, larger ones) and correct 1-bit errors
in the word size it deals with. The number of check bits required is
log2 of the word size, plus 2. Thus, the following chart shows the
memory overhead required:

	Word Size	EDC Check Bits	8-bit Parity Bits
	---------	--------------	-----------------
	    8		      5		        1
	   16		      6		        2
	   32		      7		        4
	   64		      8		        8

As you can see, by the time you get to 64 bit memory - there really
isn't a reasonable excuse to not use EDC. (Of course, you could start
using 16 bit parity ... but the protection is significantly diluted)
Even 32 bit memories are seeing EDC used more and more often.

In conclusion - I think that NeXT is bucking the trend in moving to no
protection at all instead of moving to EDC protection for their memory.
If the NeXT machine takes off, I expect that there will be a demand for
0MB NeXT boxes which get populated with a 3rd party memory board - just
for the reliability concerns.

(-: Unless the claim is that the University Environment doesn't care
about reliable operation any more than they care about packaged
software. :-)

For anyone who is interested in designing, evaluating, or purchasing
computer memories, Clearpoint publishes a 70+ page "bible" entitled "The
Designer's Guide to Add-In Memory". This is chock full of good
information, and very light on the propaganda. It is available at no
charge by calling:
	1-800-CLEARPT

Apologies: I thought I had sent this quite a while back, and recently
found that I had not.
I apologize if this seems dated.
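
P.S. The check bit counts in the chart above can be reproduced with a
few lines of Python. This is only a sketch, and it assumes that
"standard EDC" means a Hamming code extended with one overall parity
bit (single error correct, double error detect); a given vendor's
circuitry may differ. The function names are mine.

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming (SEC-DED) code.

    A single-error-correcting Hamming code needs the smallest r with
    2**r >= data_bits + r + 1; one more bit adds double-error detection.
    """
    r = 1
    while 2**r < data_bits + r + 1:
        r += 1
    return r + 1


def byte_parity_bits(data_bits):
    """One parity bit per 8 data bits."""
    return data_bits // 8


for word in (8, 16, 32, 64):
    edc = secded_check_bits(word)
    par = byte_parity_bits(word)
    print(f"{word:2d}-bit word: EDC {edc} bits ({100 * edc / word:.1f}%), "
          f"parity {par} bits ({100 * par / word:.1f}%)")
```

At 64 bits both schemes cost 8 extra bits per word, which is the point
of the chart.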