Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!steinmetz!sunray!oconnor
From: oconnor@sunray.steinmetz (Dennis Oconnor)
Newsgroups: comp.unix.wizards,comp.arch
Subject: Re: Double-bit errors and ECC memory
Message-ID: <7319@steinmetz.steinmetz.UUCP>
Date: Fri, 11-Sep-87 10:48:28 EDT
Article-I.D.: steinmet.7319
Posted: Fri Sep 11 10:48:28 1987
Date-Received: Sat, 12-Sep-87 20:02:30 EDT
References: <1184@itm.UUCP> <797@spar.SPAR.SLB.COM> <2891@phri.UUCP>
Sender: root@steinmetz.steinmetz.UUCP
Reply-To: oconnor@sunray.UUCP (Dennis Oconnor)
Organization: General Electric CRD, Schenectady, NY
Lines: 43
Summary: 1+LOG2(data_width) for one error, NOT 1+2*number_of_errors
Xref: mnetor comp.unix.wizards:4203 comp.arch:2146

In article <2891@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>	Note that the typical-but-mythical memory board described above
>has 7 check bits per 32 bit data word.  Since you need 2N+1 check bits to
>correct an N-bit error, this board should be able to detect and correct as
>many as 3 bad bits in any 32-bit word.  Thus, you could, if you wanted, go
>so far as to pluck out any 3 RAM chips on the board without loosing any
>function (other than, maybe, access speed).
>--
>Roy Smith, {allegra,cmcl2,philabs}!phri!roy
>System Administrator, Public Health Research Institute
>455 First Avenue, New York, NY  10016

Sorry, this is incorrect. To perform just SINGLE-bit error CORRECTION you need 1+log2(word-width) check bits. That means 6 bits for a 32-bit word, 5 for a 16-bit halfword, and 4 for a byte. (A 7th check bit, as on the board described above, is what typical ECC memory adds so that it can also DETECT, but not correct, double-bit errors; this is usually called SEC-DED.)

Per data bit, a byte needs far more check bits than a word does (4 per 8 is 50% overhead, versus 6 per 32, under 20%). That is why you don't see ECC performed at the byte level, and DO see it performed at the word level, even though this makes writing a byte a pain in the neck (to write a byte into an ECC'd word, you must read out the word, substitute in the new byte, and recompute the ECC for the new word; then you can write it back).
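The check-bit counts above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the classic Hamming bound for single-error correction (the smallest r with 2^r >= data_bits + r + 1) plus one extra bit for SEC-DED; the function names are hypothetical, not from any library.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (Hamming bound for SEC).

    The r check bits form a syndrome that must name any one of the
    data_bits + r bit positions, plus one extra value meaning "no error".
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """One bit on top of SEC adds double-error detection (SEC-DED)."""
    return sec_check_bits(data_bits) + 1


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        r = sec_check_bits(width)
        print(f"{width:2d} data bits: SEC needs {r}, SEC-DED needs {r + 1}, "
              f"SEC overhead {100 * r / width:.1f}%")
```

For 8, 16, and 32 data bits it gives 4, 5, and 6 SEC check bits, and 7 for SEC-DED on a 32-bit word, which matches the board in the quoted article.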
To perform DOUBLE-bit error CORRECTION, you need roughly DOUBLE the number of check bits (this is for randomly occurring bit errors; burst-error-correcting codes, where all the errors are assumed to be adjacent, are different, and they apply to serial media like disk drives, not to memories).

Error DETECTION is another kettle of fish: for instance, a single parity bit detects ALL situations where an odd number of errors has occurred.

A simple explanation (intuitive, not necessarily a proof) for why you need 1+log2(word-width) check bits to correct a single-bit error is the following: you need to be able to locate the error to correct it. The error could be in any of the word-width + check-bits positions [remember, the error might be in the check bits], and the check bits must also be able to say "no error at all", so they have to distinguish word-width + check-bits + 1 cases; that takes at least log2(word-width + check-bits + 1) bits of information. If number_of_check_bits < number_of_data_bits, that count is at most 2*word-width, so 1+log2(word-width) check bits are enough. For power-of-two widths like the ones above, log2(word-width) bits are too few, so 1+log2(word-width) is exactly the minimum.

I could be SLIGHTLY wrong about this stuff: it's been a while.
--
Dennis O'Connor		oconnor@sungoddess.steinmetz.UUCP ??
ARPA: OCONNORDM@ge-crd.arpa
"If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"
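The "locate the error" argument can be made concrete with a textbook Hamming code. The Python sketch below is illustrative only (the layout and every function name are assumptions, not taken from the post): check bits sit at the power-of-two positions of a codeword numbered from 1, and the syndrome, the XOR of the positions of all 1 bits, is 0 for a good word and otherwise names the single flipped bit, whether it is a data bit or a check bit. A plain parity bit is included for comparison.

```python
from functools import reduce
from operator import xor


def parity(bits: list[int]) -> int:
    """Single parity bit: XOR of all bits. It changes whenever an odd number of bits flip."""
    return reduce(xor, bits, 0)


def hamming_encode(data: list[int]) -> list[int]:
    """Hamming SEC encoder (sketch). Codeword positions run from 1 to n; index 0
    of the returned list is unused. Check bits occupy positions 1, 2, 4, 8, ...
    and data bits fill the remaining positions in order."""
    r = 0
    while (1 << r) < len(data) + r + 1:  # same bound as 2**r >= m + r + 1
        r += 1
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so this is a data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        # Check bit p makes the XOR over every position with bit i set equal 0.
        code[p] = reduce(xor, (code[q] for q in range(1, n + 1) if q & p and q != p), 0)
    return code


def syndrome(code: list[int]) -> int:
    """XOR of the positions holding a 1: 0 for a valid codeword, otherwise the
    position of a single flipped bit (data or check)."""
    return reduce(xor, (pos for pos in range(1, len(code)) if code[pos]), 0)


def correct(code: list[int]) -> list[int]:
    """Flip the bit the syndrome points at. Only single-bit errors are handled."""
    s = syndrome(code)
    if s >= len(code):
        raise ValueError("syndrome points outside the word: not a single-bit error")
    fixed = code[:]
    if s:
        fixed[s] ^= 1
    return fixed


def decode(code: list[int]) -> list[int]:
    """Extract the data bits (the non-power-of-two positions)."""
    return [code[pos] for pos in range(1, len(code)) if pos & (pos - 1)]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0] * 4  # 32 data bits
    word = hamming_encode(data)           # 6 check bits, positions 1..38
    assert syndrome(word) == 0
    for bad in range(1, len(word)):       # try every single-bit error
        hit = word[:]
        hit[bad] ^= 1
        assert syndrome(hit) == bad       # the syndrome locates the bad bit
        assert decode(correct(hit)) == data
    # Parity catches an odd number of flips and misses an even number.
    flipped = data[:]
    flipped[0] ^= 1
    print("1 flip detected:", parity(flipped) != parity(data))
    flipped[1] ^= 1
    print("2 flips detected:", parity(flipped) != parity(data))
    print("all single-bit errors corrected")
```

Because the syndrome is just a bit position, r check bits can point at no more than 2^r - 1 positions, which is where the 2^r >= data_bits + r + 1 requirement comes from.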