Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!steinmetz!sunray!oconnor
From: oconnor@sunray.steinmetz (Dennis Oconnor)
Newsgroups: comp.unix.wizards,comp.arch
Subject: Re: Double-bit errors and ECC memory
Message-ID: <7319@steinmetz.steinmetz.UUCP>
Date: Fri, 11-Sep-87 10:48:28 EDT
Article-I.D.: steinmet.7319
Posted: Fri Sep 11 10:48:28 1987
Date-Received: Sat, 12-Sep-87 20:02:30 EDT
References: <1184@itm.UUCP> <797@spar.SPAR.SLB.COM> <2891@phri.UUCP>
Sender: root@steinmetz.steinmetz.UUCP
Reply-To: oconnor@sunray.UUCP (Dennis Oconnor)
Organization: General Electric CRD, Schenectady, NY
Lines: 43
Summary: 1+LOG2(data_width) for one error, NOT 1+2*number_of_errors
Xref: mnetor comp.unix.wizards:4203 comp.arch:2146

In article <2891@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>	Note that the typical-but-mythical memory board described above
>has 7 check bits per 32 bit data word.  Since you need 2N+1 check bits to
>correct an N-bit error, this board should be able to detect and correct as
>many as 3 bad bits in any 32-bit word.  Thus, you could, if you wanted, go
>so far as to pluck out any 3 RAM chips on the board without loosing any
>function (other than, maybe, access speed).
>-- 
>Roy Smith, {allegra,cmcl2,philabs}!phri!roy
>System Administrator, Public Health Research Institute
>455 First Avenue, New York, NY 10016

Sorry, this is incorrect. To perform just SINGLE bit error CORRECTION
you need 1+log2(word-width) ECC bits. That means you need
6 bits for a 32-bit word, 5 for a 16-bit halfword, and 4 for a byte.
Which is why you don't see ECC performed at the byte level, and DO
see it performed at the word level, even though this makes writing
a byte a pain in the neck ( to write a byte into an ECC'd word, you
must read out the word, substitute in the new byte, and recompute
the ECC for the new word; then you can write it back ). To perform
DOUBLE bit error CORRECTION, you need to DOUBLE the number of check
bits ( for randomly-occurring bit errors; block-error correcting
codes where all the errors are assumed to be adjacent are different,
these are applicable to serial media like disk drives, not to memories ).
Error DETECTION is another kettle of fish : for instance, a single
parity bit detects ALL situations where an odd number of errors has
occurred. 
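
To put numbers on the above, here is a minimal Python sketch. It is
an illustration, not anything from a real memory controller: the names
sec_check_bits and parity are made up for this example, and the count
uses the usual Hamming condition 2^r >= data_bits + r + 1 as an
assumption about which code is meant. It prints the check-bit overhead
for a byte, a halfword and a word, then shows a single parity bit
catching odd numbers of flipped bits and missing an even number.

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def parity(word):
    """Even-parity bit: 1 if the word contains an odd number of 1 bits."""
    return bin(word).count("1") % 2


# Overhead shrinks as the protected unit gets wider.
for width in (8, 16, 32):
    r = sec_check_bits(width)
    print(f"{width:2d} data bits: {r} check bits, {100 * r / width:.1f}% overhead")

# One parity bit detects 1 or 3 flipped bits, but not 2.
word = 0b10110010
stored = parity(word)
for flips in (0b1, 0b11, 0b111):
    detected = parity(word ^ flips) != stored
    print(f"{bin(flips).count('1')} bit(s) flipped: detected = {detected}")
```

The overhead figures are the reason byte-level ECC is unattractive:
four check bits per eight data bits costs far more memory than six
per thirty-two.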

A simple explanation ( intuitive, not necessarily a proof ) for why
you need 1+log2(word-width) bits of check code to correct a
single bit error is the following : You need to be able to locate
the error to correct it, and to locate a bit in a word of
length(word-width + check-bits) [remember, the error might be in
the check bits] you need log2(word-width + check-bits) bits of
information, rounded up to a whole number of bits. If
number_of_check_bits < number_of_data_bits and word-width is a
power of two, this is equivalent to 1+log2(word-width).
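
The "locate the bad bit" argument can be sketched with a textbook
Hamming code, where the check bits are chosen so that the XOR of the
positions of all 1 bits is zero; after a single flip, that XOR (the
syndrome) is the position of the flipped bit. This is an assumed
layout for illustration (positions numbered from 1, check bits at the
power-of-two positions), not a description of any particular memory
board, and the function names are hypothetical. The check-bit count
also allows for the "no error" case, hence the "+ 1" in the loop.

```python
def syndrome(code):
    """XOR of the positions of all 1 bits; 0 means no single-bit error."""
    s = 0
    for pos in range(1, len(code)):
        if code[pos]:
            s ^= pos
    return s


def hamming_encode(data, data_bits):
    """Return the codeword as a list of bits indexed 1..n (index 0 unused)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    n = data_bits + r
    code = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: a data position
            code[pos] = (data >> bit) & 1
            bit += 1
    s = syndrome(code)               # check bits are still zero here
    for i in range(r):
        code[1 << i] = (s >> i) & 1  # cancels the syndrome bit by bit
    return code


def hamming_correct(code):
    """Fix at most one flipped bit in place; return its position (0 if none)."""
    s = syndrome(code)
    if s:
        code[s] ^= 1
    return s


def hamming_decode(code):
    """Collect the data bits back out of the non-power-of-two positions."""
    data, bit = 0, 0
    for pos in range(1, len(code)):
        if pos & (pos - 1):
            data |= code[pos] << bit
            bit += 1
    return data


word = 0xDEADBEEF
code = hamming_encode(word, 32)
print("codeword length:", len(code) - 1)   # 32 data bits plus check bits

damaged = list(code)
damaged[17] ^= 1                           # flip one arbitrary bit
print("error located at position", hamming_correct(damaged))
print("data recovered:", hamming_decode(damaged) == word)
```

Note that the error may land in a check bit just as easily as in a
data bit; the syndrome points at it either way, which is why the
position count has to cover the whole stored word.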

I could be SLIGHTLY wrong about this stuff : it's been a while.

--
	Dennis O'Connor 	oconnor@sungoddess.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.arpa
        "If I have an "s" in my name, am I a PHIL-OSS-IF-FER?"