Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!overload!dillon
From: dillon@overload.UUCP (Matthew Dillon)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Parity Checking / ECC RAM on the A3000
Message-ID:
Date: 1 Jun 90 02:30:41 GMT
References: <1655@lpami.wimsey.bc.ca>
Lines: 83

>The fact that a properly designed ECC scheme can correct errors in the ECC bits
>themselves makes it far more desirable for reliability and recoverability,
>though at a greater cost.
>
>Parity schemes, on the other hand, cannot detect the failure of a parity bit
>itself, and thus reduces the overall reliability as a tradeoff for knowing when

    A parity scheme will detect all one bit errors, even if the bit in
    error is the parity bit itself.  The parity scheme does not know
    *which* bit erred, or whether it was the parity bit itself that
    erred, but it will detect any single bit error.

    A reasonable ECC scheme (7 bits to correct 32 bits, as I mentioned in
    my previous posting) will detect and correct all 1 bit errors where
    that 1 bit is any one of the 32 bits.  It will detect any single bit
    error in the ECC code itself, in which case the real data is assumed
    to be valid and no other action is taken.  I believe the scheme will
    also detect any two bit errors (through all 39 bits).

    One should never think of an ECC scheme in terms of whether the
    erroneous bits are in the ECC part or the real-data part.  Or, at
    least, I never think of it that way.  You tend to produce weak
    algorithms when you consider cases that depend on the meaning of bits
    rather than work on a general algorithm that can do a better job all
    around.

    An interesting extension to ECC, for anybody interested, is to
    consider the general-expansion case... to correct N bits of error in
    the data portion of the code (32 bits), and to detect and ignore one
    and two bit errors in the ECC itself.
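
The two single-level claims above (parity catches any one-bit error, parity bit included; 7 extra bits can correct any one-bit error and flag any two-bit error across all 39 bits) can be checked with a small sketch. This is only an illustration under my own assumptions: the post does not name a particular code, so an extended Hamming code is used here, and the function names and bit layout (position 0 holds an overall parity bit, positions 1, 2, 4, 8, 16, 32 hold check bits) are hypothetical.

```python
# Sketch only: an extended Hamming code is ASSUMED; the post names no code.
# All names and the bit layout below are illustrative, not from the post.

DATA_BITS = 32
CODE_BITS = 39                       # 32 data + 6 Hamming check + 1 overall parity
CHECK_POS = (1, 2, 4, 8, 16, 32)     # power-of-two positions hold check bits
DATA_POS = [p for p in range(1, CODE_BITS) if p & (p - 1)]  # the other 32


def parity(word):
    """Even-parity bit of an integer: 1 if it has an odd number of 1 bits."""
    return bin(word).count("1") & 1


def parity_detects_single_flips(word):
    """Flip each of the 33 stored bits (data + parity) and check detection."""
    stored = word | (parity(word) << DATA_BITS)
    return all(parity(stored ^ (1 << bit)) == 1
               for bit in range(DATA_BITS + 1))


def hamming_encode(data):
    """Return a 39-entry list of bits protecting a 32-bit value."""
    code = [0] * CODE_BITS
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    for c in CHECK_POS:
        # Check bit c covers every position whose index has bit c set.
        code[c] = sum(code[p] for p in range(1, CODE_BITS)
                      if p & c and p != c) & 1
    code[0] = sum(code[1:]) & 1      # overall parity over the other 38 bits
    return code


def hamming_decode(code):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, CODE_BITS):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions is 0 when valid
    odd = sum(code) & 1              # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome < CODE_BITS:
        # Treated as a single error; the same step handles data bits,
        # check bits, and (syndrome 0) the overall parity bit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        status = "uncorrectable"     # e.g. any two-bit error
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status
```

Note that the decoder never asks whether the bad bit is a data bit or a check bit; the syndrome simply names a position and that position is flipped back.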

    32 bits + 7 bits ECC
            corrects any single bit error in the 32 bits
            (7 = lg(32+1) + 1, rounding lg up)

    \__________________/ + 7 bits ECC
            corrects any single bit error in the 39 bits, which means
            this corrects any two-bit errors that occur in the first
            39 bits, since it will correct one and the inner 7 bit ECC
            will correct the other.
            (7 = lg(39+1) + 1, rounding lg up)

    And so on.  The number of bits of ECC required for each level goes up
    according to the log of the number of bits requiring correction.  To
    correct, you start at the outermost level and move inward.  Also,
    there is another term which I have not described which needs to be
    added to detect multi-bit errors in the outer ECC codes to keep the
    algorithm a general N bit detect and correct.  It can get messy.

>you had an error, even if that error is meaningless and would not have happened
>without the parity bit being present. Statistically speaking, if parity is

    Thinking of things that way will wind you into a corner fast!

>have a lot of choice. With an ECC scheme, the system can make note of the error
>and keep using the memory, allowing it to map the page out when the number of
>errors exceeds a threshold over a predefined period of time. It will also
>allow reporting of single bit errors to the operator, who can make a good
>judgement as to the root cause, and take action as appropriate.

    This is one good use of ECC... to detect failing memory.

>In Very Important Applications, I would go for ECC. In other situations, I
>would go for no checking at all. Parity is useless.

    If the machine must stay up for months at a time, ECC does get to be
    important.

>| // Larry Phillips                                                     |
>| \X/ lphillips@lpami.wimsey.bc.ca -or- uunet!van-bc!lpami!lphillips    |
>| COMPUSERVE: 76703,4322 -or- 76703.4322@compuserve.com                 |
>+-----------------------------------------------------------------------+

                                        -Matt

--
    Matthew Dillon              uunet.uu.net!overload!dillon
    891 Regal Rd.
    Berkeley, Ca. 94708
    USA