Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!overload!dillon
From: dillon@overload.UUCP (Matthew Dillon)
Newsgroups: comp.sys.amiga.tech
Subject: Re:  Parity Checking / ECC RAM on the A3000
Message-ID: <dillon.4072@overload.UUCP>
Date: 1 Jun 90 02:30:41 GMT
References: <1655@lpami.wimsey.bc.ca>
Lines: 83

>The fact that a properly designed ECC scheme can correct errors in the ECC bits
>themselves makes it far more desirable for reliability and recoverability,
>though at a greater cost.
>
>Parity schemes, on the other hand, cannot detect the failure of a parity bit
>itself, and thus reduces the overall reliability as a tradeoff for knowing when

    A parity scheme will detect every single-bit error, even when the bit
    in error is the parity bit itself.  The parity scheme does not know
    *which* bit erred, or whether it was the parity bit that erred, but it
    will detect any single-bit error.
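
    A minimal sketch of that point, assuming even parity over a 32-bit
    word with the parity bit stored as bit 32 (the layout and the function
    names are my own illustration, not any particular hardware): flipping
    any one of the 33 bits, parity bit included, makes the check fail.

```python
def parity_of(value: int) -> int:
    """Return 1 if value has an odd number of 1 bits, else 0."""
    return bin(value).count("1") & 1


def parity_encode(word: int) -> int:
    """Build a 33-bit codeword: data in bits 0..31, even-parity bit in bit 32."""
    word &= 0xFFFFFFFF
    return word | (parity_of(word) << 32)


def parity_ok(codeword: int) -> bool:
    """True when the 33-bit codeword still has an even number of 1 bits."""
    return parity_of(codeword & 0x1FFFFFFFF) == 0


def all_single_bit_errors_detected(word: int) -> bool:
    """Flip each of the 33 bits in turn and confirm the check fails every time."""
    codeword = parity_encode(word)
    return all(not parity_ok(codeword ^ (1 << bit)) for bit in range(33))
```

    Note that a two-bit flip leaves the parity even again, so parity alone
    cannot see it.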

    A reasonable ECC scheme (7 bits to correct 32 bits, as I mentioned in my
    previous posting) will detect and correct every 1-bit error in which
    that bit is one of the 32 data bits.  It will also detect any single-bit
    error in the ECC code itself, in which case the real data is assumed to
    be valid and no other action is taken.  I believe the scheme will also
    detect any two-bit error (across all 39 bits).
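
    One way to get exactly this behaviour is an extended Hamming code
    (single-error-correct, double-error-detect).  The sketch below is an
    assumption about what such a 32+7 scheme could look like, not the
    scheme from my previous posting: six Hamming check bits sit at
    positions 1, 2, 4, 8, 16, 32 of a 38-bit word, and a seventh overall
    parity bit sits at position 0.  The bit layout and all names are
    made up for the example.

```python
CHECK_POSITIONS = (1, 2, 4, 8, 16, 32)
# The 32 remaining positions in 1..38 carry the data bits.
DATA_POSITIONS = [p for p in range(1, 39) if p not in CHECK_POSITIONS]


def secded_encode(data: int) -> list:
    """Encode a 32-bit word as a list of 39 bits (index 0 = overall parity)."""
    bits = [0] * 39
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    for p in CHECK_POSITIONS:
        # Check bit p makes the parity of every position containing p even.
        bits[p] = sum(bits[pos] for pos in range(1, 39) if pos & p) & 1
    bits[0] = sum(bits) & 1  # overall parity over the other 38 bits
    return bits


def secded_decode(bits: list):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, 39):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd = sum(bits) & 1
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome < 39:
        # Odd overall parity: treat as one flipped bit at index `syndrome`
        # (index 0 is the overall parity bit itself).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome means two bits flipped.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= bits[pos] << i
    return data, status
```

    The decoder never asks whether the bad bit is a data bit or a check
    bit; it just flips whatever position the syndrome names.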

    One should never think of an ECC scheme in terms of whether the erroneous
    bits are in the ECC part or the real-data part.  Or, at least, I never
    think of it that way.  You tend to produce weak algorithms when you
    consider cases that depend on the meaning of bits rather than work on
    a general algorithm that can do a better job all around.

    An interesting extension to ECC for anybody interested is to consider
    the general-expansion case... to correct N bits of error in the data
    portion of the code (32 bits), and to detect and ignore one and two bit
    errors in the ECC itself.

	32 bits + 7 bits ECC		    corrects any single bit error in
					    the 32 bits
					    (7 = ceil(lg(32+1)) + 1)

	\__________________/ + 7 bits ECC   corrects any single bit error in
					    the 39 bits, which means this
					    corrects any two-bit errors that
					    occur in the first 39 bits, since
					    it will correct one and the inner
					    7 bit ECC will correct the other.

					    (7 = ceil(lg(39+1)) + 1)


    And so on.  The number of bits of ECC required for each level goes up
    according to the log of the number of bits that level must protect.  To
    correct, you start at the outermost level and move inward.  Also, there
    is another term which I have not described which needs to be added to
    detect multi-bit errors in the outer ECC codes to keep the algorithm
    a general N bit detect and correct.  It can get messy.
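
    The per-level check-bit counts in the table above can be reproduced
    with a short sketch.  It assumes each level is a Hamming code plus one
    overall parity bit, so the count is the smallest r with
    2^r >= n + r + 1, plus one; that is a swapped-in textbook condition
    rather than the lg() shorthand above, though the two agree for n = 32
    and n = 39.  The function names are made up for the example.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed to correct 1 and detect 2 errors over data_bits."""
    r = 1
    # Hamming condition: r check bits must name every position plus "no error".
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit for double-error detection


def nested_levels(data_bits: int, levels: int) -> list:
    """Return (bits protected, check bits added) for each nesting level."""
    result = []
    protected = data_bits
    for _ in range(levels):
        r = secded_check_bits(protected)
        result.append((protected, r))
        protected += r  # the next level must cover this level's check bits too
    return result
```

    Under that assumption each of the first few levels costs 7 bits, and
    the cost only steps up to 8 once a level has to protect 58 bits or
    more.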

>you had an error, even if that error is meaningless and would not have happened
>without the parity bit being present.	Statistically speaking, if parity is

    Thinking of things that way will wind you into a corner fast!

>have a lot of choice. With an ECC scheme, the system can make note of the error
>and keep using the memory, allowing it to map the page out when the number of
>errors exceeds a threshold over a predefined period of time. It will also
>allow reporting of single bit errors to the operator, who can make a good
>judgement as to the root cause, and take action as appropriate.

    This is one good use of ECC... to detect failing memory.
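
    A rough sketch of the bookkeeping the quoted text describes, assuming
    the system gets a callback for each corrected error with a page number
    and a timestamp.  The class name, the default threshold and window,
    and the idea of a "mapped out" set are all hypothetical; a real OS
    would hook this into its memory manager.

```python
from collections import defaultdict, deque


class PageErrorTracker:
    """Count corrected ECC errors per page over a sliding time window."""

    def __init__(self, threshold: int = 3, window: float = 3600.0):
        self.threshold = threshold      # errors tolerated within the window
        self.window = window            # window length in seconds
        self.events = defaultdict(deque)
        self.mapped_out = set()

    def record(self, page: int, now: float) -> bool:
        """Log one corrected error; return True if the page is mapped out."""
        times = self.events[page]
        times.append(now)
        # Forget errors that have aged out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) > self.threshold:
            self.mapped_out.add(page)
        return page in self.mapped_out
```

    An isolated error every few days never trips the threshold, while a
    burst on one page does.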

>In Very Important Applications, I would go for ECC.  In other situations, I
>would go for no checking at all. Parity is useless.

    If the machine must stay up for months at a time, ECC does get to be
    important.

>|   //   Larry Phillips						 |
>| \X/	  lphillips@lpami.wimsey.bc.ca -or- uunet!van-bc!lpami!lphillips |
>|	  COMPUSERVE: 76703,4322  -or-	76703.4322@compuserve.com	 |
>+-----------------------------------------------------------------------+

				-Matt

--


    Matthew Dillon	    uunet.uu.net!overload!dillon
    891 Regal Rd.
    Berkeley, Ca. 94708
    USA