Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!uflorida!mailer.cc.fsu.edu!delta!mayne
From: mayne@delta.cs.fsu.edu (William Mayne)
Newsgroups: comp.theory
Subject: Re: Hamming codes 101
Message-ID: <1991Mar29.210253.1256@mailer.cc.fsu.edu>
Date: 30 Mar 91 02:02:53 GMT
References: <26MAR91.00093317@uc780.umd.edu> <1991Mar26.100044.16609@mailer.cc.fsu.edu> <8380@skye.cs.ed.ac.uk>
Reply-To: mayne@cs.fsu.edu
Organization: Florida State University Dept. of Computer Science
Lines: 123

In article <8380@skye.cs.ed.ac.uk> ddr@cs.edinburgh.ac.uk (Doug Rogers) writes:
>The first posting on this was flawed in that it did not take acount of the check bits themselves being faulty.
>
>I've always thought of hamming codes in terms of the binary value of the bit
>
>Example for 16 bits of which 
>
>        Bit address	Purpose
>	0  0000		overall parity check for 2 bit error detection		
>	1  0001		check bit
>[Rest of table omitted]

For brevity and generality, the check bits are exactly those whose bit
addresses are powers of 2, namely 1, 2, 4, 8...
The property of interest for the purpose of understanding the check
calculation is that the binary representation has just one 1.

Numbering a bit 0 and using it for overall parity is separate from
the Hamming code. It is a good idea since the Hamming code which
starts numbering bits with 1 is most efficient when the word size
is a Mersenne (sp?) number (2^n-1) and common word sizes are 2^n.
In other words that extra bit would be useless in a pure Hamming
code, 2^n bits can contain no more data bits than 2^n-1, so the 
extra bit in a word size of 2^n might as well be used as a parity
bit as suggested. Otherwise it would go to waste. The reason is
clear. If the Hamming code were applied over the whole word the
last bit would be numbered 2^n and that would be another check bit.
 
>By making the check sum for all bits with one of their address bits set at 1 be 
>even, then the offending error is found by anding the addresses of the check bits that fail on receipt. 

I find this confusing and probably wrong. I say "probably wrong" because I
frankly don't understand it clearly enough to apply and test it. I suggest
the following alternative description.

The receiver's calculation of the check, which uses only the bits
numbered 1 and higher is this:

check:=0;
for i=1 to n /* n=highest bit address */
  if bit(i)=1 then check:=check XOR i /* bitwise exclusive or of binary */
if check not = 0 then           /* check = address of error bit */
  bit(check) = bit(check) XOR 1 /* correct bit(check) */

The key to the Hamming coding is that the check bits are calculated
to make the check "sum" come out 0 if there are no errors. The sender
does this by taking the XOR of the bit positions of all the data bits
which are 1. This will be a number <= n. That number is then used to
set the check bits. Say the check on the data bits only came out to
be 6 (binary 0110). Then set check bits 4 and 2 (4+2=6). Now the
check over all the bits (except the parity bit) will be 0. If
any one of the data or check bits is changed the check sum will be
the address of that bit.

The added parity check bit is calculated by the sender after the
check bits are set. Either even or odd parity can be used for this.

The receiver performs the Hamming correction shown above before
checking the parity, which is intended to detect two bit errors.
(No pun intended, though I don't think this would be a pun in
the U.K. anyway. I note that the previous post was from Scotland.)

It is only possible to detect a two bit error, not to repair it.
By the way, it is ironic to note that there is a 1 bit error which
cannot be distinguished from a two bit error and therefore can't
be repaired. That is if the parity bit, which played no part in the
Hamming code, is flipped.

The two bit error detection works because any two bit error not
involving bit 0 will cause the calculated check to be non-zero,
causing the Hamming check to change 1 bit (probably (certainly?)
not one of the two which were wrong, it turns out). The parity would
have been correct before, so the the Hamming correction will cause the
resulting parity to be wrong. A two bit error involving bit 0 will be
detected because the Hamming correction will correctly repair the other
bad bit, but the parity bit itself will not be corrected by the Hamming
correction.

>For actual implementation the addresses may be shuffled to combine the check bits at one end of the message. This though illustrates that :
>   s':=lg2(s')+s+1
>
>where s' is the total number of bits needed and s is the number of bits available for
>data.
>which is not soluable to give an easy equation for the number of check bits needed.

This is actually easier than it seems. Of s' bits exactly log2(s') will be
used for check bits, which is to say just the number of bits whose addresses
are powers of 2. The rest are available for data, so s=s'-log2(s') or
s'=s+log2(s'). This ignores the extra 1 for the parity bit which, as
shown above, isn't actually part of the Hamming code scheme.

I wouldn't say that it is difficult to find s' given s. The log2 calculation
asks only for the largest integer in log2(s'). As a first approximation try
substituting log2(s). Chances are when truncated log2(s) and log2(s') will
be the same. Next calculate s'=s+log2(s). If it happens that log2(s') is not
actually = log2(s) just increase s' by 1.

If s'-log2(s') is bigger than the s you started with it just means that for
the log2(s') check bits you have more data bits. If you don't need
them subtract the extra bits from s' for the actual number of total bits
you need to encode s data bits. (This might be clearer if I renamed s and
s' at each step, but I find too names confusing rather than helpful.)

Theory: The trick to correcting a block containing up to n error bits is
to map the data bit string on to a larger bit string containing check
bits in such a way that the Hamming distance between the encoding of
any two bit strings is greater than n. The Hamming distance between any
two bit strings of equal size is just the number of bit positions at
which the two strings are different, i.e. the number of 1s in the
bitwise XOR of the strings. Then if n or fewer bits are changed the
result cannot be the same as the proper image of any other data bits.
Correction is accomplished by determing which data would yield a coded
string whose Hamming distance from the coded string actually received
is n or less. This is guaranteed to be unique. Another way of looking
at it is to find a way to convert the corrupted block to a valid one
by changing n or fewer bits. The great elegance of the Hamming code
described above for the case of n=1 is that it is the most efficient
possible (i.e. it requires the fewest possible check bits for any given
number of data bits) and it is easy to calculate. I don't know how
to generalize it for n>1, or if any similarity elegant algorithms
for large n are known.

Bill Mayne (mayne@cs.fsu.edu)