Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!uflorida!mailer.cc.fsu.edu!delta!mayne From: mayne@delta.cs.fsu.edu (William Mayne) Newsgroups: comp.theory Subject: Re: Hamming codes 101 Message-ID: <1991Mar29.210253.1256@mailer.cc.fsu.edu> Date: 30 Mar 91 02:02:53 GMT References: <26MAR91.00093317@uc780.umd.edu> <1991Mar26.100044.16609@mailer.cc.fsu.edu> <8380@skye.cs.ed.ac.uk> Reply-To: mayne@cs.fsu.edu Organization: Florida State University Dept. of Computer Science Lines: 123 In article <8380@skye.cs.ed.ac.uk> ddr@cs.edinburgh.ac.uk (Doug Rogers) writes: >The first posting on this was flawed in that it did not take acount of the check bits themselves being faulty. > >I've always thought of hamming codes in terms of the binary value of the bit > >Example for 16 bits of which > > Bit address Purpose > 0 0000 overall parity check for 2 bit error detection > 1 0001 check bit >[Rest of table omitted] For brevity and generality, the check bits are exactly those whose bit addresses are powers of 2, namely 1, 2, 4, 8... The property of interest for the purpose of understanding the check calculation is that the binary representation has just one 1. Numbering a bit 0 and using it for overall parity is separate from the Hamming code. It is a good idea since the Hamming code which starts numbering bits with 1 is most efficient when the word size is a Mersenne (sp?) number (2^n-1) and common word sizes are 2^n. In other words that extra bit would be useless in a pure Hamming code, 2^n bits can contain no more data bits than 2^n-1, so the extra bit in a word size of 2^n might as well be used as a parity bit as suggested. Otherwise it would go to waste. The reason is clear. If the Hamming code were applied over the whole word the last bit would be numbered 2^n and that would be another check bit. >By making the check sum for all bits with one of their address bits set at 1 be >even, then the offending error is found by anding the addresses of the check bits that fail on receipt. I find this confusing and probably wrong. I say "probably wrong" because I frankly don't understand it clearly enough to apply and test it. I suggest the following alternative description. The receiver's calculation of the check, which uses only the bits numbered 1 and higher is this: check:=0; for i=1 to n /* n=highest bit address */ if bit(i)=1 then check:=check XOR i /* bitwise exclusive or of binary */ if check not = 0 then /* check = address of error bit */ bit(check) = bit(check) XOR 1 /* correct bit(check) */ The key to the Hamming coding is that the check bits are calculated to make the check "sum" come out 0 if there are no errors. The sender does this by taking the XOR of the bit positions of all the data bits which are 1. This will be a number <= n. That number is then used to set the check bits. Say the check on the data bits only came out to be 6 (binary 0110). Then set check bits 4 and 2 (4+2=6). Now the check over all the bits (except the parity bit) will be 0. If any one of the data or check bits is changed the check sum will be the address of that bit. The added parity check bit is calculated by the sender after the check bits are set. Either even or odd parity can be used for this. The receiver performs the Hamming correction shown above before checking the parity, which is intended to detect two bit errors. (No pun intended, though I don't think this would be a pun in the U.K. anyway. I note that the previous post was from Scotland.) It is only possible to detect a two bit error, not to repair it. By the way, it is ironic to note that there is a 1 bit error which cannot be distinguished from a two bit error and therefore can't be repaired. That is if the parity bit, which played no part in the Hamming code, is flipped. The two bit error detection works because any two bit error not involving bit 0 will cause the calculated check to be non-zero, causing the Hamming check to change 1 bit (probably (certainly?) not one of the two which were wrong, it turns out). The parity would have been correct before, so the the Hamming correction will cause the resulting parity to be wrong. A two bit error involving bit 0 will be detected because the Hamming correction will correctly repair the other bad bit, but the parity bit itself will not be corrected by the Hamming correction. >For actual implementation the addresses may be shuffled to combine the check bits at one end of the message. This though illustrates that : > s':=lg2(s')+s+1 > >where s' is the total number of bits needed and s is the number of bits available for >data. >which is not soluable to give an easy equation for the number of check bits needed. This is actually easier than it seems. Of s' bits exactly log2(s') will be used for check bits, which is to say just the number of bits whose addresses are powers of 2. The rest are available for data, so s=s'-log2(s') or s'=s+log2(s'). This ignores the extra 1 for the parity bit which, as shown above, isn't actually part of the Hamming code scheme. I wouldn't say that it is difficult to find s' given s. The log2 calculation asks only for the largest integer in log2(s'). As a first approximation try substituting log2(s). Chances are when truncated log2(s) and log2(s') will be the same. Next calculate s'=s+log2(s). If it happens that log2(s') is not actually = log2(s) just increase s' by 1. If s'-log2(s') is bigger than the s you started with it just means that for the log2(s') check bits you have more data bits. If you don't need them subtract the extra bits from s' for the actual number of total bits you need to encode s data bits. (This might be clearer if I renamed s and s' at each step, but I find too names confusing rather than helpful.) Theory: The trick to correcting a block containing up to n error bits is to map the data bit string on to a larger bit string containing check bits in such a way that the Hamming distance between the encoding of any two bit strings is greater than n. The Hamming distance between any two bit strings of equal size is just the number of bit positions at which the two strings are different, i.e. the number of 1s in the bitwise XOR of the strings. Then if n or fewer bits are changed the result cannot be the same as the proper image of any other data bits. Correction is accomplished by determing which data would yield a coded string whose Hamming distance from the coded string actually received is n or less. This is guaranteed to be unique. Another way of looking at it is to find a way to convert the corrupted block to a valid one by changing n or fewer bits. The great elegance of the Hamming code described above for the case of n=1 is that it is the most efficient possible (i.e. it requires the fewest possible check bits for any given number of data bits) and it is easy to calculate. I don't know how to generalize it for n>1, or if any similarity elegant algorithms for large n are known. Bill Mayne (mayne@cs.fsu.edu)