Path: utzoo!yunexus!maccs!gordan
From: gordan@maccs.UUCP (gordan)
Newsgroups: comp.protocols.tcp-ip
Subject: Checksums (was Re: Ping, checksum algorithm?)
Summary: Detailed description of TCP/IP checksums
Keywords: TCP IP UDP one's complement checksum
Message-ID: <1080@maccs.UUCP>
Date: 23 Mar 88 01:03:47 GMT
Article-I.D.: maccs.1080
Posted: Tue Mar 22 20:03:47 1988
References: <123@heart-of-gold>
Reply-To: gordan@maccs.UUCP ()
Organization: Worldwide Phlogiston Cartel
Lines: 250

In article <123@heart-of-gold> jc@heart-of-gold (John M Chambers x7780 1E342) writes:
-Does anyone have a PD version of ping?  How about a C-coded routine that
-does the IP checksum calculation?  We have one that is written in VAX
-assembly language, which is OK for a VAX, but it doesn't work too well
-on a 68020....

--------------------------------------------------------------------------
-- One's complement of the one's complement sum checksum in TCP/IP --

In the RFC documents, the "one's complement of the one's complement sum"
checksums are mentioned in a single paragraph, and are never even
described, an omission that seems incredible.  Although the algorithm
must be well known to many people, a written description seems to be
lacking.

So here's an attempt to provide a short description (with examples).
The only authority I can cite for the following is myself (but it seems
to work, judging from actual TCP/IP packets).  Someone kindly let me
know if this is horribly wrong.

Basically, IP, TCP, and UDP require doing one's complement sums of
16-bit words.  That is, you must take a bunch of 16-bit words and sum
them (ignoring overflows) _as if their bit patterns represented one's
complement numbers_.

The trick, then, is doing one's complement arithmetic on a two's
complement machine.
Without going into any arithmetical justification, here's how to do a
one's complement sum on a two's complement machine, in pseudocode:

    INT16 sum;
    INT16 *word;   /* pointer to start of 16-bit words to be summed */

    sum = 0;
    for (i = 0; i < `number of 16-bit words to be summed'; i++) {
        `byte-swap word[i], if necessary (see comments on byte-order)'
        sum += word[i];    /* do NOT combine these two lines ...   */
        sum += `CARRY';    /* ... into sum += word[i] + CARRY !!!! */
    }

where CARRY is the value of the hardware carry bit (0 or 1), as set by
the addition in the previous line (note you mustn't do
sum += word[i] + CARRY as one line, since a high-level language could
rearrange the order of addition and add the value of the carry bit
before it was set).

Of course, the value of the carry bit is not accessible from a
higher-level language like C.  A perfectly equivalent method (very
suitable if your machine has 32-bit integers) is:

    INT32 sum32, word32;
    INT16 *word;   /* pointer to start of 16-bit words to be summed */

    sum32 = 0;
    for (i = 0; i < `number of 16-bit words to be summed'; i++) {
        `byte-swap word[i], if necessary (see comments on byte-order)'
        `copy word[i] to word32, zero-extended (NOT sign-extended)'
            /* (e.g., 0xedcb -> 0x0000edcb, not 0xffffedcb) */
        sum32 += word32;
    }
    sum = `add the two 16-bit halves of sum32 to each other, and if
           that addition itself carries, add the carry back in too'

This works, since the carry bit values for 16-bit addition of the least
significant 16-bit word accumulate in the most significant 16-bit word
of the 32-bit sum.  Adding the upper half back into the lower half
applies all those deferred carries at once; that last addition can
carry in turn (e.g., 0x0001 ffff gives 0xffff + 0x0001 = 0x1 0000), so
the resulting carry has to be added back in as well.  (This is probably
what you would use on a 68020 -- and you can forget about byte-swapping
on a 68020 as well).

After calculating a one's complement sum, you have to take its one's
complement (invert all the bits) to get the actual checksum used in IP,
TCP, and UDP (but note that UDP treats a calculated checksum of 0x0000
as a special case -- see the RFC).

It is of course necessary to take byte-order into account.
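For anyone who wants something that compiles, here is a minimal C
sketch of the 32-bit method above.  It is one way to do it, not the
only one.  The name inet_checksum, the <stdint.h> types, and the choice
to walk the buffer a byte at a time are illustrative assumptions;
building each word from two bytes, high byte first, sidesteps the
byte-swapping question at some cost in speed.  The sample header bytes
are taken from the UDP/IP packet shown later in this article.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement of the one's complement sum of len bytes, read as
 * big-endian 16-bit words.  Hypothetical helper, not from any RFC.
 * Assumes len is small enough (any IP packet is) that the 32-bit
 * accumulator cannot overflow.
 */
uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum32 = 0;
    size_t i;

    /* Assemble each word high byte first, so the result is the same
       on big-endian and little-endian hosts. */
    for (i = 0; i + 1 < len; i += 2)
        sum32 += ((uint32_t)buf[i] << 8) | buf[i + 1];

    /* An odd trailing byte is zero-filled on the right. */
    if (len & 1)
        sum32 += (uint32_t)buf[len - 1] << 8;

    /* Add the upper half into the lower half; repeat, because that
       addition can itself carry. */
    while (sum32 >> 16)
        sum32 = (sum32 & 0xffff) + (sum32 >> 16);

    return (uint16_t)(~sum32 & 0xffff);
}

/* IP header of the UDP packet below, checksum field set to zero. */
const uint8_t ip_hdr_zeroed[20] = {
    0x45, 0x00, 0x00, 0x24, 0x00, 0x01, 0x00, 0x00, 0xff, 0x11,
    0x00, 0x00, 0x01, 0x00, 0x58, 0x97, 0x01, 0x00, 0x00, 0x00
};

/* The same header as it appeared on the wire (checksum 0x6131). */
const uint8_t ip_hdr_wire[20] = {
    0x45, 0x00, 0x00, 0x24, 0x00, 0x01, 0x00, 0x00, 0xff, 0x11,
    0x61, 0x31, 0x01, 0x00, 0x58, 0x97, 0x01, 0x00, 0x00, 0x00
};
```

Calling inet_checksum(ip_hdr_zeroed, 20) gives 0x6131, the value in the
packet.  Calling it on ip_hdr_wire gives 0, since the one's complement
sum over a header with a correct checksum in place is 0xffff.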
(Byte-order: if adjacent memory locations on a machine contain the
following bytes:

    X   : 0x12
    X+1 : 0x34

then what is the value of the 16-bit word whose address is X?
(assuming a byte-addressable machine and valid alignment for X to be
read as a 16-bit value).  If the 16-bit value is 0x1234, the machine is
said to be ``big-endian;'' if it is 0x3412, the machine is
``little-endian.''  Some machine architectures (Motorola 680x0, etc.)
are big-endian, others (Intel 80x86, VAX) are little-endian.  TCP/IP
headers use big-endian byte-order.  Thus, life is easier on a Sun than
on a VAX.)

Some examples follow, using actual packets (see the appropriate RFC
docs for IP, UDP, and TCP, and ignore the Ethernet stuff).  In case
anyone's curious, the IP addresses here are used on a LAN unconnected
to any outside network (they do not respect the class A/B/C Internet
addressing scheme).

An Ethernet UDP/IP packet
---------------------------------------------------
 1:  ff-ff-ff-ff-ff-ff 02-60-8c-09-58-97 08-00
 2:  45 00
 3:  00-24 00-01 00-00 ff 11 61-31 01-00-58-97 01-00-
 4:  -00-00
 5:  09-46 00-2a 00-10 c9-ca
 6:  01 06 4a 48 45 56
 7:  41 58
 8:  00 00 00 00 00 00 00 00 00 00
---------------------------------------------------
Line 1:        Ethernet header
Lines 2-8:     Ethernet data
  Lines 2-4:     IP header
  Lines 5-7:     IP data
    Line 5:        UDP header
    Lines 6-7:     UDP data
  Line 8:        Garbage padding to satisfy Ethernet minimum packet
                 size (Ethernet header + data >= 60 bytes).
---------------------------------------------------

An Ethernet TCP/IP packet
----------------------------------------------------
 1:  08-00-2b-02-d2-67 08-00-02-00-51-23 08-00
 2:  45 00
 3:  00-4b 44-46 00-00 1e 06 56-3a 01-00-00-0b 01-00-
 4:  -00-23
 5:  00-17 07-a8 06-14-56-f0 d3-1d-aa-a4 50 18
 6:  00-68 b1-d0 00-00
 7:  0d 0a 0d 0a 4d 63 4d 61 73 74
 8:  65 72 20 55 6e 69 76 65 72 73 69 74 79 20 56 41
 9:  58 20 38 36 30 30 0d 0a 0d
10:  00
---------------------------------------------------
Line 1:        Ethernet header
Lines 2-10:    Ethernet data
  Lines 2-4:     IP header
  Lines 5-9:     IP data
    Lines 5-6:     TCP header
    Lines 7-9:     TCP data
  Line 10:       Garbage Ethernet padding (to send an even number of
                 bytes)
----------------------------------------------------

In the first packet, the IP Checksum field is 0x6131 (in the middle of
line 3).  The IP checksum is calculated over all 16-bit words in the
header (except the checksum field itself is taken to be zero, prior to
actually calculating it).

Thus the 16-bit words that go into calculating the IP checksum are
(from lines 2,3,4):

    0x4500, 0x0024, 0x0001, 0x0000, 0xff11,
    0x0000, 0x0100, 0x5897, 0x0100, 0x0000.

The 32-bit sum of zero-extended words is 0x 0001 9ecd, so the one's
complement sum is 0x9ece.  The one's complement of this is the
checksum, 0x6131.

The UDP Checksum field in the same packet is 0xc9ca (at the end of
line 5).  Unlike IP, the UDP checksum is calculated not only over the
UDP header, but also over the UDP data, and over a pseudo-header
consisting of the IP source and destination addresses, the IP Protocol
field zero-extended to 16 bits, and a UDP length word.  Again the
checksum field itself is taken to be zero during the actual
calculation, since we can't know its value before actually computing
it.
Thus the 16-bit words that go into calculating the UDP checksum are
(from lines 5,6,7):

    0x0946, 0x002a, 0x0010, 0x0000,
    0x0106, 0x4a48, 0x4556, 0x4158;

(and from the pseudo-header):

    0x0100, 0x5897, 0x0100, 0x0000,
    0x0011 (UDP protocol number = 0x11 or 17 decimal),
    and 0x0010 (the UDP length).

The 32-bit sum of zero-extended words is 0x 0001 3634, so the 16-bit
one's complement sum is 0x3635 and the checksum is 0xc9ca as required.

In the second packet shown, the IP checksum is 0x563a (in the middle of
line 3).  The 16-bit words that go into calculating the IP checksum are
(from lines 2,3,4):

    0x4500, 0x004b, 0x4446, 0x0000, 0x1e06,
    0x0000, 0x0100, 0x000b, 0x0100, 0x0023.

The 32-bit sum of zero-extended words is 0x 0000 a9c5, so the one's
complement sum is 0xa9c5.  The one's complement of this is the
checksum, 0x563a.

The TCP Checksum field in the same packet is 0xb1d0 (the second word in
line 6).  Just as for UDP, the TCP checksum is calculated over all
16-bit words in the TCP header, data, and pseudo-header.  The 16-bit
words that go into calculating the checksum are:

From the TCP header:

    0x0017, 0x07a8, 0x0614, 0x56f0, 0xd31d, 0xaaa4, 0x5018, 0x0068,
    0x0000 (checksum field itself is initially zero), 0x0000.

From the TCP data:

    0x0d0a, 0x0d0a, 0x4d63, 0x4d61, 0x7374, 0x6572,
    0x2055, 0x6e69, 0x7665, 0x7273, 0x6974, 0x7920,
    0x5641, 0x5820, 0x3836, 0x3030, 0x0d0a,
    0x0d00 (we have an odd number of data bytes, so the last byte is
            zero-filled on the right to form a 16-bit word).

From the pseudo-header:

    0x0100, 0x000b (from the source IP address),
    0x0100, 0x0023 (from the destination IP address),
    0x0006 (zero-extended IP Protocol word, 0x6 = TCP),
    0x0037 (the TCP Length, i.e. the length of the TCP header and
            data).

Here, the 32-bit sum of zero-extended words is 0x 0007 4e28, so the
16-bit one's complement sum is 0x4e2f and the checksum is 0xb1d0, as
required.
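The two pseudo-header calculations above can be reproduced with a short
C sketch.  This is only an illustration under stated assumptions: the
names sum_words and transport_checksum are made up for this example,
the addresses are passed as four raw bytes in network order, and the
caller is expected to have zeroed the checksum field in the segment.
UDP's special treatment of a calculated 0x0000 is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running 32-bit sum as big-endian 16-bit words;
   an odd trailing byte is zero-filled on the right. */
static uint32_t sum_words(uint32_t sum32, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum32 += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum32 += (uint32_t)buf[len - 1] << 8;
    return sum32;
}

/* Checksum over pseudo-header + TCP or UDP header + data.
   Hypothetical helper; seg must have its checksum field zeroed. */
uint16_t transport_checksum(const uint8_t src[4], const uint8_t dst[4],
                            uint8_t protocol,
                            const uint8_t *seg, uint16_t seg_len)
{
    uint32_t sum32 = 0;

    sum32 = sum_words(sum32, src, 4);       /* source IP address      */
    sum32 = sum_words(sum32, dst, 4);       /* destination IP address */
    sum32 += protocol;                      /* zero-extended protocol */
    sum32 += seg_len;                       /* TCP or UDP length      */
    sum32 = sum_words(sum32, seg, seg_len); /* header and data        */

    /* Fold the carries back in until none remain. */
    while (sum32 >> 16)
        sum32 = (sum32 & 0xffff) + (sum32 >> 16);
    return (uint16_t)(~sum32 & 0xffff);
}

/* First packet: UDP header and data, checksum field zeroed. */
const uint8_t udp_src[4] = { 0x01, 0x00, 0x58, 0x97 };
const uint8_t udp_dst[4] = { 0x01, 0x00, 0x00, 0x00 };
const uint8_t udp_seg[16] = {
    0x09, 0x46, 0x00, 0x2a, 0x00, 0x10, 0x00, 0x00,
    0x01, 0x06, 0x4a, 0x48, 0x45, 0x56, 0x41, 0x58
};

/* Second packet: TCP header and data, checksum field zeroed. */
const uint8_t tcp_src[4] = { 0x01, 0x00, 0x00, 0x0b };
const uint8_t tcp_dst[4] = { 0x01, 0x00, 0x00, 0x23 };
const uint8_t tcp_seg[55] = {
    0x00, 0x17, 0x07, 0xa8, 0x06, 0x14, 0x56, 0xf0, 0xd3, 0x1d,
    0xaa, 0xa4, 0x50, 0x18, 0x00, 0x68, 0x00, 0x00, 0x00, 0x00,
    0x0d, 0x0a, 0x0d, 0x0a, 0x4d, 0x63, 0x4d, 0x61, 0x73, 0x74,
    0x65, 0x72, 0x20, 0x55, 0x6e, 0x69, 0x76, 0x65, 0x72, 0x73,
    0x69, 0x74, 0x79, 0x20, 0x56, 0x41, 0x58, 0x20, 0x38, 0x36,
    0x30, 0x30, 0x0d, 0x0a, 0x0d
};
```

With these arrays, transport_checksum(udp_src, udp_dst, 17, udp_seg, 16)
gives 0xc9ca and transport_checksum(tcp_src, tcp_dst, 6, tcp_seg, 55)
gives 0xb1d0, matching the checksum fields in the two packets.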
Note the TCP length must be calculated as the total IP Length (0x4b in
this case) minus the length of the IP header (5 32-bit words in this
case, or 0x14 (decimal 20) bytes).  The TCP header itself does not
store the number of bytes of TCP data, so the TCP layer relies on the
IP layer to supply it with this information.

This describes how the sender of a packet calculates the checksum.  The
receiver, on the other hand, can verify the checksum quickly in the
following manner: it simply one's complement sums the 16-bit words and
checks if the result is 0xffff (except UDP's special case behavior must
again be taken into account) -- if it is, the checksum is correct and
the packet data is valid.

A moment's thought should show why this works.  Recall that when the
sender calculated the checksum, the checksum field itself was zero;
when the receiver looks at the header, however, the field has been
filled in.  The checksum is the one's complement of the one's
complement sum, and whenever a number and its one's complement are
added together, the result is 0xffff.

--
Many Americans work side by side with space aliens
who look human -- but you can spot these visitors
by looking for certain                              Gordan Palameta
tip-offs, say experts.                              mnetor!maccs!gordan