Path: utzoo!yunexus!maccs!gordan
From: gordan@maccs.UUCP (gordan)
Newsgroups: comp.protocols.tcp-ip
Subject: Checksums (was Re: Ping, checksum algorithm?)
Summary: Detailed description of TCP/IP checksums
Keywords: TCP IP UDP one's complement checksum
Message-ID: <1080@maccs.UUCP>
Date: 23 Mar 88 01:03:47 GMT
Article-I.D.: maccs.1080
Posted: Tue Mar 22 20:03:47 1988
References: <123@heart-of-gold>
Reply-To: gordan@maccs.UUCP ()
Organization: Worldwide Phlogiston Cartel
Lines: 250

In article <123@heart-of-gold> jc@heart-of-gold (John M Chambers x7780 1E342) writes:
-Does anyone have a PD version of ping?  How about a C-coded routine that
-does the IP checksum calculation?  We have one that is written in VAX
-assembly language, which is OK for a VAX, but it doesn't work too well
-on a 68020....


--------------------------------------------------------------------------
  -- One's complement of the one's complement sum checksum in TCP/IP --


In the RFC documents, the "one's complement of the one's complement sum"
checksums are mentioned in a single paragraph, and are never even
described, an omission that seems incredible.

Although the algorithm must be well known to many people, a written
description seems to be lacking.  So here's an attempt to provide a
short description (with examples).  The only authority I can cite for
the following is myself (but it seems to work, judging from actual
TCP/IP packets).  Someone kindly let me know if this is horribly wrong.


Basically, IP, TCP, and UDP require doing one's complement sums of
16-bit words.  That is, you must take a bunch of 16-bit words and sum
them _as if their bit patterns represented one's complement numbers_,
which means that any carry out of the top bit is added back into the
bottom bit (an "end-around carry") rather than thrown away.  The trick,
then, is doing one's complement arithmetic on a two's complement
machine.

Without going into any arithmetical justification, here's how to do a
one's complement sum on a two's complement machine, in pseudocode:

  INT16 sum;
  INT16 *word;     /* pointer to start of 16-bit words to be summed */

  sum = 0;

  for (i = 0; i < `number of 16-bit words to be summed'; i++)
  {
    `byte-swap word[i], if necessary (see comment on byte-order)'
    sum += word[i];   /* do NOT combine these two lines ... */
    sum += `CARRY';   /* ... into sum += word[i] + CARRY !!!! */
  }

where CARRY is the value of the hardware carry bit (0 or 1), as set by
the addition in the previous line (note you mustn't do sum += word[i] +
CARRY as one line, since a high-level language could rearrange the order
of addition and add the value of the carry bit before it was set).
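
For what it's worth, the carry can be recovered in portable C without
touching the hardware flag: after an unsigned 16-bit addition, the
result is smaller than either operand exactly when the addition wrapped.
The following is only a sketch under that observation; the name
ones_sum16 is made up for illustration, and the words are assumed to be
in host byte order already (i.e. byte-swapped if necessary).

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement sum of n 16-bit words (already in host byte order).
 * C has no carry flag, so the carry is detected by comparison instead. */
uint16_t ones_sum16(const uint16_t *word, size_t n)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum = (uint16_t)(sum + word[i]);  /* 16-bit add, wraps modulo 65536 */
        if (sum < word[i])                /* wrapped, so a carry occurred   */
            sum++;                        /* end-around carry               */
    }
    return sum;
}
```

The increment can never wrap in turn: if the addition carried, the
wrapped result is at most 0xfffe.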

Of course, the value of the carry bit is not accessible from a
higher-level language like C.  A perfectly equivalent method (very
suitable if your machine has 32-bit integers) is:

  INT32 sum32, word32;
  INT16 *word;     /* pointer to start of 16-bit words to be summed */

  sum32 = 0;

  for (i = 0; i < `number of 16-bit words to be summed'; i++)
  {
    `byte-swap word[i], if necessary (see comments on byte-order)'
    `copy word[i] to word32, zero-extended (NOT sign-extended)'
              /* (e.g., 0xedcb -> 0x0000edcb, not 0xffffedcb) */
    sum32 += word32;
  }

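  sum32 = (sum32 >> 16) + (sum32 & 0xffff);  /* add the two 16-bit halves */
  sum32 += sum32 >> 16;    /* that add can carry too (e.g. 0x0001ffff) */
  sum = sum32 & 0xffff;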

This works, since the carry bit values for 16-bit addition of the least
significant 16-bit word accumulate in the most significant 16-bit word
of the 32-bit sum.  (This is probably what you would use on a 68020 --
and you can forget about byte-swapping on a 68020 as well).
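
As a rough illustration of the 32-bit method in real C, here is a
sketch that reads the data a byte at a time, so that it gives the same
answer on big-endian and little-endian hosts without an explicit
byte-swap.  The names sum_words and fold16 are invented for this
example.  An odd trailing byte is zero-filled on the right, and the
fold is done twice because adding the two halves can itself carry.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes, read as big-endian 16-bit words, into a 32-bit
 * accumulator.  An odd trailing byte is padded with a zero on the right. */
uint32_t sum_words(const uint8_t *p, size_t len, uint32_t acc)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)p[i] << 8) | p[i + 1];   /* zero-extended word */
    if (len & 1)
        acc += (uint32_t)p[len - 1] << 8;
    return acc;
}

/* Fold the carries collected in the upper half back into 16 bits. */
uint16_t fold16(uint32_t acc)
{
    acc = (acc >> 16) + (acc & 0xffff);   /* add the two halves          */
    acc += acc >> 16;                     /* add back a possible carry   */
    return (uint16_t)acc;
}
```

A caller would write something like fold16(sum_words(buf, len, 0)).
The accumulator has plenty of headroom for a single packet, since a
maximum-size IP datagram contains fewer than 32768 words.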

After calculating a one's complement sum, you have to take its one's
complement (invert all the bits) to get the actual checksum used in IP,
TCP, and UDP (but note that UDP treats a calculated checksum of 0x0000
as a special case: it is transmitted as 0xffff, because an all-zero
checksum field means the sender computed no checksum; see RFC 768).
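
The final step might be sketched in C as below.  Both function names
are hypothetical.  The UDP variant follows RFC 768, where a computed
checksum of zero is sent as all ones because a zero field means that
no checksum was computed.

```c
#include <stdint.h>

/* IP and TCP: the checksum is the bitwise complement of the sum. */
uint16_t checksum_from_sum(uint16_t ones_sum)
{
    return (uint16_t)~ones_sum;
}

/* UDP: as above, except that a result of 0x0000 is sent as 0xffff. */
uint16_t udp_checksum_from_sum(uint16_t ones_sum)
{
    uint16_t c = (uint16_t)~ones_sum;

    return c == 0 ? 0xffff : c;
}
```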


It is of course necessary to take byte-order into account.

  (Byte-order:  if adjacent memory locations on a machine contain the
  following bytes:

  X   :       0x12
  X+1 :       0x34

  then what is the value of the 16-bit word whose address is X?
  (assuming a byte-addressable machine and valid alignment for X to be
  read as a 16-bit value).

  If the 16-bit value is 0x1234, the machine is said to be
  ``big-endian;'' if it is 0x3412, the machine is ``little-endian.'')

Some machine architectures (Motorola 680x0, etc.) are big-endian, others
(Intel 80x86, VAX) are little-endian.  TCP/IP headers use big-endian
byte-order.  Thus, life is easier on a Sun than on a VAX.
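
One way to sidestep the question in C is to assemble each word from its
two bytes explicitly, which gives the network (big-endian) value on any
host.  The sketch below also shows the X / X+1 experiment described
above as a run-time test.  Both function names are invented for this
example.

```c
#include <stdint.h>
#include <string.h>

/* Read the 16-bit word at p the way the network sees it (big-endian),
 * regardless of the byte order of the host. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint8_t bytes[2] = { 0x12, 0x34 };   /* X: 0x12, X+1: 0x34 */
    uint16_t w;

    memcpy(&w, bytes, sizeof w);   /* read the two bytes as the host would */
    return w == 0x3412;
}
```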


Some examples follow, using actual packets (see the appropriate RFC docs
for IP, UDP, and TCP, and ignore the Ethernet stuff).  In case anyone's
curious, the IP addresses here are used on a LAN unconnected to any
outside network (they do not respect the class A/B/C Internet addressing
scheme).


     An Ethernet UDP/IP packet
---------------------------------------------------
1:  ff-ff-ff-ff-ff-ff 02-60-8c-09-58-97 08-00      
2:                                            45 00
3:  00-24 00-01 00-00 ff 11 61-31 01-00-58-97 01-00-
4: -00-00                                          
5:        09-46 00-2a 00-10 c9-ca                  
6:                                01 06 4a 48 45 56
7:  41 58                                          
8:        00 00 00 00 00 00 00 00 00 00            
---------------------------------------------------
Line  1:    Ethernet header
Lines 2-8:  Ethernet data

Lines 2-4:  IP header
Lines 5-7:  IP data

Line  5:    UDP header
Lines 6-7:  UDP data

Line  8:    Garbage padding to satisfy Ethernet minimum packet size
            (Ethernet header + data >= 60 bytes).
---------------------------------------------------


    An Ethernet TCP/IP packet
----------------------------------------------------
1:  08-00-2b-02-d2-67 08-00-02-00-51-23 08-00 
2:                                            45 00
3:  00-4b 44-46 00-00 1e 06 56-3a 01-00-00-0b 01-00-
4: -00-23 
5:        00-17 07-a8 06-14-56-f0 d3-1d-aa-a4 50 18
6:  00-68 b1-d0 00-00 
7:                    0d 0a 0d 0a 4d 63 4d 61 73 74
8:  65 72 20 55 6e 69 76 65 72 73 69 74 79 20 56 41
9:  58 20 38 36 30 30 0d 0a 0d 
10:                            00
---------------------------------------------------
Line  1:    Ethernet header
Lines 2-10: Ethernet data

Lines 2-4:  IP header
Lines 5-9:  IP data

Lines 5-6:  TCP header
Lines 7-9:  TCP data

Line  10:   Garbage Ethernet padding (to send an even number of bytes)
----------------------------------------------------


In the first packet, the IP Checksum field is 0x6131 (in the middle of
line 3).  The IP checksum is calculated over all 16-bit words in the
header, with the checksum field itself taken to be zero while the
calculation is done.  Thus the 16-bit words that go into calculating
the IP checksum are (from lines 2,3,4): 0x4500, 0x0024, 0x0001, 0x0000,
0xff11, 0x0000, 0x0100, 0x5897, 0x0100, 0x0000.

The 32-bit sum of zero-extended words is 0x 0001 9ecd, so the one's
complement sum is 0x9ecd + 0x0001 = 0x9ece.  The one's complement of
this is the checksum, 0x6131.
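
The same calculation can be checked mechanically.  The sketch below
assumes the header is available as an array of bytes in the order they
appear on the wire, and that the checksum field occupies bytes 10 and
11 of the header, as in the dump above.  The function name
ip_header_checksum is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Checksum of an IP header of hdr_len bytes (a multiple of 4).  Bytes
 * 10 and 11 hold the checksum field and are counted as zero. */
uint16_t ip_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t acc = 0;
    size_t i;

    for (i = 0; i + 1 < hdr_len; i += 2) {
        if (i == 10)
            continue;                       /* checksum field taken as zero */
        acc += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    }
    acc = (acc >> 16) + (acc & 0xffff);     /* fold the carries ...         */
    acc += acc >> 16;                       /* ... and any carry from that  */
    return (uint16_t)~acc;                  /* one's complement of the sum  */
}
```

Fed the 20 header bytes of the first packet, this should return 0x6131
whatever value is currently stored in the checksum field.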

The UDP Checksum field in the same packet is 0xc9ca (at the end of line
5).  Unlike IP, the UDP checksum is calculated not only over the UDP
header, but also over the UDP data, and over a pseudo-header consisting
of the IP source and destination addresses, the IP Protocol field
zero-extended to 16 bits, and a UDP length word.  Again the checksum
field itself is taken to be zero during the calculation, since its
value is not known until the calculation is finished.

Thus the 16-bit words that go into calculating the UDP checksum are
(from lines 5,6,7): 0x0946, 0x002a, 0x0010, 0x0000, 0x0106, 0x4a48,
0x4556, 0x4158; (and from the pseudo-header): 0x0100, 0x5897, 0x0100,
0x0000, 0x0011 (UDP protocol number = 0x11 or 17 decimal), and 0x0010
(the UDP length).  The 32-bit sum of zero-extended words is 0x 0001
3634, so the 16-bit one's complement sum is 0x3635 and the checksum is
0xc9ca as required.
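
Here is a sketch of the UDP calculation including the pseudo-header.
It assumes the caller passes the two IP addresses as 4-byte arrays and
the UDP header plus data as one contiguous buffer, with the checksum
field at bytes 6 and 7 of the UDP header.  The name udp_checksum and
its parameters are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* UDP checksum for the datagram at udp (header plus data, udp_len
 * bytes).  Bytes 6 and 7, the checksum field, are counted as zero. */
uint16_t udp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *udp, size_t udp_len)
{
    uint32_t acc = 0;
    uint16_t c;
    size_t i;

    /* Pseudo-header: both addresses, protocol number, UDP length. */
    acc += ((uint32_t)src[0] << 8) | src[1];
    acc += ((uint32_t)src[2] << 8) | src[3];
    acc += ((uint32_t)dst[0] << 8) | dst[1];
    acc += ((uint32_t)dst[2] << 8) | dst[3];
    acc += 17;                               /* 0x0011 = UDP             */
    acc += (uint32_t)udp_len;

    /* UDP header and data. */
    for (i = 0; i + 1 < udp_len; i += 2) {
        if (i == 6)
            continue;                        /* checksum field as zero   */
        acc += ((uint32_t)udp[i] << 8) | udp[i + 1];
    }
    if (udp_len & 1)
        acc += (uint32_t)udp[udp_len - 1] << 8;   /* zero-fill odd byte  */

    acc = (acc >> 16) + (acc & 0xffff);
    acc += acc >> 16;
    c = (uint16_t)~acc;
    return c == 0 ? 0xffff : c;              /* UDP's special case       */
}
```

With source 01-00-58-97, destination 01-00-00-00 and the 16 bytes of
lines 5-7 of the first packet, the result should be 0xc9ca.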


In the second packet shown, the IP checksum is 0x563a (in the middle of
line 3).  The 16-bit words that go into calculating the IP checksum are
(from lines 2,3,4):  0x4500, 0x004b, 0x4446, 0x0000, 0x1e06, 0x0000,
0x0100, 0x000b, 0x0100, 0x0023.

The 32-bit sum of zero-extended words is 0x 0000 a9c5, so the one's
complement sum is 0xa9c5.  The one's complement of this is the checksum,
0x563a.

The TCP Checksum field in the same packet is 0xb1d0 (the second word in
line 6).  Just as for UDP, the TCP checksum is calculated over all 16-bit
words in the TCP header, data, and pseudo-header.  The 16-bit words that
go into calculating the checksum are:

     From the TCP header:

0x0017, 0x07a8, 0x0614, 0x56f0, 0xd31d, 0xaaa4,
0x5018, 0x0068, 0x0000 (checksum field itself is initially zero),
0x0000.

     From the TCP data:

0x0d0a, 0x0d0a, 0x4d63, 0x4d61, 0x7374, 0x6572,
0x2055, 0x6e69, 0x7665, 0x7273, 0x6974, 0x7920, 0x5641, 0x5820,
0x3836, 0x3030, 0x0d0a, 0x0d00 (we have an odd number of data bytes,
so the last byte is zero-filled on the right to form a 16-bit word).

     From the pseudo-header:

0x0100, 0x000b (from the source IP address),
0x0100, 0x0023 (from the destination IP address),
0x0006 (zero-extended IP Protocol word, 0x6 = TCP),
0x0037 (the TCP Length, i.e. the length of the TCP header and data).

Here, the 32-bit sum of zero-extended words is 0x 0007 4e28, so the
16-bit one's complement sum is 0x4e2f and the checksum is 0xb1d0, as
required.


Note the TCP length must be calculated as the total IP Length (0x4b in
this case) minus the length of the IP header (5 32-bit words in this
case, or 0x14 (decimal 20) bytes).  The TCP header itself does not store
the number of bytes of TCP data, so the TCP layer relies on the IP layer
to supply it with this information.
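
Putting the length rule and the pseudo-header together, a TCP checksum
routine might look roughly like this.  It is a sketch only: it assumes
a complete, unfragmented IP packet in one buffer with sane length
fields, and the names tcp_length and tcp_checksum are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* TCP length in bytes = IP Total Length minus IP header length. */
size_t tcp_length(const uint8_t *ip)
{
    size_t total = ((size_t)ip[2] << 8) | ip[3];   /* Total Length field  */
    size_t ihl   = (size_t)(ip[0] & 0x0f) * 4;     /* header words * 4    */

    return total - ihl;
}

/* TCP checksum of the segment carried in the IP packet at ip.  The TCP
 * checksum field (bytes 16 and 17 of the TCP header) counts as zero. */
uint16_t tcp_checksum(const uint8_t *ip)
{
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    size_t len = tcp_length(ip);
    const uint8_t *tcp = ip + ihl;
    uint32_t acc = 0;
    size_t i;

    /* Pseudo-header, taken from the IP header. */
    for (i = 12; i < 20; i += 2)              /* source, destination      */
        acc += ((uint32_t)ip[i] << 8) | ip[i + 1];
    acc += ip[9];                             /* Protocol, zero-extended  */
    acc += (uint32_t)len;                     /* TCP length               */

    /* TCP header and data. */
    for (i = 0; i + 1 < len; i += 2) {
        if (i == 16)
            continue;                         /* checksum field as zero   */
        acc += ((uint32_t)tcp[i] << 8) | tcp[i + 1];
    }
    if (len & 1)
        acc += (uint32_t)tcp[len - 1] << 8;   /* odd byte, zero-filled    */

    acc = (acc >> 16) + (acc & 0xffff);
    acc += acc >> 16;
    return (uint16_t)~acc;
}
```

For the second packet above (lines 2-9, 75 bytes), tcp_length should
give 0x37 and tcp_checksum should give 0xb1d0.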


This describes how the sender of a packet calculates the checksum.  The
receiver, on the other hand, can verify the checksum quickly in the
following manner: it simply one's complement sums the same 16-bit words,
this time including the checksum field as received, and checks if the
result is 0xffff (except UDP's special case behavior must again be taken
into account).  If it is, the checksum is correct and the packet is
presumed to have arrived undamaged.

A moment's thought should show why this works.  Recall that when the
sender calculated the checksum, the checksum field itself was zero; when
the receiver looks at the header, however, the field has been filled in.
The checksum is the one's complement of the one's complement sum, and
whenever a number and its one's complement are added together, the
result is 0xffff.
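
A receiver-side check along these lines, shown for the IP header only,
could be sketched as follows.  The function name ip_header_ok is made
up, and the header is again assumed to be a byte array in wire order.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the one's complement sum over the whole header, checksum
 * field included, comes to 0xffff, and 0 otherwise. */
int ip_header_ok(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t acc = 0;
    size_t i;

    for (i = 0; i + 1 < hdr_len; i += 2)
        acc += ((uint32_t)hdr[i] << 8) | hdr[i + 1];   /* nothing skipped */
    acc = (acc >> 16) + (acc & 0xffff);
    acc += acc >> 16;
    return (acc & 0xffff) == 0xffff;
}
```

For the first packet the 32-bit sum comes to 0x0001 9ecd + 0x6131 =
0x0001 fffe, which folds to 0xffff, so the header passes.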


-- 
Many Americans work side by side with space
aliens who look human -- but you can spot
these visitors by looking for certain               Gordan Palameta
tip-offs, say experts.                              mnetor!maccs!gordan