Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!decvax!genrad!panda!talcott!harvard!seismo!caip!topaz!hedrick
From: hedrick@topaz.RUTGERS.EDU (Charles Hedrick)
Newsgroups: net.bugs.4bsd
Subject: fix to TCP hangs and slow transfers
Message-ID: <4517@topaz.RUTGERS.EDU>
Date: Mon, 3-Mar-86 00:55:54 EST
Article-I.D.: topaz.4517
Posted: Mon Mar 3 00:55:54 1986
Date-Received: Tue, 4-Mar-86 04:38:27 EST
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 52

First, I should warn you that the problem I am about to describe was
observed on a Pyramid 90X. However a quick perusal of other source
suggests that the problem is probably present in our Sun 2.0 source
and in 4.3. So I conclude that this problem is generic to 4bsd
implementations. However symptoms may or may not be present on other
systems, depending upon the details of how they use the variable
rcv_adv.

The symptom is that connections attempting to send data from a DEC-20
or Symbolics 3600 to Unix hang. Or connections from any kind of system
may become super-slow (like about 1000bit/sec on an Ethernet).

I now believe that the problem is due to incorrect initialization of
rcv_adv. This variable indicates the receive window advertised to the
other end. However it is not a window size. It is a sequence number,
namely the largest sequence number that the other end has ever been
authorized to send. This is sort of a "high water mark", since
silly-window prevention can cause the window to shrink. In such cases
rcv_adv does not become less. Except when this window shrinking has
happened, the actual advertised window size is rcv_adv - rcv_nxt.

Now for the bug. rcv_adv is set in only one place, in tcp_output:

	if (SEQ_GT(tp->rcv_nxt+win, tp->rcv_adv))
		tp->rcv_adv = tp->rcv_nxt + win;

This works fine, except for the first time. rcv_adv is initialized to
zero. Unfortunately, sequence numbers are compared using a modulo
arithmetic, such that some sequence numbers are actually less than zero.
If a connection has such "negative" sequence numbers, then this test
always fails, and rcv_adv is never updated.

rcv_adv is used in only one place, in tcp_output, to calculate when to
issue window updates. For connections that have bad values of rcv_adv,
the effect can be missing window updates. If the TCP implementation on
the other end is correct, it will eventually issue a probe, and the
connection will be restarted. However such connections may be
mysteriously slow. If the TCP implementation at the other end does not
issue zero-window probes (TOPS-20), or issues them incorrectly
(Symbolics, apparently -- there is some evidence that their probe has
a data length of zero), then the connection will simply hang.

Different Unix versions may use slightly different tests for when to
do window updates. So the probability of hanging will depend upon the
implementation.

The fix that I recommend is to change the definition of tcp_rcvseqinit
so that it initializes rcv_adv as well as rcv_nxt:

	#define tcp_rcvseqinit(tp) \
		(tp)->rcv_nxt = (tp)->irs + 1; (tp)->rcv_adv += (tp)->rcv_nxt

The obvious code would be (tp)->rcv_adv = (tp)->rcv_nxt. However
sometimes rcv_adv is given a non-zero value before the sequence
numbers are initialized. So it seems safer to use the code above.
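
The failure and the fix described above can be reproduced outside the
kernel with a small sketch. Everything here is an assumption made for
illustration, not code copied from any 4bsd source: the struct
tcpcb_sketch, the helper functions, and the SEQ_GT definition (modeled
on the usual signed-difference comparison) are stand-ins, and only the
fields named in this article are included.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modulo-2^32 "greater than": true when a is ahead of b by less than
 * half the sequence space. Assumed to behave like the kernel's SEQ_GT.
 * Relies on the usual two's-complement conversion to int32_t. */
#define SEQ_GT(a, b) ((int32_t)((tcp_seq)(a) - (tcp_seq)(b)) > 0)

/* Hypothetical cut-down control block with only the fields discussed. */
struct tcpcb_sketch {
    tcp_seq irs;      /* initial receive sequence number */
    tcp_seq rcv_nxt;  /* next sequence number expected */
    tcp_seq rcv_adv;  /* highest sequence number advertised so far */
};

/* Original behaviour: rcv_adv is left at whatever it was (zero). */
static void rcvseqinit_old(struct tcpcb_sketch *tp)
{
    tp->rcv_nxt = tp->irs + 1;
}

/* Behaviour of the recommended macro: rcv_adv is offset by rcv_nxt. */
static void rcvseqinit_fixed(struct tcpcb_sketch *tp)
{
    tp->rcv_nxt = tp->irs + 1;
    tp->rcv_adv += tp->rcv_nxt;
}

/* The test quoted from tcp_output. Returns 1 if rcv_adv was advanced. */
static int update_rcv_adv(struct tcpcb_sketch *tp, uint32_t win)
{
    if (SEQ_GT(tp->rcv_nxt + win, tp->rcv_adv)) {
        tp->rcv_adv = tp->rcv_nxt + win;
        return 1;
    }
    return 0;
}

/* Returns 1 if the first window update fails to move rcv_adv. */
static int rcv_adv_stuck(tcp_seq irs, uint32_t win, int use_fix)
{
    struct tcpcb_sketch tp = { irs, 0, 0 };

    assert(win > 0 && win < 0x80000000u);
    if (use_fix)
        rcvseqinit_fixed(&tp);
    else
        rcvseqinit_old(&tp);
    return !update_rcv_adv(&tp, win);
}
```

With these definitions, rcv_adv_stuck(0x90000000, 4096, 0) should
report that rcv_adv is stuck, because 0x90001001 compares as "less
than" zero. The same call with use_fix set to 1, or with a small irs
such as 1000, should not.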