Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!husc6!panda!genrad!decvax!ucbvax!MIT-MULTICS.ARPA!JSLove From: JSLove@MIT-MULTICS.ARPA.UUCP Newsgroups: mod.protocols.tcp-ip Subject: Re: A New TCP Retransmission Algorithm Message-ID: <870410003051.138454@MIT-MULTICS.ARPA> Date: Thu, 9-Apr-87 19:30:00 EST Article-I.D.: MIT-MULT.870410003051.138454 Posted: Thu Apr 9 19:30:00 1987 Date-Received: Sat, 11-Apr-87 21:50:19 EST Sender: daemon@ucbvax.BERKELEY.EDU Distribution: world Organization: The ARPA Internet Lines: 103 Approved: tcp-ip@sri-nic.arpa Counterproposal: the following rules are rather complex, and require more memory than some implementations may use (e.g., a timestamp per packet), but they may prevent the sort of explosion we see in SRTT somewhat better, and they use information which is already available. This combines several schemes, used without credit, so if you think you recognize the serial numbers, you probably do. 1. Keep a timestamp associated with each packet in the retransmission queue which tells when it was first transmitted. This timestamp is provided by IP, and indicates when the packet was given to the device driver. Since retransmissions should use the same IP identifier for reassembly, it is reasonable for TCP and IP to share the same copy of the packet for retransmission purposes. 2. Never retransmit a packet which hasn't been transmitted yet. There are a number of implementations which seem to have this problem. In fact, never retransmit a packet which hasn't had a certain minimum delay since its last transmission: SRTT * VR * RB ** RC Where SRTT is smoothed round trip time, VR is variance ratio, RB is retransmission base, and RC is retransmission count. 3. Let's set VR and RB to 2. This is arbitrary; substitute values to taste. RC is initially zero, and is reset for every packet. It doesn't have to be stored in the packet, since only one packet is retransmitted at a time. The SRTT for the first packet is unknown and doesn't matter. The RTTs used to calculate the SRTT use a variable weight, SF (smoothing factor) which is initially zero. The MSF (maximum SF) is also arbitrary, but values from 4 to 8 seem reasonable. 4. When we transmit the first packet, we set the timer to one second. After the timer has elapsed, we retransmit, and increase the interval geometrically (by RB), so we retransmit after 2 more seconds, then 4, and so on. 5. When the acknowledgement comes back after 10 seconds from the first transmission, we have possible RTTs of 10, 9, 7, and 3. We take the WCRTT (worst case RTT) of 10. 6. For subsequent datagrams, we distinguish between three kinds of RTT: a) the datagram was retransmitted (so we have a WCRTT). b) the datagram wasn't retransmitted, but it was acknowledged at the same time as a subsequent datagram, or a retransmission of a preceding datagram was sent subsequent to the transmission of this datagram. We can't be certain how much of the RTT was due to the influence of other datagrams, so we have an MRTT (merged RTT). c) the datagram wasn't retransmitted, and no retransmissions of preceding datagrams were transmitted after this one. We have an ARTT (actual RTT). 7. Each time an acknowledgement packet is processed, we may produce RTTs, including at most one WCRTT or ARTT, and possibly several MRTTs. All contribute to the SRTT, but at different weights. For the first packet (SYN) or if they would tend to decrease the SRTT, all types count at full weight, so SF = minimum (SF + 1, MSF) SRTT = ((SF - 1) * SRTT + RTT) / SF If they would tend to increase the SRTT, then the weights differ, so that an ARTT counts at full weight, a WCRTT at a lesser weight (perhaps half), and an MRTT somewhere in between. This means that an estimate of the RTT is more strongly influenced by optimistic or reliable data. This should help the algorithm converge on a small number. 8. Depending on the nature of the RTT from the preceding packet, we choose one of the following functions to generate the minimum retransmission interval for the current packet: Delay = maximum (SRTT, ARTT) * VR Delay = maximum (SRTT * VR, MRTT) Delay = maximum (SRTT * VR, WCRTT * PLR) PLR is the packet loss ratio. Lacking a filter to determine PLR as well as VR, it should be set to some arbitrary value sufficient to keep the delay from blowing up in the face of high packet loss rates. Let us assume that packet loss rates (in both directions) never average more than 50%, so a PLR of .5 should be sufficient. 9. Don't ever use a simple timer to determine if a connection has died. Instead, kill the connection when RC gets to a large value. Assuming that SRTT is one, a large value might be 8 (nearly 5 minutes), but the general idea is that the geometric backoff means that a long delay will be needed to determine that the connection is broken, so rather than pick a fixed time interval you can base this on the fixed time interval of consecutive lost packets. Note that failing to acknowledge a packet isn't necessarily enough information since if the remote site closes its window it can still indicate that it is alive by acknowledging old data. However, the geometric backoff means that it might take a very long time to detect the reopened window. A window opening ACK wouldn't be acceptable unless it opened the window farther than it had been before being closed. Does anyone admit to actually closing windows? This might be an argument to limit the maximum retransmission interval to say, two minutes. Even then, you might want a connection to keep trying forever. Comments?