Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!cbosgd!ulysses!ucbvax!JUPITER.BELLCORE.COM!karn From: karn@JUPITER.BELLCORE.COM.UUCP Newsgroups: mod.protocols.tcp-ip Subject: A New TCP Retransmission Algorithm Message-ID: <8704080059.AA20330@jupiter.bellcore.com> Date: Tue, 7-Apr-87 19:59:48 EST Article-I.D.: jupiter.8704080059.AA20330 Posted: Tue Apr 7 19:59:48 1987 Date-Received: Sat, 11-Apr-87 03:27:11 EST Sender: daemon@ucbvax.BERKELEY.EDU Distribution: world Organization: The ARPA Internet Lines: 112 Approved: tcp-ip@sri-nic.arpa I'd like to describe a TCP retransmission algorithm that I've been using with excellent results for the last several months. Background TCP has the following well known problem. When an ACK finally returns for a segment that was retransmitted, you don't know which transmission caused it. Therefore you can't be certain what the round trip time (RTT) for this packet was. Different implementations react differently to this problem: 1. Some assume that the first transmission(s) was/were lost and restart the round trip timer on each retransmission. 2. Others ignore the measurement (since it isn't reliable), keeping their previous estimates of the round trip time. Both approaches share a serious problem. Consider the case where the retransmission occurred because the smoothed round trip time (SRTT) was too small, not because packets are being lost. If approach #1 is used, the returning ACK is in fact responding to the FIRST transmission, so timing from a later transmission gives a measurement that is too small. Because the measurements are erroneously small, SRTT never adapts to the true value; because the SRTT never adapts to its true value, unnecessary retransmissions keep happening, resulting in erroneously small RTT measurements. This vicious circle can go on essentially forever. If the measurement is instead rejected as unreliable, SRTT is never updated, and again unnecessary retransmissions go on forever. Believe me, I've seen more than my share of TCPs behave this way. The current accepted way out of this problem is to assume the worst whenever a timeout occurs by measuring the RTT from the FIRST transmission of a given segment. If network loading suddenly increases, or if the initial estimate of the round trip time was too low, there will be a few retransmissions but the correct value *will* be found; the TCP eventually learns and unnecessary retransmissions stop. As long as the network packet loss rate is low, this approach works quite well. Retransmissions caused by the occasional dropped packet result in erroneously large round trip time measurements being fed into the smoother, but this is better than erring on the low side. Since few packets are being dropped, subsequent packets will most likely bring the SRTT value back down to its "real" value long before the next packet is dropped, so the timeout won't waste too much time. A serious problem develops, however, when the network is very lossy. A "raw" amateur packet radio channel (no link level acknowledgments) often loses 25% or more of the packets sent due to collisions and poor signal levels. Such rates drive this algorithm crazy, especially when consecutive packets are lost. In this situation, retransmission intervals are increasing rapidly due to back-off, and these intervals are added to the total round trip measurement for the segment. Enough such measurements can can drive the smoothed round trip time estimate right through the roof. This is especially true when a nonlinear smoothing function is used that reacts more quickly to increases in round trip delay than to decreases (e.g., Mills, RFC-889). Some people might say that the answer is simple: just constrain the round trip time to some arbitrary limit, say, 30 or 60 seconds, but there are real problems with this. Just how do you pick this "arbitrary" value? Through measurement? Just like the Dow Jones average, any number you see today is almost guaranteed to be exceeded tomorrow. If the delay through the Internet exceeds your arbitrary timeout limit, you start retransmitting unnecessarily, further adding to whatever it is that is causing the large delay in the first place. Perhaps we need robots to stand in the office of every protocol hacker: WARNING WARNING DANGER WILL ROBINSON! ARBITRARY PROTOCOL TIMEOUT DETECTED! EVENTUAL CONGESTION COLLAPSE IS NOW GUARANTEED! (SOONER OR LATER) A Better Way Thinking there *had* to be a better way, I devised and implemented the following rules: 1. If a segment being ACKed was sent only once, update the estimated round trip time and recompute the timeout interval. (This is standard behavior). 2. If a timeout occurs, back off the timeout timer interval (e.g., double it), retransmit the oldest unacknowledged segment, and restart the timeout timer. (Again, nothing unusual). 3. When an ACK comes in for a segment that was sent more than once (i.e., retransmitted at least once) do NOT attempt to measure its round trip time; leave the smoothed estimate alone. FURTHERMORE, KEEP THE BACKED-OFF TIMER SETTING FOR THE *NEXT* SEGMENT, AND DO NOT RECOMPUTE IT FROM (BETA*SRTT) UNTIL IT OR A LATER SEGMENT HAS BEEN ACKED WITHOUT A RETRANSMISSION. The purpose of this last rule is as follows. If the retransmission was caused by a too-small SRTT, keeping the backed-off timeout for the following segment gives an ACK a chance to come back without the RTT measurement being spoiled by an unnecessary retransmission. This gives a valid data point that can be fed into the smoothed estimate, driving it toward its true value. On the other hand, if the timeout was caused by the packet being lost altogether, the estimated round trip time will not (and should not) change. Only valid RTT measurements are fed into the smoother, and the timeout is always backed off whenever retransmissions occur until valid RTT measurements are again obtained. This algorithm seems to work extremely well in practice. Even when the packet loss rate is so high as to cause many packets to be lost in a row, SRTT always reflects a "sane" value. If the network delay suddenly increases, there may be a short period where the retransmission timeout "oscillates" between a value based on the SRTT and the backed-off interval necessary to allow an ACK to come back without a retransmission, but this stops when the computed SRTT catches up with the current network delay. Dave Mills' nonlinear algorithm shortens this period by allowing the smoothed estimate to react more quickly to sudden increases, but even without it the algorithm always converges. Comments? Phil Karn