Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!cbosgd!ulysses!ucbvax!JUPITER.BELLCORE.COM!karn
From: karn@JUPITER.BELLCORE.COM.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: A New TCP Retransmission Algorithm
Message-ID: <8704080059.AA20330@jupiter.bellcore.com>
Date: Tue, 7-Apr-87 19:59:48 EST
Article-I.D.: jupiter.8704080059.AA20330
Posted: Tue Apr  7 19:59:48 1987
Date-Received: Sat, 11-Apr-87 03:27:11 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Distribution: world
Organization: The ARPA Internet
Lines: 112
Approved: tcp-ip@sri-nic.arpa

I'd like to describe a TCP retransmission algorithm that I've been using
with excellent results for the last several months.

Background

TCP has the following well known problem.  When an ACK finally returns for a
segment that was retransmitted, you don't know which transmission caused it.
Therefore you can't be certain what the round trip time (RTT) for this
packet was.  Different implementations react differently to this problem:

1.  Some assume that the first transmission(s) was/were lost and restart
the round trip timer on each retransmission.

2.  Others ignore the measurement (since it isn't reliable), keeping their
previous estimates of the round trip time.

Both approaches share a serious problem.  Consider the case where the
retransmission occurred because the smoothed round trip time (SRTT) was too
small, not because packets are being lost.  If approach #1 is used, the
returning ACK is in fact responding to the FIRST transmission, so timing
from a later transmission gives a measurement that is too small. Because the
measurements are erroneously small, SRTT never adapts to the true value;
because the SRTT never adapts to its true value, unnecessary retransmissions
keep happening, resulting in erroneously small RTT measurements.  This
vicious circle can go on essentially forever.  If the measurement is instead
rejected as unreliable, SRTT is never updated, and again unnecessary
retransmissions go on forever.  Believe me, I've seen more than my share of
TCPs behave this way.

The current accepted way out of this problem is to assume the worst whenever
a timeout occurs by measuring the RTT from the FIRST transmission of a given
segment.  If network loading suddenly increases, or if the initial estimate
of the round trip time was too low, there will be a few retransmissions but
the correct value *will* be found; the TCP eventually learns and unnecessary
retransmissions stop.  As long as the network packet loss rate is low, this
approach works quite well. Retransmissions caused by the occasional dropped
packet result in erroneously large round trip time measurements being fed
into the smoother, but this is better than erring on the low side.  Since
few packets are being dropped, subsequent packets will most likely bring the
SRTT value back down to its "real" value long before the next packet is
dropped, so the timeout won't waste too much time.

A serious problem develops, however, when the network is very lossy.  A
"raw" amateur packet radio channel (no link level acknowledgments) often
loses 25% or more of the packets sent due to collisions and poor signal
levels.  Such rates drive this algorithm crazy, especially when consecutive
packets are lost.  In this situation, retransmission intervals are
increasing rapidly due to back-off, and these intervals are added to the
total round trip measurement for the segment. Enough such measurements can
can drive the smoothed round trip time estimate right through the roof.
This is especially true when a nonlinear smoothing function is used that
reacts more quickly to increases in round trip delay than to decreases
(e.g., Mills, RFC-889).

Some people might say that the answer is simple: just constrain the round
trip time to some arbitrary limit, say, 30 or 60 seconds, but there are real
problems with this.  Just how do you pick this "arbitrary" value? Through
measurement? Just like the Dow Jones average, any number you see today is
almost guaranteed to be exceeded tomorrow.  If the delay through the
Internet exceeds your arbitrary timeout limit, you start retransmitting
unnecessarily, further adding to whatever it is that is causing the large
delay in the first place.  Perhaps we need robots to stand in the office of
every protocol hacker:

	WARNING WARNING DANGER WILL ROBINSON!
	ARBITRARY PROTOCOL TIMEOUT DETECTED!
	EVENTUAL CONGESTION COLLAPSE IS NOW GUARANTEED!
	(SOONER OR LATER)

A Better Way

Thinking there *had* to be a better way, I devised and implemented the
following rules:

1. If a segment being ACKed was sent only once, update the estimated round
trip time and recompute the timeout interval. (This is standard behavior).

2. If a timeout occurs, back off the timeout timer interval (e.g., double
it), retransmit the oldest unacknowledged segment, and restart the timeout
timer. (Again, nothing unusual).

3. When an ACK comes in for a segment that was sent more than once (i.e.,
retransmitted at least once) do NOT attempt to measure its round trip time;
leave the smoothed estimate alone.  FURTHERMORE, KEEP THE BACKED-OFF TIMER
SETTING FOR THE *NEXT* SEGMENT, AND DO NOT RECOMPUTE IT FROM (BETA*SRTT) UNTIL
IT OR A LATER SEGMENT HAS BEEN ACKED WITHOUT A RETRANSMISSION.

The purpose of this last rule is as follows.  If the retransmission was
caused by a too-small SRTT, keeping the backed-off timeout for the following
segment gives an ACK a chance to come back without the RTT measurement being
spoiled by an unnecessary retransmission.  This gives a valid data point
that can be fed into the smoothed estimate, driving it toward its true
value.  On the other hand, if the timeout was caused by the packet being
lost altogether, the estimated round trip time will not (and should not)
change.  Only valid RTT measurements are fed into the smoother, and the
timeout is always backed off whenever retransmissions occur until valid RTT
measurements are again obtained. 

This algorithm seems to work extremely well in practice. Even when the
packet loss rate is so high as to cause many packets to be lost in a row,
SRTT always reflects a "sane" value.  If the network delay suddenly
increases, there may be a short period where the retransmission timeout
"oscillates" between a value based on the SRTT and the backed-off interval
necessary to allow an ACK to come back without a retransmission, but this
stops when the computed SRTT catches up with the current network delay.
Dave Mills' nonlinear algorithm shortens this period by allowing the
smoothed estimate to react more quickly to sudden increases, but even
without it the algorithm always converges.

Comments?

Phil Karn