Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!husc6!panda!genrad!decvax!ucbvax!MIT-MULTICS.ARPA!JSLove
From: JSLove@MIT-MULTICS.ARPA.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: A New TCP Retransmission Algorithm
Message-ID: <870410003051.138454@MIT-MULTICS.ARPA>
Date: Thu, 9-Apr-87 19:30:00 EST
Article-I.D.: MIT-MULT.870410003051.138454
Posted: Thu Apr  9 19:30:00 1987
Date-Received: Sat, 11-Apr-87 21:50:19 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Distribution: world
Organization: The ARPA Internet
Lines: 103
Approved: tcp-ip@sri-nic.arpa

Counterproposal:  the following rules are rather complex, and require
more memory than some implementations may use (e.g., a timestamp per
packet), but they may prevent the sort of explosion we see in SRTT
somewhat better, and they use information which is already available.
This combines several schemes, used without credit, so if you think you
recognize the serial numbers, you probably do.

1.  Keep a timestamp associated with each packet in the retransmission
queue which tells when it was first transmitted.  This timestamp is
provided by IP, and indicates when the packet was given to the device
driver.  Since retransmissions should use the same IP identifier for
reassembly, it is reasonable for TCP and IP to share the same copy of
the packet for retransmission purposes.

2.  Never retransmit a packet which hasn't been transmitted yet.  There
are a number of implementations which seem to have this problem.  In
fact, never retransmit a packet which hasn't had a certain minimum delay
since its last transmission:

   SRTT * VR * RB ** RC

Where SRTT is smoothed round trip time, VR is variance ratio, RB is
retransmission base, and RC is retransmission count.

3.  Let's set VR and RB to 2.  This is arbitrary; substitute values to
taste.  RC is initially zero, and is reset for every packet.  It doesn't
have to be stored in the packet, since only one packet is retransmitted
at a time.  The SRTT for the first packet is unknown and doesn't matter.
The RTTs used to calculate the SRTT use a variable weight, SF (smoothing
factor) which is initially zero.  The MSF (maximum SF) is also
arbitrary, but values from 4 to 8 seem reasonable.

4.  When we transmit the first packet, we set the timer to one second.
After the timer has elapsed, we retransmit, and increase the interval
geometrically (by RB), so we retransmit after 2 more seconds, then 4,
and so on.

5.  When the acknowledgement comes back after 10 seconds from the first
transmission, we have possible RTTs of 10, 9, 7, and 3.  We take the
WCRTT (worst case RTT) of 10.

6.  For subsequent datagrams, we distinguish between three kinds of RTT:

a) the datagram was retransmitted (so we have a WCRTT).

b) the datagram wasn't retransmitted, but it was acknowledged at the
same time as a subsequent datagram, or a retransmission of a preceding
datagram was sent subsequent to the transmission of this datagram.  We
can't be certain how much of the RTT was due to the influence of other
datagrams, so we have an MRTT (merged RTT).

c) the datagram wasn't retransmitted, and no retransmissions of
preceding datagrams were transmitted after this one.  We have an ARTT
(actual RTT).

7.  Each time an acknowledgement packet is processed, we may produce
RTTs, including at most one WCRTT or ARTT, and possibly several MRTTs.
All contribute to the SRTT, but at different weights.  For the first
packet (SYN) or if they would tend to decrease the SRTT, all types count
at full weight, so

   SF = minimum (SF + 1, MSF)
   SRTT = ((SF - 1) * SRTT + RTT) / SF

If they would tend to increase the SRTT, then the weights differ, so
that an ARTT counts at full weight, a WCRTT at a lesser weight (perhaps
half), and an MRTT somewhere in between.  This means that an estimate of
the RTT is more strongly influenced by optimistic or reliable data.
This should help the algorithm converge on a small number.

8.  Depending on the nature of the RTT from the preceding packet, we
choose one of the following functions to generate the minimum
retransmission interval for the current packet:

   Delay = maximum (SRTT, ARTT) * VR
   Delay = maximum (SRTT * VR, MRTT)
   Delay = maximum (SRTT * VR, WCRTT * PLR)

PLR is the packet loss ratio.  Lacking a filter to determine PLR as well
as VR, it should be set to some arbitrary value sufficient to keep the
delay from blowing up in the face of high packet loss rates.  Let us
assume that packet loss rates (in both directions) never average more
than 50%, so a PLR of .5 should be sufficient.

9.  Don't ever use a simple timer to determine if a connection has died.
Instead, kill the connection when RC gets to a large value.  Assuming
that SRTT is one, a large value might be 8 (nearly 5 minutes), but the
general idea is that the geometric backoff means that a long delay will
be needed to determine that the connection is broken, so rather than
pick a fixed time interval you can base this on the fixed time interval
of consecutive lost packets.

Note that failing to acknowledge a packet isn't necessarily enough
information since if the remote site closes its window it can still
indicate that it is alive by acknowledging old data.  However, the
geometric backoff means that it might take a very long time to detect
the reopened window.  A window opening ACK wouldn't be acceptable unless
it opened the window farther than it had been before being closed.  Does
anyone admit to actually closing windows?  This might be an argument to
limit the maximum retransmission interval to say, two minutes.  Even
then, you might want a connection to keep trying forever.

Comments?