Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ucbvax!LBL-CSAM.ARPA!van
From: van@LBL-CSAM.ARPA.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Analyzing Acknowledgement Strategies
Message-ID: <8701151850.AA18157@lbl-csam.arpa>
Date: Thu, 15-Jan-87 13:50:15 EST
Article-I.D.: lbl-csam.8701151850.AA18157
Posted: Thu Jan 15 13:50:15 1987
Date-Received: Thu, 15-Jan-87 23:58:59 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 82
Approved: tcp-ip@sri-nic.arpa

Dave -

  Thanks for the kind words.  I wish the DOE felt the same way.
My network research funds were cut off a year ago because "DARPA
funds transport protocol research, not DOE".  While trying to
find new money, I got the impression that no one funds transport
protocol research since transport protocols are a "solved
problem" (this while the mean Arpanet transit delay was 15
seconds -- I tried to laugh but it hurt too much). 


  I'm not sure which Nagle algorithm you mean.  I was assuming
instantaneous sources and sinks on each end of a connection so
all segments sent were the max segment size.  So the
accumulate-until-the-ack Nagle algorithm didn't apply.  I didn't
know about the send-one-packet-when-retransmitting algorithm at
the time and didn't simulate it. 

  I did do an empirical test of the Nagle retransmit algorithm
shortly after we brought up 4.3bsd.  I ran an A-B-A-B test with
and without the algorithm (4.3 uses the algorithm) and took
statistics on sends, acks, rexmits, etc.  Each of the four test
phases ran a week.  Local network traffic made it hard to assess
the results but it was clear that using the algorithm reduced the
number of retransmitted packets by about 30%. 

  I expected the effect to be larger.  A quick look at some trace
data suggested that there was a problem when you resumed normal
behavior after a retransmit.  Since you'll always be sending into
an empty window at this point, 4.3 will blast out 8 back-to-back
packets.  A couple of packets near the end of this blast are
almost certain to be dropped, so you'll end up in retransmit state
2 RTTs after the previous recovery.  The new algorithm we want to
try is designed to filter this turn-on transient. 


  I didn't look at median filters while looking at RTT estimators
but I did investigate several other FIR filters.  I think that in
the clock sync problem you want a filter with good low pass
characteristics.  For RTT, I wanted a filter with good transient
response:  Because congestion builds up exponentially, if it's
detected late, senders have to take drastic action to clear it and
throughput really takes a hit.  If it's detected early, senders
can make small adjustments to damp it out and throughput is
hardly affected.  The simulations suggested that "early" was
really early -- within 3 to 4 packet times of the onset. 

  The problem with FIR filters was that there's usually a
discontinuity in the RTT samples at integer multiples of W, the
window size in mss packets.  If the filter was short or tapered
enough to have good transient response, this discontinuity wiped
it out.  If the filter was long enough to smooth the
discontinuity, it had no transient response. 
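
  As a rough illustration of this trade-off (not the filters that
were actually tested), the sketch below runs two boxcar FIR filters
over a made-up RTT trace.  The trace shape is an assumption: a
sawtooth that resets every W samples stands in for the discontinuity,
and a sudden rise at sample 40 stands in for the onset of congestion.
The names boxcar_fir, synthetic_rtt and ripple are hypothetical.

```python
def boxcar_fir(samples, n):
    """Boxcar FIR filter: mean of the most recent n samples (fewer at start-up)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - n + 1):i + 1]
        out.append(sum(window) / len(window))
    return out


def synthetic_rtt(count, w, step_at, step):
    """Made-up RTT trace: a sawtooth that resets every w samples, plus a
    sudden rise of `step` at sample `step_at` (standing in for congestion)."""
    return [100.0 + 5.0 * (i % w) + (step if i >= step_at else 0.0)
            for i in range(count)]


def ripple(values):
    """Peak-to-peak variation of a list of filter outputs."""
    return max(values) - min(values)


W = 8                                   # assumed window size, in packets
rtt = synthetic_rtt(80, W, step_at=40, step=50.0)
short = boxcar_fir(rtt, 2)              # short filter
long_ = boxcar_fir(rtt, W)              # filter spanning one full window

# Before the rise (samples W-1 .. 39): how much sawtooth leaks through?
print("short filter ripple:", ripple(short[W - 1:40]))
print("long filter ripple: ", ripple(long_[W - 1:40]))
# At the onset (sample 40): how much of the rise does the long filter show?
print("long filter, first sample after onset:", long_[40] - long_[39])
```

  In this toy trace the short filter reacts to the rise within two
samples but passes most of the sawtooth, while the W-sample filter
cancels the sawtooth but needs W samples to register the full rise.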

  Measuring RTT and output packet rate is essentially estimating
a parameter and its time derivative.  This is the well known
"radar tracking problem" (estimating distance and velocity from
radar return echoes) and my original estimator was a copy of the
IIR alpha-beta tracker equation from a GE radar handbook.  This
had trouble because it's fixed gain.  When the network is not
congested there's a lot of stochastic noise and the filter gain
has to be low to ignore this noise.  When the network starts to
get congested, the stochastic noise gets squeezed out and you
could use much higher gain.  Investigation of variable gain
filters led me to Kalman's work on optimal stochastic
estimators. 
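
  For readers who haven't met it, a fixed-gain alpha-beta tracker in
its usual textbook form looks roughly like the sketch below.  This is
the generic filter, not the handbook equation referred to above; the
class name, the gain values and the floating point arithmetic are all
assumptions made for illustration.

```python
class AlphaBetaTracker:
    """Textbook alpha-beta tracker: estimates a value x and its rate of
    change v from noisy samples, using two fixed gains."""

    def __init__(self, alpha, beta, x0=0.0, v0=0.0, dt=1.0):
        self.alpha = alpha   # gain applied to the value estimate
        self.beta = beta     # gain applied to the rate estimate
        self.x = x0          # current value estimate (e.g. RTT)
        self.v = v0          # current rate estimate (change per sample)
        self.dt = dt         # sample spacing

    def update(self, z):
        """Fold in one measurement z and return the new (x, v)."""
        x_pred = self.x + self.dt * self.v      # predict ahead one step
        residual = z - x_pred                   # prediction error
        self.x = x_pred + self.alpha * residual
        self.v = self.v + (self.beta / self.dt) * residual
        return self.x, self.v


# Hypothetical usage: track a steadily rising RTT (values are made up).
tracker = AlphaBetaTracker(alpha=0.5, beta=0.1, x0=100.0)
for i in range(1, 201):
    x, v = tracker.update(100.0 + 2.0 * i)
print(x, v)
```

  Since alpha and beta never change, the same gains apply whether the
samples are noisy or clean, which is the limitation described above.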

  A Kalman filter seems ideal for this problem.  It's simple,
computationally efficient (6 parameters instead of the current 1
but all integer arithmetic) and always has the maximum gain that
the data supports.  Also, if the underlying model is at all close
to reality, it's self-tuning so network administrators don't have
to be mathematicians.  I was just going to start on a
Box-Jenkins style gateway model identification when the money
ran out.  I keep intending to work on this in my copious free
time but ....
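
  To make the variable-gain idea concrete, here is a minimal one-state
Kalman filter.  It is only a sketch under stated assumptions: it is
not the 6-parameter integer filter mentioned above, it models the RTT
as a random walk, it uses floating point, and the caller supplies the
measurement noise variance r with each sample.  The class name and all
numeric values are hypothetical.

```python
class ScalarKalman:
    """One-state Kalman filter for a random-walk model:
    x[k] = x[k-1] + process noise (variance q),
    z[k] = x[k]   + measurement noise (variance r)."""

    def __init__(self, x0, p0, q):
        self.x = x0   # state estimate
        self.p = p0   # variance of the state estimate
        self.q = q    # process noise variance

    def update(self, z, r):
        """Fold in measurement z with noise variance r.
        Returns the new estimate and the gain that was used."""
        self.p += self.q                 # predict: uncertainty grows
        k = self.p / (self.p + r)        # gain: large when r is small
        self.x += k * (z - self.x)       # correct toward the measurement
        self.p *= (1.0 - k)              # uncertainty shrinks after update
        return self.x, k


# Hypothetical usage: same sample, noisy versus quiet measurement.
noisy = ScalarKalman(x0=100.0, p0=25.0, q=1.0)
quiet = ScalarKalman(x0=100.0, p0=25.0, q=1.0)
print(noisy.update(110.0, r=100.0))
print(quiet.update(110.0, r=4.0))
```

  The gain is computed from the variances rather than fixed in
advance, so the filter leans harder on each sample when the
measurement noise is small.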

  - Van

ps- perhaps we should take this discussion off line?  I've been
    filling up people's mailboxes lately and I get the impression
    that there are only two or three of us interested in this topic.