Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!uwm.edu!rpi!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!haven!mimsy!nems!dtix!curt
From: curt@dtix.dt.navy.mil (Welch)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: *.jhuapl.edu -- serious gateway thrashing
Message-ID: <1110@nems.dt.navy.mil>
Date: 27 Feb 90 00:19:20 GMT
References: <4790@aplcen.apl.jhu.edu>
Sender: news@nems.dt.navy.mil
Reply-To: curt@dtix.dt.navy.mil (Curt Welch)
Distribution: usa
Organization: David Taylor Research Center, Bethesda, MD
Lines: 73

In article <4790@aplcen.apl.jhu.edu> trn@aplcen.apl.jhu.edu (Tony Nardo) writes:
>For the past few weeks, links between the *.jhuapl.edu nodes and the
>non-MILNET community have been somewhat unstable.  Today, however, is
>the first time that I've seen an extreme case of gateway thrashing:
>
>warper.110% traceroute uunet.uu.net
>traceroute to uunet.uu.net (192.48.96.2), 30 hops max, 40 byte packets
> 1  apl-b3-gw (128.244.3.1)  0 ms  10 ms  0 ms
> 2  apl-gw (128.244.1.1)  0 ms  10 ms  0 ms
> 3  RESTON-DCEC-MB.DDN.MIL (26.21.0.104)  290 ms MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  320 ms  330 ms

  [multiple MB hops deleted]

>12  * CAMBRIDGE-MB.DDN.MIL (10.3.0.5)  3920 ms *
>etc.
>
>Does anyone have any insights as to how this thrashing starts?  How it
>may be stopped?

We have been seeing this same problem for weeks.  One minute,
traceroute shows a normal route off of the MILNET through one of the
mail bridges, and the next minute we see traceroute output like the
example above.  Our packets are being passed around between the mail
bridges, but they never leave the MILNET/ARPANET.  Whenever this
gateway thrashing starts, it lasts long enough to break TCP
connections.
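
For anyone who wants to watch for this automatically, here is a rough
sketch (in Python) of how the symptom could be picked out of traceroute
output.  traceroute sends three probes per hop by default, so a hop
line that names more than one gateway means the route changed between
probes.  The function name thrashing_hops and the regular expressions
are made up for this example, and it assumes that wrapped lines (like
hop 3 above) have been joined back into one line first.

```python
import re

# "name (a.b.c.d)" gateway entries as printed by traceroute.
GATEWAY = re.compile(r"([\w.-]+) \((\d+\.\d+\.\d+\.\d+)\)")
# A hop line: hop number followed by the probe results.
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def thrashing_hops(lines):
    """Return {hop: [addresses]} for hops answered by more than one gateway."""
    result = {}
    for line in lines:
        line = line.lstrip("> ")        # tolerate news-style quoting
        m = HOP.match(line)
        if not m:
            continue                    # banner, "etc.", blank lines
        addrs = []
        for _name, addr in GATEWAY.findall(m.group(2)):
            if addr not in addrs:
                addrs.append(addr)
        if len(addrs) > 1:              # route changed between probes
            result[int(m.group(1))] = addrs
    return result


sample = [
    " 1  apl-b3-gw (128.244.3.1)  0 ms  10 ms  0 ms",
    " 2  apl-gw (128.244.1.1)  0 ms  10 ms  0 ms",
    " 3  RESTON-DCEC-MB.DDN.MIL (26.21.0.104)  290 ms "
    "MARINA-DEL-REY-MB.DDN.MIL (26.6.0.103)  320 ms  330 ms",
]
print(thrashing_hops(sample))   # {3: ['26.21.0.104', '26.6.0.103']}
```

A single flagged hop only shows that the route flapped once; it is
flagged hops on run after run that match what we are seeing.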

It has gotten so bad in the last week that it almost stopped our news
feed.  The nntp connections, when they could get started, would only
last for about 3 to 5 minutes before being disconnected.

For weeks, ftp and telnet connections to anywhere off of the MILNET
have been terrible.  They would only last a few minutes before
disconnecting, and even when they were connected they were really too
slow to use.

For the past few months, ftp connections to non-MILNET sites have been
getting worse and worse.  I installed traceroute a month ago in an
effort to get a handle on these network problems.

When I first saw this problem, I assumed that some of the mail bridges
must be going down.  Now, I would guess that this problem is being
caused by too much traffic through the mail bridges.

Who runs the mail bridges and who can tell me what's going on?

What has been changing in the past few months that has caused this?

Has the traffic really been increasing or has the number of gateways
been decreasing?  Or is something much more complex causing this problem?

Why do the mail bridges bounce packets around like that?  Do they really
think that the best route is through the other bridge, or do they use
a packet routing algorithm that hands packets to another bridge
when the queue for the outbound link is full?
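
To make the second guess concrete, here is a toy model (Python again)
of a "hand it to the next bridge when my outbound queue is full" rule.
This is purely my guess written down as code, not a description of how
the mail bridges actually work; the function bounce_path, the
round-robin choice of next bridge, and the one-TTL-per-bridge
accounting are all assumptions made up for the example.

```python
def bounce_path(ttl, outbound_full, start=0):
    """Trace one packet under the hypothetical hand-off rule.

    outbound_full[i] is True when bridge i's link off the MILNET is
    congested.  Returns (bridges_visited, delivered).
    """
    n = len(outbound_full)
    path = []
    bridge = start
    while ttl > 0:
        path.append(bridge)
        ttl -= 1                        # each bridge costs one hop
        if not outbound_full[bridge]:
            return path, True           # packet leaves the MILNET here
        bridge = (bridge + 1) % n       # assumed: pass to the next bridge
    return path, False                  # TTL ran out inside the MILNET


print(bounce_path(30, [True, False, True]))   # ([0, 1], True)
print(bounce_path(6, [True, True, True]))     # ([0, 1, 2, 0, 1, 2], False)
```

If every outbound link is congested at once, a rule like this would
circulate the packet among the bridges until its TTL expires, which
would look a lot like the traceroute output above.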

Who do I need to contact to get this problem resolved?

Do we have to get a connection to NSFnet to get away from this problem?

Is there anything we could be doing wrong to cause this?

Is there anything we can change to get around this problem?

Thanks in advance for any help anyone can give us.
(while I can still talk to you...)

Curt Welch
curt@dtix.dt.navy.mil

P.S. Our gateway to the MILNET is through dtrc-b1-gw.dt.navy.mil,
     a cisco router, MILNET address 26.22.0.81.