Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!cbatt!cbosgd!ucbvax!XX.LCS.MIT.EDU!Lixia
From: Lixia@XX.LCS.MIT.EDU (Lixia Zhang)
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Why is the ARPANet in such bad shape these days?
Message-ID: <12242742311.36.LIXIA@XX.LCS.MIT.EDU>
Date: Mon, 29-Sep-86 04:32:05 EDT
Article-I.D.: XX.12242742311.36.LIXIA
Posted: Mon Sep 29 04:32:05 1986
Date-Received: Tue, 30-Sep-86 20:25:16 EDT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 81
Approved: tcp-ip@sri-nic.arpa

The following replies to two messages about Internet congestion.

    Date: Sat, 27 Sep 1986  21:35 EDT
    From: Rob Austein <SRA@XX.LCS.MIT.EDU>
    Subject: Why is the ARPANet in such bad shape these days?
    ......
    The NOC is referring to this mess as a "congestion problem" at the IMP
    level.  The current theory the last few times I talked to the NOC was
    that we have managed to reach the bandwidth limit of the existing
    hardware.  A somewhat scary thought...

Could someone from BBN provide measured network throughput numbers to
convince us that we indeed have hit the HARDWARE bandwidth limit?

					...If this is in fact the case (and
    there is circumstantial evidence that it is, such as the fact that the
    net becomes usable again during off hours), we are in for a long
    siege, since it is guaranteed to take the DCA and BBN a fair length of
    time to deploy any new hardware or bring up new trunks.

Better performance during off hours surely indicates that the problem is
network load-related, but does not necessarily mean that the DATA traffic
has hit the hardware limit -- there is a large percentage of non-data
traffic flowing in the net.  According to measurements on a number of
gateways during the week of 9/15-9/21 (and roughly the same at other times):
       43% of all received packets are addressed to a gateway
       48% of all sent packets originate at a gateway
Presumably these gateway-gateway packets are routing updates, ICMP redirects,
etc.  But why should they take such a high percentage of the total traffic?
Can someone explain this to us?
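To put those two percentages in perspective, here is a small sketch of the arithmetic.  It assumes a hypothetical per-packet log of addresses (destination of each received packet, source of each sent packet) and a known set of gateway addresses; the functions gateway_share and gateway_packets_per_data_packet are illustrative names, not part of any gateway's actual measurement code.  It also assumes, as presumed above, that every packet not addressed to a gateway is data.

```python
from collections import Counter


def gateway_share(addresses, gateway_addrs):
    """Fraction of packets whose address field names a gateway.

    `addresses` is a hypothetical per-packet log: the destination of each
    received packet (or the source of each sent packet). `gateway_addrs`
    is the set of addresses that belong to gateways.
    """
    counts = Counter(addr in gateway_addrs for addr in addresses)
    total = counts[True] + counts[False]
    return counts[True] / total if total else 0.0


def gateway_packets_per_data_packet(share):
    """Gateway-addressed packets carried for each remaining (data) packet.

    Assumes every packet not addressed to a gateway is a data packet.
    """
    if not 0.0 <= share < 1.0:
        raise ValueError("share must be in [0, 1)")
    return share / (1.0 - share)


# The figures quoted above: 43% of received, 48% of sent packets.
print(round(gateway_packets_per_data_packet(0.43), 2))  # 0.75
print(round(gateway_packets_per_data_packet(0.48), 2))  # 0.92
```

Under that assumption, the gateways handle roughly three gateway-addressed packets for every four data packets received, and nearly one for one on the sending side.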

Even for data packets, I wonder if anyone has an idea about how much extra
traffic is generated by the known extra-hop routing problem.  More on this
later.

    ALSO, IF THERE IS ANYBODY FROM BBN WHO KNOWS MORE ABOUT THE PROBLEM
    AND IS WILLING TO SHARE IT, -PLEASE- DO.  IT'S HARD TO MAKE ANY KIND
    OF CONTINGENCY PLANS IN A VACUUM.

    --Rob

I capitalized the sentence, hoping no one would pretend not to have seen it.


    Date: Sun, 28 Sep 86 04:48:39 edt
    From: hedrick@topaz.rutgers.edu (Charles Hedrick)
    Subject: odd routings

    I have been looking at our EGP routings.  I checked a few sites that I
    know we talk to a lot.  Our current EGP peers are yale-gw and
    css-ring-gw.  (We keep a list of possible peers, and the gateway picks
    2.  It will change if one of them becomes inaccessible.  This particular
    pair seems to be fairly stable.)  Here is what I found:
    ......
    MIT:  They seem to have 4 different networks.  The ones with direct
	  Arpanet gateways are 18 (using 10.0.0.77) and 128.52 (using
	  10.3.0.6).  EGP was telling us to use 10.3.0.27 (isi) and
	  10.2.0.37 (purdue) respectively...

This is probably caused by the EGP extra-hop problem: if the MIT gateways are
EGP neighbors of the isi and purdue gateways, all the other core gateways will
tell you to go through isi/purdue to get to MIT, even though everyone is on
the ARPANET.  This is likely contributing to the congestion as well.
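A minimal sketch of why the extra hop costs ARPANET capacity, using the MIT routes from Hedrick's message.  The routing tables here are hypothetical inputs, "rutgers-gw" is a made-up label for the sending gateway, and the model ignores everything except which gateway is advertised as next hop: if it is the core gateway rather than the stub gateway that actually attaches the network, the packet crosses the ARPANET twice.

```python
def arpanet_path(net, sender_gw, advertised_by, attached_via, core_is_next_hop=True):
    """Gateways a packet visits on the ARPANET on its way to `net`.

    advertised_by[net]: the core gateway that learned `net` over EGP.
    attached_via[net]: the stub gateway that actually connects `net`
    to the ARPANET. Both tables are hypothetical inputs.

    With core_is_next_hop=True (the extra-hop behavior), other gateways
    are told to send traffic for `net` to the core gateway, which must
    then forward it back across the ARPANET to the stub gateway.
    """
    stub = attached_via[net]
    next_hop = advertised_by[net] if core_is_next_hop else stub
    path = [sender_gw, next_hop]
    if next_hop != stub:
        path.append(stub)  # second trip across the ARPANET
    return path


def extra_crossings(nets, **kwargs):
    """Extra ARPANET traversals compared with going straight to the stub."""
    return sum(len(arpanet_path(n, **kwargs)) - 2 for n in nets)


# Routes as reported in Hedrick's message.
advertised_by = {"18": "10.3.0.27", "128.52": "10.2.0.37"}  # isi, purdue
attached_via = {"18": "10.0.0.77", "128.52": "10.3.0.6"}    # MIT gateways

print(arpanet_path("18", "rutgers-gw", advertised_by, attached_via))
# ['rutgers-gw', '10.3.0.27', '10.0.0.77']
```

Every packet for either MIT network thus makes one extra ARPANET crossing, so the data load the IMPs see toward such networks is up to twice what the end-to-end traffic alone would require.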

One question is: Can anyone tell us WHEN this extra-hop problem will be
completely eliminated?

Another question is how the stubs select core EGP neighbors.  If they all
concentrate on a small number of core gateways, bottlenecks will form:
because of the extra-hop problem, if a stub gateway EGP-neighbors with a core
gateway, most traffic to the stub is likely to travel through that core
gateway as well.  Hedrick listed their hard-coded candidate core EGP gateways
in his message.  Is the same list used by all non-core gateways?  Does anyone
know how many stub gateways EGP-neighbor with each core gateway?  Would
rebinding some stubs to different core gateways help relieve the congestion?
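A sketch of how a shared candidate list could concentrate the load.  Hedrick's message says only that the gateway keeps a list of possible peers, picks 2, and changes if one becomes inaccessible; the "first two reachable in list order" policy, the per-stub rotation, and all gateway names below are hypothetical, chosen only to illustrate the question.

```python
from collections import Counter


def pick_peers(candidates, reachable, k=2):
    """Take the first k reachable gateways from a stub's candidate list.

    Re-running this after a peer becomes unreachable models switching to
    the next candidate on the list (an assumed policy).
    """
    return [gw for gw in candidates if gw in reachable][:k]


def rotated(candidates, offset):
    """Same candidate list, starting at a per-stub offset."""
    i = offset % len(candidates)
    return candidates[i:] + candidates[:i]


def core_load(stub_lists, reachable, k=2):
    """Number of stub gateways that EGP-neighbor with each core gateway."""
    load = Counter()
    for candidates in stub_lists.values():
        load.update(pick_peers(candidates, reachable, k))
    return load


cores = ["core-a", "core-b", "core-c", "core-d"]  # hypothetical names
stubs = ["stub-1", "stub-2", "stub-3", "stub-4"]
up = set(cores)

same_list = {s: cores for s in stubs}
spread = {s: rotated(cores, i) for i, s in enumerate(stubs)}

print(core_load(same_list, up))  # Counter({'core-a': 4, 'core-b': 4})
print(core_load(spread, up))     # each core gateway gets 2 stubs
```

With an identical list, every stub binds to the same two core gateways, and through the extra-hop problem so does most of their incoming traffic; starting each stub at a different point in the list is one simple way of spreading that load.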

In short, reducing network overhead and fixing some long-standing protocol
problems may help relieve the current poor network performance.

Lixia
-------