Path: utzoo!attcan!uunet!lll-winken!ncis.llnl.gov!helios.ee.lbl.gov!pasteur!ucbvax!BRL.MIL!reschly
From: reschly@BRL.MIL ("Robert J. Reschly Jr.")
Newsgroups: comp.protocols.tcp-ip
Subject: Re: Odd FTP Problem
Message-ID: <8902240120.aa16481@SEM.BRL.MIL>
Date: 24 Feb 89 06:20:27 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 494

Mark,

A couple of weeks ago BRL noted severe difficulties with connectivity. We were able to trace this to ICMP Network Unreachables (we're a BSD shop), which appeared to be the result of core route "flopping". At the end of this message, I'll tack on the message I sent to BBN on the subject. The raw data file mentioned in that message is still available if anyone is masochistic enough to want to look at it.

When we spoke to BBN the afternoon before sending that message, they told us they had identified a problem with routing, and a message received the next afternoon confirmed that a fix to the problem alluded to in the phone conversation would be fielded in the next few days. By mid-week the following week, connectivity did indeed appear to be better than previously.

Since this last change, we still see more routing variability than we feel should be present, though it does look better than before the change.

One curious thing: has anyone else noticed the EGP peers bouncing in and out? We peer with BMILDCEC, BMILBBN, and BMILMTR in that order (though we only exchange updates with one at any given time), and we are continually having to re-acquire one or more of these beasties (as I write this, the gateway is trying to acquire BMILMTR). Have these gateways been bouncing up and down a lot?

We have also started looking at the EGP information we are getting a little more closely, and have seen hopcounts as high as 62(!).

In the last few days, our PSN insufficient resource (type 4) messages are haunting us again.
We had earlier reported these and BBN reconfigured our PSN with more space allocated to buffers to lessen the severity of that problem. I suppose we'll have to complain about this again.

Has anyone else noted any interesting behavior since the change?

Later,
    Bob
--------
Phone: (301)278-6678   AV: 298-6678   FTS: 939-6678
Arpa: reschly@BRL.MIL (or BRL.ARPA)   UUCP: ...!brl-smoke!reschly
Postal: Robert J. Reschly Jr.
        U.S. Army Ballistic Research Laboratory
        Systems Engineering and Concepts Analysis Division
        Advanced Computer Systems Team
        ATTN: SLCBR-SE (Reschly)
        APG, MD  21005-5066             (Hey, *I* don't make 'em up!)

**** For a good time, call: (303) 499-7111.  Seriously! ****

================

Date: Thu, 9 Feb 89 5:51:55 EST
From: "Robert J. Reschly Jr."
To: meason@wash.bbn.com, amalis@bbn.com
cc: jcst@BRL.MIL
Subject: More Node 29 troubles.

Mike,

Here is a summary of our recent experience and a copy of Phil's message.

First, the incompletes are still with us, though they appear to be at the reduced level we noted after the PSN buffer configuration changes. The only note here is that these messages are still coming in at a much greater rate than before our switching to EGP peering with the Buttergates. We are currently seeing these 5 to 10 (on average) times an hour, rather than 5 to 10 times a day.

Second, as Phil notes in the enclosed message, we have been suffering from what looks like significant routing instability since switching to EGP peering with the Buttergates. The variability in numbers of reported routes was noted as soon as we switched, but we did not notice any actual reachability problems until a while later. A typical sequence would be:

    Establish a connection (e.g. FTP, TELNET, rlogin); everything appears fine, connectivity is good and round trip times are reasonable.

    After a few minutes of operation, suddenly the connection freezes. The connection usually closes at this time.

    Attempt to restart the connection -- this usually fails.

    Wait a few minutes, then attempt to restart the connection. This usually succeeds as if there was never any problem.

At this point the cycle repeats. Running an experiment with ping shows that the loss of communication coincides with the receipt of ICMP Network Unreachable messages.

I ran a ping experiment against louie.udel.edu to see if I could duplicate and record the symptoms today. I'll include a summary from the first part of that at the end of this message, and will put the raw data (roughly 1.3MB collected over 4 hours between 1800 EST and 2200 EST 8 Feb 1989) in the public FTP area of vgr.brl.mil. Note that since this is a script of a terminal session, there are a few control characters and escape sequences buried in this file.

We currently EGP peer with the buttergates at DCEC and CAMBRIDGE as our primary and fallback. I have also made some changes to the gateway software to extract a bit more information but have nothing to present at this time.

The raw data is the composite of a 15 second timestamp loop, the ping, and the gateway console all smashed together and intertwingled. The ping generates the "xx bytes" messages as well as the verbose dumps of most other ICMP messages. Much of the gateway console output is prepended by ": ", though there are a few messages which are different (e.g. "ICMP redirect" and "UPTIME" messages). The gateway software is of local origin. If you have any questions about any of it, get in touch with us and we will clarify.

Finally, you will find a number of "milr: msg with link 27 from 4/48" followed by an equal number of "milr: pack len , format 15, illen " messages. The values range over a small set for each. We only started noticing these today, but had not been closely watching the gateway for the few days prior to today. The "link" parameter is the link type from the IMP leader -- we are 1822 connected.

I hope this stuff helps.
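
For anyone picking through the raw data described above, the three interleaved streams can be pulled apart with a short script. What follows is only a minimal sketch in Python, and it rests on assumptions: that timestamps look like `date` output, that ping replies look like the "xx bytes from" lines, and that gateway console lines start with ": ". The names `classify` and `split_streams` and the sample gateway line are hypothetical. Gateway messages without the ": " prefix, and any lines with control characters or escape sequences, simply land in "other".

```python
import re

# Ping reply lines, e.g. "64 bytes from 128.175.1.3: icmp_seq=95 time=433 ms"
PING_RE = re.compile(r"^(\d+) bytes from (\S+): icmp_seq=(\d+) time=(\d+) ms")

# Timestamp loop lines, assumed to be date(1) style: "Wed Feb  8 18:14:15 EST 1989"
STAMP_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} [A-Z]{3} \d{4}$"
)


def classify(line):
    """Label one line of the session script as ping, timestamp, gateway, or other."""
    line = line.rstrip("\r\n")
    if PING_RE.match(line):
        return "ping"
    if STAMP_RE.match(line):
        return "timestamp"
    if line.startswith(": "):
        # Assumption: the ": " prefix marks gateway console output.
        return "gateway"
    return "other"


def split_streams(lines):
    """Group lines by stream, preserving their original order within each stream."""
    streams = {"ping": [], "timestamp": [], "gateway": [], "other": []}
    for line in lines:
        streams[classify(line)].append(line.rstrip("\r\n"))
    return streams


if __name__ == "__main__":
    sample = [
        "Wed Feb  8 18:14:15 EST 1989",
        "64 bytes from 128.175.1.3: icmp_seq=95 time=433 ms",
        ": hypothetical gateway console line",  # invented for illustration only
        "milr: msg with link 27 from 4/48",
    ]
    for kind, items in split_streams(sample).items():
        print(kind, len(items))
```

Keeping an explicit "other" bucket means nothing is silently dropped, so the unprefixed gateway messages can still be reviewed by hand.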

Later,
    Bob
--------
Phone: (301)278-6678   AV: 298-6678   FTS: 939-6678
Arpa: reschly@BRL.MIL (or BRL.ARPA)   UUCP: ...!brl-smoke!reschly
Postal: Robert J. Reschly Jr.
        U.S. Army Ballistic Research Laboratory
        Systems Engineering and Concepts Analysis Division
        Advanced Computer Systems Team
        ATTN: SLCBR-SE (Reschly)
        APG, MD  21005-5066             (Hey, *I* don't make 'em up!)

**** For a good time, call: (303) 499-7111.  Seriously! ****

----- Forwarded message # 1:

Received: from smoke.brl.mil by SEM.BRL.MIL id aa07207; 2 Feb 89 7:56 EST
Received: from SMOKE.BRL.MIL by SMOKE.BRL.MIL id aa12789; 2 Feb 89 7:52 EST
Received: from SRI-NIC.ARPA by SMOKE.BRL.MIL id aa12653; 2 Feb 89 7:45 EST
Received: from vgr.brl.mil by SRI-NIC.ARPA with TCP; Thu, 2 Feb 89 01:47:18 PST
Date: Thu, 2 Feb 89 4:41:04 EST
From: Phil Dykstra
To: tcp-ip@sri-nic.arpa
Subject: Instability in the Core
Message-ID: <8902020441.aa16937@VGR.BRL.MIL>

Tonight I was trying to talk to some machines on XEROX-NET (net 13), and once again was hit with oscillating Net-Up/Net-Unreachable. This has been happening to me for the past several days for net 13 as well as several other nets (FYI, I'm 26.2.0.29). We have been getting EGP info from the RESTON-DCEC Butterfly (26.21.0.104).

I started watching tonight to see why these routes kept appearing and disappearing and found major unrest in the routing information we were getting. Here are nine consecutive EGP routing updates (taken at three minute intervals). They span 0400 EST.

    Int  Ext  Routes   (~A    B    C)
      5   95     479
      6   85     536
      5   95     401
      6   86     598     17  333  263
      6   84     507     15  266  241
      5   94     456      8  270  193
      6   91     599     16  335  263
      4   93     453      8  266  194
      6   87     580     17  321  257

The fields are number of internal and external EGP gateways, total number of routes, and the approximate number of class A, B, and C (approx because this includes a few of our fixed routes). I have complete EGP dumps for the last six updates if anyone wishes to study the changes.
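
The swing in route totals from one update to the next can be checked with a few lines of arithmetic. This is a minimal sketch in Python. It assumes the third figure of each of the nine updates is the total route count, and the name `percent_changes` is introduced here for illustration.

```python
# Total route counts from the nine consecutive EGP updates quoted above
# (assumption: the third figure of each update is the route total).
ROUTES = [479, 536, 401, 598, 507, 456, 599, 453, 580]


def percent_changes(totals):
    """Return the signed percent change between each pair of successive totals."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(totals, totals[1:])]


if __name__ == "__main__":
    for prev, cur, pct in zip(ROUTES, ROUTES[1:], percent_changes(ROUTES)):
        print(f"{prev:4d} -> {cur:4d}  {pct:+6.1f}%")
```

Each step is measured against the previous update rather than a fixed baseline, since the point of interest is how much the table moves in a single three minute interval.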

It really bothers me that the number of class A networks could double/halve every three minutes! There is also a 10% to 50% change in the total number of routes every three minutes. One wouldn't expect the number of internal EGP gateways to change so fast either [though the LSI-11's used to flop like that too]. It is nearly impossible to get data through when the routes come and go this fast.

I realize that the Butterfly folks are probably working on this, but I wasn't sure everyone was aware how bad things are right now (I recall one other TCP-IP note about it). Is there anything we can do to help diagnose this?

- Phil
uunet!brl!phil

----- End of forwarded messages

================

Script started on Wed Feb 8 18:11:57 1989
PING louie.udel.edu (128.175.1.3): 56 data bytes
64 bytes from 128.175.1.3: icmp_seq=0 time=466 ms
... through ...
64 bytes from 128.175.1.3: icmp_seq=95 time=433 ms
64 bytes from 128.175.1.3: icmp_seq=96 time=981 ms
Wed Feb 8 18:14:15 EST 1989
64 bytes from 128.175.1.3: icmp_seq=96 time=1948 ms <<