Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!mit-eddie!genrad!decvax!ucbvax!CCQ.BBN.COM!pogran
From: pogran@CCQ.BBN.COM (Ken Pogran)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: NSFNET woe: causes and consequences
Message-ID: <8710041724.AA27780@ucbvax.Berkeley.EDU>
Date: Sun, 4-Oct-87 13:10:59 EDT
Article-I.D.: ucbvax.8710041724.AA27780
Posted: Sun Oct 4 13:10:59 1987
Date-Received: Wed, 7-Oct-87 06:25:19 EDT
Sender: daemon@ucbvax.BERKELEY.EDU
Distribution: world
Organization: The ARPA Internet
Lines: 130

Dave,

The message you sent to the tcp-ip list the other day regarding the NSFNET woes you observed caused us here at BBN to put on our thinking caps. We worked to understand how what you saw relates to what we know about what's happening in the ARPANET these days. I think we already understood what was behind a good bit of what you observed, and your message gave us the impetus to investigate a few more things as well. This message describes the situation as we understand it.

There are four separate underlying issues:

1. The number of "reachable networks" in the Internet has just nudged upwards of 300 for the first time. (The Internet used to grow at a rate of about 10 networks/month; that rate has accelerated over the past few months.)

2. For the week ending Thursday, 1 October, the ARPANET handled a record 202 million packets. (Weekly traffic over the past few months has been in the 180-million range -- itself a record over last spring.)

3. We've begun the "beta test" on the ARPANET of the new PSN software release, PSN 7.0, and -- sure enough -- there have been a few problems.

4. And, finally, the limit you described in your message: 64 virtual circuits in the ACC 5250 X.25 driver that is used by several X.25-connected gateways on the ARPANET.

The first two issues just demonstrate that things continue to get busier and busier in the ARPANET and in the Internet.
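
The weekly packet counts in issue 2 are easier to picture as rates. The Python sketch below is added only for illustration (the names are hypothetical, not anything from BBN's tools); it converts packets per week into an average rate, assuming traffic were spread evenly over the 604,800 seconds in a week. Real traffic is bursty, so busy-hour rates would be higher than these averages.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_packets_per_second(packets_per_week: int) -> float:
    """Average rate, assuming packets are spread evenly across the week."""
    return packets_per_week / SECONDS_PER_WEEK


RECORD_WEEK = 202_000_000   # week ending Thursday, 1 October (issue 2)
TYPICAL_WEEK = 180_000_000  # low end of the recent weeks "in the 180s" (millions)

if __name__ == "__main__":
    for label, packets in [("record week", RECORD_WEEK), ("typical week", TYPICAL_WEEK)]:
        print(f"{label}: {average_packets_per_second(packets):.0f} packets/s")
```

For the record week this comes to roughly 334 packets per second across the whole ARPANET, on average, versus roughly 298 for a 180-million-packet week.
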
We've put out a new version of the LSI-11 "core gateway" software that allows for 400, rather than 300, reachable networks, to give the core some breathing room again. And I shudder to think what ARPANET (and, hence, Internet) performance would be like if we tried to handle over 200 million packets per week without the so-called "Routing Patch," installed late in the summer, which considerably improved the performance of the ARPANET routing algorithm.

I think the third issue, the beginning of the PSN 7.0 beta test on the ARPANET, contributed to some of what you saw and helped to obscure some of the other causes of what you observed. As you know, last weekend we put PSN 7 into a portion of the ARPANET. CMU was one of the nodes that got PSN 7.

PSN 7 contains a new "End-to-End" protocol for managing the flow of data between source PSNs and destination PSNs. It's the first re-do of the End-to-End protocol in the ARPANET EVER. We're expecting a substantial improvement in efficiency within the PSN and, hence, some improvement in network performance. To make a graceful, phased cutover to the New End-to-End feasible, PSN 7.0 contains code for both the new and the old End-to-End protocols. So as we've introduced PSN 7.0, it's been running the OLD End-to-End protocol.

Now, unfortunately, having code for two End-to-End protocols coresident takes up memory space that would normally go to buffers, etc., for handling traffic. So, yes -- during the 3-4 week phased cutover, the ARPANET PSNs will be a little short on buffer space; there's not much that can be done about that. But once ALL nodes are cut over to the New End-to-End protocol, we will install PSN 7.1, which will remove the old End-to-End, reclaim that memory space, and -- in the ARPANET nodes where C/300 processors have replaced the C/30s -- be able to use DOUBLE the main memory.

Back to the problem at hand: you mentioned the report of "resource shortages" in the PSNs.
This happened with the CMU PSN for reasons we still don't understand. However, this WASN'T "the usual BBN euphemism for ... connection blocks which manage ARPANET virtual circuits" that you suggested in your message -- we've usually got plenty of those these days. The resource shortage the CMU PSN reported to the NOC had to do with the PSN's X.25 interface. Since several higher-priority problems showed up with PSN 7, we decided the best thing to do was to return the CMU node to PSN 6 and work on this one later. We have some preliminary ideas of what might have happened, and we'll be investigating this week.

As for delays in the ARPANET: it turns out that the version of PSN 7.0 deployed last weekend contained a bug in the "Routing Patch" that worsened, instead of improved, the performance of the routing algorithm. We are frankly embarrassed about that. This problem was fixed Thursday night, 1 October -- about the time you sent your message. We'd be very interested in hearing from you how things looked from the NSFNET side THIS weekend.

From your description it certainly sounds like the 64-VC limit in the ACC 5250 is the proximate cause of the problem at CMU last weekend. We now count 83 gateways attached to the ARPANET. A gateway on the ARPANET that's handling a lot of diverse traffic to other gateways as well as to other ARPANET hosts is very likely to need more than 64 VCs.

We think we can provide a work-around for this problem over the short term. The PSN has an "idle timer" for each VC, and can initiate a Close of the VC if it hasn't been used for a while. We can configure that timer to be pretty short and thus recycle the gateway's VCs. Of course, some overhead will be incurred to re-establish a VC to send the next IP datagram to that destination, but that's probably preferable to having things plug up for lack of VCs. Note that by having the PSN reclaim only idle VCs, we shouldn't see much of the "loss of data" that you alluded to in your message.
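
The idle-timer work-around can be illustrated with a toy model. The Python sketch below is hypothetical throughout: `VCTable`, `send`, and `simulate` are invented names, the one-datagram-per-second round-robin traffic is made up, and the 30-second timeout is an arbitrary stand-in for "pretty short"; none of it reflects actual PSN or ACC 5250 code. It models a gateway with a 64-VC table sending to 83 destinations (the ARPANET gateway count mentioned above). Without an idle timer, the destinations beyond the 64th are refused indefinitely; with one, idle VCs are closed and their slots reused, at the cost of re-establishing a VC for the next datagram to a reclaimed destination.

```python
MAX_VCS = 64  # VC limit of the ACC 5250 X.25 driver, per the discussion above


class VCTable:
    """Toy (hypothetical) model of a gateway's table of open X.25 VCs.

    Maps each destination to the time it was last used. If idle_timeout is
    set, VCs unused for longer than that are closed, standing in for the
    PSN's per-VC idle timer.
    """

    def __init__(self, max_vcs=MAX_VCS, idle_timeout=None):
        self.max_vcs = max_vcs
        self.idle_timeout = idle_timeout
        self.last_used = {}  # destination -> time of last datagram
        self.opens = 0       # VCs established (each costs call setup overhead)
        self.refused = 0     # datagrams that could not get a VC

    def _close_idle(self, now):
        """Close every VC idle for longer than idle_timeout (if enabled)."""
        if self.idle_timeout is None:
            return
        for dest, last in list(self.last_used.items()):
            if now - last > self.idle_timeout:
                del self.last_used[dest]

    def send(self, dest, now):
        """Try to send one IP datagram to dest at time now; True if sent."""
        self._close_idle(now)
        if dest not in self.last_used:
            if len(self.last_used) >= self.max_vcs:
                self.refused += 1  # table full: traffic "plugs up"
                return False
            self.opens += 1        # establish (or re-establish) a VC
        self.last_used[dest] = now
        return True


def simulate(idle_timeout, n_dests=83, seconds=830):
    """Round-robin one datagram per second across n_dests destinations."""
    table = VCTable(idle_timeout=idle_timeout)
    for t in range(seconds):
        table.send(f"gw{t % n_dests}", now=t)
    return table


if __name__ == "__main__":
    for timeout in (None, 30):
        result = simulate(timeout)
        print(f"idle_timeout={timeout}: opens={result.opens}, refused={result.refused}")
```

With the timer disabled, this run opens only 64 VCs and refuses 190 datagrams (19 unreachable destinations, 10 times each). With the 30-second timer it refuses none, but because each destination is revisited only every 83 seconds, every one of the 830 datagrams triggers a VC setup: that is the re-establishment overhead traded for not plugging up. Busier destinations, revisited within the timeout, would keep their VCs open.
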
We would be happy to work with administrators at sites that have gateways with ACC 5250s who would like to try this out.

In closing, let me say that we at BBN share your concerns about the issues to be faced as the ARPANET evolves toward a gateway-to-gateway service from its traditional host-to-host or host-to-gateway service. The way gateways are attached to the network is one of a number of urgent architectural and engineering issues that must be addressed.

Regards,
Ken Pogran
Manager, System Architecture
BBN Communications Corporation

P.S. TO THE COMMUNITY: As the PSN 7.0 upgrade proceeds in the ARPANET, we'll probably encounter a few more problems. As described in the DDN Management Bulletin distributed earlier, please send reports of problems to ARPAUPGRADE@BBN.COM. BBN will respond.