Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site ucbvax.BERKELEY.EDU
Path: utzoo!decvax!decwrl!ucbvax!tcp-ip
From: mike@BRL.ARPA (Mike Muuss)
Newsgroups: mod.protocols.tcp-ip
Subject: Gateway Slots
Message-ID: <8512120559.AA10948@ucbvax.berkeley.edu>
Date: Wed, 11-Dec-85 21:08:43 EST
Article-I.D.: ucbvax.8512120559.AA10948
Posted: Wed Dec 11 21:08:43 1985
Date-Received: Fri, 13-Dec-85 02:13:55 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 124
Approved: tcp-ip@sri-nic.arpa

Sirs -

I am writing this letter to bring to your attention a serious operational problem with the CORE gateway system which provides routing connectivity between the ARPANET, MILNET, SATNET, and all LANs within the InterNet system.

Briefly stated, the problem is that the current core gateway software only has room for a fixed number of routes between networks, currently about 100. (I'll call these routing table entries "slots"). Within the past few weeks, the number of networks (mostly LANs) connected to the InterNet system has exceeded the number of slots, resulting in a shortage of slots. Attempts to provide routing information to the core system are processed only as slots become available -- on a first-come, first-served basis. Some gateway somewhere has to crash to relinquish a slot for another gateway to gain connectivity.

MAJOR FAILURE IN OPERATIONAL SYSTEM.

This past weekend, due to an extensive power outage, both of BRL's gateways were down, relinquishing the slots we had been using. BRL's IMP resumed operating Sunday night, and BRL's 2 Gateways resumed operation Monday morning, but BRL was completely without network connectivity throughout the day Monday as we waited for slots to become available within the core gateway system.
Lack of slots prevented any access to or from the MILNET, blocking mail flow between BRL and AMC-HQ, USNA, ARDC, WSMR, and the numerous other hosts we do regular business with. Fortunately, other gateways went down through the day, and by Monday evening BRL had reacquired routing slots.

A one-day network outage was no disaster, and we survived. However, if we lose network connectivity for a day or more every time our gateways or IMP go down, BRL has a major operational problem.

Unless corrective action is taken, this problem will steadily become worse, because more and more MILNET sites will be operating attached LANs, and traffic is shifting from directly attached hosts to LAN-attached hosts. BRL feels the effect of this problem more keenly because BRL hosts are exclusively LAN-attached. However, all LAN-attached hosts within the InterNet system are affected by this problem!

This problem was also encountered a few months ago, and BBN responded promptly by increasing the number of slots to the current limit. BBN is aware of the current problem, and is investigating solutions. However, they may not be able to increase the table sizes this time, due to limited memory in the core gateways.

The medium-term solution to this problem would be to replace all LSI-11/03 core gateways with LSI-11/23 gateways, which have 4 times as much memory. I am under the impression that BBN has already developed software which takes advantage of the extra memory in the 11/23.

The long-term solution is, of course, to replace the core gateway system with Butterfly gateways, but that is a long time away.

SHORT-TERM SOLUTION NEEDED.

There are several options.

1) Take administrative action. Insist that the most recent N new networks connected to the InterNet system immediately disconnect, until the number of available slots can be increased.

2) Provide a technological response. Instituting emergency measures, rapidly replace the core gateway system with 11/23 systems.
   2a) Have BBN immediately upgrade all 11/03 systems within the GGP core.

   2b) If BBN does not have the necessary equipment on hand, or en route, additional 11/23 systems could be borrowed. For example, BRL has an 11/23 system which is temporarily not being used. BRL would be willing to loan it to DCA on a short-term basis until BBN could procure the necessary 11/23 hardware. Certainly there are enough unused 11/23 systems throughout the combined Services that an immediate hardware solution could be implemented using loaned equipment.

3) Apply software magic, and increase the current table size without changing any hardware. This may be easy, but more likely it will be costly in time, costly in manpower, or simply impossible.

MEDIUM-TERM DISASTER AWAITS.

Even assuming that the current difficulty can be overcome, this problem will reappear soon in another form. Indeed, the second stage of this problem is almost upon us.

Here, the difficulty is again a growth limitation in the core gateway software. The core exchanges routing information between its gateways using GGP (Gateway-to-Gateway Protocol). There exists an upper limit on the length of a GGP packet, and GGP is currently defined so as to contain information about the total InterNet system in a single packet. Thus, when the number of gateways increases beyond the number that can fit in a GGP packet, we will again experience competition for "slots" -- this time GGP packet "slots".

Again, several solutions exist:

1) Administratively prohibit connecting more LANs than the GGP protocol can support.

2) Modify or extend the GGP protocol and the supporting core gateway software to ease or eliminate the current limits.

3) Replace the GGP protocol with something else (no finished design for a replacement exists yet, although it is being thought about).

   3a) Replace GGP within the existing 11/23 systems with the new protocol.

   3b) Replace all the 11/23 systems with Butterfly systems and the new protocol.
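As an illustration only, both limits described in this letter behave like a fixed-capacity table filled on a first-come, first-served basis. The sketch below is a toy model in Python, not the core gateway software; the class name SlotTable, its methods, and the network and gateway names are all hypothetical, and the capacity of 2 stands in for the real limit of about 100.

```python
class SlotTable:
    """Toy model of a fixed-size routing table (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.routes = {}  # network name -> gateway that advertised it

    def advertise(self, network, gateway):
        """Try to claim a slot for a network; return True if a route is held."""
        if network in self.routes:
            # Already holds a slot: just refresh the advertising gateway.
            self.routes[network] = gateway
            return True
        if len(self.routes) < self.capacity:
            self.routes[network] = gateway
            return True
        # Table full: the advertisement is ignored until a slot frees up.
        return False

    def gateway_down(self, gateway):
        """Release every slot held by a gateway that has stopped responding."""
        freed = [net for net, gw in self.routes.items() if gw == gateway]
        for net in freed:
            del self.routes[net]
        return freed


table = SlotTable(capacity=2)
print(table.advertise("net-a", "gw-1"))  # True
print(table.advertise("net-b", "gw-2"))  # True
print(table.advertise("net-c", "gw-3"))  # False: no slot available
print(table.gateway_down("gw-1"))        # ['net-a']
print(table.advertise("net-c", "gw-3"))  # True: slot acquired after gw-1 went down
```

In this model a newly connected network gains a route only after some other gateway goes down, which is the behavior BRL observed on Monday.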
Current plans for GGP replacement are being formed within BBN and the GADS Task Force (chaired by the able Dave Mills). I would like to suggest that the priority of this task be elevated, and that its funding be increased. Investing in an extra man-year now might give us a long-term solution to this problem before disaster strikes. (I might also point out that the GADS Task Force is presently operating with little or no funding.) Either GADS or BBN must get switched into "high gear" to solve this problem.

SUMMARY.

The "lack of slots" problem is upon us. Serious operational failures have already been experienced, and the problem will be getting worse. A short-term solution is needed. Several options are available, none expensive.

Worse, a secondary form of the problem will strike soon, even if we weather the current storm. Solutions can be found, but all will require effort and money. Spending money takes time, so we need to worry now.

Sincerely,
Mike Muuss
Leader, Advanced Computer Systems Team
U. S. Army Ballistic Research Lab