Path: utzoo!attcan!uunet!lll-winken!decwrl!ucbvax!MANTA.NOSC.MIL!ron
From: ron@MANTA.NOSC.MIL (Ron Broersma)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: *.jhuapl.edu -- serious gateway thrashing
Message-ID: <9003020608.AA10396@manta.nosc.mil>
Date: 2 Mar 90 06:08:31 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 33

I'm wondering if some of this gateway thrashing is related to the fact that EGP packets from the MILNET core started exceeding 4096 bytes a month or two ago. At that time, I was tracing some thrashing problems and noticed the following symptoms.

Over the course of an hour, the packets would gradually increase in size. Just as they got within 10 to 20 bytes of 4096, many of the EGP implementations would suddenly start getting checksum errors or buffer overflows because they had 4K buffers. The ones that got a few packets with bad checksums would stop peering with that core gateway. Then, all of a sudden, the EGP packets out of the core would shrink by a few hundred bytes, because many peers had dropped out and the networks they announced went with them. As the EGP players each tried to acquire a different gateway, they would see no checksum errors for an hour or so, until the packets again approached 4096 bytes and they performed this dance-of-the-gateways all over again.

The message here is to make sure your EGP implementation can handle packets larger than 4K bytes. The most recent gated supports 8K packets, as I recall.

Something I had considered was making a list of all the networks that disappeared from the EGP packets right after the "dance". If you could then determine who announced those nets to the core, you would have a handle on where the broken EGP implementations are located.

There's some other strangeness going on too. We had a case this week where one site running EGP was announcing its network to the core, but the core wasn't telling anybody else about it. When the site peered with a different mailbridge, it started working. Strange.
And to top it off, the ground started shaking yesterday. But we think that is an unrelated (hardware) problem.

--Ron
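The 4K-buffer failure described above, where an update only slightly larger than a 4096-byte receive buffer shows up as a checksum error, can be illustrated with a small Python sketch. It assumes a simplified, hypothetical message layout (a 2-byte checksum followed by the payload) rather than the actual EGP header format, uses a 16-bit one's complement checksum computed RFC 1071 style, and models the receiver as silently truncating anything beyond its buffer. The names `make_message`, `verify`, and `receive` are illustrative only, and the sketch does not model the buffer-overflow variant of the failure.

```python
import struct

# Hypothetical receive-buffer size, matching the 4K buffers described in the post.
BUF_SIZE = 4096


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def make_message(payload: bytes) -> bytes:
    """Build a hypothetical message: 2-byte checksum followed by the payload."""
    csum = ones_complement_checksum(b"\x00\x00" + payload)
    return struct.pack("!H", csum) + payload


def verify(msg: bytes) -> bool:
    """A message with a correct checksum field yields zero when re-checksummed."""
    return ones_complement_checksum(msg) == 0


def receive(msg: bytes, buf_size: int = BUF_SIZE) -> bytes:
    """Model a receiver that keeps only what fits in its fixed buffer."""
    return msg[:buf_size]


if __name__ == "__main__":
    # A 4100-byte update: just over the 4096-byte buffer.
    update = make_message(bytes(i % 251 for i in range(4098)))
    print(len(update), verify(update))    # 4100 True
    print(verify(receive(update)))        # False: the tail was truncated
    print(verify(receive(update, 8192)))  # True with an 8K buffer
```

Because the checksum covers the whole message, losing even the last few bytes almost always makes verification fail. In this model the receiver cannot tell "corrupted" from "too big for my buffer", which matches the symptom of peers suddenly reporting checksum errors just as the updates crossed 4096 bytes.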