Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!ames!ucbcad!ucbvax!SUN.COM!nowicki
From: nowicki@SUN.COM.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Congestion
Message-ID: <8702112124.AA00479@rose.sun.com>
Date: Wed, 11-Feb-87 16:24:18 EST
Article-I.D.: rose.8702112124.AA00479
Posted: Wed Feb 11 16:24:18 1987
Date-Received: Fri, 13-Feb-87 22:14:30 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 35
Approved: tcp-ip@sri-nic.arpa

I am not sure which is the right group for this discussion, but the
recent congestion problems have brought up two important points.

First, the MX record support from Berkeley for sendmail does not do any
caching. Perhaps they thought the local name server would cache, but it
cannot when the desired name server is down. For example, last week
Decwrl.DEC.COM was essentially unreachable from the Arpanet. The
DEC.COM name servers are either on the other side of Decwrl (128.45),
or behind other unreliable gateways (net 36). Thus mail started to pile
up, and we quickly had hundreds of messages sitting in the queue. Each
run through the queue did hundreds of MX lookups, each of which had to
time out. I extended our simple cache (which already remembered whether
hosts are up or down) to cache the result of the MX request (especially
if the request timed out). This got the queue flowing again.

Second, there seems to be a bug in the HDH code of the PSNs (aka IMPs).
During periods of congestion, the HDLC layer blocks us from sending
back the "Host Up" messages that are required in HDH. The PSN then
declares us to be down, clears its buffers, then immediately hears the
Host Up message and declares us to be back up. This happens every few
minutes during the day. Not only does throwing the buffered data away
increase congestion in the short term by causing more retransmissions,
there are also higher-level instabilities. If a host tries to send us a
TCP segment or ACK during the time that the IMP thinks we are down, it
gets a "Host Dead" message and resets the TCP connection, which means
the entire mail message has to be retransmitted. This just makes
matters worse.

I have tried to contact BBN about the second problem, since it is a bug
in their software, but I keep getting the run-around. The NOC people
just say "must be congestion". I KNOW it is congestion, but it still is
a bug! Does anyone at BBN read these lists?

	-- Bill Nowicki
	   Sun Microsystems
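
For anyone wanting to try the same fix for the first problem, here is a
rough sketch of the idea of caching MX results, including lookups that
timed out. It is not the code we actually run in sendmail. The names
MXCache, LookupTimeout, and resolve, and both lifetimes, are assumptions
made up for illustration; resolve stands in for whatever routine really
queries the name server.

```python
import time


class LookupTimeout(Exception):
    """Raised by the (hypothetical) resolver when no name server answers."""


class MXCache:
    """Remember MX answers, and timeouts, for a limited time."""

    def __init__(self, resolve, ok_ttl=3600.0, fail_ttl=600.0,
                 clock=time.monotonic):
        self._resolve = resolve    # assumed callable: host -> list of MX hosts
        self._ok_ttl = ok_ttl      # assumed lifetime of a good answer, seconds
        self._fail_ttl = fail_ttl  # assumed lifetime of a remembered timeout
        self._clock = clock
        self._entries = {}         # host -> (expiry time, MX list or None)

    def lookup(self, host):
        """Return the MX list, or None if the lookup timed out recently."""
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no query is sent
        try:
            result = self._resolve(host)
            ttl = self._ok_ttl
        except LookupTimeout:
            result = None          # negative entry: skip this host for a while
            ttl = self._fail_ttl
        self._entries[host] = (now + ttl, result)
        return result
```

With something like this in front of the resolver, a queue run holding
hundreds of messages for one unreachable domain waits for a single
timeout instead of hundreds. The lifetime of a remembered timeout should
stay short, so that mail starts moving soon after the name server comes
back.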