Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!decwrl!ucbvax!GLACIER.STANFORD.EDU!jbn
From: jbn@GLACIER.STANFORD.EDU.UUCP
Newsgroups: mod.protocols.tcp-ip
Subject: Re: Re: RING vs. ETHER - Theory and practice.
Message-ID: <8607302323.AA13783@ucbvax.Berkeley.EDU>
Date: Wed, 30-Jul-86 18:25:24 EDT
Article-I.D.: ucbvax.8607302323.AA13783
Posted: Wed Jul 30 18:25:24 1986
Date-Received: Thu, 31-Jul-86 17:45:07 EDT
References: <8607211628.AA13382@ucbvax.Berkeley.EDU>
Sender: daemon@ucbvax.BERKELEY.EDU
Reply-To: glacier!jbn (John B. Nagle)
Organization: Stanford University, IC Laboratory
Lines: 53
Approved: tcp-ip@sri-nic.arpa

1. If you are losing packets due to having too few receiving buffers
in your Ethernet controller, get a modern Ethernet controller. The
worst known offender is the old 3COM Multibus Ethernet controller used
in early SUN systems; not only does it have only two receiving buffers,
it has no overrun detection, and thus the software never tallies the
many packets it tends to lose.

2. If you are losing packets due to congestion problems in a TCP-based
system, this can be fixed; see my various RFCs on the subject.
"Improving" the protocol by adding extra acknowledgements or fancier
retransmission schemes is NOT the answer. I've developed some workable
solutions that are documented in RFCs and implemented in 4.3BSD.

3. The real need for link-level acknowledges, or at least some
indication of non-delivery that works most of the time, is for routing
around faults. Ethernets transmit happily into black holes; when the
destination dies, the source never knows. When the destination Ethernet
node is a gateway, and said gateway goes down, there is no low-level way
for the sending Ethernet node to notice this and divert to an alternate
gateway.
This is a serious problem in hi-rel systems, because we have no
standard way for a host on a multi-gateway Ethernet to behave which
will cause it to divert from one gateway to another when one gateway
fails. There are a number of approaches to this problem, all of them
lousy:

- Ignore it and put up with at least minutes and perhaps indefinite
  downtime when a supposedly redundant gateway fails. (Considered
  unacceptable in military systems.)

- Shorten the ARP timeout to 10 seconds or so and spend excessive
  resources sending ARPs. (Tends to cause one retransmit every
  10 seconds due to non-clever ARP implementations.)

- Let the hosts participate in some kind of nonstandard routing
  protocol so they can tell when a gateway dies. (No good for
  off-the-shelf hosts.)

- Let the transport layer inform the datagram layer when a retransmit
  occurs, so that the datagram layer can trigger the selection of a
  different gateway; if this causes selection of an up but ill-chosen
  gateway, a redirect from that gateway corrects the situation. (Some
  code to do this is in 4.2BSD, but it wasn't fully implemented.)

It's all so much easier if you have link-level failure-to-deliver
indications.

                                        John Nagle
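
A rough illustration of the last approach listed above (the transport
layer telling the datagram layer about retransmits) might look like the
Python sketch below. This is not the 4.2BSD code. The class name
GatewaySelector, its method names, and the retransmit_limit threshold
are all hypothetical, picked only to show the shape of the idea: count
retransmits per destination, rotate to the next gateway after a few of
them, and let a redirect override the choice.

```python
class GatewaySelector:
    """Hypothetical datagram-layer gateway chooser driven by transport hints."""

    def __init__(self, gateways, retransmit_limit=2):
        if not gateways:
            raise ValueError("need at least one gateway")
        self.gateways = list(gateways)
        # Assumed threshold; the post does not say how many retransmits
        # should trigger a switch.
        self.retransmit_limit = retransmit_limit
        self.current = {}      # destination -> index into self.gateways
        self.retransmits = {}  # destination -> retransmits since last progress

    def gateway_for(self, dest):
        """Return the gateway currently selected for this destination."""
        return self.gateways[self.current.get(dest, 0)]

    def on_retransmit(self, dest):
        """Transport layer reports a retransmit toward dest."""
        count = self.retransmits.get(dest, 0) + 1
        if count >= self.retransmit_limit:
            # Suspect the gateway is a black hole: rotate to the next one.
            self.current[dest] = (self.current.get(dest, 0) + 1) % len(self.gateways)
            count = 0
        self.retransmits[dest] = count
        return self.gateway_for(dest)

    def on_progress(self, dest):
        """Transport layer reports forward progress (e.g. new data acked)."""
        self.retransmits[dest] = 0

    def on_redirect(self, dest, gateway):
        """An up but ill-chosen gateway redirects us to a better one."""
        if gateway in self.gateways:
            self.current[dest] = self.gateways.index(gateway)
            self.retransmits[dest] = 0


selector = GatewaySelector(["gw-a", "gw-b", "gw-c"])
selector.on_retransmit("host-x")           # first retransmit: stay on gw-a
selector.on_retransmit("host-x")           # second retransmit: rotate to gw-b
selector.on_redirect("host-x", "gw-c")     # gw-b says gw-c is the better path
```

The sketch rotates blindly through the gateway list and relies on the
redirect to fix a poor choice, which is the division of labor the
approach describes. A real implementation would also have to decide
when to try a previously failed gateway again.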