Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!pasteur!ucbvax!PURDUE.EDU!narten
From: narten@PURDUE.EDU (Thomas Narten)
Newsgroups: comp.protocols.tcp-ip
Subject: Re: PSN 7 End-to-End question.
Message-ID: <8801312221.AA01892@percival.cs.purdue.edu>
Date: 31 Jan 88 22:21:14 GMT
References: <8801291809.AA01914@Pescadero>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 133

> Presumably, the EE and Imp-to-imp protocols also consume the INTERNAL
> resources of the network while they are doing this managing. Is there
> any evidence to assure us that these protocols are a net performance
> win over a simple, lean and mean best-efforts datagram service, which is
> all that IP/TCP wants and can use?

Total "best effort" systems work well only as long as the switches and
communications lines run below maximum capacity. Once maximum capacity
is reached or exceeded, problems arise. Many of them are solvable, but
they must be addressed, and the resulting system may no longer be "lean
and mean".

1) Best effort systems rely totally on hosts for congestion management.
That is, transport protocols are responsible for congestion control and
congestion avoidance. In practice, existing protocols don't play by
those rules. For instance, only recently (Van Jacobson's work) have TCP
implementations started reacting (in a positive way) to congestion. UDP
based protocols implement no congestion control at all. I cringe at the
thought of running NFS across the Internet.

2) Without TOS priority, all datagrams are considered equal. Routing
protocols suffer just as much from congestion as other protocols, but
consequences are much more severe. Nowhere in the Internet (that I know
of) are the datagrams that are used for the exchange of routing
information given precedence over others.

3) As John Nagle describes, congestion collapse is inevitable in
situations where transport protocols send more packets into the network
in response to congestion.
Furthermore, I know of no point-to-point networks that are designed to
run steady-state at or above maximum capacity. They all assume that the
network will be lightly loaded. Some networks (e.g. the ARPANET) take
steps that guarantee this.

4) In order for best effort systems to work well, transport protocols
must practice congestion avoidance. Congestion control deals with
congestion once it exists; congestion avoidance is aimed at keeping
congestion from ever reaching the point where the congestion control
mechanism kicks in. Congestion avoidance aims to run the network at
maximum throughput *AND* minimum delay.

Consider TCP: its window tends to open as far as it can (8 packets at
1/2K each). The network is forced to buffer the entire window of
packets. If the two endpoints are separated by a slow speed link, most
of the packets will be buffered there. Congestion and delay increase.
Congestion could be reduced without reducing throughput by decreasing
the size of the window. In reality, TCP won't do anything unless a
packet is dropped or a source quench is received. A new mechanism is
needed to distinguish high delays due to congestion from those due to
the transmission media (e.g. satellite vs. terrestrial links). Raj
Jain's work deserves close study in this regard.

5) In the present Internet, congestion avoidance is a dream. That pushes
congestion control/avoidance into the gateways and physical networks. In
many systems, congestion control simply resorts to dropping the packet
that just arrived and can't be put anywhere. Some of the issues:

a. Fairness: He who transmits the most packets gets the most resources.
This discourages well-tuned protocols, and encourages antisocial
behavior.

b. "Fair queuing" schemes limit the resources that a particular class of
packets can allocate. For instance, Dave Mills' selective preemption
scheme limits buffer space according to source IP addresses. There are
fairness issues here too.
All connections and protocols from the same source are lumped into the
same class. Does the "right" thing happen when a TCP connection competes
with a NETBLT connection?

c. Queuing strategies increase rather than decrease the per-packet
overhead. Furthermore, the information used to group datagrams into
classes must be readily available. In the worst case, you have to be
able to parse higher layer packet headers.

d. These queuing strategies rely entirely on local rather than global
information. It may be that 90% rather than 20% of the packets should be
discarded; a link two hops away might be even more congested.

6) Because of (1) above, physical network designers should give
considerable thought to congestion control/avoidance. The ARPANET
practices congestion avoidance. That is one of the biggest reasons that
one of the oldest networks, based on "old" and "obsolete" technology,
still works extremely well in today's environment.

People should be much more careful in distinguishing between the ARPANET
and the Internet. For instance, "the ARPANET is congested" usually
really means that the gateways connected to it are congested, or that
the Internet routing mechanism has broken down.

My understanding of ARPANET internals is as follows:

Before packets can be sent, an end-to-end VC is opened to the
destination IMP. I call this type of VC "weak", because it provides VC
services, but is actually implemented as a sliding window protocol
(roughly) similar to TCP. "Strong" VCs refer to those in which buffers
and routes are preallocated, and packet switches contain state
information about the circuits that pass through them.

Inside the ARPANET, IP packets are fragmented into small 200+ byte
datagrams that are sent through the network using best effort delivery.
The destination IMP reassembles them and sends back an acknowledgement
that advances the window. The number of packets in the network at any
given time for any given src/dest IMP pair is limited.
This essentially limits the total number of packets in the network at
any one time, resulting in one form of congestion avoidance. Presumably
the window size (8 IP packets) has been chosen based on extensive
engineering considerations.

This scheme also raises the same fairness issues described above. For
instance, should (or shouldn't) the NSFnet/ARPANET gateways be able to
get more resources than site X?

Of course, total best effort systems have advantages over other schemes.
One is their relative simplicity, and the loose coupling among gateways
and packet switches. Another is the ability of one user to grab a large
percentage of all available network resources. Although considered a
disadvantage if the user is a broken TCP implementation, it is necessary
if a user is to expect good performance running a well tuned bulk
transfer protocol (e.g. NETBLT).

> What is the best reference to understand how these protocols manage the
> network resources, particularly in dealing with network congestion?
> Thanks,
> David Cheriton

I too am interested in further references, especially those relating to
best effort systems.

Thomas Narten
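
A rough sketch of the window arithmetic in point 4 above, for anyone
who wants to play with the numbers. The 8-packet window and 512-byte
packets come from the text; the 9600 bps link speed, the 0.5 s unloaded
round-trip time, and the function name are assumptions made up for
illustration. The model is deliberately crude (one slow link, a fixed
window, no losses, acknowledgements cost nothing), so treat it as a
back-of-the-envelope aid rather than a description of any real TCP.

```python
def fixed_window_over_slow_link(window_pkts, pkt_bytes, link_bps, base_rtt_s):
    """Return (throughput_bps, queued_pkts, rtt_s) for a fixed window.

    Hypothetical model: a single bottleneck link, no packet loss, and an
    unloaded round-trip time of base_rtt_s when nothing is queued.
    """
    pkt_bits = pkt_bytes * 8
    # Packets needed to keep the slow link busy for one unloaded round trip.
    pipe_pkts = link_bps * base_rtt_s / pkt_bits
    if window_pkts <= pipe_pkts:
        # The window is the limit: the link idles and nothing queues up.
        throughput = window_pkts * pkt_bits / base_rtt_s
        return throughput, 0.0, base_rtt_s
    # The link is the limit: the excess window sits in the buffer ahead
    # of the slow link and only adds delay.
    queued = window_pkts - pipe_pkts
    rtt = base_rtt_s + queued * pkt_bits / link_bps
    return float(link_bps), queued, rtt


if __name__ == "__main__":
    # Assumed link parameters; only the window and packet size are from the post.
    for window in (8, 4, 2, 1):
        tput, queued, rtt = fixed_window_over_slow_link(window, 512, 9600, 0.5)
        print(f"window={window}  throughput={tput:.0f} bps  "
              f"queued={queued:.2f} pkts  rtt={rtt:.2f} s")
```

Under these assumed numbers, shrinking the window from 8 packets to 2
leaves throughput pinned at the link rate while the standing queue and
the round-trip time both fall. Only when the window drops below what
the link can carry in one unloaded round trip does throughput suffer.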