Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-spam!ames!ucbcad!ucbvax!UDEL.EDU!Mills
From: Mills@UDEL.EDU.UUCP
Newsgroups: comp.protocols.tcp-ip
Subject: Life after source quench
Message-ID: <8711081434.aa25516@Huey.UDEL.EDU>
Date: Sun, 8-Nov-87 14:34:43 EST
Article-I.D.: Huey.8711081434.aa25516
Posted: Sun Nov 8 14:34:43 1987
Date-Received: Sat, 14-Nov-87 14:56:18 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 139

Folks,

Thanks to Hans-Werner Braun, who scrounged the log of the NCAR (National Center for Atmospheric Research) fuzzball gateway on the NSFNET Backbone net, we may have additional insight into the effectiveness of its quench mechanism and the implications for TCP implementations.

The NCAR fuzzball is seriously overloaded at times and, using the preemption and quench policies described earlier to this group, can be quite vocal about it. ICMP Source Quench messages are sent when the mean queue length exceeds about 1.5, at a rate depending on the number of 256-octet blocks queued for a selected host. Presently, only the host with the largest number of blocks is selected, on the assumption that quenchable flows do not occur very often and are almost always due to a single host. See my previous messages for justification.

The following data illustrate typical scenarios found at NCAR. Each line represents one quench message sent for traffic in the direction shown between the two hosts. The two three-digit numbers are the ICMP type and code fields (octal); type 004 is Source Quench, and the code (second) field reveals the number of 256-octet blocks queued at the time the quench was sent. (This interpretation of the code field is at variance with the spec, but this is research, right?) As expected, quenchable flows are relatively infrequent and are characterized by large traffic surges lasting up to several minutes.
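A minimal Python sketch of the two mechanisms just described, offered under stated assumptions rather than as the fuzzball code: it reads one log line in the format used in this article, decodes the type and code fields as octal (so the code field yields the count of queued 256-octet blocks), and picks the host to quench using the 1.5 mean-queue-length threshold and the largest-block-count rule. The names QuenchRecord, parse_log_line and select_quench_target are hypothetical, the regular expression is inferred only from the sample lines here, and the quench rate is not modeled because its exact dependence on the block count is not given.

```python
import re
from typing import Dict, NamedTuple, Optional

BLOCK_OCTETS = 256        # queue accounting unit described in the article
QUENCH_THRESHOLD = 1.5    # "mean queue length exceeds about 1.5"

# Hypothetical pattern inferred from sample lines such as:
# 18:25:45 ?TRAP-I-ICMP 004 170 [128.6.4.7] -> [128.102.16.10]
LOG_RE = re.compile(
    r"(?P<time>\d\d:\d\d:\d\d) \?TRAP-I-ICMP "
    r"(?P<type>[0-7]{3}) (?P<code>[0-7]{3}) "
    r"\[(?P<src>[\d.]+)\] -> \[(?P<dst>[\d.]+)\]"
)


class QuenchRecord(NamedTuple):
    time: str
    icmp_type: int   # 4 = Source Quench
    blocks: int      # code field reinterpreted as queued 256-octet blocks
    src: str
    dst: str

    @property
    def octets(self) -> int:
        """Queued octets implied by the block count."""
        return self.blocks * BLOCK_OCTETS


def parse_log_line(line: str) -> QuenchRecord:
    """Decode one log line; type and code fields are octal."""
    m = LOG_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return QuenchRecord(
        time=m["time"],
        icmp_type=int(m["type"], 8),
        blocks=int(m["code"], 8),
        src=m["src"],
        dst=m["dst"],
    )


def select_quench_target(mean_queue_len: float,
                         blocks_by_host: Dict[str, int]) -> Optional[str]:
    """Return the single host to quench, or None when not congested."""
    if mean_queue_len <= QUENCH_THRESHOLD or not blocks_by_host:
        return None
    # Only the host with the most queued blocks is selected.
    return max(blocks_by_host, key=lambda host: blocks_by_host[host])


if __name__ == "__main__":
    rec = parse_log_line(
        "18:25:45 ?TRAP-I-ICMP 004 170 [128.6.4.7] -> [128.102.16.10]")
    print(rec.icmp_type, rec.blocks, rec.octets)  # 4 120 30720
```

Keying the selection on a single host mirrors the stated assumption that quenchable flows are almost always due to one host; a policy aimed at several concurrent sources would need a different rule.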
For example, the code field for the first line shows 120 (170 octal) 256-octet blocks sent by host 128.6.4.7 to host 128.102.16.10 living on a single output queue! In the first surge the flow lasted about a minute, during which six quenches were sent. It is not clear from these data what the preemption policy was doing, but it is likely that some packets were being dropped during this period.

HOST : 128.6.4.7 : RUTGERS.EDU,RUTGERS.RUTGERS.EDU,RUTGERS.ARPA : SUN-3/180

18:25:45 ?TRAP-I-ICMP 004 170 [128.6.4.7] -> [128.102.16.10]
18:25:46 ?TRAP-I-ICMP 004 135 [128.6.4.7] -> [128.102.16.10]
18:25:47 ?TRAP-I-ICMP 004 105 [128.6.4.7] -> [128.102.16.10]
18:26:38 ?TRAP-I-ICMP 004 127 [128.6.4.7] -> [128.102.16.10]
18:26:42 ?TRAP-I-ICMP 004 140 [128.6.4.7] -> [128.102.16.10]
18:26:44 ?TRAP-I-ICMP 004 135 [128.6.4.7] -> [128.102.16.10]

The next example shows a seven-minute surge at the beginning and two shorter surges at the end, with only sporadic quenches in between.

HOST : 128.117.8.7 : (unlisted - who is this USAN dude?)

20:26:43 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:26:45 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:26:46 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:27:16 ?TRAP-I-ICMP 004 101 [128.117.8.7] -> [128.118.28.2]
20:27:28 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:27:30 ?TRAP-I-ICMP 004 151 [128.117.8.7] -> [128.118.28.2]
20:27:30 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:27:45 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:28:11 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.118.28.2]
20:28:12 ?TRAP-I-ICMP 004 142 [128.117.8.7] -> [128.118.28.2]
20:28:32 ?TRAP-I-ICMP 004 142 [128.117.8.7] -> [128.118.28.2]
20:28:33 ?TRAP-I-ICMP 004 110 [128.117.8.7] -> [128.118.28.2]
20:28:48 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:29:31 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:29:31 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:30:27 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.118.28.2]
20:30:47 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:31:23 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:31:24 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:31:36 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:31:37 ?TRAP-I-ICMP 004 151 [128.117.8.7] -> [128.118.28.2]
20:31:41 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:31:53 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:32:06 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:32:07 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:32:10 ?TRAP-I-ICMP 004 142 [128.117.8.7] -> [128.118.28.2]
20:32:27 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:32:33 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:32:34 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.118.28.2]
20:32:35 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:32:56 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:32:58 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:33:15 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.118.28.2]
20:48:14 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.118.28.2]
20:54:21 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.118.28.2]
21:46:50 ?TRAP-I-ICMP 004 104 [128.117.8.7] -> [128.118.28.2]
21:46:51 ?TRAP-I-ICMP 004 104 [128.117.8.7] -> [128.112.18.2]
21:58:30 ?TRAP-I-ICMP 004 124 [128.117.8.7] -> [128.112.18.2]
21:58:37 ?TRAP-I-ICMP 004 110 [128.117.8.7] -> [128.112.18.2]
23:28:31 ?TRAP-I-ICMP 004 112 [128.117.8.7] -> [128.112.18.2]
23:28:35 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.112.18.2]
23:28:36 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.112.18.2]
23:28:37 ?TRAP-I-ICMP 004 133 [128.117.8.7] -> [128.112.18.2]
23:28:40 ?TRAP-I-ICMP 004 115 [128.117.8.7] -> [128.112.18.2]
23:28:43 ?TRAP-I-ICMP 004 106 [128.117.8.7] -> [128.112.18.2]

The next one is a real hotrod, with 17 quenches in 24 seconds.

HOST : 129.93.1.3 : (unlisted)

22:04:31 ?TRAP-I-ICMP 004 113 [129.93.1.3] -> [128.84.252.18]
22:04:35 ?TRAP-I-ICMP 004 110 [129.93.1.3] -> [128.84.252.18]
22:04:36 ?TRAP-I-ICMP 004 116 [129.93.1.3] -> [128.84.252.18]
22:04:36 ?TRAP-I-ICMP 004 127 [129.93.1.3] -> [128.84.252.18]
22:04:37 ?TRAP-I-ICMP 004 140 [129.93.1.3] -> [128.84.252.18]
22:04:37 ?TRAP-I-ICMP 004 151 [129.93.1.3] -> [128.84.252.18]
22:04:38 ?TRAP-I-ICMP 004 146 [129.93.1.3] -> [128.84.252.18]
22:04:38 ?TRAP-I-ICMP 004 132 [129.93.1.3] -> [128.84.252.18]
22:04:39 ?TRAP-I-ICMP 004 105 [129.93.1.3] -> [128.84.252.18]
22:04:51 ?TRAP-I-ICMP 004 113 [129.93.1.3] -> [128.84.252.18]
22:04:52 ?TRAP-I-ICMP 004 143 [129.93.1.3] -> [128.84.252.18]
22:04:52 ?TRAP-I-ICMP 004 165 [129.93.1.3] -> [128.84.252.18]
22:04:53 ?TRAP-I-ICMP 004 170 [129.93.1.3] -> [128.84.252.18]
22:04:53 ?TRAP-I-ICMP 004 157 [129.93.1.3] -> [128.84.252.18]
22:04:54 ?TRAP-I-ICMP 004 140 [129.93.1.3] -> [128.84.252.18]
22:04:54 ?TRAP-I-ICMP 004 127 [129.93.1.3] -> [128.84.252.18]
22:04:55 ?TRAP-I-ICMP 004 102 [129.93.1.3] -> [128.84.252.18]

I am told the Craymonsters do in fact do something useful with ICMP Source Quench messages. There is some evidence for that in the following, which shows a surge lasting about a minute, but with the quenches mostly spread out at about thirty-second intervals (except the last one), not in terrible spasms like the above. If Craymonsters can be tamed with a quench every thirty seconds or so, they may be pussycats, not monsters, after all.

HOST : 128.174.10.48 : NCSAD.ARPA : CRAY-X/MP :

22:15:30 ?TRAP-I-ICMP 004 102 [128.174.10.48] -> [128.84.252.18]
22:16:06 ?TRAP-I-ICMP 004 110 [128.174.10.48] -> [128.84.252.18]
22:16:36 ?TRAP-I-ICMP 004 110 [128.174.10.48] -> [128.84.252.18]
22:16:37 ?TRAP-I-ICMP 004 124 [128.174.10.48] -> [128.84.252.18]

The data suggest that a quench policy operating with a relatively long integration time, such as the fuzzball policy or the policy suggested by Raj Jain (the so-called DEC-bit), can indeed be effective. However, it is not at all clear from the above data that the surges are due to a single TCP connection, unless that connection was using window sizes in the 26000-octet range (100 blocks of 256 octets is 25600 octets). If multiple connections are involved, an effective quench strategy may need to operate over several concurrent connections and retain state over periods of up to a minute or more. The operating system would then have to restrain individual connections as a function of environment variables, independent of window modulation by the protocol itself. If it is true that single connections with humongous windows are most prevalent, then TCP window-drawdown strategies such as those previously suggested would work peachy-keen.

Comments from the administrators of the above hosts would be welcome. Can somebody describe the Craykitten anti-monster implementation?

Dave