Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site nrcvax.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!gamma!epsilon!zeta!sabre!petrus!bellcore!decvax!decwrl!pyramid!hplabs!sdcrdcf!psivax!nrcvax!msd
From: msd@nrcvax.UUCP (Marc Dye)
Newsgroups: net.lan
Subject: Re: Sun TCP bug (huge window)
Message-ID: <587@nrcvax.UUCP>
Date: Sun, 23-Mar-86 00:13:53 EST
Article-I.D.: nrcvax.587
Posted: Sun Mar 23 00:13:53 1986
Date-Received: Tue, 25-Mar-86 04:57:54 EST
References: <225@mc0.UUCP>
Reply-To: msd@nrcvax.UUCP (Marc Dye)
Organization: Network Research Research Corp. Oxnard,CA
Lines: 131
Summary: Two additional potential bugs in Sun TCP release 2.2

<>

The article referenced referred to three pieces of information which
correlate well enough to symptoms we have seen to merit at least a
mention of ours.  First, a noticeable functional difference between an
older Sun configuration and a newer one (at least from a networking
standpoint).  Second, a difference in Sun OS version, the potentially
problematic one being 2.2.  Third, a difference in Ethernet controller,
the potentially problematic one being Sun's own (vs. the 3Com 3C400).

As a point of reference, NRC sells FUSION, a networking product which
runs on various operating systems, link layer technologies, and CPU
configurations, and provides various protocols (including TCP/IP).  The
nadir of the hardware venue is a 3Com 3C500/3C501 controller which has
a single receive/xmit buffer; this renders it deaf to the network for
non-trivial amounts of time.  This 'feature' also exercises some of the
finer points of network implementations.  One of the uglier ones is the
4.2BSD dynamic retransmission / RTT algorithm, but I digress....

Networking implementations based on 4.2BSD are all somewhat different
it seems.  We had (courtesy of Sun via their Catalyst program) a loaned
system; this was a Sun model 150 (one of the old square black variety
with the battleship-type keyboard).
The 150 was Multibus-based, used the 3Com 3C400 Ethernet controller,
and ran a *really old* version of the Sun 4.2 O/S (1.0 I think).  We
recently took delivery of a new Sun 2/130 which has an integral
Sun-designed Ethernet controller.  It came with a distribution of Sun
4.2 O/S version 2.0 and an upgrade for version 2.2.

For some time we had been getting field reports of customers having
peculiar problems with various incantations of 4.2 networking.  What
was most surprising was that some of these were Sun workstations, since
our software had cut its teeth on that variety.  We could never
reproduce these problems on the Sun 150 we had.  They all got delivered
to us the day we got the new Sun 2/130.

The first problem scenario relates to the Sun getting confused about
what data has been acknowledged by the remote host.  This correlates to
a most peculiar sequence of packets generated by the Sun, which weren't
generated by the older Sun 150.  The following is an excerpt from an
NRC field technical report:

"The most noticeable symptom of this problem is a hung connection.
This problem occurs as a function of the inability of Ethernet
controllers to receive 100% of their network traffic.  In other words,
the dumber the controller, the more this is likely to happen.  The most
severe case I've seen is on an IBM XT w/ a 3C500 controller; 'telnet'
to the Sun and then a 'vi' of a non-trivial file usually can't get
through two full screen paints before hanging.  At this point, you can
still type the 'telnet' escape character (usu. '^]'), do a 'close' and
the local host seems ok.  Actually, the connection may still be lying
around *forever* consuming a socket.  Analysis of the network traffic
shows something like:

    0)- login, get your fortune, blah, blah, ...
    1)- ask 'vi' to paint a whole screen
    2)- PC has an open window of 1024 bytes
    3)- Sun (for some strange reason) sends *two* packets: 1023 bytes
        then 1 byte
    4)- PC misses the second (1 byte) packet (and hence fails to
        acknowledge that byte in the stream)
    5)- Sun (for some strange reason) presumes that the 1 byte packet
        *has* been acknowledged; this leaves the connection in a
        permanently desynchronized state: the PC won't let the Sun go
        ahead because (as far as the PC is concerned) the Sun hasn't
        ever sent the 1 byte, yet the Sun will never retransmit the
        1 byte since it thinks it was acknowledged (and has probably
        destroyed its copy of that byte)
    6)- 'telnet' close at this point succeeds in the PC->Sun direction
        since that direction is still synched up; usually this causes
        a retransmission of whatever will fit in the now empty (1K)
        window *except that one damn byte*; these retransmissions
        never succeed and eventually the Sun will reset the connection
        thinking the PC dead; if the PC catches the reset, the socket
        will be liberated, otherwise it will live forever
"

Note also that even though the PC represents the worst case (we have
around), this behavior will eventually occur on all of the systems we
have.  We don't (unfortunately) have two Suns to try it between.
Someone out there who does and is interested can send me some mail and
I will send some test scenarios to try.

The second problem scenario has to do with offering the Sun a 1023 byte
window (again from the same NRC report):

"
The problem had some similar trappings.  This time it was FTP which was
unable to receive certain files in ASCII mode from a Sun.  On
investigation, it proved to be the case that (under certain data
conditions) our FTP was asking for 1023 bytes of data rather than the
usual 1024.  This seemed to hose the Sun right away as he promptly sent
a malformed but effective reset packet.
...
Note again that I tried this test with these same files with the old
Sun and its O/S and it did not fail.  The new Sun does the same as the
one at .
"

In this case, changing the maximum offered window made the problem go
away (i.e. it's the value 1023, not oddness or something).  In the
first case, varying the window size didn't seem to matter in the long
run.

The "malformed but effective reset packet" contained a SYN flag and a
TCP maximum segment size option with a maximum size of 0 bytes, in
addition to the RST flag and appropriate sequence numbers.

Neither of these problems existed in the old Sun 150 implementation.

To paraphrase David Plummer: the world is a jungle and networking
contributes many animals.

Marc S. Dye
Vice President, Research and Development
Network Research Corporation

via Eventual Express -> 923 Executive Park Drive Suite C
                        Salt Lake City, UT 84117 U.S.A.
                  or -> 2380 Rose Avenue
                        Oxnard, CA 93030 U.S.A.
via 'N' Bell Systems -> (801) 266-9194 or (805) 485-2700
via USENET           -> ihnp4!nrcvax!msd
                        {hplabs,sdcsvax}!sdcrdcf!psivax!nrcvax!msd
                        ucbvax!calma!nrcvax!msd
ARPANET              -> calma!nrcvax!msd@UCBVAX.BERKELEY.EDU

+----------------------------------------------------+
| *BADGES*? WE DON'T NEED NO STINKIN' BADGES!!!      |
+----------------------------------------------------+
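
For anyone who wants to see the first problem scenario in miniature,
the following is a toy Python model of it.  It is only a sketch under
stated assumptions, not the Sun's code: the names simulate, snd_una and
rcv_nxt are illustrative, the 2048 byte stream length is arbitrary, and
the receiver is assumed to accept in-order data only.

```python
def simulate(buggy, rounds=5):
    """Toy model of the 1023 + 1 byte split followed by a lost segment.

    Returns (snd_una, rcv_nxt): the sender's idea of the oldest
    unacknowledged byte and the receiver's next expected byte.
    """
    stream_len = 2048        # arbitrary amount of data to send (assumption)
    window = 1024            # window offered by the PC
    rcv_nxt = 0              # receiver: next in-order byte expected
    snd_una = 0              # sender: oldest byte not yet acknowledged

    # The sender fills the 1024 byte window with two segments.
    for seq, length in [(0, 1023), (1023, 1)]:
        lost = (length == 1)             # the PC misses the 1 byte packet
        if not lost and seq == rcv_nxt:
            rcv_nxt += length

    ack = rcv_nxt                        # the PC acknowledges 1023 bytes
    if buggy:
        snd_una = 1024                   # presumes the lost byte was acked
    else:
        snd_una = ack                    # believes only what was acked

    # Retransmission rounds: the sender resends starting at snd_una.
    for _ in range(rounds):
        seq = snd_una
        length = min(window, stream_len - seq)
        if seq == rcv_nxt:               # in-order data: accept and ack it
            rcv_nxt += length
            snd_una = rcv_nxt
        # otherwise the receiver discards the data and re-acks rcv_nxt,
        # which the buggy sender treats as old news

    return snd_una, rcv_nxt


if __name__ == "__main__":
    print("buggy sender:  ", simulate(buggy=True))
    print("correct sender:", simulate(buggy=False))
```

With buggy=True the two ends stay one byte apart no matter how many
rounds are run, which is the hung connection described in the report;
with buggy=False the missing byte is resent and the stream completes.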