Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!rice!sun-spots-request
From: ehrhart@aai8.istc.sri.com (Tim Ehrhart)
Newsgroups: comp.sys.sun
Subject: Re: le: missed packet problem
Keywords: SunOS
Message-ID: <8903061521.AA05630@aai8.>
Date: 14 Mar 89 13:15:13 GMT
Sender: usenet@rice.edu
Organization: Sun-Spots
Lines: 49
Approved: Sun-Spots@rice.edu
Original-Date: Mon, 06 Mar 89 07:21:16 PST
X-Sun-Spots-Digest: Volume 7, Issue 198, message 4 of 13

> Well, trouble hit Friday when the client just up and died (went down to
> the monitor prompt '>'). Tried rebooting it ...
>
> >b
>
> EEPROM boot device ... le(0,0,0)
> Using IP Address 128.155.2.94 = 809B025E
> Booting from tftp server at 128.155.2.83 = 809B0253
> Downloaded 126056 bytes from tftp server.
> Using IP Address 128.155.2.94 = 809B025E
> le: missed packet
> le: missed packet
> le: missed packet
> No bootparam server responding; still trying
> le: missed packet
> le: missed packet

I experienced the same problems when I upgraded to 4.0 months ago. After
much head scratching and wire sniffing, here is what I discovered:

We had some VMS/VAXen on the wire running both DECnet and TCP/IP. There
are various versions of TCP/IP available for VMS, so your mileage may
vary. But nonetheless, most of them ~seem~ to be based on the PD version
of RPC from Sun. What appears to happen is that when the client is
requesting his bootparam server (which corresponds to an indirect RPC
request from the portmapper to bootparamd), the portmapper process
running on the VAX sends back the wrong response. If it can't satisfy the
request, it should simply NOT ANSWER; instead it sends back an RPC error
message. (I can't remember exactly what the message was, it has been a
while, but I think it was "RPC service unavailable".) We have/had quite a
few of these beasts, so the poor diskless client was inundated with bogus
RPC replies from the VAXen.
The client didn't like this, so it proceeded to send ICMP messages back
to the VAXen ????. Just about at the timeout of the request, the
appropriate file server would FINALLY respond (about 9ms later), but by
then the client had timed out his request and dropped the response packet
from the file server, which started the process all over again.

Try to prove this by isolating your client and its file server from the
rest of the net and attempting the boot again. This is simple for me to
do because we make copious use of multi-port boxes. In lieu of this, get
out either tcpdump or etherfind and watch all packets going to and coming
from the affected client.

It was AMAZING to watch how fast the VAXen were pummeling the poor client
(reply time was about 1ms), then finally about 9ms later the file server
replied. In my case, the file server was a Sun-4 on the same multi-port
box right beside the client, and the VAXen were on distant parts of our
campus ethernet.

Tim Ehrhart
ehrhart@spam.istc.sri.com
SRI International
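
A minimal sketch of the reply rule described above, in Python. This is a
toy model, not real portmapper code: Host, indirect_call_reply,
replies_to_client and the host names are hypothetical and exist only for
illustration. It shows why one bootparam broadcast should draw a single
useful answer when every host stays silent on failure, and a pile of
error replies when some hosts do not.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Host(NamedTuple):
    name: str
    registered: Set[str]  # RPC services this host can actually serve
    well_behaved: bool    # False models the VAX portmappers in the post


def indirect_call_reply(host: Host, service: str) -> Optional[str]:
    """Reply a host sends to a broadcast indirect call, or None for silence."""
    if service in host.registered:
        return "result"   # request handed to the local service and answered
    if host.well_behaved:
        return None       # cannot satisfy the request: say nothing
    return "error"        # faulty behaviour: answer with an RPC error


def replies_to_client(hosts: List[Host], service: str) -> List[Tuple[str, str]]:
    """Collect (host name, reply) for every host that answers at all."""
    replies = []
    for host in hosts:
        reply = indirect_call_reply(host, service)
        if reply is not None:
            replies.append((host.name, reply))
    return replies


if __name__ == "__main__":
    # Hypothetical net: one file server running bootparamd, two faulty VAXen.
    net = [
        Host("fileserver", {"bootparam"}, True),
        Host("vax1", set(), False),
        Host("vax2", set(), False),
    ]
    print(replies_to_client(net, "bootparam"))
```

With the two faulty hosts in the list the client sees three replies, only
one of them useful; marking those hosts well behaved leaves just the file
server's answer.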