Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!rice!sun-spots-request
From: ehrhart@aai8.istc.sri.com (Tim Ehrhart)
Newsgroups: comp.sys.sun
Subject: Re: le: missed packet problem
Keywords: SunOS
Message-ID: <8903061521.AA05630@aai8.>
Date: 14 Mar 89 13:15:13 GMT
Sender: usenet@rice.edu
Organization: Sun-Spots
Lines: 49
Approved: Sun-Spots@rice.edu
Original-Date: Mon, 06 Mar 89 07:21:16 PST
X-Sun-Spots-Digest: Volume 7, Issue 198, message 4 of 13

> Well, trouble hit Friday when the client just up and died (went down to
> the monitor prompt '>').  Tried rebooting it ...
> 
> >b
> 
> EEPROM boot device ... le(0,0,0)
> Using IP Address 128.155.2.94 = 809B025E
> Booting from tftp server at 128.155.2.83 = 809B0253
> Downloaded 126056 bytes from tftp server.
> Using IP Address 128.155.2.94 = 809B025E
> le: missed packet
> le: missed packet
> le: missed packet
> No bootparam server responding; still trying
> le: missed packet
> le: missed packet

I experienced the same problems when I upgraded to 4.0 months ago. After
much head scratching and wire sniffing here is what I discovered:

We had some VMS/VAXen on the wire running both DECnet and TCP/IP. There
are various versions of TCP/IP available for VMS, so your mileage may vary.
But nonetheless, most of them ~seem~ to be based on the PD version of RPC
from Sun. What appears to happen is that when the client is requesting his
bootparam server (which corresponds to an indirect RPC request from the
portmapper to bootparamd), the portmapper process running on the VAX sends
back the wrong response. If it can't satisfy the request, it should simply
NOT ANSWER; instead it sends back an RPC error message.  (I can't remember
exactly what the message was, it has been a while, but I think it was "RPC
service unavailable".) We have/had quite a few of these beasts, so the
poor diskless client was inundated with bogus RPC replies from the VAXen. The
client didn't like this, so it proceeded to send ICMP messages back to the
VAXen ????. Just about at the timeout of the request, the appropriate file
server would FINALLY respond (about 9ms later), but the client timed out
his request and dropped the response packet from the file server, which
then started the process all over again.
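
To make the "should simply NOT ANSWER" point concrete, here is a minimal
sketch in Python of how a broadcast RPC client could sort the replies it
gets back: it keeps waiting unless a reply is an accepted SUCCESS. The
header layout follows the published Sun RPC message format. The function
names (classify_reply, make_reply) are made up for illustration, and
mapping the half-remembered "RPC service unavailable" message to
PROG_UNAVAIL is an assumption on my part. This is not the boot PROM's or
SunOS's actual code.

```python
import struct

MSG_REPLY = 1       # msg_type: this message is a reply
MSG_ACCEPTED = 0    # reply_stat: the call was accepted (not denied)
SUCCESS = 0         # accept_stat: the procedure ran and returned results

ACCEPT_STAT_NAMES = {
    0: "SUCCESS",
    1: "PROG_UNAVAIL",    # assumed to be the error the VAXen sent
    2: "PROG_MISMATCH",
    3: "PROC_UNAVAIL",
    4: "GARBAGE_ARGS",
}


def classify_reply(packet: bytes, expected_xid: int) -> str:
    """Return "accept" for a usable reply, else "ignore: <reason>"."""
    if len(packet) < 12:
        return "ignore: too short"
    xid, msg_type, reply_stat = struct.unpack(">III", packet[:12])
    if xid != expected_xid or msg_type != MSG_REPLY:
        return "ignore: not a reply to our call"
    if reply_stat != MSG_ACCEPTED:
        return "ignore: MSG_DENIED"
    if len(packet) < 20:
        return "ignore: too short"
    # Verifier: flavor, length, then an opaque body padded to 4 bytes.
    _flavor, verf_len = struct.unpack(">II", packet[12:20])
    offset = 20 + ((verf_len + 3) & ~3)
    if len(packet) < offset + 4:
        return "ignore: too short"
    (accept_stat,) = struct.unpack(">I", packet[offset:offset + 4])
    if accept_stat == SUCCESS:
        return "accept"
    return "ignore: " + ACCEPT_STAT_NAMES.get(accept_stat, "unknown error")


def make_reply(xid: int, accept_stat: int) -> bytes:
    """Build a bare accepted reply with an empty AUTH_NULL verifier."""
    return struct.pack(">IIIIII", xid, MSG_REPLY, MSG_ACCEPTED, 0, 0,
                       accept_stat)


if __name__ == "__main__":
    xid = 0x1234
    print(classify_reply(make_reply(xid, 1), xid))  # error from a VAX
    print(classify_reply(make_reply(xid, 0), xid))  # real server's answer
```

A client written this way would shrug off the error replies and still be
listening when the real bootparam server answers, provided its timeout is
long enough.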

Try to prove this by isolating your client and its file server from the
rest of the net and attempting the boot again. This is simple for me to do
because we make copious use of multi-port boxes. In lieu of this, get out
either tcpdump or etherfind and watch for all packets coming and going
to/from the affected client. It was AMAZING to watch how fast the VAXen
were pummeling the poor client (reply time was about 1ms), then finally
about 9ms later the file server replied. In my case, the file server was a
Sun-4 on the same multi-port box right beside the client, and the VAXen
were on distant parts of our campus ethernet.

Tim Ehrhart			ehrhart@spam.istc.sri.com
SRI International