Path: utzoo!bnr-vpa!bnr-fos!bigsur!bnrgate!is2!marmen
From: marmen@is2.bnr.ca (Rob Marmen 1532773)
Newsgroups: comp.sys.apollo
Subject: Re: Doamin on Ethernet problem
Summary: Some suggestions
Keywords: dn3500 dn10000 ethernet dds crash
Message-ID: <128@bnrgate.bnr.ca>
Date: 23 Oct 89 19:58:26 GMT
References: <119@bnrgate.bnr.ca> <1127@cernvax.UUCP>
Sender: news@bnrgate.bnr.ca
Lines: 38

In article <1127@cernvax.UUCP>, achille@cernvax.UUCP (achille petrilli) writes:
> The node will loose contact with the network, both at DDS and tcp/ip level,
> rtstat -dev shows enormous numbers for 'no resources', some 20000 per second
> (yes, twenty thousand !), but the node is not receiving even 20 per second
> (we checked that with an ethernet analyzer).
>
> We traced down the problem to be related to dn3xxx to dn10k interactions.
> A way to reproduce the problem, 100 %, is to do from the dn3xxx:
>
>     ls -l //dn10k
>
> This will slowly start telling you that does not find some directories (that
> are there) and the number of 'no resources' will skyrocket.
> Now the dn3xxx is gone. The dn10k is instead perfectly happy.
>
> Achille Petrilli
> Cray & PWS Operations

I would check the following:

1) Ethernet microcode revision level. You should be running a version
   no earlier than March of this year. The previous microcode was very
   buggy. The code is stored in /sys/ethernet8_microcode.

2) Does the number of crc and misalignment errors skyrocket as well?
   If so, then it may be the microcode, or you may have a bad
   connection between the two machines. I have seen drops (using utp)
   which technically check out o.k. but, because of a loose wire or
   connection, generate lots of bad packets.

rob...

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
| Robert Marmen           marmen@bnr.ca  OR               |
| Bell Northern Research  marmen%bnr.ca@cunyvm.cuny.edu   |
| (613) 763-8244          My opinions are my own, not BNRs |