Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!gem.mps.ohio-state.edu!tut.cis.ohio-state.edu!bloom-beacon!eru!luth!sunic!mcsun!cernvax!achille
From: achille@cernvax.UUCP (achille petrilli)
Newsgroups: comp.sys.apollo
Subject: Doamin on Ethernet problem
Keywords: dn3500 dn10000 ethernet dds crash
Message-ID: <1127@cernvax.UUCP>
Date: 20 Oct 89 11:01:27 GMT
References: <119@bnrgate.bnr.ca>
Reply-To: achille@cernvax.UUCP (achille petrilli)
Organization: CERN European Laboratory for Particle Physics, CH-1211 Geneva, Switzerland
Lines: 37

Hi there,

We are experiencing a lot of problems on our Ethernet-based Apollos. The problem has been seen on both the 3500 and the 3000, with Ethernet as the primary (and in most cases the only) network.

The node loses contact with the network at both the DDS and TCP/IP levels. rtstat -dev shows enormous numbers for 'no resources', some 20000 per second (yes, twenty thousand!), yet the node is not receiving even 20 per second (we checked that with an Ethernet analyzer).

The node we have investigated most closely runs sr10.1. There are some 30 machines on the Ethernet, mostly running 9.7, plus two dn10000s (sr10.1), one 3500 (sr10.1) and one 3000 (sr10.1, secondary network).

We traced the problem to dn3xxx-to-dn10k interactions. A way to reproduce it, 100% of the time, is to run this from the dn3xxx:

    ls -l //dn10k

It will slowly start reporting that it cannot find some directories (which are there), and the number of 'no resources' will skyrocket. At that point the dn3xxx is gone. The dn10k, by contrast, is perfectly happy.

Has anybody seen this? Is there any patch available? Our SSR cannot reproduce the problem at their site (they don't have a dn10k and they run on ring), and we are a little uncomfortable letting them try these things on our net :-)

The work-around we've found is to NOT access any dn10k from Ethernet-based dn3xxx nodes (which of course defeats the whole purpose of networking).
For the time being that's OK, given the small number of dn10ks and sr10 machines, but what can we do in a few months' time, when everybody goes to sr10?

Help!!!

Thanks in advance,

Achille Petrilli
Cray & PWS Operations