Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uwm.edu!psuvax1!rutgers!att!chinet!les
From: les@chinet.chi.il.us (Leslie Mikesell)
Newsgroups: comp.sys.att
Subject: Re: 3B2/600 questions
Keywords: RFS
Message-ID: <1989Nov29.051458.2014@chinet.chi.il.us>
Date: 29 Nov 89 05:14:58 GMT
References: <367@ai.etl.army.mil> <1989Nov15.155346.25197@eci386.uucp> <1262@atha.AthabascaU.CA> <1989Nov22.162738.10433@chinet.chi.il.us> <18366@bellcore.bellcore.com>
Reply-To: les@chinet.chi.il.us (Leslie Mikesell)
Distribution: na
Organization: Chinet - Chicago Public Access UNIX
Lines: 49

In article <18366@bellcore.bellcore.com> scj@navaho.cc.bellcore.com (Steve Johnson) writes:

>It seems that packets NOT
>destined for RFS or even our boxes *somehow* cripple our (nearly new,
>current software) 3B2/(600,700,1000*) machines into panic'ing (stream
>resource related panics).  Tuning is not an issue (for instance, /etc/crash
>shows no failures for strstat, no other errors either, either from crash
>or other daemons and logs).

Interesting.  We used to have the same problem when we had a mix of
Starlan 1.0 and 3.2 (URP/OSI) machines on the same net.  It seemed to
happen most often when we had the lan manager running (it watched
both protocols) and particularly when the 3B2's would access their
tape drives.  My guess is that the mix of protocols causes the length
of a packet to be interpreted incorrectly, and the kernel buffer pool
is overrun as a result.
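
To make that guess concrete, here is a small sketch of how a length
field read under the wrong convention can claim far more data than the
packet holds.  Everything in it is an assumption for illustration only:
the two-byte length header, the two byte orders, and the 64-byte buffer
are made up and are not the real Starlan 1.0 or URP/OSI frame layouts.

```python
import struct

BUF_SIZE = 64  # hypothetical fixed receive-buffer size, in bytes


def declared_length(frame: bytes, fmt: str) -> int:
    """Read a 16-bit length field from the first two bytes of the frame."""
    (length,) = struct.unpack_from(fmt, frame, 0)
    return length


def copy_payload(frame: bytes, fmt: str) -> bytes:
    """Copy the payload only if the declared length is plausible."""
    length = declared_length(frame, fmt)
    actual = len(frame) - 2  # bytes that really follow the header
    if length > actual or length > BUF_SIZE:
        raise ValueError(
            f"declared length {length} exceeds frame ({actual}) "
            f"or buffer ({BUF_SIZE})"
        )
    return frame[2:2 + length]


# A frame from a sender that writes the length field big-endian.
payload = b"hello"
frame = struct.pack(">H", len(payload)) + payload

ok = copy_payload(frame, ">H")          # same convention: length is 5
misread = declared_length(frame, "<H")  # other convention: length is 1280
```

A receiver that trusted the misread value and copied that many bytes
into a fixed buffer would run past the end of it; the check in
copy_payload() is the sort of guard that would reject the frame instead.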

>>Anyway, you should not have to restart RFS because any single node
>>goes away.

>Agreed, you *should* not have to restart.  But, on the same LAN as
>mentioned above, *with* a secondary properly defined, our entire RFS
>network often comes down, HARD, when the primary server RFS crashes. 
>Additionally, ( serious, but we are :-) with RFS OVERALL) we often have
>to REBOOT the primary server after an RFS crash to gain any RFS sanity.

I saw something like that a few times under Starlan 1.0 on the 3B2's,
mostly triggered by someone disconnecting a wire and leaving an
unterminated run.  Since installing 3.2 and changing to the newer
network hub units we haven't had that sort of problem (several months
now).  Rebooting the primary server isn't too serious if you have
set up a secondary machine.

>Tests done in *isolation* from the rest of our normal LAN neighbors show
>*near complete* RFS stability.  We believe that RFS *just cannot cope*
>with the MANY protocols and other network traffic on the LAN.  Storms
>are sometimes, but not always an issue here.

I suspect that this relates more to the network drivers than to RFS.

>I really do like the overall stability of AT&T 3B2 products, both
>hardware and software, but in case some offense is taken, I'm
>putting on asbestos shorts! ;-)

I'd like to see more discussion of these kinds of real-world problems
out in the open.  Why should anyone be offended?

Les Mikesell
 les@chinet.chi.il.us