Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!amdcad!ames!mailrus!tut.cis.ohio-state.edu!ukma!rutgers!ucla-cs!admin.cognet.ucla.edu!casey
From: casey@admin.cognet.ucla.edu (Casey Leedom)
Newsgroups: comp.sys.apollo
Subject: Re: some questions for the gurus.
Message-ID: <16020@shemp.CS.UCLA.EDU>
Date: 16 Sep 88 10:12:43 GMT
References: <8809141502.AA01500@testnode.mit.edu>
Sender: news@CS.UCLA.EDU
Reply-To: casey@cs.ucla.edu (Casey Leedom)
Organization: UCLA
Lines: 40

In article <8809141502.AA01500@testnode.mit.edu> krowitz@testnode.MIT.EDU
 (David Krowitz) writes:
> Why would a random user *need* to shut down a node anyhow?

  Oops!  I suddenly realize I may have been arguing on the wrong side all
this time ... :-) I sometimes shut down my node two or three times a day
because it gets locked in some weird state or another.  A typical case:
our gateway (also an Apollo) goes down while I have several telnet/rsh
connections running through it.  For some reason this really screws my
node up and I'm forced to reboot.  (This has been happening a lot lately -
I think our Apollo gateway is running out of mbufs or something similar.)

  If I couldn't shut my machine down I'd be in a tough spot.  I have the
root password, but what about all the other users?  Since I'm Mr. Support
(can you really believe that? :-)), I'd be getting calls constantly to
come over and reboot so-n-so's node.  Ack!!!

  Seriously, I fully agree with David that shut should be reserved for
root users just to prevent accidents.  However, since we do need to
reboot the nodes so often ...  This need may go away when we bring up
SR10 (we just got our tape a couple of days ago - yeah!), but since I
have the hanging problem on broken TCP connections with TCP3.1 and, as far
as I know, SR10's TCP is identical, I doubt it will go away entirely ...

  I'll certainly endorse David's suggestion that shut warn you about
things like:

> 1) processes CRP in from another node
> 2) diskless partners that were currently booted off the disk
> 3) files that were opened from other nodes
> 4) if the node is a gateway

Casey

P.S.  Does anyone know exactly why the broken TCP connections hang a
    node and what can be done about them?  Or even what to do about an
    Apollo node being used as a gateway that seems to be reacting
    unfavorably to increased usage (but it should be pointed out that
    this increased usage hasn't even begun to make the gateway run
    slowly, etc.).