Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!usc!brutus.cs.uiuc.edu!psuvax1!psuvm!w0l
From: W0L@PSUVM.BITNET (Bill Lasher)
Newsgroups: comp.sys.sgi
Subject: Re: fsck
Message-ID: <89313.153421W0L@PSUVM.BITNET>
Date: 9 Nov 89 20:34:21 GMT
Organization: Penn State University
Lines: 56

Some of you may have been following the fsck question I posted last
week. Thanks to help from several of you, including some people at SGI,
I finally decided the REAL problem was our system administration. One of
the people at SGI thought the following might be of interest to others,
and suggested I post it. The original note follows:

=========================================================================
Date: 9 November 1989, 14:16:04 EST
From: Bill Lasher (814) 898-6391 W0L at PSUVM
Subject: Re: fsck, init state 3
To: dunlap at sgi.sgi.com
In-Reply-To: dunlap%bigboote.csd AT sgi.com -- Thu, 9 Nov 89 11:06:55 PST

Our most recent problem (the RPC timeout) I think was caused by the way
we implemented the nightly reboot. We scheduled the reboots 5 minutes
apart, figuring that would be enough time. I found out today that one
machine was still in the process of restarting when the YP server it
was communicating with started to reboot. This caused the system to
hang. Rebooting did in fact clear things up, but it took some time.
Part of the problem is that the time on each machine is not exactly the
same (a difference of a couple of minutes). We are going to set all
machines to the same time, and change the reboot interval to 10 minutes.

I think we got thrown off the track because running fsck nightly changed
the total time it took for the systems to reboot, and things just
happened to work out O.K. Also, we probably weren't patient enough
earlier to let reboot do its thing; when reboot didn't work, we tried
fsck, which did work because it took longer to finish up, and by the
time it was done the network wasn't as busy (or something like that).
I think we were also in a hurry to get things fixed, and as a result got
sloppy (i.e., running fsck without unmounting, etc.). Some of our
problems may come back, but we will handle each of them separately as
they occur, and try to be more careful.

I suspect some of the earlier problems (the full disks, hung spool
queues) showed up because we were letting the systems run for a week at
a time without rebooting, and things just got a little messy. We had
planned from the beginning to have them reboot every night, but we had
too many other things going on to get it implemented. We'll just take
it from here and see what happens.

Best regards,
Bill
========================================================================
END OF ORIGINAL NOTE

You may not follow all the details, but you probably get the general
idea. I think it's a good example of what can happen when an
experienced computer user gets his first UNIX/networked system.

Bill

"If I knew what I was doing, I wouldn't have had to ask the question!"