Path: utzoo!attcan!utgpu!watmath!egvideo!edhew
From: edhew@egvideo.UUCP (Ed Hew)
Newsgroups: comp.unix.xenix
Subject: Re: init's untimely death.
Summary: Tracking down the init assassin
Keywords: init, terminate, enable, disable
Message-ID: <2045@egvideo.UUCP>
Date: 24 Jun 89 04:00:00 GMT
References: <1989Jun21.114506.1378@tapa.uucp>
Reply-To: edhew@egvideo.UUCP (Ed Hew)
Distribution: na
Organization: A Box in the Basement, Kitchener, Ontario, Canada
Lines: 90

I had originally replied to this via email; however, it occurs to me
that someone else may have similar problems or, better yet, may have
resolved them.

In article <1989Jun21.114506.1378@tapa.uucp> larry@tapa.uucp (Larry Pajakowski) writes:
> Perhaps someone can shed some light on a perplexing problem we are having.
> We have a Compaq 386/20 running Xenix-386 2.3.1 with Excelan TCP/IP V3.5
> and Xenix-Net 1.2.

I was in a similar (lack of) light several months ago (shortly after
our conversion to 2.3.1).  The major difference was that it wasn't
TCP/IP; it was uucp (kind of) causing me headaches.

>
> About once a week more or less init dies.  After that of course the machine
> slowly grinds to a halt and must be powered off.  I now have a script running
> periodically which checks for init and reboots after doing a ps if there is no
> init.  Ok that keeps it alive but why?

My scenario was as follows:

I'd leave this system just humming along and go to work.  I'd return
late that night and find my system ground to a halt.  Nobody was
cleaning up defunct processes and my process table was full.  Hence
the system was effectively dead.  init had somehow been assassinated.

Of course, I'd discover this after I logged on with my *non*root
account, so I couldn't even do a proper shutdown.  With no init, I
have no getty, and can get no login.  I'd log on on tty01; the
original getty would at least let me do that and replace itself with
my login shell, but, then.... log off to log on as root, and...
well...... [arghhh!
where's that switch? ....sure like fsck, ummhmmmmm].

RTFM says something like: "shutdown can only be run in the foreground
by root".

After a couple of weeks of fruitless testing and surmization (sp?), I
turned the process accounting on.  Well, let's be honest, I always had
the proc accounting on, I just decided to look at it.  1/2 :-)

Sure enough, init was exiting for some reason right when I had a cron
task disable and enable the tty that had an attached uuxqt running,
processing news.

Some background is required here.  The disable/enable was a workaround
to a problem whereby DTR wasn't (for some still unresolved reason)
being raised after polling our host for news.  So, we simply cron'd a
script to disable/enable the TBit tty every 15 minutes if nobody was
on it at the time.

That solved the (no DTR) problem, but then the above occurred.  The
disable/enable was assassinating init.  Process accounting says so.
Now we check to make sure uuxqt isn't running at that time as well.
Haven't had a problem since.

I can also tell you that the above results have been manually
recreated on this site.  Sometimes.  It's not consistent.  Arghhhh!
There is a missing factor here.  I don't know what it is.

> I've talked both to SCO and Excelan.  Neither has been able to help much.
> It may be slightly worse under heavy TCP/IP load but then I've had it happen
> on an idle machine.  There have been power line glitch monitors on the power
> and we have run diagnostics over a weekend with no indications of any problem.
> The only other clue is 2 kernel panics over the last 3 months "Free inode
> isn't".

In my case, a thought:  I wonder if this could be related to the old
problem in pre-2.2.x releases, where the docs warned us that using a
disable/enable sequence without separating the two by at least a
1 minute interval was asking for trouble.

All I can suggest is that you check out the above info; check out
what your process accounting tells you.
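A minimal sketch of the guarded disable/enable workaround described above. It is not the script from the original site: the port name tty1A, the 60-second pause (borrowed from the old pre-2.2.x warning), the DISABLE_CMD/ENABLE_CMD variables, and the use of `who` and `ps -e` to detect activity are all assumptions to adapt to your own system.

```shell
#!/bin/sh
# Guarded port reset, meant to be run from cron.
# Everything configurable here is an assumption, not taken from the post.

PAUSE=${PAUSE:-60}                      # seconds between disable and enable
# Point these at your system's commands (use full paths under bash,
# where "enable" is also a shell builtin).
DISABLE_CMD=${DISABLE_CMD:-disable}
ENABLE_CMD=${ENABLE_CMD:-enable}

# Succeeds if someone is logged in on the given tty (2nd column of who).
tty_in_use() {
    who | awk -v t="$1" '$2 == t { found = 1 } END { exit !found }'
}

# Succeeds if a process named uuxqt shows up in the last column of ps -e.
uuxqt_running() {
    ps -e | awk '$NF == "uuxqt" { found = 1 } END { exit !found }'
}

# Disable and re-enable the port only when it is idle and uuxqt is not running.
reset_port() {
    port=$1
    if tty_in_use "$port"; then
        echo "skip: $port in use"
        return 0
    fi
    if uuxqt_running; then
        echo "skip: uuxqt running"
        return 0
    fi
    $DISABLE_CMD "$port" || return 1
    sleep "$PAUSE"
    $ENABLE_CMD "$port" || return 1
    echo "reset: $port"
}
```

To use it, add a final line such as `reset_port tty1A` and schedule the script from cron (for example at minutes 0,15,30,45). Note that there is still a window between the checks and the disable in which uuxqt could start, so this narrows the problem rather than closing it.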
Find out what's happening when init dies, and prevent it from
happening.  If you ever find out *why* this happens, please email me.
Right now I am still using a workaround.  I'd rather find a *fix*.

> I would appreciate hearing from anyone with some ideas either by email or
> phone.  Many Thanks.

Hope this helps.

>
> Larry Pajakowski
> Abbott Labs.    ...!ddsw1!abtcser!larry    1-312-937-1153

  --ed    {edhew@egvideo.uucp}

Ed. A. Hew    Authorized SCO Technical Trainer    Xeni/Con Corporation
work:  edhew@xenicon.uucp   -or-   ..!{uunet!}utai!lsuc!xenicon!edhew
home:  edhew@egvideo.uucp   -or-   ..!{uunet!}watmath!egvideo!edhew
# I haven't lost my mind, it's backed up on floppy around here somewhere!