Path: utzoo!attcan!uunet!samsung!zaphod.mps.ohio-state.edu!van-bc!ubc-cs!uw-beaver!cornell!ken
From: ken@gvax.cs.cornell.edu (Ken Birman)
Newsgroups: comp.sys.isis
Subject: Re: isis start up probs on SUN4
Keywords: isis log file
Message-ID: <37894@cornell.UUCP>
Date: 28 Feb 90 15:18:29 GMT
References: <1200@swbatl.UUCP>
Sender: nobody@cornell.UUCP
Reply-To: ken@gvax.cs.cornell.edu (Ken Birman)
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 37

In article <1200@swbatl.UUCP> jmd@swbatl.UUCP (03) writes:
>
>       I have been using isis for several months without problems
>on a sun 3/260 platforms however as we near closer to implentation i have ported
>to the sun 4/390 servers and am getting the following start up probs I can't
>seem to figure out. Could any of you take a look and point me in the
>right direction. Below is the log file. help!
>
>****************LOG FILE **********************
>Mon Feb 26 13:32:54 1990
>ISIS release V1.2, June 1989
>Site is now coming up, site-id 13, isis_dir <./13.logdir>
>Detect site-failure after: 60 secs
>calvin (13/128): -- panic --
>isis monitoring process at this site has crashed!
>... etc (remainder is not relevant to problem)

The message "monitoring process at this site has crashed" means that
the process called "isis", namely the one that starts the system up,
either panicked or died with a core dump. This happened after it
started protos (which wrote this log file; the log itself looks
healthy) and before it told protos whether the restart was partial
or total.

I have seen something like this recently from someone else, but not
at Cornell. He actually had a core image from bin/isis that showed the
system as having crashed in bcopy(), called right after a gethostbyname
call. The arguments to the bcopy were completely wrong.
My impression was that SUN might have changed the data structure
returned by gethostbyname, but that person didn't get back to me on
the explanation of the crash; perhaps he discovered an error in his
/etc/hosts file that explained the problem.

It should be easy to fix this, since the bug can be localized to a
single bcopy call (assuming your problem is the same problem). Odd
that it doesn't happen at Cornell, though.

I would like to fix this, so if you can figure out what went wrong
please let me know. Or, we can track it down offline...

Ken
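
For anyone chasing the same symptom, the failure described above (a
bcopy with bad arguments right after gethostbyname) can be guarded
against along the following lines. This is only a minimal sketch and
not the ISIS code: the names copy_host_addr and lookup_ipv4 are
hypothetical, it assumes IPv4 and a struct hostent that exposes
h_addr_list, and it uses memcpy where the original used bcopy.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not taken from the ISIS sources: copy the first
 * IPv4 address out of a hostent, refusing any entry that does not look
 * right. Returns 0 on success, -1 if the entry is unusable. */
int copy_host_addr(const struct hostent *hp, struct in_addr *out)
{
    assert(out != NULL);                    /* caller bug, not a lookup failure */

    if (hp == NULL)                         /* the lookup itself failed */
        return -1;
    if (hp->h_addrtype != AF_INET)          /* not an IPv4 entry */
        return -1;
    if (hp->h_length != (int)sizeof *out)   /* unexpected address size */
        return -1;
    if (hp->h_addr_list == NULL || hp->h_addr_list[0] == NULL)
        return -1;                          /* entry carries no address */

    /* Only now is the copy known to have a sane source and length. */
    memcpy(out, hp->h_addr_list[0], sizeof *out);
    return 0;
}

/* Hypothetical wrapper showing where the checks sit relative to the lookup. */
int lookup_ipv4(const char *name, struct in_addr *out)
{
    return copy_host_addr(gethostbyname(name), out);
}
```

A caller would invoke lookup_ipv4(hostname, &addr) and treat a -1
return as a reason to report the bad host entry (for example a broken
/etc/hosts line) instead of continuing into the copy.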