Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uwm.edu!zaphod.mps.ohio-state.edu!rpi!batcomputer!cornell!ken
From: ken@gvax.cs.cornell.edu (Ken Birman)
Newsgroups: comp.sys.isis
Subject: Re: isis start up probs on SUN4
Message-ID: <38015@cornell.UUCP>
Date: 2 Mar 90 01:22:17 GMT
References: <1200@swbatl.UUCP> <37894@cornell.UUCP>
Sender: nobody@cornell.UUCP
Reply-To: ken@gvax.cs.cornell.edu (Ken Birman)
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 37

In article tc@oxtrap.aa.ox.com (Tse Chih Chao) writes:
> ...
>A suggestion for the isis group is to add the timestamps for
>the proto's core dumps in the log file. Although the timestamp of the
>file will help, but not always, if there are several dumps in the file.
>
>I have been working on crash problems for running isis on a DEC 3100
>(Ultrix V2.2). After some research/experiments, it now points
>to Ultrix UDP related or name service problems (not an isis problem)....

I should probably amplify on Tse Chih's comments. Her system has the
odd property that non-ISIS messages are sometimes delivered to ISIS
UDP sockets. This happens regardless of the port numbers, and seems
to be due to a bug in the Ultrix system or, perhaps, its implementation
of something called the TCP domain service.

Tse Chih discovered, somewhat painfully, that ISIS is not very immune
to receiving random garbage on its input channels. With her help we
have found some work-arounds for ISIS V2.0, but as she points out this
one "feature" provoked quite a range of crashes -- sometimes ISIS
couldn't allocate enough memory for the incoming "message", sometimes
it couldn't reconstruct it, etc. Usually the shutdown is fairly
graceful and hence there is no core image.

I have never seen this on a non-Ultrix system, or on Ultrix on anything
but the 3100 workstation.

Two minor details: the message "Detect failure after: 60 seconds"
is just telling you the setting of the "-f" argument to protos, or
the default value for this parameter.

And, protos logs do include timestamps. If a message is logged after
a delay of more than 1 minute since the prior message, there will
always be a line "... time is now xx:xx:xx".

We'll give some thought to improving our logging facility, although
not in time for the V2.0 beta release.

Ken
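
To illustrate the logging rule described above, here is a minimal sketch
in Python. It is not the protos code: the class name GapLogger, the
GAP_SECONDS constant, and the injectable clock are all assumptions made
for illustration. It only shows the idea of writing a "... time is now"
line when more than a minute has passed since the prior message.

```python
import time

# "more than 1 minute", as described in the post
GAP_SECONDS = 60


class GapLogger:
    """Hypothetical logger, not the ISIS protos implementation."""

    def __init__(self, clock=time.time):
        self.clock = clock   # injectable so the behaviour can be tested
        self.last = None     # time of the prior message, if any
        self.lines = []      # stands in for the log file

    def log(self, message):
        now = self.clock()
        # Write a time marker only when the log has been quiet for
        # more than GAP_SECONDS since the prior message.
        if self.last is not None and now - self.last > GAP_SECONDS:
            stamp = time.strftime("%H:%M:%S", time.localtime(now))
            self.lines.append("... time is now " + stamp)
        self.last = now
        self.lines.append(message)
```

With this rule, several dumps written within the same minute share one
marker, which is consistent with the suggestion quoted above that each
core dump entry carry its own timestamp.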