Path: utzoo!attcan!uunet!wuarchive!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!cica!iuvax!uceng!krishnan
From: krishnan@uceng.UC.EDU (Ramaswamy Krishnan)
Newsgroups: comp.sys.hp
Subject: Re: 9000/370 Problems...
Message-ID: <5550@uceng.UC.EDU>
Date: 20 Jul 90 13:24:39 GMT
References: <13484@udenva.cair.du.edu>
Organization: University of Cincinnati
Lines: 148

In article <13484@udenva.cair.du.edu> news@udenva.cair.du.edu (netnews) writes:
> ...
> Since about March we have been experiencing system crashes. Generally the
> system panics about a parity error, dumps stuff to the console, then
> hangs.
>
> Our HP people here have replaced everything in the box, and we still have
> errors. It seems to be crashing during a compile (at least that's what it
> was doing last...).

Yes - similar symptoms to what we had here in May. But since the error
messages you mention seem generic for any system crash, I am not sure
ours is the same kind.

Our configuration: an 840 running 7.0 with a 7963B 0.9 GB disk box and
four 7935s (400 MB each).

It all started one fine day in May, a couple of months after we went to
7.0; I had not changed much during that period. Here is what happened:
even as I was working, the system slowed down, and after a few minutes
of such slow activity it reached a state where even my cursor wouldn't
move. A couple of minutes later the system rebooted itself. The message
in the adm file was similar to the following (sorry for listing the whole
message, but I am doing so in the hope that some HP-UX guru out there can
use it):

===============
Jul 16 12:05 trap type 15, pcsq.pcoq = 0.49434, isr.ior = 0.1c
PANIC: please wait for core dump to complete.
@(#)9245XA HP-UX (sys.A.B7.00.3L/S800) #1: Mon Oct 30 17:59:05 PST 1989
panic: (display==0xb000, flags==0x0) Data segmentation fault

PC-Offset Stack Trace (read across, most recent is 1st):
stktrc: can't find rp
0x000d0f78 0x000d1160 0x000d12ec 0x00082854 0x000790a8 0x00049434
End Of Stack

sync'ing disks (90 buffers to flush): 90 76 67 54 43 34 26 22 19 14 11 7 4 1
0 buffers not flushed
0 buffers still dirty

dumping 25165824 bytes to dev 0x207, offset 18326
...
Dump successfully completed.
Beginning I/O System Configuration.
cio_ca0 address = 8
hpib0 address = 0
disc0 lu = 0 address = 0
disc0 lu = 1 address = 1
disc0 lu = 2 address = 2
disc0 lu = 3 address = 3
mux0 lu = 0 address = 1
hpib0 address = 2
lpr0 lu = 1 address = 0
lpr0 lu = 0 address = 1
tape1 lu = 0 address = 3
tape1 lu = 1 address = 4
lpr1 lu = 2 address = 5
instr0 lu = 0 address = 6
instr0 lu = 2 address = 2
mux0 lu = 1 address = 3
lan0 lu = 0 address = 4
gpio0 lu = 0 address = 5
hpib0 address = 6
disc0 lu = 4 address = 0
disc0 lu = 5 address = 1
disc0 lu = 6 address = 2
disc0 lu = 7 address = 3
hpib0 address = 7
lpr0 lu = 3 address = 1
tape1 lu = 2 address = 3
lpr1 lu = 4 address = 5
instr0 lu = 1 address = 7
mux0 lu = 2 address = 8
mux0 lu = 3 address = 9
mux0 lu = 4 address = 10
mux0 lu = 5 address = 11
I/O System Configuration complete.
Configure called
Beginning Subsystem Initialization
nsnsipc0 initialized
nsrfa0 initialized
Subsystem Initialization Complete
Beginning Filesystem Initialization
ufs initialized
nfs initialized
Filesystem Initialization Complete
@(#)9245XA HP-UX (sys.A.B7.00.3L/S800) #1: Mon Oct 30 17:59:05 PST 1989
real mem = 25165824
lockable mem = 17342464
avail mem = 19243008
using 614 buffers containing 2514944 bytes of memory
===============

So basically it crashed because of a data segmentation fault and rebooted
itself.
Well, I found that every morning when this happened, pathalias was
running. pathalias should have finished in 5 minutes at night, but would
carry on until morning. That was a hint that pathalias might indeed be the
problem, as it seemed to be stuck somewhere and was just hogging memory.
So I replaced pathalias with a new version, and the system stopped
crashing. Yes, I took a core dump, along with the pathalias version we
were running and the maps, and shipped them to HP about 2 months ago.
They have yet to call us back.

Incidentally, the crash occurred again this week (as the log above shows),
and this time it was not pathalias; someone was running a large program.
So I feel that it has something to do with memory utilization, not
hardware. Are any HP-UX gurus listening who can shed some light (or at
least someone who can help build up my confidence in HP support)?

> The one thing that seems to happen 90% of the time after the crash: the LED
> marked 1 on the 7963 flashes constantly. Does this mean anything?

Hmm.. I did not notice that, but could it just be that the disk is not
clean and/or is being fsck'd during the reboot?

> Anyway we are getting near the end of the rope on this box, we've replaced
> the power coming in (which HP keeps insisting is our problem...)

I wouldn't spend a dime on that power stuff if I were you; it could be
another wild-goose chase that the response center folks came up with.
Though we have had some help from the HP folks on this net at times, I
guess we haven't yet chanced upon 'that right person' in the response
center who would convince me that 'they know their bugs'.

> Anyone have any similar experiences???
> ----------------------
> Randy Welch UUCP : ...!ncar!scicom!bldr!randy (work)

Thanks in advance for any more light some HP-UX guru can shed on this.
--
Ramaswamy Krishnan          Krishnan@UC.EDU    (ARPA)
College of Engineering      uceng!krishnan     (UUCP)
Univ. of Cincinnati         krishnan@ucbeh     (BITNET)