Path: utzoo!attcan!uunet!wuarchive!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!cica!iuvax!uceng!krishnan
From: krishnan@uceng.UC.EDU (Ramaswamy Krishnan)
Newsgroups: comp.sys.hp
Subject: Re: 9000/370 Problems...
Message-ID: <5550@uceng.UC.EDU>
Date: 20 Jul 90 13:24:39 GMT
References: <13484@udenva.cair.du.edu>
Organization: University of Cincinnati
Lines: 148

In article <13484@udenva.cair.du.edu> news@udenva.cair.du.edu (netnews) writes:
> ...
> Since about March we have been experiencing system crashes.  Generally the
> system panics about a parity error, dumps stuff to the console, then
> hangs.
>
> Our hp people here have replaced everything in the box, and we still have
> errors.  It seems to be crashing during a compile ( at least that's what it
> was doing last... ).

Yes - similar symptoms to what we had here in May.  But since the error
messages you mention seem generic for a system crash, I am not sure if
it is the same problem.

Our configuration: an 840 running HP-UX 7.0 with a 7963B 0.9 Gig disk box
                   and four 7935s (400 Meg each).

It all started to happen one fine day in May - a couple of months after
we went to 7.0 - I had not changed anything much during that period.

Here is what happened:

Even as I was working, the system slowed down - and after a few minutes
of such slow activity, it came to a state where even my cursor wouldn't
move.  And after a couple of minutes the system rebooted itself.
The message in the adm file was similar to this (sorry for listing the whole
message - but I am doing so in the hope that some HP-UX guru may use it):

===============
Jul 16 12:05
trap type 15, pcsq.pcoq = 0.49434, isr.ior = 0.1c

PANIC:  please wait for core dump to complete.
@(#)9245XA HP-UX (sys.A.B7.00.3L/S800) #1: Mon Oct 30 17:59:05 PST 1989
panic: (display==0xb000, flags==0x0) Data segmentation fault

PC-Offset Stack Trace (read across, most recent is 1st):
stktrc: can't find rp
  0x000d0f78  0x000d1160  0x000d12ec  0x00082854  0x000790a8  0x00049434
End Of Stack

sync'ing disks (90 buffers to flush): 90 76 67 54 43 34 26 22 19 14 11 7 4 1
0 buffers not flushed
0 buffers still dirty

dumping 25165824 bytes to dev 0x207, offset 18326 ...
Dump successfully completed.
Beginning I/O System Configuration.
cio_ca0 address = 8
   hpib0 address = 0
      disc0 lu = 0 address = 0
      disc0 lu = 1 address = 1
      disc0 lu = 2 address = 2
      disc0 lu = 3 address = 3
   mux0 lu = 0 address = 1
   hpib0 address = 2
      lpr0 lu = 1 address = 0
      lpr0 lu = 0 address = 1
      tape1 lu = 0 address = 3
      tape1 lu = 1 address = 4
      lpr1 lu = 2 address = 5
      instr0 lu = 0 address = 6
      instr0 lu = 2 address = 2
   mux0 lu = 1 address = 3
   lan0 lu = 0 address = 4
   gpio0 lu = 0 address = 5
   hpib0 address = 6
      disc0 lu = 4 address = 0
      disc0 lu = 5 address = 1
      disc0 lu = 6 address = 2
      disc0 lu = 7 address = 3
   hpib0 address = 7
      lpr0 lu = 3 address = 1
      tape1 lu = 2 address = 3
      lpr1 lu = 4 address = 5
      instr0 lu = 1 address = 7
   mux0 lu = 2 address = 8
   mux0 lu = 3 address = 9
   mux0 lu = 4 address = 10
   mux0 lu = 5 address = 11
I/O System Configuration complete.
Configure called
Beginning Subsystem Initialization
   nsnsipc0 initialized
   nsrfa0 initialized
Subsystem Initialization Complete
Beginning Filesystem Initialization
   ufs initialized
   nfs initialized
Filesystem Initialization Complete
@(#)9245XA HP-UX (sys.A.B7.00.3L/S800) #1: Mon Oct 30 17:59:05 PST 1989
real mem = 25165824
lockable mem = 17342464
avail mem = 19243008
using 614 buffers containing 2514944 bytes of memory

===============

So basically it crashed because of some data segmentation fault and
rebooted itself.
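
For anyone who wants to cross-check the memory figures in the log above,
here is a rough sketch in Python.  It is not anything HP supplies; the
function name summarize and the LOG string (a few lines copied from the
console output) are just for illustration.

```python
import re

# A few lines copied verbatim from the console log above.
LOG = """\
dumping 25165824 bytes to dev 0x207, offset 18326 ...
real mem = 25165824
lockable mem = 17342464
avail mem = 19243008
using 614 buffers containing 2514944 bytes of memory
"""

MB = 1024 * 1024


def summarize(log):
    """Pull the memory figures out of the boot messages (hypothetical helper)."""
    facts = {}
    # "real mem = N", "lockable mem = N", "avail mem = N" are byte counts.
    for name in ("real", "lockable", "avail"):
        m = re.search(rf"^{name} mem = (\d+)$", log, re.MULTILINE)
        if m:
            facts[f"{name}_mb"] = int(m.group(1)) / MB
    # Size of the crash dump written before the reboot.
    m = re.search(r"dumping (\d+) bytes", log)
    if m:
        facts["dump_mb"] = int(m.group(1)) / MB
    # Average size of one buffer in the buffer cache.
    m = re.search(r"using (\d+) buffers containing (\d+) bytes", log)
    if m:
        count, total = int(m.group(1)), int(m.group(2))
        facts["buffer_bytes"] = total / count
    return facts


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value:g}")
```

The numbers work out to 24 Meg of real memory, a dump of exactly the same
size (so the whole of memory was dumped), and buffers of 4096 bytes each.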

Well, I found that every morning this happened, pathalias was still
running - it should have finished in 5 minutes at night, but would
carry on until morning.  That was the hint that pathalias might indeed
be the problem, as it seemed to be stuck somewhere and was just hogging
memory.

So I replaced pathalias with a new version and the system stopped crashing.
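
One way to keep a runaway nightly job like this from hogging the machine
until morning is to run it under a time limit.  The following is only a
sketch in Python, not what we actually did here; run_with_time_limit is a
made-up name, and the command to run is taken from the command line rather
than assuming any particular pathalias invocation.

```python
import subprocess
import sys


def run_with_time_limit(cmd, limit_seconds):
    """Run cmd; kill it if it is still running after limit_seconds.

    Returns (True, exit_status) if the job finished in time,
    or (False, None) if it had to be killed.
    """
    try:
        # subprocess.run kills the child and reaps it when the timeout expires.
        done = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        return (False, None)
    return (True, done.returncode)


if __name__ == "__main__":
    # Usage (hypothetical): python limit.py <command> [args...]
    # 300 seconds matches the "done in 5 mins" expectation above.
    finished, status = run_with_time_limit(sys.argv[1:], 300)
    if not finished:
        print("job exceeded its time limit and was killed", file=sys.stderr)
        sys.exit(1)
    sys.exit(status)
```

This only caps run time; it does nothing about how much memory the job
grabs while it is running, and it does not explain why the kernel panics.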

Yes - I took a core dump along with the pathalias version we were running
and also the maps and shipped them to HP about 2 months back.  They are
yet to call us back.  Incidentally, the crash occurred again this week
(as the log above shows) and this time it was not pathalias - someone
was running a large program.

So I feel that it is something to do with memory utilization - not hardware.

Are any HP-UX gurus listening who can shed some light (or at least someone
who can restore my confidence in HP support)?

> The one thing that seems to happen 90% of the time after the crash the LED
> marked 1 on the 7963 flashes constantly.  Does this mean anything?

Hmm.. I did not notice that - but could it be just that the disk is not
clean and/or is getting fsck'd when rebooting?

> Anyway we are getting near the end of the rope on this box, we've replaced
> the power coming in ( Which HP keeps insisting that's our problem... )

I wouldn't spend a dime on that power stuff if I were you - it could be
another goose chase that the response center folks had to come up with.

Though we have had some help from the HP folks on this net at times, I
guess we haven't yet chanced upon 'that right person' in the response
center who would boost my confidence that 'they know their bugs'.


> Anyone have any similar experiences???
> ----------------------
> Randy Welch               UUCP    :  ...!ncar!scicom!bldr!randy  (work)


Thanks in advance for any more light some HP-UX guru can shed on this.

--
Ramaswamy Krishnan				Krishnan@UC.EDU  (ARPA)
College of Engineering				uceng!krishnan   (UUCP)
Univ. of Cincinnati				krishnan@ucbeh   (BITNET)