Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!uunet!sco!rogerk
From: rogerk@sco.COM (Roger Knopf 5502)
Newsgroups: comp.unix.xenix
Subject: Re: PANIC - Non-recoverable kernal page fault...
Keywords: sco xenix 2.2.3
Message-ID: <2821@scorn.sco.COM>
Date: 1 Mar 90 02:46:07 GMT
References: <77688@tut.cis.ohio-state.edu>
Sender: news@sco.COM
Reply-To: rogerk@sco.COM (Roger Knopf 5502)
Organization: The Santa Cruz Operation, Inc.
Lines: 78


In article <77688@tut.cis.ohio-state.edu> Mowgli Assor <mowgli@cis.ohio-state.edu> writes:
>Well, here it goes again! <Sigh> Recently, we have had a rash of page fault
>panics (4 in 5 days). We have not changed anything for weeks! So, can someone
>explain a few things to me? First off, what each of the various codes below
>stands for (I would presume PC is program counter, KSP is some form of stack
>pointer, but as for the rest - ?). I do not know much about the Intel arch-
>itecture, as far as what specific registers do & what the various traps stand
>for. Any information here would be appreciated.
 
You got PC and KSP right. Usually (and for your purposes) none of the
rest are important; only a driver writer would care. PC is really
the important one.

>The machine we are using is an IBM PS/2 Model 80, 6Meg RAM, ~100Meg HD, w/3
>Digiport smart modem boards. Thursday, we got the following error messages:
>
>Trap 0000000E in SYSTEM     error = 00000000
>   pc  = 00000020:0001D40D
>Panic - Non-recoverable Kernal Page Fault
>
>Trap 0000000E in SYSTEM     error = 00000000
>   pc  = 00000020:0001D40D
>Panic - Non-recoverable Kernal Page Fault
>
>Now, the fact that it seems to die with the PC in the same place each time
>makes me very suspicious. Of course, it is likely that only SCO can tell me
>where the OS is dying (as far as what program causes it).
 
You have discovered an important clue. You are in luck, though, because
you too can find out where it is dying:

1. Write down the PC (you did that).
2. Bring up your system (OK, I know this is obvious but....)
3. Type the command "adb /xenix"
   You will get the adb prompt "* "
4. Type the command (use the offset from the PC in _your_ register dump):

	1d40d?ia

   You should get something like:

	_iostart+45 	ld ax,ax

   If it looks more like:

	00020:0001d40d	ld ax,ax

   then your kernel is stripped. Edit the file "/usr/sys/conf/link_xenix"
   and take the "strip xenix" line out, make and install a new kernel,
   then wait for the problem to happen again and repeat this procedure.

What the adb output tells you is that iostart (in this example) is the
routine it died in. With any luck it will give you something that is
recognizable and lets you localize the problem to either the SCO kernel
or some add-on driver. If you can call SCO Support and tell them this
up front, whoever you talk to will love you forever. It makes it so
much easier to figure out what's going on.
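
If you have several dumps to go through, the offset used in step 4 can
be pulled out of the panic text mechanically. What follows is only a
rough sketch, and it assumes you run it on some other machine that has
Python; the name adb_command is made up for illustration and nothing
here ships with Xenix. It takes the part of the pc after the colon,
drops the leading zeros, and appends "?ia":

```python
import re

# Matches the "pc = SEGMENT:OFFSET" line of a panic dump like the one quoted.
PC_LINE = re.compile(r"pc\s*=\s*([0-9A-Fa-f]+):([0-9A-Fa-f]+)")

def adb_command(panic_text):
    """Return the adb request for the first pc found, or None if there is none."""
    match = PC_LINE.search(panic_text)
    if match is None:
        return None
    offset = int(match.group(2), 16)   # offset is the part after the colon
    return "%x?ia" % offset            # lowercase hex, no leading zeros

dump = """Trap 0000000E in SYSTEM     error = 00000000
   pc  = 00000020:0001D40D
Panic - Non-recoverable Kernal Page Fault"""

print(adb_command(dump))   # prints 1d40d?ia
```

You still have to type the result at the adb prompt yourself.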

>Has anyone else ever had a problem remotely like this? As I said, we have not
>changed any of our hardware or software setup within the last month, & yet
>this only started happening last Thursday. Any help would be greatly ap-
>preciated!

Yeah, we had this on a _very_important_ production system in house. It
started when the air conditioning broke and cleared up after it was
fixed. Clearly HW related. That doesn't mean it is always HW related,
though, especially when the PC is always the same. PC always the
same is almost always software.
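
As a rough illustration of that rule of thumb (a sketch only, with a
made-up function name, and again assuming Python on some other
machine): collect the pc values from your panics and see whether they
all agree. Treating a varying pc as a hint toward hardware is my own
extension of the rule, not something adb or the kernel will tell you.

```python
def triage(pcs):
    """Apply the rule of thumb to a list of "SEGMENT:OFFSET" pc strings."""
    distinct = set(pc.lower() for pc in pcs)   # ignore hex letter case
    if len(pcs) < 2:
        return "not enough panics to compare"
    if len(distinct) == 1:
        return "same pc every time: suspect software"
    # Assumption: a wandering pc makes hardware more likely, not certain.
    return "pc varies: hardware is a candidate"

print(triage(["00000020:0001D40D", "00000020:0001D40D"]))
```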

Hope this helps,

Roger Knopf
SCO Consulting Services
---
-- 

"His potential clients were always giving him the business."
	--Robert Thornton