Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!oliveb!pyramid!prls!philabs!micomvax!zap!fortin
From: fortin@zap.UUCP (Denis Fortin)
Newsgroups: comp.unix.microport
Subject: --- A System V/AT crash ---
Summary: At 10MHz, System V/AT crashes regularly at clkint1+9...
Message-ID: <445@zap.UUCP>
Date: 16 May 88 20:31:51 GMT
Reply-To: fortin@zap.UUCP (Denis Fortin)
Organization: (none), Montreal QC, Canada
Lines: 69
Posted: Mon May 16 16:31:51 1988

Here is a Microport System V/AT, hardware-related question for all of the
`crash dump' fans out there in Usenet-land...

	A while ago, I complained that my Microport System V/AT would crash
when running at 10MHz on my machine.  Someone suggested the following:

> To find the routine in the kernel which caused the panic, you do this:
> 
> 	nm -x /system5 >/tmp/xxxxx  (dump list of kernel to file)
> 
> Now, go looking for the address you panic'd at.  You put the 'cs' and 'ip'
> values together to get this number (code segment & instruction pointer).  
> In this case, you get 0x0208005807.
> 
> Find the routine which has the largest address LESS THAN the panic address.
> This is the routine which was executing when the system crashed.
> [...]
> If the routine is NOT 'rmsd' then please post the name of the routine 
> as it's probably a new one... and might give all us net.gurus some ideas!

Well, after many tribulations, I finally got around to trying my system
at 10MHz for a reasonable length of time *without* the memory card in
it.  It runs much better than before (i.e. no more NMI message and it
doesn't crash after 30 seconds), but it still DOES seem to want to
crash all over.

After running for a while (anywhere between 5 minutes and an hour), I
fairly consistently get a crash dump very similar to the following:

	user=0x10
	cs=0x200 ds=0x220 es=0x220 ss=0x200 di=0x0 si=0x5BE0
	bp=0x37C bx=0x0 dx=0xA1 cx=0x0 ax=0x7 ip=0xEAF flags=0x246
	trap type 0xD
	err = 0x1173
	stack frame address = 2208B6A
	400, 8, 0, FFFF, 0, 0, 0, 3ff, 11, 200
	0, 88, 89e2, 220, 400, a, 0, 200, 0, 0
	0, 400, 11, 200, 0, aa, 8a62, 220, 88, 3
	3f9, 0, 1a9, 6, 3f9, 1, 204, 5, 3f9, 2
	26c, 6, 3f9, 3, 295, 1, 3f9, 4, 2af, 5
	3f9, 5, 2b7, 1, 3f9, 6, 2dc, 2, 3f9, 7
	307, 4, 3fa, 0, 30b, 6, 3fa, 1, 326, 1
	3fa, 2, 350, 0, 3fa, 3, 358, 0, 3fa, 4
	359, 0, 3fa, 5, 38c, 0, 3fa, 6, 392, 4
	3fa, 7, 3a9, 0, 3fb, 0, 3aa, 7, 3fb, 1

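So...  I did what was suggested.  Putting cs=0x200 and ip=0xEAF together
gives 02000eaf, and according to `nm -x /system5`, the nearest symbol at or
below that address is 02000ea6, which is "clkint1" in "trap.s".  In other
words, the crash is at clkint1+9.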
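
For anyone who wants to repeat the lookup without eyeballing the listing,
here is a rough Python sketch of the same procedure.  It is only a sketch
under assumptions: it supposes each line of the `nm -x` output has the symbol
name first and then an 8-digit hex value, separated by `|` or blanks (check
your own listing), and it only compares symbols whose upper 16 bits match cs,
since an address in another segment would not be meaningful.  The function
names are made up for illustration.

```python
import re

# An 8-digit hex value, with or without a leading "0x" (assumed nm -x layout).
HEX = re.compile(r"^(?:0x)?([0-9a-fA-F]{8})$")


def panic_address(cs, ip):
    """Join the 16-bit cs and ip values into one 32-bit number."""
    return (cs << 16) | ip


def parse_symbols(lines):
    """Return sorted (address, name) pairs from nm-style lines.

    Assumption: the name is the first non-hex field on the line and the
    value is the first 8-digit hex field.  Other lines are skipped.
    """
    symbols = []
    for line in lines:
        fields = [f for f in re.split(r"[|\s]+", line.strip()) if f]
        name = value = None
        for field in fields:
            match = HEX.match(field)
            if match and value is None:
                value = int(match.group(1), 16)
            elif not match and name is None:
                name = field
        if name is not None and value is not None:
            symbols.append((value, name))
    return sorted(symbols)


def nearest_symbol(symbols, addr):
    """Find the symbol with the largest address <= addr in the same segment.

    Returns (name, offset) or None if nothing qualifies.
    """
    best = None
    for value, name in symbols:  # sorted, so the last match is the largest
        if value >> 16 == addr >> 16 and value <= addr:
            best = (name, addr - value)
    return best


def lookup(path, cs, ip):
    """Read a saved nm listing and format the result as name+offset."""
    with open(path) as listing:
        hit = nearest_symbol(parse_symbols(listing), panic_address(cs, ip))
    return None if hit is None else "%s+%#x" % hit
```

With the listing saved in /tmp/xxxxx as suggested above, calling
lookup("/tmp/xxxxx", 0x200, 0xEAF) should give "clkint1+0x9" on my kernel,
provided the listing really has the layout assumed here.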

Does this give you net.gurus any idea what might be wrong?  

						Denis, hopeful.

PS. How does one go about looking at the code around "clkint1" in the
    kernel?  sdb?  crash?  adb isn't there...

PPS. Is there any way to force a "crash dump" that might be investigated
     with "crash" afterwards?

--
Denis Fortin
fortin@zap.uucp                         | Real-Time Systems Group
philabs!micomvax!zap!fortin             | CAE Electronics Ltd
fortin%zap.uucp@Larry.McRCIM.McGill.EDU | The opinions expressed above are mine