Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!oliveb!pyramid!prls!philabs!micomvax!zap!fortin
From: fortin@zap.UUCP (Denis Fortin)
Newsgroups: comp.unix.microport
Subject: --- A System V/AT crash ---
Summary: At 10MHz, System V/AT crashes regularly at clkint1+9...
Message-ID: <445@zap.UUCP>
Date: 16 May 88 20:31:51 GMT
Reply-To: fortin@zap.UUCP (Denis Fortin)
Organization: (none), Montreal QC, Canada
Lines: 69
Posted: Mon May 16 16:31:51 1988

Here is a Microport System V/AT, hardware-related question for all of the
`crash dump' fans out there in Usenet-land...

Remember a while ago, I was complaining about how my Microport System V/AT
would crash when running at 10MHz on my machine?  Well, someone suggested
the following:

> To find the routine in the kernel which caused the panic, you do this:
>
>         nm -x /system5 >/tmp/xxxxx      (dump list of kernel to file)
>
> Now, go looking for the address you panic'd at.  You put the 'cs' and 'ip'
> values together to get this number (code segment & instruction pointer).
> In this case, you get 0x0208005807.
>
> Find the routine which has the largest address LESS THAN the panic address.
> This is the routine which was executing when the system crashed.
> [...]
> If the routine is NOT 'rmsd' then please post the name of the routine
> as it's probably a new one... and might give all us net.gurus some ideas!

Well, after many tribulations, I finally got around to trying my system at
10MHz for a reasonable length of time *without* the memory card in it.  It
runs much better than before (i.e. no more NMI message and it doesn't crash
after 30 seconds), but it still DOES seem to want to crash all over.

After running for a while (anywhere between 5 minutes and an hour), I
fairly consistently get a crash dump very similar to the following:

user=0x10 cs=0x200 ds=0x220 es=0x220 ss=0x200 di=0x0 si=0x5BE0 bp=0x37C
bx=0x0 dx=0xA1 cx=0x0 ax=0x7 ip=0xEAF flags=0x246
trap type 0xD err = 0x1173
stack frame address = 2208B6A
400, 8, 0, FFFF, 0, 0, 0, 3ff, 11, 200
0, 88, 89e2, 220, 400, a, 0, 200, 0, 0
0, 400, 11, 200, 0, aa, 8a62, 220, 88, 3
3f9, 0, 1a9, 6, 3f9, 1, 204, 5, 3f9, 2
26c, 6, 3f9, 3, 295, 1, 3f9, 4, 2af, 5
3f9, 5, 2b7, 1, 3f9, 6, 2dc, 2, 3f9, 7
307, 4, 3fa, 0, 30b, 6, 3fa, 1, 326, 1
3fa, 2, 350, 0, 3fa, 3, 358, 0, 3fa, 4
359, 0, 3fa, 5, 38c, 0, 3fa, 6, 392, 4
3fa, 7, 3a9, 0, 3fb, 0, 3aa, 7, 3fb, 1

So... I did what was suggested, and according to `nm -x /system5`, the
closest thing to 02000eaf is 02000ea6, and that is "clkint1" in "trap.s".

Does this give you net.gurus any idea what might be wrong?

                                                        Denis, hopeful.

PS.  How does one go about looking at the code around "clkint1" in the
     kernel?  sdb?  crash?  adb isn't there...

PPS. Is there any way to force a "crash dump" that might be investigated
     with "crash" afterwards?
-- 
Denis Fortin  fortin@zap.uucp           | Real-Time Systems Group
philabs!micomvax!zap!fortin             | CAE Electronics Ltd
fortin%zap.uucp@Larry.McRCIM.McGill.EDU | The opinions expressed above are mine
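
For anyone who wants to repeat the symbol lookup described above without
eyeballing the `nm -x` listing, here is a rough Python sketch of the
arithmetic. It assumes cs and ip are simply concatenated as two 16-bit
values (which is how cs=0x200 and ip=0xEAF give 02000eaf), and that the
symbol table is already available as (address, name) pairs. The function
names and the two neighbouring symbols are made up for illustration; only
the clkint1 address comes from the dump above. Parsing the real `nm -x`
output is left out because its exact column layout is not shown here.

```python
import bisect


def panic_address(cs, ip):
    """Join 16-bit cs and ip into one number, cs in the high half (assumed)."""
    return (cs << 16) | ip


def nearest_symbol(symbols, addr):
    """Return (name, offset) for the symbol with the largest address <= addr.

    symbols is a list of (address, name) pairs; returns None when every
    symbol lies above addr.
    """
    ordered = sorted(symbols)
    addresses = [a for a, _ in ordered]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None
    base, name = ordered[i]
    return name, addr - base


# Only clkint1 is taken from the post; the other two entries are
# hypothetical placeholders so the search has something to skip over.
symbols = [
    (0x02000E00, "hypothetical_before"),
    (0x02000EA6, "clkint1"),
    (0x02000F00, "hypothetical_after"),
]

addr = panic_address(0x200, 0xEAF)
result = nearest_symbol(symbols, addr)
print("%08x" % addr, result)
```

With the values from the dump this lands on clkint1 at offset 9, matching
the "clkint1+9" in the Summary line.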