Path: utzoo!attcan!uunet!dino!sharkey!msuinfo!midway!iitmax!thssdwv
From: thssdwv@iitmax.IIT.EDU (David William Vrona)
Newsgroups: comp.unix.i386
Subject: Re: "PANIC: kernel mode trap. Type 0x0000000E" msg in 386/ix 2.0.2 ?????
Keywords: unix, hardware
Message-ID: <3986@iitmax.IIT.EDU>
Date: 19 Jul 90 01:58:27 GMT
References: <9480@bunny.GTE.COM> <45326@ism780c.isc.com>
Reply-To: thssdwv@iitmax.iit.edu (David William Vrona)
Organization: Illinois Institute of Technology
Lines: 43

In article <45326@ism780c.isc.com> darryl@ism780c.UUCP (Darryl Richman) writes:
>In article <9480@bunny.GTE.COM> jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
>"My machine has been crashing about twice daily for the last week or so.
>"The msg in the subject line shows up with all the register contents just
>"before it crashes. I have contacted my vendor, they contacted ISC, and
>"all they were able to tell me was that the problem is a hardware
>"error. Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
>"everything appears to be OK. Does anyone have any idea of what a type
>"0x0000000E error means? This might help me to narrow down the
>"alternatives. Any pointers will be appreciated. Thanks,
>
>You can do a bit of tracing yourself to see what is going on. A trap E
>is a page fault--which usually means that there is a bad pointer being
>followed in the kernel. You can discover what routine within the kernel
>is causing the problem by noting the EIP value in the register dump,
>and after rebooting, do "nm -vexp /unix | sort >/tmp/foo". Then edit
>/tmp/foo and look for the first 5 digits or so of the EIP value...the
>greatest address less than or equal to your EIP value is the routine
>that was executing.
>
>An even easier way to do this is to configure your kernel
>with the kernel debugger. When the panic occurs, you will drop into the
>debugger. Type "stack" to see a stack backtrace. You will also see the
>instruction that caused the fault. This will give you much more information
>with which to get an answer out of your reseller, and ultimately,
>ISC.
>
>A "hardware error" means nothing. Either your vendor misunderstood the
>reply or hasn't pushed very hard on your behalf. Unix tends to be a
>much harder test of the hardware than the vendor's diagnostics; we had
>a case where a certain vendor was shipping cards that worked fine under
>DOS and passed all of their tests just fine, but would never send an
>interrupt; needless to say, Unix found this out quickly. When discussing
>a problem like this, it is extremely important to pass along as much
>information about your configuration as possible--all of the boards,
>their interrupt and DMA numbers, how much memory, the make, model, and
>geometry of the disks (if they are involved), whose motherboard, any
>coprocessors, and so on. All of these things tend to interact.

It's a hardware problem. The exact same thing happened to me. It took me
two months to realize it was a noisy (electrically, that is) power
supply. Borrow a supply from a friend before you knock yourself out
with all the other stuff.
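
For anyone who does want to try the symbol lookup quoted above, the rule "greatest address less than or equal to your EIP value" can be sketched in a few lines of Python. This is only a rough sketch under assumptions, not anything from ISC: it assumes each line of /tmp/foo has the form "hex-address type name", which may not match what "nm -vexp" prints on your system, and the function names and sample symbols are made up for illustration.

```python
import bisect


def parse_symbols(lines):
    """Turn 'hex-address type name' lines into a sorted list of (address, name).

    ASSUMPTION: three whitespace-separated fields per defined symbol.
    Lines with any other shape (e.g. undefined symbols) are skipped.
    """
    symbols = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            address = int(fields[0], 16)  # accepts an optional 0x prefix
        except ValueError:
            continue
        symbols.append((address, fields[2]))
    symbols.sort()
    return symbols


def routine_for(eip, symbols):
    """Return the name of the symbol with the greatest address <= eip, or None."""
    addresses = [address for address, _ in symbols]
    index = bisect.bisect_right(addresses, eip) - 1
    if index < 0:
        return None  # EIP lies below every known symbol
    return symbols[index][1]


# Hypothetical sample data, standing in for the contents of /tmp/foo.
SAMPLE = [
    "0xd0010000 T alpha",
    "0xd0010400 T beta",
    "0xd0010900 T gamma",
    "U some_undefined_symbol",
]

if __name__ == "__main__":
    table = parse_symbols(SAMPLE)
    print(routine_for(0xD0010420, table))
```

To run it against a real dump, read /tmp/foo with open("/tmp/foo") and pass the lines to parse_symbols, then call routine_for with the EIP from the panic message. Note that the sketch does not distinguish text symbols from data symbols, so a match on a data symbol would itself be a hint that EIP is bogus.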