Path: utzoo!censor!geac!torsqnt!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!mit-eddie!uw-beaver!ubc-cs!kiwi!zaphod!parker
From: parker@zaphod.Berkeley.EDU (Ross Parker)
Newsgroups: comp.unix.ultrix
Subject: System crashing.. HELP!
Message-ID: <2080@kiwi.mpr.ca>
Date: 6 Mar 90 12:40:37 GMT
Sender: news@eric.mpr.ca
Reply-To: parker@zaphod.Berkeley.EDU (Ross Parker)
Lines: 154

System:

Microvax-II with Emulex QD-32 disk controller and two Fujitsu Eagle
disk drives. 13 Mb memory. Running Ultrix 3.0. System supports perhaps
15-20 interactive logins, and perhaps 20 PCs connected via Sun's PC-NFS.
The PCs access files using standard NFS on the Microvax.

Symptom:

One user on a PC can try to bring up a particular file under WordPerfect
(version 5.0 or 5.1) on the PC, and, without fail, cause the Microvax to
instantly crash.

This problem just started happening. No system changes, either hardware
or software, have taken place for a number of months. The user does not
have a problem with any other files, nor does any other user cause the
system to die. The system is also used for NFS operations from other
Vaxen, and from some Sun systems, and no problems occur.

The crash symptoms are (on the console):

    Trap Type 9, code = 803771ff, pc = 80034ca0
    panic: Protection fault

and in the error log:

********************************* ENTRY 29. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                   ERROR EVENT
OS EVENT TYPE          104.   CONTROLLER ERROR
SEQUENCE NUMBER          0.
OPERATING SYSTEM              ULTRIX 32
OCCURRED/LOGGED ON            Mon Mar 5 13:44:28 1990 PST
OCCURRED ON SYSTEM            waters
SYSTEM ID         x08000000
SYSTYPE REG.      x01010000
                              FIRMWARE REV = 1.
PROCESSOR TYPE                KA630

----- UNIT INFORMATION -----

UNIT CLASS                    ADAPTER/CONTROLLER
UNIT TYPE                     UDA50A
CONTROLLER NO.
UNIT NO.                 0.
ERROR SYNDROME                CONTROLLER ERROR

********************************* ENTRY 30. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                   ERROR EVENT
OS EVENT TYPE          200.   PANIC
SEQUENCE NUMBER          5.
OPERATING SYSTEM              ULTRIX 32
OCCURRED/LOGGED ON            Mon Mar 5 13:42:26 1990 PST
OCCURRED ON SYSTEM            waters
SYSTEM ID         x08000000
SYSTYPE REG.      x01010000
                              FIRMWARE REV = 1.
PROCESSOR TYPE                KA630
PANIC MESSAGE                 Protection fault

********************************* ENTRY 31. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                   ERROR EVENT
OS EVENT TYPE          109.   EXCEPTION/FAULT
SEQUENCE NUMBER          4.
OPERATING SYSTEM              ULTRIX 32
OCCURRED/LOGGED ON            Mon Mar 5 13:42:26 1990 PST
OCCURRED ON SYSTEM            waters
SYSTEM ID         x08000000
SYSTYPE REG.      x01010000
                              FIRMWARE REV = 1.
PROCESSOR TYPE                KA630

----- UNIT INFORMATION -----

ERROR SYNDROME                PROTECTION FAULT

********************************* ENTRY 32. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                   OPERATIONAL EVENT
OS EVENT TYPE          250.   ASCII MSG
SEQUENCE NUMBER          7.
OPERATING SYSTEM              ULTRIX 32
OCCURRED/LOGGED ON            Mon Mar 5 13:42:40 1990 PST
OCCURRED ON SYSTEM            waters
SYSTEM ID         x08000000
SYSTYPE REG.      x01010000
                              FIRMWARE REV = 1.
PROCESSOR TYPE                KA630
MESSAGE                       done

********************************* ENTRY 33. *********************************

----- EVENT INFORMATION -----

EVENT CLASS                   OPERATIONAL EVENT
OS EVENT TYPE          250.   ASCII MSG
SEQUENCE NUMBER          6.
OPERATING SYSTEM              ULTRIX 32
OCCURRED/LOGGED ON            Mon Mar 5 13:42:39 1990 PST
OCCURRED ON SYSTEM            waters
SYSTEM ID         x08000000
SYSTYPE REG.      x01010000
                              FIRMWARE REV = 1.
PROCESSOR TYPE                KA630
MESSAGE                       syncing disks...

Now this certainly looks like a probable bad controller, right? Well,
we've replaced the controller with a new one, and get an identical
problem... down to identical register values in the register dump.

We've also run DEC diagnostics and the system passes with no problems,
other than (and this I'm mildly worried about) the disk controller...
however, I believe the controller is failing because it's a non-DEC
controller, and DEC's diags are expecting a KDA50.

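
For anyone comparing the timestamps in the five log entries above, here is a minimal Python sketch that puts them in chronological order. It is not part of any Ultrix tooling: the `ENTRIES` table and the helper names `when`, `chronological`, and `seconds_after_panic` are made up for illustration, and the values are hand-copied from the log with the PST suffix dropped.

```python
from datetime import datetime

# (entry, sequence number, event type, timestamp), hand-copied from the log above.
ENTRIES = [
    (29, 0, "CONTROLLER ERROR", "Mon Mar 5 13:44:28 1990"),
    (30, 5, "PANIC", "Mon Mar 5 13:42:26 1990"),
    (31, 4, "EXCEPTION/FAULT", "Mon Mar 5 13:42:26 1990"),
    (32, 7, "ASCII MSG: done", "Mon Mar 5 13:42:40 1990"),
    (33, 6, "ASCII MSG: syncing disks...", "Mon Mar 5 13:42:39 1990"),
]

FMT = "%a %b %d %H:%M:%S %Y"  # timestamp layout once the time zone is removed


def when(stamp):
    """Parse one OCCURRED/LOGGED ON timestamp."""
    return datetime.strptime(stamp, FMT)


def chronological(entries):
    """Sort by timestamp, breaking ties with the sequence number."""
    return sorted(entries, key=lambda e: (when(e[3]), e[1]))


def seconds_after_panic(entries, entry_no):
    """Seconds between the PANIC entry and the entry numbered entry_no."""
    panic = next(e for e in entries if e[2] == "PANIC")
    other = next(e for e in entries if e[0] == entry_no)
    return int((when(other[3]) - when(panic[3])).total_seconds())


if __name__ == "__main__":
    for entry, seq, kind, stamp in chronological(ENTRIES):
        print(f"{stamp}  seq {seq}  entry {entry}  {kind}")
    print("controller error logged",
          seconds_after_panic(ENTRIES, 29), "seconds after the panic")
```

Run as a script, this lists entries 31, 30, 33, 32 and then 29, with the controller error stamped 122 seconds after the panic.
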
The controller (the new one) passed the vendor's diags, and the diags
included scanning the disks. No problems were found anywhere. In
addition, we can read and write any file on both disks locally (not via
NFS) and no problems occur, so the problem seems more likely to lie in
the NFS path than in the disk driver code.

Perusing the resultant crash dumps has not given me much enlightenment;
however, I'm not an expert at that, and have misplaced my list of magic
incantations to have adb show anything useful. Perhaps someone can
enlighten me? Care to bite, George? I'm sure you've done this numerous
times.

If anyone can shed any light on this, it'd be *much* appreciated. This
Monday, the system went down about 7 times before this particular user
called us to say that each crash happened exactly when she tried to
access this file!

Thanks,

Ross Parker
parker@mpre.mpr.ca
(604)293-5495
uunet!ubc-cs!mpre!parker