Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!uflorida!helios.iec.ufl.edu!dkf
From: dkf@helios.iec.ufl.edu (Dan FitzPatrick)
Newsgroups: comp.sys.tahoe
Subject: HCX-9: "df" dumps core w/ "Illegal instruction"
Message-ID: <23515@uflorida.cis.ufl.EDU>
Date: 11 Jun 90 20:24:21 GMT
Sender: news@uflorida.cis.ufl.EDU
Reply-To: dkf@helios.iec.ufl.edu (Dan FitzPatrick)
Organization: UF Integrated Electronics Center
Lines: 114

SUMMARY OF PROBLEM:

Commands such as "df" and "w" fail with the following message, and
some of them also dump core:

    machine% w
    Illegal instruction

"vmstat" locks the console. Other than these *minor* problems, the
system is happily chugging away performing its file and mail server
duties.

System: HCX-9 running HCX/UX 3.0C

SUMMARY OF CORRECTIVE STEPS TAKEN:

1) The disk drive with the root file system had experienced some
   corruption recently. It was first assumed that this might have
   corrupted some of the /dev entries. The drive was reformatted and
   reloaded with a known working version of / and /usr. The problem
   persisted.

2) Ran the HCX System Level Tests (specifically sys401). The results
   of these diagnostics were:

   The "sys401" program came up with 63 errors, 62 of which had the
   same "Illegal Instruction" message and no test diagnostic message,
   i.e., the test exited before that point. However, the "fpp3" test
   exited with a "data compare error" and identified the probable
   source of the failure as the FPP hardware. It was not able to
   distinguish between the Floating Sum (FS) and Floating Multiply
   (FM) boards.

   The system was rebooted, paying careful attention to the console
   messages, and the following flashed by:

    FPP POC dsk(4,0,0,0)/fppoc ? CP FPP POC error 0004

   So, I guess this kinda pinpoints some problems with either the FS
   or FM boards of the FPP hardware, because they are not passing the
   power-on-confidence checks.
   However, the Console Processor Reference manual states that when
   this test fails, the CP assumes the FPP hardware does not exist
   (which implies that the FPP hardware is disabled). This might also
   imply that the only way to detect FPP hardware problems, other than
   running diagnostics, is by noting the above message on full boots
   or by sensing that the system is running a bit sluggishly.

   There being logical conflicts, proceed a bit further to step
   number...

4) Ran the HCX CPU and Memory Standalone Diagnostics tests, actually
   all the tests in the "fall_s" script. The results here were
   similar:

   The /fppoc test completed with an Error Code (on the control board)
   of 0x53, which implies an error with single precision floating
   point multiplies. (To avoid interpretation/documentation error: the
   actual LED values top-to-bottom were 10100011, which indicates a
   bit order of 45673210 top-to-bottom.)

   OK, so the FPP hardware at this point would be highly suspect. But
   some vague areas remain, so go one more step...

5) Physically removed the FPP hardware, and for added measure disabled
   it with the "y100" Console Processor command. Reran the HCX CPU and
   Memory Standalone Diagnostics tests, this time using the "all_s"
   script, which does not run any FPP hardware diagnostics.

   Assumption: Removing the FPP hardware required no setting of
   jumpers, DIP switches, or whatever. This was essentially verified
   with the HCX Processor System Installation Manual.

   Well, this time all the tests passed with flying colors. Full
   booted the system and it came up successfully, but the problem
   STILL REMAINS.

QUESTIONS:

1) Is physically removing the FPP hardware all that is required?
   That is, the installation manual indicates no additional steps for
   the installation of these optional products, so removal should be
   just as easy, correct? I am assuming here that on a cold boot the
   system actually tests for the presence of the hardware and enables
   it only after a successful test.
2) If the FPP hardware is not at fault, then what would be causing the
   diagnostics to indicate that it was? I would (like to) assume that
   the standalone diagnostics tests that must be passed before the
   ones that test the FPP hardware would rule out anything else like
   this.
3) Where is the actual source of the message "Illegal Instruction"?
   I have run strings on the OS and did not find it there. However,
   the System Level tests did identify it as a SIGILL signal.

If anyone has had similar experiences with this or other Tahoe
machines, or has any advice, I would very much appreciate hearing
from you.

Thanks in advance.

--Dan

-- 
Dan FitzPatrick    dkf@iec.ufl.edu
339 Larsen Hall, Integrated Electronics Center
University of Florida, Gainesville, FL 32611
(904) 392-8935
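
As a cross-check of the LED reading reported in step 4, the mapping
from the LEDs to the error code can be sketched in a few lines of
Python. This is only an illustration, not part of the HCX diagnostics:
the names LED_BIT_ORDER and decode_leds are made up here, and the bit
order is assumed to be exactly the 45673210 top-to-bottom order quoted
in the post.

```python
# Bit number displayed by each LED, listed top to bottom.
# Assumption: this is the 45673210 order quoted in step 4 of the post.
LED_BIT_ORDER = [4, 5, 6, 7, 3, 2, 1, 0]


def decode_leds(leds_top_to_bottom: str) -> int:
    """Convert a top-to-bottom LED string such as '10100011' to a code."""
    if len(leds_top_to_bottom) != len(LED_BIT_ORDER):
        raise ValueError("expected one character per LED")
    code = 0
    for led, bit in zip(leds_top_to_bottom, LED_BIT_ORDER):
        if led not in ("0", "1"):
            raise ValueError("LED states must be '0' or '1'")
        if led == "1":
            # A lit LED sets the bit that this LED position represents.
            code |= 1 << bit
    return code


if __name__ == "__main__":
    # The reading from the post: LEDs 10100011, top to bottom.
    print(hex(decode_leds("10100011")))  # prints 0x53
```

Under that assumed ordering, the reading 10100011 does come out as
0x53, which agrees with the error code given for the /fppoc test.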