Xref: utzoo comp.sys.dec:3639 comp.os.vms:28270
Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!zephyr.ens.tek.com!tektronix!sequent!upba!dsndata!unocss!dent
From: dent@unocss.unomaha.edu (dent)
Newsgroups: comp.sys.dec,comp.os.vms
Subject: Help? Dead VAX 11/750 ...
Message-ID: <3035@unocss.unomaha.edu>
Date: 24 Jul 90 03:06:51 GMT
Organization: U. of Nebraska at Omaha
Lines: 51

Hello...

The Student Chapter of ACM here at the Univ. of NE at Omaha owns a VAX 11/750, w/ 8 Megs RAM, Floating Point Accelerator, 2 RM05's, an RM03, a DMF32, and a DELUA (also a TS11, plus other misc. parts).

We had been running VMS on this system for about 4 weeks when "suddenly" we started getting errors about a corrupted memory cache. The 750 then began restarting itself at random, dropping to the monitor '>>>' prompt with the PC register displayed, along with error code 04, which is "Interrupt stack not valid or unable to read SCB". Sometimes the machine was able to re-run VMS for a little while, but eventually it did it all again.

Then, as if that weren't enough, the 750 refused to even give the monitor prompt when turned on. As it stands now, the machine prints one '%' on the console when you flip it on, and then hangs. As far as I know, it's supposed to print that 1st '%' when it starts the microcode check, and then either an error message (meaning the microcode is bad) or a 2nd '%' if it is good. Then it [normally] puts you in the monitor.

All of these problems seemed to line up chronologically with a board swap we had just performed: we took out a DZ11 board and replaced it with the DMF32 mentioned above (which does DMA, so it doesn't seem likely that it would contribute to an interrupt problem...). Because of the timing, however, we yanked the DMF32 back out, but the problem was still there.
I was flipping through the "VAX Hardware Handbook" (published 1982) and noticed this:

"Interrupting devices on the UNIBUS are directly vectored through the System Control Block (SCB)."

So I moved the UNIBUS terminator to sit directly after the memory cards, to in effect remove all of the UNIBUS activity. No change. I put the terminator back in the last UNIBUS slot, and then took out all but 1 memory card. No change. I swapped the remaining memory card with each of the other 7 in turn; no change in any case. We're kind of at the end of our rope now; it really doesn't seem like the UNIBUS has anything to do with the problem.

Let me also add that we took out all of the CPU boards and reseated the socketed chips, also with no effect. (The 750 did sit in a warehouse for a while before UNO-ACM acquired it...)

Does the 1-but-not-2 '%' indicate that the microcode test itself is faulty? Has anyone else run into this kind of "random reset" problem? Why would the problems all start after a few weeks of flawless performance?

Any help that anyone can offer would be /greatly/ appreciated by the members of UNO-ACM here! :-)

-/ Dave Caplinger /---------------------------------------------------------
 President, Student Chapter of ACM at the University of Nebraska at Omaha
 acmpres@zeus.unomaha.edu ..!uunet!unocss!dent ACMPRES@UNOMA1