Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!munnari!otc!metro!basser!plexus!physiol!johnd
From: johnd@physiol.su.oz (John Dodson)
Newsgroups: comp.unix.wizards
Subject: Re: Ultrix1.2-uVaxII crashing - Help requested
Message-ID: <59@physiol.su.oz>
Date: Sun, 16-Aug-87 18:48:31 EDT
Article-I.D.: physiol.59
Posted: Sun Aug 16 18:48:31 1987
Date-Received: Tue, 18-Aug-87 03:41:13 EDT
References: <1988@batcomputer.tn.cornell.edu>
Organization: Physiology Dept., Univ. of Sydney, NSW, Australia
Lines: 61
Summary: long memory cables on uvaxII's ? & another problem

In article <1988@batcomputer.tn.cornell.edu>, hurf@batcomputer.tn.cornell.edu (Hurf Sheldon) writes:
>
> I have been getting the appended messages preceding a crash on
> a uVaxII running Ultrix1.2 with the following hardware:
. . .
> The only consistent things I see are the mser (memory system error
> register), the caer (cpu error address) and the daer (dma error address)

but never with the same value are they ! ...
'cos that would have indicated a bad board or location

> The mser should be loaded with bits saying what the error is but I cannot
> find explanations in Ultrix for what they are - BSD has ka630.h (thanks
> Chris). The fact that the caer/daer are always the same makes me think
> there is a dma i/o problem and that in turn points to a disk controller
> problem or the dequna as the random times would seem to rule out the
> video and the dhv's as the crashes don't correlate at all with their use.
>
> I would appreciate:
> A; definitions of the terms in the error message- ie sumpar, etc

read the KA630-AA CPU Module Users Guide (DEC ref. EK-KA630-UG-001)
Architecture section. (don't know what "sumpar" is tho' !)

> B; hints on where in the documentation to find out more

as above

> C; Any concrete interpretations of the data presented.

when there are random memory errors I immediately suspect
LONG Private Memory Interconnect cables...
By "long" I mean anything longer than the bare minimum: the cables should
be so short that they only just fit between the boards (3cm between
connectors seems to be the max length).
This problem is particularly prevalent with OEM memories and with early
versions of the KA630 (the ones with NEC memory chips)...
at least that is "in my experience" !

> D; Suggestions on how to approach a problem like this

as above

WHILE I'M HERE...

Is anyone aware of a problem with early KA630's where, after a power
fail, the TOY clock leaves the VRT bit set but clears the clock memory ?
It means 4.3 comes up with a date near the epoch (Jan 1970) and fsck
rebuilds all the "SUMMARY" information.
(We have added a check in the ka630.c file that detects a zeroed clock
and uses the file system time instead, but it would be nice to fix it
properly !)

DEC Australia are currently charging $10,000 (Australian) for a KA630
board swap ! So getting it fixed that way is out of the question !

John Dodson

ACSnet: johnd@physiol.su.oz.au
most other places ! : seismo!munnari!physiol.su.oz!johnd
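
For anyone wanting to try the same workaround, here is a minimal sketch of the kind of zeroed-clock check described above. It is NOT the actual change made to ka630.c: the struct, the field names and both function names are hypothetical, and it assumes the TOY registers have already been read into memory and that a running clock never reports day-of-month or month as zero.

```c
#include <time.h>

/* Hypothetical snapshot of the TOY clock registers. Field names are
 * illustrative only; they are not the names used in the real ka630.c. */
struct toy_regs {
    unsigned char sec, min, hour, mday, mon, year;
    int vrt;    /* nonzero if the chip reports the VRT bit as set */
};

/* Returns nonzero if every time/date register reads zero. Assumption:
 * a running clock never has mday == 0 or mon == 0, so an all-zero
 * snapshot means the clock memory was cleared. */
int toy_is_zeroed(const struct toy_regs *t)
{
    return t->sec == 0 && t->min == 0 && t->hour == 0 &&
           t->mday == 0 && t->mon == 0 && t->year == 0;
}

/* Pick the time to boot with. The TOY value is trusted only when VRT
 * is set AND the registers are not all zero; otherwise fall back to
 * the time recorded in the file system. */
time_t pick_boot_time(const struct toy_regs *t, time_t toy_time,
                      time_t fs_time)
{
    if (!t->vrt || toy_is_zeroed(t))
        return fs_time;
    return toy_time;
}
```

The point of the extra test is that VRT alone cannot be trusted on the affected boards, so the zeroed registers are treated as a second "clock invalid" signal. Falling back to file system time keeps the date from landing near Jan 1970, though it will still be behind by however long the machine was down.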