Xref: utzoo comp.sys.dec:4383 comp.unix.ultrix:5140
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!sdd.hp.com!decwrl!bacchus.pa.dec.com!shodha.enet.dec.com!alan
From: alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.)
Newsgroups: comp.sys.dec,comp.unix.ultrix
Subject: Re: 8800 crashing way too often
Summary: Broken VAX 8800.
Message-ID: <1908@shodha.enet.dec.com>
Date: 31 Oct 90 15:41:12 GMT
References:
Followup-To: comp.sys.dec
Organization: Digital Equipment Corp. - Colorado Springs, CO.
Lines: 100

In article , stergios@portia.Stanford.EDU (Stergios) writes:
>
> [ Customer has a VAX 8800 crashing very frequently. ]

	I have a VAX 8800 that crashed 96 days ago.  That's the last
	time it was down.  The time before that was 80 days.  The I/O
	configuration is three VAXBIs with two KDB50s (2 RA90s each)
	and CIBCA with HSC70 and a bunch of disks.  There's a DEBNI
	and DMB32 in there somewhere.  This kind of uptime seems to
	be typical for my system.  Use it for comparison purposes.

>
> Quite a number of dec people have and still are looking into the
> problem.  Every board has been replaced, even a new bi bus installed.
> dec software engineering is leaning towards a problem in the mscp
> code.

	Is it the same error each time, or a different one?  Which
	kind: panic, machine check or "it stops"?  What version of
	ULTRIX are you running?  If V4.0, have you installed and
	booted the mandatory upgrade?  Any non-DEC devices on the
	VAXBI or KDB50s?  Is there a UNIBUS on the system?  Does it
	have anything important on it?  Could it be replaced by a
	native VAXBI device?

>
> We've installed and ran 8 different dec supplied debuggers inside the
> kernel.  Each one never tells what the problem is, only what the
> problem is not.  Progress, I suppose.
>
> It originally took a couple months to escalate the problem to the
> point where we got attention.
> Now we have attention to the point of
> twice weekly meetings with dec sales staff regarding our 8800
> crashing.  lots-o-fun, but we still have a poorly performing machine.

	A couple of months feels too long to me, but it depends on
	the situation.

>
> There is talk of replacing our kdb50's with HSE's in the hope that the
> problem will disappear.  This seems reasonable, I guess, but sounds
> like a desperation move at this point.

	Find the problem first.  There is one out there somewhere and
	it is findable.

>
> Now we are starting to talk replacement systems (this is another story
> all together, probably worse, and I won't air that kind of laundry in
> public) and dec is pushing a 5500 at us.  I don't think the 5500's
> q-bus is going to take the beating our 8800 does.  we are currently
> running a 5400 as an optional machine to the 8800, and the poor little
> thing is choking.  I refuse to install ada and a number of other
> packages on it because of its performance so far under our
> environment.  This does not make our clients any happier: a machine
> not running the necessary software is not any better than a crashed
> machine, and we have plenty of both.

	Actually, most of the interesting I/O on a DECsystem 5500 will
	stay off the Q-bus unless you insist upon using KDA50s for
	most of the disks.  A couple of gigabyte SCSI disks and DSSI
	disks should be very impressive.  A VAX 8800 is good for
	moving bits between disk and memory, but a well configured
	DECsystem 5500 should be able to do better.  You'll need more
	memory to make up for the VAX to RISC switch.

>
> Are there any other buses or solutions available on the 5500?  I'm
> asking here cause I've already been told "there is this neat way to
> hook up a ra92 as a swap disk avoiding the qbus that gives an extra
> M/s" by the sales types.  An extra M/s over the qbus is not going to
> cut it for us.

	There are three places to connect disks to a DECsystem 5500:
	one or more KDA50s on the Q-bus, the DSSI adapter and the
	SCSI adapter.  The only place >>>I<<< know of to connect an
	RA{anything} is the KDA50.  Find out what your sales critter
	is talking about.  If you go to a DECsystem 5500 you'll almost
	certainly want to switch from the RA{anything} to RFs or RZs,
	or at least move some of the I/O load off the RAs.

>
> What good is a maintenance contract?  Are we being too lenient with DEC
> by letting them drag this out as far as they have?

	I'll put it this way.  You've been very patient.  I wouldn't
	have been that patient.  Of course it also depends on what
	level of support you have.  An 8 hour a day, 5 days a week
	Basic support contract is a very different beast from 24x7
	DECsupport.  Each contract has time limits for how long
	things are allowed to "drag out".  I don't think any of them
	are months though.

>
> Any and all suggestions welcome.
>

	Tell us more about the errors in the hopes that we might
	recognize the problem from previous experience.

> sm
> stergios@jessica.stanford.edu

--
Alan Rollow
alan@nabeth.enet.dec.com