Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site gatech.CSNET
Path: utzoo!linus!gatech!spaf
From: spaf@gatech.CSNET (Gene Spafford)
Newsgroups: net.bugs.4bsd
Subject: Re: mchk 2 --- tbuf error on 750 running 4.2 BSD
Message-ID: <725@gatech.CSNET>
Date: Thu, 1-Aug-85 01:15:01 EDT
Article-I.D.: gatech.725
Posted: Thu Aug 1 01:15:01 1985
Date-Received: Thu, 1-Aug-85 08:09:55 EDT
References: <83@zeta.UUCP> <654@gatech.CSNET> <2496@sun.uucp>
Reply-To: spaf@gatech.UUCP (Gene Spafford)
Distribution: net
Organization: The Clouds Project, School of ICS, Georgia Tech
Lines: 42
Keywords: microcode L0003 750 DEC

In article <2496@sun.uucp> dcmartin@sun.UUCP (David C. Martin) writes:
>Okay, I will. I already mailed John, but perhaps this could be rehashed
>one more time. The problem does lie in the L0003 board, but the solution
>is easy. VMS has microcode to alleviate these parity problems, and
>using the /boot program which reads microcode off the disk, the problem
>can be easily solved. Mike Karels wrote up a patch and we have been running
>it at UC Berkeley for quite some time with favorable results. If there is
>sufficient need, I will dig this up for those of you who need it, the microcode
>loading program was previously posted to the NET, so check your archives for
>that.
>

Nope, that isn't the whole fix. The microcode fix only cures about
1/4 to 1/3 of the tbuf crashes (from our experience with the 3 750s
in our lab). I installed the microcode-loading boot just about a week
after the machines came in, and it didn't cure the problem. The new
microcode fixes a different bug that causes tbuf faults.

Also, before anyone posts something about how the whole thing can be
cured by a patch to the machine check processing code -- I know about
that patch too, and it doesn't fix the problem.

To repeat, the problem is a well known HARDWARE problem, and if your
field service people don't believe it, tell them to call the Ultrix
support center for confirmation; everybody there should know all about
the problem. Most of the old boards with the bad lot of chips (I have
been told that the only way to identify some of them is to unsolder the
chips and read the lot numbers off the bottom) have been replaced or
installed in VMS systems where the problem will go unnoticed.

Unfortunately, some field service people don't know about the problem,
or blame it on Unix (because they don't understand). One site I know
of had the field engineer swap out the L0003 board twice, and the
problem didn't go away. He claimed that it had to be Unix, and as a
non-supported product he was not responsible for anything else. The
problem was that the two boards he swapped out were spares that had
been sitting at the local office for months, and they had the faulty
chips. Don't let this happen to you!
-- 
Gene "4 months and counting" Spafford
The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332
CSNet: Spaf @ GATech
ARPA:  Spaf%GATech.CSNet @ CSNet-Relay.ARPA
uucp:  ...!{akgua,allegra,hplabs,ihnp4,linus,seismo,ulysses}!gatech!spaf