Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site gatech.CSNET
Path: utzoo!watmath!clyde!burl!ulysses!gatech!spaf
From: spaf@gatech.CSNET (Gene Spafford)
Newsgroups: net.bugs.4bsd
Subject: Re: mchk 2 --- tbuf error on 750 running 4.2 BSD
Message-ID: <654@gatech.CSNET>
Date: Thu, 25-Jul-85 11:55:14 EDT
Article-I.D.: gatech.654
Posted: Thu Jul 25 11:55:14 1985
Date-Received: Fri, 26-Jul-85 23:48:19 EDT
References: <83@zeta.UUCP>
Reply-To: spaf@gatech.UUCP (Gene Spafford)
Organization: The Clouds Project, School of ICS, Georgia Tech
Lines: 46
Summary:

In article <83@zeta.UUCP> jeb@zeta.UUCP (John Berry) writes:
>
>We are running VAX 11/750's with UNIX 4.2 BSD. We have just had DEC
>install REV 7 of the L0003 board, which we hoped would clear up the
>mchk 2 --- tbuf error problems. Well it has not. Can anyone out there
>in network land give me any insight to what is happening. DEC cannot
>find any problems when they run diagnostics.

This is an old and frustrating problem. I've had it show up on at
least 4 750's I've worked with. The problem is, indeed, with the
L0003 board. Let me tell you how it has been explained to me
(if anyone has a more detailed explanation, please let us know).

DEC obtains chips for the L0003 board from a couple of different
sources. I'm not sure if they subcontract the board out to another
firm or not, but they end up with two different versions of the board
which are identical in stated specs and (almost) identical in
appearance. As far as acceptance goes, both versions of the board
behave identically under VMS and all the regular field service
diagnostics. HOWEVER, under Unix, due to the way certain things are
done and timed, one version of the board will repeatedly generate
tbuf parity faults that cannot be recovered from.

The fix is to replace the board with a copy of the other version.
Once we did that, our 750's in the lab which crashed an average of
10 times a day have only encountered one tbuf fault in 6 months.
To get a good board may require many swaps and trials, because I have
heard someone claim that you can't identify one of the bad boards
except by unsoldering chips and looking at the lot numbers on the
underside. I don't know the specific chips or how to identify which
version of the board you have.

Supposedly, this problem is well known in the Ultrix support group
and some field service offices (along with the RA81 read/write board
glitch and the Rev4/RL02 problem, and others) as one of the strange
problems that only shows up when using Unix. Have your field service
people contact the Ultrix support group. It is possible that the
Ultrix group may even know of a supply of working L0003 boards for
exactly this situation.

Best of luck!
-- 
Gene "4 months and counting" Spafford
The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332
CSNet: Spaf @ GATech    ARPA: Spaf%GATech.CSNet @ CSNet-Relay.ARPA
uucp:  ...!{akgua,allegra,hplabs,ihnp4,linus,seismo,ulysses}!gatech!spaf