Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!qantel!lll-lcc!lll-crg!topaz!harvard!cmcl2!philabs!nyit!rick
From: rick@nyit.UUCP (Rick Ace)
Newsgroups: net.unix-wizards
Subject: Re: strange problems (looking for help)
Message-ID: <231@nyit.UUCP>
Date: Mon, 28-Apr-86 10:16:13 EDT
Article-I.D.: nyit.231
Posted: Mon Apr 28 10:16:13 1986
Date-Received: Fri, 2-May-86 06:59:04 EDT
References: <279@entropy.UUCP>
Organization: NYIT Computer Graphics Lab., Old Westbury, N.Y.
Lines: 79

> I wonder if anyone recognizes the following symptoms as symptoms of
> something concrete I can try to fix.  We are running 4.3BSD on a
> VAX11/785.  The disks are 3 RA81s on a single UDA.  The uda device
> driver is version 6.12 from Berkeley (9/16/85) which seems to be equal
> to or derived from a DEC driver from January 84.  I am not getting any
> kernel error messages at all.  Here is symptom number 1:
>
> % ls -l data
> -rw-r--r-- 1 pcraig 4480000 Apr 17 11:19 data
>
> % cmp data data
> data data differ: char 1777665, line 30650
>
> % cmp data data
> data data differ: char 1654785, line 28531
>
> % cmp data data
> data data differ: char 1683457, line 28955
...
> All of our symptoms could be explained by bad reads.  That is, if we
> don't always get the same data off the disk when we read it we would
> get the symptoms we're getting.  However, we have never gotten any sort
> of disk read error messages on the console or anywhere else.  Thanks.
>
> Steve Hubert
> Dept. of Stat., U. of Wash, Seattle
> {decvax,ihnp4,ucbvax!lbl-csam}!uw-beaver!entropy!hubert
> hubert%entropy@uw-beaver.arpa

Sounds like flaky hardware.  Trouble is figuring out which piece of
gear is the culprit.  Here are some ideas:

1.  The UDA50 is sick.  See if Field Service will swap it for a spare
    and try your experiments again.

2.  Another peripheral on the UNIBUS with the UDA50 is misbehaving and
    corrupting the data transfer between the UDA50 and the UBA.  Try
    your experiment after removing all UNIBUS devices except the UDA50
    (remember to install grant cards and NPG jumpers where necessary).
    I've seen a malfunctioning UNIBUS device make trouble for its
    neighbors before!

3.  The UNIBUS DD11 backplane has a problem.  This is a bit of a pain
    to troubleshoot unless you have a spare backplane.  Or, if your
    backplane is in two or more sections, shorten it to one section
    and run the experiment, then try another one of the sections.

4.  The UBA or the UNIBUS cable is malfunctioning.  Again, ask Field
    Service to swap as much gear as they can.

5.  Other unix wizards suggested possible problems in the KA785 CPU
    and the memory controllers; these are also suspect.  Ask Field
    Service to check the revision level of your CPU hardware, and to
    apply any FCOs that you don't already have (i.e., get your money's
    worth for your service contract).

If I read the DEC PDP11 Bus Handbook correctly, it appears that the
data lines on the UNIBUS are not parity-checked.  This would explain
why you're not seeing any diagnostic printf's from the kernel:  UNIBUS
data can get mangled undetectably on its journey from the UDA50 to the
UBA.

The tried-and-true "swap it for a spare" approach is often the most
expedient route to solving problems like yours.

For what it's worth, we're using an RA81/UDA50 on a Vax-11/780-5
(that's a CPU that was born a 780 but received a 785 CPU transplant
later in life) under 4.2bsd with the RIACS UDA50 driver, so such a
hardware configuration *can* work.  Our UDA50 sits alone on its own
UBA because it won't play nice with the boys on the other UBA, tho.

-----
Rick Ace
Computer Graphics Laboratory
New York Institute of Technology
Old Westbury, NY 11568
(516) 686-7644

{decvax,seismo}!philabs!nyit!rick
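
Since every idea above boils down to "change one piece of hardware and
run the experiment again", it may help to script the experiment.  The
following is only a rough sketch in Python, not anything from the
original exchange: the function names `read_digest` and
`count_distinct_reads` are made up for illustration, and it assumes the
file is big enough that each pass really goes back to the disk instead
of being satisfied from the buffer cache.  It reads the file several
times and tallies how many different checksums turn up; a healthy
system should produce exactly one.

```python
import hashlib


def read_digest(path, chunk_size=65536):
    """Read the whole file once and return a SHA-256 hex digest of the bytes seen."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def count_distinct_reads(path, passes=10):
    """Read the file `passes` times; return a dict mapping digest -> times seen.

    More than one key in the result means two reads of the same file
    returned different data, which is the symptom `cmp data data` showed.
    """
    counts = {}
    for _ in range(passes):
        digest = read_digest(path)
        counts[digest] = counts.get(digest, 0) + 1
    return counts


# Hypothetical usage, with "data" standing in for the file from the
# original report:
#
#     counts = count_distinct_reads("data", passes=20)
#     if len(counts) > 1:
#         print("inconsistent reads:", counts)
```

Running the same number of passes after each swap gives a crude way to
compare configurations, though a clean run does not prove the fault is
gone if the corruption is intermittent.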