Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rutgers!ucla-cs!zen!ucbvax!decvax!dartvax!steve
From: steve@dartvax.UUCP (Steve Campbell)
Newsgroups: comp.bugs.4bsd,comp.unix.wizards
Subject: Solution to: Cant access disks on second UDA50
Message-ID: <6842@dartvax.UUCP>
Date: Sun, 9-Aug-87 19:09:04 EDT
Article-I.D.: dartvax.6842
Posted: Sun Aug 9 19:09:04 1987
Date-Received: Tue, 11-Aug-87 01:46:38 EDT
References: <6683@dartvax.UUCP>
Reply-To: steve@dartvax.UUCP (Steve Campbell)
Distribution: world
Organization: Dartmouth College, Hanover, NH
Lines: 89
Keywords: unibus uda50
Xref: mnetor comp.bugs.4bsd:482 comp.unix.wizards:3624

In article <6683@dartvax.UUCP> I wrote:
>Although conventional wisdom says not to put more than 1 UDA50 per
>unibus, we are trying to do just that. We have added a second UDA50 to
>the bus and a third-party device called a USI/HRS from a company named
>Shitashi which claims to enhance the unibus bandwidth enough to permit
>the second uda. The other devices on the unibus are a DEUNA and 2 DZ11s.
>
>For testing purposes, we moved 2 RA81s from uda0 to uda1, so we have...
>
>controller uda0 at uba0 csr 0172150 vector udintr
>disk ra0 at uda0 drive 0
>disk ra1 at uda0 drive 1
>controller uda1 at uba0 csr 0172550 vector udintr
>disk ra2 at uda1 drive 2
>disk ra3 at uda1 drive 3
>
>As far as we can tell, the hardware is working just fine.
>But ... if we do a large number of accesses to files on any disk
>USING PATHNAMES, then do a sync, the 2 disks on the second uda cannot
>be accessed, and the command - and terminal - trying to do so hangs
>completely.

Several people suggested that the configuration specification was the
problem; others said no, the config is OK. Ed Gould posted a nice
mini-dissertation on how configuration names are mapped. Conclusion:
the configuration is OK.

Other people suggested adjusting the time delay jumper on the UDA50.
[BTW, beware of a typo in the DEC UDA50 Users Manual table that tells
how to set that jumper. The pins are mislabeled.] Someone else suggested
changing UDABURST in the driver. Conclusion: these adjustments no doubt
affect performance, but they were not the cause of my problem.

Scott Bradner (harvisr!sob) pointed me in the right direction:

> the 4.3 uda driver has a bug that causes the drives on a 2nd controller
> to appear to go off line under load, any processes that are accessing those
> drives will hang forever.

Jean Huens (kulcs!jean) got closer:

> I got similar problems on a microvax. We have there (on a Q-bus) an
> RQDX (+- uda compatible : same driver) from DEC and a second RQDX
> compatible controller (Sigma) with a Fujitsu Eagle. Occasionally
> processes got hung waiting for the fujitsu. The problem was that the
> controller was idle (without outstanding commands) but there were still
> requests from Unix waiting (looks like interrupt lost or race
> condition). I looked in the uda driver from Ultrix 1.2 and saw they
> start there a timer which calls the udastart routine regularly (once a
> minute). This cured the problem with the disk.

Jean sent me that modified driver. I installed it and ran my standard
test that would hang the system. It hung as always... but as soon as the
timer that Jean mentions went off, the hung command completed normally.
It was spooky, as though there was a little gremlin in there that got
poked every minute or so and un-jammed things. Now this jerky operation
of the system was not good enough for production work, but it seems to
clinch what was causing the problem. I would like to hear an explanation
from someone who knows the hardware well. Which leads me to...

The final solution to the problem. In one posting Chris Torek wrote:

> What makes Steve's problem particularly perplexing is that everything
> works at least a little bit. The machine finds the controllers
> and drives, and can talk to them a bit, e.g., with raw I/O.
> Raw transfers do not really work the I/O system very hard, though, so
> I suspect some sort of hardware glitch with `simultaneous' transfers.
>
> (My first suggestion, of course, was to try my driver....)

Well, I hate a smart aleck, especially one who turns out to be right. I
tried Chris's driver, and it solves the problem. The new configuration
works as well as the old. Very nice work, Chris.

Chris's driver prints some identification information at boottime,
including the following from my machine:

Aug 9 12:34:56 libdev vmunix: uda0: version 5 model 6
...
Aug 9 12:34:56 libdev vmunix: uda1: version 4 model 6

Is that different version number significant to this problem?

In conclusion, thanks to all for the help, and especially to Chris Torek
for the new driver that doesn't have the bug. And how about a fix from
Berkeley for their standard driver?

Steve Campbell
Dartmouth College
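
For anyone trying to picture the stall and the timer workaround Jean
described, here is a minimal user-space sketch in C. It is only a toy
model and makes several assumptions: the struct and every function name
(ctlr, ctlr_start, ctlr_enqueue, ctlr_intr, ctlr_watchdog, demo) are
hypothetical, the controller takes one command at a time, and the "lost
restart" is simulated by a flag. It is not code from the 4.3BSD, Ultrix,
or Chris Torek drivers, and it does not explain why the restart goes
missing on real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one disk controller; not real driver state. */
struct ctlr {
    int queued;       /* requests waiting in the driver's queue */
    int outstanding;  /* commands handed to the controller, not yet done */
    int completed;    /* requests that have finished */
};

/* If the controller is idle and work is waiting, issue one command. */
void ctlr_start(struct ctlr *c)
{
    if (c->outstanding == 0 && c->queued > 0) {
        c->queued--;
        c->outstanding++;
    }
}

/* Queue a new request and try to start it. */
void ctlr_enqueue(struct ctlr *c)
{
    c->queued++;
    ctlr_start(c);
}

/*
 * Completion interrupt. Normally it retires the command and restarts
 * the queue. Passing restart = false simulates the suspected bug: the
 * command finishes but nothing kicks the queue again.
 */
void ctlr_intr(struct ctlr *c, bool restart)
{
    assert(c->outstanding > 0);
    c->outstanding--;
    c->completed++;
    if (restart)
        ctlr_start(c);
}

/* True when the controller is idle although requests are still queued. */
bool ctlr_stalled(const struct ctlr *c)
{
    return c->outstanding == 0 && c->queued > 0;
}

/* Periodic watchdog: simply call the start routine again. */
void ctlr_watchdog(struct ctlr *c)
{
    ctlr_start(c);
}

/* Run two requests with one missed restart; return how many completed. */
int demo(bool use_watchdog)
{
    struct ctlr c = {0, 0, 0};

    ctlr_enqueue(&c);        /* first request goes straight to the controller */
    ctlr_enqueue(&c);        /* second request waits in the queue */
    ctlr_intr(&c, false);    /* first completes, but the restart is missed */

    if (use_watchdog)
        ctlr_watchdog(&c);   /* timer fires and issues the waiting request */

    if (c.outstanding > 0)
        ctlr_intr(&c, true); /* second request completes normally */

    return c.completed;
}
```

In this model demo(false) returns 1, because the second request sits in
the queue forever, and demo(true) returns 2, because the watchdog issues
it. That matches what I saw with the timer-equipped driver: things hang
until the timer goes off, then complete. A watchdog like this hides the
missed restart rather than curing it.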