Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!caesar.cs.montana.edu!ogicse!emory!stiatl!rsiatl!jgd
From: jgd@rsiatl.UUCP (John G. De Armond)
Newsgroups: comp.unix.i386
Subject: Re: Disks Hang Under 2.0.2 SCSI
Message-ID: <907@rsiatl.UUCP>
Date: 12 Dec 89 19:37:40 GMT
References: <654400003@cdp>
Reply-To: jgd@rsiatl.UUCP (John G. De Armond)
Organization: Radiation Systems, Inc. (a thinktank, motorcycle, car and gun works facility)
Lines: 58

In article <654400003@cdp> steve@cdp.UUCP writes:
>
>
>SUMMARY -- README
>-------
>We have been experiencing regular crashes running under
>Interactive 2.0.2 with 3 SCSI disks on an aha1542a. Later in
>this message is a script which crashes our machine. The
>purpose of this message is to find other people who are
>willing to try to replicate these crashes on various
>machines. I encourage folks to try out the script, even if
>they do not have our exact hardware configuration. This will
>help us to better understand the whether the problem lies in
>hardware or in 2.0.2.
>
>DETAILS
>-------
>The symptom of the crashes is that all processes continue to
>run, but any process that goes for the disk hangs. So, getty
>prints the login prompt, and accepts a name at login:, but
>when it goes to spawn login, the exec hangs the system.
>Switch to a different virtual console, and repeat the same
>thing. emacs works fine until it tries to auto-save, open
>a file, etc...

Steve,

We have had the same failure here under similar conditions.
Configuration here is an Adaptec host adaptor and two 380 MB Newbury
data drives. Our problem seemed to manifest itself mostly under
pathological conditions, such as when a bad block is discovered. I've
also seen it when I've been running a script similar to yours, designed
to hammer a new hard disk before putting it into service.
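
A load of the kind described above (a script that hammers a disk) might
be sketched roughly as follows. This is only an illustration, not the
script from either post: the names hammer and hammer_all, the scratch
file name, and the default sizes are all assumptions. It writes a
scratch file, forces it out with fsync, reads it back, and compares,
running one thread per directory so that directories on different
drives are loaded at the same time.

```python
import os
import threading


def hammer(directory, passes=3, size=1 << 20, results=None):
    """Write, sync, read back, and verify a scratch file `passes` times.

    Returns the number of passes whose read-back did not match.
    Note: the read-back may be served from the OS buffer cache rather
    than the disk, so this mainly generates write traffic.
    """
    # Hypothetical scratch file name; unique per thread.
    path = os.path.join(directory, "hammer.%d.tmp" % threading.get_ident())
    errors = 0
    for i in range(passes):
        block = bytes([i % 256]) * size      # simple repeating pattern
        with open(path, "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())             # push the data to the device
        with open(path, "rb") as f:
            if f.read() != block:
                errors += 1
    os.remove(path)
    if results is not None:
        results[directory] = errors
    return errors


def hammer_all(directories, passes=3, size=1 << 20):
    """Run hammer() on every directory concurrently, one thread each."""
    results = {}
    threads = [
        threading.Thread(target=hammer, args=(d, passes, size, results))
        for d in directories
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling hammer_all with two directories that live on different drives
(the mount points are up to the reader) loads both drives at once;
calling it with a single directory loads only one drive.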

The external symptoms are as you note PLUS I notice that the activity
LED on the Adaptec board is stuck on AND the activity LED on one of the
drives is on continuously. We now have a bit more data in that it
occurs on two totally different drive types.

Without any investigation other than external observation, I suspect
that the problem has to do with either a buffer getting overrun or a
problem with a task releasing the SCSI bus to another one. The fact
that the problem only occurs either when 2 drives are heavily loaded or
when an error condition happens - which appears from the LED activity
to tie the bus up for a spell - should be a major clue. I absolutely
cannot cause this failure by any combination of loading on one drive.

John

-- 
John De Armond, WD4OQC                     | The Fano Factor -
Radiation Systems, Inc.     Atlanta, GA    | Where Theory meets Reality.
emory!rsiatl!jgd          **I am the NRA** |