Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!dali.cs.montana.edu!caen!sdd.hp.com!spool.mu.edu!munnari.oz.au!goanna!minyos.xx.rmit.oz.au!s861298
From: s861298@minyos.xx.rmit.oz.au (Marc A. Boschma)
Newsgroups: comp.sys.nsc.32k
Subject: Re: SCSI errors...
Message-ID: <1991May18.002921.28531@minyos.xx.rmit.oz.au>
Date: 18 May 91 00:29:21 GMT
References: <9105161644.AA25790@hplwbc.hpl.hp.com>
Organization: img Consultants
Lines: 60

culberts@hplwbc.hpl.hp.com (Bruce Culbertson) writes:

>> From: daver!uunet!munnari!eyrie.img.uu.oz.au!marcb@mips.com (Marc A. Boschma)
>>
>> Wow, now that I have Minix 1.3 on the machine (THANKS HEAPS
>> Bruce!) I have noticed quite a lot of:
>> SCSI ok with recovery. code 0x17, logical address 0x
>>
>> and a few
>>
>> SCSI failure, key 0x3, code 0x11, log adr 0x31e, sense buf 0xc462

>"Ok with recovery" is a "soft error" -- some random noise or a power
>glitch caused a disk operation to a healthy block to fail. Both Minix
>and most SCSI disks retry operations which fail. This message means
>Minix eventually was successful in performing a disk operation which
>initially failed. Minix is trying to say that it is happy and its
>file system is intact, but something funny happened with your disk
>which you might want to know about.

>It is normal and expected that you will see soft errors occasionally
>but if you are seeing several a day, you have a problem. A typical
>cause is a defect in the disk surface which makes reading the block
>unreliable. The standard Minix distribution includes a tool for testing
>all the blocks on a disk. Another tool builds a file of all the bad
>blocks so that the blocks will not be allocated to files you care about.

>If you get frequent retry messages and the block numbers are truly
>random, then you have a problem in the drive electronics, its power
>supply, or your pc532. Debugging it might require some creativity.
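The quoted advice separates a surface defect (the same blocks keep failing) from an electronics or power problem (block numbers are random). A minimal sketch of that check, in Python, is to count how often each block number appears in the console messages. The two message shapes in the pattern are taken from the lines quoted in this thread; the function names and the `threshold` cutoff are hypothetical and not part of Minix.

```python
import re
from collections import Counter

# Assumed message shapes, based only on the two lines quoted in this thread:
#   "... logical address 0x<hex>"  and  "... log adr 0x<hex>"
ADDR_RE = re.compile(r"(?:logical address|log adr) 0x([0-9a-fA-F]+)")


def blocks_from_log(lines):
    """Return the block numbers found in SCSI console messages.

    Lines without a complete hex address are skipped.
    """
    blocks = []
    for line in lines:
        match = ADDR_RE.search(line)
        if match:
            blocks.append(int(match.group(1), 16))
    return blocks


def repeat_offenders(blocks, threshold=3):
    """Return, in sorted order, blocks reported at least `threshold` times.

    The cutoff is a hypothetical choice; pick one that suits the log length.
    """
    counts = Counter(blocks)
    return sorted(block for block, n in counts.items() if n >= threshold)
```

If `repeat_offenders` returns a short list of blocks, those are candidates for the bad-block file. If it returns nothing while errors keep arriving, the block numbers are spread out, which points toward the electronics, power supply, or cabling instead.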
The soft errors only occur for a given block once or twice, so I hope
there is only some noise on the SCSI bus. I'm thinking of doing a
low-level format and trying again. These problems occurred after the
machine had been on for about a day. Maybe better cooling is needed.

>"SCSI failure" means Minix cannot talk to your disk. This usually
>results in a panic. If Minix has been successfully talking to your
>disk and then suddenly gets a "SCSI failure", then your file system
>is likely to be corrupted. Cross your fingers and run fsck after you
>debug and correct the problem. If your file system is really in bad
>shape but you are desperate to save your data, you might have some
>success with the disk editor "de".

fsck has managed to clean it twice now, though I lost 6 blocks somewhere.

>> Is it just the driver (remember it's Minix 1.3) or could there be a problem
>> with the drive (a Mini-Scribe)?

>1.3 has a pretty good SCSI driver, though not perfect. Many people
>have used it with Mini-Scribe drives. I do not think the 1.5h driver
>is substantially different from the 1.3 driver.

OK, so I'll start debugging the hardware if the drive doesn't work after
the format.

>Bruce Culbertson
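For reference, the "key" and "code" numbers in the messages quoted above can be looked up in the SCSI sense tables. The sketch below, in Python, uses the SCSI-2 sense key names and the SCSI-2 meanings of the two codes seen in this thread. It assumes the drive follows those meanings; an older or vendor-specific drive may define them differently, and the function name `describe_sense` is hypothetical.

```python
# Sense key names as defined in SCSI-2.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0x7: "DATA PROTECT",
    0xB: "ABORTED COMMAND",
}

# Only the two codes that appear in this thread, with their SCSI-2 meanings.
# Assumption: the drive in question uses these meanings.
ADDITIONAL_CODES = {
    0x11: "unrecovered read error",
    0x17: "recovered data, no error correction applied",
}


def describe_sense(key, code):
    """Return a readable description of a sense key and additional code."""
    key_name = SENSE_KEYS.get(key, "unknown key 0x%x" % key)
    code_name = ADDITIONAL_CODES.get(code, "unlisted code 0x%02x" % code)
    return "%s: %s" % (key_name, code_name)
```

Under these assumptions, `describe_sense(0x3, 0x11)` reads the "SCSI failure" line as a medium error with an unrecovered read, which is consistent with a bad spot on the disk surface at block 0x31e.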