Path: utzoo!attcan!uunet!mcvax!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.unix.microport
Subject: Re: How does Microport System V/AT handle bad blocks?
Summary: releases prior to 2.3 bungled horribly bad block handling
Message-ID: <452@aber-cs.UUCP>
Date: 21 Dec 88 00:44:03 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Distribution: eunet,world
Organization: CS Dept., University College of Wales, Aberystwyth, UK (Disclaimer: my statements are purely personal)
Lines: 52

In article <326@focsys.UUCP> larry@focsys.UUCP (Larry Williamson) writes:
> In article <460@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
> [ .... io errors on two drive system .... ]
>
> [ .... io errors as well .... ]
>
> We upgraded to 2.4 and errors have disappeared completely. We also replaced the disk, I couldn't bring myself to trust it.

The bad block handling code in 2.3 was horribly braindamaged. It did not recover from soft errors, and then wrote random trash in random blocks. The disk, on the other hand, you could have trusted; it was clearly a case of environmental (dis)adaptation of the format.

> I'm not sure why, but it seemed that the disk errors grew at an exponential rate.

A folksy description of a common problem follows. Winchester disks are very delicate things. If the operating temperature changes, etc., the surfaces or the heads contract or expand, and what was previously recorded may become gibberish. This does not imply that the surface has been damaged, though, simply that it has become difficult to read back the recorded format. The symptoms are an increase in the number of soft errors, and then of hard errors. The cure is to reformat the disk.

By the way, never trust a preformatted disk; always reformat it on site, in the place where the machine will be used, under its typical operating conditions.

I would therefore suggest that you *very quickly* get your 2.4 upgrade and install it.
The advantage of 2.4 is that bad block handling is now said to be OK. Previously, if a read from a disk failed, it was not retried at all (even though most errors are soft), and the buffer cache slot that was assigned to the block to be read was not marked invalid. If and when that slot was written back to disk, its previous contents would overwrite the contents of the disk block, with astonishing results.

I would also suggest that you verify your backups; you might be surprised by what is on (or not on) those tapes! I would also suggest that you not trust the current contents of your disks unless you check them. Note that I said *contents*, not just *structure*, i.e. some of your files' contents may have been corrupted.

-- 
Piercarlo "Peter" Grandi                    INET: pcg@cs.aber.ac.uk
Sw.Eng. Group, Dept. of Computer Science    UUCP: ...!mcvax!ukc!aber-cs!pcg
UCW, Penglais, Aberystwyth, WALES SY23 3BZ (UK)
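
The buffer cache failure described in the post (a failed read that is not retried, with the cache slot left looking valid) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Microport's driver code: `FlakyDisk`, `Slot`, `SoftReadError`, `read_buggy`, `read_careful`, and `write_back` are all hypothetical names invented for the example, and the retry count of 3 is arbitrary.

```python
class SoftReadError(Exception):
    """Hypothetical: a read that failed but might succeed if retried."""


class FlakyDisk:
    """Toy disk: block number -> bytes, with injectable read failures."""

    def __init__(self, blocks, failures=None):
        self.blocks = dict(blocks)
        # failures[blkno] = how many reads of that block fail before one works
        self.failures = dict(failures or {})

    def read(self, blkno):
        if self.failures.get(blkno, 0) > 0:
            self.failures[blkno] -= 1
            raise SoftReadError(blkno)
        return self.blocks[blkno]

    def write(self, blkno, data):
        self.blocks[blkno] = data


class Slot:
    """Toy buffer cache slot (hypothetical layout)."""

    def __init__(self):
        self.blkno = None
        self.data = b""
        self.valid = False


def read_buggy(disk, slot, blkno):
    """Failure mode as the post describes it: no retry, no invalidation."""
    slot.blkno = blkno  # the slot is reassigned to the new block ...
    try:
        slot.data = disk.read(blkno)
        slot.valid = True
    except SoftReadError:
        pass  # ... but the old data and the old valid flag are left in place
    return slot


def read_careful(disk, slot, blkno, retries=3):
    """Assumed fix: invalidate first, retry soft errors, fail loudly otherwise."""
    slot.blkno = blkno
    slot.valid = False  # nothing in the slot is trustworthy until a read works
    for _ in range(retries):
        try:
            slot.data = disk.read(blkno)
            slot.valid = True
            return slot
        except SoftReadError:
            continue
    raise OSError(f"hard error reading block {blkno}")


def write_back(disk, slot):
    """Flush the slot to disk, but only if it claims to hold valid data."""
    if slot.valid and slot.blkno is not None:
        disk.write(slot.blkno, slot.data)
```

In this model, reading block 1 and then hitting a single soft error on block 2 leaves the buggy slot holding block 1's data under block 2's number, so a later `write_back` overwrites block 2 with the wrong contents. With `read_careful` the same soft error is retried, and a persistent error leaves the slot invalid so that nothing is written back.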