Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!swrinde!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!att!fang!tarpit!bilver!bill
From: bill@bilver.uucp (Bill Vermillion)
Newsgroups: comp.unix.sysv386
Subject: Re: Automatic bad sector mapping
Message-ID: <1991Jun16.152902.5249@bilver.uucp>
Date: 16 Jun 91 15:29:02 GMT
References: <BB_-VH@jwt.UUCP> <1991Jun10.025527.10161@jwt.UUCP> <1991Jun10.230223.10316@ico.isc.com>
Organization: W. J. Vermillion - Winter Park, FL
Lines: 58

In article <1991Jun10.230223.10316@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>john@jwt.UUCP (John Temples) writes about automatic remapping:
>> The ESIX implementation catches errors while they're still "soft,"
>> i.e., the error is recoverable.  So remapping occurs with no data loss,
>> as long as the first time a sector has an error it isn't a hard error.
 
>I don't believe this is a common failure characteristic.  Assuming that
>(1) you're running the drive in-spec, and (2) you've mapped out all the bad
>sectors determined by the drive manufacturer [N.B.: This is *NOT* the same
>as bad sectors found by running a r/w test], you shouldn't expect soft
>failures because you're not using any marginal sectors.

In the ESDI world (John's running ESDI on his ESIX system, as I am), the mapping of
the sectors from the manufacturer's list is done automatically: it is read
from the defect list recorded on the drive itself.   And on big drives, no one is
going to type in a couple of hundred defects willingly, or perhaps accurately.

 
>For example, a tiny particle can get loose somehow.  If it's just the right
>size to get under the head, it'll take a tiny ding out of the coating on a
>platter...and there's a good chance it'll be small enough to leave you with
>a soft error.  However, you now have at least *two* tiny particles cruising
>around, possibly many more (the original and whatever got dug up).  You can
>see how that one degenerates.  It's only one hypothetical situation; the
>point is that if you start out using only the good sectors of a good disk
>and run in-spec, the sorts of things that can go wrong to produce soft
>errors are almost always (by that I mean something > 90%) precursors to
>a disastrous failure.

Your scenario points to a drive that does not have long to live.   Any time
you have "particles" inside the drive, you are going to lose that drive in
a short time.   At 3600 rpm it won't take long to trash that drive.

The ESIX system tells you when it has recovered a sector and which sector
it was.   It uses ECC to recover from the hard error.   That's why ECC is
used in the first place, whether on hard drives or tape drives.
Any time you get an error that has to be corrected with ECC and you DON'T
block out the problem area, you are asking for trouble.
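A minimal Python sketch of the remap-on-ECC-recovery policy described above, assuming a driver that can tell a clean read from one that needed ECC to recover the data. All names here (RemappingDisk, ReadStatus, the spare-sector pool, raw_status) are hypothetical illustrations, not the actual ESIX driver or ESDI controller interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ReadStatus(Enum):
    OK = "ok"                  # read cleanly, no correction needed
    ECC_CORRECTED = "ecc"      # data recovered, but only via ECC
    UNRECOVERABLE = "hard"     # ECC could not recover the data


@dataclass
class RemappingDisk:
    """Toy model of a driver that remaps a sector once ECC has had to rescue it.

    Hypothetical: `raw_status` stands in for what a controller would report
    for each physical sector; it is not a real ESIX or ESDI interface.
    """
    spare_sectors: list[int]
    raw_status: dict[int, ReadStatus] = field(default_factory=dict)
    remap_table: dict[int, int] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def physical(self, sector: int) -> int:
        # Follow the remap table if this logical sector was moved to a spare.
        return self.remap_table.get(sector, sector)

    def read(self, sector: int) -> ReadStatus:
        phys = self.physical(sector)
        status = self.raw_status.get(phys, ReadStatus.OK)
        if status is ReadStatus.ECC_CORRECTED and self.spare_sectors:
            # The data came back intact, so move it to a spare now, before
            # the marginal sector degrades into an unrecoverable error.
            spare = self.spare_sectors.pop(0)
            self.remap_table[sector] = spare
            self.log.append(f"sector {sector} recovered by ECC, remapped to {spare}")
        # Unrecoverable reads are reported but not remapped: the data is
        # already lost, which is exactly the case the policy tries to avoid.
        return status


# Usage: sector 42 needs ECC on its first read and is then served from a spare.
disk = RemappingDisk(spare_sectors=[9000, 9001],
                     raw_status={42: ReadStatus.ECC_CORRECTED})
print(disk.read(42))  # ReadStatus.ECC_CORRECTED
print(disk.read(42))  # ReadStatus.OK, now read from spare 9000
print(disk.log)
```

The design point the sketch tries to capture is the one in the post: remapping is triggered by the first ECC-corrected read, while recovery is still possible, rather than waiting for a read that ECC cannot fix.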

I have a 660 meg ESDI drive that had about 300 bad sectors (I got it for about
$1000 off because it was just over the defect limit for that drive).   I have had
about 3 instances of ESIX remapping a bad sector in the 10 months I have
had this drive running.   They occurred in the first 3 months of use,
and I have not had any since.   Remember, sectors are only remapped when a
hard error occurs and ECC is used for recovery.

The system has been running 24 hours a day and usually pushes 20 to 40
megs a day through as a news node.  If there were problems with
ESIX's remapping, I think I would have found them by now.

>If you run out-of-spec (e.g., non-RLL drives on an RLL controller), you're
>much more likely to see soft errors that stay soft.

Anyone who does that gets exactly what they deserve, IMO.

-- 
Bill Vermillion - UUCP: ...!tarpit!bilver!bill
                      : bill@bilver.UUCP