Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!att!tut.cis.ohio-state.edu!pt.cs.cmu.edu!gandalf.cs.cmu.edu!lindsay
From: lindsay@gandalf.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Error rates (was Re: IO buses
Message-ID: <11417@pt.cs.cmu.edu>
Date: 16 Dec 90 02:44:30 GMT
References: <11393@pt.cs.cmu.edu> <3838@bnr-rsc.UUCP>
Organization: Carnegie-Mellon University, CS/RI
Lines: 46

In article <3838@bnr-rsc.UUCP> bcarh185!schow@bnr-rsc.UUCP
(Stanley T.H. Chow) writes:
>In article <11393@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu
>(Donald Lindsay) writes:
>>Creo's 1 TB optical tape holds 10^12 bytes and has "fewer
>>than 1 in 10^12" bit errors. The pessimistic reading is as "fewer
>>than 8 mistakes per reel".

> What does the specified error rates mean?

In general, it means that any bit has one chance in 10^12 of being
wrong. However, that's merely the standard abstraction, found in
marketing literature. When you get down to actually building specific
devices, you deal in various error sources, and characterize each.

For example, the Creo stores 64 KB of data with 16 KB of ECC, making
an 80 KB physical record. So, you would want to know the chance of a
given one-record read having an uncorrectable error. Since optical
systems are susceptible to dust, you would also want to know the
chance that the error was soft, i.e., that a reread would succeed.

> Assuming we want systems to have "selectable, adjustable amounts of
> protection", which component of the system should handling the error
> correction code, etc.?

This stuff is best done in hardware: if not in the drive, then in the
controller. I don't see any reason why that hardware can't allow the
software to select from some limited menu.

> What facilities should the O/S provide? An error corrected file system?

Well, it would be nice if media error rates (particularly
corrected-error rates) could be logged in some coherent fashion.
This is more a management issue than an applications issue: perhaps a
disk is planning to fail, or a tape drive needs cleaning.

Arbitrarily high reliability is achieved by replication - for
instance, multiple backup tapes, kept in different buildings.

How much protection is enough? Well, it's usually figured that the
chance of a subsystem mangling data should be [..hand wave..] less
than the chance of some other subsystem mangling it. More reliability
than that is a waste of money. Less reliability than that is asking
to be the goat.

Does anyone have a current figure on the bit error rate going _to_
the storage system?
-- 
Don D.C.Lindsay .. temporarily at Carnegie Mellon
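
A quick check of the arithmetic, as a minimal Python sketch rather
than anything from Creo's documentation. It assumes the quoted "fewer
than 1 in 10^12" figure is an upper bound applied independently to
every bit, and that replicated copies fail independently of one
another. The function names and the 1-in-1000 per-copy loss chance are
hypothetical, chosen only for illustration.

```python
from fractions import Fraction

REEL_BYTES = 10**12                    # reel capacity quoted for the Creo tape
BIT_ERROR_RATE = Fraction(1, 10**12)   # "fewer than 1 in 10^12", used as an upper bound


def expected_bit_errors(n_bytes, ber):
    """Upper bound on the expected number of bad bits in n_bytes,
    assuming each bit is wrong independently with probability ber."""
    return n_bytes * 8 * ber


def loss_probability(p_single, copies):
    """Chance that every one of `copies` replicas is lost,
    assuming each is lost independently with probability p_single."""
    return p_single ** copies


# The pessimistic "8 mistakes per reel" reading.
print(expected_bit_errors(REEL_BYTES, BIT_ERROR_RATE))   # 8

# Hypothetical: each backup tape has a 1-in-1000 chance of being lost.
print(loss_probability(Fraction(1, 1000), 2))            # 1/1000000
```

Exact fractions are used so that the small probabilities do not pick up
floating-point rounding. The independence assumption is the weak point:
tapes kept in the same building do not fail independently, which is why
the copies go in different buildings.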