Path: utzoo!utgpu!cunews!bnrgate!bigsur!bnr-rsc!bcarh185!schow
From: schow@bcarh185.bnr.ca (Stanley T.H. Chow)
Newsgroups: comp.arch
Subject: Error rates (was Re: IO buses
Message-ID: <3838@bnr-rsc.UUCP>
Date: 14 Dec 90 17:01:21 GMT
References: <11393@pt.cs.cmu.edu>
Sender: news@bnr-rsc.UUCP
Reply-To: bcarh185!schow@bnr-rsc.UUCP (Stanley T.H. Chow)
Organization: BNR Ottawa, Canada
Lines: 62
Summary:
Followup-To:
Keywords:

In article <11393@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>In article > pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>>There are even drives, magnetic or not, whose mean undetected error rate
>>is of the same order as their capacity, so virtually guaranteeing that
>>you get an undetected error every time you make a copy of them. After
>>all even a fairly respectable undetected error rate of 1 in 10^12 is
>>usually expressed in bits.
>
>Good point! Creo's 1 TB optical tape holds 10^12 bytes and has "fewer
>than 1 in 10^12" bit errors. The pessimistic reading is as "fewer
>than 8 mistakes per reel". It doesn't wash to say that one is storing
>(say) images, where errors will be unnoticeable. Images are usually
>stored in some compressed form, and decompression should be a pretty
>good error magnifier.
>
>Rather than expecting perfection, we should probably expect systems
>to have selectable, adjustable amounts of protection.

Question 1:
What do the specified error rates mean?

Are we talking about 1 (undetected) error per 10^12 bits read, on
average (even if we just read the same bit 10^12 times)?

Or does it mean 1 error for every 10^12 bits stored, so that a bit that
was read correctly once is expected to read correctly forever (or for
>> 10^12 reads)?

Question 2:
Assuming we want systems to have "selectable, adjustable amounts of
protection", which component of the system should handle the error
correction code, etc.?
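As an aside on the numbers quoted above, here is a minimal sketch of the
arithmetic behind "fewer than 8 mistakes per reel" and of the two readings
asked about in Question 1. The function names and the per_read flag are
made up for illustration, the quoted rate is treated as an upper bound, and
the Poisson step assumes independent bit errors, which a real drive may not
satisfy.

```python
import math

BITS_PER_BYTE = 8


def expected_errors(capacity_bytes, bit_error_rate, passes=1, per_read=True):
    """Expected undetected bit errors seen over `passes` full reads.

    per_read=True:  every bit read is an independent trial, so errors
                    accumulate with each pass (first reading in Question 1).
    per_read=False: a fixed fraction of the stored bits is bad, so extra
                    passes hit the same bad bits (second reading).
    """
    bits = capacity_bytes * BITS_PER_BYTE
    trials = bits * passes if per_read else bits
    return trials * bit_error_rate


def prob_at_least_one(expected):
    """Poisson approximation (assumed): P(>= 1 error) = 1 - exp(-expected)."""
    return 1.0 - math.exp(-expected)


reel_bytes = 1e12  # 1 TB reel, from the quoted post
rate = 1e-12       # "fewer than 1 in 10^12", taken as an upper bound

one_pass = expected_errors(reel_bytes, rate)
ten_reads = expected_errors(reel_bytes, rate, passes=10, per_read=True)
ten_stored = expected_errors(reel_bytes, rate, passes=10, per_read=False)

print("errors per full copy (upper bound):", one_pass)
print("chance of at least one error per copy:", prob_at_least_one(one_pass))
print("ten passes, per-read reading:", ten_reads)
print("ten passes, per-stored-bit reading:", ten_stored)
```

Under the per-read reading the expected count grows with every pass over
the reel; under the per-stored-bit reading it stays at the single-pass
figure no matter how often the reel is reread. That difference is why the
answer to Question 1 matters when deciding how much protection to add.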
In the case under discussion (Creo's 1 TB optical tape), it seems clear
that neither the drive nor the controller is the right place. Storage
that is both fast and highly reliable is expensive enough that we don't
want every application to pay for it. We are then left with the choice
of either the O/S or the application.

Question 3:
What facilities should the O/S provide? An error-corrected file system?
Almost no O/S today does this - they all rely on the underlying H/W to
do the error detection/correction. (Actually, desktop systems like
MS-DOS and Amiga do have checksums for every block on disk, but there
is still no real attempt to handle the errors.)

What are the costs of such a "reliable" file system? Does this mean we
have to stop DMA direct to user buffers?

Question 4:
If the applications have to handle the errors, how can applications be
device independent and yet have an optimum error correction strategy?

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.