Path: utzoo!attcan!uunet!nuchat!steve
From: steve@nuchat.UUCP (Steve Nuchia)
Newsgroups: comp.unix.microport
Subject: Re: How does Microport System V/AT handle bad blocks?
Message-ID: <2689@nuchat.UUCP>
Date: 24 Dec 88 19:57:56 GMT
References: <460@tarpit.UUCP> <326@focsys.UUCP> <464@tarpit.UUCP>
Reply-To: steve@nuchat.UUCP (Steve Nuchia)
Organization: Houston Public Access
Lines: 59

In article <464@tarpit.UUCP> rd@tarpit.UUCP (Bob Thrush) writes:
[concerning microbug phantom disk errors on second drive]
>The 2nd disk (that I'm having trouble with) is mostly used as the news
>spool directory, so it is definitely getting a whole lot different
>activity than it did before the onset of the problems. Each time the

In my extensive experience with this problem, if it gets you, it gets
you in proportion to the frequency of write access. The news spool is
about the worst thing to put out there, but I kept mine there because
I didn't want the errors eating anything I wanted to keep.

Now I'm using Interactive on Bell Tech. I still have some problems,
but nothing like Microport. I spent a year and a half of my life
working with those clowns. Boy, am I a sucker.

>problem shows up, I find that each subsequent fsck finds more problems,
>usually associated with duplicates in the free list. I wind up
>mkfs'ing the news file system to correct(?) the problem. I am usually

The problem here is a BUG in FSCK. There is a workaround. I know of
at least two people in Microport who have been assigned to fix it; I
don't know if either of them made any more progress than I did.

The bug is that, for large filesystems, fsck's free block bitmap gets
corrupted. The bitmap is built in phase 1, corrupted in phase 2 by an
as-yet undiscovered mechanism, and used to rebuild a bad freelist in
phase 5/6. Note that it will report a bad freelist on a perfectly
good filesystem, then proceed to trash it, if you let it.
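
A toy model of that phase 1 / phase 2 / phase 5/6 sequence may make the
failure easier to see. This is only a sketch in Python, not fsck's code:
every function name, the 8-block "filesystem", and the two cleared bits
standing in for the unexplained phase 2 damage are assumptions made up
for illustration.

```python
# Toy model only: NOT fsck's real data structures or code.
# All names below are hypothetical and exist just for this illustration.

def build_bitmap(nblocks, files):
    """Phase 1 analogue: mark every block claimed by some file as in use."""
    in_use = [False] * nblocks
    for blocks in files.values():
        for b in blocks:
            in_use[b] = True
    return in_use

def freelist_is_consistent(in_use, freelist):
    """Phase 5 analogue: the free list must be exactly the unmarked blocks."""
    expected = {b for b, used in enumerate(in_use) if not used}
    return set(freelist) == expected and len(freelist) == len(expected)

def rebuild_freelist(in_use):
    """Phase 6 analogue: trust the bitmap and emit every unmarked block."""
    return [b for b, used in enumerate(in_use) if not used]

def blocks_wrongly_freed(files, freelist):
    """Blocks that belong to a file but also sit on the free list."""
    owned = {b for blocks in files.values() for b in blocks}
    return sorted(owned & set(freelist))

# A perfectly good 8-block filesystem: two files and a correct free list.
files = {"article1": [2, 3], "spooldir": [5]}
good_freelist = [0, 1, 4, 6, 7]

bitmap = build_bitmap(8, files)
clean_ok = freelist_is_consistent(bitmap, good_freelist)

# Stand-in for the phase 2 corruption: two in-use bits get cleared.
damaged = list(bitmap)
damaged[3] = False
damaged[5] = False

false_alarm = not freelist_is_consistent(damaged, good_freelist)
rebuilt = rebuild_freelist(damaged)
clobbered = blocks_wrongly_freed(files, rebuilt)

print("check against intact bitmap passes:", clean_ok)
print("check against damaged bitmap fails:", false_alarm)
print("rebuilt free list:", rebuilt)
print("file blocks now on the free list:", clobbered)
```

With the intact bitmap the good free list checks out. With the damaged
bitmap the same free list is reported bad, and the rebuilt list hands
out blocks 3 and 5, which still belong to a file and a directory.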
When it rebuilds a random freelist it uses some blocks assigned to
files as freelist chain blocks, corrupting the files. When some of
those blocks fall in directories you really get filesystem hash.

The workaround is to run fsck on your filesystem but NOT ALLOW it to
REBUILD THE FREELIST. Then run fsck -f on it. The -f option says
to just run phases 1 and 5/6, and it can be allowed to rebuild the
freelist since, with phase 2 skipped, it never scribbles on its bitmap.

My analysis of the code says that this is a compiler bug, but there
is the possibility that it is a subtle architecture dependency in
fsck itself. In any case the mechanism appears to involve aliasing
of one or more blocks in fsck's "virtual memory" code: it manages
a file-backed buffer pool using some of the most twisted code I've
ever laid eyes on.

The problem is not sensitive to optimization when compiling fsck.
It is extremely sensitive to the size and contents of your filesystem.
In my experience, filesystems that are small enough not to require
a temporary file are safe.

>BTW, I got a complete rundown of the meaning of the hard disk i/o
>errors from Randy Jarrett who copied a posting <358@uport.UUCP>
>by Marc de Groot (then of Microport). When I return from the
>holidays, I'll repost that if there is interest. Thanks, Randy
>(and Marc).

Please do.
-- 
Steve Nuchia          South Coast Computing Services
uunet!nuchat!steve    POB 890952  Houston, Texas  77289
(713) 964 2462        Consultation & Systems, Support for PD Software.