Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!cmcl2!lanl!hc!pprg.unm.edu!cyrus
From: cyrus@pprg.unm.edu (Tait Cyrus)
Newsgroups: comp.bugs.4bsd
Subject: 4.3 Tahoe dump bug
Keywords: dump bug
Message-ID: <23685@pprg.unm.edu>
Date: 18 Dec 88 23:25:43 GMT
Organization: U. of New Mexico, Albuquerque
Lines: 109

In the process of trying to get the 4.3 Tahoe dump running on a Sun 3
running SunOS 3.X, I, along with others, have run into the following
bug (feature) (shown below).

>Writing dump file 0 (/research)
>  DUMP: Date of this level 1 dump: Sat Dec 17 12:59:10 1988
>  DUMP: Date of last level 0 dump: Wed Dec 14 19:08:42 1988
>  DUMP: Dumping /dev/rxy1g (/research) to /dev/rmt1h on host houdini
>  DUMP: mapping (Pass I) [regular files]
>  DUMP: mapping (Pass II) [directories]
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 58766]: count=24, got=512
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 60802]: count=536, got=1024
>       .
>       .
>       .
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 372316]: count=1040, got=1536
>  DUMP: (This should not happen)bread from /dev/rxy1g [block 378344]: count=24, got=512
>  DUMP: More than 32 block read errors from 152660
>  DUMP: This is an unrecoverable error.
>  DUMP: NEEDS ATTENTION: Do you want to attempt to continue?: ("yes" or "no") no
>  DUMP: The ENTIRE dump is aborted.

This error is produced in dumptraverse.c routine bread.  I am having a
difficult time trying to figure out what the heck this routine is
"supposed" to be doing.

I say there are several bugs in this routine and that it should look
something like the following:

bread(da, ba, cnt)
	daddr_t da;
	char *ba;
	int cnt;
{
	int n;

	if (lseek(fi, (long)(da * dev_bsize), 0) < 0){
		msg("bread: lseek fails\n");
	}
	while( cnt ) {
		n = read(fi, ba, cnt);
		if( n == 0 ) {
			msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
				disk, da, cnt, n);
			broadcast("DUMP IS AILING!\n");
			msg("This is an unrecoverable error.\n");
			if (!query("Do you want to attempt to continue?")){
				dumpabort();
				/*NOTREACHED*/
			}
		}
		cnt -= n;
		ba += n;
	}
}

It currently looks like:

bread(da, ba, cnt)
	daddr_t da;
	char *ba;
	int cnt;
{
	int n;

loop:
	if (lseek(fi, (long)(da * dev_bsize), 0) < 0){
		msg("bread: lseek fails\n");
	}
	n = read(fi, ba, cnt);
	if (n == cnt)
		return;
	if (da + (cnt / dev_bsize) > fsbtodb(sblock, sblock->fs_size)) {
		/*
		 * Trying to read the final fragment.
		 *
		 * NB - dump only works in TP_BSIZE blocks, hence
		 * rounds `dev_bsize' fragments up to TP_BSIZE pieces.
		 * It should be smarter about not actually trying to
		 * read more than it can get, but for the time being
		 * we punt and scale back the read only when it gets
		 * us into trouble. (mkm 9/25/83)
		 */
		cnt -= dev_bsize;
		goto loop;
	}
	msg("(This should not happen)bread from %s [block %d]: count=%d, got=%d\n",
		disk, da, cnt, n);
	if (++breaderrors > BREADEMAX){
		msg("More than %d block read errors from %d\n",
			BREADEMAX, disk);
		broadcast("DUMP IS AILING!\n");
		msg("This is an unrecoverable error.\n");
		if (!query("Do you want to attempt to continue?")){
			dumpabort();
			/*NOTREACHED*/
		} else
			breaderrors = 0;
	}
}

Am I misinterpreting what this routine is supposed to be doing?
Will my code work?  If not, why?

Thanks

---
W. Tait Cyrus   (505) 277-0806
e-mail: cyrus@pprg.unm.edu
University of New Mexico
Dept of ECE - Parallel Processing Research Group
Albuquerque, New Mexico 87131
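
A note on the proposed while loop, offered as a sketch and not as a
tested fix for dump.  Two cases in it look risky.  If read() returns -1,
then "cnt -= n" makes cnt grow and "ba += n" moves the buffer pointer
backwards.  If read() returns 0 and the operator answers "yes", cnt is
unchanged and the loop asks again forever.  The stand-alone version
below shows one way to handle both.  The name read_fully and its return
convention are made up for this illustration, and it is written against
the POSIX read() interface instead of the K&R style used in the dump
source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_fully: hypothetical helper, NOT part of dump.
 *
 * Keep calling read() until cnt bytes have arrived, end-of-file is
 * reached, or a real error occurs.  Returns the number of bytes
 * actually read (less than cnt means EOF came first), or -1 on a
 * read error.
 */
ssize_t
read_fully(int fd, char *buf, size_t cnt)
{
	size_t done = 0;

	while (done < cnt) {
		ssize_t n = read(fd, buf + done, cnt - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry the same read */
			return -1;		/* real error: leave the counters alone */
		}
		if (n == 0)
			break;			/* EOF: no more data will arrive */
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

In a routine like bread, the caller would still lseek first and could
then report an error whenever the result differs from cnt.  Whether a
raw disk device accepts the smaller follow-up reads this loop issues
after a short read is a separate question that this sketch does not
answer.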