Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!samsung!munnari.oz.au!metro!news
From: szabo_p@maths.su.oz.au (Paul Szabo)
Newsgroups: comp.sys.apollo
Subject: Bad blocks on DN10000 disk: crash: which files affected?
Keywords: bad disk blocks DN10000 lsyserr salvol dex invol
Message-ID: <1990Jul18.043746.712@metro.ucc.su.OZ.AU>
Date: 18 Jul 90 04:37:46 GMT
Sender: szabo_p@maths.su.oz.au (Paul Szabo)
Reply-To: szabo_p@maths.su.oz.au (Paul Szabo)
Organization: Dept of Applied Mathematics, University of Sydney
Lines: 56

We have a DN10000 with two 697MB disks, striped (on the one controller).
Lately, the disk(s) have developed bad spots. Attempting to access files
stored at these spots occasionally causes the 10000 to crash.

My question is: Is there a way of finding out what file is stored at a
specific place on the disk?

In more detail: The /systest/ssr_util/lsyserr utility gives messages like:

  12:32:44 am (AEST)   disk error
     Ctrl_# = 0   Unit_# = 0   Phys daddr = 21176 \
     disk operation completed successfully after crc correction \
     (OS/disk manager)
     Above disk chains a multiple-disk group - actual error is on:
     Ctrl_# = 0   Unit_# = 0   Phys daddr RELATIVE to this drive = 108BB

The question is: Is there a way of finding out what file is stored at that
place on the disk?

I tried the Apollo Response Center, but did not get a positive answer yet.
I would be very grateful for any insight.

I would like to go to INVOL and add this block to the bad spot list, but
first need to know which file is going to be affected. I do not wish to
re-install the OS (and user files) from tape.
I have tried SALVOL (options -a -s), but the problem is intermittent and
I only got one problem file this way, with the message

  The following disk blocks had driver level I/O errors:
  /z/x/root/new_users/template_pm.pmthree/user_data
     vtocx = 211703, uid = 494B011D.5001A581
     Error: status code = 0, read error at 10087F (logical), 21174 (physical)

Note that there seems to be a discrepancy between the address 21174
reported by SALVOL, and the address 21176 reported by lsyserr (this is the
closest I could find).

At the suggestion of the Apollo engineer (he hoped that DEX would report
UIDs, and then I could use /systest/ssr_util/upath) I also ran EX DEX,
RUN WIN -ENTIRE. The (hex) address 21176 [ = 8 + 14 * (6 + 15 * 645),
since the drive has 1630 Cylinders, 15 Heads, and 14 Sectors] is found in
the report

  Error: (WIN.DEX/Test 170) Read Disk Test, Rev 1.2   Pass 1
     Uncorrectable ECC error
     Error Code = $23
     Controller # = 0   Unit # = 0
     Cylinder # = 645,$285   Head # = 6,$6   Sector # = 8,$8

Note that the above is just one example of bad blocks; both lsyserr and
DEX complain about a dozen of them.

My last gripe is: when I tried to access the problem directory
.../template_pm.pmthree/user_data, sometimes there were no problems
whatsoever (I suppose these correspond to the lsyserr entry 'completed
after crc correction') but at other times the 10000 simply crashed...

Paul Szabo
szabo_p@maths.su.oz
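
For anyone wanting to check the address arithmetic quoted above, here is a
minimal Python sketch of the conversion between a cylinder/head/sector
triple and a hex disk address. It assumes only the geometry given in the
post (15 heads, 14 sectors per track) and the formula
sector + 14 * (head + 15 * cylinder), with zero-based numbering as that
formula implies. The function names are hypothetical; this is not a
reimplementation of lsyserr, SALVOL, or DEX.

```python
# Geometry as quoted in the post (assumed to apply to each drive).
HEADS = 15      # heads per cylinder
SECTORS = 14    # sectors per track


def chs_to_daddr(cyl, head, sector):
    """Cylinder/head/sector -> linear disk address, per the post's formula."""
    return sector + SECTORS * (head + HEADS * cyl)


def daddr_to_chs(daddr):
    """Linear disk address -> (cylinder, head, sector); inverse of the above."""
    track, sector = divmod(daddr, SECTORS)
    cyl, head = divmod(track, HEADS)
    return cyl, head, sector


# The DEX report (cylinder 645, head 6, sector 8) matches lsyserr's address.
print(hex(chs_to_daddr(645, 6, 8)))   # 0x21176

# The address SALVOL reported decodes to the same track, two sectors earlier.
print(daddr_to_chs(0x21174))          # (645, 6, 6)
```

Under these assumptions the SALVOL address 21174 and the lsyserr address
21176 fall on the same cylinder and head, at sectors 6 and 8 respectively.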