Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!samsung!munnari.oz.au!metro!news
From: szabo_p@maths.su.oz.au (Paul Szabo)
Newsgroups: comp.sys.apollo
Subject: Bad blocks on DN10000 disk: crash: which files affected?
Keywords: bad disk blocks DN10000 lsyserr salvol dex invol
Message-ID: <1990Jul18.043746.712@metro.ucc.su.OZ.AU>
Date: 18 Jul 90 04:37:46 GMT
Sender: szabo_p@maths.su.oz.au (Paul Szabo)
Reply-To: szabo_p@maths.su.oz.au (Paul Szabo)
Organization: Dept of Applied Mathematics, University of Sydney
Lines: 56

We have a DN10000 with two 697MB disks, striped (on the one controller).
Lately, the disk(s) have developed bad spots. Attempting to access files
stored at these spots occasionally causes the 10000 to crash.

My question is: Is there a way of finding out what file is stored at a
specific place on the disk?

In more detail:

The /systest/ssr_util/lsyserr utility gives messages like:
  12:32:44 am (AEST)  disk error
    Ctrl_# = 0  Unit_# = 0    Phys daddr = 21176 \
      disk operation completed successfully after crc correction \
        (OS/disk manager)
    Above disk chains a multiple-disk group - actual error is on:
    Ctrl_# = 0  Unit_# = 0    Phys daddr RELATIVE to this drive = 108BB
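
As an arithmetic aside on the two addresses in this entry: the drive-relative
address is exactly half the group address. The sketch below checks that,
assuming (this is a guess, not something Apollo has confirmed) that blocks
simply alternate between the two striped drives. The names group_daddr,
drive_daddr and N_DRIVES are made up for the illustration.

```python
# Addresses copied from the lsyserr entry above (both are hexadecimal).
group_daddr = 0x21176   # "Phys daddr" reported for the striped group
drive_daddr = 0x108BB   # "Phys daddr RELATIVE to this drive"
N_DRIVES = 2            # two disks in the stripe set

# Assumption (unconfirmed): blocks alternate between the drives, so group
# block g lives on drive g % N_DRIVES at offset g // N_DRIVES.
unit, offset = group_daddr % N_DRIVES, group_daddr // N_DRIVES
print(f"unit {unit}, offset {offset:X}")
```

If the guess holds, the result agrees with the Unit_# and relative address
that lsyserr prints; it says nothing about how the disk manager maps blocks
in general.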

The question is: Is there a way of finding out what file is stored at
that place on the disk? I tried the Apollo Response Center, but have not
had a positive answer yet. I would be very grateful for any insight.

I would like to go to INVOL and add this block to the bad spot list, but
first need to know which file is going to be affected. I do not wish to
re-install the OS (and user files) from tape.

I have tried SALVOL (options -a -s), but the problem is intermittent and
I only got one problem file this way, with the message:
The following disk blocks had driver level I/O errors:
    /z/x/root/new_users/template_pm.pmthree/user_data
     vtocx = 211703,  uid = 494B011D.5001A581
Error: status code = 0,  read  error at 10087F (logical), 21174 (physical)

Note that there seems to be a discrepancy between the address 21174
reported by SALVOL and the address 21176 reported by lsyserr (the
latter is the closest lsyserr entry I could find).

At the suggestion of the Apollo engineer (he hoped that DEX would report
UIDs, which I could then pass to /systest/ssr_util/upath), I also ran
EX DEX, RUN WIN -ENTIRE. The (hex) address 21176 [ = 8 + 14 * (6 + 15 * 645),
since the drive has 1630 cylinders, 15 heads, and 14 sectors per track]
appears in the report:
Error: (WIN.DEX/Test 170) Read Disk Test, Rev 1.2 Pass 1
Uncorrectable ECC error
Error Code = $23          Controller # = 0          Unit # = 0   
Cylinder # =  645,$285    Head # =  6,$6            Sector # = 8,$8
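
The address arithmetic above can be checked with a short sketch. It uses only
the geometry and formula quoted in this post and assumes that head and sector
numbers count from zero, as the formula implies; the function names are made
up for the illustration.

```python
# Geometry as quoted in the post; SECTORS is taken as sectors per track.
CYLINDERS, HEADS, SECTORS = 1630, 15, 14

def chs_to_daddr(cyl, head, sector):
    """Linear block address, using the formula quoted in the post."""
    return sector + SECTORS * (head + HEADS * cyl)

def daddr_to_chs(daddr):
    """Inverse of chs_to_daddr: split an address into (cyl, head, sector)."""
    track, sector = divmod(daddr, SECTORS)
    cyl, head = divmod(track, HEADS)
    return cyl, head, sector

print(f"{chs_to_daddr(645, 6, 8):X}")   # DEX error location -> 21176
print(daddr_to_chs(0x21176))            # lsyserr address -> (645, 6, 8)
print(daddr_to_chs(0x21174))            # SALVOL address  -> (645, 6, 6)
```

Under these assumptions the SALVOL address 21174 decodes to the same cylinder
and head as the DEX error, two sectors earlier on that track.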


Note that the above is just one example of the bad blocks; both lsyserr
and DEX complain about a dozen of them.

My last gripe is: when I tried to access the problem directory
.../template_pm.pmthree/user_data, sometimes there were no problems
whatsoever (I suppose these correspond to the lsyserr entry 'completed
after crc correction') but at other times the 10000 simply crashed...

Paul Szabo       szabo_p@maths.su.oz