Xref: utzoo comp.sys.att:8722 unix-pc.general:4780
Path: utzoo!utgpu!watserv1!watmath!uunet!cs.utexas.edu!wuarchive!mit-eddie!uw-beaver!sumax!polari!rwing!pat
From: pat@rwing.UUCP (Pat Myrto)
Newsgroups: comp.sys.att,unix-pc.general
Subject: Re: Fixdisk problems?
Summary: fixdisk kernel panic
Message-ID: <1054@rwing.UUCP>
Date: 7 Feb 90 16:40:36 GMT
References: <111@spirit.UUCP>
Distribution: na
Organization: Very Little Organization, Seattle WA
Lines: 95

In article <111@spirit.UUCP>, john@spirit.UUCP (John F. Godfrey) writes:
>	... [ edited to reduce length ] ...
> Early last week I installed Fixdisk 2.0 on spirit ... [with a] 67mb
> ST-4096 and DOS-73...  Shortly after installation I received the panic
> message which will follow... [after reboot] ... it paniced again.
> 
> Here is the panic message:
> ----------------------------------------------------------------------
> #WD1010 ST=/Sekg/Err/ EF=/Id?/ cy=710. sc=14. hd=7. dr#=0. MCR2:0x0
> #HDERR ST:51 EF:10 CL:C6 CH:2 SN:E SC:2 SDH:27 DMACNT:FFFF DCRREG:9F
> MCCREG:8300
> 
> panic: Hard disk timeout
> ----------------------------------------------------------------------
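
As a sanity check on reading that message, the two lines appear to
describe the same disk address.  This is only a sketch, and it
assumes the #HDERR fields are the raw WD1010 registers printed in
hex: CL/CH as the cylinder low/high bytes, SN as the sector number,
and SDH with the head in the low three bits and the drive in the next
two.  It also assumes a modern POSIX shell with $(( )) arithmetic,
which the stock UNIX PC shell does not have.

```shell
# Values copied from the quoted #HDERR line (assumed to be hex).
CL=0xC6 CH=0x2 SN=0xE SDH=0x27

cyl=$(( (CH << 8) | CL ))        # cylinder = high byte : low byte
sec=$(( SN ))                    # sector number
head=$(( SDH & 0x07 ))           # low three bits of SDH (assumed layout)
drive=$(( (SDH >> 3) & 0x03 ))   # next two bits of SDH (assumed layout)

echo "cy=$cyl sc=$sec hd=$head dr#=$drive"
```

Under those assumptions it prints "cy=710 sc=14 hd=7 dr#=0", which
matches the decoded #WD1010 line above it.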

It's hard to say - I have seen that sort of panic before, but only
once.  It sounds like the drive wasn't seeking, as if the seek
mechanism was jammed.  I had it happen with an ST 251, and rebooting
didn't help until the power was cycled - from the sounds it made, that
sort of "kicked" it loose.  It is possible your problems are of a
similar nature.  Even though going back to the old kernel appeared to
fix things, it's still possible that was a coincidence, and that the
operations, reboot cycles, etc. done while restoring the old kernel
were what restored sanity.

I have also installed the new fixdisk, and it ran fine for over a
week, until today, when I got a "kernel parity" panic.  I didn't copy
the message down, but it mentioned a disk parity error (though nothing
was in unix.log).  I am convinced that things like this do happen
occasionally.  If it happens again, with the same problem, then I will
be concerned.  Obviously things are running fine now, as the system
involved is the one I am typing this on.  Once I had an entry appear
in unix.log saying it couldn't read head 0, sector 0, cylinder 0, and
it bailed out with a "drive not ready" error - if for real, a very
grave symptom.  However, that was months ago, and after rebooting it
hasn't happened since.

I installed the fixdisk selectively, instead of using the provided
Install script (because I no longer use some of the stuff in the
FIXDISK pkg, and because I tend to be leery of Install scripts in
general, especially ones that do such sweeping things as the FIXDISK
one must do).

Following is what *I* would do, if I were in the same situation.  I
am probably going into excess detail, but in this case that might be
preferable to assuming too much.  The procedure I used for
installing the FIXDISK worked for me, and this is being written in
good faith, but since I have no control over how this will be read or
interpreted, *YOU ARE ON YOUR OWN*.  NO CLAIMS ARE MADE AS TO THIS
BEING CORRECT OR BEING FREE OF LOGICAL OR TYPOGRAPHICAL ERRORS, OR
BEING WITHOUT CRITICAL OMISSIONS.

Before writing off the FIXDISK2.0, I would suggest re-trying the
FIXDISK (a different copy of it, if it was a downloaded copy), and
installing it BY HAND, rather than using the Install script - this
allows one to selectively install fixes, and to do it in stages, as I
suggest below, starting with the kernel, which provides most of the
major fixes, other than the uucico (uucico not being relevant if HDB is
installed), and the fix for the occasional corrupted /etc/utmp file.

I suggest you try unarchiving the fixdisk into a work subdir.  It's a
cpio archive, and assuming FIXDISK2.0+IN is in the parent subdir, the
command ``cpio -iBcdm <../FIXDISK2.0+IN'', run as root in an empty
subdir, will extract the contents, preserving the original dates,
perms, and ownership of the files.  If it's on the floppies, replace
the "../FIXDISK2.0+IN" with "/dev/rfp021".

In the subdir 'kernel', unpack the kernel file (``unpack UNIX3.51m'')
and then copy the new kernel to /UNIX3.51m.  Verify the permissions
are at least 754, owner/group root/sys (depending on how things are
set up, you may need to have world read perms on the kernel).  Follow
with ``mv /unix /unix.old'' (to preserve the old kernel, in case the
UNIX3.5? link isn't there) and then do ``ln /UNIX3.51m /unix''.  Once
the above steps are done and checked for correctness, do a normal
shutdown and reboot.

If the system comes up OK, and gets past the time interval where you
originally experienced the problems, then I would try replacing
/etc/lddrv/wind.o, /etc/init, /bin/login, /bin/getty, etc., MANUALLY,
BY HAND, with the files provided in the kernel, utmp, etc. subdirs,
preserving the original versions as /bin/login.old,
/etc/lddrv/wind.o.old, etc.  You can inspect the Install script for
the proper permissions and owner/group to use on each file (most will
be owner=bin, group=bin).  After the new init is copied in, be sure to
rm /bin/telinit and then do ``ln /etc/init /bin/telinit'' (some stuff
does look for /bin/telinit, possibly even during the reboot sequence).
After verifying everything is right, do the shutdown and reboot again.
If the panics happen again, I have no suggestions.
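
For anyone who would rather rehearse the staged install on a scratch
directory first, here is a rough sketch of the same steps as shell
functions.  The function names and the "root" prefix argument are my
own invention for illustration (pass "" to act on the live system);
they are not part of the FIXDISK package.  It does not set owner/group,
so that still has to be done by hand as root, and it is written for a
present-day POSIX sh, not tested on the UNIX PC shell.

```shell
# install_kernel ROOT NEWKERNEL
# Copy the unpacked kernel in, keep the old one as unix.old, relink /unix.
install_kernel() {
    root=$1 new=$2
    cp "$new" "$root/UNIX3.51m" &&
    chmod 754 "$root/UNIX3.51m" &&
    mv "$root/unix" "$root/unix.old" &&
    ln "$root/UNIX3.51m" "$root/unix"
}

# save_and_replace NEWFILE TARGET
# Keep TARGET as TARGET.old, then copy NEWFILE into place.
# Refuses to run if a .old copy already exists, so a second run
# cannot overwrite the saved original.
save_and_replace() {
    new=$1 target=$2
    [ -f "$new" ] || { echo "missing $new" >&2; return 1; }
    [ ! -f "$target.old" ] || { echo "$target.old exists" >&2; return 1; }
    mv "$target" "$target.old" &&
    cp "$new" "$target"
}

# relink_telinit ROOT
# ln takes the existing file first, then the new name to create.
relink_telinit() {
    root=$1
    rm -f "$root/bin/telinit" &&
    ln "$root/etc/init" "$root/bin/telinit"
}
```

Something like ``install_kernel "" kernel/UNIX3.51m'' would be the
first stage, followed by the reboot.  The second stage would be one
``save_and_replace'' per file and then ``relink_telinit ""'', after
which permissions and owner/group still need checking against the
Install script.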

Perhaps someone can answer - does 3.51 require a new format on a
drive that had previously been formatted with, say, 3.0 or 3.5?

As I said, your mileage may vary, but good luck - just proceed slowly
and carefully.

-- 
pat@rwing                                       (Pat Myrto),  Seattle, WA
                            ...!uunet!pilchuck!rwing!pat
      ...!uw-beaver!uw-entropy!dataio!/
WISDOM:    "Travelling unarmed is like boating without a life jacket"