Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!cmcl2!harvard!seismo!gatech!akgua!whuxlm!whuxl!houxm!ihnp4!alberta!calgary!ingoldsby
From: ingoldsby@calgary.UUCP (Terry Ingoldsby)
Newsgroups: net.micro.6809
Subject: Newdisk bug fixed
Message-ID: <111@vaxb.calgary.UUCP>
Date: Sat, 10-May-86 20:20:29 EDT
Article-I.D.: vaxb.111
Posted: Sat May 10 20:20:29 1986
Date-Received: Wed, 14-May-86 06:36:44 EDT
Organization: U. of Calgary, Calgary, Ab.
Lines: 67
Keywords: Device driver

After some late night hacking, I think I have been able to locate and
correct the bug in Dave Lewis' `Newdisk' device driver for OS9.  I
hope the following will allow other people to correct their versions.

In Dave's code, he uses the NMI generated by the WD1793 disk controller
as an asynchronous RTS.  The idea is sound but he had a little, itty, bitty
teeny, weeny bug.  But then, smallpox germs aren't too large either.  The
good news is that the vaccination for this bug is very easy.  Dave's code
looks something like this:

NMI.SVC   fix stack so it looks like the NMI never happened
          figure out error code
          effectively do an RTS with the error code in B


WRITE2   LDA  #$A2          `Write sector' command
         BSR  RWCMDX        Execute command
WAITWDRQ BITA >STATREG      Wait until controller is
         BEQ  WAITWDRQ        ready to transfer data
*
WRTLOOP  LDA  ,X+           Get byte from data buffer
         STA  >DATAREG      Put it in data register
         STB  >DPORT        Activate DRQ halt function
         BRA  WRTLOOP       Loop until interrupted
*


RWCMDX   LDX  PD.BUF,Y      Point to sector buffer
         LDB  DPRT.IMG,U    Do a side verify using the
         BITB #$40            DPORT image byte as a side
         BEQ  WTKCMDX         select indicator
         ORA  #8            Compare for side 1
WTKCMDX  STA  >COMDREG      Issue command to controller
         LDB  #$A8          Set up DRQ halt function
         ORB  DPRT.IMG,U    OR in select bits
         LDA  #2            DRQ bit in status register
         RTS


The idea is that someone calls WRITE2 (or a similar READ or VERIFY routine).
This routine then calls the appropriate CMDX routine, which issues the
command to the disk controller, sets up a few registers, and then RTS's
back to the transfer loop, which spins until the NMI provides the
asynchronous RTS out of WRITE2.  This is usually what happens.  Most
successful commands take at least a few milliseconds to execute and all is
well.  It turns out that unsuccessful commands return much more quickly.
Moreover, the delay before returning depends on many almost random factors
(angular position of disk, clock relations, etc.).  Some error conditions
can be detected almost instantaneously.  For example, a request to write to
a write-protected disk can be determined to be an error in a matter of
nanoseconds or microseconds.

Many of you are now shaking your heads and can see what the problem is.
For those who are still a bit bleary from late-night hacking I will
complete the explanation.  If the NMI occurs very quickly then it will
occur before RWCMDX has had a chance to RTS.  At that moment the return
address on top of the stack is still the one pushed by the BSR, so the NMI's
RTS does not leave WRITE2 at all.  Instead it takes us back into WRITE2, and
on to the WRTLOOP, where we will loop forever waiting for an event that has
already occurred.  To fix this bug, simply take the RWCMDX subroutine and
insert it (or the WTKCMDX part of it) inline where previously there was a
call to the subroutine.  With no BSR there is no extra return address on the
stack, so the NMI's RTS always returns to whoever called WRITE2.
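
For illustration only, here is roughly what WRITE2 might look like with the
body of RWCMDX inserted in place of the BSR.  This is a sketch pieced
together from the fragments above, not Dave's actual corrected source.  The
label WRT2CMD is a made-up name, used so that it does not clash with
WTKCMDX if the subroutine is kept around for other callers.  READ and
VERIFY would presumably need the same treatment.

```
WRITE2   LDA  #$A2          `Write sector' command
* --- body of RWCMDX inserted inline, no BSR/RTS pair ---
         LDX  PD.BUF,Y      Point to sector buffer
         LDB  DPRT.IMG,U    Do a side verify using the
         BITB #$40            DPORT image byte as a side
         BEQ  WRT2CMD         select indicator
         ORA  #8            Compare for side 1
WRT2CMD  STA  >COMDREG      Issue command to controller
* An NMI from here on finds only WRITE2's own return
* address on the stack, so its RTS goes to our caller.
         LDB  #$A8          Set up DRQ halt function
         ORB  DPRT.IMG,U    OR in select bits
         LDA  #2            DRQ bit in status register
* --- end of inlined code ---
WAITWDRQ BITA >STATREG      Wait until controller is
         BEQ  WAITWDRQ        ready to transfer data
*
WRTLOOP  LDA  ,X+           Get byte from data buffer
         STA  >DATAREG      Put it in data register
         STB  >DPORT        Activate DRQ halt function
         BRA  WRTLOOP       Loop until interrupted
```

The instruction order is deliberately left exactly as in the original
subroutine.  The only change is that nothing extra sits on the stack
between issuing the command and entering the loop.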

If others have trouble with this, I may (if Dave Lewis grants permission)
post the corrected code at a later date.

Keep hacking!


				Terry Ingoldsby