Path: utzoo!yunexus!ists!jarvis.csri.toronto.edu!mailrus!uunet!cbmvax!daveh
From: daveh@cbmvax.UUCP (Dave Haynie)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Hard disks, DMA vs Non-DMA
Message-ID: <8580@cbmvax.UUCP>
Date: 15 Nov 89 20:56:16 GMT
Article-I.D.: cbmvax.8580
References: <8911150430.AA24506@jade.berkeley.edu>
Organization: Commodore Technology, West Chester, PA
Lines: 94

in article <8911150430.AA24506@jade.berkeley.edu>, GORRIEDE@UREGINA1.BITNET (Dennis Robert Gorrie) says:

> The story goes, DMA is faster.  

You first have to look at the problem you're trying to solve.  The
problem, in this case, is data transfer from a hard disk controller
to the Amiga's main memory.  Except for this transfer mechanism,
there's nothing intrinsically different between DMA and non-DMA 
devices.

For the DMA transfer, a device of some kind requests the Amiga's bus
and transfers a number of words of data to or from the Amiga's main
memory.  Once this transfer is complete, it will probably have to
involve the main CPU, at least to tell the main CPU that it's done.
But the transfer is very efficient, because the CPU isn't involved
during the transfer (e.g., no interrupts, no need to push and pop
stacks, etc.), and there's only one bus crossing per word transfer;
data flows only between main memory and the DMA device.

For a non-DMA transfer, the CPU is involved to some degree or another.
At the worst, it works like the Mac's hard disk interface, where the CPU
is required to talk directly to a SCSI chip, and must basically sit and
wait for each byte to be available.  Much better is the GVP approach,
where the SCSI device itself transfers a whole block (or possibly several
blocks) into local memory.  At this point, the CPU is called upon to
transfer that data to or from this local memory.  This transfer requires
two bus crossings for each word; data flows between the main memory
and the CPU, then between the CPU and the local memory (or vice versa).
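
As a rough illustration of the bus-crossing counts described above, here
is a minimal sketch.  The function name and the block size are made up
for the example; only the one-crossing versus two-crossing ratio comes
from the discussion itself.

```python
# Toy model: count bus crossings needed to move a block of words.
# Names and sizes here are hypothetical, chosen only for illustration.

def bus_crossings(words, dma):
    """Return the number of bus crossings needed to move `words` words.

    DMA:      one crossing per word (device <-> main memory).
    CPU copy: two crossings per word (main memory <-> CPU, then
              CPU <-> the controller's local memory).
    """
    crossings_per_word = 1 if dma else 2
    return words * crossings_per_word


# Assumption: a 512-byte disk block moved as 16-bit words.
BLOCK_WORDS = 256

print("DMA:     ", bus_crossings(BLOCK_WORDS, dma=True))
print("CPU copy:", bus_crossings(BLOCK_WORDS, dma=False))
```

This counts crossings only.  It says nothing about how long each
crossing takes, which depends on the memory being accessed.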

> But, as many people point out, it sometimes is slower than non-DMA, 
> when there is contention for the bus.  Case in point is the hi-res 
> interlace screen situation where co-processors and your DMA hard disk 
> device are contending for cycles on the coprocessor bus.

DMA to chip memory, unless you really need it, is a bad idea with any
kind of controller, since you can be kept out of chip memory for an
extended period of time.  In order to even start a transfer from the
hard disk controller, you can't have the CPU waiting on chip memory,
for either the DMA or non-DMA controller.  Assuming DMA, the controller
will request the bus from the CPU.  The CPU can grant the bus right
away, but the DMA device can't actually take over the bus until the
CPU finishes its current cycle.  When waiting for chip bus access in
a high-activity display mode, this can be a long wait.  For the non-DMA
device, the CPU will get an interrupt signaling it's needed for a
transfer.  However, it can't service that interrupt until the current
instruction is complete, which of course can't complete until the CPU
has chip bus access.  So in either case, when the CPU's involved in a
delayed access to the chip bus, you have to wait.  As long as the actual
transfer goes to fast memory, you'll only have this initial lag (or
possibly a few of them if the transfer is done in several pieces, as it
often is with FIFO-based controllers), and you won't see too much DMA
slowdown.  If the transfer is into chip memory, you'll of course see
a rather noticeable slowdown.
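
The difference between "a few initial lags" and "a wait on every word"
can be put into a back-of-the-envelope model.  This is only a sketch:
the function, its parameters, and every number below are assumptions
picked for illustration, not measurements of any real controller.

```python
# Toy cost model, in arbitrary "ticks".  All parameters are hypothetical.

def transfer_ticks(words, bursts, arbitration_wait, chip_wait_per_word,
                   to_chip):
    """Estimate the ticks needed to move `words` words.

    Each burst pays `arbitration_wait` once while the controller waits
    for the CPU to finish its current (possibly delayed) cycle.  Every
    word costs one tick to move.  If the destination is chip memory,
    every word also pays `chip_wait_per_word` for display contention.
    """
    ticks = bursts * arbitration_wait + words
    if to_chip:
        ticks += words * chip_wait_per_word
    return ticks


fast = transfer_ticks(256, bursts=4, arbitration_wait=20,
                      chip_wait_per_word=3, to_chip=False)
chip = transfer_ticks(256, bursts=4, arbitration_wait=20,
                      chip_wait_per_word=3, to_chip=True)
print("to fast memory:", fast)
print("to chip memory:", chip)
```

With these made-up numbers the initial lags are a fixed overhead, while
the chip memory penalty grows with the size of the transfer.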

> Then someone says a 'proper' DMA device is faster, than non-DMA, even for the
> situation above.  How is this so?  What is a 'proper' DMA device?

The main problem you can get into in this situation is essentially a flow control
problem.  You have data coming from a hard disk which needs to get stuck into
memory somewhere.  You can have an undetermined length of time to wait for
access to that memory.  If the controller is capable of stopping the flow of
data into the device based on its success at getting data out of the device,
everything's cool.  Some, like the GVP controller, do this by only dealing in
whole disk blocks.  Others, like the A2091, do this by starting and stopping
the data transfer from the SCSI device itself.  The problem that's been seen
is when a device, be it DMA or non-DMA, isn't capable of starting and stopping
this data flow.  The A2090 is an example of such a device, at least as 
supported by its current software.  When it can't get access to the bus within 
a certain amount of time, its FIFO overruns, and it has to attempt the
transfer all over again.  If it could tell the SCSI device to stop sending
as its FIFO fills up, there'd be no problem (in fact, the A2091 has a smaller
FIFO but works much better, because it can start and stop the data flow).
I'm told that part of this problem is the A2090 support for ST-506.  SCSI is
a rather high-level protocol, with intelligent drives, and it can support
things like start and stop.  ST-506 is a low-level, dumb protocol that must
transfer whole blocks in a fixed amount of time.  You start a transfer
from the disk into the FIFO, then start DMA out of the FIFO.  If the DMA
gets waited too long, the FIFO overruns, and you have to start over
again.
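
The flow control point above can be sketched as a small simulation.
This is a toy model, not a description of the A2090 or A2091 hardware:
the FIFO depths, the bus wait, and the one-word-per-tick rates are all
assumptions chosen to show the effect.

```python
# Toy FIFO model.  The drive pushes one word per tick; once the bus is
# available, one word per tick is drained to memory.  All sizes and
# rates are hypothetical.

def transfer(total_words, fifo_depth, bus_wait, flow_control):
    """Simulate one transfer.

    Returns True if every word reaches memory, False if the FIFO
    overruns (meaning the transfer would have to be retried).
    """
    fifo = 0      # words currently held in the FIFO
    sent = 0      # words the drive has delivered so far
    stored = 0    # words written to memory so far
    tick = 0
    while stored < total_words:
        # Drive side: deliver one word per tick while data remains.
        if sent < total_words:
            if fifo < fifo_depth:
                fifo += 1
                sent += 1
            elif not flow_control:
                return False  # drive can't be paused, so the word is lost
            # With flow control, the drive simply waits this tick.
        # Memory side: the bus is unavailable for the first bus_wait ticks.
        if tick >= bus_wait and fifo > 0:
            fifo -= 1
            stored += 1
        tick += 1
    return True


# A larger FIFO without flow control overruns on a long bus wait...
print(transfer(256, fifo_depth=64, bus_wait=100, flow_control=False))
# ...while a smaller FIFO that can pause the drive completes.
print(transfer(256, fifo_depth=16, bus_wait=100, flow_control=True))
```

In this model a deeper FIFO only raises the bus wait that can be
tolerated.  Being able to pause the source removes the limit entirely,
at the cost of a slower transfer.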

You have to take a look at the particular controller in question.  Any
modern review of a DMA controller should include its performance with
a full-bandwidth screen up (e.g., 640 across, 4 bitplanes, overscan if you
like).  Modern DMA controllers like the A2091 and the Microbotics
HardFrame have no trouble with this situation.

> +-----------------------------------------------------------------------+
> |Dennis Gorrie                 'Chain-Saw Tag...                        |
> |GORRIEDE AT UREGINA1.BITNET                    Try It, You'll Like It!'|
> +-----------------------------------------------------------------------+
-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough