Xref: utzoo comp.sys.amiga:17505 comp.sys.amiga.tech:248
Path: utzoo!mnetor!uunet!cbmvax!daveh
From: daveh@cbmvax.UUCP (Dave Haynie)
Newsgroups: comp.sys.amiga,comp.sys.amiga.tech
Subject: Re: 68030 Questions
Message-ID: <3609@cbmvax.UUCP>
Date: 11 Apr 88 18:30:23 GMT
References: <4937@videovax.Tek.COM>
Organization: Commodore Technology, West Chester, PA
Lines: 61

in article <4937@videovax.Tek.COM>, stever@videovax.Tek.COM (Steven E. Rice, P.E.) says:

> Another possibility is to block the data into (e.g.) 512 byte blocks and
> then arbitrate for the bus once per block.  This drops the bus bandwidth
> occupation to 20% (since one arbitration is insignificant compared to the
> time to transfer 512 bytes as 128 32-bit words).  But the CPU is still
> denied the bus 20% of the time.

First of all, with a better bus design (e.g., not the current Amiga bus, but
perhaps a future version that's 32 bits wide), there's zero or very near
zero arbitration time; the bus's owner is determined dynamically on a
cycle-by-cycle basis.

Secondly, since a 68020 with its cache enabled only wants the bus 50% or so
of the time on average, your 20% figure immediately drops to about 10% on
average.  It could be as bad as 20% or as good as 0%, depending on what the
CPU is doing.
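The 10% figure is just the product of the two fractions if you assume the
CPU's bus requests and the DMA cycles are independent of each other.  A
minimal Python sketch of that arithmetic follows; the function names are
hypothetical, and the independence assumption is mine, not a measured
property of any real bus.  The best and worst cases are the 0% and 20%
bounds above.

```python
def expected_stall_fraction(dma_occupancy: float, cpu_bus_demand: float) -> float:
    """Fraction of all bus cycles where the CPU wants the bus but DMA holds it.

    Assumption (not from the post): CPU bus requests and DMA cycles are
    statistically independent, so the expected overlap is the product.
    """
    for f in (dma_occupancy, cpu_bus_demand):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return dma_occupancy * cpu_bus_demand


def stall_bounds(dma_occupancy: float, cpu_bus_demand: float) -> tuple[float, float]:
    """Best- and worst-case overlap between CPU demand and DMA occupancy."""
    # Best case: the CPU runs out of its cache whenever DMA owns the bus.
    best = 0.0
    # Worst case: every DMA cycle lands on a cycle the CPU wanted.
    worst = min(dma_occupancy, cpu_bus_demand)
    return best, worst


print(expected_stall_fraction(0.20, 0.50))  # 0.1
print(stall_bounds(0.20, 0.50))             # (0.0, 0.2)
```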

Now we add a priority scheme.  If the CPU operation is more important, it
gets the bus for any cycles it needs, and the DMA device gets whatever it
wants from the remaining 50% of the bus.  And that's assuming the bus is
limited to CPU bus speeds.  It's pretty simple to make DMA devices run
nybble or page mode cycles that the CPU can't keep up with, and most
memory systems can be designed to support them at almost no extra cost.  So
with DMA doing a nybble-mode transfer, you're now down to less than 5% of
the bus bandwidth for that transfer.  VME and non-Apple NuBus both do things
like this.
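The priority scheme can be sketched as a cycle-by-cycle arbiter in which the
CPU always wins and the DMA device only takes cycles the CPU leaves free.
This Python model is illustrative only: the function name, the alternating
50% CPU request pattern, and the assumption that one nybble-mode bus grant
moves four words are all hypothetical.

```python
def priority_arbitrate(cpu_requests, dma_words, words_per_dma_cycle=1):
    """Assign each bus cycle to "CPU", "DMA" or "idle", CPU first.

    cpu_requests: one bool per bus cycle, True if the CPU wants that cycle.
    dma_words: number of words the DMA device still has to move.
    words_per_dma_cycle: >1 models a burst (e.g. nybble-mode) cycle.
    Returns (owners, words_left).
    """
    owners = []
    words_left = dma_words
    for cpu_wants in cpu_requests:
        if cpu_wants:
            owners.append("CPU")  # CPU has priority, so it never waits for DMA
        elif words_left > 0:
            owners.append("DMA")  # DMA only uses cycles the CPU left free
            words_left = max(0, words_left - words_per_dma_cycle)
        else:
            owners.append("idle")
    return owners, words_left


# CPU wants every other cycle (50% demand); DMA moves 128 words (512 bytes).
pattern = [cycle % 2 == 0 for cycle in range(256)]
single, _ = priority_arbitrate(pattern, 128)
burst, _ = priority_arbitrate(pattern, 128, words_per_dma_cycle=4)
print(single.count("DMA"), burst.count("DMA"))  # 128 32
```

Under strict CPU priority the CPU never loses a cycle in this model; burst
cycles only shrink how many of the free cycles the DMA device needs.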

> Given just a single hard disk transfer as you have described it, DMA into
> a dual-port buffer avoids losing 20% of the CPU's processing capability.
> That seems worthwhile to me!

But you're still missing the point.  The CPU has to stop what it's doing to
transfer the data by hand.  If it did that JUST as efficiently as the DMA
device, you'd still be losing whatever CPU time you claim is being eaten
by the DMA transfer, 20% or whatever (keep in mind this 20% figure only
applies during an actual transfer).  If the DMA transfer happens twice as
fast as the CPU could transfer the data, then I'm gaining in CPU speed,
even though I'm kicking the CPU off the bus for a while.  DMA transfers on
the Amiga bus with a 68020 go twice as fast as the 68020 could possibly
transfer them.  68000-based CPU transfers run at more like a quarter of the
speed of the DMA device.  My point is that someone has to do the work of the
transfer unless you can live with the data exactly where it's dumped in your
shared memory scheme.  If you know no transfer is required, share the
memory; but if one is, and especially if the data can be used as is once it
reaches its destination (as with NewFS), DMA wins.
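The argument reduces to comparing two costs for the same transfer.  In the
simplified model below (an assumption, not the author's own analysis),
programmed I/O ties up the CPU for the whole copy, while DMA costs the CPU
at most the bus time the DMA device holds.  Rates are in arbitrary relative
units, using only the 2x (68020) and 4x (68000) ratios quoted above; the
function names are hypothetical.

```python
def cpu_cost_programmed_io(nbytes: int, cpu_copy_rate: float) -> float:
    """Time the CPU spends copying nbytes itself; it does nothing else meanwhile."""
    return nbytes / cpu_copy_rate


def cpu_cost_dma_worst_case(nbytes: int, dma_rate: float) -> float:
    """Upper bound on CPU time lost to DMA: the whole time DMA holds the bus.

    Worst-case assumption: the CPU wanted the bus on every DMA cycle.
    """
    return nbytes / dma_rate


BLOCK = 512  # bytes, the block size from the quoted article
for cpu, dma_speedup in (("68020", 2.0), ("68000", 4.0)):
    pio = cpu_cost_programmed_io(BLOCK, 1.0)
    dma = cpu_cost_dma_worst_case(BLOCK, dma_speedup)
    print(f"{cpu}: PIO costs {pio:.0f}, DMA at most {dma:.0f} (relative units)")
```

Even in this worst case, a DMA device that moves data faster than the CPU
can copy it costs the CPU less time than copying by hand.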

There's actually a test case of this available in the Amiga world.  As I've
already mentioned, the A2090 controller uses a FIFO and DMA to complete its
transfers, and achieves about 625K bytes/second.  There's a new SCSI
controller out there, from a company called Great Valley Peripherals, that
uses an I/O chip that DMAs into shared RAM (4K of static RAM on-board, so
once you're in sync I suspect there will rarely be a collision between the
CPU and the peripheral chip).  I don't have any benchmarks on this new board,
but I guarantee it'll be slower.

> 					Steve Rice
-- 
Dave Haynie  "The B2000 Guy"     Commodore-Amiga  "The Crew That Never Rests"
   {ihnp4|uunet|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
		"I can't relax, 'cause I'm a Boinger!"