Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!nrl-cmf!cmcl2!husc6!mit-eddie!uw-beaver!tektronix!tekcrl!tekfdi!videovax!stever
From: stever@videovax.Tek.COM (Steven E. Rice, P.E.)
Newsgroups: comp.sys.amiga
Subject: Re: 68030 Questions
Message-ID: <4822@videovax.Tek.COM>
Date: 4 Feb 88 19:04:54 GMT
References: <10170@ccicpg.UUCP> <3246@cbmvax.UUCP>
Reply-To: stever@videovax.Tek.COM (Steven E. Rice, P.E.)
Organization: Tektronix Television Systems, Beaverton, Oregon
Lines: 60
Summary: DMA is the *SLOW* way to go!

In article <3246@cbmvax.UUCP>, daveh@cbmvax.UUCP (Dave Haynie) writes:

[ discussion of, among other things, DMA to fast RAM ]

> This happens all the time with things like hard disk drives.  It sure does
> hurt the 68000's speed, but consider the alternative.  You've got to get
> that disk data into memory somehow.  If you make the 68000 go and read it
> from an I/O port somewhere, you're running several memory cycles per data
> transfer.  I mean, instruction fetch, I/O fetch, instruction fetch, write to
> RAM, instruction fetch, test and branch, something like that.  Once a DMA
> driven controller is set up (simple, nothing like setting up the blitter),
> you have a bus arbitration, then one word transferred by the controller per
> memory cycle.  If you're a 68020, you may even run a little from cache after
> the arbitration.  So this is much faster than possible without DMA.

This is true for a 68000 or 68010, and perhaps even for a 68020 or 68030 on
a 16-bit-wide bus.  However, for best performance you want to put the DMA
peripherals on one side of a dual-ported memory and let the CPU do the
data moving.  Why?  There are two reasons:

  1. Most DMA peripherals are incredibly sluggish.  An example is the
     LANCE, an Ethernet interface chip.  It transfers data in blocks of
     eight 16-bit words.  The *minimum* time to perform this transfer is
     4.8 microseconds, with no-wait-state memory.  Add arbitration time
     to this and it becomes more like 5.1 microseconds.  And if you can't
     complete a memory cycle in less than 105 nanoseconds, each cycle
     (remember, there are eight of them!) gets longer in 100-nanosecond
     steps.

     To keep up with the Ethernet (10 Mbit/second, so eight 16-bit
     words arrive every 12.8 microseconds), the LANCE will arbitrate
     for the bus about every 12.8 microseconds, tying it up for 5.1
     microseconds minimum.  That is 5.1/12.8, or about 40% of the bus
     bandwidth.

  2. On a 32-bit bus, the 68020 can move data very efficiently -- once the
     instructions have been loaded into the cache, the only thing on the
     bus will be (32-bit) data transfers.  Even with reasonably slow
     memory (180-nanosecond access, 300-nanosecond cycle time), this means
     that the 68020 can transfer data twice as fast as a LANCE running
     on 100-nanosecond access memory.
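
As a rough check on the arithmetic in items 1 and 2, here is a minimal
Python sketch.  The burst size, burst times, and 300-nanosecond cycle time
come from the figures above; the 10 Mbit/second Ethernet rate and the
"one read cycle plus one write cycle per 32-bit longword" copy model are
assumptions made for illustration, and every variable name is made up.

```python
# Back-of-envelope check of the LANCE and 68020 figures.
# The read-plus-write copy model for the 68020 is an assumption.

ETHERNET_BITS_PER_S = 10_000_000   # assumed: standard 10 Mbit/s Ethernet
WORDS_PER_BURST = 8                # LANCE burst length (from the post)
BITS_PER_WORD = 16
BURST_NS = 4800                    # minimum burst time (from the post)
BURST_WITH_ARB_NS = 5100           # burst plus arbitration (from the post)
CPU_CYCLE_NS = 300                 # 68020 memory cycle time (from the post)

# Item 1: how often the LANCE needs the bus, and for what share of the time.
burst_bits = WORDS_PER_BURST * BITS_PER_WORD
burst_interval_ns = burst_bits * 1e9 / ETHERNET_BITS_PER_S
bus_share = BURST_WITH_ARB_NS / burst_interval_ns

# Item 2: raw transfer rates.  Bits per nanosecond times 1000 is Mbit/s.
lance_mbit_s = burst_bits / BURST_NS * 1000
cpu_copy_ns = 2 * CPU_CYCLE_NS     # assumed: one read + one write per longword
cpu_mbit_s = 32 / cpu_copy_ns * 1000
speedup = cpu_mbit_s / lance_mbit_s

print(f"LANCE burst interval: {burst_interval_ns / 1000:.1f} us")
print(f"LANCE bus share:      {bus_share:.1%}")
print(f"LANCE transfer rate:  {lance_mbit_s:.1f} Mbit/s")
print(f"68020 copy rate:      {cpu_mbit_s:.1f} Mbit/s ({speedup:.1f}x)")
```

Under these assumptions the 68020 copy rate comes out a little above
50 Mbit/second, exactly twice the LANCE burst rate.  Loop overhead is
not modeled, so a sustained figure would be somewhat lower.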

If you dual-port the LANCE memory properly (32 bits wide to the 68020,
16 bits wide to the LANCE), you can move the data from the dual-ported
memory *while* the LANCE is transferring other data into it, thus
achieving an effective doubling of the transfer rate and freeing the
bus for other purposes the rest of the time.

The same thing applies to hard disks, too.  The 68020 can sustain a
48 Mbit/second transfer rate.  Typical hard disks run at 5 to 10
Mbit/second.  Unless the hard disk interface is fast as greased
lightning *and* 32 bits wide, the 68020 or 68030 can move the data
faster!
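
To put the disk comparison in numbers, here is a small Python sketch.
The 48 Mbit/second and 5 to 10 Mbit/second rates are the ones quoted
above; the 512-byte sector size and the function names are assumptions
chosen only for illustration.

```python
# How much of the 68020's copy capacity does a disk actually need?

CPU_RATE_MBIT_S = 48           # sustained 68020 rate (from the post)
DISK_RATES_MBIT_S = (5, 10)    # typical hard disk rates (from the post)
SECTOR_BYTES = 512             # assumed sector size, for illustration only


def cpu_share(disk_rate_mbit_s):
    """Fraction of the CPU's copy capacity needed to keep pace with the disk."""
    return disk_rate_mbit_s / CPU_RATE_MBIT_S


def sector_time_us(rate_mbit_s):
    """Microseconds to move one sector at the given rate (1 Mbit/s = 1 bit/us)."""
    return SECTOR_BYTES * 8 / rate_mbit_s


for rate in DISK_RATES_MBIT_S:
    print(f"{rate:2d} Mbit/s disk: sector arrives in {sector_time_us(rate):6.1f} us, "
          f"CPU copies it in {sector_time_us(CPU_RATE_MBIT_S):5.1f} us "
          f"({cpu_share(rate):.0%} of CPU copy capacity)")
```

On these figures the CPU needs only a tenth to a fifth of its copy
capacity to keep pace with the disk.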

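So, for maximum performance, hide your peripherals behind dual-ported
memory, and then mark those pages as "non-cacheable" so that a data
cache (as on the 68030) never hands the CPU a stale copy of a buffer
the peripheral has since rewritten.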

					Steve Rice

-----------------------------------------------------------------------------
* Every knee shall bow, and every tongue confess that Jesus Christ is Lord! *
new: stever@videovax.tv.Tek.com
old: {decvax | hplabs | ihnp4 | uw-beaver}!tektronix!videovax!stever