Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!nrl-cmf!cmcl2!husc6!mit-eddie!uw-beaver!tektronix!tekcrl!tekfdi!videovax!stever
From: stever@videovax.Tek.COM (Steven E. Rice, P.E.)
Newsgroups: comp.sys.amiga
Subject: Re: 68030 Questions
Message-ID: <4822@videovax.Tek.COM>
Date: 4 Feb 88 19:04:54 GMT
References: <10170@ccicpg.UUCP> <3246@cbmvax.UUCP>
Reply-To: stever@videovax.Tek.COM (Steven E. Rice, P.E.)
Organization: Tektronix Television Systems, Beaverton, Oregon
Lines: 60
Summary: DMA is the *SLOW* way to go!

In article <3246@cbmvax.UUCP>, daveh@cbmvax.UUCP (Dave Haynie) writes:

[ discussion of, among other things, DMA to fast ram ]

> This happens all the time with things like hard disk drives. It sure does
> hurt the 68000's speed, but consider the alternative. You've got to get
> that disk data into memory somehow. If you make the 68000 go and read it
> from an I/O port somewhere, you're running several memory cycles per data
> transfer. I mean, instruction fetch, I/O fetch, instruction fetch, write to
> RAM, instruction fetch, test and branch, something like that. Once a DMA
> driven controller is set up (simple, nothing like setting up the blitter),
> you have a bus arbitration, then one word transferred by the controller per
> memory cycle. If you're a 68020, you may even run a little from cache after
> the arbitration. So this is much faster than possible without DMA.

This is true for a 68000 or 68010, and perhaps even for a 68020 or 68030
on a 16-bit-wide bus. However, for best performance you want to put the
DMA peripherals on one side of a dual-ported memory and let the CPU do
the data moving.

Why? The reasons are as follows:

  1. Most DMA peripherals are incredibly sluggish. An example is the
     LANCE, an Ethernet interface chip. It transfers data in blocks
     of eight 16-bit words. The *minimum* time to perform this
     transfer is 4.8 microseconds, with no-wait-state memory. Add
     arbitration time to this and it becomes more like 5.1
     microseconds. And if you can't complete a memory cycle in less
     than 105 nanoseconds, each cycle (remember, there are eight of
     them!) gets longer in 100-nanosecond steps.

     To keep up with the Ethernet, the LANCE will arbitrate for the
     bus about every 12.8 microseconds, tying it up for 5.1
     microseconds minimum. This is about 40% of the bus bandwidth.

  2. On a 32-bit bus, the 68020 can move data very efficiently --
     once the instructions have been loaded into the cache, the only
     thing on the bus will be (32-bit) data transfers. Even with
     reasonably slow memory (180-nanosecond access, 300-nanosecond
     cycle time), this means that the 68020 can transfer data twice
     as fast as a LANCE running on 100-nanosecond access memory.

     If you dual-port the LANCE memory properly (32 bits wide to the
     68020, 16 bits wide to the LANCE), you can move the data from
     the dual-ported memory *while* the LANCE is transferring other
     data into it, thus achieving an effective doubling of the
     transfer rate and freeing the bus for other purposes the rest
     of the time.

The same thing applies to hard disks, too. The 68020 can sustain a
48 Mbit/second transfer rate. Typical hard disks run at 5 to 10
Mbit/second rates. Unless the hard disk interface is fast as greased
lightning *and* 32 bits wide, the 68020 or 68030 can move the data
faster!

So, for maximum performance, hide your peripherals behind dual-ported
memory, and then mark those pages as "non-cacheable."

                                        Steve Rice

-----------------------------------------------------------------------------
* Every knee shall bow, and every tongue confess that Jesus Christ is Lord! *
new: stever@videovax.tv.Tek.com
old: {decvax | hplabs | ihnp4 | uw-beaver}!tektronix!videovax!stever
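
Addendum: the bandwidth figures in points 1 and 2 above can be checked
with a little arithmetic. The sketch below is only a back-of-the-envelope
model, not something from the original post. The burst size, burst time,
arbitration period, and 300-nanosecond cycle time are taken from the
text; the assumption that a CPU copy costs one read cycle plus one write
cycle per 32-bit longword (with the loop running entirely from cache) is
mine, as are all the names used.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Anything not marked "(post)" is an assumption made for illustration.

LANCE_BURST_BITS = 8 * 16     # eight 16-bit words per burst (post)
LANCE_XFER_US = 4.8           # minimum burst time, no wait states (post)
LANCE_HELD_US = 5.1           # burst time including arbitration (post)
LANCE_PERIOD_US = 12.8        # one burst about every 12.8 us (post)

CPU_CYCLE_US = 0.300          # 300 ns memory cycle time (post)
CPU_BITS_PER_MOVE = 32        # 32-bit bus (post)
CPU_CYCLES_PER_MOVE = 2       # ASSUMPTION: one read + one write per longword


def bus_share(held_us, period_us):
    """Fraction of total bus time the DMA device occupies."""
    return held_us / period_us


def mbit_per_s(bits, microseconds):
    """Bits per microsecond is numerically the same as Mbit/s."""
    return bits / microseconds


lance_share = bus_share(LANCE_HELD_US, LANCE_PERIOD_US)
lance_avg = mbit_per_s(LANCE_BURST_BITS, LANCE_PERIOD_US)
lance_burst = mbit_per_s(LANCE_BURST_BITS, LANCE_XFER_US)
cpu_copy = mbit_per_s(CPU_BITS_PER_MOVE, CPU_CYCLES_PER_MOVE * CPU_CYCLE_US)

print(f"LANCE share of bus time : {lance_share:.1%}")
print(f"LANCE average data rate : {lance_avg:.1f} Mbit/s")
print(f"LANCE burst data rate   : {lance_burst:.1f} Mbit/s")
print(f"CPU copy rate (assumed) : {cpu_copy:.1f} Mbit/s")
print(f"CPU copy / LANCE burst  : {cpu_copy / lance_burst:.2f}")
```

Under these assumptions the LANCE share works out to roughly 40% of the
bus, its average rate matches the 10 Mbit/second Ethernet wire speed, and
the modelled CPU copy rate is about twice the LANCE burst rate, in line
with the "twice as fast" claim. The modelled copy rate is somewhat above
the 48 Mbit/second sustained figure quoted for the 68020; the post does
not say how that figure was derived, so the gap should not be read as a
correction.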