Path: utzoo!yunexus!ists!jarvis.csri.toronto.edu!mailrus!uunet!cbmvax!daveh
From: daveh@cbmvax.UUCP (Dave Haynie)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Hard disks, DMA vs Non-DMA
Message-ID: <8580@cbmvax.UUCP>
Date: 15 Nov 89 20:56:16 GMT
Article-I.D.: cbmvax.8580
References: <8911150430.AA24506@jade.berkeley.edu>
Organization: Commodore Technology, West Chester, PA
Lines: 94

in article <8911150430.AA24506@jade.berkeley.edu>, GORRIEDE@UREGINA1.BITNET (Dennis Robert Gorrie) says:

> The story goes, DMA is faster.

You first have to look at the problem you're trying to solve. The problem, in this case, is data transfer from a hard disk controller to the Amiga's main memory. Except for this transfer mechanism, there's nothing intrinsically different between DMA and non-DMA devices.

For the DMA transfer, a device of some kind requests the Amiga's bus and transfers a number of words of data to or from the Amiga's main memory. Once this transfer is complete, the device will probably have to involve the main CPU, at least to tell it that the transfer is done. But the transfer is very efficient, because the CPU isn't involved during the transfer (e.g., no interrupts, no need to push and pop stacks, etc.), and there's only one bus crossing per word transferred; data flows only between main memory and the DMA device.

For a non-DMA transfer, the CPU is involved to some degree or another. At the worst, it works like the Mac's hard disk interface, where the CPU is required to talk directly to a SCSI chip, and must basically sit and wait for each byte to be available. Much better is the GVP approach, where the SCSI device itself transfers a whole block (or possibly several blocks) into local memory. At this point, the CPU is called upon to transfer that data to or from this local memory. This transfer requires two bus crossings for each word; data flows between the main memory and the CPU, then between the CPU and the local memory (or vice versa).
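
As a rough illustration of the bus-crossing arithmetic above, here is a minimal sketch in Python. The function name, the 512-byte block size, and the 16-bit word width are assumptions made for the example, not figures taken from any particular controller.

```python
def bus_crossings(words: int, dma: bool) -> int:
    """Count the bus crossings needed to move `words` words.

    Hypothetical helper for illustration only.
    """
    # DMA: each word moves directly between the device and main memory,
    # so it crosses the bus once.
    # CPU copy from a local buffer: each word is read into the CPU and
    # then written out again, so it crosses the bus twice.
    per_word = 1 if dma else 2
    return words * per_word


# Assumed example size: one 512-byte disk block as 16-bit words.
BLOCK_WORDS = 512 // 2

print("DMA crossings:     ", bus_crossings(BLOCK_WORDS, dma=True))
print("CPU-copy crossings:", bus_crossings(BLOCK_WORDS, dma=False))
```

This counts crossings only. It says nothing about how long each crossing takes, which depends on what else is contending for the bus.
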
> But, as many people point out, it sometimes is slower than non-DMA,
> when there is contention for the bus. Case in point is the hi-res
> interlace screen situation where co-processors and your DMA hard disk
> device are contending for cycles on the coprocessor bus.

DMA to chip memory, unless you really need it, is a bad idea with any kind of controller, since you can be kept out of chip memory for an extended period of time.

In order to even start a transfer from the hard disk controller, you can't have the CPU waiting on chip memory, for either the DMA or non-DMA controller. Assuming DMA, the controller will request the bus from the CPU. The CPU can grant the bus right away, but the DMA device can't actually take over the bus until the CPU finishes its current cycle. When the CPU is waiting for chip bus access in a high-activity display mode, this can be a long wait.

For the non-DMA device, the CPU will get an interrupt signaling it's needed for a transfer. However, it can't service that interrupt until the current instruction is complete, which of course can't complete until the CPU has chip bus access.

So in either case, when the CPU's involved in a delayed access to the chip bus, you have to wait. As long as the actual transfer goes to fast memory, you'll only have this initial lag (or possibly a few of them if the transfer is done in several pieces, as it often is with FIFO based controllers), and you won't see too much DMA slowdown. If the transfer is into chip memory, you'll of course see a rather noticeable slowdown.

> Then someone says a 'proper' DMA device is faster, than non-DMA, even for the
> situation above. How is this so? What is a 'proper' DMA device?

The main problem you can get into in this situation is essentially a flow control problem. You have data coming from a hard disk which needs to get stuck into memory somewhere. You can have an undetermined length of time to wait for access to that memory.
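
To make the flow-control problem concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions: the drive delivers one word per tick, the bus is unavailable for a fixed number of ticks at the start, and all the numbers (FIFO depths, stall length, transfer size) are invented for illustration and do not describe any real controller.

```python
def transfer(total_words: int, fifo_depth: int,
             bus_stall_ticks: int, can_pause_source: bool) -> str:
    """Toy model of a disk-to-memory transfer through a FIFO.

    Hypothetical helper: returns "ok" if every word reaches memory,
    or "overrun" if a word arrives while the FIFO is full and the
    source cannot be held off.
    """
    fifo = 0      # words currently buffered
    sent = 0      # words delivered by the drive so far
    drained = 0   # words written to memory so far
    tick = 0
    while drained < total_words:
        # Source side: the drive tries to deliver one word per tick.
        if sent < total_words:
            if fifo < fifo_depth:
                fifo += 1
                sent += 1
            elif not can_pause_source:
                return "overrun"  # nowhere to put the word
            # Otherwise the source is simply held off for this tick.
        # Memory side: nothing drains until the bus becomes available.
        if tick >= bus_stall_ticks and fifo > 0:
            fifo -= 1
            drained += 1
        tick += 1
    return "ok"


# Assumed example: 256 words, bus unavailable for the first 64 ticks.
print(transfer(256, fifo_depth=16, bus_stall_ticks=64, can_pause_source=False))
print(transfer(256, fifo_depth=8, bus_stall_ticks=64, can_pause_source=True))
```

In this model, a FIFO that cannot hold off its source survives only if it is deeper than the longest stall, while one that can pause the source gets by with a much smaller buffer.
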
If the controller is capable of stopping the flow of data into the device based on its success at getting data out of the device, everything's cool. Some, like the GVP controller, do this by only dealing in whole disk blocks. Others, like the A2091, do this by starting and stopping the data transfer from the SCSI device itself.

The problem that's been seen is when a device, be it DMA or non-DMA, isn't capable of starting and stopping this data flow. The A2090 is an example of such a device, at least as supported by its current software. When it can't get access to the bus within a certain amount of time, its FIFO overruns, and it has to attempt the transfer all over again. If it could tell the SCSI device to stop sending as its FIFO fills up, there'd be no problem (in fact, the A2091 has a smaller FIFO but works much better, because it can start and stop the data flow).

I'm told that part of this problem is the A2090's support for ST-506. SCSI is a rather high-level protocol, with intelligent drives, and it can support things like start and stop. ST-506 is a low-level, dumb protocol that must transfer whole blocks in a fixed amount of time. You start a transfer from the disk into the FIFO, then start DMA out of the FIFO. If the DMA is held off too long, the FIFO overruns, and you have to start over again.

You have to take a look at the particular controller in question. Any modern review of a DMA controller should include its performance with a full bandwidth screen up (e.g., 640 across, 4 bitplanes, overscan if you like). Modern DMA controllers like the A2091 and the Microbotics HardFrame have no trouble with this situation.

> +-----------------------------------------------------------------------+
> |Dennis Gorrie                                  'Chain-Saw Tag...       |
> |GORRIEDE AT UREGINA1.BITNET                    Try It, You'll Like It!'|
> +-----------------------------------------------------------------------+

--
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough