Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!cbmvax!daveh
From: daveh@cbmvax.commodore.com (Dave Haynie)
Newsgroups: comp.sys.amiga.hardware
Subject: Re: GVP Trade-in
Message-ID: <14192@cbmvax.commodore.com>
Date: 4 Sep 90 20:34:28 GMT
References: <589@oregon.oacis.org> <38CP09P@dri.com> <02048.002057@thiger.UUCP>
Reply-To: daveh@cbmvax (Dave Haynie)
Organization: Commodore, West Chester, PA
Lines: 161

In article <02048.002057@thiger.UUCP> skraw@thiger.UUCP (Stephan von Krawczynski) writes:

>>> True DMA (ala the A2090) is VERY important to me. GVP claims
>>> this board is. Anybody have one that can comment?

>1. GVP does not DMA to amiga-memory.

That's true, at least with the original GVP board. I don't know the
details of the new one.

>2. why is DMA so important for you? it is generally slower than the
>processor-method (lets call it this way).

That's ABSOLUTELY INCORRECT, though it's a misconception you sometimes
find among the uninformed.

You have to understand the problem you're trying to solve to get a true
picture of what's happening. And that problem is: how can you
efficiently transfer data from the SCSI chip into system memory?

The simplest approach would, of course, be to have the CPU wait on the
SCSI chip and copy over every single byte as it's available. This is
basically what the pre-IIfx Macintoshes all do; they pretend the SCSI
chip is slow, 8-bit wide memory, and wait for each byte to become
available in a tight CPU copy loop. This is a losing proposition from
the start, however. The most common form of SCSI transfer, asynchronous
SCSI, runs at up to 1.5 MB/s (Megabytes per second). Your A2000 bus runs
at about 3.5 MB/s. If you run it at only 8 bits/transfer, that's cut
down to 1.75 MB/s. However, using the CPU to do the copying, at best,
you need two byte reads and one word write to get a word from the SCSI
chip into memory.
That's a maximum speed of 1.17 MB/s for the transfer (neglecting any
overhead from the transfer loop, which will be non-zero), and during
that transfer, the CPU gets to do NOTHING but copy the data from the
SCSI chip. This can't even keep up with SCSI, so we throw it out for
anything but the cheapest controllers (the original TrumpCard and the
original C Ltd controllers worked this way).

The next approach would be to funnel two SCSI bytes into one word and do
the same wait-copy approach. Each word now costs one word read and one
word write, which yields a maximum transfer rate of 1.75 MB/s; that
will keep up with asynchronous SCSI at its fastest. However, this
wait-copy approach has severe drawbacks. It gets the data into memory
extremely fast, since nothing but the copy can happen until the data is
in memory. But you may actually be WAITING for the data from the SCSI
drive, for seeks or other times at which you're not getting it in at
full speed. This kind of scheme may APPEAR to be the fastest transfer,
since in using polled I/O instead of interrupts, there's never any lag
at the end of the transfer, but you sacrifice your SYSTEM speed for hard
disk speed. Overall, things will be slower, since you actually end up
wasting time in wait states, waiting on the hard disk. Single tasking
systems like the Macintosh or PC might take this OK, since they have
nothing else to do anyway, but it's no good for an Amiga. C Ltd's Kronos
and Supra's WordSync both funnel SCSI into the 16 bit data path, though
I don't know if they hog the bus as described or not.

The third approach is to add a buffer to your CPU copy approach. In this
system, you have the SCSI chip itself conduct a DMA-like transfer into
some private controller memory (many SCSI chips provide a counter output
to make the hardware for this easy). Once the transfer is complete, the
controller interrupts the CPU, which does a fast memory-to-memory copy
of the acquired data block, all at once.
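The bus arithmetic behind the CPU-copy figures above can be checked with
a few lines of Python. This is only a back-of-envelope sketch: the
3.5 MB/s and 1.5 MB/s figures are the ones quoted in this post, while the
names (A2000_BUS_MB_S, copy_rate_mb_s, and so on) are made up for the
illustration, and loop overhead is ignored just as it is in the text.

```python
# Back-of-envelope bus arithmetic for the CPU-copy schemes described above.
# Figures come from the post; all names here are hypothetical.

A2000_BUS_MB_S = 3.5     # peak 16-bit A2000 bus rate quoted in the post
BYTES_PER_WORD = 2       # one 16-bit word moves per full-width bus cycle
ASYNC_SCSI_MB_S = 1.5    # top asynchronous SCSI rate quoted in the post


def copy_rate_mb_s(bus_cycles_per_word: int) -> float:
    """Peak payload rate (MB/s) when each 16-bit word of data costs the
    given number of bus cycles. Transfer-loop overhead is neglected."""
    mega_cycles_per_second = A2000_BUS_MB_S / BYTES_PER_WORD
    words_per_second = mega_cycles_per_second / bus_cycles_per_word
    return words_per_second * BYTES_PER_WORD


# 8-bit SCSI port: two byte reads plus one word write = 3 cycles per word.
byte_wide = copy_rate_mb_s(3)

# 16-bit funnel: one word read plus one word write = 2 cycles per word.
# A buffered memory-to-memory copy costs the same two cycles per word.
word_wide = copy_rate_mb_s(2)

for label, rate in (("byte-wide CPU copy", byte_wide),
                    ("word-wide CPU copy", word_wide)):
    verdict = "keeps up with" if rate >= ASYNC_SCSI_MB_S else "falls behind"
    print(f"{label}: {rate:.2f} MB/s, {verdict} async SCSI")
```

Running it reproduces the 1.17 MB/s and 1.75 MB/s figures, and shows that
only the word-wide copy stays ahead of asynchronous SCSI at 1.5 MB/s.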
With this buffered approach, the transfer speed is still the same
1.75 MB/s as in the previous method; however, it always occurs at full
speed. The perceived disk speed may be a bit slower, since the actual
transfer doesn't start until the block is fully read, but the overall
SYSTEM goes faster, since no time is wasted in wait states. The GVP
controller does this.

The final method is true DMA. While DMA controllers can use a full block
buffering method, most use some kind of FIFO, which tends to be more
efficient with DMA. The DMA controller can transfer data at the full
3.5 MB/s, although asynchronous SCSI can only manage 1.5 MB/s. The CPU
will set up the DMA controller with a destination for any number of SCSI
blocks, and then the DMA controller takes over. When the FIFO is near
full, it requests the bus, transfers data at full speed, and then gives
the bus back once the FIFO empties, until the FIFO is near full again.
The actual data gets into memory not much differently than the buffered
CPU copy approach (e.g., no wait states), but for the same amount of
data transferred, the DMA device uses 1/2 the bus time. The other 1/2 is
available to the CPU, so CPU work actually gets done during the
transfer, even if waiting is required. The A2090[a], A2091, A3000, and
Microbotics Hardframe work this way (the A3000 DMA controller, by the
way, runs at around 20 MB/s on a 25MHz A3000).

>you win nothing because you have a whole lot of DMA going on already inside
>the system

The other DMA in the system is a completely different kind of DMA. Hard
disk controllers run on the Fast/Expansion bus just like the CPU, while
"Amiga" DMA is this special slot-allocated bus sharing that only takes
place on the Chip bus. The two are unrelated; in fact, as far as the
Fast or Chip bus is concerned, there's no difference between CPU access
and access by a DMA driven expansion device such as a hard disk
controller.

>and processor's running into heavy troubles sometimes, e.g.
>harddisk-performance is very low while using overscan-graphics (just to
>mention an example).

The only case in which overscan-graphics cause a problem is in the case
of the 2090[a] controllers. And this has nothing to do with DMA. The
effect of overscan with many bitplanes on is to tie up the chip bus for
long periods of time. If the CPU is trying to access Chip memory at this
time, it gets wait stated and can do nothing until a retrace comes
along. DMA or non-DMA, you have the same problem here -- getting the CPU
away from waiting on Chip RAM, either by interrupt in the non-DMA case
or by bus request in the DMA case. While the hard disk controller is
waiting for CPU/bus access, there is still SCSI activity coming in.
Unless your controller is fully buffering, as in my third case, it must
tell SCSI to stop sending data.

The problem with the A2090[a] is that it didn't know how to do this. At
least part of that problem was due to its support of ST-506. At the
time, ST-506 was the primary interface; SCSI was simply added because it
was easy. ST-506 is slower than SCSI, and being a dumb interface, can't
be stopped. So for whatever reasons, the A2090[a] controllers didn't
know how to tell SCSI to stop sending when they couldn't get the bus
fast enough, so they don't work well with heavy chip bus activity.

This IS NOT a general problem with DMA! The A2091, A3000, Hardframe, and
most likely any other DMA driven controller will work as well with
overscan as any non-DMA device, if not better. The manufacturers of
non-DMA devices often try to mislead you by claiming "DMA problems" and
implying they pertain to all DMA driven controllers, not just the
A2090[a] (which, by the way, hasn't been made for a while).

>well, how about "transfer rates up to 4MB/SEC synchronous" (gvp). in fact
>i have never understood this one. what do they mean? 4MBytes/sec?

As I mentioned previously, asynchronous-mode SCSI transfers run at about
1.5 MB/s, tops.
All SCSI devices out there run in asynchronous mode, and many don't
handle synchronous mode. That's one of the reasons that the non-DMA
controllers have managed, so far, to appear as fast or, parasitically,
faster, than the DMA controllers -- as long as the SCSI transfers aren't
faster than your transfer mechanism can handle, the speed of SCSI is the
limiting factor. In synchronous mode, the SCSI bus uses a clock to
coordinate the transfer, yielding potential transfers of 2 MB/s to
5 MB/s, depending on the clock. There's also a fast synchronous mode as
part of the SCSI-2 spec which has a top speed of 10 MB/s.

Synchronous won't make much difference in most single drive situations,
anyway, since the raw speed of data coming off the disk is still around
1.25-1.5 MB/s, at best. But if you have multiple devices on the SCSI
bus, and when faster devices are available, controllers that don't
saturate the Amiga at 1.5 MB/s (e.g., DMA controllers) will go
noticeably faster than non-DMA controllers.

>i have never seen a controller/hd-combination reaching this.

Like I said above, you probably won't. Just yet. And the raw SCSI
transfer speed is only part of the equation -- your interrupt lag,
device driver efficiency, system load, etc. all factor into effective
disk speed.

>4MBits/sec = 512kBytes/sec. seems to be more like the truth, but is an
>absolutely ridiculous value

No, it's 4 MegaBYTES per Second. That's trivial compared to the speed of
the A3000's main bus or Zorro III bus, though not bad for the kind of
peripheral bus SCSI is supposed to be. As I mentioned, the drives
themselves are still catching up to this, and most A2000-class SCSI
controllers are caught up short. The A3000 can hit around 1.2 MB/s
through the filesystem with an asynchronous SCSI device, since its fast
DMA and fast bus arbitration are practically invisible when compared to
the speed of SCSI.
And keep in mind, while that DMA is happening, you're only using about
7.5% of the 3000's main bus bandwidth (a full speed synchronous SCSI
could take 25% if it could be sustained).

>>Jimmy Liberato   liberat@dri.com
>stephan von krawczynski

--
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	Get that coffee outta my face, put a Margarita in its place!