Xref: utzoo comp.sys.amiga:17090 comp.sys.amiga.tech:160
Path: utzoo!mnetor!uunet!tektronix!tekcrl!tekfdi!videovax!stever
From: stever@videovax.Tek.COM (Steven E. Rice, P.E.)
Newsgroups: comp.sys.amiga,comp.sys.amiga.tech
Subject: Re: 68030 Questions
Message-ID: <4937@videovax.Tek.COM>
Date: 31 Mar 88 17:39:33 GMT
References: <4890@videovax.Tek.COM> <3507@cbmvax.UUCP>
Reply-To: stever@videovax.Tek.COM (Steven E. Rice, P.E.)
Organization: Tektronix Television Systems, Beaverton, Oregon
Lines: 131
Summary: We're sort of edging toward agreement. . .

In article <3507@cbmvax.UUCP>, Dave Haynie (daveh@cbmvax.UUCP) writes:
> in article <4890@videovax.Tek.COM>, stever@videovax.Tek.COM (Steven E. Rice, P.E.) says:
>>
>> Dave Haynie's (daveh@cbmvax) most recent article was number
>> <3394@cbmvax.UUCP>. In it, he cast aspersions on the poor, struggling
>> LANCE and suggested that real systems do 32-bit DMA. Well, maybe --
>> but if you want to use Ethernet, the LANCE is about the only way to
>> go, slow or no!
>
> Calm down! That's not what I said. I said that in very high
> bandwidth-consuming operations, such as hard disk interfacing, where the
> transfer between an I/O device and CPU addressable main memory can be sent
> in large atoms, is best served by DMA, even in a 68020 or 68030 system. I
> also said that in systems where transfers must occur in small atoms or at
> relatively slow speed (like perhaps networks or things which must be
> highly interactive), the I/O scheme to shared CPU memory was a good idea.

I think there is still some misunderstanding here. When I mention
dual-ported memories, I am speaking of memory that is "CPU addressable
main memory"! It just happens to also be shared (on a cycle-by-cycle
basis) with some other device, which could be an I/O device or another
CPU.

The Amiga implements a form of "shared" memory -- chip memory. The CPU
gets access to chip memory on a shared basis, arbitrated cycle by cycle.
Another form of "shared" memory is seen on the A2620 (?) card -- the
68020 CPU. The 68020 will have 2 or 4 megabytes of 32-bit wide memory
which no one can deny it access to. Thus, if DMA is occurring to "main"
memory, the 68020 may not be blocked at all.

Carrying the idea one step further simply removes more limitations from
the system, giving the CPU unrestricted access to the system bus and
immediate access to any memory that is not in use during that memory
cycle.

>> In a perfect world, 32-bit DMA with a 512-byte assembly buffer and
>> fast-as-a-speeding-bullet burst transfers would be possible. In real
>> life, we have to make do with what we can buy. (Commodore can build
>> what it needs; the economics in the Television Test and Measurement
>> market are different than those in the personal computer market.)
>
> That's true, Commodore can build what it needs for those cases. The 16 bit
> wide DMA driven hard disk controller on the 16 bit bus delivers around 625K
> bytes/second with the Fast FileSystem. Fast FileSystem allows DMA from the
> hard disk directly to the target memory, not intermediate buffers used. I
> believe that any peripheral going this fast wants DMA. It's fully extensible
> to a 32 bit machine, though a _conservative_ 32 bit machine rates that's
> 2.5 megabytes/second thoughput (not even getting to things like burst
> transfers, which are ideally suited to DMA transfers). If you're LAN is only
> going 2.5 megabits/sec, that's certainly overkill and extra cost.

Ethernet is 10 megabits/sec.

> Which seems to make sense even today; most Amiga hard drives are DMA driven,
> most Amiga LANs are CPU driven via shared RAM DMA.

In the case of Ethernet I/O, transmissions are packetized with quite a
bit of protocol overhead. Thus, the data to be transmitted must be broken
into chunks no larger than the largest legitimate packet and shipped out
one packet at a time.
To do this, the CPU is going to have to move the data anyway -- it has
to configure it in a form the I/O device can use. In this case, the copy
from what you might consider "main" memory to "shared" memory is free.

Starting with the FFS rate of 625K bytes/second and doubling that for a
32-bit bus gives 1.25 megabytes/second. This translates to a 10
megabit/second transfer rate, which is the same as the Ethernet. Using
your figure of 2.5 megabytes per second gives 20 megabits/second
throughput.

But our CPU bus bandwidth is about 100 megabits/second (approximately
330 nsec main memory cycle time [not *access* time -- *cycle* time]).
Thus, a 2.5 megabyte/second disk transfer would occupy only 20% of the
bus bandwidth. If the disk DMA is transferring into unshared main memory,
the CPU will just have to wait.

At 2.5 megabytes/second (assuming 32-bit transfers), the disk will
request one memory access every 1.6 microseconds. One possibility is to
arbitrate for the bus for each transfer. Looking at the timing diagrams
in the Motorola 68020 manual, one finds that there is a minimum of 1/2
clock period and a maximum of 1 clock period from the end of clock state
S5 until Bus Grant* is asserted. There is also a note in paragraph
5.2.7.4 which says that "all asynchronous inputs to the MC68020 are
internally synchronized in a maximum of two cycles of the system clock."
This implies that the minimum to resume processing is 1 clock cycle.
There is probably one additional cycle needed for the CPU to resume
driving the address and data lines.

Assuming a memory cycle time of 330 ns (which is what ours is) with
240 ns read or write access time, each 32-bit word transferred would hold
the CPU bus for one arbitration time (1/2 to 1 clock cycles, or 30 to
60 ns in a 16.7 MHz system) plus one transfer time (240 ns) plus one bus
relinquishment time (1 to 2 clock cycles, or 60 to 120 ns) plus one
driver turnon time (1 clock cycle, or 60 ns).
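
For anyone who wants to check the arithmetic, the per-transfer budget described above can be tallied in a few lines of Python. This is only a sketch under the stated figures: the names (CLOCK_NS, bus_hold_ns, and so on) are made up for illustration, and the 60 ns clock period is the rounded value for a 16.7 MHz clock used in the text, not an exact one.

```python
# Sketch of the per-transfer bus budget, using the figures quoted in the text.
# All names here are illustrative; none come from a manual or a library.

CLOCK_NS = 60            # assumption: 16.7 MHz clock rounded to 60 ns per cycle
ACCESS_NS = 240          # read or write access time for one 32-bit word
CYCLE_NS = 330           # full memory cycle time (not access time)
WORD_BITS = 32
WORD_BYTES = 4
DISK_BYTES_PER_US = 2.5  # 2.5 megabytes/second is 2.5 bytes per microsecond


def bus_hold_ns(arbitration_clocks, relinquish_clocks, turn_on_clocks=1):
    """Time the CPU is kept off the bus for one 32-bit DMA transfer, in ns."""
    return (arbitration_clocks * CLOCK_NS
            + ACCESS_NS
            + relinquish_clocks * CLOCK_NS
            + turn_on_clocks * CLOCK_NS)


# Raw bus bandwidth: one 32-bit word per memory cycle, in megabits/second.
bus_mbits = WORD_BITS / CYCLE_NS * 1000

# Disk rate in megabits/second, and the interval between word requests in ns.
disk_mbits = DISK_BYTES_PER_US * 8
request_interval_ns = WORD_BYTES * 1000 / DISK_BYTES_PER_US

best = bus_hold_ns(0.5, 1)    # 1/2 clock to arbitrate, 1 clock to relinquish
worst = bus_hold_ns(1, 2)     # 1 clock to arbitrate, 2 clocks to relinquish
mean = (best + worst) / 2
cpu_share_lost = mean / request_interval_ns

if __name__ == "__main__":
    print(f"bus bandwidth   : {bus_mbits:.1f} Mbit/s")
    print(f"disk rate       : {disk_mbits:.1f} Mbit/s")
    print(f"request interval: {request_interval_ns:.0f} ns")
    print(f"bus hold        : {best:.0f} to {worst:.0f} ns, mean {mean:.0f} ns")
    print(f"CPU denied bus  : {cpu_share_lost:.1%} of the time")
```

Changing DISK_BYTES_PER_US or CLOCK_NS shows how quickly the CPU's loss grows with a faster disk or a slower clock.
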
The minimum time required would be 390 ns, the maximum time would be
480 ns, and the mean time would be 435 ns. 435 ns out of 1.6 us is 27.2%
of the bus bandwidth occupied. But not only is 27.2% of the bus bandwidth
occupied, the CPU is denied the bus 27.2% of the time! This translates
directly into throughput reduction.

Another possibility is to block the data into (e.g.) 512 byte blocks and
then arbitrate for the bus once per block. This drops the bus bandwidth
occupation to 20% (since one arbitration is insignificant compared to the
time to transfer 512 bytes as 128 32-bit words). But the CPU is still
denied the bus 20% of the time.

If, however, the disk data is DMAed into dual-ported memory, it can deny
an access to the CPU a *maximum* of 20% of the time, and then only if the
CPU is fetching all of its instructions from the shared memory! In actual
operation, it is likely to be much less than that. There is also no
reason the receiving process cannot use the data directly from the
dual-ported memory, although in many cases there will be at least one
copy between initial transfer and use of the data.

>> There is another thought, too -- if you have only one DMA device, you
>> could argue that it shouldn't make much difference if it DMAs into
>> system RAM or into a dual-ported buffer. If you have more than one
>> device contending for the system bus, however, multiple dual-ported
>> buffers are a clear win.
>
> Not unless you have multiple CPUs to read them.

Given just a single hard disk transfer as you have described it, DMA into
a dual-port buffer avoids losing 20% of the CPU's processing capability.
That seems worthwhile to me!

					Steve Rice

-----------------------------------------------------------------------------
* Every knee shall bow, and every tongue confess that Jesus Christ is Lord! *
new: stever@videovax.tv.Tek.com
old: {decvax | hplabs | ihnp4 | uw-beaver}!tektronix!videovax!stever