Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!rutgers!cbmvax!daveh
From: daveh@cbmvax.commodore.com (Dave Haynie)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: (Video) Hardware Idiots ?
Message-ID: <22459@cbmvax.commodore.com>
Date: 14 Jun 91 16:28:15 GMT
References: <22368@cbmvax.commodore.com> <1991Jun12.232718.2373@mintaka.lcs.mit.edu> <22392@cbmvax.commodore.com> <1991Jun13.175041.15679@cs.mcgill.ca>
Reply-To: daveh@cbmvax.commodore.com (Dave Haynie)
Organization: Commodore, West Chester, PA
Lines: 68

In article <1991Jun13.175041.15679@cs.mcgill.ca> tinyguy@cs.mcgill.ca (Yeo-Hoon BAE) writes:
>In article <22392@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>>In article <1991Jun12.232718.2373@mintaka.lcs.mit.edu> rjc@geech.gnu.ai.mit.edu (Ray Cromwell) writes:
>>>In article <22368@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:

>>> So the max transfer rate, with 0 wait states is 10mb/sec (theoretical) on
>>>the fast ram bus at @25mhz.

>>Well, that's to fast RAM, which actually does have wait states. You can't run
>>zero wait states on a 25MHz 68030 using DRAM, at least not without extra magic.

> This is what I was puzzled for a while(still am). Why does 030 needs
> MUCH faster DRAM to get it's 0 state, compare to 386? I know that on
> 386(at least from it's manual), you can get a 0 w/s by using 80ns
> interleaved memories. This was at 20MHz, but it seems that 20MHz 030
> won't go even close to achieving that! Am I mistaken? Are the 386 board
> manufacturers not telling the truth?

I don't know the '386. I know the '030 real well. At 20MHz, a 0 wait state
68030 cycle happens in 100ns. That's two clock cycles, 50ns each. So,
without getting deeply into a DRAM design, it's plain from the fact that
100ns DRAMs cycle in 190ns and 80ns DRAMs cycle in 160ns that you can't
build a 0 wait state memory subsystem out of either part for a 68030
processor.
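
The arithmetic behind that can be sketched in a few lines of Python. This is
only a rough model: the function names are made up for illustration, and the
wait state estimate assumes nothing but raw DRAM cycle time matters (no
interleaving, no page mode, no setup or decode overhead), so treat it as a
naive lower bound rather than a real design figure.

```python
import math

def bus_cycle_ns(clock_mhz, clocks_per_cycle):
    """Length of one CPU bus cycle in nanoseconds."""
    return clocks_per_cycle * 1000.0 / clock_mhz

def needs_wait_states(clock_mhz, clocks_per_cycle, dram_cycle_ns):
    """True if the DRAM cannot complete a full cycle inside one bus cycle."""
    return dram_cycle_ns > bus_cycle_ns(clock_mhz, clocks_per_cycle)

def naive_min_wait_states(clock_mhz, clocks_per_cycle, dram_cycle_ns):
    """Whole extra clocks needed to cover the DRAM cycle time (assumption:
    cycle time is the only constraint, which real designs do not get)."""
    shortfall = dram_cycle_ns - bus_cycle_ns(clock_mhz, clocks_per_cycle)
    clock_period_ns = 1000.0 / clock_mhz
    return max(0, math.ceil(shortfall / clock_period_ns))

# DRAM cycle times quoted in the post, keyed by rated access time.
DRAM_CYCLE_NS = {"100ns": 190, "80ns": 160}

# 20MHz 68030: a 0 wait state cycle is two clocks.
for part, cycle_ns in DRAM_CYCLE_NS.items():
    print(part, "part:",
          "bus cycle", bus_cycle_ns(20, 2), "ns,",
          "DRAM cycle", cycle_ns, "ns,",
          "needs wait states:", needs_wait_states(20, 2, cycle_ns))
```

Both parts miss the 100ns budget by a wide margin, which is the whole point.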

Now, assume we had a 20MHz 68000 (they don't exist, let's pretend). A 20MHz
68000's minimal cycle would be four clocks, 200ns total. In this case, you
might be able to build a 0 wait state memory system with 100ns parts,
definitely with 80ns parts.

Using bank interleaving, where you have two separate DRAM control systems,
one for odd words, one for even words, you can build a memory system that
runs at access time, rather than cycle time, as long as the CPU accesses
alternate banks. In this case, your 20MHz 68030 is still too fast for 0
wait state operation with 100ns DRAMs, but you'd need more work before
deciding if you could live with 80ns parts (you can't, actually; the 20MHz
68030 has a worst case access time requirement of just under 50ns without
wait states).

Other tricks, such as using page mode DRAM and a page detection circuit,
can make it look like you're close to 0 wait states. The fast page access
time on most 80ns DRAM is around 35ns-40ns, which would be no real problem
for a 20MHz 68030, no wait states. So I build my page detection circuit.
It runs at two clocks for on-page hits, five clocks when a page is open and
we miss, and four clocks to open a new page. If you always hit, you're
running 0 wait states. If you always miss, you're running with 3 wait
states.

Now, if I were in the Marketing Department, I would certainly assume that
the customer is only going to run code that's mainly on-page, and therefore
claim that the system is generally 0 wait state. I could even have
benchmarks constructed that prove this. However, there's absolutely no
certainty that, when running real code, you'll be anywhere near 0 wait
states.

Consider even a basic program, which has code, stack, and data in different
places. One single instruction could be fetched from one page, access the
stack on a different page, and then write out data to a third page. If that
actually happened, this system would take 15 clocks for the three accesses.
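
The 15 clock figure falls straight out of the timings above. Here's a small
Python sketch of that page detection circuit as a cost model. It assumes a
page stays open after every access and that the four clock "new page" cost
only applies when nothing is open; the names, and the flat 4 clock
comparison in the last line, are made up for illustration and are not taken
from any real controller.

```python
from fractions import Fraction

# Clock counts from the post for the hypothetical page detection circuit.
HIT = 2        # access lands on the currently open page
MISS_OPEN = 5  # a page is open, but the access is to a different one
NEW_PAGE = 4   # no page is open yet

def page_mode_clocks(pages, page_open=None):
    """Total clocks for a sequence of accesses, given as page identifiers."""
    total = 0
    open_page = page_open
    for page in pages:
        if open_page is None:
            total += NEW_PAGE
        elif page == open_page:
            total += HIT
        else:
            total += MISS_OPEN
        open_page = page  # assumption: the accessed page is left open
    return total

def avg_clocks(hit_rate):
    """Average clocks per access once a page is always open."""
    return HIT * hit_rate + MISS_OPEN * (1 - hit_rate)

# Code fetch, stack access, data write, each on a different page.
print(page_mode_clocks(["code", "stack", "data"], page_open="other"))
# Same three accesses if they all hit the open page.
print(page_mode_clocks(["code", "code", "code"], page_open="code"))
# Hit rate at which the average equals a flat 4 clocks per access.
print(avg_clocks(Fraction(1, 3)))
```

Under those assumptions the circuit only comes out ahead of a flat 4 clock
access when better than one access in three hits the open page.
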
A dumb, non-page-mode memory system using the same CPU and memory would
take 12.

We actually have page detection logic on the A3000; it works with static
column memory. You get 5 clocks for a random access, 3 for a page hit, and
7 for a page miss. We didn't find any speed improvement with it running
real programs. Some benchmarks make it look like an improvement, some like
a degradation. At present, there's a bug that makes it unusable in
conjunction with hard disk DMA, but we had pretty much dismissed it before
we found the bug, at least running typical AmigaOS code (we would certainly
fix it, anyway, should the opportunity arise).

-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
        "This is my mistake. Let me make it good." -R.E.M.