Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!brutus.cs.uiuc.edu!apple!voder!pyramid!cbmvax!daveh
From: daveh@cbmvax.commodore.com (Dave Haynie)
Newsgroups: comp.arch
Subject: Re: "zero wait states"
Message-ID: <10772@cbmvax.commodore.com>
Date: 11 Apr 90 02:27:26 GMT
References: <1990Mar30.222138.13886@jarvis.csri.toronto.edu> <719@optis31.UUCP> <10758@portia.Stanford.EDU>
Reply-To: daveh@cbmvax (Dave Haynie)
Organization: Commodore, West Chester, PA
Lines: 77

In article <10758@portia.Stanford.EDU> dhinds@portia.Stanford.EDU (David Hinds) writes:
>In article <719@optis31.UUCP>, zepf@optis31.UUCP (Tom Zepf) writes:
>> What PC advertisements mean by "zero wait states" is really "zero IBM-XT
>> wait states". If you do a few simple calculations, you can see that
>> 80-100 ns. DRAMS don't stand a chance of producing REAL zero wait state
>> performance on anything like a 286 or 386. In fact, it is not clear to
>> me that the cached memory PCs run with REAL zero wait states either!

I don't know if the basis is really the PC-XT; that sounds silly. But it does make a lot of sense that these machines aren't running anything close to zero-wait-state memory. First of all, if you had real 0WS memory, you could throw out the cache on any PC; the only other advantage of a cache would be in a multiprocessing system.

>So, a 16MHz processor has a cycle time of 62.5 ns, giving 125 ns for a
>read access with no wait states. However, I think a "x ns" DRAM takes
>"2x ns" for an access anyway, because the address lines are multiplexed
>and are strobed on two successive clock cycles.

Well, a DRAM has a cycle time roughly twice its row access time (which is what the "100ns" on a 100ns DRAM really means), but that's not because the addresses are multiplexed; it's because of another parameter called "precharge time".
Simply put, you get data out of your 100ns DRAM 100ns after you strobe in the row address (assuming the column timing is all handled correctly). Before you can strobe in the next address, you have to wait about another 80ns. That's what the memory requires. Then reality sets in, and you have to figure out how to design a system that gets addresses out and multiplexed fast enough.

>With interleaving, one wait state is hidden by address pipelining.

Interleaving can hide the precharge time, which probably corresponds to one wait state or more, depending on the system and the memory. But interleaving isn't perfect; it only goes faster if your accesses always alternate between banks. In the worst case, it is no faster than non-interleaved memory. And it requires twice the number of devices and support logic. Marketroids may claim it's zero-wait-state memory, and it may even look that way sometimes, but it really isn't.

Another cool way to get faster access from plain old memories is to use page-mode or static-column parts. A 100ns or 80ns DRAM generally has a column access time of 30ns-50ns. Static-column parts really make life simple, because they eliminate the need to generate a column address strobe at a critical point. The added complexity is, of course, in the support logic that detects a page hit (old row address == new row address). This can go much faster than a simple interleaved memory system, probably cutting normal access time by a few wait states, and it doesn't require any additional devices. The downside is that a page miss runs slower than the basic memory cycle, since the memory controller has to fit a precharge cycle into the CPU access time rather than the CPU recycle time.

> On the subject of RAM chip specifications, there is something funny
>about quoting RAM speeds in multiples of 10 ns, with processor speeds in
>integral MHz numbers. For example, I know that 100 ns memory is fast
>enough for 16MHz with 0 wait states when interleaved.
But this actually >requires like 94 ns memory, by my calculations. Is there some implicit >tolerance in the RAM chip speeds, like are they always rounded up to the >nearest 10 ns? The memory speeds are rather arbitrary. The DRAM rating really doesn't tell you that much anyway. In fact, you can find parts with the row address time the same, but other vital numbers considerably different. And the match between memory and any particular CPU depends on all kinds of factors -- CPU speed, CPU memory interface, special CPU access modes (eg, like "Burst" mode), wait state granularity, memory subsystem architecture, etc. > -David Hinds > dhinds@popserver.stanford.edu -- Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests" {uunet|pyramid|rutgers}!cbmvax!daveh PLINK: hazy BIX: hazy Too much of everything is just enough