Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!brutus.cs.uiuc.edu!apple!voder!pyramid!cbmvax!daveh
From: daveh@cbmvax.commodore.com (Dave Haynie)
Newsgroups: comp.arch
Subject: Re: "zero wait states"
Message-ID: <10772@cbmvax.commodore.com>
Date: 11 Apr 90 02:27:26 GMT
References: <1990Mar30.222138.13886@jarvis.csri.toronto.edu> <719@optis31.UUCP> <10758@portia.Stanford.EDU>
Reply-To: daveh@cbmvax (Dave Haynie)
Organization: Commodore, West Chester, PA
Lines: 77

In article <10758@portia.Stanford.EDU> dhinds@portia.Stanford.EDU (David Hinds) writes:
>In article <719@optis31.UUCP>, zepf@optis31.UUCP (Tom Zepf) writes:

>> What PC advertisements mean by "zero wait states" is really "zero IBM-XT
>> wait states". If you do a few simple calculations, you can see that
>> 80-100 ns. DRAMS don't stand a chance of producing REAL zero wait state
>> performance on anything like a 286 or 386. In fact, it is not clear to
>> me that the cached memory PCs run with REAL zero wait states either!

I don't know if the basis is really the PC-XT; that sounds silly.  But it
does make lots of sense that these machines aren't running anything close
to 0 wait state memory.  First of all, if you had real 0WS memory, you
could throw out any cache on a PC machine; the only other advantage of a
cache would be in a multiprocessing system.

>So, a 16MHz processor has a cycle time of 62.5 ns, giving 125 ns for a 
>read access with no wait states.  However, I think a "x ns" DRAM takes 
>"2x ns" for an access anyway, because the address lines are multiplexed 
>and are strobed on two successive clock cycles.  

Well, a DRAM has a cycle time roughly twice that of its row access time
(what the "100ns" on a 100ns DRAM really means), but that's not because
the addresses are multiplexed.  It's based on another parameter called
"precharge time".  Simply put, you get data out of your 100ns DRAM 100ns
after you strobe in the row address (assuming the column address timing
is handled correctly).  Before you can strobe in the next row address,
you have to wait about 80ns for precharge.  That's what the memory
requires.  Then reality sets in, and you have to figure out how to design
a system that gets addresses out and multiplexed fast enough.
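
To put rough numbers on that, here is a minimal sketch in Python.  It
assumes the 100ns access and 80ns precharge figures above, and the 16MHz,
two-clock bus cycle from the quoted article.  The function names are made
up for illustration, and the model ignores address multiplexing, buffer
delays and setup times, so a real design would only come out worse.

```python
import math

def dram_cycle_ns(row_access_ns, precharge_ns):
    # Minimum spacing between the starts of two back-to-back row accesses.
    return row_access_ns + precharge_ns

def wait_states(required_ns, clock_mhz, clocks_per_bus_cycle=2):
    # Assumption: a zero-wait bus cycle lasts clocks_per_bus_cycle clocks,
    # and each wait state stretches it by exactly one more clock.
    clock_ns = 1000.0 / clock_mhz
    budget_ns = clocks_per_bus_cycle * clock_ns
    shortfall_ns = required_ns - budget_ns
    return max(0, math.ceil(shortfall_ns / clock_ns))

cycle = dram_cycle_ns(100, 80)
print("full DRAM cycle:", cycle, "ns")
print("wait states, access time only:", wait_states(100, 16))
print("wait states, back-to-back cycles:", wait_states(cycle, 16))
```

Under these assumptions the 100ns access time alone fits the 125ns bus
cycle, but back-to-back accesses are paced by the 180ns cycle time and
need a wait state.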

>With interleaving, one wait state is hidden by address pipelining.  

Interleaving can hide the precharge time, which does probably correspond
to 1 wait state or more, depending on the system and the memory.  But
interleaving isn't perfect; it only goes faster when successive accesses
alternate between banks.  In the worst case, with back-to-back accesses
to the same bank, it is no faster than non-interleaved memory.  And it
requires twice the number of devices and support logic.  Marketroids may
claim it's zero wait state memory, and it may even look that way
sometimes, but it really isn't.
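
The best and worst cases can be sketched with a toy model in Python.  The
assumptions are mine, not from any particular machine: banks are selected
by the low-order address bits, the CPU issues the next access the moment
data arrives, and the timings are the 100ns/80ns figures used earlier.
The function name is hypothetical.

```python
def total_time_ns(addresses, banks, access_ns=100, precharge_ns=80):
    """Time to service a list of word addresses, one after another."""
    ready_at = [0] * banks       # earliest time each bank may start again
    now = 0
    for addr in addresses:
        bank = addr % banks      # assumed low-order interleave
        start = max(now, ready_at[bank])
        now = start + access_ns  # data is available here
        ready_at[bank] = now + precharge_ns
    return now

print("sequential, 1 bank :", total_time_ns([0, 1, 2, 3], banks=1))
print("sequential, 2 banks:", total_time_ns([0, 1, 2, 3], banks=2))
print("same bank,  2 banks:", total_time_ns([0, 2, 4, 6], banks=2))
```

Sequential addresses alternate banks, so each bank precharges while the
other one is being read.  A stride of two lands on the same bank every
time and takes exactly as long as the single-bank case.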

Another cool way to achieve faster access from plain old memories is to use
page-mode or static column parts.  A 100ns or 80ns DRAM generally has a 
column access time of 30ns-50ns.  Using static column parts really makes
life simple, because it eliminates the need to build a column address
strobe at a critical point.  The additional complexity is of course in
support logic that will detect a page hit (old row address == new row
address).  This can go much faster than a simple interleaved memory system,
probably dropping normal access time by a few wait states.  And it doesn't
require any additional devices.  The downside is that a page miss will
run slower than the basic memory cycle, since the memory controller will
have to fit a precharge cycle in during CPU access rather than CPU recycle
time.
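
The page hit test itself is simple enough to sketch in Python.  This is
only an illustration under assumed numbers: a 10-bit column address, a
40ns column access (inside the 30ns-50ns range above), and the same
100ns/80ns row access and precharge as before.  The class and parameter
names are hypothetical, and the first access is assumed to find the
memory already precharged.

```python
class PageModeController:
    def __init__(self, column_bits=10, row_access_ns=100,
                 column_access_ns=40, precharge_ns=80):
        self.column_bits = column_bits
        self.row_access_ns = row_access_ns
        self.column_access_ns = column_access_ns
        self.precharge_ns = precharge_ns
        self.open_row = None     # row currently held open, if any

    def access(self, address):
        """Return the access time in ns for this address."""
        row = address >> self.column_bits
        if row == self.open_row:
            # Page hit: old row address == new row address.
            return self.column_access_ns
        # Page miss: close the old row (precharge) before the new row
        # access.  No penalty if nothing was open yet (assumption).
        penalty = self.precharge_ns if self.open_row is not None else 0
        self.open_row = row
        return penalty + self.row_access_ns

ctl = PageModeController()
for addr in (0, 1, 2, 1 << 10, (1 << 10) + 5):
    print(hex(addr), ctl.access(addr), "ns")
```

Hits cost only the column access, while a miss pays precharge plus a full
row access, which is the slower-than-basic-cycle case described above.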

>    On the subject of RAM chip specifications, there is something funny
>about quoting RAM speeds in multiples of 10 ns, with processor speeds in
>integral MHz numbers.  For example, I know that 100 ns memory is fast
>enough for 16MHz with 0 wait states when interleaved.  But this actually
>requires like 94 ns memory, by my calculations.  Is there some implicit
>tolerance in the RAM chip speeds, like are they always rounded up to the
>nearest 10 ns?

The memory speeds are rather arbitrary.  The DRAM rating really doesn't
tell you that much anyway.  In fact, you can find parts with the same row
access time, but other vital numbers considerably different.  And the
match between memory and any particular CPU depends on all kinds of
factors: CPU speed, CPU memory interface, special CPU access modes
(e.g., "Burst" mode), wait state granularity, memory subsystem
architecture, etc.

> -David Hinds
>  dhinds@popserver.stanford.edu


-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough