Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!husc6!panda!genrad!decvax!mcnc!duke!srt
From: srt@duke.UUCP (Stephen R. Tate)
Newsgroups: net.arch
Subject: Re: VERY LARGE main memories
Message-ID: <8546@duke.duke.UUCP>
Date: Wed, 10-Sep-86 13:07:55 EDT
Article-I.D.: duke.8546
Posted: Wed Sep 10 13:07:55 1986
Date-Received: Thu, 11-Sep-86 06:00:33 EDT
References: <2017@sdcsvax.UUCP> <884@gilbbs.UUCP> <289@petrus.UUCP> <12930@amdcad.UUCP>
Organization: Duke University CS Dept.; Durham, NC
Lines: 42
Summary: Decoding isn't the problem (but buffering is)

In article <12930@amdcad.UUCP>, philip@amdcad.UUCP (Philip Freidin) writes:
> Unfortunately, at this point I would like to apply some reality to the
> discussion. Rather than talk about your 40 bit address memories, let's
> look at something trivial: 64kw. This needs 16 bits of address. With
> your 2 level decode (one of inverters, and the second of and gates to
> do word select) you have 32 address select lines coming into the second
> level, address and address complement. Each of these must drive 32k and
> gates!  I don't know of any logic family with a drive capability to support
> that type of load. Your typical ttl has a drive capability of from 10 to 20
> loads.  Also, another fly in your fast decode ointment is that the way and
> gates are implemented in many logic families precludes building a 16 input
> and gate as a single level. Cmos is limited to about 4 levels, and TTL and
> ECL have similar limits. To build bigger and gates, you end up with a tree
> structure inside your and gate.
> 
> --Philip Freidin

First off, I was talking about decoding *bank* addresses, not individual
word addresses.  If you wanted 1GB of memory, and used 1Mb chips, you would
have, say, 256 banks of 1Mb x 32 bit words.  (If you have this much memory,
I hope memory accesses are done more than a word at a time, but ignore this
for now....)  Now that's only 8 bits for a bank address, and I have seen 8
input NAND gates.  (7430 or something like that....)  Each of these bank
address lines need only drive one input per bank (32 chips), which means
that they only have to drive 256 inputs.  Much less than your 32k figure,
but still unreasonable.  Obviously, the address lines need to be buffered.
Using TTL with a fanout of, say, 16, you only need one level of buffering
(since 16*16 = 256).  Now you're three levels deep for a propagation delay
of about 40-50ns.  Still not a terribly unreasonable time.
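
To make the arithmetic above concrete, here is a rough Python sketch.  The
function names, the assumption of 1M x 1 chips (one chip per data bit), and
the 15ns-per-level gate delay are all mine, picked only so that the total
lands in the 40-50ns range quoted above; none of it comes from a data sheet.

```python
def bank_layout(total_bytes, chip_bits, word_bits):
    """Return (banks, chips_per_bank, bank_address_bits).

    Assumes x1 chips, so each bank needs one chip per data bit.
    """
    chips_per_bank = word_bits
    bank_bytes = chip_bits * word_bits // 8
    banks = total_bytes // bank_bytes
    bank_address_bits = (banks - 1).bit_length()
    return banks, chips_per_bank, bank_address_bits


def buffer_levels(loads, fanout):
    """Levels of buffers needed so one source can reach `loads` inputs."""
    levels = 0
    reach = fanout  # inputs a single unbuffered output can drive
    while reach < loads:
        reach *= fanout
        levels += 1
    return levels


# 1GB of memory built from 1Mb chips, 32 bit words.
banks, chips_per_bank, addr_bits = bank_layout(2**30, 2**20, 32)

# Each bank address line drives one decoder input per bank.
levels = buffer_levels(banks, 16)

NS_PER_LEVEL = 15       # assumed typical TTL gate delay, not a measured value
depth = 2 + levels      # inverter level + buffer level(s) + NAND decode
delay_ns = depth * NS_PER_LEVEL

print(banks, chips_per_bank, addr_bits, levels, depth, delay_ns)
```

Changing the fanout argument to 10 makes buffer_levels(256, 10) come out as
2 instead of 1, which is why the "say, 16" matters.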

Anyway, another problem to consider is buffering all the address lines below
the bank address lines.  These have to be run to every chip, and in the
example above, there are 32*256 = 8192 chips in all.  You're going to have
to be real careful with buffering here.....   So it's not the decode
circuitry that takes time, it's the buffering for reasonable fan-out.
Incidentally, CMOS has a *huge* fanout, at least when CMOS outputs drive
only CMOS inputs (no mixing).
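
For a feel of how much buffering the low-order address lines need, here is
a small Python sketch that counts the buffers on ONE such line.  The helper
name is made up, and the fanout of 16 is the same "say, 16" TTL figure used
earlier, so treat the output as an illustration and not a design.

```python
def buffers_per_level(loads, fanout):
    """Buffers needed at each level of a fan-out tree, leaf level first.

    Every buffer (and the original source) drives at most `fanout` inputs.
    """
    counts = []
    n = loads
    while n > fanout:
        n = -(-n // fanout)  # ceiling division: drivers needed for n inputs
        counts.append(n)
    return counts


total_chips = 32 * 256  # 32 chips per bank, 256 banks

tree = buffers_per_level(total_chips, 16)
print(total_chips, tree, len(tree), sum(tree))
```

With these assumptions each address line needs three levels of buffers,
546 buffers in all, against one level of 16 buffers for a bank address line
driving 256 decoder inputs.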


-- 
Steve Tate			..!{ihnp4,decvax}!duke!srt