Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!husc6!panda!genrad!decvax!mcnc!duke!srt
From: srt@duke.UUCP (Stephen R. Tate)
Newsgroups: net.arch
Subject: Re: VERY LARGE main memories
Message-ID: <8546@duke.duke.UUCP>
Date: Wed, 10-Sep-86 13:07:55 EDT
Article-I.D.: duke.8546
Posted: Wed Sep 10 13:07:55 1986
Date-Received: Thu, 11-Sep-86 06:00:33 EDT
References: <2017@sdcsvax.UUCP> <884@gilbbs.UUCP> <289@petrus.UUCP> <12930@amdcad.UUCP>
Organization: Duke University CS Dept.; Durham, NC
Lines: 42
Summary: Decoding isn't the problem (but buffering is)

In article <12930@amdcad.UUCP>, philip@amdcad.UUCP (Philip Freidin) writes:
> Unfortunately, at this point I would like to apply some reality to the
> discussion. Rather than talk about your 40 bit address memories, lets
> look at something trivial: 64kw. this needs 16 bits of address. With
> your 2 level decode (one of inverters, and the second of and gates to
> do word select) you have 32 address select lines coming into the second
> level, address and address complement. each of these must drive 32k and
> gates! I dont know of any logic familly with a drive capability to support
> that type of load. Your typical ttl has a drive capability of from 10 to 20
> loads. Also, another fly in your fast decode ointment is that the way and
> gates are implemented in many logic families precludes building a 16 input
> and gate as a single level. Cmos is limited to about 4 levels, and TTL and
> ECL have similar limits. To build bigger and gates, you end up with a tree
> structure inside your and gate.
>
> --Philip Freidin

First off, I was talking about decoding *bank* addresses, not individual
word addresses. If you wanted 1GB of memory, and used 1Mb chips, you
would have, say, 256 banks of 1M x 32-bit words. (If you have this much
memory, I hope memory accesses are done more than a word at a time, but
ignore this for now....)

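As a quick check on the arithmetic above, here is a minimal Python
sketch. It assumes the 1Mb chips are organized as 1M x 1, so that a bank
is 32 chips side by side; the name bank_layout and its parameters are
made up for illustration and do not come from any real tool.

```python
def bank_layout(total_bytes, chip_bits, word_bits):
    """Work out the bank organization for a memory built from 1-bit-wide chips.

    Assumption: each chip is (chip_bits x 1), so one bank needs word_bits
    chips and holds chip_bits words.
    """
    words_per_bank = chip_bits                    # one bit of every word per chip
    bytes_per_bank = words_per_bank * word_bits // 8
    banks = total_bytes // bytes_per_bank
    return {
        "banks": banks,
        # bits needed to select one bank out of `banks`
        "bank_addr_bits": (banks - 1).bit_length(),
        # bits needed to select one word inside a bank
        "word_addr_bits": (words_per_bank - 1).bit_length(),
        "chips": banks * word_bits,
    }


# 1GB total, 1Mb chips, 32-bit words (the figures from the example above)
layout = bank_layout(total_bytes=1 << 30, chip_bits=1 << 20, word_bits=32)
print(layout)
# {'banks': 256, 'bank_addr_bits': 8, 'word_addr_bits': 20, 'chips': 8192}
```

Under those assumptions the bank select needs only 8 address bits; the
other 20 word-address bits go to every chip in every bank.
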
Now that's only 8 bits for a bank address, and I have seen 8-input NAND
gates (the 7430 or something like that). Each of these bank address
lines need only drive one input per bank (a bank being 32 chips), which
means that each line only has to drive 256 inputs. That is much less
than your 32k figure, but still unreasonable. Obviously, the address
lines need to be buffered. Using TTL with a fanout of, say, 16, you only
need one level of buffering (since 16*16 = 256). Now you're three levels
deep (inverter, buffer, NAND gate) for a propagation delay of about
40-50ns. Still not a terribly unreasonable time.

Anyway, another problem to consider is buffering all the address lines
below the bank address lines. These have to be run to every chip, and in
the example above there are 32*256 = 8192 chips in all. You're going to
have to be real careful with buffering here.....

So it's not the decode circuitry that takes time, it's the buffering
for reasonable fan-out.

Incidentally, CMOS has a *huge* fanout. That is, CMOS outputs driving
CMOS inputs (no mixing).

-- 
Steve Tate			..!{ihnp4,decvax}!duke!srt