Path: utzoo!utgpu!water!watmath!clyde!rutgers!cmcl2!husc6!mailrus!umix!uunet!steinmetz!sunset!oconnor
From: oconnor@sunset.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: target caching
Keywords: page mode
Message-ID: <9631@steinmetz.steinmetz.UUCP>
Date: 20 Feb 88 16:03:57 GMT
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: sunset!oconnor@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 55

An article by lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) says:
] The TF-1 people at IBM intend to use an interesting trick to simplify
] their CPU.

] DRAMs can be purchased that have "page mode" - that is, you can access the
] next-address value much more quickly than a randomly addressed value. This
] is because each random access can leave a large number of bits in a long
] register (say, 1024 bits, in the case of a 1Mb RAM). A page-mode access just
] shifts the register.

] So, the TF-1 CPU chip will expect another 32 bits of instruction every 20ns.
] As long as the PC just upcounts, they claim that page-mode RAMs will be fast
] enough.

] When the CPU decides to branch, of course, there's trouble. They solve this
] by keeping a cache of the instruction streams at 32 recent branch targets.
] If the target PC hits, then they fetch instructions from the cached stream,
] until the RAMs have done their random access, and are ready to page-mode
] again.

Well, it may be interesting, but it's not original. GE's own RPM40
already does this (but better, IMHO, than you describe), and I
believe the AMD29000 gives you the CHOICE of doing something like this.

That memory system is not going to be simple, by the way: branches
are not your only problem. You need to handle crossing page
boundaries in your RAM as well. But that's doable.

As described, it's also not going to be Rad-Hard. Dynamic RAM never is.

] I haven't studied the recent RAM offerings well enough to count the cycles,
] and critique the speed expectations. I guess it sounds fine, and it does
] sound simple. But, there's a major catch: it's a Harvard architecture. The
] memory is code-only, so that grubby data won't spoil the code's pipelined
] perfection.

(Humor mode on) That's not a catch, that's a FEATURE! (HM off).
Seriously folks, at 200 MBytes/sec of JUST instruction fetch, you
weren't thinking of sharing that nice, simple, unidirectional
instruction bus with messy old bi-directional data, were you?

] I know that some recent RAM chips are dual-ported, supposedly so that a
] processor can write image data through the random port, while a graphics
] screen is being refreshed through the page-mode port. Would these chips
] allow the TF-1 trick to work in non-Harvard designs ?

No. The "TF-1 trick" (which was the "RPM40 trick" and the "29000
trick" FIRST, BTW) needs a Harvard architecture, to provide
sufficient bandwidth and, more importantly, to separate nice regular
simple instruction-stream behavior from complex semi-random data access.

] --
] Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science
--
Dennis O'Connor   oconnor@sunset.steinmetz.UUCP ??
ARPA: OCONNORDM@ge-crd.arpa
"Nuclear War is NOT the worst thing people can do to this planet."
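
For anyone who wants to count the cycles, here is a rough sketch in Python of the scheme discussed above: sequential fetch out of page-mode RAM, plus a cache of 32 recent branch targets to hide the random-access delay. Only the 32-bit word, the 20ns cycle, and the 32 entries come from the quoted article. The page size, the random-access penalty, the LRU replacement, and all the names are assumptions made up for illustration; this is not how the TF-1, the RPM40, or the 29000 actually do it. It also assumes that a hit in the target cache hides the whole random access, which is only true if no second branch arrives before the RAM is ready.

```python
from collections import OrderedDict

WORD_BYTES = 4             # 32 bits of instruction per fetch (from the article)
CYCLE_NS = 20              # one fetch every 20ns (from the article)
TARGET_ENTRIES = 32        # recent branch targets kept (from the article)
PAGE_WORDS = 1024          # ASSUMPTION: words per RAM page (row)
RANDOM_ACCESS_CYCLES = 4   # ASSUMPTION: extra cycles for a non-page-mode access


def fetch_bandwidth_mbytes_per_sec():
    """Instruction-fetch bandwidth: bytes per ns, scaled to MBytes/sec."""
    return WORD_BYTES * 1000 // CYCLE_NS


def count_stall_cycles(trace, use_target_cache=True):
    """Count stall cycles for a trace of word-addressed PCs (toy model)."""
    targets = OrderedDict()  # branch target PC -> True, kept in LRU order
    stalls = 0
    prev = None
    for pc in trace:
        if prev is None:
            # Cold start: the first fetch is always a random access.
            stalls += RANDOM_ACCESS_CYCLES
        elif pc == prev + 1:
            # PC just upcounts: free unless we cross into a new RAM page.
            if pc // PAGE_WORDS != prev // PAGE_WORDS:
                stalls += RANDOM_ACCESS_CYCLES
        elif use_target_cache and pc in targets:
            # Taken branch that hits: the cached stream covers the delay.
            targets.move_to_end(pc)
        else:
            # Taken branch that misses (or no cache): wait for the RAM.
            stalls += RANDOM_ACCESS_CYCLES
            if use_target_cache:
                targets[pc] = True
                if len(targets) > TARGET_ENTRIES:
                    targets.popitem(last=False)  # evict least recently used
        prev = pc
    return stalls


if __name__ == "__main__":
    loop = list(range(100, 108)) * 10  # an 8-instruction loop run 10 times
    print("MBytes/sec:", fetch_bandwidth_mbytes_per_sec())
    print("stalls, no target cache:", count_stall_cycles(loop, False))
    print("stalls, with target cache:", count_stall_cycles(loop, True))
```

The page-boundary case is charged a full random access here, which is the "crossing page boundaries" problem mentioned above in its crudest form; a real memory system could presumably overlap that with something useful.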