Path: utzoo!utgpu!water!watmath!clyde!rutgers!cmcl2!husc6!mailrus!umix!uunet!steinmetz!sunset!oconnor
From: oconnor@sunset.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: target caching
Keywords: page mode
Message-ID: <9631@steinmetz.steinmetz.UUCP>
Date: 20 Feb 88 16:03:57 GMT
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: sunset!oconnor@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 55

An article by lindsay@K.GP.CS.CMU.EDU (Donald Lindsay) says:
] The TF-1 people at IBM intend to use an interesting trick to simplify
] their CPU.

] DRAMs can be purchased that have "page mode" - that is, you can access the
] next-address value much more quickly than a randomly addressed value.  This
] is because each random access can leave a large number of bits in a long
] register (say, 1024 bits, in the case of a 1Mb RAM). A page-mode access just
] shifts the register. 
] So, the TF-1 CPU chip will expect another 32 bits of instruction every 20ns.
] As long as the PC just upcounts, they claim that page-mode RAMs will be fast
] enough.
] When the CPU decides to branch, of course, there's trouble. They solve this
] by keeping a cache of the instruction streams at 32 recent branch targets.
] If the target PC hits, then they fetch instructions from the cached stream,
] until the RAMs have done their random access, and are ready to page-mode
] again.
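
For anyone who wants to play with the scheme quoted above, here is a
rough toy model of it, not anybody's actual design. The 32 entries come
from the quoted description; everything else (the names, the LRU
replacement, 4 words per entry, a 4-cycle random access, word-addressed
PCs) is an assumption made only for illustration.

```python
from collections import OrderedDict


class BranchTargetCache:
    """Toy model: maps a branch-target PC to the first few instructions there."""

    def __init__(self, entries=32, words_per_entry=4):
        self.entries = entries                  # 32 targets, per the quoted description
        self.words_per_entry = words_per_entry  # assumed: enough to cover the random access
        self.lines = OrderedDict()              # target PC -> instruction words, in LRU order

    def lookup(self, target_pc):
        """Return the cached stream for target_pc, or None on a miss."""
        stream = self.lines.get(target_pc)
        if stream is not None:
            self.lines.move_to_end(target_pc)   # mark as most recently used
        return stream

    def fill(self, target_pc, memory):
        """After a miss, remember the instructions found at the target."""
        if len(self.lines) >= self.entries:
            self.lines.popitem(last=False)      # evict the least recently used target
        end = target_pc + self.words_per_entry
        self.lines[target_pc] = list(memory[target_pc:end])


def stall_cycles(pc_trace, memory, cache, random_access_cycles=4):
    """Count cycles spent waiting on DRAM random accesses over a PC trace."""
    stalls = 0
    prev = None
    for pc in pc_trace:
        if prev is not None and pc != prev + 1:  # PC did not just upcount: a branch
            if cache.lookup(pc) is None:
                stalls += random_access_cycles   # miss: wait for the random access
                cache.fill(pc, memory)
            # hit: the cached words are assumed to hide the random access
        prev = pc
    return stalls
```

A loop that branches back to the same target twice should pay for the
random access only the first time:

```python
memory = list(range(64))
trace = list(range(10)) + list(range(2, 10)) + list(range(2, 10))
print(stall_cycles(trace, memory, BranchTargetCache()))
```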

Well, it may be interesting, but it's not original. GE's own RPM40
already does this (but better (IMHO) than you describe), and I believe
the AMD29000 gives you the CHOICE of doing something like this.

That memory system is not going to be simple, by the way:
branches are not your only problem. You need to handle crossing
page boundaries in your RAM as well. But that's doable.
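
To make the page-boundary point concrete, here is a small sketch that
counts how often a straight-line run of fetches moves into a new DRAM
page, with no branch involved at all. The page size of 256 instruction
words and the function name are hypothetical; the real figure depends
on how the RAMs are organized.

```python
PAGE_WORDS = 256  # hypothetical page size in instruction words


def sequential_page_crossings(start_pc, length, page_words=PAGE_WORDS):
    """Count how many times a straight-line run of `length` fetches,
    starting at word address `start_pc`, moves into a new DRAM page."""
    if length <= 0:
        return 0
    last_pc = start_pc + length - 1
    # Each increase in the page number is one crossing, and each crossing
    # needs a fresh random access even though the PC only upcounted.
    return last_pc // page_words - start_pc // page_words
```

For example, `sequential_page_crossings(250, 10)` reports one crossing:
the run starts near the end of one page and finishes in the next.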

As described, it's also not going to be Rad-Hard. Dynamic RAM never is.

] I haven't studied the recent RAM offerings well enough to count the cycles,
] and critique the speed expectations. I guess it sounds fine, and it does
] sound simple. But, there's a major catch: it's a Harvard architecture.  The
] memory is code-only, so that grubby data won't spoil the code's pipelined
] perfection.

(Humor mode on) That's not a catch, that's a FEATURE! (HM off).
Seriously folks, at 200MBytes/sec of JUST instruction fetch,
you weren't thinking of sharing that nice, simple, unidirectional
instruction bus with messy old bi-directional data, were you?
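
The 200MBytes/sec figure follows directly from the quoted numbers
(32 bits of instruction every 20ns). A quick check, with a function
name made up for this example:

```python
def instruction_bandwidth_mbytes(bits_per_fetch=32, fetch_period_ns=20):
    """Instruction-fetch bandwidth in MBytes/sec (1 MByte = 10**6 bytes)."""
    bytes_per_fetch = bits_per_fetch // 8
    fetches_per_second = 1_000_000_000 // fetch_period_ns
    return bytes_per_fetch * fetches_per_second // 1_000_000


print(instruction_bandwidth_mbytes())  # 4 bytes * 50,000,000 fetches/sec -> 200
```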

] I know that some recent RAM chips are dual-ported, supposedly so that a
] processor can write image data through the random port, while a graphics
] screen is being refreshed through the page-mode port. Would these chips
] allow the TF-1 trick to work in non-Harvard designs ? 

No. The "TF-1 trick" (which was the "RPM40 trick" and the "29000
trick" FIRST, BTW) needs a Harvard architecture, to provide sufficient
bandwidth and, more importantly, to separate nice regular simple
instruction-stream behavior from complex semi-random data access.
] -- 
] 	Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science

--
	Dennis O'Connor 	oconnor@sunset.steinmetz.UUCP ??
				ARPA: OCONNORDM@ge-crd.arpa
    "Nuclear War is NOT the worst thing people can do to this planet."