Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Performance increase - a suggestion
Message-ID: <19993@amdcad.AMD.COM>
Date: 17 Jan 88 20:43:01 GMT
References: <8843@steinmetz.steinmetz.UUCP> <221@imagine.PAWL.RPI.EDU>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Organization: Advanced Micro Devices
Lines: 31
Keywords: bandwidth datapath 128

In article <221@imagine.PAWL.RPI.EDU> userfe0e@mts.rpi.edu (George Kyriazis) writes:
| You read 128 bits at a time, and
| (assuming a 32-bit CPU) you feed the CPU with 32 bits at 4 times the
| speed.  You might as well feed all this data in shift registers, and then
| shift data out to the CPU.  Assuming 64K*1 chips, you will need 128 chips
| therefore 64K*128 = 8 megs.  There are also 64K*4 chips, so minimum memory
| can go down by a factor of 4.  A quite feasible approach for a medium
| sized computer (close to a micro).

Or you can simply interleave the memories 4 ways, and reduce the path
between memory and the instruction bus to 32 bits.
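
To make the interleaving idea concrete, here is a rough sketch of the
address mapping and of why four banks can keep a 32-bit bus busy.  The
4-cycle bank latency, the one-issue-per-cycle bus, and the function names
are my own assumptions for illustration, not figures from any particular
memory system.

```python
BANKS = 4        # assumed: four banks, each one 32-bit word wide
WORD_BYTES = 4   # 32-bit instruction words


def bank_and_row(byte_addr):
    """Map a byte address to (bank, row) with low-order word interleaving."""
    word = byte_addr // WORD_BYTES
    return word % BANKS, word // BANKS


def stream_cycles(n_words, banks=BANKS, latency=4):
    """Cycles until the last of n sequential words arrives.

    Assumed model: one access may be issued per cycle, and a bank is
    busy for `latency` cycles after an access to it starts.
    """
    bank_free = [0] * banks   # cycle at which each bank is next idle
    issue = 0                 # earliest cycle the next access may issue
    done = 0
    for i in range(n_words):
        b = i % banks
        start = max(issue, bank_free[b])
        done = start + latency
        bank_free[b] = done
        issue = start + 1
    return done


if __name__ == "__main__":
    print(stream_cycles(8, banks=1))  # single bank: accesses serialize
    print(stream_cycles(8, banks=4))  # four banks: accesses overlap
```

Under these assumptions, sequential words land in successive banks, so
after the first access each bank's latency is hidden behind the other
three and the 32-bit path delivers about one word per cycle.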

|   The only problem when doing that is jump instructions.  Assume that memory
| operates at its fastest possible speed.  If you meet a jump instruction
| in the middle of the 128-bit word, you'll have to (more or less) execute
| all the rest up till the end of the fetch.  Some RISC CPU's have done this
| but for only one instruction.  Can a compiler successfully put 3 useful
| instructions after the jump??  Maybe it sounds too cheap: "After any jump
| the CPU executes at most three instructions after it". (Actually it turns
| out that the jump has to be in the first of the 4 instructions in the 128-bit
| word, so as the memory can get the right address.) 

It is very hard to effectively schedule more than 1 instruction after a
delayed branch, and even more restrictive to force branches to occur
only at 4-instruction boundaries.  A solution to this problem is to
throw away the instructions already coming from memory when a branch is
taken, and to source the first few target instructions from a cache
while the instruction stream is restarted at the new address.  This is
what the Branch-Target Cache is for on the Am29000.
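
As a toy illustration of the idea (not a description of the actual
Am29000 hardware), a branch-target cache can be modelled as a small
table keyed by target address that holds the first few instructions at
each target.  The class name, the LRU fully-associative organization,
the default sizes, and the 4-cycle restart latency below are all
assumptions made for the sketch.

```python
from collections import OrderedDict


class BranchTargetCache:
    """Toy model: remembers the first few instructions at branch targets."""

    def __init__(self, entries=32, words_per_entry=4):
        self.entries = entries
        self.words = words_per_entry
        self.lines = OrderedDict()   # target address -> instruction words

    def branch(self, target, memory, restart_latency=4):
        """Return (instructions, stall_cycles) for a taken branch.

        `memory` is a list of instruction words indexed by word address.
        A hit is assumed to cost no stall, which only holds when the
        cached words cover the memory restart latency.
        """
        if target in self.lines:
            self.lines.move_to_end(target)      # mark most recently used
            return self.lines[target], 0
        insns = memory[target:target + self.words]
        if len(self.lines) >= self.entries:
            self.lines.popitem(last=False)      # evict least recently used
        self.lines[target] = insns
        return insns, restart_latency


if __name__ == "__main__":
    memory = list(range(100))        # stand-in for instruction words
    btc = BranchTargetCache(entries=2)
    print(btc.branch(40, memory))    # first visit: miss, pays the restart
    print(btc.branch(40, memory))    # second visit: hit, no stall
```

In this model the first branch to a target pays the full restart
penalty, while later branches to the same target are fed from the cache
while memory catches up.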

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)