Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!amdcad!tim
From: tim@amdcad.AMD.COM (Tim Olson)
Newsgroups: comp.arch
Subject: Re: Performance increase - a suggestion
Message-ID: <19993@amdcad.AMD.COM>
Date: 17 Jan 88 20:43:01 GMT
References: <8843@steinmetz.steinmetz.UUCP> <221@imagine.PAWL.RPI.EDU>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Organization: Advanced Micro Devices
Lines: 31
Keywords: bandwidth datapath 128

In article <221@imagine.PAWL.RPI.EDU> userfe0e@mts.rpi.edu (George Kyriazis) writes:
| You read 128 bits at a time, and
| (assuming a 32-bit CPU) you feed the CPU with 32 bits at 4 times the
| speed. You might as well feed all this data in shift registers, and then
| shift data out to the CPU. Assuming 64K*1 chips, you will need 128 chips
| therefore 64K*128 = 8 megs. There are also 64K*4 chips, so minimum memory
| can go down by a factor of 4. A quite feasible approach for a medium
| sized computer (close to a micro).

Or you can simply interleave the memories 4 ways, and reduce the path
between memory and the instruction bus to 32 bits.

| The only problem when doing that is jump instructions. Assume that memory
| operates at its fastest possible speed. If you meet a jump instruction
| in the middle of the 128-bit word, you'll have to (more or less) execute
| all the rest up till the end of the fetch. Some RISC CPU's have done this
| but for only one instruction. Can a compiler successfully put 3 useful
| instructions after the jump?? Maybe it sounds too cheap: "After any jump
| the CPU executes at most three instructions after it". (Actually it turns
| out that the jump has to be in the first of the 4 instructions in the 128-bit
| word, so as the memory can get the right address.)

It is very hard to effectively schedule more than 1 instruction after a
delayed-branch, and even more prohibitive to force branches to occur
only at 4-instruction boundaries.

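To put rough numbers on the alignment cost, here is a minimal Python sketch. It assumes four 32-bit instructions per 128-bit fetch, as in the quoted proposal; the function names and the "nop" padding instruction are hypothetical, chosen only for illustration. It shows how many instructions trail a jump within its fetch word, and how much padding it takes to force every jump into the first slot.

```python
WORD_SLOTS = 4  # 128-bit fetch word / 32-bit instructions (from the quoted proposal)


def slots_after_jump(slot):
    """Number of instructions fetched in the same word after a jump in `slot`."""
    if not 0 <= slot < WORD_SLOTS:
        raise ValueError("slot must be in range 0..WORD_SLOTS-1")
    return WORD_SLOTS - 1 - slot


def pad_branches_to_word_start(instrs, is_branch):
    """Insert "nop" padding so that every branch lands in slot 0 of a fetch word.

    `instrs` is a list of opaque instruction labels; `is_branch` is a
    caller-supplied predicate (hypothetical, for illustration only).
    """
    out = []
    for ins in instrs:
        if is_branch(ins):
            # Fill the rest of the current word so the branch starts a new one.
            while len(out) % WORD_SLOTS != 0:
                out.append("nop")
        out.append(ins)
    return out


if __name__ == "__main__":
    code = ["add", "jmp", "sub"]
    padded = pad_branches_to_word_start(code, lambda i: i == "jmp")
    print(padded)
    print("padding instructions added:", len(padded) - len(code))
```

A jump in slot 0 leaves three trailing instructions for the compiler to fill, and every jump that would otherwise fall mid-word costs up to three padding instructions ahead of it.
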
A solution to this problem is to throw away the instructions coming from
memory, sourcing them instead from a cache while the instruction stream
is restarted. This is what the Branch-Target Cache is for on the
Am29000.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)
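
The branch-target cache idea can be sketched as a toy model in Python. This is not a description of the actual Am29000 hardware: the entry count, the number of instructions per entry, the direct-mapped indexing, and the stall formula are all assumptions made for illustration, and the class and function names are hypothetical. The point is only that on a hit the cache supplies the first few target instructions while the memory stream restarts.

```python
class BranchTargetCache:
    """Toy direct-mapped cache holding the first instructions at branch targets."""

    def __init__(self, entries=32, depth=4):
        self.entries = entries  # number of cached targets (assumed value)
        self.depth = depth      # instructions kept per target (assumed value)
        self.lines = {}         # index -> (target address, instruction list)

    def _index(self, target):
        # Direct-mapped placement: an assumption made to keep the sketch short.
        return target % self.entries

    def lookup(self, target):
        line = self.lines.get(self._index(target))
        if line is not None and line[0] == target:
            return line[1]
        return None

    def fill(self, target, instrs):
        self.lines[self._index(target)] = (target, list(instrs[:self.depth]))


def take_branch(btc, memory, target, restart_latency):
    """Return (stall_cycles, first instructions at the branch target).

    `memory` is a list indexed by word address. `restart_latency` is the
    assumed number of cycles before the restarted memory stream delivers
    its first instruction.
    """
    cached = btc.lookup(target)
    if cached is not None:
        # Hit: the cache issues one instruction per cycle and hides that
        # many cycles of the restart latency.
        return max(0, restart_latency - len(cached)), cached
    # Miss: wait for memory, then remember the target for next time.
    instrs = memory[target:target + btc.depth]
    btc.fill(target, instrs)
    return restart_latency, instrs


if __name__ == "__main__":
    memory = list(range(100))  # stand-in "instructions"
    btc = BranchTargetCache()
    print(take_branch(btc, memory, 40, restart_latency=4))  # first visit misses
    print(take_branch(btc, memory, 40, restart_latency=4))  # second visit hits
```

In this model a repeated branch to the same target pays the restart latency only once, which is why loops benefit most.
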