Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!bloom-beacon!think!ames!hao!boulder!sunybcs!bingvaxu!leah!itsgw!batcomputer!pyramid!prls!mips!mash
From: mash@mips.UUCP (John Mashey)
Newsgroups: comp.arch
Subject: Re: Horizontal pipelining
Message-ID: <986@winchester.UUCP>
Date: Tue, 24-Nov-87 16:06:09 EST
Article-I.D.: winchest.986
Posted: Tue Nov 24 16:06:09 1987
Date-Received: Sat, 28-Nov-87 07:37:19 EST
References: <201@PT.CS.CMU.EDU> <388@sdcjove.CAM.UNISYS.COM> <988@edge.UUCP> <958@winchester.UUCP> <11444@sci.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 51

In article <11444@sci.UUCP> kenm@sci.UUCP (Ken McElvain) writes:
>In article <958@winchester.UUCP>, mash@mips.UUCP (John Mashey) writes:
>I agree that cache [or TLB] hit rates will almost certainly go down.
>However, miss penalties will also drop.  It is quite possible that
>a cache fill could happen in the time it takes for the barrel
>to turn around.
>A ten stage barrel processor running at 25Mhz would easily allow
>over 300ns for a cache fill before it cost another instruction slot.
>The performance limit here is likely to be the bandwidth of the
>cache fill mechanism.
^^^^^^^^^^^^^^^^^^^^^^ yes.

I believe that there is more interference than you might think,
although it would be nice to see simulation numbers, since I don't
have any.  Let's try a few quick assumptions.  Assume we're using
split I & D caches.  Assume that the cache line is N words long,
filled 1 word/cycle after a latency of L cycles.  One would expect
that efficient cache designs have L <= N.  When filling an I-cache
miss, you can do L more barrel slots, then you must stall for N slots
(or equivalent), because it doesn't make sense to have the I-cache
run faster than the chip (if it did, you would run the chip faster).
Putting an I-cache on the chip just moves the problem around.
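The refill model just described can be put into numbers. What follows is only a minimal Python sketch of that arithmetic, not anything posted in the thread: the function names are made up for illustration, the barrel is assumed to advance one slot per clock cycle, and the I-cache model is taken at face value (L slots proceed during the latency, then N slots stall while the line fills).

```python
from fractions import Fraction


def barrel_rotation_ns(stages: int, clock_mhz: float) -> float:
    """Time for the barrel to return to the same stream, in nanoseconds.

    Assumes one barrel slot per clock cycle (an assumption made here,
    not something stated in the thread).
    """
    cycle_ns = 1000.0 / clock_mhz
    return stages * cycle_ns


def icache_issue_fraction(latency: int, line_words: int) -> Fraction:
    """Share of an I-cache refill (L + N slots) that can still issue.

    Model from the post: L slots proceed during the latency, then N
    slots stall while the line fills at 1 word/cycle.
    """
    return Fraction(latency, latency + line_words)


if __name__ == "__main__":
    # The quoted configuration: ten stages at 25 MHz.
    print(barrel_rotation_ns(10, 25.0))
    # Illustrative (hypothetical) L, N pairs with L <= N.
    for L, N in [(1, 4), (2, 4), (4, 4)]:
        print(L, N, icache_issue_fraction(L, N))
```

Under these assumptions a ten stage barrel at 25 MHz turns around in 400 ns, which is consistent with the quoted "over 300ns". With L = N the issue fraction is exactly 1/2, and any L < N gives less.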
Assuming L <= N, this says that when you hit an I-cache miss, you can
actually initiate new instructions for at most 50% of the total
refill time (L+N).

D-cache refill is a little less painful, in that only 30% of the
instructions are loads/stores (on our systems, but typical), so that
you don't block, or skip something, until you hit a load/store.
I'm not sure what you do when you execute something that causes a
cache miss while you're already in a cache miss; maybe just block.

Of course, I & D cache refills run into each other, and if you're
using write-thru caches, writes run into refills also.

These numbers seem to indicate that maybe a 2-way barrel might be
possible, but much above that, very little benefit can be gained from
overlapping cache refill with execution.

>Another issue is the instruction set.  It's not clear that you want
>a bunch of registers.  It may be much better to do more of a memory
>to memory architecture.  (I would recommend keeping some base registers).
>A number of other areas also have some surprising tradeoffs.
-----------------------------------^^^^
Please explain some more.  Note that in a memory-memory architecture,
what I said about 30% load/store above also gets worse.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086