Path: utzoo!attcan!uunet!snorkelwacker!apple!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Cache Line Fills -- Critical Word First
Message-ID: <41895@mips.mips.COM>
Date: 3 Oct 90 16:57:16 GMT
References: <34275@cup.portal.com> <14780@cbmvax.commodore.com> <41856@mips.mips.COM> <1990Oct3.140725.3931@mozart.amd.com>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 71

In article <1990Oct3.140725.3931@mozart.amd.com> tim@amd.com (Tim Olson) writes:
>In article <41856@mips.mips.COM> mash@mips.COM (John Mashey) writes:
...
>There are also other possibilities, such as:
>    I4) Have a valid bit per word in the cache block, and fetch
>    the missed instruction first, then burst reload continuing
>    from that instruction into subsequent blocks, rather than
>    wrapping around to complete the missed block.
>This tends to match instruction fetch patterns better than the other
>solutions, but with the added expense of extra valid bits and more
>complexity.

Recall that the R3000 uses block refill, but has 1 valid bit per word
of cache, so that, for example, stores can store 1 word thru the cache,
and only affect that word. (The story might be different if they were
write-back caches, of course.) Hence, it would have been fairly easy
to implement the proposed I4.

Of course, we simulated that one, a long time ago. Unfortunately....

a) It turns out that if you I-cache-miss in word N of an M-word cache
   block, there is a fairly high probability that you will want word(s)
   earlier in the cache block fairly soon. (For instance, a conditional
   branch inside a loop may take the branch the first time, then fall
   thru the next.)
b) So, the cache miss penalty for I2 or I3 = access time + M
c) The cache miss penalty for I4 = access time + N + penalty for
   refetches that I2/I3 don't do.

This last term is nontrivial: with this scheme, every time you have a
cache miss, you fetch from there to the end of the block, so the cost
depends on the order in which things hit. Let's try the simplest and
most likely case, which is to assume that the next miss is to word 0.
Hence, the I4 penalty would be:

    access time + N + Pr(touch word 0 "soon") * (access time + M)

(The refetch starts at word 0, so it reads the whole M-word block.)

A gross rule of thumb is that access time and refill time should be
about the same (this isn't exactly right, but close enough for this).
Write A for access time and Pr(0) for Pr(touch word 0 "soon").
If you assume N = M/2 on the average:

    I4:     A + M/2 + Pr(0) * (A + M)
    I2/I3:  A + M

Difference (how much worse) between I4 & I2/I3:

    diff = A + M/2 + Pr(0)*(A+M) - A - M
         = Pr(0)*(A+M) - M/2

Assuming A == M, we get:

    diff = Pr(0)*(M+M) - M/2

diff is > 0 (i.e., I4 is worse) if Pr(0) > 25%. That is, if you jump
into the middle of a block, then come back and hit the beginning of the
block again before I2/I3 would have replaced the block, and you do this
>25% of the time, then the proposed I4 is worse.

This is, of course, an over-simplified analysis, as the only way to
really do it is to simulate it heavily, and it may well work in some
environments. I don't know what the percentage was, but it was high
enough for us to make I4 less good than I2.
-- 
-john mashey    DISCLAIMER:
UUCP:   mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:    408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
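
The break-even arithmetic in the post can be checked with a few lines of
Python. This is only a sketch under the post's own simplifications (the
first miss lands at N = M/2 on average, and the only extra cost of I4 is
one full-block refetch starting at word 0). The function names are
hypothetical and do not come from the post or from any MIPS simulator.

```python
from fractions import Fraction


def penalty_wrap(access, block_words):
    """I2/I3: one miss costs the access time plus a full M-word refill."""
    return access + block_words


def penalty_i4(access, block_words, p_word0):
    """I4 under the post's simplified model (an assumption, not a simulation).

    The first miss is assumed to land at N = M/2, so it costs A + M/2.
    With probability p_word0 a later miss at word 0 refetches the whole
    block, costing another A + M.
    """
    first_fill = access + Fraction(block_words, 2)
    refetch = p_word0 * (access + block_words)
    return first_fill + refetch


def breakeven_p(access, block_words):
    """Pr(0) at which diff = Pr(0)*(A+M) - M/2 is exactly zero."""
    return Fraction(block_words, 2) / (access + block_words)


if __name__ == "__main__":
    A, M = 8, 8  # hypothetical values chosen so that A == M
    print("break-even Pr(0):", breakeven_p(A, M))
    for p in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
        print(p, penalty_i4(A, M, p) - penalty_wrap(A, M))
```

With A == M, breakeven_p returns 1/4, matching the 25% threshold. The
threshold depends on the ratio of A to M: if access time were twice the
refill time (A = 2M), the same formula gives 1/6.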