Path: utzoo!attcan!uunet!snorkelwacker!apple!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Cache Line Fills -- Critical Word First
Message-ID: <41895@mips.mips.COM>
Date: 3 Oct 90 16:57:16 GMT
References: <34275@cup.portal.com> <14780@cbmvax.commodore.com> <41856@mips.mips.COM> <1990Oct3.140725.3931@mozart.amd.com>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 71

In article <1990Oct3.140725.3931@mozart.amd.com> tim@amd.com (Tim Olson) writes:
>In article <41856@mips.mips.COM> mash@mips.COM (John Mashey) writes:
...
>There are also other possibilities, such as:

>	I4) Have a valid bit per word in the cache block, and fetch
>	    the missed instruction first, then burst reload continuing
>	    from that instruction into subsequent blocks, rather than
>	    wrapping around to complete the missed block.

>This tends to match instruction fetch patterns better than the other
>solutions, but with the added expense of extra valid bits and more
>complexity.

Recall that the R3000 uses block refill, but has 1 valid bit
per word of cache, so that, for example, stores can store 1 word
thru the cache, and only affect that word.  (The story might be
different if they were write-back caches, of course.)
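As a rough illustration of what "1 valid bit per word" buys you, here is a toy
model of a single cache block. It is only a sketch: the class name, the single
tag shared by the whole block, and the choice to invalidate the other words
when a store claims the block are all assumptions made for the example, not a
description of the actual R3000 hardware.

```python
class CacheBlock:
    """Toy write-through cache block with one valid bit per word (hypothetical)."""

    def __init__(self, words_per_block):
        self.tag = None
        self.valid = [False] * words_per_block
        self.data = [0] * words_per_block

    def store_word(self, tag, index, value):
        # If the block currently holds a different address, claim it and
        # mark every word invalid (assumed policy for this sketch).
        if self.tag != tag:
            self.tag = tag
            self.valid = [False] * len(self.valid)
        # The store touches exactly one word and sets only its valid bit.
        self.data[index] = value
        self.valid[index] = True

    def load_word(self, tag, index):
        # Hit only if the tag matches AND that particular word is valid.
        if self.tag == tag and self.valid[index]:
            return self.data[index]
        return None  # miss


block = CacheBlock(4)
block.store_word(0x10, 2, 99)
hit = block.load_word(0x10, 2)     # the stored word is now valid
miss = block.load_word(0x10, 0)    # the neighbouring word was never filled
```

The same per-word valid bits are what would let a refill stop at the end of
the block without wrapping around, which is why I4 is cheap to build on top
of this kind of organization.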

Hence, it would have been fairly easy to implement the proposed I4.
Of course, we simulated that one, a long time ago.

Unfortunately....
	a) It turns out that if you I-cache-miss in word N of an M-word cache
	block, there is a fairly high probability that you will want
	word(s) earlier in the cache block fairly soon.  (For instance,
	a conditional branch inside a loop may take the branch the first
	time, then fall thru the next.)

	b) So, the cache miss penalty for I2 or I3 =
	access time + M

	c) The cache miss penalty for I4 =
	access time + N 
	+ penalty for refetches that I2/I3 don't do.

	This last term is nontrivial, i.e., with this scheme, every time
	you have a cache miss, you fetch from there to the end of the block,
	so it depends on what order things hit in.  Let's try the simplest
	case, and the most likely, which is to assume that the next miss
	is to word 0.  Hence, the I4 penalty would be:
	access time + N
	+ Pr(touch word 0 "soon") * (access time + M)
	(a miss on word 0 refetches from there to the end of the block,
	i.e., all M words)

	A gross rule of thumb is that access time and refill time should
	be about the same (this isn't exactly right, but close enough for this).
	If you assume N = M/2 on the average:
	I4: A + M/2  + Pr(0) * (A + M)
	I2/I3: A + M 

	Difference (how much worse) between I4 & I2/I3:
	diff = A + M/2 + Pr(0)*(A+M) - (A + M)
	     = Pr(0)*(A+M) - M/2
	assuming A == M, we get:
	diff = Pr(0)*(2M) - M/2
	diff is > 0 (i.e., I4 is worse) if Pr(0) > 25%,

	that is, if you jump into the middle of a block, and come back
	and hit the beginning of the block again, before I2/I3 would
	have replaced the block, and you do this >25% of the time,
	then the proposed I4 is worse.
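The arithmetic above can be checked with a few lines of Python. This is just
a sketch of the simplified model as written in this post, not a cache
simulator: the function names are made up for the example, A is the access
time, M the block size in words, N the word that missed, and p_word0 stands
for Pr(0), all in the same (cycle) units.

```python
def penalty_wrap(access, block_words):
    """Miss penalty for I2/I3: the whole M-word block is refilled."""
    return access + block_words


def penalty_i4(access, block_words, miss_word, p_word0):
    """Miss penalty for I4 under the simplified model above.

    First term: access time + N.
    Second term: with probability p_word0, a later miss on word 0
    costs a further access time + M.
    """
    return (access + miss_word) + p_word0 * (access + block_words)


def break_even(access, block_words, miss_word):
    """Pr(0) at which I4 and I2/I3 cost the same."""
    return (block_words - miss_word) / (access + block_words)


# The case worked in the text: A == M and N == M/2.
M = 8
threshold = break_even(access=M, block_words=M, miss_word=M // 2)  # 0.25
```

Plugging in other ratios of A to M shows how sensitive the 25% figure is to
the "access time about equals refill time" rule of thumb: a longer access
time relative to M lowers the break-even probability, making I4 look worse.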

This is, of course, an over-simplified analysis, as the only way to really
do it is to simulate it heavily, and it may well work in some environments.
I don't know what the percentage was, but it was high enough for us to
make I4 less good than I2.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086