Path: utzoo!attcan!uunet!seismo!sundc!pitstop!texsun!texsun.central.sun.com!convex!authorplaceholder
From: gruger@convex.UUCP
Newsgroups: comp.arch
Subject: Re: SPARC and multiprocessing (larg
Message-ID: <63900015@convex>
Date: 11 May 88 19:04:00 GMT
References: <2710@geac.UUCP>
Lines: 44
Nf-ID: #R:geac.UUCP:-271000:convex:63900015:000:2116
Nf-From: convex.UUCP!gruger    May 11 14:04:00 1988


>/* Written  9:32 pm  May 10, 1988 by bartlett@encore.Sun.COM in convex:comp.arch */
>
>This sounds easy, but every time I have analyzed this I have come to the 
>conclusion that one of those tag stores has to be fully associative, to ensure
>that the two tag stores will always have the same addresses allocated.  Am I
>missing something here?  
>
>In our systems, we can't afford the real estate for a fully associative tag store
>for each processor cache.
>

There may be a terminology problem in this discussion.
Which "two tag stores" are you referring to?  By "fully associative"
I take it you mean a true content-addressable memory for the entire tag
memory?  I believe you need high associativity only when
searching through multiple sets of your data cache.

Our cache structures consist of:
	a) a virtually addressed data RAM
	b) a virtually addressed validity RAM 
	c) a physically addressed tag RAM
The tag RAM is written only as read data returns to the cache.  The
tag RAM is read only when remote processor writes occur; if one of
those reads hits, the corresponding validity bits are cleared.
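
A rough sketch of the fill and invalidate paths just described, in
Python.  This is only a toy model, not our actual hardware: the
direct-mapped organization, the 4 KB page, the 8-byte line, and every
name in it are assumptions made for illustration.  It relies on the
cache being no larger than a page, so the index bits are the same in
the virtual and the physical address.

```python
PAGE_SIZE = 4096                     # assumed page size in bytes
LINE_SIZE = 8                        # assumed line size in bytes
NUM_LINES = PAGE_SIZE // LINE_SIZE   # cache size == page size


class SnoopedCache:
    """Toy model: data/validity RAMs indexed by virtual address,
    tag RAM indexed by physical address (hypothetical names)."""

    def __init__(self):
        self.data = [None] * NUM_LINES    # virtually addressed data RAM
        self.valid = [False] * NUM_LINES  # virtually addressed validity RAM
        self.ptag = [None] * NUM_LINES    # physically addressed tag RAM

    @staticmethod
    def index(addr):
        # Only page-offset bits are used, so a virtual address and its
        # physical translation select the same line.
        return (addr % PAGE_SIZE) // LINE_SIZE

    def fill(self, vaddr, paddr, value):
        """Read data returning to the cache: the only time the tag RAM is written."""
        assert vaddr % PAGE_SIZE == paddr % PAGE_SIZE
        i = self.index(vaddr)
        self.data[i] = value
        self.valid[i] = True
        self.ptag[i] = paddr // PAGE_SIZE  # physical page number as the tag

    def snoop_remote_write(self, paddr):
        """A remote processor wrote paddr: read the tag RAM, clear validity on a hit."""
        i = self.index(paddr)
        if self.ptag[i] == paddr // PAGE_SIZE:
            self.valid[i] = False
            return True
        return False
```

In this model one tag compare per remote write is enough, because a
physical address can only live in one line.  The processor-side hit
check is left out since it is not described above.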

All these RAMs certainly take up a lot of space (although we manage
to pull a lot inside gate arrays).  We also concluded that we could
not afford the real estate for multiple cache sets, nor justify the
added complexity and cost for so little payback.

Prior responses have discussed the increasing complexity of the tag RAM
as your data cache gets deeper - you have to search more than just
one tag as your cache size increases beyond the page size.  We have found
it quite effective in a _vector_processing_ machine to have a fairly
small cache equal to page size for scalar operands only.  We bypass vector 
operands around the cache and invalidate entries if a vector load/store
encounters one.  There is NO performance improvement in running
vector data through a cache if you have enough basic bandwidth (which
not all parallel-vector machines have).   Large caches best serve plain
old scalar machines that have to stuff entire data arrays into cache
in order to achieve performance.
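
The vector bypass described above can be sketched roughly as follows.
Again this is a toy Python model under stated assumptions, not the
real design: the write-through policy, the word-addressed dictionary
standing in for the cache, and all names are hypothetical.

```python
class ScalarCacheWithVectorBypass:
    """Toy model: scalar operands are cached, vector operands go
    straight to memory and invalidate any cached word they touch."""

    def __init__(self, memory):
        self.memory = memory   # word address -> value (backing store)
        self.cache = {}        # word address -> value (scalar operands only)

    def scalar_load(self, addr):
        if addr not in self.cache:           # miss: fill from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def scalar_store(self, addr, value):
        self.memory[addr] = value            # write-through (assumption)
        self.cache[addr] = value

    def vector_load(self, base, length):
        addrs = range(base, base + length)
        for a in addrs:
            self.cache.pop(a, None)          # invalidate entries encountered
        return [self.memory[a] for a in addrs]

    def vector_store(self, base, values):
        for offset, v in enumerate(values):
            self.memory[base + offset] = v   # bypass the cache entirely
            self.cache.pop(base + offset, None)


mem = {a: 0 for a in range(16)}
m = ScalarCacheWithVectorBypass(mem)
m.scalar_load(3)                  # word 3 is now cached
m.vector_store(0, [1, 2, 3, 4])   # bypasses the cache, invalidates word 3
```

After the vector store, the next scalar load of word 3 misses and picks
up the new value from memory, so the scalar cache never serves stale
vector data.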

Jeff Gruger