Path: utzoo!attcan!uunet!seismo!sundc!pitstop!texsun!texsun.central.sun.com!convex!authorplaceholder
From: gruger@convex.UUCP
Newsgroups: comp.arch
Subject: Re: SPARC and multiprocessing (larg
Message-ID: <63900015@convex>
Date: 11 May 88 19:04:00 GMT
References: <2710@geac.UUCP>
Lines: 44
Nf-ID: #R:geac.UUCP:-271000:convex:63900015:000:2116
Nf-From: convex.UUCP!gruger May 11 14:04:00 1988

>/* Written 9:32 pm May 10, 1988 by bartlett@encore.Sun.COM in convex:comp.arch */
>
>This sounds easy, but every time I have analyzed this I have come to the
>conclusion that one of those tag stores has to be fully associative, to ensure
>that the two tag stores will always have the same addresses allocated. Am I
>missing something here?
>
>In our systems, we can't afford the real estate for a fully associative tag store
>for each processor cache.
>

Maybe there is some semantics problem here in the communication...
Which "two tag stores" are you referring to? By "fully associative"
I take it you mean a true content-addressable memory for the entire tag
memory? I believe you only need high associativity when searching
through multiple sets of your data cache.

Our cache structures consist of:

    a) a virtually addressed data RAM
    b) a virtually addressed validity RAM
    c) a physically addressed tag RAM

The tag RAM is written only as read data returns to the cache. The tag
RAM is read only as remote processor writes occur, and if there is a hit,
the validity bits are cleared.

All these RAMs certainly take up a lot of space (although we manage to
pull a lot inside gate arrays). We also reached the conclusion that we
could not afford the real estate of multiple cache sets, nor the added
complexity and cost for a low payback.

Prior responses have discussed the increasing complexity of the tag RAM
as your data cache gets deeper: you have to search more than just one
tag once your cache size grows beyond the page size.

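As a rough illustration of the three-RAM arrangement and the snoop path
described above, here is a minimal software sketch. It is not the actual
hardware design: the class name ScalarCache, the 4096-byte page, the
16-byte line, and the direct-mapped layout are all assumptions made for
the example. It relies on one property of a cache no larger than a page:
the index bits fall inside the page offset, so a virtual address and its
physical translation select the same entry, and a remote write has only
one tag to check.

```python
PAGE_SIZE = 4096                      # assumed page size in bytes
LINE_SIZE = 16                        # assumed cache line size in bytes
NUM_LINES = PAGE_SIZE // LINE_SIZE    # cache size equals page size


def line_index(addr):
    """Index bits lie inside the page offset, so this gives the same
    result for a virtual address and the physical address it maps to."""
    return (addr % PAGE_SIZE) // LINE_SIZE


class ScalarCache:
    """Hypothetical direct-mapped model of the three RAMs."""

    def __init__(self):
        self.data = [None] * NUM_LINES     # (a) data RAM, virtually addressed
        self.valid = [False] * NUM_LINES   # (b) validity RAM, virtually addressed
        self.tag = [None] * NUM_LINES      # (c) tag RAM, physical page numbers

    def fill(self, vaddr, paddr, line_data):
        """Read data returning to the cache: the only time the tag RAM is written."""
        # Translation never changes the page offset.
        assert vaddr % PAGE_SIZE == paddr % PAGE_SIZE
        i = line_index(vaddr)
        self.data[i] = line_data
        self.valid[i] = True
        self.tag[i] = paddr // PAGE_SIZE

    def snoop_remote_write(self, paddr):
        """A remote processor wrote paddr: read the tag RAM and clear the
        validity bit on a hit. Returns True if an entry was invalidated."""
        i = line_index(paddr)
        hit = self.valid[i] and self.tag[i] == paddr // PAGE_SIZE
        if hit:
            self.valid[i] = False
        return hit


cache = ScalarCache()
cache.fill(vaddr=0x70000040, paddr=0x00012040, line_data=b"x" * LINE_SIZE)
print(cache.snoop_remote_write(0x00099040))  # same index, other page: False
print(cache.snoop_remote_write(0x00012044))  # same line: True, entry invalidated
```

With a cache larger than a page, some index bits would come from the
virtual page number, so a physical address seen on the bus could match
several candidate entries; that is the extra tag searching the prior
responses referred to.
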
We have found it quite effective in a _vector_processing_ machine to have
a fairly small cache, equal to the page size, for scalar operands only.
We bypass vector operands around the cache and invalidate entries if a
vector load/store encounters one. There is NO performance improvement in
running vector data through a cache if you have enough basic bandwidth
(which not all parallel-vector machines have). Large caches best serve
plain old scalar machines that have to stuff entire data arrays into cache
in order to achieve performance.

Jeff Gruger