Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mailrus!usenet.ins.cwru.edu!mephisto!bbn!bbn.com!lkaplan
From: lkaplan@bbn.com (Larry Kaplan)
Newsgroups: comp.arch
Subject: Re: Coherent cache for Killer Micros
Message-ID: <49446@bbn.COM>
Date: 9 Dec 89 21:19:53 GMT
References: <13910002@hpisod2.HP.COM> <40110@lll-winken.LLNL.GOV> <4095@amelia.nas.nasa.gov>
Sender: news@bbn.COM
Reply-To: lkaplan@BBN.COM (Larry Kaplan)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 47

In article <4095@amelia.nas.nasa.gov> eugene@orville.nas.nasa.gov (Eugene Miya) writes:
>>In article <13910002@hpisod2.HP.COM> dhepner@hpisod2.HP.COM (Dan Hepner) writes:
>>>Is a coherent cache system essential or just nice?  And why?
>
>Depends if you want "consistent" results.
>
>If you can rely on heuristics like asynchronous chaotic relaxation
>(do you really want to call this an algorithm? [Baudet 1980] then NO.
>Otherwise, I think most people will answer YES (for consistent results).
>
>--eugene miya

Let's take a slightly more serious approach to this question.  Coherent
caches are certainly necessary for the majority of parallel programs to
produce correct results, if caching of shared data is desired.

The real question is whether the coherent cache support
MUST be in hardware or whether a software solution is acceptable.  In the
context of KILLER MICROS, bus-based systems, and therefore the related
bus-based hardware consistency schemes, seem to run out of bandwidth after
about 30 processors or so.  Looking at a more scalable architecture,
such as a shuffle-exchange network, broadcast invalidates tend to flood
the interconnection network.  Directory-based schemes, which try to keep
track of who has what in their cache and thereby allow selective invalidates,
are the best I've heard of so far, but they are still very expensive to
implement.
Lots of other research is going on to see how to do this job cheaply enough
to make it commercially viable.  I know of no currently released hardware
cache-consistent machines with more than 30 or so processors.
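
As a rough illustration of why selective invalidates help, here is a toy
Python sketch comparing the number of invalidate messages a directory would
send against a plain broadcast.  It is not any real machine's protocol; the
Directory class, its methods, and the 64-processor figure are all hypothetical
and chosen only for the example.

```python
class Directory:
    """Toy directory: for each block, the set of processors caching it."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.sharers = {}  # block address -> set of processor ids

    def read(self, proc, block):
        # A read miss registers the reader as a sharer of the block.
        self.sharers.setdefault(block, set()).add(proc)

    def write(self, proc, block):
        # Invalidate only the other processors known to hold the block.
        targets = self.sharers.get(block, set()) - {proc}
        self.sharers[block] = {proc}  # the writer is now the sole holder
        return sorted(targets)


def broadcast_invalidates(num_procs):
    # Without a directory, every other processor must see the invalidate.
    return num_procs - 1


d = Directory(num_procs=64)
d.read(3, 0x100)
d.read(17, 0x100)
selective = d.write(3, 0x100)
print(len(selective), "selective vs", broadcast_invalidates(64), "broadcast")
```

The saving in network traffic is paid for by the sharers table itself, which
is where the implementation expense comes from.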

Software schemes can be constructed based on invalidating and locking shared
data structures before use, and then unlocking and flushing them afterward.
These schemes are good because they cause no more memory traffic (or
interconnection network traffic) than is necessary.  Making the programmer
use these schemes causes some extra coding effort, though I could imagine
adding compiler support for a new storage class that would automatically
handle the protocol.  No additional hardware is needed
other than cheap synchronizing locks and the ability to explicitly invalidate
and flush the cache.
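
A minimal sketch of such a software protocol, written as a Python simulation
rather than real cache-control code.  Everything here is an assumption made
for illustration: the SharedMemory and Processor classes, the per-key cache,
and the particular ordering (lock, invalidate, work, flush, unlock), which
writes the data back before the lock is released.  It does not describe the
TC2000 or any actual machine.

```python
import threading


class SharedMemory:
    """Main memory plus one cheap synchronizing lock."""

    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()
        self.traffic = 0  # number of transfers to or from memory


class Processor:
    """A processor with a private cache that is NOT kept coherent."""

    def __init__(self, mem):
        self.mem = mem
        self.cache = {}
        self.dirty = set()

    def invalidate(self, key):
        # Drop any cached copy (toy model: unflushed changes are discarded).
        self.cache.pop(key, None)
        self.dirty.discard(key)

    def flush(self, key):
        # Write a modified copy back to memory.
        if key in self.dirty:
            self.mem.data[key] = self.cache[key]
            self.mem.traffic += 1
            self.dirty.discard(key)

    def load(self, key):
        if key not in self.cache:  # miss: fetch from memory
            self.cache[key] = self.mem.data[key]
            self.mem.traffic += 1
        return self.cache[key]

    def store(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def update(self, key, fn):
        # The software protocol around one access to shared data.
        with self.mem.lock:
            self.invalidate(key)
            self.store(key, fn(self.load(key)))
            self.flush(key)


mem = SharedMemory()
mem.data["counter"] = 0
p0, p1 = Processor(mem), Processor(mem)
for _ in range(3):
    p0.update("counter", lambda v: v + 1)
    p1.update("counter", lambda v: v + 1)
print(mem.data["counter"], mem.traffic)
```

Each update costs one fetch and one write-back and nothing else, which is the
"no more traffic than necessary" property.  Reading the counter from p0
afterward without invalidating first would return p0's own stale copy, which
is why the invalidate step cannot be skipped.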

BBN is currently recommending a software-based scheme for users who want
to cache shared data on the TC2000.  The kernel's default caching policy is
not to cache shared data, though the user may override this.

_______________________________________________________________________________
				 ____ \ / ____
Laurence S. Kaplan		|    \ 0 /    |		BBN Advanced Computers
lkaplan@bbn.com			 \____|||____/		10 Fawcett St.
(617) 873-2431			  /__/ | \__\		Cambridge, MA  02238