Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mailrus!usenet.ins.cwru.edu!mephisto!bbn!bbn.com!lkaplan
From: lkaplan@bbn.com (Larry Kaplan)
Newsgroups: comp.arch
Subject: Re: Coherent cache for Killer Micros
Message-ID: <49446@bbn.COM>
Date: 9 Dec 89 21:19:53 GMT
References: <13910002@hpisod2.HP.COM> <40110@lll-winken.LLNL.GOV> <4095@amelia.nas.nasa.gov>
Sender: news@bbn.COM
Reply-To: lkaplan@BBN.COM (Larry Kaplan)
Organization: Bolt Beranek and Newman Inc., Cambridge MA
Lines: 47

In article <4095@amelia.nas.nasa.gov> eugene@orville.nas.nasa.gov (Eugene Miya) writes:
>>In article <13910002@hpisod2.HP.COM> dhepner@hpisod2.HP.COM (Dan Hepner) writes:
>>>Is a coherent cache system essential or just nice? And why?
>
>Depends if you want "consistent" results.
>
>If you can rely on heuristics like asynchronous chaotic relaxation
>(do you really want to call this an algorithm? [Baudet 1980] then NO.
>Otherwise, I think most people will answer YES (for consistent results).
>
>--eugene miya

Let's take a somewhat more serious attitude toward this question.
Coherent caches are certainly necessary if one wants correct results from
the majority of parallel programs and wants to cache shared data. The real
question is whether the coherent cache support MUST be in hardware or
whether a software solution is acceptable.

In the context of KILLER MICROS, bus-based systems, and therefore the
related bus-based hardware consistency schemes, seem to run out of
bandwidth at about 30 processors. On a more scalable architecture, such as
a shuffle-exchange network, broadcast invalidates tend to flood the
interconnection network. Directory-based schemes, which try to keep track
of who has what in their cache and thereby allow selective invalidates,
are the best I've heard of so far, but they are still very expensive to
implement. Lots of other research is going on to see how to do this job
cheaply enough to make it commercially viable.
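
To make the difference between broadcast and selective invalidates
concrete, here is a minimal sketch in C of a directory entry kept as a
presence bit vector. Everything in it is an assumption made for
illustration: the 64-processor machine size, the dir_entry layout, and
the function names are hypothetical and do not describe any particular
machine. It only counts the invalidate messages each scheme would send.

```c
#include <assert.h>
#include <stdint.h>

#define NPROC 64  /* hypothetical machine size; fits one 64-bit vector */

/* Hypothetical directory entry for one memory block: bit p is set when
 * processor p may hold a cached copy of the block. */
struct dir_entry {
    uint64_t sharers;
};

/* Record that processor p has loaded the block into its cache. */
void dir_add_sharer(struct dir_entry *e, int p)
{
    assert(p >= 0 && p < NPROC);
    e->sharers |= (uint64_t)1 << p;
}

/* Processor `writer` wants to modify the block. Invalidate only the
 * other recorded sharers and return how many messages that takes. */
int dir_selective_invalidate(struct dir_entry *e, int writer)
{
    int messages = 0;
    for (int p = 0; p < NPROC; p++) {
        if (p != writer && ((e->sharers >> p) & 1)) {
            messages++;  /* a real system would send an invalidate to p */
        }
    }
    e->sharers = (uint64_t)1 << writer;  /* writer is now the only holder */
    return messages;
}

/* Broadcast scheme for comparison: every other processor gets a
 * message whether or not it holds the block. */
int broadcast_invalidate(void)
{
    return NPROC - 1;
}
```

With three sharers recorded, a write by one of them costs two selective
invalidates, against 63 messages for a broadcast on the same
hypothetical machine. The price is the directory storage itself, one
presence vector per memory block in this sketch, which is where the
expense comes from.
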
I know of no currently released hardware cache-consistent machines with
more than about 30 processors.

Software schemes can be constructed based on locking and invalidating a
shared data structure before using it, and then flushing and unlocking it
afterward. These schemes are good because they cause no more memory
traffic (or interconnection network traffic) than is necessary. Making the
programmer use these schemes causes some extra coding effort, though I
could imagine adding compiler support for a new storage class that would
automatically handle the protocol. No additional hardware is needed other
than cheap synchronizing locks and the ability to explicitly invalidate
and flush the cache.

BBN is currently recommending a software-based scheme for users who want
to cache shared data on the TC2000. The kernel's default caching policy is
to not cache shared data, though the user may override this.

_______________________________________________________________________________
                         ____ \ / ____
Laurence S. Kaplan      |    \ 0 /    |       BBN Advanced Computers
lkaplan@bbn.com          \____|||____/        10 Fawcett St.
(617) 873-2431            /__/ | \__\         Cambridge, MA 02238
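
As a rough illustration of the software scheme described above, here is
a small single-threaded simulation in C. It is only a sketch under
stated assumptions: the two-processor model, the one-line write-back
caches, the simulated lock, and every function name are hypothetical
and are not the TC2000 interface. Each simulated processor increments a
shared word, either following the lock / invalidate / use / flush /
unlock sequence or ignoring it.

```c
#include <assert.h>

#define NCPU 2  /* hypothetical number of simulated processors */

/* The shared word as it exists in main memory. */
static int main_memory;

/* One write-back cache line per simulated processor (toy model). */
struct cache_line { int value; int valid; int dirty; };
static struct cache_line cache[NCPU];

/* Stand-in for a cheap synchronizing lock; the simulation is serial. */
static int lock_held;
static void lock_acquire(void) { assert(!lock_held); lock_held = 1; }
static void lock_release(void) { assert(lock_held); lock_held = 0; }

/* Discard the cached copy so the next read goes to main memory. The
 * protocol always flushes before unlocking, so the line is clean here. */
static void cache_invalidate(int cpu) { cache[cpu].valid = 0; cache[cpu].dirty = 0; }

/* Write a dirty line back to main memory. */
static void cache_flush(int cpu)
{
    if (cache[cpu].valid && cache[cpu].dirty) {
        main_memory = cache[cpu].value;
        cache[cpu].dirty = 0;
    }
}

/* Read through the cache, filling the line from memory on a miss. */
static int cached_read(int cpu)
{
    if (!cache[cpu].valid) {
        cache[cpu].value = main_memory;
        cache[cpu].valid = 1;
        cache[cpu].dirty = 0;
    }
    return cache[cpu].value;
}

/* Write into the cache only; memory is updated on a later flush. */
static void cached_write(int cpu, int v)
{
    cache[cpu].value = v;
    cache[cpu].valid = 1;
    cache[cpu].dirty = 1;
}

/* Increment following the software protocol. */
static void coherent_increment(int cpu)
{
    lock_acquire();
    cache_invalidate(cpu);                    /* drop any stale copy     */
    cached_write(cpu, cached_read(cpu) + 1);  /* use the shared data     */
    cache_flush(cpu);                         /* publish before unlocking */
    lock_release();
}

/* Increment with no protocol: works on whatever the cache holds. */
static void naive_increment(int cpu)
{
    cached_write(cpu, cached_read(cpu) + 1);
}

/* Each processor increments once per round; returns the final value
 * in main memory after all caches have been written back. */
int run_increments(int rounds, int use_protocol)
{
    main_memory = 0;
    lock_held = 0;
    for (int cpu = 0; cpu < NCPU; cpu++) cache_invalidate(cpu);

    for (int r = 0; r < rounds; r++) {
        for (int cpu = 0; cpu < NCPU; cpu++) {
            if (use_protocol) coherent_increment(cpu);
            else naive_increment(cpu);
        }
    }
    for (int cpu = 0; cpu < NCPU; cpu++) cache_flush(cpu);
    return main_memory;
}
```

In this model run_increments(5, 1) returns 10, while run_increments(5, 0)
returns 5, because each simulated processor keeps incrementing its own
stale copy and the write-backs overwrite one another. Note that the
only memory traffic in the protocol case is one fill and one write-back
per critical section, which is the point made above about these schemes
causing no more traffic than necessary.
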