Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!lll-winken!vette!brooks
From: brooks@vette.llnl.gov (Eugene Brooks)
Newsgroups: comp.arch
Subject: Re: i860 CPU information
Message-ID: <21984@lll-winken.LLNL.GOV>
Date: 16 Mar 89 00:27:38 GMT
References: <1895@oakhill.UUCP> <21570@shemp.CS.UCLA.EDU> <3024@alliant.Alliant.COM> <222@ross.UUCP>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov (Eugene Brooks)
Organization: Lawrence Livermore National Laboratory
Lines: 49

In article <222@ross.UUCP> doug@ross.UUCP (doug carmean) writes:

With regard to a write-through, explicitly managed cache system:
>It seems to me that you are proposing a multiprocessing system that
>implements a cache but never actually uses the cache. Your cache
>scheme uses write through and then forces misses on the reads. Why
>even bother implementing a D-cache?

You don't force misses on all the reads. You force misses only on
those reads in your shared memory parallel program which you know
fetch data communicated from another processor. As an example,
consider a parallel linear system solver using Gaussian elimination.
It is quite easy to write a parallel algorithm which explicitly
manages the cache. The last row of the matrix is reused N times,
where N is the matrix dimension, before it is communicated to the
rest of the processors.

Even explicit cache flushing for the communication could be used, but
the cost of doing this is horrible context switching overhead for
cache sizes large enough to be useful. Your notion of including a
context descriptor in the cache line is useful for this, but one will
still pay a cost when the cache must be cleared of a specific
descriptor upon process death. At least it is a better situation than
having to clear the cache on every context switch.

>I think what you really mean here is that using such a cache system
>in an application with heavy context switching is not really much
>of an issue - you would never want to do it! From what I understand
>of the i860, you must flush the I-cache, the D-cache and the TLB on
>a context switch. This seems like a very big penalty to pay every
>single time you want to switch contexts.

Agreed, or as in the i860, the size of the cache you are willing to
treat in this manner would be quite small.

>other than the solution you have presented here. A virtual cache
>that implements a copyback scheme with bus snooping is very
>feasible in a multiprocessing environment.

No doubt there are several ways to do this. I made no claim that you
couldn't. I only indicated what one might want to do for a
multiprocessor using a multistage interconnection network to the
memory modules, where snooping might not be that practical an option.
Microprocessor speeds are cranking up to the point that the number
you could hang on a bus will be very limited, even with the very best
of write back coherent cache protocols. The fact that the whole
processor, memory management, cache, etc., is now appearing on one
chip is making systems with large numbers of processors VERY
feasible. The processors are free and it's the memory subsystem which
will cost the bucks. This will likely drive the commercial
development of scalable shared memory systems soon. Scalable message
passing systems, of course, are already here.

Is the news software incompatible with your mailer too?
brooks@maddog.llnl.gov, brooks@maddog.uucp, uunet!maddog.llnl.gov!brooks
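
The write-through scheme with selectively forced misses described
above can be illustrated with a toy software model. This is only a
sketch under stated assumptions, not the i860's mechanism or any real
machine's: the class WriteThroughCache, its methods, and the demo()
function are hypothetical names invented for illustration, and shared
memory is modeled as a plain dictionary. It shows that ordinary reads
still hit in the cache, while only reads of data known to come from
another processor are made to miss.

```python
class WriteThroughCache:
    """Toy per-processor cache (hypothetical model, not real hardware)."""

    def __init__(self, memory):
        self.memory = memory   # shared backing store, modeled as a dict
        self.lines = {}        # address -> locally cached value
        self.misses = 0        # number of reads that went to memory

    def write(self, addr, value):
        # Write through: update the local copy and shared memory together.
        self.lines[addr] = value
        self.memory[addr] = value

    def read(self, addr):
        # Ordinary read: served from the cache when the line is present.
        if addr not in self.lines:
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def read_fresh(self, addr):
        # Forced miss: drop any local copy, then refetch from memory.
        # Used only for data known to be written by another processor.
        self.lines.pop(addr, None)
        return self.read(addr)


def demo():
    memory = {}
    p0 = WriteThroughCache(memory)
    p1 = WriteThroughCache(memory)

    p0.write("row", 1)
    first = p1.read("row")        # miss: fetched from shared memory
    p0.write("row", 2)            # p1's cached copy is now stale
    stale = p1.read("row")        # hit: returns the stale value
    fresh = p1.read_fresh("row")  # forced miss: sees p0's update

    p1.write("local", 7)
    for _ in range(100):
        p1.read("local")          # private data: every read hits

    return first, stale, fresh, p1.misses


if __name__ == "__main__":
    print(demo())
```

In this model the hundred reads of private data cost no misses at
all, which is the point of keeping the D-cache: only the two reads of
communicated data go to memory.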