Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!lll-winken!vette!brooks
From: brooks@vette.llnl.gov (Eugene Brooks)
Newsgroups: comp.arch
Subject: Re: i860 CPU information
Message-ID: <21984@lll-winken.LLNL.GOV>
Date: 16 Mar 89 00:27:38 GMT
References: <1895@oakhill.UUCP> <21570@shemp.CS.UCLA.EDU> <3024@alliant.Alliant.COM> <222@ross.UUCP>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: brooks@maddog.llnl.gov (Eugene Brooks)
Organization: Lawrence Livermore National Laboratory
Lines: 49

In article <222@ross.UUCP> doug@ross.UUCP (doug carmean) writes:
With regard to a write-through, explicitly managed cache system:
>It seems to me that you are proposing a multiprocessing system that 
>implements a cache but never actually uses the cache.  Your cache 
>scheme uses write through and then forces misses on the reads.  Why
>even bother implementing a D-cache?
You don't force misses on all the reads.  You force misses only on
those reads in your shared memory parallel program that you know fetch
data communicated from another processor.  As an example, you can consider
a parallel linear system solver using Gaussian elimination.  It is quite
easy to write a parallel algorithm which explicitly manages the cache.
The last row of the matrix is reused N times, where N is the matrix
dimension, before it is communicated to the rest of the processors.
Explicit cache flushing could even be used for the communication, but
the cost of doing this is horrible context switching overhead for cache
sizes large enough to be useful.  Your notion of including a context
descriptor in the cache line is useful for this, but one will still pay
a cost when the cache must be cleared of a specific descriptor upon
process death.  At least that is a better situation than having to clear
the cache on every context switch.
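
A minimal sketch of the write-through, forced-miss scheme described above,
as a toy Python model rather than real hardware.  Everything named here
(Cache, load_forced_miss, eliminate, cyclic row ownership, an unbounded
cache, no pivoting, an implied barrier between elimination steps) is an
assumption made for illustration, not something the i860 or any particular
machine provides.

```python
class Cache:
    """One processor's private, write-through, software-managed cache (toy model)."""

    def __init__(self, memory):
        self.memory = memory      # shared memory: dict address -> value
        self.lines = {}           # this processor's private cached copies
        self.hits = self.misses = self.forced = 0

    def load(self, addr):
        """Ordinary cached read: safe only for data no other processor writes."""
        if addr in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def load_forced_miss(self, addr):
        """Ignore any cached copy: for data communicated from another processor."""
        self.forced += 1
        self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        """Write-through: update the private copy and shared memory together."""
        self.lines[addr] = value
        self.memory[addr] = value


def eliminate(a, nproc):
    """Forward elimination without pivoting; row i is owned by processor i % nproc."""
    n = len(a)
    memory = {(i, j): a[i][j] for i in range(n) for j in range(n)}
    caches = [Cache(memory) for _ in range(nproc)]
    for k in range(n - 1):
        # Processors are simulated one after another; a real program would
        # need a barrier here so that pivot row k is final before it is read.
        for p, c in enumerate(caches):
            owner = (k % nproc == p)
            read_pivot = c.load if owner else c.load_forced_miss
            pivot = [read_pivot((k, j)) for j in range(k, n)]
            for i in range(k + 1, n):
                if i % nproc != p:
                    continue          # not this processor's row
                factor = c.load((i, k)) / pivot[0]
                for j in range(k, n):
                    c.store((i, j), c.load((i, j)) - factor * pivot[j - k])
    result = [[memory[(i, j)] for j in range(n)] for i in range(n)]
    return result, caches
```

Only the pivot-row reads by non-owners go through load_forced_miss; every
update to a processor's own rows uses the ordinary cached path.  In this toy
run a non-owner has never cached the pivot row beforehand, so the forced
miss only becomes essential once the same addresses are reused, at which
point an ordinary load would return a stale private copy.
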
>I think what you really mean here is that using such a cache system
>in an application with heavy context switching is not really much
>of an issue - you would never want to do it!  From what I understand
>of the i860, you must flush the I-cache, the D-cache and the TLB on
>a context switch.  This seems like a very big penalty to pay every
>single time you want to switch contexts.
Agreed.  Otherwise, as in the i860, the cache you are willing to
treat in this manner would have to be quite small.
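
The context-descriptor idea discussed earlier in this reply can be sketched
roughly as follows.  This is a hypothetical Python model, not the i860 or
any real cache: the TaggedCache class, its method names, and the
lines_scanned counter are all made up for illustration.  It assumes a
write-through cache, so dropping a line never loses data.

```python
class TaggedCache:
    """Virtual cache whose lines carry a context descriptor (hypothetical model)."""

    def __init__(self):
        self.lines = {}         # (context, virtual address) -> value
        self.current = None     # descriptor of the running context
        self.lines_scanned = 0  # work done by invalidation sweeps

    def switch_to(self, context):
        """Context switch: nothing is flushed, the tag keeps contexts apart."""
        self.current = context

    def fill(self, vaddr, value):
        """Install a line for the running context."""
        self.lines[(self.current, vaddr)] = value

    def lookup(self, vaddr):
        """Return the cached value for the running context, or None on a miss."""
        return self.lines.get((self.current, vaddr))

    def retire(self, context):
        """Process death: sweep the whole cache and drop that context's lines,
        so the descriptor can be reused without exposing old data."""
        self.lines_scanned += len(self.lines)
        self.lines = {k: v for k, v in self.lines.items() if k[0] != context}
```

A context switch is a single assignment here, while retire has to look at
every line in the cache.  That sweep is the cost still paid on process
death, and it grows with the cache size.
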

>other than the solution you have presented here.  A virtual cache 
>that implements a copyback scheme with bus snooping is very
>feasible in a multiprocessing environment.
No doubt there are several ways to do this.  I made no claim that you
couldn't. I only indicated what one might want to do for a multiprocessor
using a multistage interconnection network to the memory modules where
snooping might not be that practical an option.  Microprocessor speeds
are cranking up to the point that the number of processors you could hang
on a bus will be very limited, even with the very best write-back coherent
cache protocols.  The fact that the whole processor, memory management, cache,
etc., is now appearing on one chip is making systems with large numbers of
processors VERY feasible.  The processors are free and it's the memory subsystem
which will cost the bucks.  This will likely drive the commercial development
of scalable shared memory systems soon.  Scalable message passing systems,
of course, are already here.
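
The bus limit mentioned above can be put as a crude back-of-envelope
calculation.  The function name and its parameters are hypothetical, the
model counts only cache-miss fill traffic, and any numbers fed to it are
illustrative inputs rather than measurements of a real machine.

```python
def max_processors_on_bus(bus_bytes_per_sec, refs_per_sec, miss_rate, line_bytes):
    """Processors a shared bus can feed before miss traffic alone saturates it.

    Crude model: each processor issues refs_per_sec memory references, a
    fraction miss_rate of them miss, and each miss moves one cache line.
    Arbitration overhead and coherence traffic are ignored.
    """
    traffic_per_cpu = refs_per_sec * miss_rate * line_bytes
    return int(bus_bytes_per_sec // traffic_per_cpu)
```

Under this model, doubling the processor reference rate while the bus
stays the same roughly halves the number of processors the bus can carry,
which is the trend the paragraph above points at.
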


Is the news software incompatible with your mailer too?
brooks@maddog.llnl.gov, brooks@maddog.uucp, uunet!maddog.llnl.gov!brooks