Path: utzoo!mnetor!uunet!lll-winken!lll-crg.llnl.gov!brooks
From: brooks@lll-crg.llnl.gov (Eugene D. Brooks III)
Newsgroups: comp.arch
Subject: Re: SPARC and multiprocessing
Message-ID: <6618@lll-winken.llnl.gov>
Date: 30 Apr 88 22:12:09 GMT
References: <1521@pt.cs.cmu.edu> <28200135@urbsdc> <4921@bloom-beacon.MIT.EDU> <1671@alliant.Alliant.COM> <51321@sun.uucp> <1680@alliant.Alliant.COM>
Sender: usenet@lll-winken.llnl.gov
Reply-To: brooks@lll-crg.llnl.gov.UUCP (Eugene D. Brooks III)
Organization: Lawrence Livermore National Laboratory
Lines: 33

In article <1680@alliant.Alliant.COM> jeff@alliant.UUCP (Jeff Collins) writes:
>	    - Is there an announced version of the SPARC that allows time
>	      between address-ready and data-ready to have an MMU before the
>	      cache?
>
This is the key to bringing these RISC chips into the realm of real
supercomputing and will of course happen as it will be driven by market
pressures.  You need to allow an "arbitrary" time between address-ready
and data-ready to allow successful use in a multiprocessor environment
where the latency of memory is more or less undetermined due to conflicts
in the shared memory subsystem.  Basically, the address-ready lines include
a tag which identifies the request, and when the response to the request
returns, the copy of the tag that arrives with it allows the cpu to figure
out what to do with the data.  The number of tag bits limits the number of
outstanding requests for the cpu (n tag bits allow at most 2^n).  One is
likely to issue the tags in sequence for speed, so the number of outstanding
requests will be further limited by fluctuations in the arrival order of the
responses: the next tag in sequence may still be in flight.  If the
request gets satisfied by the cache it comes back with a low latency, but if
it goes to main memory (shared memory in a multiprocessor) it might have a
substantial latency.  By allowing many requests to be pending at once, one
can get "no wait state" performance in the same sense that internal pipelining
delivers "no wait state" performance for the internal cpu functions.  The cpu
must be able to efficiently handle the fact that responses come back out of
order, which means that simple FIFOs a la the WM machine won't do.
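
To make the tag scheme concrete, here is a minimal software sketch of it in
Python.  It is an illustration only, not a description of any real chip: the
names TagTable, issue, complete and regs are made up for this example, and I
am assuming the tags are handed out by a simple wrapping counter.

```python
class TagTable:
    """Outstanding memory requests, keyed by a small hardware-style tag.

    Hypothetical model: names and structure are assumptions for illustration.
    """

    def __init__(self, tag_bits):
        self.size = 1 << tag_bits          # tag_bits limits outstanding requests
        self.pending = [None] * self.size  # tag -> destination register, None if free
        self.next_tag = 0                  # assumption: tags issued in strict sequence

    def issue(self, dest_reg):
        """Return a tag for a new request, or None if the cpu must stall."""
        tag = self.next_tag
        if self.pending[tag] is not None:
            return None                    # next tag in sequence is still in flight
        self.pending[tag] = dest_reg
        self.next_tag = (tag + 1) % self.size
        return tag

    def complete(self, tag, data, regs):
        """A response arrives carrying its tag; route the data to its register."""
        dest_reg = self.pending[tag]
        if dest_reg is None:
            raise ValueError(f"response with unknown tag {tag}")
        regs[dest_reg] = data
        self.pending[tag] = None           # tag is free for reuse


regs = {}
table = TagTable(tag_bits=2)                      # at most 4 requests in flight
tags = [table.issue(f"r{i}") for i in range(4)]   # uses all four tags
stalled = table.issue("r4")                       # no tag available

# Responses return out of order: a cache hit (tag 2) beats a slow
# shared-memory access (tag 0).
table.complete(tags[2], 0xBEEF, regs)
still_stalled = table.issue("r4")   # tag 2 is free, but tag 0 is next in sequence
table.complete(tags[0], 0xCAFE, regs)
resumed = table.issue("r4")         # tag 0 can now be reused
```

Note that the second attempt to issue still stalls even though a tag has been
freed.  That is the extra limit from sequencing the tags in order: a free-list
of tags would avoid it at the cost of a slower lookup.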


Rest assured that some future SPARC, MOT88000, Clipper, ..., implementation
will provide this capability by some means as it will be the only way to
further increase performance in the face of the memory latency of a shared
memory multiprocessor.  Whether MIPS will pull this off is not clear.  Their
basic design principle (religion) of not having hardware interlocks would seem
to be at odds with doing it.  In short, they will find that their basic design
principle was wrong when they try to pipeline cache misses in a shared memory
environment.