Path: utzoo!mnetor!uunet!lll-winken!lll-crg.llnl.gov!brooks
From: brooks@lll-crg.llnl.gov (Eugene D. Brooks III)
Newsgroups: comp.arch
Subject: Re: SPARC and multiprocessing
Message-ID: <6618@lll-winken.llnl.gov>
Date: 30 Apr 88 22:12:09 GMT
References: <1521@pt.cs.cmu.edu> <28200135@urbsdc> <4921@bloom-beacon.MIT.EDU> <1671@alliant.Alliant.COM> <51321@sun.uucp> <1680@alliant.Alliant.COM>
Sender: usenet@lll-winken.llnl.gov
Reply-To: brooks@lll-crg.llnl.gov.UUCP (Eugene D. Brooks III)
Organization: Lawrence Livermore National Laboratory
Lines: 33

In article <1680@alliant.Alliant.COM> jeff@alliant.UUCP (Jeff Collins) writes:
> - Is there an announced version of the SPARC that allows time
>   between address-ready and data-ready to have an MMU before the
>   cache?
>

This is the key to bringing these RISC chips into the realm of real supercomputing, and it will of course happen, as it will be driven by market pressures. You need to allow an "arbitrary" time between address-ready and data-ready for successful use in a multiprocessor environment, where the latency of memory is more or less undetermined due to conflicts in the shared memory subsystem.

Basically, the address-ready lines include a tag which identifies the request, and when the response to the request returns, the copy of the tag that arrives with it allows the cpu to figure out what to do with the data. The number of tag bits limits the number of outstanding requests for the cpu. One is likely to sequence the tag bits in order for speed, so the number of outstanding requests will be further limited by fluctuations in the arrival order of the responses. If the request gets satisfied by the cache it comes back with a low latency, but if it goes to main memory (shared memory in a multiprocessor) it might have a substantial latency.
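A toy model may make the tag scheme concrete. The sketch below is only an illustration under stated assumptions, not a description of any real SPARC interface: the class name TagTable, its methods, and the register names are all hypothetical. It assumes tags are handed out strictly in sequence, so a slow response holding the next tag stalls further issue even when other tags have already been freed.

```python
class TagTable:
    """Toy model of tagged memory requests (all names are hypothetical)."""

    def __init__(self, tag_bits):
        self.num_tags = 1 << tag_bits   # tag bits bound the outstanding requests
        self.next_tag = 0               # assumption: tags are issued in sequence
        self.pending = {}               # tag -> destination register

    def issue(self, dest_reg):
        """Send a request; return its tag, or None if the cpu must stall."""
        tag = self.next_tag
        if tag in self.pending:         # next tag in sequence is still outstanding
            return None
        self.pending[tag] = dest_reg
        self.next_tag = (tag + 1) % self.num_tags
        return tag

    def complete(self, tag):
        """A response arrived carrying `tag`; return where its data goes."""
        return self.pending.pop(tag)


table = TagTable(tag_bits=2)                      # 4 tags
tags = [table.issue(f"r{i}") for i in range(4)]   # tags 0, 1, 2, 3
stalled = table.issue("r4")                       # None: every tag is in use
dest = table.complete(2)                          # a cache hit returns early
still_stalled = table.issue("r5")                 # None: tag 0 is next and still pending
first = table.complete(0)                         # the slow response finally arrives
resumed = table.issue("r5")                       # issue resumes, reusing tag 0
```

Here the early return of tag 2 frees a slot, but in-order tag sequencing cannot use it until tag 0 comes back, which is the extra limit from fluctuations in arrival order.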
By allowing many requests to be pending at once, one can get "no wait state" performance in the same sense that internal pipelining delivers "no wait state" performance for the internal cpu functions. The cpu must be able to efficiently handle the fact that requests come back out of order, which means that simple FIFOs a la the WM machine won't do. Rest assured that some future SPARC, MOT88000, Clipper, ..., implementation will provide this capability by some means, as it will be the only way to further increase performance in the face of the memory latency of a shared memory multiprocessor. Whether MIPS will pull this off is not clear; their basic design principle (religion) of not having hardware interlocks would seem to be at odds with doing it. In short, they will find that their basic design principle was wrong when they try to pipeline cache misses in a shared memory environment.
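The "no wait state" claim can be checked with simple arithmetic. The sketch below is a rough model under assumptions that are not in the post: the cpu issues at most one request per cycle, each request has a fixed latency, and a slot is reused as soon as any response returns (out-of-order completion). The function name total_cycles and the 20-cycle latency are made up for illustration.

```python
import heapq


def total_cycles(latencies, max_outstanding):
    """Cycles until every request has returned, issuing at most one per cycle."""
    in_flight = []   # min-heap of completion times, one entry per busy slot
    clock = 0
    finish = 0
    for latency in latencies:
        if len(in_flight) == max_outstanding:
            # All slots are busy: wait for whichever response returns first,
            # regardless of the order in which the requests were issued.
            clock = max(clock, heapq.heappop(in_flight))
        done = clock + latency
        heapq.heappush(in_flight, done)
        finish = max(finish, done)
        clock += 1   # the next request can issue one cycle later
    return finish


serial = total_cycles([20] * 8, max_outstanding=1)      # 160: each request waits out the full latency
pipelined = total_cycles([20] * 8, max_outstanding=8)   # 27: latency is paid once, then one result per cycle
```

With enough outstanding requests the total approaches the request count plus one latency, which is the same effect pipelining gives the internal cpu functions.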