Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!asuvax!ncar!mephisto!udel!udccvax1!mccalpin
From: mccalpin@vax1.acs.udel.EDU (John D Mccalpin)
Newsgroups: comp.arch
Subject: Re: Cache Size
Keywords: garbage collection, locality of reference, cache size
Message-ID: <5791@udccvax1.acs.udel.EDU>
Date: 28 Feb 90 18:23:37 GMT
References: <7393@pdn.paradyne.com> <76700146@p.cs.uiuc.edu> <1990Feb26.022057.28461@Neon.Stanford.EDU> <8189@pt.cs.cmu.edu> <8848@boring.cwi.nl>
Reply-To: mccalpin@vax1.acs.udel.EDU (John D Mccalpin)
Organization: College of Marine Studies, Univ. of Delaware
Lines: 41

In article <8848@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>In article <8189@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:
> > So, that's why most supercomputers seem to use vector register
> > files instead of caches for their vector units.
> >
>Well, no (depends on your definition of supercomputer of course; let us
>assume vector processor). There are systems without cache that use vector
>registers (Cray, NEC), or have memory to memory operations (Cyber 205).
>And there are processors with cache. Possibilities are:
>1. No vector registers, bypass cache (Cyber 995).
>2. Vector registers, bypass cache (i know none).
>3. No vector registers, through cache (again, i know none).
>4. Vector registers, through cache (IBM 3090, Convex, Alliant, Gould).
>So, no, vector registers are not a replacement for cache.
>--
>dik t. winter, cwi, amsterdam, nederland
>dik@cwi.nl

I believe that the Convex C-2 series bypasses the cache to load/store
the vector registers, so it should be in category 2, not category 4.

The ETA-10 is a special case that fits none of the above categories.
It had no vector registers (being a memory-to-memory machine), but it
had a set of cache-like registers (the "short-stop" registers) that
were used to cache only short vectors.
I never got a definitive answer from any of my ETA friends about
exactly how big this register set was or exactly how the hardware
decided to use it. I do know from direct observation that repeated
access to short vectors showed a lower start-up cost than the
equivalent long-vector operations.

The impression that I got was that this use is different from
category 3, since the short-stop registers were used only for vector
operands rather than as a general-purpose cache. I think that short
vectors were stuffed into these registers in parallel with their first
use, and then they could be re-loaded directly from the short-stop
registers if needed again soon. The output of the vector unit also
seems to have been loaded into the short-stop registers (if short
enough) to make back-to-back vector operations run with slightly less
overhead; this is in the spirit of chaining.

Corrections from informed sources welcome....
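
P.S. To make the guess above concrete, here is a toy model in Python
of the behaviour as I understand it. It is only a sketch and not a
description of the real ETA-10 hardware: the class name, the length
cutoff, the number of slots, the least-recently-used replacement, and
both start-up costs are invented for illustration, and summing the
operand start-up costs is a simplification.

```python
from collections import OrderedDict


class ShortStopModel:
    """Toy model of a small buffer that holds only short vector operands."""

    def __init__(self, max_len=64, slots=4, mem_startup=100, buf_startup=20):
        # All four numbers are made up; the real sizes and costs are unknown.
        self.max_len = max_len          # longest vector the buffer accepts
        self.slots = slots              # how many vectors it can hold
        self.mem_startup = mem_startup  # start-up cost of a fetch from memory
        self.buf_startup = buf_startup  # start-up cost of a fetch from the buffer
        self.buffer = OrderedDict()     # vector name -> length, oldest first

    def _keep(self, name, length):
        """Store a vector in the buffer if it is short enough."""
        if length > self.max_len:
            return
        self.buffer[name] = length
        self.buffer.move_to_end(name)
        while len(self.buffer) > self.slots:
            self.buffer.popitem(last=False)  # evict the least recently used

    def read(self, name, length):
        """Return the start-up cost of fetching one operand."""
        if name in self.buffer:
            self.buffer.move_to_end(name)
            return self.buf_startup
        # Miss: fetch from memory, capturing the operand in parallel.
        self._keep(name, length)
        return self.mem_startup

    def vector_op(self, dest, srcs, length):
        """Return the total operand start-up cost of one vector operation."""
        cost = sum(self.read(s, length) for s in srcs)
        # A short result is also kept, so a following operation can reuse it.
        self._keep(dest, length)
        return cost
```

In this model a second operation that reuses a short result (for
example c = a + b followed by d = c * a) pays only the smaller buffer
start-up cost for each operand, while the same sequence on vectors
longer than max_len pays the memory start-up cost every time. That is
the qualitative effect I observed, nothing more.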