Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!ucsd!ucsdhub!celit!ps
From: ps@fps.com (Patricia Shanahan)
Newsgroups: comp.arch
Subject: Re: Cache Size
Keywords: garbage collection, locality of reference, cache size
Message-ID: <7021@celit.fps.com>
Date: 28 Feb 90 16:49:57 GMT
References: <7393@pdn.paradyne.com> <76700146@p.cs.uiuc.edu> <1990Feb26.022057.28461@Neon.Stanford.EDU> <8189@pt.cs.cmu.edu> <8848@boring.cwi.nl>
Sender: daemon@fps.com
Reply-To: ps@fps.com (Patricia Shanahan)
Organization: FPS Computing Inc., San Diego CA
Lines: 59

In article <8848@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
>In article <8189@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:
> > So, that's why most supercomputers seem to use vector register
> > files instead of caches for their vector units.
> > 
>Well, no (depends on your definition of supercomputer of course; let us
>assume vector processor).  There are systems without cache that use vector
>registers (Cray, NEC), or have memory to memory operations (Cyber 205).
>And there are processors with cache.  Possibilities are:
>1.  No vector registers, bypass cache (Cyber 995).
>2.  Vector registers, bypass cache (i know none).

The FPS Model 500 is in this category. The scalar processors use
caches for instructions and scalar data references. The vector processors
are connected to the System Memory Bus and fetch data from memory without
going through cache. 

The only interaction is that the data caches snoop on the bus, and stores
by the vector processors cause corresponding purges in the data caches.

>3.  No vector registers, through cache (again, i know none).
>4.  Vector registers, through cache (IBM 3090, Convex, Alliant, Gould).

I would add the VAX 9000 to this list, but I question whether the Convex C2
belongs here. I think it is a vector-register, bypass-cache system.

>So, no, vector registers are not a replacement for cache.
>-- 
>dik t. winter, cwi, amsterdam, nederland
>dik@cwi.nl

Vector registers and caches perform the same basic function: holding data
that is likely to be used in future calculations close to the arithmetic
unit that is going to do those calculations. They also serve as a buffer
between the arithmetic unit's need for individual numbers and the memory
system's preference for highly parallel processing of larger blocks.

Caches are very good when reference patterns are basically unpredictable and
recent past reference to a data item or something close to it in memory is
positively correlated with probability of near future reference. They work
by fetching a block surrounding a referenced item (because of spatial locality)
and keeping it around for a while to exploit temporal locality.
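
A minimal sketch of that mechanism, as a toy direct-mapped cache. The block
size, the number of blocks, and the function name hit_rate are all
assumptions chosen for illustration; they do not describe any machine
mentioned in this thread.

```python
# Toy direct-mapped cache; all sizes are illustrative assumptions.
LINE_WORDS = 8      # words per cache block
NUM_LINES = 64      # blocks the cache can hold

def hit_rate(addresses):
    """Return the fraction of word addresses that hit, starting from an empty cache."""
    tags = [None] * NUM_LINES          # block number currently held in each slot
    hits = 0
    for addr in addresses:
        block = addr // LINE_WORDS     # block surrounding the referenced word
        index = block % NUM_LINES      # slot that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block        # miss: fetch the whole surrounding block
    return hits / len(addresses)

# Spatial locality: walk 256 consecutive words once.
sequential = list(range(256))
# Temporal locality: go over the same 256 words four times.
reused = sequential * 4

print(hit_rate(sequential))   # 0.875: one miss per 8-word block
print(hit_rate(reused))       # 0.96875: later passes hit every time
```

The single walk hits only because each miss brings in the neighbouring
words. The repeated walk does better still because the blocks are kept
around between passes.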

Vector memory references are not like that. The reference pattern is set at
compile time and known to the compiler. During a strided fetch whose stride
is at least a cache block there is no spatial locality. There is frequently a
strong negative correlation between recent past reference and near future
reference because of the large-scale sequential characteristics of vector
processing. Any cache substantially smaller than main memory can be flooded
by some vector jobs.
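
Both effects can be seen in a toy model. The sketch below assumes a
direct-mapped cache of 64 blocks of 8 words; those sizes and the function
name count_misses are illustrative assumptions, not figures for any real
system.

```python
# Toy direct-mapped cache; the sizes are illustrative assumptions.
LINE_WORDS = 8       # words per cache block
NUM_LINES = 64       # blocks held, so capacity is 512 words

def count_misses(addresses):
    """Count misses for a sequence of word addresses, starting from an empty cache."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        block = addr // LINE_WORDS
        index = block % NUM_LINES
        if tags[index] != block:
            tags[index] = block      # evict whatever was in this slot
            misses += 1
    return misses

# Strided fetch with a stride of one block: every reference is to a new block.
strided = list(range(0, 256 * LINE_WORDS, LINE_WORDS))
print(count_misses(strided), "misses in", len(strided), "strided references")

# Unit-stride sweeps over a 4096-word vector, eight times the cache capacity.
sweep = list(range(4096))
print(count_misses(sweep))        # 512 misses for one sweep
print(count_misses(sweep * 2))    # 1024 misses for two sweeps back to back
```

The second sweep misses exactly as often as the first. By the time the
start of the vector is needed again, the end of the previous sweep has
already pushed it out of the cache.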

The very big manufacturers, especially IBM, are something of an exception
because they can persuade third-party software vendors to do a lot of special
tuning for their systems. It is usually possible, with care, to arrange a
vector job so that it will fit well with a particular cache size.
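
One way to picture such an arrangement is strip mining: running both passes
of a two-pass calculation over one cache-sized strip before moving on to the
next. This is my own illustration, not a description of what any vendor
actually does. The cache model, its sizes, and the names count_misses and
two_pass_trace are assumptions.

```python
# Toy direct-mapped cache; the sizes are illustrative assumptions.
LINE_WORDS = 8
NUM_LINES = 64
CACHE_WORDS = LINE_WORDS * NUM_LINES   # 512 words of capacity

def count_misses(addresses):
    """Count misses for a sequence of word addresses, starting from an empty cache."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        block = addr // LINE_WORDS
        index = block % NUM_LINES
        if tags[index] != block:
            tags[index] = block
            misses += 1
    return misses

def two_pass_trace(n, strip):
    """Addresses touched when two passes are run over each strip of an n-word vector."""
    trace = []
    for start in range(0, n, strip):
        stop = min(start + strip, n)
        trace.extend(range(start, stop))   # pass 1 over this strip
        trace.extend(range(start, stop))   # pass 2 over the same strip
    return trace

N = 4096
untuned = two_pass_trace(N, N)             # each pass sweeps the whole vector
tuned = two_pass_trace(N, CACHE_WORDS)     # strips sized to fit the cache

print(count_misses(untuned))   # 1024: the second pass finds nothing cached
print(count_misses(tuned))     # 512: the second pass over each strip always hits
```

The strip length here is tied to one particular cache size, which is why
this kind of tuning has to be redone for each system.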
--
	Patricia Shanahan
	ps@fps.com
        uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
	phone: (619) 271-9940