Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!bellcore!decvax!genrad!panda!talcott!harvard!seismo!hao!ames!eugene
From: eugene@ames.UUCP (Eugene Miya)
Newsgroups: net.arch
Subject: Re: "The Shared Memory Hypercube"  Do you smell any smoke?
Message-ID: <973@ames.UUCP>
Date: Mon, 6-May-85 19:05:16 EDT
Article-I.D.: ames.973
Posted: Mon May  6 19:05:16 1985
Date-Received: Thu, 9-May-85 02:28:23 EDT
References: <2132@sun.uucp> <1447@think.ARPA> <551@lll-crg.ARPA>
 <1483@think.ARPA> <560@lll-crg.ARPA>
Distribution: net
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 57

> >Some of my assumptions are:
> >   - Lots and lots and lots of small processors are better than fewer big
> > processors.
> A very bad assumption,  you want as many of the most COST EFFECTIVE
> processors
> . . .
> > I tend to think that it is possible to
> > get more MIPS per dollar by using smaller, cheaper processing elements.
> You get the most MIPS for your dollar by using the most COST EFFECTIVE
> processing elements.  These do not happen to be the smallest and cheapest.

To reinforce Eugene's comments:
We recently had a SIG meeting with Joe Oliger (the CS chair at Stanford).
Joe came to the conclusion [during the course of thinking] that the
"fewer, high-performance CPUs" proponents of multiprocessing have the
advantage of being able to fit reasonable portions of problems into
individual processor/memories.

Do not forget! We are not developing these machines in a vacuum.
We have to look at the applications which may be run on these machines.
Consider a 100 x 100 x 100 array with 30 variables (and increasing
as our knowledge of the natural sciences increases).  I can barely fit
a fluid dynamics code on a 32-node Hypercube because the storage requirements
per CPU are fierce [a different example, not the 100^3x30 example].
If I am repeating myself from any earlier postings, sorry.  Jack Dennis
had to revise the way he thought dataflow machines need to be built
after two weeks here: he needs more memory and faster I/O.
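
A back-of-the-envelope sketch of the storage arithmetic for the
100 x 100 x 100 x 30 case above.  The 8-byte word, the perfectly even
split, and the use of 32 as a node count are assumptions for illustration
only; a real code also needs temporaries and boundary copies, so treat the
result as a lower bound, not a measurement.

```python
# Rough storage estimate for a 3-D grid problem split across processors.
# ASSUMPTIONS (illustrative only): 8-byte words, perfectly even partition,
# no temporaries, no ghost/boundary cells.

def grid_words(nx, ny, nz, nvars):
    """Total words needed to hold nvars values at every grid point."""
    return nx * ny * nz * nvars

def bytes_per_node(total_words, nodes, bytes_per_word=8):
    """Bytes each node must hold if the grid is divided evenly."""
    return total_words * bytes_per_word / nodes

total = grid_words(100, 100, 100, 30)
print("total words:", total)
print("total Mbytes:", total * 8 / 1e6)
print("Mbytes per node on 32 nodes:", bytes_per_node(total, 32) / 1e6)
```

Halving the node count doubles the per-node figure, which is the point
about fitting reasonable portions of a problem into each processor/memory.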

If there is any one thing multiprocessors allow us to do, it is add yet
more memory.  Hum? :-)

> > What is the start up time for your vectors (i.e. how big does
> > a vector have to be before the vector processing part wins over the
> > scalar processing part.)

I tested this recently on four different CRAY architectures.
Short vector startup time is very good.  The vector registers are faster
than the scalar registers after you pass 3 elements.  I have a graph
that shows this.  [To: Eugene Brooks: I sent a copy of this graph to
Frank McMahon, the guy who developed the Livermore Loops, for
scatter-gather operations in hardware on the XMP/48.]  DON'T WORRY about
vector STARTUP on a CRAY: the difference is insignificant, and you are
wasting your time optimizing this area.  A 205 might be another story;
more later.
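
For anyone who wants to play with the break-even question, here is a
minimal sketch of the usual linear timing model: scalar time is n times a
per-element cost, vector time is a fixed startup plus n times a smaller
per-element cost.  The function names and the clock counts in the example
are made up for illustration; they are NOT my measurements and not figures
for any CRAY or the 205.

```python
import math

# Linear timing model (an assumption, not a hardware description):
#   scalar: n * scalar_per_element
#   vector: startup + n * vector_per_element

def scalar_time(n, per_element):
    return n * per_element

def vector_time(n, startup, per_element):
    return startup + n * per_element

def break_even_length(startup, scalar_per_element, vector_per_element):
    """Smallest n at which the vector loop is strictly faster, or None."""
    gain = scalar_per_element - vector_per_element
    if gain <= 0:
        return None  # vector never wins under this model
    return math.floor(startup / gain) + 1

# Hypothetical clock counts, chosen only to exercise the formula.
n = break_even_length(startup=20, scalar_per_element=5, vector_per_element=1)
print("vector wins from n =", n)
```

A machine with a small startup term breaks even at a handful of elements;
a machine with a large one needs much longer vectors before it pays off.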

> > Typical vector processors are limited in their
> > speed by lack of memory bandwidth (this is true for a single processor
> > with high bandwidth memories (e.g. the CRAY uses a 16-way interleaved

Our Cray has a much higher degree of interleave.  The new C-2 will have
128-way.
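
Why the degree of interleave matters can be sketched with a toy model.
Assume simple low-order interleaving, where word address a lives in bank
(a mod banks); that mapping is my assumption for illustration and not a
description of any particular CRAY memory system.  A constant-stride
access stream then touches banks / gcd(stride, banks) distinct banks.

```python
from math import gcd

# Toy model: word address a maps to bank (a % banks).  ASSUMPTION only.

def banks_touched(stride, banks):
    """Distinct banks visited by a constant-stride access stream."""
    return banks // gcd(stride, banks)

for banks in (16, 128):
    for stride in (1, 2, 8, 16):
        print("banks", banks, "stride", stride,
              "distinct banks", banks_touched(stride, banks))
```

With more banks, a power-of-two stride still spreads over several banks
instead of hammering one, which is one reason a higher degree of
interleave helps sustain bandwidth.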

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene