Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ames.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!bellcore!decvax!genrad!panda!talcott!harvard!seismo!hao!ames!eugene
From: eugene@ames.UUCP (Eugene Miya)
Newsgroups: net.arch
Subject: Re: "The Shared Memory Hypercube" Do you smell any smoke?
Message-ID: <973@ames.UUCP>
Date: Mon, 6-May-85 19:05:16 EDT
Article-I.D.: ames.973
Posted: Mon May 6 19:05:16 1985
Date-Received: Thu, 9-May-85 02:28:23 EDT
References: <2132@sun.uucp> <1447@think.ARPA> <551@lll-crg.ARPA>
 <1483@think.ARPA> <560@lll-crg.ARPA>
Distribution: net
Organization: NASA-Ames Research Center, Mtn. View, CA
Lines: 57

> >Some of my assumptions are:
> > - Lots and lots and lots of small processors are better than fewer big
> >   processors.
> A very bad assumption, you want as many of the most COST EFFECTIVE
> processors
> . . .
> > I tend to think that it is possible to
> > get more MIPS per dollar by using smaller, cheaper processing elements.
> You get the most MIPS for your dollar by using the most COST EFFECTIVE
> processing elements.  These do not happen to be the smallest and cheapest.

To reinforce Eugene's comments:
We just had a SIG meeting with Joe Oliger (the CS chair at Stanford).
Joe came to the conclusion [during the course of thinking] that
"fewer, high performance CPU" proponents of multiprocessing had the
advantage in being able to fit reasonable portions of problems into
individual processor/memories.

Do not forget!  We are not developing these machines in a vacuum.  We have
to look at the applications which may be run on these machines.  Consider
a 100 x 100 x 100 array with 30 variables (and increasing as our
knowledge of the natural sciences increases).  I can barely fit a fluid
dynamics code on a 32-node Hypercube because the storage requirements per
CPU are fierce [a different example, not the 100^3x30 example].
If I am repeating myself from any earlier postings, sorry.

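[A rough sketch of the arithmetic behind the 100 x 100 x 100, 30-variable
example.  The 8-byte word size, the even split across nodes, and the node
counts tried are assumptions for illustration only; none of them come from
the post, and the 32-node fluid dynamics code mentioned above is a
different problem.  Ghost cells, work arrays, and the code itself are
ignored, so real requirements would be higher.]

```python
# Back-of-envelope storage estimate for a 100 x 100 x 100 grid carrying
# 30 variables per grid point.
# ASSUMPTIONS (not from the post): 8-byte words, an even split of the
# grid across nodes, and no ghost cells, work arrays, or code space.

def grid_words(nx, ny, nz, variables):
    """Total words needed to hold every variable at every grid point."""
    return nx * ny * nz * variables

def bytes_per_node(total_words, nodes, bytes_per_word=8):
    """Bytes each node must hold if the grid is split evenly."""
    return total_words * bytes_per_word / nodes

total = grid_words(100, 100, 100, 30)
print(f"total words: {total:,}")

# Hypothetical node counts, chosen only to show how the share shrinks.
for nodes in (1, 32, 1024):
    mb = bytes_per_node(total, nodes) / 1e6
    print(f"{nodes:5d} nodes: {mb:10.3f} MB per node")
```

[Under these assumptions the grid alone is 30 million words, and the
per-node share falls in direct proportion to the node count.  That is the
trade-off at issue: many small processors only help if each one still has
room for a reasonable portion of the problem.]
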
Jack Dennis had to revise the way he thought dataflow machines need to be
built after two weeks here: he needs more memory and faster I/O.
If there is any one thing multiprocessors allow us to do, it is
add yet more memory.  Hum? :-)

> > What is the start up time for your vectors (i.e. how big does
> > a vector have to be before the vector processing part wins over the
> > scalar processing part.)

I have just tested this recently on four different CRAY architectures.
Short vector startup time is very good.  The vector registers are faster
than the scalar registers after you pass 3 elements.  I have a graph that
shows this.  [To: Eugene Brooks: I sent a copy of this graph to
Frank McMahon, the guy who developed the Livermore Loops, for
scatter-gather operations in hardware on the XMP/48.]
DON'T WORRY about vector STARTUP on a CRAY, the difference is insignificant,
you are wasting your time optimizing this arena.  A 205 might be another
story, more later.

> > Typical vector processors are limited in their
> > speed by lack of memory bandwidth (this is true for a single processor
> > with high bandwidth memories (e.g. the CRAY uses a 16-way interleaved

Our Cray has a much higher degree of interleave.  The new C-2 will
have 128-way.

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene