Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site cmcl2.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!mcnc!philabs!cmcl2!gottlieb
From: gottlieb@cmcl2.UUCP (Allan Gottlieb)
Newsgroups: net.arch
Subject: Re: Transputer and occam
Message-ID: <675@cmcl2.UUCP>
Date: Tue, 26-Mar-85 22:22:21 EST
Article-I.D.: cmcl2.675
Posted: Tue Mar 26 22:22:21 1985
Date-Received: Sun, 31-Mar-85 04:15:26 EST
References: <825@ucbtopaz.CC.Berkeley.ARPA> <811@loral.UUCP> <455@bonnie.UUCP> <440@cornell.UUCP>
Reply-To: ihnp4!cmcl2!gottlieb (Allan Gottlieb)
Organization: New York University
Lines: 41
Summary: 

In article <440@cornell.UUCP> kevin@gvax.UUCP (Kevin Karplus) writes:
>I'm a little dubious about the value of hypercubes, as most big
>programs have a 5% to 20% purely serial component.
>(Note: this only applies to general-purpose
>machines.  Obviously, certain problems can have the serial part reduced
>to a tiny fraction.)

Have you any data to support this claim?
At NYU, our Ultracomputer project has (very extensive) experience with
a wide range of important scientific applications and we have NEVER
found the serial code to be k% for fixed k.  Instead, each of these problems
is invariably a class of problems parameterized by some "size"
variables (often the number of mesh points) and the serial portion
approaches 0 as the size increases.  Thus, for large enough problems
the potential for parallelism can be made arbitrarily large.

This raises the question of "how much is enough".  That is, how big
must a problem be for 1000 processors to be used effectively?  We have
numerous simulation results on this question.  The NASA (GISS)
"weather code" (i.e. three-dimensional atmospheric simulation), when
executed using meshes appropriate for an Amdahl V7 or V8, can get high
efficiency (above 70%) with a few hundred processors but not
thousands.  However, when meshes with points about 1 degree of arc
apart are used (more desirable from a numerical analysis point of
view), thousands of processors can be efficiently employed.  Thus for this
problem class, thousands (but not millions) of processors would be
useful.  I should note that we parallelized this program without
excessive effort using techniques that Kuck (Illinois) and Kennedy
(Rice) and their colleagues believe can be applied automatically.
Perhaps using more sophisticated parallelization techniques or by
employing a new algorithm, more processors could be used.  I do not
believe that just refining the mesh enough to utilize a million
processors is justified from a numerical analysis point of view -- but
here I am on shaky ground.
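
To make "how much is enough" concrete, here is a minimal sketch that
turns an efficiency target into a processor count.  It assumes the
simple Amdahl model, in which efficiency (speedup divided by processor
count) is 1 / (s*P + 1 - s) for serial fraction s and P processors.
The serial fractions in the demo are hypothetical values chosen for
illustration, not figures from the GISS code.

```python
import math


def efficiency(serial_fraction, processors):
    """Speedup divided by processor count under the simple Amdahl model."""
    return 1.0 / (serial_fraction * processors + (1.0 - serial_fraction))


def max_processors(serial_fraction, target_efficiency=0.70):
    """Largest processor count that still meets the efficiency target.

    Solving 1 / (s*P + 1 - s) >= e for P gives P <= 1 + (1/e - 1) / s.
    """
    bound = 1.0 + (1.0 / target_efficiency - 1.0) / serial_fraction
    return math.floor(bound)


if __name__ == "__main__":
    # Hypothetical serial fractions, for illustration only.
    for s in (0.001, 0.0001):
        p = max_processors(s)
        print(s, p, efficiency(s, p))
```

In this model, cutting the serial fraction by a factor of ten raises
the usable processor count by roughly the same factor.  That matches
the qualitative pattern above (hundreds of processors on the coarse
mesh, thousands on the finer one), though the real codes also pay
communication costs that this sketch leaves out.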

Caltech has also reported on many scientific problems (using their
real hardware) and again the serial portion drops with problem size.
-- 
Allan Gottlieb
GOTTLIEB@NYU
{floyd,ihnp4}!cmcl2!gottlieb   <---the character before the 2 is an el