Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site cmcl2.UUCP
Path: utzoo!watmath!clyde!bonnie!akgua!mcnc!philabs!cmcl2!gottlieb
From: gottlieb@cmcl2.UUCP (Allan Gottlieb)
Newsgroups: net.arch
Subject: Re: Transputer and occam
Message-ID: <675@cmcl2.UUCP>
Date: Tue, 26-Mar-85 22:22:21 EST
Article-I.D.: cmcl2.675
Posted: Tue Mar 26 22:22:21 1985
Date-Received: Sun, 31-Mar-85 04:15:26 EST
References: <825@ucbtopaz.CC.Berkeley.ARPA> <811@loral.UUCP> <455@bonnie.UUCP> <440@cornell.UUCP>
Reply-To: ihnp4!cmcl2!gottlieb (Allan Gottlieb)
Organization: New York University
Lines: 41
Summary:

In article <440@cornell.UUCP> kevin@gvax.UUCP (Kevin Karplus) writes:
>I'm a little dubious about the value of hypercubes, as most big
>programs have a 5% to 20% purely serial component.
>(Note: this only applies to general-purpose
>machines. Obviously, certain problems can have the serial part reduced
>to a tiny fraction.)

Have you any data to support this claim? At NYU, our Ultracomputer
project has (very extensive) experience with a wide range of important
scientific applications and we have NEVER found the serial code to be
k% for fixed k. Instead, each of these problems is invariably a class
of problems parameterized by some "size" variables (often the number of
mesh points), and the serial portion approaches 0 as the size
increases. Thus, for large enough problems the potential for
parallelism can be made arbitrarily large.

This raises the question of "how much is enough": how big must a
problem be for 1000 processors to be used effectively? We have
numerous simulation results on this question. The NASA (GISS) "weather
code" (i.e., a three-dimensional atmospheric simulation), when executed
using meshes appropriate for an Amdahl V7 or V8, can get high
efficiency (above 70%) with a few hundred processors but not thousands.
However, when meshes with points separated by about 1 degree of arc are
used (more desirable from a numerical analysis point of view),
thousands of processors can be efficiently employed. Thus, for this
problem class, thousands (but not millions) of processors would be
useful.

I should note that we parallelized this program without excessive
effort, using techniques that Kuck (Illinois) and Kennedy (Rice) and
their colleagues believe can be applied automatically. Perhaps by using
more sophisticated parallelization techniques or by employing a new
algorithm, more processors could be used. I do not believe that just
refining the mesh enough to utilize a million processors is justified
from a numerical analysis point of view -- but here I am on shaky
ground.

Caltech has also reported on many scientific problems (using their real
hardware), and again the serial portion drops with problem size.
-- 
Allan Gottlieb
GOTTLIEB@NYU
{floyd,ihnp4}!cmcl2!gottlieb   <---the character before the 2 is an el
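
The disagreement above comes down to simple arithmetic, which can be
sketched as follows. This is only an illustration under stated
assumptions, not the Ultracomputer simulation: it uses the standard
Amdahl's-law speedup formula, and the model in which serial work stays
constant while parallel work grows linearly with problem size (the
function shrinking_serial_fraction and its constant c) is hypothetical.
All function names are invented for this example.

```python
def speedup(serial_fraction, processors):
    """Amdahl's law: speedup on `processors` processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def efficiency(serial_fraction, processors):
    """Speedup divided by processor count (1.0 is perfect)."""
    return speedup(serial_fraction, processors) / processors


def shrinking_serial_fraction(size, c=10.0):
    """Hypothetical model: serial work is a constant c, parallel work
    grows linearly with `size`, so the serial fraction tends to 0."""
    return c / (c + size)


def min_size_for_efficiency(processors, target=0.7, c=10.0):
    """Smallest power-of-two size at which the hypothetical model
    reaches the target efficiency on `processors` processors."""
    size = 1
    while efficiency(shrinking_serial_fraction(size, c), processors) < target:
        size *= 2
    return size


if __name__ == "__main__":
    # Fixed 5% serial fraction: speedup is capped below 1/0.05 = 20.
    for p in (100, 1000, 1000000):
        print(p, round(speedup(0.05, p), 2))
    # Size-dependent serial fraction: a large enough problem reaches
    # the 70% efficiency threshold for any processor count.
    for p in (100, 1000, 10000):
        print(p, min_size_for_efficiency(p))
```

With a fixed serial fraction the speedup saturates no matter how many
processors are added, whereas in the size-dependent model the answer to
"how much is enough" is a minimum problem size that grows with the
processor count. Whether that size is numerically justified is a
separate question the sketch does not address.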