Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!hao!gatech!hubcap!brooks
From: brooks@lll-crg.llnl.gov (Eugene D. Brooks III)
Newsgroups: comp.hypercube
Subject: Re: Difficulty of programming in parallel
Message-ID: <1019@hubcap.UUCP>
Date: 25 Feb 88 12:53:08 GMT
Sender: fpst@hubcap.UUCP
Lines: 53
Approved: hypercube@hubcap.clemson.edu

In article <1010@hubcap.UUCP> elroy!johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) writes:
>
>In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>>The program architecture required for a distributed system is VERY
>>DIFFERENT than that of a serial program, and you can't slowly evolve
>>a serial program into a parallel program for a distributed architecture,
>>as you can for a shared memory machine.
>>
>>						Eugene Brooks
>>-------------------------------------------------------------------
>
>I disagree that programs have to be different.
>The program architecture must be different on sequential
>and distributed memory machines.
You simply reiterate my point after disagreeing.  But given that your
response seems to dispute my main point, that shared memory machines
are easier to program than distributed memory machines and easier to
extract good performance from as well, I propose a test.  I have a
shared memory parallel program which simulates a packet switched network.
This program is written in PCP, an explicitly parallel extension of C for
shared memory machines.  The parallel program was created from its serial
starting point in one week.  The parallel implementation also runs on a
uniprocessor with all of the parallel runtime overhead stripped out.

I propose the following.
	1) That you port the program to the hypercube.  I do not doubt
	that it is possible, I know and love hypercubes and have had
	something to do with the "hype" that surrounds them, but I would
	like to know just how long it will take you to create the distributed
	memory version of this program.  Your distributed memory version must
	produce results that are identical to the serial version of the program.
	It may not be a "subset" of what the shared memory version does, i.e. you
	may not simulate only a "limited" class of networks.  I do not care what
	the hypercube code looks like, only that the program deliver the same
	outputs given the same inputs as the serial and shared memory parallel
	version of the code.
	
	2) That you measure the speedups and, more importantly, the ABSOLUTE
	performance obtained with the hypercube version.  I will be interested
	in just how large the problem size must be made for a given number
	of processors before good efficiency is obtained, and would like to
	compare this to the shared memory version.
	
	3) I would then like you to run the "hypercubeized" version of the
	code on a serial machine using the same processor technology as the
	hypercube you run on, as you claim is possible, and measure the
	extra overhead (My favorite catch phrase these days is "Compiler,
	algorithm and architectural inefficiency is 100% parallelizable")
	so that we can get a feel for how "inefficient" the message passing
	version of the code is.
	
Are you up to the Brooks challenge?  Is anyone out there who is "hyping"
hypercubes these days willing to accept the "Brooks challenge" and report
the results in the open literature?