Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!hao!gatech!hubcap!brooks
From: brooks@lll-crg.llnl.gov (Eugene D. Brooks III)
Newsgroups: comp.hypercube
Subject: Re: Difficulty of programming in parallel
Message-ID: <1019@hubcap.UUCP>
Date: 25 Feb 88 12:53:08 GMT
Sender: fpst@hubcap.UUCP
Lines: 53
Approved: hypercube@hubcap.clemson.edu

In article <1010@hubcap.UUCP> elroy!johns%tybalt.caltech.edu@ames.arc.nasa.gov (John Salmon) writes:
>
>In article <959@hubcap.UUCP> brooks@LLL-CRG.LLNL.GOV (Eugene D. Brooks III) writes:
>>The program architecture required for a distributed system is VERY
>>DIFFERENT than that of a serial program, and you can't slowly evolve
>>a serial program into a parallel program for a distributed architecture,
>>as you can for a shared memory machine.
>>
>>                                Eugene Brooks
>>-------------------------------------------------------------------
>
>I disagree that programs have to be different.
>The program architecture must be different on sequential
>and distributed memory machines.

You simply reiterate my point after disagreeing, but given that your
response seems to infer that my main point is that shared memory
machines are easier to program than distributed memory machines, and
easier to extract good performance from as well, I propose a test.

I have a shared memory parallel program which simulates a packet
switched network.  This program is written in PCP, an explicitly
parallel extension of C for shared memory machines.  The parallel
program was created from its serial starting point in one week.  The
parallel implementation also runs on a uniprocessor with all of the
parallel runtime overhead stripped out.

I propose the following.

1) That you port the program to the hypercube.  I do not doubt that it
is possible, I know and love hypercubes and have had something to do
with the "hype" that surrounds them, but I would like to know just how
long it will take you to create the distributed memory version of this
program.
Your distributed memory version must produce results that are identical
to those of the serial version of the program.  It may not be a
"subset" of what the shared memory version does, i.e. you cannot
simulate a "limited" class of networks.  I do not care what the
hypercube code looks like, only that the program deliver the same
outputs given the same inputs as the serial and shared memory parallel
versions of the code.

2) That you measure the speedups and, more importantly, the ABSOLUTE
performance obtained with the hypercube version.  I will be interested
in just how large the problem size must be made for a given number of
processors before good efficiency is obtained, and would like to
compare this to the shared memory version.

3) I would then like you to run the "hypercubeized" version of the code
on a serial machine using the same processor technology as the
hypercube you run on, as you claim is possible, and measure the extra
overhead (my favorite catch phrase these days is "Compiler, algorithm
and architectural inefficiency is 100% parallelizable") so that we can
get a feel for how "inefficient" the message passing version of the
code is.

Are you up to the Brooks challenge?  Is anyone out there who is
"hyping" hypercubes these days willing to accept the "Brooks challenge"
and report the results in the open literature?
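
For anyone taking up items 2) and 3), the quantities asked for can be
written down in a few lines of C.  This is only a minimal sketch under
stated assumptions: the function names are hypothetical, the timings
are wall-clock seconds that you measure yourself, and speedup is taken
relative to the plain serial program rather than to the parallel code
on one node, which is one reasonable reading of the emphasis on
ABSOLUTE performance.

```c
#include <assert.h>

/* Hypothetical helpers, not part of PCP or any hypercube library.
   All times are wall-clock seconds supplied by the caller. */

/* Item 2: speedup of a parallel run relative to the serial program. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Item 2: efficiency, the fraction of ideal nprocs-fold speedup
   that was actually obtained. */
double efficiency(double t_serial, double t_parallel, int nprocs)
{
    assert(nprocs > 0);
    return speedup(t_serial, t_parallel) / (double)nprocs;
}

/* Item 3: extra cost of running the parallel (message passing) code
   on ONE processor, as a fraction of the serial run time.
   A result of 0.5 means the parallel code is 50% slower. */
double overhead_fraction(double t_serial, double t_parallel_on_one)
{
    assert(t_serial > 0.0 && t_parallel_on_one > 0.0);
    return t_parallel_on_one / t_serial - 1.0;
}
```

With made-up timings of 100 s serial and 25 s on 8 processors,
speedup() returns 4 and efficiency() returns 0.5.  If the message
passing code takes 150 s on a single node, overhead_fraction() returns
0.5, and a speedup quoted against that 150 s baseline would look 1.5
times better than the one measured against the serial program.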