Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!ucsd!sdcsvax!trantor.harris-atd.com!melmac!chuck
From: chuck@melmac.harris-atd.com (Chuck Musciano)
Newsgroups: comp.arch
Subject: Re: Speed
Message-ID: <2023@trantor.harris-atd.com>
Date: 4 May 89 14:18:58 GMT
References: <GRUNWALD.89May1131736@flute.cs.uiuc.edu> <13089@paris.ics.uci.edu> <150@dg.dg.com> <2187@wpi.wpi.edu>
Sender: news@trantor.harris-atd.com
Reply-To: chuck@trantor.harris-atd.com (Chuck Musciano)
Distribution: comp
Organization: Advanced Technology Dept., Harris Corp., Melbourne, Fl.
Lines: 68

In article <2187@wpi.wpi.edu> lfoard@wpi.wpi.edu (Lawrence C Foard) writes:
>Why isn't parallel computing used more? I can't think of many problems that
>couldn't be done in parallel if the hardware/software to do it was done well.

     Well, having been a player in the quest for working parallel machines, 
here are some of the problems encountered in building large scale parallel
machines:

	How do you program it?  Parallelize existing code (the dusty deck
	     problem), retrofit parallel constructs onto existing langauges,
	     or design entirely new parallel programming languages?  Also
	     nontrivial: how do you change the mindset of thousands of
	     programmers well-versed in serial programming techniques?

	For shared memory machines: what interconnection scheme do you use?
	     Flat memory space?  Heirarchical?  What about caches to reduce
	     long memory latencies?  Cache coherency for thousands of
	     processors?

	For message passing architectures: what interconnection scheme do
	     you use?  How do you handle messages in general?  What sort
	     of communication synchronization should you use?

	What sort of task granularity are you trying to support?  Fine grained
	     tasks of just a few instructions each?  Coarse grained tasks of
	     many thousand instructions?  If you choose fine grained, how
	     will you reduce the task overhead to make the tasks effective?
	     If coarse grained is the way to go, how do you solve the load
	     balancing problem when just a few processors have all the work,
	     starving the remaining processors?

	Will you use static or dynamic scheduling?  Static makes the task
	     overhead smaller, encouraging granularity reduction, but it is
	     very difficult to predict at compile-time exactly how big a task
	     will be, so that you can correctly place tasks in the machine.
	     Dynamic scheduling solves the load balancing problem at runtime,
	     but increases task overhead to move tasks through the system.
	     In addition, runtime heuristics for data distribution throughout
	     large networks of machines are difficult to formulate.

	How do you debug your programs?  What sort of mental model do you give
	     your application programmers who must deal with hundreds of 
	     processors accessing both global and local data objects simul-
	     taneously?  How do you implement debugging primitives so that
	     you can easily single step any, all, or just one processor?  How
	     can the debugger access all the data in all the processors,
	     including the caches and local buffers?  How do you even stop
	     an entire machine all at once, given certain propogation delays
	     throughout the architecture?  How do you debug programs which
	     are completely nondeterminisitic, with execution ordering being
	     essentially random?

	How about your programming model?  Communicating sequential processes?
	     Control flow with barrier synchronization?  Data flow?  If so,
	     static or dynamic?  Graph reduction?  How much of the underlying
	     machine model do you promote into the programming languages?

     I could go on and on.  Your instincts are right: there aren't many things
which wouldn't run faster on a parallel machine.  Unfortunately, what seems 
like a straightforward solution is in reality a very hard problem.  If you
are really interested, you might want to browse through past proceedings of
the International Conference on Parallel Processing (Penn State Press) or
some of the architecture and supercomputing conference proceedings.

Chuck Musciano			ARPA  : chuck@trantor.harris-atd.com
Harris Corporation 		Usenet: ...!uunet!x102a!trantor!chuck
PO Box 37, MS 3A/1912		AT&T  : (407) 727-6131
Melbourne, FL 32902		FAX   : (407) 727-{5118,5227,4004}