Path: utzoo!mnetor!uunet!husc6!bbn!gatech!hubcap!ut-sally!kumar
From: ut-sally!kumar@uunet.UU.NET (Vipin Kumar)
Newsgroups: comp.parallel
Subject: Re: Breakthrough at Sandia?  + Iso-efficiency function
Message-ID: <1252@hubcap.UUCP>
Date: 30 Mar 88 13:23:01 GMT
Sender: fpst@hubcap.UUCP
Lines: 85
Summary: Scalability of parallel formulation
Approved: parallel@hubcap.clemson.edu

In article <1223@hubcap.UUCP>, uiucdcs!pur-ee!uiucdcsm.cs.uiuc.edu!grunwald@uunet.uu.net writes:
> Basically, they took problems well suited to parallelism (wave equation,
> finite difference methods for beam stress and one other one) and did good
> implementations.
> 
> For a fixed problem size, they got from 550 -> 650 speedup. If you fix the
> problem size *per processor* (i.e. the problem gets bigger as you throw
> more processesors at it), the ``scaled speedup'' peaks at 1020.
> 
> This is for total execution time -- loading, running & unloaded.
> 
> The scaled speedup measure is justified because many problems sizes are
> simply constrained by time, not by the problem. In those cases, you just
> want the thing done within a reasonable time. If you can run a larger version
> of the problem in that time, you're happy about it.
> 
> The 550 -> 650 speedup indicates that the serial code was about .15%, which
> is much better than Amdahl conjectured to be possible.
> 

Clearly, it is fair to increase the problem size to test speedup
on more processor.  If the problem size is fixed, then 
the speedup will at best saturate at some point. (Each program
has some (however small) fixed sequential component, which
puts a ceiling on the obtainable speedup).
Actually for many parallel algorithms, it is possible to obtain linear speedup
for arbitrarily large N (the number of processors)
by simply increasing the problem size.

A more interesting question to ask is: how fast should
the problem size (measured in terms of sequential execution time) 
grow as a function of N  to maintain linear speedup (i.e., constant
efficiency)?  In other words, what is the iso-efficiency function
of the parallel formulation? (If the problem size needs to grow as
f(N) to maintain constant efficiency, then the iso-efficiency function
of the parallel formulation is f(N).) 
 

The iso-efficiency function precisely denotes the scalability   
of the parallel formulation.
For example, if the iso-efficiency function is linear 
(i.e., if the problem size only needs to grow linearly
with the number of processors to maintain constant efficiency),
then the parallel formulation is highly scalable.  Parallel formulations
of many problems with data-level parallelism fall in this category.
On the other hand, if the iso-efficiency function is exponential, 
then the formulation is not good for large N (because the
problem size would have to be extremely large to effectively
utilize parallel machines with large N). For example,
the iso-efficiency function of a commonly used parallel 
formulation of quick sort (divide the list sequentially, 
and then sort each sublist in parallel) is exponential.
As another example, the iso-efficiency function of the
parallel formulation of 0/1 Knapsack problem presented by 
Lee, Shragowitz and Sahni in 1987 parallel processing conf. 
(pages 699-706)  is O(N log N) which is almost linear; 
hence this parallel formulation is highly scalable.

If we know the iso-efficiency function of the parallel
formulation developed at Sandia Labs, then we could predict
how well it would do on larger hypercubes.  
 
Vipin
----- 
 
 
PS.  The concept of iso-efficiency is introduced in the
following papers:
 
Parallel Depth First Search on the Ring Architecture
by Vipin Kumar, V. Nageshwara Rao and K. Ramesh.
To appear in the proc. of 1988 Parallel Processing Conference.
 
 
Parallel Depth-First on Multiprocessors, Part II: Analysis
by Vipin Kumar and V. Nageshwar Rao.
To appear in International Journal of Parallel programming.
 
The second paper presents a work distribution scheme (with
applications in depth-first search and divide-and-conquer
algorithms) that has O(N log N) iso-efficiency function
for suitable parallel architectures.  Using this scheme,
we get linear speedup for over 100 processors on BBN Butterfly.
Another formulation presented in the paper gives linear speedup 
on the Intel hypercube for 100 processors.