Path: utzoo!mnetor!uunet!husc6!bbn!gatech!hubcap!ut-sally!kumar From: ut-sally!kumar@uunet.UU.NET (Vipin Kumar) Newsgroups: comp.parallel Subject: Re: Breakthrough at Sandia? + Iso-efficiency function Message-ID: <1252@hubcap.UUCP> Date: 30 Mar 88 13:23:01 GMT Sender: fpst@hubcap.UUCP Lines: 85 Summary: Scalability of parallel formulation Approved: parallel@hubcap.clemson.edu In article <1223@hubcap.UUCP>, uiucdcs!pur-ee!uiucdcsm.cs.uiuc.edu!grunwald@uunet.uu.net writes: > Basically, they took problems well suited to parallelism (wave equation, > finite difference methods for beam stress and one other one) and did good > implementations. > > For a fixed problem size, they got from 550 -> 650 speedup. If you fix the > problem size *per processor* (i.e. the problem gets bigger as you throw > more processesors at it), the ``scaled speedup'' peaks at 1020. > > This is for total execution time -- loading, running & unloaded. > > The scaled speedup measure is justified because many problems sizes are > simply constrained by time, not by the problem. In those cases, you just > want the thing done within a reasonable time. If you can run a larger version > of the problem in that time, you're happy about it. > > The 550 -> 650 speedup indicates that the serial code was about .15%, which > is much better than Amdahl conjectured to be possible. > Clearly, it is fair to increase the problem size to test speedup on more processor. If the problem size is fixed, then the speedup will at best saturate at some point. (Each program has some (however small) fixed sequential component, which puts a ceiling on the obtainable speedup). Actually for many parallel algorithms, it is possible to obtain linear speedup for arbitrarily large N (the number of processors) by simply increasing the problem size. A more interesting question to ask is: how fast should the problem size (measured in terms of sequential execution time) grow as a function of N to maintain linear speedup (i.e., constant efficiency)? In other words, what is the iso-efficiency function of the parallel formulation? (If the problem size needs to grow as f(N) to maintain constant efficiency, then the iso-efficiency function of the parallel formulation is f(N).) The iso-efficiency function precisely denotes the scalability of the parallel formulation. For example, if the iso-efficiency function is linear (i.e., if the problem size only needs to grow linearly with the number of processors to maintain constant efficiency), then the parallel formulation is highly scalable. Parallel formulations of many problems with data-level parallelism fall in this category. On the other hand, if the iso-efficiency function is exponential, then the formulation is not good for large N (because the problem size would have to be extremely large to effectively utilize parallel machines with large N). For example, the iso-efficiency function of a commonly used parallel formulation of quick sort (divide the list sequentially, and then sort each sublist in parallel) is exponential. As another example, the iso-efficiency function of the parallel formulation of 0/1 Knapsack problem presented by Lee, Shragowitz and Sahni in 1987 parallel processing conf. (pages 699-706) is O(N log N) which is almost linear; hence this parallel formulation is highly scalable. If we know the iso-efficiency function of the parallel formulation developed at Sandia Labs, then we could predict how well it would do on larger hypercubes. Vipin ----- PS. The concept of iso-efficiency is introduced in the following papers: Parallel Depth First Search on the Ring Architecture by Vipin Kumar, V. Nageshwara Rao and K. Ramesh. To appear in the proc. of 1988 Parallel Processing Conference. Parallel Depth-First on Multiprocessors, Part II: Analysis by Vipin Kumar and V. Nageshwar Rao. To appear in International Journal of Parallel programming. The second paper presents a work distribution scheme (with applications in depth-first search and divide-and-conquer algorithms) that has O(N log N) iso-efficiency function for suitable parallel architectures. Using this scheme, we get linear speedup for over 100 processors on BBN Butterfly. Another formulation presented in the paper gives linear speedup on the Intel hypercube for 100 processors.