Path: utzoo!attcan!uunet!cs.utexas.edu!rutgers!netnews.upenn.edu!central.cis.upenn.edu!jms
From: jms@central.cis.upenn.edu (Jonathan M. Smith)
Newsgroups: comp.arch
Subject: Superlinear speed-up (long)
Message-ID: <26088@netnews.upenn.edu>
Date: 15 Jun 90 20:27:30 GMT
Sender: news@netnews.upenn.edu
Reply-To: jms@central.cis.upenn.edu (Jonathan M. Smith)
Organization: University of Pennsylvania
Lines: 88

Re: Super-linear speedup:

The references are:

%T Superlinear speedup of an efficient sequential algorithm is not possible
%A V. Faber
%A Olaf M. Lubeck
%A Andrew B. White, Jr.
%J Parallel Computing
%I North-Holland
%V 3
%D 1986
%P 259-260

and:

%T Parallel efficiency can be greater than unity
%A D. Parkinson
%J Parallel Computing
%I North-Holland
%V 3
%D 1986
%P 261-262

Note the interesting juxtaposition of the two articles in the same issue
of the same journal. It's nice to know editors have a sense of humor.

In any case, Chuck Simmons mentions the idea of speed-up from trying
alternatives in parallel. I attacked this in my thesis, and illustrated
several of the ideas in a much more compact paper at ICPP:

%T Exploring ``Multiple Worlds'' in Parallel
%A Jonathan M. Smith
%A Gerald Q. Maguire, Jr.
%B Proceedings, International Conference on Parallel Processing
%C St. Charles, Illinois
%V II, Software
%P 239-245
%I The Pennsylvania State University Press
%X CUCS-436-89
%D August 8-12, 1989

The thesis is:

%T Concurrent Execution of Mutually Exclusive Alternatives
%A Jonathan M. Smith
%R Ph.D. Thesis
%X Technical Report CUCS-442-89
%I Columbia University Computer Science Department
%O also available from UMI
%D May, 1989

The basic conclusion is: if you have alternatives with varying execution
times, and the variation can be characterized in some reasonable way
(e.g., by a probability distribution function of a random variable defined
over the execution times of the alternatives), you can get a speed-up by
exploring the alternatives in parallel. The speed-up may come at some
expense in throughput (but if the iron is idle anyway, who cares?), and it
can be estimated from the distribution. The analysis is completely general,
and applies to any other quantity that can be described as a random
variable with a PDF, e.g., errors and failures.

I implemented the scheme on an IBM ACE multiprocessor, using the
Jenkins-Traub rootfinder (the variant for polynomials with complex
coefficients) on polynomials of degrees 19 to 43, and it worked pretty
decently. Numbers for a 2-head Ardent Titan are in the ICPP paper;
numbers for the ACE are in the thesis (which really focuses on building
the system).

On superlinearity (which I discuss briefly in the ICPP paper and somewhat
more extensively in the thesis): the reason the speed-up can be
super-linear (ignoring the timesharing method, which anybody who has
measured a context switch knows is suspicious) is that the only reasonable
way to discuss speed-up, when any one of a set of alternatives can be
executed, is in terms of the *expected* execution time. Since running the
alternatives in parallel increases the probability of hitting a "good"
(fast) case and decreases the probability of hitting a "bad" (slow) case,
according to the probability distribution function, it is easy to see that
you can get good speedups.
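To make the mechanism concrete, here is a minimal sketch, in Go (a modern
idiom, emphatically not the ACE implementation), of "run the alternatives
in parallel and keep the first answer". algA and algB are hypothetical
stand-ins for two algorithms that compute the same result:

    package main

    import (
            "fmt"
            "time"
    )

    // algA and algB are hypothetical stand-ins; imagine each is fast
    // on a different half of the input set.
    func algA(x int) int { time.Sleep(10 * time.Millisecond); return x * x } // the slow case here
    func algB(x int) int { return x * x }                                    // the fast case here

    // race runs both alternatives in parallel and returns whichever
    // answer arrives first. The buffered channel means the loser's
    // send can't block; here we simply abandon the loser, where a
    // real implementation would cancel it ("kill the other").
    func race(x int) int {
            results := make(chan int, 2)
            go func() { results <- algA(x) }()
            go func() { results <- algB(x) }()
            return <-results
    }

    func main() {
            fmt.Println(race(7)) // 49, delivered at the speed of the faster alternative
    }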
For instance: you have two algorithms for X(). One takes O(1) time on half
the inputs and O(N**3) time on the other half; the other algorithm has
complementary run-times on the input set. On average, then, either
algorithm alone takes (O(N**3) + O(1))/2 = O(N**3). But in parallel, we
just choose the fastest and kill the other: O(1). So the speedup (in this
admittedly contrived example) is O(N**3)/O(1) on a two-processor system!

	-Jonathan
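P.S. Plugging made-up unit costs into the contrived example (say a "good"
case costs 1 time unit and a "bad" case 10**6), the expected-time
arithmetic looks like this; the numbers are illustrative only:

    package main

    import "fmt"

    func main() {
            // Made-up unit costs: a "good" case costs 1 time unit, a
            // "bad" case 1e6. Each algorithm is good on half the
            // inputs, and the two are complementary, so a 2-way race
            // always lands on a good case.
            const good, bad = 1.0, 1e6
            seq := 0.5*good + 0.5*bad // expected time of either algorithm alone
            par := good               // expected time of the race
            fmt.Printf("E[seq]=%.1f E[par]=%.1f speedup=%.1f on 2 CPUs\n",
                    seq, par, seq/par)
    }

The ratio blows past 2 because racing changes *which* case you hit, not
just how fast you hit it.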