Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!fernwood!oracle!news From: csimmons@jewel.oracle.com (Charles Simmons) Newsgroups: comp.arch Subject: Re: Re^2: Superlinear Speedup (was Re: Scalability?) Message-ID: <1990Jun4.121321.1985@oracle.com> Date: 4 Jun 90 12:13:21 GMT References: <1990Jun3.145408.2472@watmath.waterloo.edu> <1990May3.203405.23456@ecn.purdue.edu> <2075@naucse.UUCP> <6897@odin.corp.sgi.com> <49622@lanl.gov> <1990May1.154558.24009@cs.rochester.edu> <1990Jun2.080658.12651@oracle.com> Sender: news@oracle.com Reply-To: csimmons@oracle.com Organization: Oracle Corp Lines: 47 In article <1990Jun3.145408.2472@watmath.waterloo.edu>, sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) writes: > From: sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) > Subject: Re^2: Superlinear Speedup (was Re: Scalability?) > Date: 3 Jun 90 14:54:08 GMT > > csimmons@jewel.oracle.com (Charles Simmons) writes: > > >I'm curious as to what extent researchers have observed real superlinear > >speedups. For example, consider a chess program. On an old-fashioned > >single processor architechture, the program will examine multiple > >alternatives one at a time. The results of each alternative can trim > >the amount of searching required in subsequent alternatives. (The > >alpha-beta cutoffs become closer together.) Now we might imagine that > >a version of the program written for a parallel machine might be able > >to benefit from examining multiple alternatives in parallel. For example, > >if one alternative is highly advantageous, it might cause the alpha-beta > >cutoff values to get lots closer lots faster. > >-- Chuck > > It's a theorum that (theoretically, anyway) super-linear speedup cannot > occur. In practice, it may occur marginally, but this is due to the > fact that P processors have: > -P times as much cache > -P times as many data lines to their local main memory (modulo > shared memory) > And therefore approximately P times the memory bandwidth. > > As some may have noticed, high-performance RISC processors are reducing > even compute-bound problems to memory BW-bound problems. Not because > RISCs are memory pigs, but because they run so fast that they saturate > the memory bandwidth. So the major win of multiple processors is > becoming the increase in bandwidth to memory. > > Crispin Yah... Hopefully obviously, I'm not real interested in the speedup effects caused by having more caches and more main memory bandwidth. Also, obviously, I can simulate any multiprocessor algorithm on a single processor by using time sharing techniques. Clearly, adding the time sharing techniques would cause a certain amount of overhead to occur. So I'm still interested in the question of: To what extent have researchers found algorithms that run much faster when you explore multiple possible solutions at the same time. Thanks, Chuck