Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!fernwood!oracle!news
From: csimmons@jewel.oracle.com (Charles Simmons)
Newsgroups: comp.arch
Subject: Re: Re^2: Superlinear Speedup (was Re: Scalability?)
Message-ID: <1990Jun4.121321.1985@oracle.com>
Date: 4 Jun 90 12:13:21 GMT
References: <1990Jun3.145408.2472@watmath.waterloo.edu> <1990May3.203405.23456@ecn.purdue.edu> <2075@naucse.UUCP> <6897@odin.corp.sgi.com> <49622@lanl.gov> <1990May1.154558.24009@cs.rochester.edu> <1990Jun2.080658.12651@oracle.com>
Sender: news@oracle.com
Reply-To: csimmons@oracle.com
Organization: Oracle Corp
Lines: 47

In article <1990Jun3.145408.2472@watmath.waterloo.edu>,
sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan) writes:
> From: sccowan@watmsg.uwaterloo.ca (S. Crispin Cowan)
> Subject: Re^2: Superlinear Speedup (was Re: Scalability?)
> Date: 3 Jun 90 14:54:08 GMT
> 
> csimmons@jewel.oracle.com (Charles Simmons) writes:
> 
> >I'm curious as to what extent researchers have observed real superlinear
> >speedups.  For example, consider a chess program.  On an old-fashioned
> >single processor architechture, the program will examine multiple
> >alternatives one at a time.  The results of each alternative can trim
> >the amount of searching required in subsequent alternatives.  (The
> >alpha-beta cutoffs become closer together.)  Now we might imagine that
> >a version of the program written for a parallel machine might be able
> >to benefit from examining multiple alternatives in parallel.  For example,
> >if one alternative is highly advantageous, it might cause the alpha-beta
> >cutoff values to get lots closer lots faster.
> >-- Chuck
> 
> It's a theorum that (theoretically, anyway) super-linear speedup cannot
> occur.  In practice, it may occur marginally, but this is due to the
> fact that P processors have:
> 	-P times as much cache
> 	-P times as many data lines to their local main memory (modulo
> 	shared memory)
> And therefore approximately P times the memory bandwidth.
> 
> As some may have noticed, high-performance RISC processors are reducing
> even compute-bound problems to memory BW-bound problems.  Not because
> RISCs are memory pigs, but because they run so fast that they saturate
> the memory bandwidth.  So the major win of multiple processors is
> becoming the increase in bandwidth to memory.
> 
> Crispin

Yah...  Hopefully obviously, I'm not real interested in the speedup
effects caused by having more caches and more main memory bandwidth.

Also, obviously, I can simulate any multiprocessor algorithm on a
single processor by using time sharing techniques.  Clearly, adding
the time sharing techniques would cause a certain amount of overhead
to occur.  So I'm still interested in the question of:  To what extent
have researchers found algorithms that run much faster when you explore
multiple possible solutions at the same time.

Thanks, Chuck