Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!unmvax!gatech!hubcap!gusciora
From: gusciora@sam.cs.cmu.edu (George Gusciora)
Newsgroups: comp.parallel
Subject: Superlinear speedup
Message-ID: <3841@hubcap.UUCP>
Date: 12 Dec 88 13:18:36 GMT
Sender: fpst@hubcap.UUCP
Lines: 26
Approved: parallel@hubcap.clemson.edu

Why is everyone so surprised about superlinear speedup?  Sometimes
an n-processor machine is just better suited to a particular
algorithm than the same machine with fewer than n processors.  Example:
I've got a neural network program running on the Warp (a 10-processor
MIMD machine).  The algorithm basically uses 9 cells to perform
the error propagation and uses one cell to update the connections.
Each of the 9 cells achieves about 8 MFLOPS and the 10th cell is well
underutilized (giving a total performance of about 72 MFLOPS out of
a possible 100).  Now, if I ran this on a 5-cell Warp, only 4 cells 
would be performing error propagation (instead of 9) and I'd get only
about 32 MFLOPS out of a possible 50.  The net result is that the
program runs more than twice as fast with just twice as many processors.
Superlinear speedup.  Several programs behave like this.
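
The arithmetic above can be checked with a short sketch.  The function
name and its parameters are made up for illustration; the only inputs
taken from the example are the 8 MFLOPS per working cell and the one
cell reserved for connection updates, whose own contribution is treated
as negligible.

```python
# Hypothetical helper: models only the numbers quoted in the post.
def delivered_mflops(total_cells, per_cell_mflops=8.0, update_cells=1):
    """Throughput when all but `update_cells` cells do error propagation."""
    working_cells = total_cells - update_cells
    return working_cells * per_cell_mflops

small = delivered_mflops(5)     # 4 working cells
large = delivered_mflops(10)    # 9 working cells

speedup = large / small         # ratio of delivered throughput
processor_ratio = 10 / 5        # ratio of machine sizes

print(small, large, speedup, processor_ratio)
```

The speedup comes out as 9/4, which is larger than the processor
ratio of 2.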

That was a simple example.  It's easy to say "I'm really comparing
a 9-processor machine with a 4-processor machine (because the last
processor is so underutilized) and a 9/4 speedup is expected."  So
what?  Nearly every algorithm on any machine wastes machine cycles.
I picked the neural network example just to illustrate this.
Superlinear speedup is impossible only when both machines are
running at 100 percent utilization.
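
One way to see the role of utilization is a simple model (an
assumption for illustration, not a measurement): delivered throughput
is cells times peak rate per cell times utilization, with the same
peak rate on both machines.  The peak then cancels, and the speedup is
the processor ratio multiplied by the utilization ratio.  The function
name below is hypothetical; the utilizations 0.72 and 0.64 are the
72-of-100 and 32-of-50 MFLOPS figures from the Warp example.

```python
import math

# Hypothetical model: speedup = (n_large / n_small) * (u_large / u_small)
def modeled_speedup(n_small, util_small, n_large, util_large):
    """Speedup of the larger machine over the smaller one."""
    return (n_large / n_small) * (util_large / util_small)

# Warp example: 5 cells at 64% utilization vs. 10 cells at 72%.
warp = modeled_speedup(5, 0.64, 10, 0.72)

# Both machines fully utilized: the speedup is exactly the processor ratio.
full = modeled_speedup(5, 1.0, 10, 1.0)

print(math.isclose(warp, 2.25), full)
```

In this model the speedup exceeds the processor ratio exactly when the
larger machine is better utilized than the smaller one, and that cannot
happen once the smaller machine is already at 100 percent.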

- George Gusciora
  Carnegie Mellon University
  (412) 268 - 7553
  gusciora@sam.cs.cmu.edu