Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!unmvax!gatech!hubcap!gusciora
From: gusciora@sam.cs.cmu.edu (George Gusciora)
Newsgroups: comp.parallel
Subject: Superlinear speedup
Message-ID: <3841@hubcap.UUCP>
Date: 12 Dec 88 13:18:36 GMT
Sender: fpst@hubcap.UUCP
Lines: 26
Approved: parallel@hubcap.clemson.edu

Why is everyone so surprised about superlinear speedup? Sometimes an
n-processor machine is just better suited to a particular algorithm
than the same machine with fewer than n processors.

Example: I've got a neural network program running on the Warp (a
10-processor MIMD machine). The algorithm basically uses 9 cells to
perform the error propagation and one cell to update the connections.
Each of the 9 cells achieves about 8 MFLOPS, and the 10th cell is
heavily underutilized (giving a total performance of about 72 MFLOPS
out of a possible 100). Now, if I ran this on a 5-cell Warp, only 4
cells would be performing error propagation (instead of 9), and I'd
get only about 32 MFLOPS out of a possible 50. The net result is that
the program runs more than twice as fast (72/32 = 2.25 times) with
just twice as many processors. Superlinear speedup. Several programs
behave like this.

That was a simple example. It's easy to say "I'm really comparing a
9-processor machine with a 4-processor machine (because the last
processor is so underutilized) and a 9/4 speedup is expected." So
what? Nearly every algorithm on any machine wastes machine cycles. I
picked the neural network example just to illustrate this. Superlinear
speedup is only impossible when both machines are running at full 100
percent utilization.

- George Gusciora
  Carnegie Mellon University
  (412) 268 - 7553
  gusciora@sam.cs.cmu.edu
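
The Warp arithmetic in the post above can be checked with a short Python sketch. It is only an illustration, not the author's program: the function name and parameters are hypothetical, the 10 MFLOPS peak per cell is inferred from the "possible 100" figure for 10 cells, and, as in the post, the connection-update cell is assumed to contribute nothing to the total.

```python
# Hypothetical helper: sustained MFLOPS when all but one cell do error
# propagation at roughly 8 MFLOPS each (figures taken from the post).
def propagation_mflops(total_cells, mflops_per_cell=8.0, update_cells=1):
    workers = total_cells - update_cells  # cells doing error propagation
    return workers * mflops_per_cell

PEAK_PER_CELL = 10.0  # assumption: inferred from "72 out of a possible 100"

big = propagation_mflops(10)    # 10-cell Warp: 9 working cells
small = propagation_mflops(5)   # 5-cell Warp: 4 working cells

speedup = big / small           # ratio of sustained throughput
processor_ratio = 10 / 5        # ratio of machine sizes

print("10-cell MFLOPS:", big, "utilization:", big / (10 * PEAK_PER_CELL))
print(" 5-cell MFLOPS:", small, "utilization:", small / (5 * PEAK_PER_CELL))
print("speedup:", speedup, "for processor ratio:", processor_ratio)
print("superlinear:", speedup > processor_ratio)
```

The speedup exceeds the processor ratio because the fixed one-cell overhead is a smaller share of the larger machine, so utilization rises as cells are added.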