Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!wuarchive!emory!hubcap!toulouse
From: toulouse@iro.umontreal.ca (Michel Toulouse)
Newsgroups: comp.parallel
Subject: superlinear speedup
Message-ID: <11767@hubcap.clemson.edu>
Date: 21 Nov 90 14:01:18 GMT
Sender: fpst@hubcap.clemson.edu
Organization: Universite de Montreal
Lines: 431
Approved: parallel@hubcap.clemson.edu


	Two weeks ago, I sent a message on this news group asking 
for references about papers claiming superlinear speedup. Since 
then, I have received many references and also enjoy the reading of 
very interesting comments. 

	My first plan was to take a close look to every papers and 
add my owns comments. But this may take a lost of time and many of 
those who send me references and comments did manifest their interest 
in receiving this list of references. So I do it now with this simple
comment.

	The traditional argument against superlinear speedup is that
"if a processus Q is running on p processors for t time, then there
is a sequential machine that will simulate Q in at most (kpt) + k' 
time steps for k and k' being constants. So the speedup is by a 
factor of p only". As it was mention in many comments, there is 
superlinear based on avantages like more physical memory in the parallel 
machine or obtained by heuristically guided search. They are not
in my point of view convincing arguments against the traditional 
scepticism. In the first case it is clearly not convincing since 
you can always add more memory to the sequential machine. In the 
second case if you partition the search space equally among the p 
processors, you can achieve very big speedup (as mention in one 
of the comments I received). But I think this, from a theoretical point 
of view, is meaningless since the speedup depend only on factors like 
the partition of the search space, the heuristic used, your random 
function, etc.

	However, relate to heuristics in search space, this paper: "The
performance of cooperative processes", B.A. Huberman, Physica D42, 1990
38--47, claim superlinear speedup not based on partition of search space
but on "hints" that cooperative processes might used to eliminate 
ininteresting subspaces. The paper give a formal explanation to this
possible superlinear speedup. Will it be a counter-argument to the argument
above, I don't know for the moment. 

	The volume 42 of Physica D concentrate only on nonlinear phenomena,
the discussion about emergent computation is traited extensively, I found
many interesting papers, one about a complete characterizations of Turing
machines with a restrained number of states (no problem you still can
teach that the halting problem is undecidable), also many papers on neural
computation. I think this review of Physica over emergent computation 
is a very serious work (as it is almost always the case in this Journal).
At least (for me) this number introduce an approach to parallel computation
that might challenge more seriously the argument against superlinear speedup. 
As I said before, I didn't have time to look at every paper, I'm anxious to
find challenger to the traditional argument.

	I thank's very much every one who send me references and comments,
I'm still very interesting to received any comments on this subject or
relate one like emergent computation.


	Michel Toulouse
	Universite de Montreal
	email:toulouse@iro.umontreal.ca

-------------------------------------------------------------------------

@inproceedings{RAO88fsttcs,
       AUTHOR = "Nageshwara Rao, V. and Kumar, Vipin",
        TITLE = "Superlinear Speedup in State-Space Search",
BOOKTITLE = "Proceedings of the 1988 Foundation of Software Technology
and Theoretcal Computer Science",
        YEAR = "December 1988",
Note = "Lecture Notes in Computer Science number 338, Springer Verlag"
       } 


@techreport{RAO90suplin,
       AUTHOR = "V. Nageshwara Rao and Vipin Kumar",
        TITLE = "On the Efficicency of Parallel Depth-First Search",
        YEAR = "1990",
    INSTITUTION = "Tech Report 89-41, Computer Science Department, University of Minnesota",
NOTE = "Submitted for publication"
      }  


-------------------------------------------------------------------------

@article{lihudak89
    ,key={LiKai}
    ,author={Li, K. and Hudak, P.}
    ,title={Memory Coherence in Shared Virtual Memory Systems}
    ,journal={ACM Transactions on Computer Systems}
    ,volume=7
    ,number=4
    ,month=Nov
    ,year=1989
    ,pages={321-359}
    }

@phdthesis{liphd
  ,key={li}
  ,author={Li, K.}
  ,title={Shared Virtual Memory on Loosely Coupled Multiprocessors}
  ,school={Yale University, Department of Computer Science}
  ,year=1986
  }

-------------------------------------------------------------------------

Charlie McDowell and myself (David Helmbold) have a paper
in the IEEE transactions on Parallel and Distributed Systems,
Vol 1, No. 2 (april 1990) titled "Modeling Speedup (n) greater
than n.  Our references include:
@techreport{sandia,
  AUTHOR = "Benner, R.E. and J.L. Gustafson and R.E. Montry",
  TITLE = "Development and analysis of scientific application programs 
	on a 1024-processor hypercube",
  INSTITUTION = "Sandia National Laboratories",
  YEAR = "1988",
  MONTH = "February",
  NUMBER = "SAND 88-0317",
  ADDRESS = "Albuquerque, N.M.", 
  note = ""}

@article{Gus88,
  AUTHOR = "Gustafson, J.L.",
  TITLE = "Reevaluating {A}mdahl's Law",
  JOURNAL = "CACM",
  YEAR = "1988",
  volume = "31",
  number = "5",
  pages = "532-533",
  month = "May",
  note = ""}

@article{EZL89,
  AUTHOR = "D.L. Eager and J. Zahorjan and E.D. Lazowska",
  TITLE = "Speedup versus Efficiency in Parallel Systems",
  JOURNAL = "IEEE Trans. on Comp.",
  YEAR = "1989",
  volume = "38",
  number = "3",
  pages = "408-423",
  month = "March",
  note = ""}

@inproceedings{SF89,
  AUTHOR = "C.B. Stunkel and W.K. Fuchs",
  TITLE = "Analysis of Hypercube cache performance using address 
	traces generated by {TRAPEDS}",
  BOOKTITLE = "Proc. 1989 International Conference on Parallel Processing",
  YEAR = "1989",
  pages = "I-33--I-40",
  note = ""}
%
@article{LS84,
  AUTHOR = "T. Lai and S. Sahni",
  TITLE = "Anomalies in parallel branch-and-bound algorithms",
  JOURNAL = "CACM",
  YEAR = "1984",
  volume = "27",
  number = "6",
  pages = "594-602",
  month = "June",
  note = ""}
% Ohio State and Univ of Minnesota

@article{Wei82,
  AUTHOR = "B. W. Weide",
  TITLE = "Modeling unusual behavior of parallel algorithms",
  JOURNAL = "IEEE Trans. on Computers",
  YEAR = "1982",
  volume = "C-31",
  number = "11",
  pages = "1126-1130",
  month = "November",
  note = ""}
% Ohio State

@article{LW86,
  AUTHOR = "G.-J. Li and B. W. Wah",
  TITLE = "Coping with anomalies in parallel branch-and-bound algorithms",
  JOURNAL = "IEEE Trans. on Comp.",
  YEAR = "1986",
  volume = "C-35",
  number = "6",
  pages = "568-573",
  month = "June",
  note = ""}
% was Purdue now Univ of Illinois

@inproceedings{PH88,
  AUTHOR = "B. R. Preiss and V. C. Hamacher",
  TITLE = "Semi-static dataflow",
  BOOKTITLE = "Proc. 1988 Inter. Conf. on Parallel Processing",
  YEAR = "1988",
  pages = "127-134",
  note = ""}
% Univ of Waterloo and Univ of Toronto

@article{FLW86,
  AUTHOR = "V. Faber and O. M. Lubeck and A. B. White Jr.",
  TITLE = "Superlinear speedup of an efficient sequential algorithm is not
  possible",
  JOURNAL = "Parallel Computing",
  YEAR = "1986",
  volume = "3",
  pages = "259-260",
  note = ""}
% LANL

@article{Par86,
  AUTHOR = "D. Parkinson",
  TITLE = "Parallel efficiency can be greater than unity",
  JOURNAL = "Parallel Computing",
  YEAR = "1986",
  volume = "3",
  pages = "261-262",
  note = ""}
% DAP support unit, Queen Mary College, London

@book{GSS87,
	AUTHOR = "E. F. Gehringer and D. P Siewiorek and Z. Segall",
	TITLE = "Parallel processing, the {Cm}* experience",
	PUBLISHER = "Digital press",
	pages = "227-231",
	YEAR = "1987"}

@inproceedings{Li88,
  AUTHOR = "K. Li",
  TITLE = "{IVY}: a shared virtual memory system for parallel computing",
  BOOKTITLE = "Proc. 1988 Inter. Conf. on Parallel Processing",
  YEAR = "1988",
  pages = "94-101",
  note = ""}
% Princeton

@inproceedings{MG85,
  AUTHOR = "R. Mehrotra and E. F. Gehringer",
  TITLE = "Superlinear speedup through randomized algorithms",
  BOOKTITLE = "Proc. 1985 Inter. Conf. on Parallel Processing",
  YEAR = "1985",
  pages = "291-300",
  note = ""}
% North Carolina State

@article{Kor82,
  AUTHOR = "W. A. Kornfeld",
  TITLE = "Combinatorially implosive algorithms",
  JOURNAL = "CACM",
  YEAR = "1982",
  volume = "25",
  number = "10",
  pages = "734-738",
  month = "October",
  note = ""}
% MIT

@techreport{Fla87,
  AUTHOR = "H. P. Flatt",
  TITLE = "Further comments on a model for parallel processing",
  INSTITUTION = "IBM PASC G320-3503",
  YEAR = "1987",
  note = ""}

@article{Miy88,
  AUTHOR = "E. Miya",
  TITLE = "Suggestion on Superlinear speed up terminology",
  JOURNAL = "Network news posting",
  YEAR = "1988",
  volume = "comp.parallel",
  number = "318",
  month = "December",
  note = ""}
% NASA AMES

@inproceedings{Smi81,
  AUTHOR = "B. J. Smith",
  TITLE = "Architecture and applications of the {HEP} multiprocessor computer",
  BOOKTITLE = "Real-Time signal processing IV",
  YEAR = "1981",
  volume = "298",
  pages = "241-248",
  note = ""}
% 

@article{San86,
  AUTHOR = "J. Sanguinetti",
  TITLE = "Performance of a message-based multiprocessor",
  JOURNAL = "Computer",
  YEAR = "1986",
  volume = "19",
  number = "9",
  month = "September",
  pages = "47-55",
  note = ""}
% just slightly superunity on an {ELXSI}, due to reduced {OS} overhead

@article{Min70,
  AUTHOR = "M. Minsky",
  TITLE = "Form and Content in Computer Science, {ACM Turing Lecture}",
  JOURNAL = "J. ACM",
  YEAR = "1970",
  volume = "17",
  pages = "197-215",
  note = ""}
% the log(n) Minsky conjecture

@article{Lip88,
  AUTHOR = "J. Lipkis",
  TITLE = "Superlinear performance of a monte carlo code on the {NYU}
  {Ultracomputer}",
  JOURNAL = "Personal communication",
  YEAR = "1988",
  note = ""}
% 

@article{Bar88,
  AUTHOR = "J. Barton",
  TITLE = "Superlinear performance of the {SAXPY} kernel on the {Silicon}
  {Graphics} {4D120/GTX}",
  JOURNAL = "Personal communication",
  YEAR = "1988",
  note = ""}
% 

@article{GM85,
  AUTHOR = "D. H. Grit and J. R. McGraw",
  TITLE = "Programming Divide and Conquer for a {MIMD} Machine",
  JOURNAL = "Software Practice and Experience",
  YEAR = "1985",
  month = " January",
  volume = "15",
  number = "1",
  pages = "41-53",
  note = ""}

@article{Amd67,
  AUTHOR = "G. Amdahl",
  TITLE = "Validity of the single processor approach to achieving large
  scale computing capabilities",
  JOURNAL = "Proceedings AFIPS Comp. Conf.",
  YEAR = "1967",
  volume = "30",
  pages = "483-485",
  note = ""}

-------------------------------------------------------------------------

\bibitem{Kuma88c}
Kumar, Vipin, Ramesh, K., and Nageshwara, Rao V.,
``Parallel Best-First Search of State-Space Graphs: A Summary of Results",
{\em AAAI 1988} : 122-127.


\bibitem{Kuma89a}
Kumar, Vipin, and Nageshwara, Rao V.,
``Load Balancing on the Hypercube Architecture",
{\em Proc. 4th Conf. on Hypercubes, Concurrent Computers and Applications},
March 1989.


\bibitem{Nels90b}
Nelson, Peter C., and Toptsis, Anestis A., 
``Artificial Intelligence, Parallel Processing, and Bidirectional
Heuristic Search",
Submitted

\bibitem{Nels90c}
Nelson, Peter C., and Toptsis, Anestis A., 
``Superlinear Speedup in Multiprocessor Bidirectional Heuristic 
State-Space Search", 
Submitted

\bibitem{Nels90d}
Nelson, Peter C., and Toptsis, Anestis A., 
``Wave-Shaping in Parallel Bidirectional Heuristic State-Space Search", 
Submitted


\bibitem{Rao87a}
Rao, Nageshwara, V., Kumar, Vipin, and Ramesh, K.,
``A Parallel Implementation of Iterative-Deepening-A*",
{\em AAAI 1987}, 178-182.

\bibitem{Rao87d}
Rao, Nageshwara, V., and Kumar, Vipin, and Ramesh, K., 
``Parallel Heuristic Search on a Shared Memory  Multiprocessor", 
{\em Tech. Report. AI Lab, TR-87-45},
Univ. of Texas at Austin, January 1987.

\bibitem{Rao88b}
Rao, Nageshwara, V., and Kumar, Vipin, 
``Concurrent Insertions and Deletions in a Priority Queue",
{\em 1988 Inter. Conf. on Parallel Processing}, Vol. 3, (1988) : 207-210.


-------------------------------------------------------------------------

In the Conpar90-Vapp 4 Proceedings you'll find
M.Cosnard, J.L-Philippe
Achieving Superlinear Speedups for the Multiple Polinomial Quadratic 
Sieve Factoring Algorithm on a Distributed Memory Multiprocessor
p.863 - 874

The Proceedings have been printed at Springer Verlag, Germany and
may be ordered by bookshops or with a 30% reduction directly at

*********************************************************
* Institut fuer Informatik   | phone : +41 61 321 99 67    *
* Mittlere Strasse 142       | fax   : +41 61 321 99 15     *
* 4000 Basel                 | email : waser@urz.unibas.ch   *
* Switzerland                                                 *
***************************************************************

-------------------------------------------------------------------------

Li & Wah "How to cope with anomalies in parallel approximate
branch-andf-bound search". AAAI-84 (National Conf. on AI).

McBurney & Sleep "Transputer based experimanets with the ZAPP
architecture" In PARLE-87 (Parallel Architectures and Languages,
Europe) conference, Springer LNCS 258.

The former is a theoretical analysis, the latter gives some
practical results. In both cases the superlinear is speedup is
due to a single processor doing a long search of a tree with no
solutions, while in a multiprocessor system this tree is
searched in parallel with another in which a solution is easily
found.

-------------------------------------------------------------------------

David P. Helmbold, and Charles E. McDowell,"Modeling Speedup (n) Greater
than n", IEEE Trans. on Parallel and Distributed Sys., Apr. 1990,
pp. 250 - 256.