Xref: utzoo comp.sys.super:363 comp.arch:23073 comp.parallel:2614
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!hubcap!fpst
From: stevo@elroy.jpl.nasa.gov (Steve Groom)
Newsgroups: comp.sys.super,comp.arch,comp.parallel
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <1991Jun3.233741.8570@elroy.jpl.nasa.gov>
Date: 3 Jun 91 23:37:41 GMT
References: <1991Jun3.130104.15667@hubcap.clemson.edu>
Sender: news@elroy.jpl.nasa.gov (Usenet)
Organization: Image Analysis Systems Group, JPL
Lines: 46
Approved: parallel@hubcap.clemson.edu
Originator: stevo@uniblab
Nntp-Posting-Host: uniblab.jpl.nasa.gov

In article <1991Jun3.130104.15667@hubcap.clemson.edu> baugh%ssd.intel.com@RELAY.CS.NET (Jerry Baugh) writes:
>
>The LINPACK benchmark has often been used as one measure of comparison between
>machines, and most recently, a new section of the report, entitled 'Massively
>Parallel Computing' defines the same test, solve a dense set of linear
>equations, but allows for the problem sizes to scale with the size of the
>machine.  With the unveiling of the Touchstone Delta machine, Intel can now
>publish the following double precision performance numbers for massively
>parallel LINPACK:
[numbers deleted]

At first, I started to read this thinking "what the heck does LINPACK
have to do with the performance of a parallel computer other than
measuring the power of individual nodes?"  Then I started reading more
closely, and it appears that there's more to it than that.

Can someone explain how "massively parallel LINPACK" is different from
regular LINPACK?  What considerations for communication are made
in this benchmark?  Since LINPACK is normally used as a measure of number
crunching, I'm curious how this benchmark translates to parallel
computers.  As we all know (or we all should know), the performance
of a parallel computer is usually NOT the same as the performance of an
individual node multiplied by the number of nodes in the computer
(although we'd just love that to always be the case), so the obvious
misapplication of this kind of benchmark would be to multiply a single
node's LINPACK performance by the number of nodes in the machine.  I
notice that the numbers posted in the above-referenced article do not
scale linearly with the number of nodes used, so there is some
efficiency loss relative to the single-node case.  I'm itching to
find out what the source of this loss is.  This is of particular interest
as I am currently porting some existing parallel code to the Delta, and I'd
like to be able to handle the inevitable queries about "well, they say
the Delta does such-and-such LINPACK GFLOPS...".
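
To make the arithmetic behind "efficiency loss" concrete, here is a
minimal sketch of the comparison I have in mind.  The function name and
all of the figures are made up for illustration; they are NOT the
numbers from the referenced article, and this is just one common way of
defining efficiency, not necessarily the one the LINPACK report uses.

```python
def parallel_efficiency(measured_gflops, node_gflops, nodes):
    """Fraction of the naive 'single-node rate x node count' figure achieved.

    All names here are hypothetical, chosen only for this illustration.
    """
    if node_gflops <= 0 or nodes <= 0:
        raise ValueError("node_gflops and nodes must be positive")
    # The naive (and usually unattainable) estimate: perfect linear scaling.
    naive_gflops = node_gflops * nodes
    return measured_gflops / naive_gflops


# Hypothetical example: 100 nodes at 0.05 GFLOPS each would naively give
# 5 GFLOPS; a measured 4 GFLOPS on the whole machine is a fraction of that.
eff = parallel_efficiency(measured_gflops=4.0, node_gflops=0.05, nodes=100)
print(f"efficiency = {eff:.2f}")
```

Whatever makes that ratio fall below 1.0 (communication, load
imbalance, or something else) is exactly what I'd like to understand.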

Any explanation or references would be welcome.
-- 
Steve Groom, Jet Propulsion Laboratory, Pasadena, CA
stevo@elroy.jpl.nasa.gov  {ames,usc}!elroy!stevo
"... and the babe, she needs a chaaaa--nging..." (apologies to Bob Dylan)

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell