Xref: utzoo comp.sys.super:363 comp.arch:23073 comp.parallel:2614
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!hubcap!fpst
From: stevo@elroy.jpl.nasa.gov (Steve Groom)
Newsgroups: comp.sys.super,comp.arch,comp.parallel
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <1991Jun3.233741.8570@elroy.jpl.nasa.gov>
Date: 3 Jun 91 23:37:41 GMT
References: <1991Jun3.130104.15667@hubcap.clemson.edu>
Sender: news@elroy.jpl.nasa.gov (Usenet)
Organization: Image Analysis Systems Group, JPL
Lines: 46
Approved: parallel@hubcap.clemson.edu
Originator: stevo@uniblab
Nntp-Posting-Host: uniblab.jpl.nasa.gov

In article <1991Jun3.130104.15667@hubcap.clemson.edu> baugh%ssd.intel.com@RELAY.CS.NET (Jerry Baugh) writes:
>
>The LINPACK benchmark has often been used as one measure of comparison between
>machines, and most recently, a new section of the report, entitled 'Massively
>Parallel Computing' defines the same test, solve a dense set of linear
>equations, but allows for the problem sizes to scale with the size of the
>machine. With the unveiling of the Touchstone Delta machine, Intel can now
>publish the following double precision performance numbers for massively
>parallel LINPACK:

[numbers deleted]

At first, I started to read this thinking "what the heck does LINPACK
have to do with the performance of a parallel computer other than
measuring the power of individual nodes?" Then I started reading
more closely, and it appears that there's more to it than that.
Can someone explain how "massively parallel LINPACK" is different from
regular LINPACK? What considerations for communication are made in
this benchmark? Since LINPACK is normally used as a measure of number
crunching, I'm curious how this benchmark translates to parallel computers.
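For anyone who wants the "number crunching" side of the benchmark spelled out, here is a rough sketch of the kind of computation being timed: solve a dense system Ax = b by Gaussian elimination with partial pivoting, then credit the run with the nominal operation count 2n^3/3 + 2n^2. This is a plain single-node illustration in Python, not the benchmark code and not anything Intel ran; the function names (solve_dense, linpack_flops, mflops) are made up for this sketch, and nothing here models the communication that a parallel version would need.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Illustrative only: 'a' is a list of rows, 'b' a list of floats.
    """
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest entry in column k (rows k..n-1)
        # onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flops(n):
    """Nominal floating-point operation count for an n-by-n solve."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def mflops(n, seconds):
    """Rate in MFLOPS for an n-by-n solve that took 'seconds' of wall time."""
    return linpack_flops(n) / seconds / 1.0e6
```

The rate is always computed from the nominal count divided by elapsed time, so anything that adds time without adding credited arithmetic (message passing, idle nodes) shows up directly as a lower figure.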

As we all know (or we all should know), the performance of a parallel
computer is usually NOT the same as the performance of an individual node
multiplied by the number of nodes in the computer (although we'd
just love that to always be the case). The obvious misapplication
of this kind of benchmark would be to multiply a single node's LINPACK
performance by the number of nodes in the machine. I notice that
the numbers posted in the above-referenced article do not scale linearly
with the number of nodes used, so there is some efficiency loss from the
single node case. I'm itching to find out what the source of this loss is.

This is of particular interest as I am currently porting some existing
parallel code to the Delta, and I'd like to be able to handle the
inevitable queries about "well, they say the Delta does such-and-such
LINPACK GFLOPS...".

Any explanation or references would be welcome.
-- 
Steve Groom, Jet Propulsion Laboratory, Pasadena, CA
stevo@elroy.jpl.nasa.gov  {ames,usc}!elroy!stevo
"... and the babe, she needs a chaaaa--nging..." (apologies to Bob Dylan)

-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell
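The "efficiency loss" mentioned in the post can be put into numbers with the usual definitions of speedup and parallel efficiency. The sketch below is only an illustration: the function names are invented here, and the rates in the usage note are made-up round figures, not the Delta results from the referenced article.

```python
def speedup(rate_one_node, rate_p_nodes):
    """Ratio of the measured p-node rate to the measured single-node rate."""
    return rate_p_nodes / rate_one_node


def parallel_efficiency(rate_one_node, rate_p_nodes, p):
    """Fraction of the naive 'single node times p' rate actually achieved.

    1.0 means perfect linear scaling; anything lower is the loss
    the post is asking about.
    """
    return rate_p_nodes / (p * rate_one_node)


def naive_estimate(rate_one_node, p):
    """The misapplication described in the post: single-node rate times p."""
    return p * rate_one_node
```

With hypothetical figures of 10 MFLOPS on one node and 800 MFLOPS measured on 100 nodes, naive_estimate(10.0, 100) gives 1000.0, speedup(10.0, 800.0) gives 80.0, and parallel_efficiency(10.0, 800.0, 100) gives 0.8, i.e. one fifth of the naive figure is lost somewhere. The formula says nothing about where the loss comes from (communication, load imbalance, or a change in problem size between the two runs).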