Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!convex!usenet
From: patrick@convex.COM (Patrick F. McGehearty)
Newsgroups: comp.sys.super
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <1991Jun05.185818.1071@convex.com>
Date: 5 Jun 91 18:58:18 GMT
References: <1991Jun3.130104.15667@hubcap.clemson.edu> <1991Jun3.233741.8570@elroy.jpl.nasa.gov> <1991Jun5.120653.7852@hubcap.clemson.edu>
Sender: usenet@convex.com (news access account)
Reply-To: patrick@convex.COM (Patrick F. McGehearty)
Organization: CONVEX Computer Corporation, Richardson, Tx., USA
Lines: 35
Nntp-Posting-Host: mozart.convex.com

In article <1991Jun5.120653.7852@hubcap.clemson.edu> baugh%ssd.intel.com@RELAY.CS.NET (Jerry Baugh) writes:
>
>One of the reasons we are proud of our numbers here at Intel is that
>we got a high GFLOP number with a relatively small problem size (we can
>still increase the problem size as we have not yet run out of physical
>memory).
>
And rightfully so.  It's relatively easy to get a high GFLOP number
(assuming the raw machine power exists) for a sufficiently large
problem, provided you can hook enough machines together and wait the
necessary days for the final solution.  There are effective linear equation
solver techniques whose communication time is proportional to the square
of the problem size, while the computation time is proportional to the
cube of the problem size.  So for sufficiently large problems, even a gaggle
of workstations on a cluster of Ethernets could give GFLOP numbers
if someone wanted to take the trouble to run the test (say at one of
the universities with lots of workstations, like Carnegie Mellon, over a
holiday weekend).
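
A back-of-envelope sketch of that scaling argument.  The function name and
the machine parameters (aggregate flop rate, cost per communicated word,
the coefficient on the n^2 communication term) are assumptions made up for
illustration, not measurements of the Delta or of any real cluster.  The
operation count is the usual LINPACK convention of 2/3 n^3 + 2 n^2.

```python
def sustained_gflops(n, peak_flops, seconds_per_word, comm_words_coeff=1.0):
    """Toy model of a dense n x n solve on a distributed machine.

    peak_flops       -- assumed aggregate flop rate of all nodes (flops/sec)
    seconds_per_word -- assumed cost to communicate one matrix element
    comm_words_coeff -- assumed constant on the n**2 communication volume
    """
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2       # LINPACK operation count
    compute_time = flops / peak_flops             # grows like n**3
    comm_time = comm_words_coeff * n**2 * seconds_per_word  # grows like n**2
    return flops / (compute_time + comm_time) / 1e9


if __name__ == "__main__":
    # A hypothetical slow network: 1 GFLOP aggregate, 10 microseconds per word.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(n, round(sustained_gflops(n, 1e9, 1e-5), 3))
```

In this model the communication term is swamped as n grows, so the reported
rate creeps toward the aggregate peak no matter how slow the network is;
the price is the run time, which grows with the cube of n.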

Another challenge I would like to see would be one which uses an efficient,
portable algorithm for linear equation solving written in a high level
language such as Fortran.  Then we could get some idea of what is achievable
on a machine without resorting to assembly language.
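
As a concrete picture of what "portable, high level" means here, the sketch
below solves a dense system by Gaussian elimination with partial pivoting
using nothing but the language itself.  It is written in Python rather than
Fortran purely for brevity, the name solve_dense is hypothetical, and it is
a textbook method, not the algorithm any vendor used for the numbers
discussed in this thread.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows of n numbers, b a list of n numbers.
    Inputs are copied, so the caller's data is left untouched.
    """
    n = len(a)
    # Augmented matrix [a | b], one row per equation.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k into row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from every row below the pivot.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Nothing in it touches the operating system or a vendor library, so the same
source would run unchanged on a workstation and on the target machine; how
fast it runs is then left to the compiler.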

I know that what's most efficient on each machine varies with the
architecture, so maybe the rules could be these: the supercomputer vendor
selects and implements the algorithm of their choice, but the program must
be "standard", which could be verified by running it on some commonly
available workstations (with different instruction sets from the target
machine).  This approach would provide some information about the ability
of a compiler to get the most out of a machine under ideal circumstances.
The requirement that it also run on some other machine is intended to avoid
machine-specific system calls.  Does this benchmark idea sound interesting
to anyone else?