Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!convex!usenet
From: patrick@convex.COM (Patrick F. McGehearty)
Newsgroups: comp.sys.super
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <1991Jun05.185818.1071@convex.com>
Date: 5 Jun 91 18:58:18 GMT
References: <1991Jun3.130104.15667@hubcap.clemson.edu> <1991Jun3.233741.8570@elroy.jpl.nasa.gov> <1991Jun5.120653.7852@hubcap.clemson.edu>
Sender: usenet@convex.com (news access account)
Reply-To: patrick@convex.COM (Patrick F. McGehearty)
Organization: CONVEX Computer Corporation, Richardson, Tx., USA
Lines: 35
Nntp-Posting-Host: mozart.convex.com

In article <1991Jun5.120653.7852@hubcap.clemson.edu> baugh%ssd.intel.com@RELAY.CS.NET (Jerry Baugh) writes:
>
>One of the reasons we are proud of our numbers here at Intel is that
>we got a high GFLOP number with a relatively small problem size (we can
>still increase the problem size as we have not yet run out of physical
>memory).
>

And rightfully so. It's relatively easy to get a high GFLOP number (assuming the raw machine power exists) for a sufficiently large problem, provided you can hook enough machines together and wait the necessary days for the final solution. There are effective linear equation solver techniques with communication times proportional to the square of the problem size, while the computation time is proportional to the cube of the problem size. So for sufficiently large problems, a gaggle of workstations on a cluster of Ethernets could give GFLOP numbers if someone wanted to take the trouble to run the test (say at one of the universities with lots of workstations, like Carnegie Mellon over a holiday weekend).

Another challenge I would like to see would be one which uses an efficient, portable algorithm for linear equation solving written in a high level language such as Fortran. Then we could get some idea of what is achievable on a machine without resorting to assembly language.
I know that what's most efficient on each machine varies with the architecture, so maybe the rules could be that the portability requirement is met by requiring that the program be "standard", which could be determined by running it on some commonly available workstations (with different instruction sets from the target machine), but the supercomputer vendor could select and implement the algorithm of their choice. This approach would provide some information about the ability of a compiler to get the most out of a machine under ideal circumstances. The requirement that it also run on some other machine is intended to avoid machine-specific system calls.

Does this benchmark idea sound interesting to anyone else?
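
For anyone who wants to play with the scaling argument above (communication time growing as the square of the problem size, computation as the cube), here is a rough Python sketch. It is only a toy model, not a measurement. The function names, the constant of one word moved per matrix element, the 8-byte word, and the example figures (100 workstations at 10 MFLOPS each, sharing a 10 Mbit/s Ethernet) are all assumptions picked for illustration. The only standard figure is the roughly 2n^3/3 operation count for LU factorization.

```python
def efficiency(n, flop_rate, bytes_per_sec, word_bytes=8):
    """Fraction of run time spent computing, under a toy cost model.

    n             -- order of the linear system
    flop_rate     -- assumed aggregate peak rate, in flops per second
    bytes_per_sec -- assumed usable network bandwidth
    """
    # LU factorization costs about 2*n^3/3 floating point operations.
    compute_time = (2.0 / 3.0) * n**3 / flop_rate
    # Assumption: communication volume is n^2 words (constant factor of 1),
    # with no overlap between communication and computation.
    comm_time = word_bytes * n**2 / bytes_per_sec
    return compute_time / (compute_time + comm_time)


def sustained_gflops(n, flop_rate, bytes_per_sec):
    """Sustained rate in GFLOPS implied by the same toy model."""
    return efficiency(n, flop_rate, bytes_per_sec) * flop_rate / 1e9


if __name__ == "__main__":
    # Hypothetical cluster: 100 workstations x 10 MFLOPS, one 10 Mbit/s Ethernet.
    peak = 100 * 10e6
    bandwidth = 10e6 / 8
    for n in (1000, 10000, 100000):
        days = (2.0 / 3.0) * n**3 / peak / 86400
        print(n, round(sustained_gflops(n, peak, bandwidth), 3),
              "GFLOPS, compute time about", round(days, 3), "days")
```

In this model the ratio of computation time to communication time grows linearly with n, so the sustained rate creeps toward the assumed peak as the problem gets bigger, at the price of a run time that grows as n^3.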