Xref: utzoo comp.sys.super:382 comp.arch:23157 comp.parallel:2638
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!malgudi!caen!uflorida!gatech!hubcap!fpst
From: prins@cs.unc.edu (Jan Prins)
Newsgroups: comp.sys.super,comp.arch,comp.parallel
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <1991Jun8.200043.15944@hubcap.clemson.edu>
Date: 7 Jun 91 17:06:16 GMT
References: <1991Jun3.233741.8570@elroy.jpl.nasa.gov> <13301@pt.cs.cmu.edu> <ELIAS.91Jun6090922@wonton.TC.Cornell.EDU> <1991Jun6.174129.25202@hubcap.clemson.edu>
Sender: news@cs.unc.edu
Followup-To: comp.sys.super
Organization: UNC-Chapel Hill Computer Science
Lines: 20
Approved: parallel@hubcap.clemson.edu


In article <1991Jun6.174129.25202@hubcap.clemson.edu>, dodson@convex.COM (Dave Dodson) writes:
> >   "FLOPS is defined as  (2/3 N^3 + 2 N^2) / elapsed-time",
> 
> What is interesting about this is that there are algorithms based on
> "fast" matrix multiplication, where the product of two K by K matrices
> can be formed with fewer than O(K^3) floating point operations.  If you
> use one of these fast algorithms, you may do significantly fewer than
> (2/3 N^3 + 2 N^2) floating point operations, but you get credit for
> (2/3 N^3 + 2 N^2) operations anyway.  [...]
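For a concrete sense of the quoted claim, here is a rough sketch that counts the arithmetic operations for the classical K by K product versus Strassen's method, the best-known "fast" multiplication. It assumes K is a power of two and that the recursion runs all the way down to 1 by 1 blocks. That assumption is purely for illustration: practical codes switch to the classical method below some cutoff size. The function names are hypothetical.

```python
def classical_ops(k: int) -> int:
    """Textbook K x K product: K^3 multiplies plus K^3 - K^2 adds."""
    return 2 * k**3 - k**2


def strassen_ops(k: int) -> int:
    """Operation count for Strassen recursion down to 1 x 1 blocks.

    k must be a power of two. Each level forms 7 half-size products
    and performs 18 additions/subtractions of half-size matrices.
    """
    if k == 1:
        return 1
    h = k // 2
    return 7 * strassen_ops(h) + 18 * h * h


def first_strassen_win(max_power: int = 20):
    """Smallest power-of-two K at which Strassen uses fewer operations."""
    for p in range(max_power + 1):
        k = 2 ** p
        if strassen_ops(k) < classical_ops(k):
            return k
    return None


if __name__ == "__main__":
    for p in range(0, 12, 2):
        k = 2 ** p
        print(k, classical_ops(k), strassen_ops(k))
    print(first_strassen_win())  # 1024
```

The count satisfies the closed form 7 K^(log2 7) - 6 K^2. Because log2 7 is about 2.807, the operation count eventually falls below 2 K^3 - K^2. With full recursion this only happens at K = 1024. A solver built on such a product does far fewer than the credited (2/3 N^3 + 2 N^2) operations once N is large.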

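In particular, using this month's asymptotically fastest solver, your
lowly workstation can beat the LINPACK performance of *any* machine
whose performance was obtained through use of the standard algorithm,
provided enough time and space.  The credited operation count grows
as N^3, while the work the fast solver actually does grows only as
N^w for some w < 3.  The credited rate therefore grows without bound
as N increases.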
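The argument above can be checked numerically under a simple, hypothetical cost model. In this model the fast solver really performs c * N^w operations at a fixed machine rate, while LINPACK credits (2/3 N^3 + 2 N^2) operations. The parameter values are illustrative assumptions, not measurements of any real machine:

- w = 2.376, the Coppersmith-Winograd exponent.
- c = 1.
- A workstation that sustains 10 MFLOPS.
- A target of 10 GFLOPS.

Setting c = 1 is wildly optimistic, because the constants hidden in the Coppersmith-Winograd bound are enormous. Raising c pushes the crossover N, and hence the run time, much further out.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def credited_flops(n: int) -> float:
    # Operation count LINPACK credits, regardless of the algorithm used.
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def credited_rate(n: int, omega: float, c: float, machine_rate: float) -> float:
    # Hypothetical cost model: the solver actually does c * n**omega flops.
    seconds = c * n**omega / machine_rate
    return credited_flops(n) / seconds


def crossover_n(target: float, omega: float, c: float, machine_rate: float) -> int:
    """Smallest n (at or past the rising branch) whose credited rate reaches target.

    Requires 2 < omega < 3. The credited rate is increasing for
    n > 3*(omega - 2)/(3 - omega), so the search starts there and
    uses doubling followed by bisection.
    """
    if not 2.0 < omega < 3.0:
        raise ValueError("omega must lie strictly between 2 and 3")
    start = max(2, math.floor(3 * (omega - 2) / (3 - omega)) + 1)
    lo = hi = start
    while credited_rate(hi, omega, c, machine_rate) < target:
        lo, hi = hi, hi * 2
    while lo < hi:
        mid = (lo + hi) // 2
        if credited_rate(mid, omega, c, machine_rate) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Illustrative, hypothetical parameters (not measurements).
    omega, c = 2.376, 1.0
    workstation = 1e7   # flops/s actually executed
    target = 1e10       # flops/s to be credited
    n = crossover_n(target, omega, c, workstation)
    years = c * n**omega / workstation / SECONDS_PER_YEAR
    gigabytes = 8 * n * n / 1e9  # dense double-precision matrix
    print(f"n = {n}, run time = {years:.3g} years, matrix = {gigabytes:.3g} GB")
```

The credited rate scales as N^(3-w) divided by c. A tenfold increase in c therefore multiplies the crossover N by roughly 10^(1/0.624), which is about 40, and the run time grows faster still.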

Of course it's not clear whether you are allowed to report a record
performance 10,000 years before it completes.

--\--  Jan Prins  (prins@cs.unc.edu)  
  /    Computer Science Dept. 
--\--  UNC Chapel Hill