Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!corton!mirsa!zig.inria.fr!furnish
From: furnish@zig.inria.fr (Geoffrey Furnish)
Newsgroups: comp.sys.super
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <11771@mirsa.inria.fr>
Date: 13 Jun 91 12:09:46 GMT
References: <1991Jun5.120653.7852@hubcap.clemson.edu> <1991Jun05.185818.1071@convex.com> <1991Jun6.144903.20456@chpc.utexas.edu> <1991Jun06.205144.22611@ariel.unm.edu> <1991Jun10.144354.695@chpc.utexas.edu>
Sender: news@mirsa.inria.fr
Organization: INRIA, Sophia-Antipolis (Fr)
Lines: 163
Nntp-Posting-Host: zig.inria.fr

In article <1991Jun10.144354.695@chpc.utexas.edu>, gary@chpc.utexas.edu (Gary Smith) writes:
> In article <1991Jun06.205144.22611@ariel.unm.edu>, john@spectre.unm.edu (John Prentice) writes:
> |> In article <1991Jun6.144903.20456@chpc.utexas.edu> gary@chpc.utexas.edu (Gary Smith) writes:
> |>
> |> >Again, it's time for the advocates of the promise of massive parallelism
> |> >to acknowledge Synder's "corollary of modest potential."
> |> >
> |> 
> |> So what would you suggest as an alternative.  By this analysis, anyway
> |> you cut it, a serial processor will still take p times longer to do the
> |> same problem (of course, you have ignored overhead, but that works in 
> |> favor of your argument).  If I can do it 64,000 times faster on a CM-2
> |> and I don't have any choice but to do the problem, then I am going to
> |> use the CM-2.  The alternative is to just not do the problem. 
> |> 
> |> Your are both right and wrong about what the goal is in scientific
> |> computing.  For many applications, the goal isn't to run bigger problems,
> |> it is to make current ones less expensive.
> 
> Using Speedup(f,p) = 1/[(1-f)+(f/p)], with f being the fraction of code
> parallelized and p being the number of processors, and unrealistically
> assuming no overhead, a speedup of 64000 using 65536 processors requires
> that the problem be 99.9999634% parallel.  How many problems do you
> know of that are that parallel?
> 
> ---Gary
> 
> Randolph Gary Smith                       Internet: gary@chpc.utexas.edu
> Systems Group                             Phonenet: (512) 471-2411
> Center for High Performance Computing     Snailnet: 10100 Burnet Road
> The University of Texas System                      Austin, Texas 78758-4497

I have been reading this thread for a while and find it very interesting,
especially since I am a graduate student in physics at UT Austin and am
quite familiar with the center you are affiliated with.  I am tempted to
infer that the pessimism you are expressing probably reflects the attitude
behind the UT System's procurement plans.  If so, that is extremely
unfortunate for researchers in the UT system.

Although I am not familiar with the theoretical results you are citing, 
and indeed am quite interested in what they imply, there are a few things
which are clear to me.

First of all, I have been spending this spring doing serious scientific
computing on a Connection Machine 2 (located in France).  By this point I
have worked with it extensively for several months, and on two completely
different sorts of problems.

One of these is the time advancement of a field subject to a nasty nonlinear
PDE.  The CM with its massively parallel architecture is _EXTREMELY_ well
suited for this problem, and the results I have obtained are absolutely
mind-boggling.  I have not computed MIPS or MFLOPS for this problem, but I
do know that the throughput is many, many times better than I have ever
seen on any computer before in my life.

Secondly, I am not a greenhorn.  I spend lots of computing time on
high-dollar Crays out at the National Energy Research Supercomputing
Center's facilities.  I also occasionally use the facilities at the UT
Center for High Performance Computing.  And I can tell you now from personal
experience that for some types of problems, comparing a Cray to a CM-2 is
like comparing a tortoise to a cheetah.  That is not a theoretical analysis.
It is an "experimental" fact/observation.

Additionally, your last line above asks us to consider a speedup of 64000,
and then you presumably show how ridiculous this is.  Of course it is
ridiculous.  No one needs to see a speedup of 64000 to justify the cost
of an MP architecture machine.  One only needs to see a speedup which
offsets the cost differential in the prices of the computers in question.
It's not clear to me that a CM-2 is much pricier than a Cray in the first
place, yet it is _MUCH_ faster for at least some kinds of problems.  The
speedup I see is more like two orders of magnitude, not the almost five
that a factor of 64000 implies.
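
To put rough numbers on this, here is a small sketch (in Python, purely
for illustration; it is not part of anyone's actual code) that inverts the
quoted formula Speedup(f,p) = 1/[(1-f)+(f/p)] to get the parallel fraction
needed for a given speedup.  The function names are made up for this
example, and like the quoted formula it assumes no overhead at all.

```python
def amdahl_speedup(f, p):
    """Speedup(f, p) = 1 / ((1 - f) + f / p), ignoring all overhead."""
    return 1.0 / ((1.0 - f) + f / p)


def required_parallel_fraction(speedup, p):
    """Invert the formula: the f needed to reach `speedup` on p processors."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / p)


P = 65536  # processor count used in the quoted calculation

# Two targets: the two orders of magnitude I see, and the quoted 64000.
for target in (100.0, 64000.0):
    f = required_parallel_fraction(target, P)
    print(f"speedup {target:>7.0f}: f = {100.0 * f:.7f}%")
```

Under these assumptions a speedup of 100 on 65536 processors needs a code
that is only about 99.0015% parallel, which is a far less exotic
requirement than the 99.9999634% needed for 64000.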

You further assert/imply that no problem could be 99.999... % parallel.
I think this shows that you really don't understand what you're talking about.
In my field advancement problem, it is _ALL_ parallel. Period.  End of
discussion.  Even the real-time display of results is accomplished via
a parallel projection function.  The only part of the code which is not
parallel computation is control flow.  And that takes essentially zero time,
since there are no decision branches in the solution algorithm.  There is
also history keeping, but that relates to I/O, not computer architecture.

The point is, Danny Hillis (co-founder of Thinking Machines Corporation)
is right:  Many problems are _inherently_ parallel, and trying to solve them
with serial architecture computers introduces artificial and unnecessary
complexity.

Which brings me to my last point.  You seem to be completely oblivious to
the whole issue of ease of programming.  The CM comes with a slew of
languages which express parallel programming operations with simple,
easy-to-understand syntax.  One of them is a language they call C*,
which consists of very reasonable extensions to ANSI C that provide
facilities for expressing parallel constructs and communication operations.
Substantial efforts are underway to standardize this language with other
vendors of MP hardware.  They also have an implementation of FORTRAN 90
which provides direct support for array processing.

The importance of this is that these products make programming in parallel
_MUCH EASIER_ than programming for serial architecture machines like Crays.
For example, elemental array arithmetic requires no subscripts.  On the CM
you can say:
	a = b * c
On a Cray you would have to say:
	do i = 1, n
		do j = 1, m
			a(i,j) = b(i,j) * c(i,j)
		enddo
	enddo

Big deal?  Well it is when you try to go and enhance your program in some
way.  In my case I was able to take my parallel program on the CM and 
convert it from a code which solved a field in two dimensions to one which
solved it in three dimensions, by changing 4 lines of code which related
to the specification of the arrays, taking a total of about 30 minutes of
my time.

In a language written for a serial machine like a Cray, I'd have had
to change every single set of double loops into triple loops, and add a
subscript to every single array reference in the program.  Something like
three or four hundred places.  In my book, the opportunity for coding errors
goes as the exponential of the number of changes you make.  On the CM, using
a parallel version of C, I only had to change the data declarations.
The actual solution algorithm was not even touched.  On a Cray I'd have
spent a couple of weeks making the necessary mods to my sources, and then
I'd have had to worry for months about possible coding errors.

Furthermore, using the parallel C on the CM, my solution algorithm is
actually _COMPLETELY INDEPENDENT OF SYSTEM DIMENSIONALITY_.  To make
the code solve this field in 4 or 5 or n dimensions requires precisely
zero modifications to the source code algorithm.  I would have to change
the data declarations, but again, that's four lines of code.
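
For anyone who wants to see what "independent of dimensionality" looks
like in practice, here is a minimal sketch.  It is ordinary serial Python,
not C* and not my actual code, and it uses a plain linear diffusion step
with periodic boundaries as a stand-in for the real PDE (that choice, and
every name in it, is an assumption made for the example).  The only point
is that the update routine never mentions how many dimensions there are;
only the shape declaration does.

```python
from itertools import product


def strides_for(shape):
    """Row-major strides for a flat list holding an array of `shape`."""
    strides, stride = [], 1
    for extent in reversed(shape):
        strides.append(stride)
        stride *= extent
    return tuple(reversed(strides))


def laplacian(u, shape):
    """Periodic nearest-neighbour Laplacian; works for any len(shape)."""
    strides = strides_for(shape)
    out = [0.0] * len(u)
    for idx in product(*(range(n) for n in shape)):
        here = sum(i * s for i, s in zip(idx, strides))
        total = -2.0 * len(shape) * u[here]
        for axis, n in enumerate(shape):
            for shift in (-1, 1):
                j = (idx[axis] + shift) % n  # wrap around the edge
                total += u[here + (j - idx[axis]) * strides[axis]]
        out[here] = total
    return out


def step(u, shape, dt=0.1):
    """One explicit time step of u_t = laplacian(u)."""
    lap = laplacian(u, shape)
    return [a + dt * b for a, b in zip(u, lap)]


# Only these declarations change when the dimensionality changes.
shape_2d = (8, 8)
shape_3d = (8, 8, 8)

for shape in (shape_2d, shape_3d):
    size = 1
    for n in shape:
        size *= n
    u = [0.0] * size
    u[0] = 1.0  # a single spike at the origin
    u = step(u, shape)
    print(len(shape), "dimensions, total after one step:", sum(u))
```

On a data-parallel machine the loop over grid points disappears into the
language; here it is written out, but it is still driven entirely by the
shape and not by hand-written nested loops.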

The advantages of massive parallelism are many and great.  You should
endeavor to try it before hollering about what a farce it is.  In my book
there are two ways to benchmark computers.  One is theoretical computing
speed like MFLOPS, etc. In this department the CM completely overwhelms the
Cray for certain classes of programs.  The other is the pragmatic issue
of how long it takes to get a trustworthy program up and running, wall time
to completion, ease of obtaining graphics, and the like.  Again,
using parallel expression, on problems well suited for it, the CM
beats the Cray hands down in the "reliable programming" department.  From
the standpoint of scientific visualization, the best tools I've ever seen
run on the CM.  One of the reasons I don't use UT CHPC more is because of
the infuriating difficulty of getting graphical output.  In contrast, 
Thinking Machines provides direct and easy to use support for X; so much so
that you can render images on their high speed graphics device or in an
X window on a networked workstation _WITHOUT MODIFYING A SINGLE LINE OF CODE_.
Ask Cray to do that for you!

Is massive parallelism the solution to all mankind's computing needs?
Probably not.  I am sure the theoretical concerns you have cited have their
domain of applicability.  But, I am also certain that when I want to get
work done, I can get it done faster, more reliably and with less effort on
a CM-2 than I can on a Cray.  That's just the way it is.

For the benefit of the UT scientific research community and beyond, I urge
you and others who share your skepticism to reach a little beyond the
confines of Cray's marketing grasp, and take a more informed look at
what MP has to offer.  I for one have been extremely impressed by what I
found under the hood of the CM-2.

Geoffrey Furnish
furnish@solar.ph.utexas.edu