From: furnish@zig.inria.fr (Geoffrey Furnish)
Newsgroups: comp.sys.super
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <11806@mirsa.inria.fr>
Date: 17 Jun 91 08:18:07 GMT
References: <1991Jun6.144903.20456@chpc.utexas.edu> <1991Jun06.205144.22611@ariel.unm.edu> <1991Jun10.144354.695@chpc.utexas.edu> <1991Jun10.235501.7039@ariel.unm.edu>
Sender: news@mirsa.inria.fr
Organization: INRIA, Sophia-Antipolis (Fr)
Lines: 128
Nntp-Posting-Host: zig.inria.fr

This is a followup to my prior posting in which I related my experiences
in using the CM-2.  Since that posting I have received a deluge of private
mail containing comments of all sorts.  In particular it has come to my
attention that some of the statements I made in that posting were not correct.
In the following I present corrected information supplied to me by an
employee of Cray, and some comments of my own.  Assuming that I get all the
mistakes worked out with this posting, I don't intend to continue my part
of this discussion further on the net, but would still be happy to receive
comments from other interested parties.  My apologies for the confusion I
introduced.
----------
>> you can say:
>> 	a = b * c
>> On a Cray you would have to say:
>> 	do i = 1, n
>> 		do j = 1, m
>> 			a(i,j) = b(i,j) * c(i,j)
>> 		enddo
>> 	enddo
>
>A subset of the Fortran-90 array syntax has been in CFT77 since
>version 1.0 (about 1986).  If you would like to express your problems
>this way, there is nothing stopping you!

I was unaware that CFT77 provides partial Fortran 90 support, and
thanks for setting me straight.  I am sure we will all benefit by the
widespread availability of this enhanced language definition on all
platforms from workstations to supercomputers.  May that day come quickly!

>Many believe that Unicos has excellent development tools for both Fortran and
>C.  Both are far beyond what a 'normal' Unix system would provide.  Many are
>X window based (cdbx, and the *view tools for example).  

Having not used Unicos, I was unaware of this also.  So far all Cray systems
I've used have run either COS or CTSS.  I am aware that there is a growing
movement among supercomputing facilities to move to Unicos, and it sounds
like we will benefit from this trend.

>Also comparing a 'stand-alone' connection machine use (they aren't *really*
>timeshared...) with a highly utilized Cray system is an Apples to Oranges
>comparison.  

I'm not sure what the author means by *really* in this case.  I do know that
the CM-2 I use every day comes in two "halves."  During business hours, one
half is single tasking and the other provides what I consider to be genuine
timesharing capability.  By this I mean that several users can use this half
of the machine at the same time, each running programs and obtaining results.
Obviously it isn't as fast as when you operate in single-task mode, but that
is only natural.  My understanding is that this multiuser capability is
accomplished by swapping users' sessions in and out, rather than making them
share resources.  I believe this is by definition timesharing, as opposed to
multitasking.  Both halves run batch queues at night, but you can slip in
interactive sessions between batch jobs if you're sly.

>Stand-alone Cray systems can provide excellent turnaround to
>a single user as well.  If you have DARPA to provide you with an expensive
>play toy, a CM can make sense.  Most of our customers are not so lucky.

No doubt.  On the other hand, neither am I.  The CM-2 I have been using
is shared by a large number of researchers.  The user load is similar
to what I have experienced when using Crays at US-based supercomputer
centers.  My claim is that the CM has provided the most productive research
environment that I have used to date.  I obviously can't speak for anyone
else, but it works for me.

>> Thinking Machines provides direct and easy to use support for X; so much so
>> that you can render images on their high speed graphics device or in an
>> X window on a networked workstation _WITHOUT MODIFYING A SINGLE LINE OF CODE_.
>> Ask Cray to do that for you!
>
>This is also an interesting comment.  X has been available on Unicos since
>version 3.0 (1987) - first X10, and currently X11 (R4).  Motif and OpenLook
>are also readily available.  Again what is the problem? 

Again, I have not used Unicos, so was not aware of this capability.  Sounds
like all Cray users would be a lot better off if all Crays ran Unicos.

To wrap up my contribution to this thread, let me say that it was certainly
not my intention to understate the capabilities of Cray products.  I can only comment on
my own experiences, and all of these things were outside of my experience.

Furthermore, parallelism has many facets, ranging from coarse-grained
(multiprocessors like the Y-MP and others) to fine-grained (like the CM and
others).  Each kind of system suits a particular class of problems, and those
of us doing research in parallel computing are continually finding new ways
to use systems of both descriptions.  I think the point that I and several
others have been trying to make is that massively parallel (MP) machines in
particular have a great deal more capability to offer than most people realize.

When I first sat down and read the technical description of the CM and its
SIMD programming model I thought "well, that's interesting, but it would
be better if ..."  Then I sat down and started programming it, and I realized,
much to my own surprise, that a very large percentage of the things I am
interested in are very naturally and efficiently expressed and solved in the
SIMD model.  In fact, nothing that I am interested in is more easily or
naturally expressed in any other paradigm.

MIMD machines in particular (I have been reminded of the origins of this
thread) may provide very significant per-processor performance, that is
true.  But then you have to introduce all kinds of synchronization code to
manage the distribution of intermediate results.  This introduces delay (the
overhead referred to in numerous prior postings) and, what I consider to be
more important, added complexity.  Additionally, to squeeze good
performance out of such systems throughout the course of an application's
execution, it is often necessary to dynamically repartition the problem to
balance the load, so that each processor finishes its share in a comparable
amount of time.  This adds even more artificial complexity to the problem.

It seems to me that a lot of the discussion about supercomputer performance
refers to performance on certain canned applications like LINPACK or NASTRAN.
While these do reflect the needs of a large body of users, there is another
group of us who NEVER run canned programs, but instead spend the vast majority
of our time writing our own code to solve our own problems.  For those of us
in that category, M/GFLOPS may very legitimately take a back seat to more
important and less discussed issues like code complexity and reliability.
NASTRAN may have been debugged in 1965, but what about the nuclear reactor
simulator you are writing yourself?  For such people, the expressive power
and simplicity of SIMD programming is not a matter of convenience, but of
necessity.  That such machines can also yield superior performance on a
large class of problems is icing on the cake.

To each his own.

Geoff Furnish
furnish@solar.ph.utexas.edu