Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!think.com!snorkelwacker.mit.edu!bloom-beacon!eru!kth.se!sunic!mcsun!corton!mirsa!zig.inria.fr!furnish
From: furnish@zig.inria.fr (Geoffrey Furnish)
Newsgroups: comp.sys.super
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <11806@mirsa.inria.fr>
Date: 17 Jun 91 08:18:07 GMT
References: <1991Jun6.144903.20456@chpc.utexas.edu> <1991Jun06.205144.22611@ariel.unm.edu> <1991Jun10.144354.695@chpc.utexas.edu> <1991Jun10.235501.7039@ariel.unm.edu>
Sender: news@mirsa.inria.fr
Organization: INRIA, Sophia-Antipolis (Fr)
Lines: 128
Nntp-Posting-Host: zig.inria.fr

This is a followup to my prior posting in which I related my experiences in using the CM-2. Since that posting I have received a deluge of private mail containing comments of all sorts. In particular it has come to my attention that some of the statements I made in that posting were not correct. In the following I present corrected information supplied to me by an employee of Cray, and some comments of my own. Assuming that I get all the mistakes worked out with this posting, I don't intend to continue my part of this discussion further on the net, but would still be happy to receive comments from other interested parties. My apologies for the confusion I introduced.

----------

>> you can say:
>>     a = b * c
>> On a Cray you would have to say:
>>     do i = 1, n
>>         do j = 1, m
>>             a(i,j) = b(i,j) * c(i,j)
>>         enddo
>>     enddo
>
>A subset of the Fortran-90 array syntax has been in CFT77 since
>version 1.0 (about 1986). If you would like to express your problems
>this way, there is nothing stopping you!

I was unaware that CFT77 provides partial Fortran 90 support; thanks for setting me straight. I am sure we will all benefit from the widespread availability of this enhanced language definition on all platforms from workstations to supercomputers. May that day come quickly!

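For concreteness, here is a minimal sketch of the two styles side by side, in Fortran 90 free-form syntax. The program name, the array sizes, and the fill values are made up for illustration. It assumes a compiler with whole-array assignment and the ALL intrinsic; I have not checked how much of this the CFT77 subset mentioned above actually accepts.

```fortran
program array_syntax_demo
  ! Hypothetical demo: names, sizes, and values are illustrative only.
  implicit none
  integer, parameter :: n = 3, m = 2
  real :: a(n,m), a_loop(n,m), b(n,m), c(n,m)
  integer :: i, j

  ! Fill the operands (scalar-to-array assignment is itself array syntax).
  b = 2.0
  c = 3.0

  ! Array syntax: one statement, elementwise product over the whole array.
  a = b * c

  ! Explicit loops: the same elementwise product, spelled out.
  ! The inner loop runs over the first index, which is contiguous in Fortran.
  do j = 1, m
    do i = 1, n
      a_loop(i,j) = b(i,j) * c(i,j)
    enddo
  enddo

  ! Report whether the two forms produced identical results.
  print *, 'results agree: ', all(a == a_loop)
end program array_syntax_demo
```

Both forms compute the same elementwise product; the array form simply leaves the iteration order to the compiler.
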
>Many believe that Unicos has excellent development tools for both Fortran and
>C. Both are far beyond what a 'normal' Unix system would provide. Many are
>X window based (cdbx, and the *view tools for example).

Having not used Unicos, I was unaware of this also. So far all Cray systems I've used have run either COS or CTSS. I am aware that there is a growing movement among supercomputing facilities to move to Unicos, and it sounds like we will benefit from this trend.

>Also comparing a 'stand-alone' connection machine use (they aren't *really*
>timeshared...) with a highly utilized Cray system is an Apples to Oranges
>comparison.

I'm not sure what the author means by *really* in this case. I do know that the CM-2 I use every day comes in two "halves." One half is single tasking during business hours, and the other provides what I consider to be genuine timesharing capability during business hours. By this I mean that several users can use this half of the machine at once, running programs and obtaining results simultaneously. Obviously it isn't as fast as when you operate in single task mode, but that is only natural. My understanding is that this multi-user capability is accomplished by swapping users' sessions in and out, rather than making them share resources. I believe this is by definition timesharing, as opposed to multitasking. Both halves run batch queues at night, but you can slip in interactive sessions between batch jobs if you're sly.

>Stand-alone Cray systems can provide excellent turnaround to
>a single user as well. If you have DARPA to provide you with an expensive
>play toy, a CM can make sense. Most of our customers are not so lucky.

No doubt. On the other hand, neither am I. The CM-2 I have been using is shared by a large number of researchers. The user load is similar to what I have experienced when using Crays at US-based supercomputer centers.
My claim is that the CM has provided the most productive research environment that I have used to date. I obviously can't speak for anyone else, but it works for me.

>> Thinking Machines provides direct and easy to use support for X; so much so
>> that you can render images on their high speed graphics device or in an
>> X window on a networkd workstation _WITHOUT MODIFYING A SINGLE LINE OF CODE_.
>> Ask Cray to do that for you!
>
>This is also an interesting comment. X has been available on Unicos since
>version 3.0 (1987) - first X10, and currently X11 (R4). Motif and OpenLook
>are also readily available. Again what is the problem?

Again, I have not used Unicos, so I was not aware of this capability. Sounds like all Cray users would be a lot better off if all Crays ran Unicos.

To wrap up my contribution to this thread, let me say that it was certainly not my intention to under-represent Cray products. I can only comment on my own experiences, and all of these things were outside of my experience.

Furthermore, parallelism has many facets, and runs a wide gamut from coarse grained (multiprocessors like Y-MPs and others) to fine grained (like the CM and others). Each system has applicability to a class of problems, and those of us doing research with parallel computing are continually finding new ways to use systems of both descriptions.

I think the point that I and several others have been trying to make is that MP in particular has a whole lot more capability to offer than most people realize. When I first sat down and read the technical description of the CM and its SIMD programming model I thought "well, that's interesting, but it would be better if ..." Then I sat down and started programming it. And I realized, much to my own surprise, that a very large percentage of the things I am interested in are very naturally and efficiently expressed and solved in the SIMD model.
In fact, there is nothing that I am interested in which is more easily/naturally expressed using any other paradigm.

MIMD machines in particular (I have been reminded of the origins of this thread) may provide very significant per-processor performance, that is true. But then you have to introduce all kinds of synchronization code to manage the dispersal of intermediate results. This introduces time delay (the overhead referred to in numerous prior postings) and, what I consider more important, added complexity. Additionally, in order to squeeze good performance out of such systems throughout the course of an application's execution, it is often necessary to dynamically repartition the problem to balance the load on each processor so that they each perform their share in a comparable amount of time. This adds even more artificial complexity to the problem.

It seems to me that a lot of the discussion about supercomputer performance is made in reference to performance on certain canned applications like Linpack or NASTRAN or the like. While these things do reflect the needs of a large body of users, there is another group of us who NEVER run canned programs, but rather spend the vast majority of our time writing our own code to solve our own problems. For those of us in that category, M/GFLOPS may very legitimately take a back seat to more important and less discussed issues like code complexity and reliability. NASTRAN may have been debugged in 1965, but what about the nuclear reactor simulator you are writing yourself? For these people, the expressive capability and simplicity of SIMD programming is not a matter of convenience, but of necessity. That such machines can also yield superior performance for a large class of problems is icing on the cake.

To each his own.

Geoff Furnish
furnish@solar.ph.utexas.edu