Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!corton!mirsa!zig.inria.fr!furnish
From: furnish@zig.inria.fr (Geoffrey Furnish)
Newsgroups: comp.sys.super
Subject: Re: Massively Parallel LINPACK on the Intel Touchstone Delta machine
Message-ID: <11771@mirsa.inria.fr>
Date: 13 Jun 91 12:09:46 GMT
References: <1991Jun5.120653.7852@hubcap.clemson.edu> <1991Jun05.185818.1071@convex.com> <1991Jun6.144903.20456@chpc.utexas.edu> <1991Jun06.205144.22611@ariel.unm.edu> <1991Jun10.144354.695@chpc.utexas.edu>
Sender: news@mirsa.inria.fr
Organization: INRIA, Sophia-Antipolis (Fr)
Lines: 163
Nntp-Posting-Host: zig.inria.fr

In article <1991Jun10.144354.695@chpc.utexas.edu>, gary@chpc.utexas.edu (Gary Smith) writes:
> In article <1991Jun06.205144.22611@ariel.unm.edu>, john@spectre.unm.edu (John Prentice) writes:
> |> In article <1991Jun6.144903.20456@chpc.utexas.edu> gary@chpc.utexas.edu (Gary Smith) writes:
> |>
> |> >Again, it's time for the advocates of the promise of massive parallelism
> |> >to acknowledge Snyder's "corollary of modest potential."
> |> >
> |>
> |> So what would you suggest as an alternative?  By this analysis, any way
> |> you cut it, a serial processor will still take p times longer to do the
> |> same problem (of course, you have ignored overhead, but that works in
> |> favor of your argument).  If I can do it 64,000 times faster on a CM-2
> |> and I don't have any choice but to do the problem, then I am going to
> |> use the CM-2.  The alternative is to just not do the problem.
> |>
> |> You are both right and wrong about what the goal is in scientific
> |> computing.  For many applications, the goal isn't to run bigger problems,
> |> it is to make current ones less expensive.
>
> Using Speedup(f,p) = 1/[(1-f)+(f/p)], with f being the fraction of code
> parallelized and p being the number of processors, and unrealistically
> assuming no overhead, a speedup of 64000 using 65536 processors requires
> that the problem be 99.9999634% parallel.  How many problems do you
> know of that are that parallel?
>
> ---Gary
>
> Randolph Gary Smith                       Internet: gary@chpc.utexas.edu
> Systems Group                             Phonenet: (512) 471-2411
> Center for High Performance Computing     Snailnet: 10100 Burnet Road
> The University of Texas System                      Austin, Texas 78758-4497

I have been reading this thread for a while and am finding it very
interesting, especially since I am a graduate student in physics at UT
Austin and am quite familiar with the center you are affiliated with.
I am tempted to infer that the pessimism you are expressing reflects
the attitudes behind the UT System procurement plans.  If so, that is
extremely unfortunate for researchers in the UT system.

Although I am not familiar with the theoretical results you are citing,
and indeed am quite interested in what they imply, there are a few
things which are clear to me.

First of all, I have been spending this spring doing serious scientific
computing on a Connection Machine 2 (located in France).  By this point
I have worked with it extensively for several months, and on two
completely different sorts of problems.  One of these is the time
advancement of a field subject to a nasty nonlinear PDE.  The CM with
its massively parallel architecture is _EXTREMELY_ well suited for this
problem, and the results I have obtained are absolutely mind-boggling.
I have not computed MIPS or MFLOPS for this problem, but I do know that
the throughput is many, many times better than I have ever seen on any
computer before in my life.

Secondly, I am not a greenhorn.  I spend lots of computing time on
high-dollar Crays out at the National Energy Research Supercomputing
Center's facilities.
I also occasionally use the facilities at the UT Center for High
Performance Computing.  And I can tell you now from personal experience
that for some types of problems, comparing a Cray to a CM-2 is like
comparing a tortoise to a cheetah.  That is not a theoretical analysis.
It is an "experimental" fact/observation.

Additionally, your last line above asks us to consider a speedup of
64000, and then you presumably show how ridiculous this is.  Of course
it is ridiculous.  No one needs to see a speedup of 64000 to justify
the cost of an MP architecture machine.  One only needs to see a
speedup which offsets the cost differential in the prices of the
computers in question.  It's not clear to me that a CM-2 is much
pricier than a Cray in the first place, yet it is _MUCH_ faster for at
least some kinds of problems.  The speedup I see is more like two
orders of magnitude, not 4.5.

You further assert/imply that no problem could be 99.999...% parallel.
I think this shows that you really don't understand what you're talking
about.  In my field advancement problem, it is _ALL_ parallel.  Period.
End of discussion.  Even the real-time display of results is
accomplished via a parallel projection function.  The only part of the
code which is not parallel computation is control flow.  And that takes
essentially zero time, since there are no decision branchings in the
solution algorithm.  There is also history keeping, but that relates to
i/o, not computer architecture.  The point is, Danny Hillis
(president/founder of Thinking Machines, Inc.) is right: many problems
are _inherently_ parallel, and trying to solve them with serial
architecture computers introduces artificial and unnecessary
complexity.

Which brings me to my last point.  You seem to be completely oblivious
to the whole issue of ease of programming.  The CM comes with a slew of
languages which implement parallel programming operations in simple and
easy to understand ways, syntax, etc.
They have a compiler they call C* which consists of very reasonable
extensions to ANSI C to provide facilities for expressing parallel
constructs and communication operations.  Substantial efforts are
underway to standardize this language with other vendors of MP
hardware.  They also have an implementation of FORTRAN 90 which
provides direct support for array processing.

The importance of this is that these products make programming in
parallel _MUCH EASIER_ than programming for serial architecture
machines like Crays.  For example, elemental array arithmetic requires
no subscripts.  On the CM you can say:

        a = b * c

On a Cray you would have to say:

        do i = 1, n
           do j = 1, m
              a(i,j) = b(i,j) * c(i,j)
           enddo
        enddo

Big deal?  Well, it is when you try to go and enhance your program in
some way.  In my case I was able to take my parallel program on the CM
and convert it from a code which solved a field in two dimensions to
one which solved it in three dimensions, by changing 4 lines of code
which related to the specification of the arrays, taking a total of
about 30 minutes of my time.  In a language written for a serial
machine like a CRAY, etc., I'd have had to change every single set of
double loops into triple loops, and add a subscript to every single
array reference in the program.  Something like 3 or 4 hundred places.
In my book, the opportunity for coding errors goes as the exponential
of the number of changes you make.  On the CM, using a parallel version
of C, I only had to change the data declarations.  The actual solution
algorithm was not even touched.  On a CRAY I'd have spent a couple of
weeks making the necessary mods to my sources, and then I'd have had to
worry for months about possible coding errors.

Furthermore, using the parallel C on the CM, my solution algorithm is
actually _COMPLETELY INDEPENDENT OF SYSTEM DIMENSIONALITY_.  To make
the code solve this field in 4 or 5 or n dimensions requires precisely
zero modifications to the source code algorithm.
I would have to change the data decls, but again, that's four lines of
code.

The advantages of massive parallelism are many and great.  You should
endeavor to try it before hollering about what a farce it is.  In my
book there are two ways to benchmark computers.  One is theoretical
computing speed like MFLOPS, etc.  In this department the CM completely
overwhelms the Cray for certain classes of programs.  The other is the
pragmatic issue of how long it takes to get a trustworthy program up
and running, wall time to completion, ease of obtaining graphics, and
the like.  Again, using parallel expression, on problems well suited
for it, the CM beats the Cray hands down in the "reliable programming"
department.

From the standpoint of scientific visualization, the best tools I've
ever seen run on the CM.  One of the reasons I don't use UT CHPC more
is the infuriating difficulty of getting graphical output.  In
contrast, Thinking Machines provides direct and easy to use support for
X; so much so that you can render images on their high speed graphics
device or in an X window on a networked workstation _WITHOUT MODIFYING
A SINGLE LINE OF CODE_.  Ask Cray to do that for you!

Is massive parallelism the solution to all mankind's computing needs?
Probably not.  I am sure the theoretical concerns you have cited have
their domain of applicability.  But I am also certain that when I want
to get work done, I can get it done faster, more reliably and with less
effort on a CM-2 than I can on a Cray.  That's just the way it is.

For the benefit of the UT scientific research community and beyond, I
urge you and others who share your skepticism to reach a little beyond
the confines of Cray's marketing grasp, and take a more informed look
at what MP has to offer.  I for one have been extremely impressed by
what I found under the hood of the CM-2.

Geoffrey Furnish
furnish@solar.ph.utexas.edu
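
The Speedup(f,p) formula quoted at the top of this post can be checked
numerically.  What follows is only a minimal sketch in Python, assuming
zero overhead exactly as the quoted analysis does; the function names
amdahl_speedup and required_parallel_fraction are made up for
illustration and appear nowhere in the thread.

```python
def amdahl_speedup(f, p):
    """Speedup(f,p) = 1/[(1-f)+(f/p)] for parallel fraction f on p processors.

    Assumes no communication or control overhead, as in the quoted formula.
    """
    return 1.0 / ((1.0 - f) + f / p)


def required_parallel_fraction(s, p):
    """Invert the formula: the fraction f needed to reach speedup s on p processors.

    From 1/s = 1 - f*(1 - 1/p), so f = (1 - 1/s) / (1 - 1/p).
    Only meaningful for 1 <= s <= p and p > 1.
    """
    if p <= 1 or not (1.0 <= s <= p):
        raise ValueError("need p > 1 and 1 <= s <= p")
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    p = 65536  # processor count used in the quoted example
    for s in (64000, 100):  # the quoted target, and "two orders of magnitude"
        f = required_parallel_fraction(s, p)
        print(f"speedup {s:>6} on {p} processors needs f = {100.0 * f:.7f}%")
```

With p = 65536, a speedup of 64000 needs f of about 0.9999996, which
matches the 99.9999634% figure quoted, whereas a speedup of 100 needs f
of only about 0.99.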