Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!know!news.cs.indiana.edu!ariel.unm.edu!ghostwheel.unm.edu!john
From: john@ghostwheel.unm.edu (John Prentice)
Newsgroups: comp.lang.fortran
Subject: Re: Missing the whole point (the Fortran vs. C debate)
Message-ID: <1990Dec6.193644.12920@ariel.unm.edu>
Date: 6 Dec 90 19:36:44 GMT
References: <28548@usc> <185951.1857@timbuk.cray.com> <1990Dec5.182145.2639@ariel.unm.edu> <9424@ncar.ucar.edu>
Sender: news@ariel.unm.edu (USENET News System)
Organization: University of New Mexico Math Dept., Albuquerque, NM
Lines: 123

In article <9424@ncar.ucar.edu> morreale@bierstadt.scd.ucar.edu (Peter Morreale) writes:
>In article <1990Dec5.182145.2639@ariel.unm.edu>, john@ghostwheel.unm.edu (John Prentice) writes:
>>
>> Of course, Cray Fortran has had constructs (the CDIR directives) for telling
>> the compiler to vectorize a loop since the beginning. However, in general
>> I agree.
>
> The Cray Fortran compilers will vectorize *every* loop (which meets
> vectorization criteria) by default. The programmer doesn't need to
> make any modifications to his code. (although most do to obtain
> increased performance, but non-portable constructs are not used or
> needed)
>

This is true, but the problem is the vectorization criteria. The Cray
compiler is much better at sensing when a loop is vectorizable than it
used to be, but one can still construct cases where it cannot resolve
what appear to it to be vector dependencies but in fact are not. That
is why Cray provides the CDIR directives in the first place.

An interesting flip side on the Cray is the short loop. Often you need
to inhibit vectorization because its overhead exceeds the savings. If
you write a loop such as

      do 10 i=1,n

the compiler has no way to know that n is small and that the loop
should not be vectorized. You have to go in and tell it what to do by
hand.
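
To make the two cases above concrete, here is a rough sketch in
fixed-form Fortran. The routine names accum and twice and the index
array idx are made up for illustration. The directive spellings (IVDEP,
NOVECTOR, VECTOR) are given from memory of the CFT77 documentation and
should be treated as assumptions to check against the compiler manual.

```fortran
c     Sketch only.  Directive spellings are assumptions recalled from
c     the CFT77 documentation; verify them before relying on them.
      subroutine accum(a, b, idx, n)
      integer n, i, idx(n)
      real a(n), b(n)
c     The compiler cannot prove that idx holds no repeated values,
c     so it assumes a dependence.  IVDEP asserts there is none.
CDIR$ IVDEP
      do 10 i = 1, n
         a(idx(i)) = a(idx(i)) + b(i)
   10 continue
      return
      end

      subroutine twice(c, n)
      integer n, i
      real c(n)
c     Only the programmer knows that n is small here, so
c     vectorization is switched off for this loop and back on after.
CDIR$ NOVECTOR
      do 20 i = 1, n
         c(i) = 2.0*c(i)
   20 continue
CDIR$ VECTOR
      return
      end
```

Each directive line begins with C in column 1, so a compiler that does
not recognize it should read it as an ordinary comment. Note that the
IVDEP assertion is only safe if idx really has no repeated entries;
otherwise the vectorized loop can give wrong answers.
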
The Convex compiler is a lot better at vectorizing than the Cray one,
by the way. It can vectorize nested do-loops, for example, something
Cray has never been able to do. But Cray has never been famous for its
software.

>> With regard to expressing parallelism, the people at Myrias had
>> the easiest expression in Fortran that I have encountered yet. If you
>> wanted a parallel do-loop, you said pardo instead of do. That was it. They
>
> Sounds like a very non-portable construct. The Cray method of
> obtaining parallelism is to add directives which appear as Fortran
> comment cards. The directives are interpreted by source code
> analyzers and translated into system calls.
>

No argument, the Myrias approach is not portable, but it is easy (of
course, getting it to run efficiently may not be). We handled the
portability problem by using the C preprocessor to select either do or
pardo depending on the target computer.

>>
>> I don't know how Cray does parallel constructs. But I hardly know anyone
>> who tries to do serious parallel programming on the Cray. Unless you are
>> among the chosen few who can get dedicated Cray time, it is not cost effective.
>
> Hummm... I do question this statement. If I am executing a code
> on a Cray which only runs for a few minutes, you bet, it's not cost
> effective.
>
> How about ocean and climate models which run for literally
> *hundreds* of Cray CPU hours? (say a thousand wallclock hours) If
> I can reduce the turnaround time by a factor of 3 or 4, 6, is it
> worth it? If I can, perhaps I can increase the resolution of the
> model in the first place. Does better science result from
> increased resolution? In addition, I get 64bit results on the
> Cray for every calculation in single precision mode. Is this
> important?
>
> In the Cray, I can get subroutines, and/or do loop iterations
> passed across multiple CPUs.
>

I have no quarrel with the goal of using parallelism to reduce any
measure of the time required by a calculation. It has just been my
experience (and that of my colleagues at Sandia) that you don't get
there using a Cray unless the system is idle but for your job. I do
know of people doing it at Los Alamos and getting good results using
the YMP, but again, these people virtually own those systems. My point
is not whether there is an advantage to exploiting parallelism (quite
the opposite!), it is whether the Cray is the system to do it on.

By the way, our applications take hundreds of hours of Cray time also,
and our limitation is not wall clock time, it is money. Even using
cheap DOE lab Cray computer time, these calculations often cost us
$50,000. Cray parallelism has usually not helped there, because the
only cost advantage it offers is a reduction of the memory integral.
We get charged per processor, so if we have N processors and the wall
clock time is now N times less (which it won't be, obviously), the
summed CPU time is the same. All that has been saved is the memory
integral, since it is a shared memory system. However, on a crowded
system we have typically not been able to get all the processors at
one time, so we lose out there too.

The final point I would make is that the limitations on resolution due
to unavailability of memory and disk are just as serious as those due
to the time it takes to run a calculation. Our finite difference codes
use meshes with many millions of cells, each carrying 20 or so
variables. You don't have to dump very many cycles before you exceed
the available disk. This is a problem facing all big computing that has
not been adequately addressed.

>> We tried it on the Cray 2 and on the YMP a couple years ago. We were rarely
>> able to get all the processors at one time. Also, you don't have very
>> many processors.
>
> So? On a Connection machine with 64k processors, you only get the
> parallel region of the code executed on those processors.
> The serial portion of the code is executed on a front-end. For a
> highly parallel code, you get good results, for a "typical" user
> code, you get front-end speeds.
>

Again, no question about it. However, the sorts of calculations you
quoted earlier (weather, etc.) ARE highly parallelizable. This is the
same argument people use against vectorization, yet I don't see any
massive rush from the scientific community to abandon it.

For the "typical" user code, yes, you do get front-end speeds. But
given that workstations give floating point performance within a
factor of 2 to 5 of a single processor YMP (for non-vectorized code,
which the "typical" user code is), why does the "typical" user need a
Cray? The Cray is dynamite for the really big calculation that
vectorizes like crazy or that requires tons of memory, but that is the
exception, not the rule, for "typical" users (who after all are most of
the people out there). The old comment we used to make at Los Alamos
was that 10% of the people in the lab used 90% of the computing
resources.

John Prentice
Amparo Corporation
Albuquerque, NM
john@unmfys.unm.edu