Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!gatech!hubcap!fouts
From: fouts@lemming. (Marty Fouts)
Newsgroups: comp.parallel
Subject: Re: Capabilities of VAST
Message-ID: <3846@hubcap.UUCP>
Date: 12 Dec 88 18:02:37 GMT
Sender: fpst@hubcap.UUCP
Lines: 56
Approved: parallel@hubcap.clemson.edu

In article <3840@hubcap.UUCP> mccalpin@loligo.fsu.edu (John McCalpin) writes:

   In article <3826@hubcap.UUCP> fouts@lemming. (Marty Fouts) writes:
   >Also, on a certain Navier-Stokes solver, Vast busily inserted
   >scatter/gather into various loops to cause large vectors to be made
   >available, which is A Good Thing on the ETA10.  Unfortunately, the
   >scatters and gathers pretty much trashed any chance of working set
   >locality which is A Bad Thing on the ETA10.  Adding directives to
   >prevent the scatter/gather overcame Vast's problem, as did adding
   >directives in the dense matrix inverter.  However, in both cases,
   >nontrivial amounts of expert human intervention were required.

   If the loops are accessing variables in a non-local manner, then the
   problem is with the programmer, not VAST.

The *problem* is *not* with the programmer, but with the universe the
programmer is required to operate in.  It is often necessary to write
algorithms which need to make two passes at data, one in the row
order, and one in the column order.  Independent of the language, one
of these passes will be "non-local".  This is the case with this
solver.

   >For some user communities Vast can be a good win, but if your
   >community contains a lot of vectorization expertise and the codes were
   >already well prepared, Vast can be a disaster.

   I don't know what you mean by this.  "Well prepared" for what?
   Perhaps a Cray?

An example of "Well prepared" is the code LES (Large Eddy Simulator)
developed here at Ames.  Originally written for a Cyber 205, it
attempts to alleviate its need to pass through an array in both row
and column order by doing a transpose between the passes.
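
For readers who want the row/column issue and the transpose trick in
concrete form, here is a rough sketch.  It is not the LES code (none
of that appears in this thread), and every function name in it is made
up for illustration.  It assumes a matrix stored row-major in a flat
list, so the row pass is stride-1 and the column pass is not; in a
column-major language the roles simply swap.  Python will not show the
speed difference, only the access pattern.

```python
# Illustrative sketch only: hypothetical names, not taken from LES.
# A matrix is stored row-major in a flat list, so element (i, j)
# lives at a[i * ncols + j].

def row_pass(a, nrows, ncols):
    """Running sum along each row: stride-1 access, the local direction."""
    out = list(a)
    for i in range(nrows):
        base = i * ncols
        for j in range(1, ncols):
            out[base + j] += out[base + j - 1]
    return out

def column_pass_strided(a, nrows, ncols):
    """Running sum down each column: stride-ncols access, the non-local pass."""
    out = list(a)
    for j in range(ncols):
        for i in range(1, nrows):
            out[i * ncols + j] += out[(i - 1) * ncols + j]
    return out

def transpose(a, nrows, ncols):
    """Return the ncols x nrows transpose, also stored row-major."""
    t = [0] * (nrows * ncols)
    for i in range(nrows):
        for j in range(ncols):
            t[j * nrows + i] = a[i * ncols + j]
    return t

def column_pass_via_transpose(a, nrows, ncols):
    """Same result as column_pass_strided, but the sweep itself is stride-1."""
    t = transpose(a, nrows, ncols)     # columns of a become rows of t
    t = row_pass(t, ncols, nrows)      # local sweep over the transposed data
    return transpose(t, ncols, nrows)  # restore the original layout

if __name__ == "__main__":
    nrows, ncols = 3, 4
    a = list(range(nrows * ncols))
    same = column_pass_via_transpose(a, nrows, ncols) == column_pass_strided(a, nrows, ncols)
    print("results agree:", same)
```

The non-local access does not disappear; it is concentrated in
transpose(), which can then be tuned separately for each kind of
machine while the sweeps themselves stay long and contiguous.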

The transpose helps both the virtual memory systems and the Cray
machines, by overcoming known problems in both kinds of machines.
There are two separate versions of the transpose, one which is
efficient on a virtual memory system, and one which is efficient on a
Cray.  Further, loops are designed to compile to long vector lengths,
as is required to maximize efficiency on an ETA10.

To give such a code to Vast is to court disaster, since anything short
of hand rewriting the assembly code from inner loops is going to make
the program run slower.  The point is that the closer the original
code comes to being optimal for the machine, the more likely Vast is
to produce a less optimal result.

Marty
--
+-+-+-+     I don't know who I am, why should you?     +-+-+-+
       |        fouts@lemming.nas.nasa.gov            |
       |        ...!ames!orville!fouts                |
       |     Never attribute to malice what can be    |
+-+-+-+     explained by incompetence.                 +-+-+-+