Path: utzoo!attcan!uunet!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!uwm.edu!ogicse!emory!hubcap!mccalpin From: mccalpin@vax1.udel.edu (John D Mccalpin) Newsgroups: comp.parallel Subject: Elliptic Equation Solvers on CM-2 ? Keywords: Helmholtz equation, FFT, Poisson's equation, Connection Machine Message-ID: <9167@hubcap.clemson.edu> Date: 30 May 90 17:18:25 GMT Sender: fpst@hubcap.clemson.edu Lines: 52 Approved: parallel@hubcap.clemson.edu I am looking at porting a finite-difference model to the Connection Machine CM-2, but have not yet figured out how to get optimum performance out of the implicit equation solver. Most of the code is very straightforward and the port is essentially done, but at every time step I must solve a linear, constant-coefficient elliptic equation of the form: u + u - g^2*u = f(x,y) xx yy Subject to the boundary conditions: u = 0 on x=0, x=L, y=0, y=L (i.e. a square box!) (This equation is variously referred to as Helmholtz' equation or as a generalized Poisson equation.) On vector machines (Cyber 205, ETA-10, Cray Y/MP, IBM 3090/VF) I solve this by two-dimensional FFTs with a simple scaling by the (wavenumber-dependent) transfer function. Since the equation requires only a Fourier Sine series for solution, I pack the NxN grid into a N/4xN/4 complex array for the FFT's. (Note: I do not use 1-D FFT with parallel tridiagonals in the other direction because that is actually slower on the Cyber 205 and IBM 3090 because of the slow vector divides required in the tridiagonal solver. I have not timed this approach on the Cray.) For state-of-the-art calculations, N=512 to 1024, yielding a 128x128 or 256x256 complex FFT. While these are large enough for good performance on the Cray (over 1000 MFLOPS on 4 CPU's in the latter case), I expect it to be too small to get good performance on the CM-2. In this case, I don't think it appropriate to crank up the problem size on the CM-2 to get better parallelism -- after all, the 1024x1024 problem is already running at a virtual processor (VP) ratio of 16. What I really want is a very fast way to solve the 256x256 problem, or maybe the 512x512 problem (2048x2048 spatial grid). Possible approaches are: classical relaxation - Gauss-Jacobi & friends conjugate-gradient - w/ various preconditioners The truncated Neumann series preconditioners should be amenable to the SIMD approach. other direct solvers? Is anyone else out there working on this sort of problem? -- John D. McCalpin mccalpin@vax1.udel.edu Assistant Professor mccalpin@delocn.udel.edu College of Marine Studies, U. Del. mccalpin@scri1.scri.fsu.edu