Path: utzoo!mnetor!tmsoft!torsqnt!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!samsung!uunet!mcsun!ukc!strath-cs!expya!exua!JRowe From: JRowe@exua.exeter.ac.uk (John Rowe) Newsgroups: comp.lang.c Subject: Re: Execution time bottleneck: How to speed up execution? Message-ID: Date: 14 Feb 91 19:36:22 GMT References: <21658@yunexus.YorkU.CA> Sender: JRowe@exua.exeter.ac.uk Followup-To: comp.lang.c Organization: Computer Unit. - University of Exeter. UK Lines: 69 In-reply-to: racine@yunexus.yorku.ca's message of 11 Feb 91 15:17:37 GMT In article <21658@yunexus.YorkU.CA> racine@yunexus.yorku.ca (Jeff Racine) writes: Execution time query: I have the following nested loop in which I must sum (over all i), for each x[j], e raised to the power x[i]-x[j] squared (times a constant). for (j=1; j <= n; j++) { sum = 0.0; for (i=1; i<= n; i++) { sum += exp( con * (x[i]-x[j]) * (x[i]-x[j]) ); } a[j] = k*sum; } Jeff needs to speed it up. The C guys will tell you how to speed up the loops; being a physicist, ie really just a FORTRAN programmer, I would just point out to halve the number of them and maybe take out a multiply. With minimal recoding maybe something along the lines of: /* Just typed in - maybe typos */ foo = sqrt(con > 0 ? con : -con); for (j = 1; j <= n; ++j) { a[j] = 1.0; /* i = j case */ y[j] = x[j] * foo; } if ( con > 0.0 ) for (j = 1; j < n; ++j) for (i = j + 1; i <= n; ++i) { a[j] += foo = exp( (y[i]-y[j]) * (y[i]-y[j]) ); a[i] += foo; /* Two sums per exp */ } else for (j = 1; j < n; ++j) for (i = j + 1; i <= n; ++i) { a[j] += foo = exp( - (y[i]-y[j]) * (y[i]-y[j]) ); a[i] += foo; /* NB ^ only difference.*/ } for (j=1; j <= n; ++j) a[j] *= k; [pious platitudes about first improving the algorithm, taking advantage of symmetry, etc. omitted]. I don't know your machine but it's the halving of the number of exponentials that I would expect to give you the greatest gain, 50%. If you've got a modern, floating point orientated RISC machine (RS6000, i860??) floating point multiplies may be so fast it's not it may not be worth the hassle of the vector y and the if ( con < 0.0 ) bit. Any FORTRAN compiler would optimise out the (y[i]-y[j]) * (y[i]-y[j]) in its sleep, so should your C compiler. If not you could do it by hand as I see someone has sugested, but that's *probably* even less of a gain. I would guess that the question of c v assembler hangs upon how long your cpu takes to do a floating point multiply and exponential against the quality of your compiler's optimiser. Halving the number of exponentials is only a factor of two but it's zero effort and it's always there however you end up coding it. But I may have missed really something obvious as to why you can't do that; I had a late night this morning. Good luck and hope this helps. John Rowe Exeter University Computational Physics Group Exeter U.K.