Path: utzoo!utgpu!cunews!cognos!geovision!pt From: pt@geovision.gvc.com (Paul Tomblin) Newsgroups: comp.lang.c Subject: Re: Execution time bottleneck: How to speed up execution? Keywords: Execution time, would like to speed up. Alternatives? Message-ID: <1416@geovision.gvc.com> Date: 13 Feb 91 14:21:11 GMT References: <21658@yunexus.YorkU.CA> <1991Feb12.233522.5195@Think.COM> Reply-To: pt@geovision.gvc.com Organization: GeoVision Corp., Ottawa, Ontario Lines: 70 barmar@think.com (Barry Margolin) writes: >In article <21658@yunexus.YorkU.CA> racine@yunexus.yorku.ca (Jeff Racine) writes: >> for (j=1; j <= n; j++) { >> sum = 0.0; >> for (i=1; i<= n; i++) { >> sum += exp( con * (x[i]-x[j]) * (x[i]-x[j]) ); >> } >> a[j] = k*sum; >> } >First a question about the correctness: do you really want to loop from 1 >to n? In C, if the array has n elements, the indices run from 0 to n-1, >not from 1 to n. >>This part of the algorithmn is a bottleneck. I want to make this as >>quick as possible. What are my options? I have heard that assembler >>would be faster. How much faster would assembler be? Would it be worth >>the effort? Are there other alternatives that you can see? [Some good suggestions from Barry deleted] >Register Allocation. >Loop Rule 1 -- Code Motion Out of Loops. >Expression Rule 3 -- Common Subexpression Elimination. putting all the optimization I can think of off the top of my head, and assuming that he *really* meant 0 to n-1, I come up with: register x_ptr_t ptri, ptrj; register a_ptr_t ptra; register double sum, sub_expression; ptrj = x[0]; ptra = a[0]; ptr_to_end = x[n]; while (ptrj < ptr_to_end) { sum = 0.0; ptri = x[0]; while (ptri < ptr_to_end) { sub_expression = ptri * ptrj; sum += exp( con * sub_expression * sub_expression); ptri++; } ptra++ = k*sum; ptri++; } This combines MOST of the techniques that I learnt from an article in Microsoft System Journal talking about the optimizer in MS-C. In other words, if you use MS-C, you don't need to do most of these things. If you use pcc (the bundled compiler on Suns, Ultrix, on lots of other unix boxen), you DO need to do all this in critical loops. The techniques I used: Pointer unraveling? (I think that's what it's called): Converting array references to pointers eliminates a lot of calculations of array offsets on lousy optimizers Register allocation: Tell the compiler that the variables are going to be referenced a lot. If you have enough registers, it will help (again, only if your compiler isn't smart enough to decide on its own) Common subexpressions: Another easy one, but I don't think pcc does it. Anybody think of anything else? Did I make any major boo-boos? -- Paul Tomblin, Department of Redundancy Department. ! My employer does The Romanian Orphans Support Group needs your help, ! not stand by my Ask me for details. ! opinions.... pt@geovision.gvc.com or {cognos,uunet}!geovision!pt ! Me neither.