Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rice!hsdndev!think.com!barmar From: barmar@think.com (Barry Margolin) Newsgroups: comp.lang.c Subject: Re: Execution time bottleneck: How to speed up execution? Keywords: Execution time, would like to speed up. Alternatives? Message-ID: <1991Feb12.233522.5195@Think.COM> Date: 12 Feb 91 23:35:22 GMT References: <21658@yunexus.YorkU.CA> Sender: news@Think.COM Organization: Thinking Machines Corporation, Cambridge MA, USA Lines: 77 In article <21658@yunexus.YorkU.CA> racine@yunexus.yorku.ca (Jeff Racine) writes: > for (j=1; j <= n; j++) { > sum = 0.0; > for (i=1; i<= n; i++) { > sum += exp( con * (x[i]-x[j]) * (x[i]-x[j]) ); > } > a[j] = k*sum; > } First a question about the correctness: do you really want to loop from 1 to n? In C, if the array has n elements, the indices run from 0 to n-1, not from 1 to n. >This part of the algorithmn is a bottleneck. I want to make this as >quick as possible. What are my options? I have heard that assembler >would be faster. How much faster would assembler be? Would it be worth >the effort? Are there other alternatives that you can see? Assembler probably won't help much, unless you're already a pretty good assembler programmer. However, it would probably be helpful if you looked at the assembler code produced by your C compiler, to see whether there are ways you could recode the C so that it would produce more optimal compiled code. You might want to look at the book "Writing Efficient Programs" by Jon Louis Bentley (he's the same person who went on to write the "Programming Pearls" column in CACM (these columns have since been collected into a couple of books). It describes many general techniques for determining why a program is slow and improving it. As for your code, there are a couple of minor changes that may help a little (but small improvements inside inner loops can have noticeable effects). They assume that your C compiler doesn't have a very good optimizer (generally a pretty safe assumption). I'll reference the efficiency rules that Bentley describes. Register Allocation. You reference i and j very frequently, so they should be declared "register". This may not be necessary, though; I expect many compilers default to putting loop index variables into registers. Loop Rule 1 -- Code Motion Out of Loops. Your inner loop refers to x[j] each time through the loop, although j doesn't change during that loop. It would probably be better to do "register x_sub_j_temp = x[j];" before the inner loop, and then access x_sub_j_temp in the inner loop instead of x[j]. You've already applied this rule in your assignments to a[j], by the way. A slower, but equivalent, way to have written your code would have been for the inner loop to contain: a[j] += k * exp (...); Expression Rule 3 -- Common Subexpression Elimination. You perform the subtraction of the two elements twice when squaring. Instead, store the result of the subtraction into a temporary, and square that: temp_difference = x[i] - x[j]; sum += exp (con * temp_difference * temp_difference); Many compilers are able to do this optimization themselves (it's one of the oldest and most well known), so it may not improve efficiency (but you might need to use the "register" declaration to get exactly the same code); look at the compiler's assembler output to see. However, I personally find this good style even if it doesn't change run time; the originally code forced me to examine carefully to be sure that both subtractions were the same (it could have been (x[i]-x[j])*(x[j]-x[i]), which looks almost the same); by simplifying the expression you make it easier for the human reader to discern the structure of the formula (it would also help to use a more descriptive name than "temp_difference", especially if this difference has a name in the application domain). -- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar