Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!stanford.edu!shap@shasta.Stanford.EDU From: shap@shasta.Stanford.EDU (shap) Newsgroups: comp.lang.c Subject: Re: Some optimization results Message-ID: <186@shasta.Stanford.EDU> Date: 6 May 91 18:41:27 GMT References: <444@smds.UUCP> Organization: Stanford University Computer Systems Laboratory Lines: 90 In article <444@smds.UUCP> rh@smds.UUCP (Richard Harter) writes: > >What have we learned? First of all we learned that careful optimization >of plausible looking code can reduce execution time of the code in question >by a big factor, e.g. a factor of 6.5 in this case. Secondly the effect on >the total time is not nearly as signifigant (cost cut by 40%.) Thirdly >we have guidelines on code optimization: > >(a) Make arrays static or malloced; don't make them automatic. >(b) Use loop unrolling and count down loops. >(c) Avoid using conditional code within tight loops >(d) It's often better to run a loop to completion and test for > failure afterwards than it is to test with the loop. >(e) Bitwise operators can make a signifigant difference > >Note: The test machine was a SUN 4/110. The code presented has comments >removed and does not use the actual layout and naming conventions style. It's good when someone describes a compiler experiment at the level of detail you did. A postmortem may be interesting to some readers, so I am offering one. The compiler you were using explains everything. Of the 5 optimizations you did, only two are not visible to the compiler and really did need to be hand-generated. The compiler is not free to remove the branch test within the loop because it does not have enough information. Suppose that the array passed in to the function was at the end of memory. If the branch were deferred, the resulting program might cause an addressing exception, which would be incorrect code. The compiler must be conservative and leave the breanch alone. Similarly, the compiler in general has no way to know that you don't actually care about the magnitude of the check variable. It SHOULD have been smart enough to turn the equality comparison into a 1 or zero comparison, and this is a very interesting special case. You might be interested to know that the GNU compiler for the SPARC handles this case. The compiler should have been able to handle auto arrays fine, and should not have had any difficulty unrolling the final version of your loop. It might be an interesting experiment to check if once you remove the branch the compiler doesn't unroll it correctly. Your description didn't indicate that you tried this test, and on a conventional SPARC, doing the unrolling with that conditional branch in the picture doesn't help. Note that your unrolling took advantage of knowing about the fact that exceptions couldn't happen. Though in this particular case, the unrolling is hard to spot. Basically, the compiler has to notice that it can use the branch below the loop as an early termination for the unrolled loop. To do this, it has to know that you only care about the nonzeroness of the check variable. So, in understanding your conclusions, it's useful to look at what they tell us. I have taken the liberty of rewording them: >(a) Make arrays static or malloced; don't make them automatic. Don't bother. Get a serious compiler. >(b) Use loop unrolling and count down loops. In this specific case you were after early termination, which is a very very special case. Once again, you know that terminating after a few extra ops doesn't hurt, but the cmopiler cannot. In general, the compiler is probably pretty good at unrolling. More to the point, use the memcmp() or bcmp() routines. They are probably better suited to the problem because they are hand-tuned to the system at hand. In general, the compiler is probably good at unrolling. >(c) Avoid using conditional code within tight loops Good idea. This is something the compiler typically cannot help with much. >(d) It's often better to run a loop to completion and test for > failure afterwards than it is to test with the loop. Good idea, bearing in mind that you have to know that things like exceptinos won't get in the way. >(e) Bitwise operators can make a signifigant difference True, and well worth remembering, but in your case you simply needed a serious compiler. Jonathan Shapiro