Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!wuarchive!uunet!stanford.edu!shap@shasta.Stanford.EDU
From: shap@shasta.Stanford.EDU (shap)
Newsgroups: comp.lang.c
Subject: Re: Some optimization results
Message-ID: <186@shasta.Stanford.EDU>
Date: 6 May 91 18:41:27 GMT
References: <444@smds.UUCP>
Organization: Stanford University Computer Systems Laboratory
Lines: 90

In article <444@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
>
>What have we learned?  First of all we learned that careful optimization
>of plausible looking code can reduce execution time of the code in question
>by a big factor, e.g. a factor of 6.5 in this case.  Secondly the effect on
>the total time is not nearly as signifigant (cost cut by 40%.)  Thirdly
>we have guidelines on code optimization:
>
>(a)	Make arrays static or malloced; don't make them automatic.
>(b)	Use loop unrolling and count down loops.
>(c)	Avoid using conditional code within tight loops
>(d)	It's often better to run a loop to completion and test for
>	failure afterwards than it is to test with the loop.
>(e)	Bitwise operators can make a signifigant difference
>
>Note: The test machine was a SUN 4/110.  The code presented has comments
>removed and does not use the actual layout and naming conventions style.

It's good when someone describes a compiler experiment at the level of
detail you did.  A postmortem may be interesting to some readers, so I
am offering one.

The compiler you were using explains everything.  Of the 5 optimizations
you did, only two are not visible to the compiler and really did need to be
hand-generated.

The compiler is not free to remove the branch test within the loop
because it does not have enough information.  Suppose that the
array passed in to the function was at the end of memory.  If the
branch were deferred, the resulting program might cause an addressing
exception, which would be incorrect code.  The compiler must be
conservative and leave the breanch alone.

Similarly, the compiler in general has no way to know that you don't actually
care about the magnitude of the check variable.  It SHOULD have been
smart enough to turn the equality comparison into a 1 or zero
comparison, and this is a very interesting special case.  You might be
interested to know that the GNU compiler for the SPARC handles this
case.

The compiler should have been able to handle auto arrays fine, and
should not have had any difficulty unrolling the final version of your
loop.  It might be an interesting experiment to check if once you remove
the branch the compiler doesn't unroll it correctly.  Your description
didn't indicate that you tried this test, and on a conventional SPARC,
doing the unrolling with that conditional branch in the picture doesn't
help.  Note that your unrolling took advantage of knowing about the fact
that exceptions couldn't happen.

Though in this particular case, the unrolling is hard to spot.
Basically, the compiler has to notice that it can use the branch below
the loop as an early termination for the unrolled loop.  To do this, it
has to know that you only care about the nonzeroness of the check
variable.

So, in understanding your conclusions, it's useful to look at what they
tell us.  I have taken the liberty of rewording them:

>(a)	Make arrays static or malloced; don't make them automatic.

Don't bother. Get a serious compiler.

>(b)	Use loop unrolling and count down loops.

In this specific case you were after early termination, which is a very very
special case. Once again, you know that terminating after a few extra
ops doesn't hurt, but the cmopiler cannot.  In general, the compiler is
probably pretty good at unrolling.

More to the point, use the memcmp() or bcmp() routines.  They are probably
better suited to the problem because they are hand-tuned to the system
at hand.  In general, the compiler is probably good at unrolling.

>(c)	Avoid using conditional code within tight loops

Good idea.  This is something the compiler typically cannot help with
much.

>(d)	It's often better to run a loop to completion and test for
>	failure afterwards than it is to test with the loop.

Good idea, bearing in mind that you have to know that things like
exceptinos won't get in the way.

>(e)	Bitwise operators can make a signifigant difference

True, and well worth remembering, but in your case you simply needed a
serious compiler.

Jonathan Shapiro