Path: utzoo!attcan!uunet!stealth.acf.nyu.edu!brnstnd
From: brnstnd@stealth.acf.nyu.edu
Newsgroups: comp.lang.misc
Subject: Re: Common subexpression optimization
Message-ID: <838.18:06:33@stealth.acf.nyu.edu>
Date: 8 Feb 90 00:06:34 GMT
References: <4561@scolex.sco.COM> <14214@lambda.UUCP> <PCG.90Jan29131938@rupert.cs.aber.ac.uk> <2217@sunset.MATH.UCLA.EDU>
Reply-To: brnstnd@stealth.acf.nyu.edu (Dan Bernstein)
Organization: IR
Lines: 58

Piercarlo is entirely correct. If you care about ``information hiding''
and the ``purity'' of your code more than efficiency, you can't expect
fast code. If you can't bear to put your original code in comments and
write an efficient version with temporary variables, you don't deserve
fast code.

In article <2217@sunset.MATH.UCLA.EDU> pmontgom@math.ucla.edu (Peter Montgomery) writes:
> 	In my programs, common subexpressions often appear 
> as a result of macro expansions.  Others are not visible at the
> source level, but arise when producing the object code.

When efficiency is important, write out the expressions. If the efficient
version is different on different machines, write it out separately for
each. There is a limit to what the compiler can optimize; a competent
programmer assumes merely that the compiler understands scheduling and
register allocation, and does the rest of the job by hand.

> 	There was recently a discussion in comp.arch about the lack of
> an integer multiply instruction on the SPARC, and how to circumvent it.
> In one of my programs (for multiple precision arithmetic), the most 
> time-critical inner loop requires four sums of products (all variables integer)
  [ the expressions he wants, and how he implements them on the SPARC
    with macros referencing a table of triangular numbers ]

Actually, the code would be faster if you used squaring rather than
triangular numbers. Do you expect the compiler to figure this out, or
should the programmer perform this optimization?

Using squaring as the foundation for multiprecision multiplication
(i.e., to multiply arrays a and b, square a+b, square a-b, subtract, and
divide by 4) saves a lot of space (and code). Do you expect the compiler
to figure this out, or should the programmer perform this optimization?

> it would defeat information hiding if I expanded everything
> there and removed the common subexpressions explicitly.

Tough. Efficiency demands code space. You can leave the ``pure''
expressions in the code, commented out by #ifdef SLOW_N_PORTABLE.
Supposedly optimizing compilers don't remove the burden of thought
from programmers.

  [ more optimizations, based on loop invariants ]
> I cannot make these optimizations at the source level.

Again, you can and should perform this optimization by hand, sacrificing
your macros. Feel free to leave the original code there, but don't pretend
that the compiler can figure things out better than you can.

> 	Even on systems using the first macro expansion 
> (in terms of multiplications), there may be hidden common subexpressions.  

So don't trust Cray's compiler to figure out that it should store the fp
variables. How can the compiler tell the difference between a bottleneck
and a loop that's executed only twice? How can the compiler know whether
variable a is used more than variable b in a function? How can the compiler
know whether you prefer code space, compiling effort, memory, or time?

---Dan