Path: utzoo!mnetor!uunet!husc6!ncar!ames!amelia!orville.nas.nasa.gov!fouts
From: fouts@orville.nas.nasa.gov (Marty Fouts)
Newsgroups: comp.arch
Subject: Re: RISC is a nasty no-no!
Message-ID: <360@amelia.nas.nasa.gov>
Date: 17 Mar 88 19:47:06 GMT
References: <179@wsccs.UUCP: <696@nuchat.UUCP> <284@scdpyr.UUCP> <641@bnr-rsc.UUCP> <3723@killer.UUCP>
Sender: news@amelia.nas.nasa.gov
Reply-To: fouts@orville.nas.nasa.gov (Marty Fouts)
Lines: 39

In article <3723@killer.UUCP> chasm@killer.UUCP (Charles Marslett) writes:
>
>I guess my point is, if the demand for good code generation from compilers
>wins out over fast debugging cycles for compiler company resources, we
>may reach the stage where the author of the program need not worry about
>the architecture of the underlying machine when writing code because the
>compiler writer is optimizing generic-CPU code well enough that he can
>ignore the application writer's optimizations.  I hope we get there before
>everyone gets sidetracked to more "interesting" or "marketable" features
>in compilers -- whatever they might be in 1992.
>

I used to hope this to, until I started writing code which runs on a
wide variety of architectures.  I suspect that the best the compiler
will be able to do (which is actually quite good) is fairly local
optimization in the sense of changes which implement an equivalent
algorithm to that expressed in the original program, rather than
solving an equivalent problem, but perhaps with a different algorithm.

For example, there are two basic algorithms for finding all of the
primes less than some integer.  Once can apply the sieve algorithm
explicitly by marking all of the multiples of each prime as it becomes
known, or one can divide each candidate integer by all primes less
than the square root of the integer, looking for an exact divisor.  On
most of the machines I've coded these algorithms on, the explicit
sieve is a major win because counting is much faster than division.
However, on the Connection Machine, the division algorithm is faster.

Although I expect various compilers to optimize the loop invariant
constructs out of my sieve, I would be surprised if any of them
converted it into a division finder, even on the machines where that
was the better approach.

I don't think we'll reach the point where authors never need to worry,
but I do think we'll reach the point where once the machine is chosen,
the author can depend on the compiler to get the best performance out
of a particular choice of algorithm.  It then becomes the author's
problem to figure out which algorithm has the best chance of winning
on the class of machines the program is intended for.