Path: utzoo!mnetor!uunet!husc6!ncar!ames!amelia!orville.nas.nasa.gov!fouts From: fouts@orville.nas.nasa.gov (Marty Fouts) Newsgroups: comp.arch Subject: Re: RISC is a nasty no-no! Message-ID: <360@amelia.nas.nasa.gov> Date: 17 Mar 88 19:47:06 GMT References: <179@wsccs.UUCP: <696@nuchat.UUCP> <284@scdpyr.UUCP> <641@bnr-rsc.UUCP> <3723@killer.UUCP> Sender: news@amelia.nas.nasa.gov Reply-To: fouts@orville.nas.nasa.gov (Marty Fouts) Lines: 39 In article <3723@killer.UUCP> chasm@killer.UUCP (Charles Marslett) writes: > >I guess my point is, if the demand for good code generation from compilers >wins out over fast debugging cycles for compiler company resources, we >may reach the stage where the author of the program need not worry about >the architecture of the underlying machine when writing code because the >compiler writer is optimizing generic-CPU code well enough that he can >ignore the application writer's optimizations. I hope we get there before >everyone gets sidetracked to more "interesting" or "marketable" features >in compilers -- whatever they might be in 1992. > I used to hope this to, until I started writing code which runs on a wide variety of architectures. I suspect that the best the compiler will be able to do (which is actually quite good) is fairly local optimization in the sense of changes which implement an equivalent algorithm to that expressed in the original program, rather than solving an equivalent problem, but perhaps with a different algorithm. For example, there are two basic algorithms for finding all of the primes less than some integer. Once can apply the sieve algorithm explicitly by marking all of the multiples of each prime as it becomes known, or one can divide each candidate integer by all primes less than the square root of the integer, looking for an exact divisor. On most of the machines I've coded these algorithms on, the explicit sieve is a major win because counting is much faster than division. However, on the Connection Machine, the division algorithm is faster. Although I expect various compilers to optimize the loop invariant constructs out of my sieve, I would be surprised if any of them converted it into a division finder, even on the machines where that was the better approach. I don't think we'll reach the point where authors never need to worry, but I do think we'll reach the point where once the machine is chosen, the author can depend on the compiler to get the best performance out of a particular choice of algorithm. It then becomes the author's problem to figure out which algorithm has the best chance of winning on the class of machines the program is intended for.