Path: utzoo!attcan!uunet!nih-csl!lhc!adm!cmcl2!lanl!jlg From: jlg@lanl.gov (Jim Giles) Newsgroups: comp.lang.misc Subject: Re: Aggressive optimization Message-ID: <5839@lanl.gov> Date: 14 Nov 90 01:16:33 GMT References: <5059:Nov1323:43:2290@kramden.acf.nyu.edu> Organization: Los Alamos Natl Lab, Los Alamos, N.M. Lines: 41 From article <5059:Nov1323:43:2290@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein): > [...] > As part of my proof I have posted the general form that a counterexample > would take. Namely, if you show me an algorithm that can supposedly > shift arrays optimally, I'll find a program for which you cannot detect > which of two branches is taken (independently of the input). I'll then > use x[0] (or x[a + b], or whatever) in one branch, and x[1] (or > whatever) in the other branch. You won't be able to figure out whether > it's better to precompute &x[0] or to precompute &x[1]. Such a program > must exist. Fine. The compiler won't be able to determine this one. Neither will the human programmer. If the programmer knows a priori which branch will be taken, why is there a branch? And, if he doesn't know a priori which branch will be taken, then he doesn't know which pointer to precompute. And, since he doesn't know which to precompute, he must guess: which is what the compiler will do as well. Now, there are certain trade-offs about who does the guessing. The programmer _may_ have advance information about the relative frequency of the branches taken. For this reason, it might be better to let the programmer do the guessing. The compiler, on the other hand, knows the register allocation and the instruction pipelining: so it knows which of the branches offers the best chance of completely covering the precomputing of the array index. For this reason, it might be best to let the compiler choose. The best solution is probably to let the compiler choose and give the programmer the ability to assert information about the frequency of each branch. In any case, this contrived example only saves a fractional add at best (an add is saved each time you take the branch that you precomputed for, one is lost each time you take the other branch, the average "savings" is a fraction of an add). The problem is, it's possible that the programmer will pick the wrong one to precompute. He may pick the most frequently taken branch to precompute for and then lose time anyway because the computation on the other branch would have pipelined better. On most compilers, it will probably decide to precompute _both_ anyway - even when you use explicit pointers. J. Giles