Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!hp4nl!charon!dik From: dik@cwi.nl (Dik T. Winter) Newsgroups: comp.lang.c Subject: Re: Execution time bottleneck: How to speed up execution? Message-ID: <2934@charon.cwi.nl> Date: 14 Feb 91 01:58:55 GMT References: <1991Feb12.233522.5195@Think.COM> <4763@goanna.cs.rmit.oz.au> <17664:Feb1319:36:1291@kramden.acf.nyu.edu> Sender: news@cwi.nl Organization: CWI, Amsterdam Lines: 69 This is ludicrious. In article <17664:Feb1319:36:1291@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > In article <4763@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes: (A perfectly good solution; so good that I mailed the same to the original poster.) What most people fail to see is that the bulk of the work in the loop is the calculation of the exponential! So all solutions that fail to reduce that number of calculations are doomed to fail to reduce the execution time very much. (And the solution I saw that used pow rather than exp will not help either because pow will in general use exp, or an algorithm that is roughly equivalent.) Now to Dan's points: > Some random observations, starting from this point: It's wasteful to > repeatedly multiply a homogeneous formula by a constant. First multiply > all the x's by the square root of the constant. Assuming there's space > available (or you can trash the original array, or you can divide > afterwards and not worry about roundoff error): Considering the number of multiplications the exponential uses, this is peanuts (what is it? I am too lazy to look at the source, but I presume more than 10.). > Next, the loops will run faster backwards on most machines. Ah yes, going backwards getting a comparison to 0 rather than to a non-zero value. Useful, *if* the body of the loop is small. > Next, the accumulated a[j] sum might as well be kept in a register, as > well as y[j]: And because the exponential is called this means one more register has to be saved. Question: what do we gain? > Under some compilers on some machines it will produce a slight speedup > to keep a + i and y + i in registers: And again. > On the other hand, the loop should be written somewhat differently for > vector machines: Ha! And because of the call to exp most C compilers for vector machines will not vectorize it. You have to do quite a lot more for those. (Yup, just two weeks ago I did some optimization on the Cray of a loop involving logarithms.) > If the machine can't vectorize exp then the loop should be split still > differently. Yes, loop splitting. Do in scalar what can not be vectorized. So now you are going to vectorize 1-10% of the work involved. A waste of programmers time. Etc. I have seen similar other responses. In general in this kind of cases: *look where the time is going*. In this example the time is going to the exponential function. If this function is not available in hardware it will in general take about 10 to 20 times the time needed for a multiplication (this is an estimate not based on fact, it is probably too low). Even if this function (or its computational workhorse) is available in hardware, it takes a lot of time: Intel 80387; F2XM1 211-476 clocks (workhorse); Motorola 68881; FETOX 466+ clocks. So chaving off one or more subtractions, memory accesses etc. will not help very much. The basic idea must be to reduce the number of calculations of exp (which the solution of Richard A. O'Keefe did). Another thing that is worth considering is reducing the time needed to calculate the exp. In that case questions like argument range, required precision etc. arise. It might be worthwile to roll your own version of exp (see for instance Cody&Waite, Software manual for the mathematical functions; the title is approximate). Especially if the required precision is smaller this might help. But never ever do some micro-optimization that speeds up some part of the algorithm by a factor of perhaps 100 if it takes only a very small percentage of the total time needed (see Amdahl's Law); that is really a waste of programmers time. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl