Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!ccu.umanitoba.ca!herald.usask.ca!alberta!ubc-cs!uw-beaver!milton!dali.cs.montana.edu!uakari.primate.wisc.edu!crdgw1!uunet!mcsun!hp4nl!charon!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.lang.c
Subject: Re: Execution time bottleneck: How to speed up execution?
Message-ID: <2942@charon.cwi.nl>
Date: 14 Feb 91 22:00:14 GMT
References: <24587:Feb1411:32:5391@kramden.acf.nyu.edu> <2940@charon.cwi.nl> <26862:Feb1416:46:4391@kramden.acf.nyu.edu>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 48

In article <26862:Feb1416:46:4391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> Dik, the optimizations I posted in a previous article are responsible
> for between a 4% and a 50% speedup, depending on your machine, your
> compiler, etc.

The optimizations you gave in your first article gave between 4 and 10%
speedup, not up to 50%. To break down the different speedups on a Sun SLC,
comparing O'Keefe's variant with your 5 versions (done with n=1000,
5 times; time in seconds):

version  time   variant
OK       55.93  O'Keefe's variant
1        55.15  1.4% speedup (using tmp for xi-xj, including c)
2        53.37  4.6% speedup (looping down, not up)
3        53.30  4.7% speedup (register variable for a[j])
4        53.87  3.7% speedup (pointer for a+i and y+j)
5        53.82  3.8% speedup (vectorizable code)

And yes, I consider this micro optimization.

> As I said the first time, the 10% is the speedup you get on a Convex
> with the standard math library exp() when you apply the ``ludicrous''
> optimizations I pointed out. It is not due to vectorization.

Might be. Does the Convex vectorize all five variants?

> Yes, and I could equally well have said ``buy a Cray.'' If the original
> poster didn't have a Cray this would result in ``large improvements.''

Sure.
Times on a Cray Y/MP:

version  time    remarks
OK       0.3458  original
1        0.3282  5.1% speedup
2        0.3292  4.8% speedup
3        0.3312  4.2% speedup
4        0.3717  7.5% slowdown
5        0.3469  0.3% slowdown

The compiler did need a bit of persuasion to vectorize the original and
versions 1 and 2. So also here: micro optimization. (Calling versions 4
and 5 optimized is even stretching the meaning of the term a bit!)

> Similarly, the code becomes quite a lot faster if the poster uses a fast
> exp()---but do you really think that ``use a fast exp()'' is any more
> helpful than ``buy a Cray''?

No. Neither one answers the question. But 'use a fast exp()' is much
cheaper than 'buy a Cray'! And it can be handled by an adequate programmer.

> Furthermore, if the poster *does* have a fast exp() running on his Cray,
> the optimizations I posted will give an even better speedup.

I doubt it very much.
--
dik t. winter, cwi, amsterdam, nederland
dik@cwi.nl
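The percentages in the two tables appear to be the time saved relative to the baseline (the "OK" row), with a negative value meaning a slowdown. A minimal C sketch of that arithmetic follows; the function name speedup_pct is hypothetical and not taken from any of the posted code.

```c
#include <assert.h>

/* Percentage speedup of a measured time `t` relative to the baseline
 * time `base`. A positive result is a speedup, a negative one a slowdown.
 * Hypothetical helper, assumed here only to illustrate the arithmetic. */
static double speedup_pct(double base, double t)
{
    assert(base > 0.0);   /* a zero or negative baseline is meaningless */
    return 100.0 * (base - t) / base;
}
```

For instance, speedup_pct(55.93, 53.30) comes out at about 4.7, matching version 3 on the Sun SLC, and speedup_pct(0.3458, 0.3717) comes out at about -7.5, matching the slowdown of version 4 on the Cray Y/MP.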