Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!cs.utexas.edu!sun-barr!rutgers!cmcl2!kramden.acf.nyu.edu!brnstnd From: brnstnd@kramden.acf.nyu.edu (Dan Bernstein) Newsgroups: comp.lang.c Subject: Re: Some optimization results Message-ID: <8834:May1504:18:2391@kramden.acf.nyu.edu> Date: 15 May 91 04:18:23 GMT References: <3501@charon.cwi.nl> <14077:May1321:14:2691@kramden.acf.nyu.edu> <22723@yunexus.YorkU.CA> Organization: IR Lines: 77 In article <22723@yunexus.YorkU.CA> oz@yunexus.yorku.ca (Ozan Yigit) writes: [ an implication that hand optimization only gains a few percent ] Sorry, Ozan, but you're wrong. The yabba code after hand optimization was slightly over two times faster than the code before. As another example, I've taken the ``compress'' source and optimized it by hand; the result is 30% faster for compression and 50% faster for decompression. That is a lot more than a few percent. > >I can keep pointing out how hand optimization gives huge speedups. > wollman@emily.uvm.edu (Garrett Wollman) points out elsewhere: Where is that? All reports of yabbawhap problems should either be sent to me or be crossposted to comp.sources.bugs. How am I supposed to fix them otherwise? > | [For example, yabba will not work for (some subset of the set of) > | large binary files on our SGI 4D/340S. Dunno about this. I've received five reports of yabba failures, and in three of the cases it was simply misconfigured in ways that checkconf (which is part of the package) would have detected had it been run as per instructions. I haven't received word yet on the fourth and fifth. > Dan also spent so much effort > | unrolling loops that the source is nearly incomprehensible, I have no problem understanding it. I really didn't spend much effort on the unrolling; it was just a matter of taking a subroutine like this: #define DOWN0(n,p,h,sp,sch,no,FOUND,NOTFOUND) { \ if ((no) = HANO(h,sch)) { if (PARENT(p,no) != (sp)) { \ do (no) = NEXT(n,no); while ((no) && (PARENT(p,no) != (sp))); \ if (no) { FOUND } else { NOTFOUND } \ } else { FOUND } } else { NOTFOUND } \ } and adding a parallel version like this: #define DOWN1(n,p,h,sp,sch,no,FOUND,NOTFOUND) { \ if ((no) = HANO(h,sch)) { if (PARENT(p,no) != (sp)) { \ if ((no) = NEXT(n,no)) { if (PARENT(p,no) != (sp)) { \ do (no) = NEXT(n,no); while ((no) && (PARENT(p,no) != (sp))); \ if (no) { FOUND } else { NOTFOUND } \ } else { FOUND } } else { NOTFOUND } \ } else { FOUND } } else { NOTFOUND } \ } plus a DOWN macro to select between DOWN0 and DOWN1 (as well as DOWN2 and DOWN5). Since this unrolling is all isolated inside a separate routine with a well-defined interface, it doesn't hurt maintainability. (Admittedly, I'm the only one with a copy of that interface definition at the moment; see the top of huptrie.h for further comments.) The most important hand optimizations in yabba were data structure rearrangements: -DPTRS (though -UPTRS is faster on the Convex, and other machines where a[i] is as fast as a[0]), combining two arrays into a single array of structs, etc. It'll be a long time before compilers can do anything like this. > and breaks > | a parallelizing preprocessor which can be used by the MIPS/SGI > | compiler to great effect.] There is an important issue here: If an optimization will slow down the code on some machine, then you should always provide a way to turn the optimization off. In this case, -DOPTCANT1 is documented to turn off the unrolling, and if Wollman's preprocessor is buggy then he should have followed the instructions and compiled with that option. -DPTRS is a similar example---I made that optimization with the Convex in mind, and made sure that it could be easily toggled on or off. And so on. On the bright side, note that yabbawhap has so far found at least two optimizer bugs, one in a prerelease version. At least in the latter case, the bug will never see the light of day. ---Dan