Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!uwvax!oddjob!tank!uxc!uxc.cso.uiuc.edu!uxg.cso.uiuc.edu!uxe.cso.uiuc.edu!mcdonald From: mcdonald@uxe.cso.uiuc.edu Newsgroups: comp.lang.c Subject: Re: Array indexing vs. pointers... Message-ID: <225800077@uxe.cso.uiuc.edu> Date: 1 Oct 88 14:49:00 GMT Lines: 102 Nf-ID: #R:<8809191521.AA17824@ucbvax.Berke:-40:uxe.cso.uiuc.edu:225800077:000:4873 Nf-From: uxe.cso.uiuc.edu!mcdonald Oct 1 09:49:00 1988 > I've been programming for YEARS, and if it's not a professional trade >secret, I'd love it if you would summarize your "coding practices" and >email them or post them. If you can get half the added efficiency >you're talking about, the programming practices be would of interest to the >whole computing world. Like you I spend a great deal of my time >optimizing code for other people, and I have yet to see more than a >20-25% improvment (even over BAD code.) > *** THIS IS NOT A FLAME OF ANY KIND *** >I would sincerely like to see these practices. They might give me a new >outlook on coding. I think that 25% or even 60% improvement is not a bad guess. This is only an average, though, and includes, in my experience, some cases of truly great gains , that is 30000% improvement (300 times faster). I'll be happy to share my experiences (I have my flameproof suit on, wear yours). 1. Don't use a bad algorithm. I have seen programs with slow Fourier transforms in them, a factor of 10 to 50 too slow. The NCAR graphics package included a 3d surface contour plotter that drew with a pen plotter in such a way that it picked up the pen after EVERY line segment, even if the next one started where the last one left off (also present in Mathcad)(factor of 8). The latest version (3.9p) of MicroEmacs searches 30 times faster than the previous one by an improved algorithm (though the old one suffered from another problem also.) 2. Don't use a general purpose routine to do a very specific, simple thing. The classic case in C is writing char ss[80]; ss[0] = 'a'; ss[1] = '\0'; printf("%s",ss); instead of putchar('a'); ... note that I'm not talking about the first one happening occasionally in a loop. -2. Don't reinvent a slow wheel if a fast one is lying around for the taking. (Illustrated by the recent thread on Duff's device, where several folks discovered memcpy to be faster. Save hacks like that for where they are NEEDED - like writing to IO space, where memcpy might not work.) 3. Don't use a complicated canned system to push square pegs in round holes. This is where I have found my factors of 300. Graphics systems like GKS and Phigs are the classic case, as are the graphics routines in Microsoft Windows (and, I'll bet a cup of coffee, OS/2), but not, for some odd reason , the Mac. Those things go through so many layers of code from your call to the screen that they are guaranteed to be slow. If you have a simple task (as most of mine are) like drawing a bunch of circles and ellipses and moving them around the screen fast, use a simple tool. Write your own tool to draw ellipses direct to the screen - direct to the hardware. My routines beat Microsoft WIndows by a factor of 10 and IBM's GKS by 100 to 1000 times (that's right, 1000 times faster). Save the big clumsy packages for big clumsy tasks - if your calculations take 10,000,000 lines, they are probably just fine. 4. Don't use nested subroutines to do lots of simple things inside lots of simple things. Write the damn stuff out, even if it makes more code. The classic case here is IBM's PC video bios. 5. Okay, we admit the rest is tweaking. 25% is a good goal. To get the fastest code, the first thing to do is convince yourself (and maybe your boss - luckily I don't have one) that you WANT to get it faster. Then understand the hardware and instruction set of the target computer (portable? what's that? We have already decided to go go the gold medal). Look at the compiler's .asm output. Try changing things like x*65 to x<<6+x. It might be faster, might not. IF your code does indexing like for(i = 0; i < 100; i++)a[i] = i*6; change it to b = a; j = 0; for(i = 0; i < 100; i++, j += 6)*b++ = j; It might (or might not) make a difference. Honestly now - aren't all these things obvious? I would love to hear about non-obvious ones!. Nevertheless, I have seen them missed more often than not. 6. When all else fails, cheat. Use assembly language. Look at the compiler output. Maybe the optimizer isn't perfect. Remove stores of never used values. Play with instruction order. Maybe your processor can overlap floating point computations with fixed point ones - take advantage of this. Maybe you can assign all your variables in a loop to registers and get a big gain there (I got a 15% gain in one graphics routine by storing data in argument pointer and segment registers on the 8086). There are some cases where certain compilers can do a better job than you or me, mainly RISC machines with microscheduling compilers. I'd still like to see what the guy who designed the scheduler could eke out if he tried hard! Doug McDonald D D for(i = 0; i < 100; i++, j+=6)