Path: utzoo!attcan!uunet!ncrlnk!ncrcae!ece-csc!mcnc!rti!tijc02!pjs269
From: pjs269@tijc02.UUCP (Paul Schmidt )
Newsgroups: comp.lang.c
Subject: Re: Efficient coding considered harmful?
Message-ID: <273@tijc02.UUCP>
Date: 31 Oct 88 20:43:52 GMT
References: <3105@hubcap.UUCP> <34112@XAIT.XEROX.COM> <1700@dataio.Data-IO.COM> <8630@smoke.ARPA> <1704@dataio.Data-IO.COM> <119@twwells.uucp>
Organization: Texas Instr., Johnson City TN
Lines: 128

After working for a year to optimize a DBMS I have some comments on
writing efficient code.

"More harm has been done in the sake of efficiency than any thing else."

When I was optimizing I found some horrendous "efficient coding"
practices that made the code less manageable, less efficient, or both.
Take, for example, my favorite:

    some_function(a_variable)
    short a_variable;

The coder (who was inexperienced in C) wanted to reduce the space needed
to hold the parameter passed to the function. This may actually cost
memory and time, because the argument is passed as an int and must then
be converted to a short.

The worst violation of coding for efficiency was done in assembler. The
person set a condition bit inside a subroutine. After the return, the
bit was used in a conditional jump. Of course, another programmer saw
the subroutine, couldn't understand why there was an apparently unneeded
operation in it, and removed it.

Coding takes only about 10% of the time spent on a project. The
maintenance phase accounts for around 50%. If the coders write the most
readable code they can for the maintainers, the entire project cost can
be reduced.

But there is still a need for optimization. This should be done after
the code is written and working. Why? Because the amount of time spent
in each code segment varies widely. There is no reason to optimize the
initialization routines if they are only run once and are fairly fast
already. Using prof(1) under UNIX I have always been surprised at where
the time is spent for a given program.
The profile shows which routines need to be optimized.

Using a benchmark it was easy to see that only 10% of the routines were
run 90% of the time. Some of the results showed obvious duplication of
calculations that was easy to eliminate. But instead of trying to find
the duplicates by hand, we let the computer show us where they were.

After the obvious problems were fixed, many low-level optimizations
were done. Some involved calculating certain values once and storing
them as globals; others involved declaring certain variables as
register.

At one point it became obvious that the semaphore routines supplied by
UNIX took 25-50% of the total time to do a database retrieve. (This was
solved by assigning ownership of relations, which removed the need to
call the semaphore routines.)

All through the optimization process we were aware of which code was the
most important to optimize so we could, as our boss always put it, "Get
the biggest bang for the buck."

For less experienced C programmers, try running prof on a program and
see which routines are actually taking the most time. Prof will order
the output from the routine that takes the most time to the one that
takes the least and give the percentage of time spent in each routine.

I copied this prof output from July 87, 1987, p 588, on profilers:

    %time  cumsecs  #call  ms/call  name
     82.7     4.77                  _sqrt
      4.5     5.03    999     0.26  _prime
      4.3     5.28   5456     0.05  _root
      2.6     5.43                  _frexp
      1.4     5.51                  __doprnt
      1.2     5.57                  _write
      ...

This is for a program to compute prime numbers:

    root(n) int n;
    { return (int) sqrt((float) n); }

    prime(n) int n;
    {   int i;
        for (i = 2; i <= root(n); i++)
            if (n % i == 0)
                return 0;
        return 1;
    }

    main()
    {   int i, n;
        n = 1000;
        for (i = 2; i <= n; i++)
            if (prime(i))
                printf("%d\n", i);
    }

It is interesting to see that the square root calculation takes this
much of the time, yet it is not needed to test for primes. It was
probably an "optimization" to make the search for primes quicker.
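
To illustrate the point about the square root, here is a minimal sketch
of the same trial-division test with the sqrt call removed. It is my own
variation, not taken from the cited article, and the names is_prime and
count_primes are made up for the example. The loop bound i <= n / i is
true exactly when i * i <= n, so the loop tries the same divisors using
only integer arithmetic. I have not profiled it, so treat any speedup as
something to confirm with prof rather than assume.

```c
/* Hypothetical rewrite of prime(): returns 1 if n is prime, else 0.
 * For positive integers, i <= n / i holds exactly when i * i <= n,
 * so the loop covers the same divisors as i <= sqrt(n) without
 * calling sqrt() and without the overflow risk of computing i * i. */
int is_prime(int n)
{
    int i;

    if (n < 2)
        return 0;
    for (i = 2; i <= n / i; i++)
        if (n % i == 0)
            return 0;
    return 1;
}

/* Hypothetical helper: counts the primes in the range 2..limit. */
int count_primes(int limit)
{
    int i, count = 0;

    for (i = 2; i <= limit; i++)
        if (is_prime(i))
            count++;
    return count;
}
```

Calling count_primes(1000) covers the same range as the main() above; a
driver that prints each prime instead of counting would call is_prime in
the same loop.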

In conclusion, I would like to stress that readability for the
maintenance phase should outweigh the importance of optimizing code as
it is written. Easy to read code is easier to maintain, and easier to
optimize.

Paul Schmidt
Texas Instruments
PO Drawer 1255, MS 3517
Johnson City, TN 37605-1255
mcnc!rti!tijc02!pjs269