Path: utzoo!attcan!uunet!bu.edu!snorkelwacker!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!samsung!crackers!jjmhome!smds!rh From: rh@smds.UUCP (Richard Harter) Newsgroups: comp.lang.c Subject: A study in code optimization in C Keywords: memcopy Message-ID: <133@smds.UUCP> Date: 26 Jul 90 06:07:23 GMT Organization: SMDS Inc., Concord, MA Lines: 100 The macro shown below is an optimized memory to memory copy macro. It is probably faster than memcopy on your machine -- I have checked it on several machines and have always found it to be faster. Some may find it instructive as an example of code optimization. Techniques used are: loop unrolling, localized use of register variables, count down loops, and shift and modulo operators instead of arithmetic operators. Some general notes: There are two major ways to unroll a loop with a variable remainder. The method used here is to use two loops, one for the remainder and one for the bulk. The other method is to jump into a loop inside a switch (legal but bizarre.) The second method has the advantage of not needing the "short" loop for the remainder. However timing studies show no real difference between the two methods. The reason is that the overhead of setting up the switch balances off with the higher cost of the short loop. Given two loops, one which is unrolled by a constant, and a short loop handling the remainder, a loop unrolling factor of 8 is optimal under most circumstances. A count down loop saves several instructions in a CISC machine, provided that (a) the loop variable is a register variable, and (b) the optimation option is selected. The savings are due to (1) the comparison is against 0 rather than a constant, (2) CISC machines typically have a branch on result of comparison, (3) optimization and register declaration avoids a store of the loop variable. For some compilers there is also a savings by placing the calculation of the limit in the first clause of the loop when the limit is a complex expression. It should be noted that the savings are only signifigant in tight loops. A count down loop should have one of the following two forms: for (index=(expression)+1;--index>0;) {body} for (index=(expression);--index>=0;) {body} Plum Hall uses the first form; I use and recommend the second as being clearer. The second form will fail (infinite loop) if the index is unsigned. However the savings will be lost in any case if the index is unsigned. These forms should be used as is; the loop optimization is lost if you don't predecrement in the test. The macros do not include or exploit an alignment test on the grounds that (a) the resulting code is not portable, (b) the overhead of the tests removes the savings in short and medium length copies, (c) many aligned copies can be caught by passing the proper type, and (d) I didn't feel like coding it. The copyright notice is included solely on the grounds that I feel that if you use somebody elses code you should acknowlege where you got it from. ------------------------ CUT HERE ----------------------------------- /* Copyright (c) 1990, Software Maintenance and Development Systems Inc. Permission to copy, to modify, and to distribute, whether for sale or not is granted, provided that a copy of this notice is included and is not altered. Macro name: gencopy Function: In memory copy Author: Richard Harter Arguments: dest Pointer to destination src Pointer to source nb Number of items to copy type Type of items Synopsis: This macro is an optimized in line memory to memory copy routine. The following restrictions apply: (a) if dest and src overlap, dest must have a lower address than src; (b) the item type must be one for which assignment is valid. Bugs: No check is made for negative length. Results may be surprising. */ #define gencopy(dest,src,nb,type ) {register type *cpy_a;\ register type *cpy_b;\ register int cpy_i,cpy_j;cpy_a=(dest);cpy_b=(src);cpy_j=(nb);\ for(cpy_i=(cpy_j & 7); --cpy_i>=0;) *cpy_a++ = *cpy_b++;\ for (cpy_i=(cpy_j>>3); --cpy_i>=0;) { *cpy_a++ = *cpy_b++;*cpy_a++ = *cpy_b++;\ *cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++;\ *cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++; *cpy_a++ = *cpy_b++; }} #define copy(dest,src,nb) gencopy(dest,src,nb,char) ------------------------ CUT HERE ----------------------------------- -- Richard Harter, Software Maintenance and Development Systems, Inc. Net address: jjmhome!smds!rh Phone: 508-369-7398 US Mail: SMDS Inc., PO Box 555, Concord MA 01742 This sentence no verb. This sentence short. This signature done.