Xref: utzoo comp.lang.c:12327 comp.arch:6234
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!njin!princeton!mccc!njsmu!klg
From: klg@njsmu.UUCP (Kenneth Goodwin)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Explanation, please!
Summary: UNIONS are handy
Message-ID: <468@njsmu.UUCP>
Date: 1 Sep 88 17:19:39 GMT
References: <638@paris.ics.uci.edu> <9064@pur-ee.UUCP>
Organization: NJ State Medical Underwriters, Lawrenceville
Lines: 90

In article <9064@pur-ee.UUCP>, hankd@pur-ee.UUCP (Hank Dietz) writes:
> In article <189@bales.UUCP>, nat@bales.UUCP (Nathaniel Stitt) writes:
> > Here is my own personal version of the "Portable Optimized Copy" routine.
> 2. If the number of items/bytes is not known, then build a binary tree of
>    such structs and copy half, then half of what remains, etc.  This is
>       struct t512 { int t[512]; };
>       struct t256 { int t[256]; };
>       struct t128 { int t[128]; };
        .... etc .....
>       if (n & 512) {
>               *((struct t512 *) q) = *((struct t512 *) p); q+=512; p+=512;
>       }
>       if (n & 256) {
>               *((struct t256 *) q) = *((struct t256 *) p); q+=256; p+=256;
>       }
        ... etc ...
> Incidentally, this ran about 8x faster (on a VAX 11/780) than using
> the usual copy loop.  Unfortunately, the above code should have been
> written as:
>
>       if (n & 512) {
>               *(((struct t512 *) q)++) = *(((struct t512 *) p)++);
>       }
>       ...

BUT this is where UNIONS come in handy. I used a similar, although
briefer, technique for a faster version of a bmov() (byte move)
subroutine on our PDP11-70 a while ago, and subsequently ported it to
memcpy when we updated from V6 to System V.

The basic idea is to create a union of long, int, (short), and char
pointers, use the character pointer to achieve the needed alignment,
and then use the largest available pointer to do the copy. There is no
reason why a structure copy could not be used, although I suspect that
on NON-VAX systems it may actually be detrimental in some cases.
The PDP11 C compiler used to stuff registers onto the stack and create
a 16-bit word copy loop to do structure copies using the freed
registers, restoring them when it was done. So a structure copy would
be the same as a word copy on that style of system (i.e., ones without
block move instructions).

So in the case of your example, a modified brief version of it would be:

        union ptr_types {
                struct t512 { int t512[512]; } *t512;
                ....
                struct t32 { int t32[32]; } *t32;
                long    *t_long;
                int     *t_int;
                short   *t_short;
                char    *t_char;
        };

(probably could dispense with long and short pointers and related tests)

        memcpy(a, b, len)
        char *a, *b;
        {
                register union ptr_types a_ptr, b_ptr;

                a_ptr.t_char = a;
                b_ptr.t_char = b;

                while(NOT ON A WORD BOUNDARY AND CHARS LEFT) {
                        *a_ptr.t_char++ = *b_ptr.t_char++;
                        len--;
                }
                if(len >= sizeof(int) * 512) {
                        /* if we can use a 512 int structure copy */
                        *a_ptr.t512++ = *b_ptr.t512++;
                        len -= (512 * sizeof(int));
                }
                /* the biggest win is that the pointers increment correctly.
                   len -= sizeof(*element pointer) is the correct form,
                   rather than N INTS * sizeof(int) */
                .......

I guess the rest is obvious; some GLUE may be needed that has not been
shown.... :-)

Boundaries should be checked on both the source and destination
addresses to avoid memory faults, as you may be given incompatible
source and destination addresses that require a full char-by-char copy.
The first test loop sort of does this, but all the other copies should
also check for proper address alignment before proceeding.

Ken Goodwin
NJSMU.
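
For what it is worth, the union idea above, together with the alignment
check on both addresses, might be sketched in present-day C roughly as
follows. This is only an illustration under stated assumptions, not the
bmov()/memcpy code described in this post: the names union_copy,
int_aligned and union_copy_selftest are made up, and only a single
32-int block size is used. It also assumes that all pointer types share
one representation, and that the compiler tolerates int and struct
accesses to memory that was filled as chars (fine for malloc'd buffers,
but formally an aliasing problem for declared char arrays).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct t32 { int t[32]; };           /* one block type, as in the post */

union ptr_types {                    /* one address, three views of it */
    struct t32 *t32;
    int        *t_int;
    char       *t_char;
};

/* Hypothetical helper: nonzero when p sits on an int boundary. */
static int int_aligned(const char *p)
{
    return (uintptr_t)p % sizeof(int) == 0;
}

/* Sketch of a union-based copy; assumes non-overlapping buffers. */
char *union_copy(char *dst, const char *src, size_t len)
{
    union ptr_types d, s;

    assert(dst != NULL && src != NULL);
    d.t_char = dst;
    s.t_char = (char *)src;          /* const dropped; src is only read */

    /* Wide copies are safe only if both addresses have the same
       offset within a word; otherwise fall through to the byte loop. */
    if ((uintptr_t)dst % sizeof(int) == (uintptr_t)src % sizeof(int)) {
        /* Byte copy until a word boundary is reached. */
        while (len > 0 && !int_aligned(d.t_char)) {
            *d.t_char++ = *s.t_char++;
            len--;
        }
        /* Structure copies; each pointer steps by its own size. */
        while (len >= sizeof(*d.t32)) {
            *d.t32++ = *s.t32++;
            len -= sizeof(*d.t32);
        }
        /* Remaining whole ints. */
        while (len >= sizeof(*d.t_int)) {
            *d.t_int++ = *s.t_int++;
            len -= sizeof(*d.t_int);
        }
    }
    /* Tail bytes, or the whole buffer when alignments are incompatible. */
    while (len > 0) {
        *d.t_char++ = *s.t_char++;
        len--;
    }
    return dst;
}

/* Hypothetical self-check: returns 1 if both copy paths work. */
int union_copy_selftest(void)
{
    enum { N = 1000, PAD = 8 };
    char *src = malloc(N + PAD), *dst = malloc(N + PAD);
    size_t i;
    int ok = 1;

    if (src == NULL || dst == NULL) { free(src); free(dst); return 0; }
    for (i = 0; i < N + PAD; i++)
        src[i] = (char)(i % 127);

    memset(dst, 0, N + PAD);         /* same offsets: wide path */
    union_copy(dst + 1, src + 1, N);
    ok = ok && memcmp(dst + 1, src + 1, N) == 0;
    ok = ok && dst[0] == 0 && dst[N + 1] == 0;

    memset(dst, 0, N + PAD);         /* different offsets: byte path */
    union_copy(dst + 2, src + 1, N);
    ok = ok && memcmp(dst + 2, src + 1, N) == 0;
    ok = ok && dst[1] == 0 && dst[N + 2] == 0;

    free(src);
    free(dst);
    return ok;
}
```

Note that len is reduced by sizeof(*d.t32) and sizeof(*d.t_int), so the
count stays in step with whatever the pointer actually advanced by.
Whether the structure copy beats a plain word loop depends on the
compiler and machine, as discussed above.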