Xref: utzoo comp.lang.c:12279 comp.arch:6215
Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Explanation, please!
Summary: A dumb note...
Message-ID: <9064@pur-ee.UUCP>
Date: 31 Aug 88 21:52:14 GMT
References: <638@paris.ics.uci.edu> <189@bales.UUCP>
Organization: Purdue University Engineering Computer Network
Lines: 80

In article <189@bales.UUCP>, nat@bales.UUCP (Nathaniel Stitt) writes:
> Here is my own personal version of the "Portable Optimized Copy" routine.

.... then he gives a rather verbose, but structured, encoding....

As long as we're getting into structured, portable hacks, let me suggest
the following two ways of doing block copy:

1.  If the number of items/bytes is known at compile time, then you can
    define a struct type of the appropriate size and use struct
    assignment, with type casts to make it fly.  For example, suppose
    p and q are pointers to ints and I want to copy 601 ints from p to
    q.  Then I can write the fast and surprisingly portable:

        struct t601 { int t[601]; };
        *((struct t601 *) q) = *((struct t601 *) p);

    Of course, you do have to watch out for alignment problems, but if
    your compiler doesn't generate very fast code for this....

2.  If the number of items/bytes is not known, then build a set of such
    structs in power-of-2 sizes and copy the biggest chunk that fits,
    then the biggest that fits in what remains, etc., one chunk for
    each bit set in the count.  This is funny looking, but very fast
    also.  Suppose the number of ints (n) is not known at compile time,
    but can't be more than 601.
    You can write:

        struct t512 { int t[512]; };
        struct t256 { int t[256]; };
        struct t128 { int t[128]; };
        struct t64 { int t[64]; };
        struct t32 { int t[32]; };
        struct t16 { int t[16]; };
        struct t8 { int t[8]; };
        struct t4 { int t[4]; };
        struct t2 { int t[2]; };

        if (n & 512) {
            *((struct t512 *) q) = *((struct t512 *) p);
            q+=512; p+=512;
        }
        if (n & 256) {
            *((struct t256 *) q) = *((struct t256 *) p);
            q+=256; p+=256;
        }
        if (n & 128) {
            *((struct t128 *) q) = *((struct t128 *) p);
            q+=128; p+=128;
        }
        if (n & 64) {
            *((struct t64 *) q) = *((struct t64 *) p);
            q+=64; p+=64;
        }
        if (n & 32) {
            *((struct t32 *) q) = *((struct t32 *) p);
            q+=32; p+=32;
        }
        if (n & 16) {
            *((struct t16 *) q) = *((struct t16 *) p);
            q+=16; p+=16;
        }
        if (n & 8) {
            *((struct t8 *) q) = *((struct t8 *) p);
            q+=8; p+=8;
        }
        if (n & 4) {
            *((struct t4 *) q) = *((struct t4 *) p);
            q+=4; p+=4;
        }
        if (n & 2) {
            *((struct t2 *) q) = *((struct t2 *) p);
            q+=2; p+=2;
        }
        if (n & 1) *q = *p;

    Notice that, in this case, n, p, and q should be declared as
    register variables and that p and q are altered by this routine.
    Of course, you can copy larger things by making larger power-of-2
    sized structs.

Incidentally, this ran about 8x faster (on a VAX 11/780) than using the
usual copy loop.  I would rather have written the above code as:

        if (n & 512) {
            *(((struct t512 *) q)++) = *(((struct t512 *) p)++);
        }
        ...

but the VAX C compiler didn't like that.  (The result of a cast is not
an lvalue, so ++ cannot be applied to it.)

Enjoy.

hankd@ee.ecn.purdue.edu
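
For anyone who wants to try both techniques as compilable functions, here
is a minimal sketch.  The names copy601, copy_ints and COPY_IF_BIT are
hypothetical (they are not from the post), as are the const qualifiers and
the range check on n.  It assumes, as the post does, that p and q are
suitably aligned, that the two regions do not overlap, and that each
struct has no padding after its array.  Whether it beats memcpy on a
current compiler is something to measure, not assume.

```c
/* Block types as in the post: each struct wraps an array of N ints. */
struct t601 { int t[601]; };
struct t512 { int t[512]; };
struct t256 { int t[256]; };
struct t128 { int t[128]; };
struct t64  { int t[64]; };
struct t32  { int t[32]; };
struct t16  { int t[16]; };
struct t8   { int t[8]; };
struct t4   { int t[4]; };
struct t2   { int t[2]; };

/* Technique 1: the count (601) is known at compile time.
   copy601 is a hypothetical name used only for this sketch. */
void copy601(int *q, const int *p)
{
    *(struct t601 *)q = *(const struct t601 *)p;
}

/* If bit N is set in n, copy one N-int block by struct assignment
   and advance both pointers past it.  Hypothetical helper macro. */
#define COPY_IF_BIT(N)                                    \
    do {                                                  \
        if (n & (N)) {                                    \
            *(struct t##N *)q = *(const struct t##N *)p;  \
            q += (N);                                     \
            p += (N);                                     \
        }                                                 \
    } while (0)

/* Technique 2: the count is known only at run time.
   The block sizes 512..2 plus the final single int cover n <= 1023.
   Returns 0 on success, -1 if n is too large for these block types
   (the range check is an addition, not part of the post). */
int copy_ints(int *q, const int *p, unsigned n)
{
    if (n > 1023)
        return -1;

    COPY_IF_BIT(512);
    COPY_IF_BIT(256);
    COPY_IF_BIT(128);
    COPY_IF_BIT(64);
    COPY_IF_BIT(32);
    COPY_IF_BIT(16);
    COPY_IF_BIT(8);
    COPY_IF_BIT(4);
    COPY_IF_BIT(2);
    if (n & 1)
        *q = *p;
    return 0;
}
```

A call such as copy_ints(dst, src, 601) copies blocks of 512, 64, 16 and 8
ints and then one last int, since 601 = 512 + 64 + 16 + 8 + 1.  Because
the pointers are passed by value here, the caller's p and q are not
altered, unlike the inline version above.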