Xref: utzoo comp.lang.c:12273 comp.arch:6213
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!bloom-beacon!mit-eddie!uw-beaver!uw-june!rik
From: rik@june.cs.washington.edu (Rik Littlefield)
Newsgroups: comp.lang.c,comp.arch
Subject: Loop unfolding
Summary: Loop unfolding is useful for more than just copy
Message-ID: <5612@june.cs.washington.edu>
Date: 1 Sep 88 01:03:05 GMT
Organization: U of Washington, Computer Science, Seattle
Lines: 31

All of the examples of loop unfolding recently discussed on the net have
implemented just copying.  Several authors have suggested improving on the
loop unfolding method by using a (pseudo-) standard routine like 'memcpy',
or by declaring large structures that the compiler can generate good code
for moving.

I have seen at least three cases where loop unfolding was very productive
but neither of the above suggestions seems to apply.  All were in
time-critical production applications.

 1. For an ultrasonic inspection program, the inner loop contained a
    summation along the lines of

        s += *p++;

 2. In an image processing program, the inner loop was an indexed move like

        *p++ = *(*q++);

 3. A driver for a memory-mapped I/O device used multiple stores into a
    single address:

        *q = *p++;

As I said, unrolling was a very effective way of removing virtually all the
overhead from these loops.

Can anyone suggest other solutions analogous to the alternatives mentioned
above, or for that matter, any better solution other than assembly language?

--Rik
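For concreteness, here is a minimal sketch in C of what hand-unrolling the three loops in the post might look like. The function names (sum_unrolled, gather_unrolled, send_duff), the element type int, and the unroll factors are illustrative assumptions, not taken from the post. Cases 1 and 2 use a plain 4-way unrolled loop plus a cleanup loop for the last n % 4 items. Case 3 uses Duff's device, Tom Duff's 1983 trick (itself written for copying to a memory-mapped output register), which folds the leftover iterations into a switch that jumps into the middle of an 8-way unrolled do-while. The register is declared volatile so the compiler cannot collapse the repeated stores to one address.

```c
#include <assert.h>
#include <stddef.h>

/* Case 1: s += *p++, unrolled by 4 (hypothetical helper).
 * One loop test per four additions; a cleanup loop takes the n % 4 leftovers. */
long sum_unrolled(const int *p, size_t n)
{
    long s = 0;
    size_t blocks = n / 4;          /* number of full 4-element groups */
    size_t rest = n % 4;            /* elements left after the groups */

    while (blocks-- > 0) {
        s += *p++;
        s += *p++;
        s += *p++;
        s += *p++;
    }
    while (rest-- > 0)
        s += *p++;
    return s;
}

/* Case 2: indexed move *p++ = *(*q++), unrolled by 4 (hypothetical helper).
 * q points to an array of pointers; each one is dereferenced in turn. */
void gather_unrolled(int *p, int **q, size_t n)
{
    size_t blocks = n / 4;
    size_t rest = n % 4;

    while (blocks-- > 0) {
        *p++ = *(*q++);
        *p++ = *(*q++);
        *p++ = *(*q++);
        *p++ = *(*q++);
    }
    while (rest-- > 0)
        *p++ = *(*q++);
}

/* Case 3: repeated stores to one device register, *q = *p++, via Duff's device.
 * reg is assumed to be a memory-mapped register; volatile forces every store. */
void send_duff(volatile int *reg, const int *p, size_t count)
{
    size_t n;

    if (count == 0)                 /* the do/while body always runs at least once */
        return;
    n = (count + 7) / 8;            /* number of passes, counting the partial first one */
    switch (count % 8) {            /* jump into the body to do the partial pass first */
    case 0: do { *reg = *p++;
    case 7:      *reg = *p++;
    case 6:      *reg = *p++;
    case 5:      *reg = *p++;
    case 4:      *reg = *p++;
    case 3:      *reg = *p++;
    case 2:      *reg = *p++;
    case 1:      *reg = *p++;
            } while (--n > 0);
    }
}
```

The summation keeps a single accumulator so the additions happen in the same order as in the rolled loop. Splitting s into several partial sums can expose more overlap on pipelined machines, but with floating-point data that changes the rounding of the result.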