Xref: utzoo comp.lang.c:12211 comp.arch:6201
Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ames!oliveb!amdahl!chuck
From: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Explanation, please!
Message-ID:
Date: 30 Aug 88 04:31:27 GMT
References: <653@paris.ICS.UCI.EDU> <2877@ttrdc.UUCP>
Reply-To: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Organization: Amdahl Corporation, Sunnyvale CA
Lines: 26

In article <2877@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>In article <653@paris.ICS.UCI.EDU>, schmidt@bonnie.ics.uci.edu (Douglas C. Schmidt) writes:
>> Since I posted my original question there has been a great deal of
>> abstract discussion about the relative merits of the loop unrolling
>> scheme. The topic has piqued my curiosity, so I went ahead and
>> implemented a short test program, included below, to test Duff's
>> device against the ``ordinary for loop w/index variable'' technique.
>> See for yourself....
>>
>> After some quick testing I found that gcc 1.26 -O on a Sun 3 and a
>> Sequent Balance was pretty heavily in favor of the regular (non-Duff)
>> loop. Your mileage may vary. I realize that there may be other
>> tests, and if anyone has a better version, I'd like to see it!
>
>I modified this program to run under System V, changed the arrays to be
>dynamically allocated, and changed both the Duff and ordinary copies to use
>register pointers instead of global pointers (for the Duff copy) and array
>indexing (for the ordinary copy). I then tried it on a SVR2 3B20, a SVR3 3B2,
>a Sun-3, and a Sun-4 both with and without -O optimization (using the standard
>pcc-type C compiler on each system). The result? Duff wins by about 10%-20%
>on all machines tested.

I then added a piece to the program to use 'memcpy'. The results?
Duff beats a simple loop by 10%. 'memcpy' is 9 times faster than Duff.

So why do people spend so much time avoiding standard subroutines?

-- Chuck
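
For anyone who wants to repeat the three-way comparison, here is a minimal sketch of the three copies in C. It is not the test program from this thread (that source is not reproduced here); the names copy_simple, copy_duff, copy_memcpy and time_copy are made up for illustration, and the Duff variant assumed is the memory-to-memory form unrolled 8 times. No timings are claimed: a modern optimizer may turn either hand-written loop into a memcpy call, so results depend heavily on compiler and flags.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Ordinary loop: one byte per iteration, through pointers. */
void copy_simple(char *to, const char *from, size_t count)
{
    while (count-- > 0)
        *to++ = *from++;
}

/* Duff's device (memory-to-memory form, assumed unroll factor of 8).
 * The switch jumps into the middle of the loop body to handle the
 * count % 8 leftover bytes on the first pass. */
void copy_duff(char *to, const char *from, size_t count)
{
    size_t n;

    if (count == 0)
        return;                 /* the do-while would otherwise copy 8 bytes */
    n = (count + 7) / 8;        /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

/* The standard subroutine, wrapped so all three share one signature. */
void copy_memcpy(char *to, const char *from, size_t count)
{
    memcpy(to, from, count);
}

typedef void (*copy_fn)(char *, const char *, size_t);

/* Hypothetical timing helper: CPU seconds used by `reps` calls of `fn`,
 * measured with the standard clock() function. */
double time_copy(copy_fn fn, char *to, const char *from,
                 size_t count, int reps)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < reps; i++)
        fn(to, from, count);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare, call time_copy once per function on the same pair of buffers with a large count and rep count, then look at the ratios. Check that all three produce identical output before trusting any numbers.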