Xref: utzoo comp.lang.c:12211 comp.arch:6201
Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ames!oliveb!amdahl!chuck
From: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Explanation, please!
Message-ID: <ac4GLe9fit1010twl3.@amdahl.uts.amdahl.com>
Date: 30 Aug 88 04:31:27 GMT
References: <653@paris.ICS.UCI.EDU> <2877@ttrdc.UUCP>
Reply-To: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Organization: Amdahl Corporation, Sunnyvale  CA
Lines: 26

In article <2877@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>In article <653@paris.ICS.UCI.EDU>, schmidt@bonnie.ics.uci.edu (Douglas C. Schmidt) writes:
>>    Since I posted my original question there has been a great deal of
>> abstract discussion about the relative merits of the loop unrolling
>> scheme.  The topic has piqued my curiosity, so I went ahead and
>> implemented a short test program, included below, to test Duff's
>> device against the ``ordinary for loop w/index variable'' technique.
>> See for yourself....   
>> 
>> After some quick testing I found that gcc 1.26 -O on a Sun 3 and a
>> Sequent Balance was pretty heavily in favor of the regular (non-Duff)
>> loop.  Your mileage may vary.  I realize that there may be other
>> tests, and if anyone has a better version, I'd like to see it!
>
>I modified this program to run under System V, changed the arrays to be dynam-
>ically allocated, and changed both the Duff and ordinary copies to use register
>pointers instead of global pointers (for the Duff copy) and array indexing (for
>the ordinary copy).  I then tried it on a SVR2 3B20, a SVR3 3B2, a Sun-3, and a
>Sun-4 both with and without -O optimization (using the standard pcc-type C
>compiler on each system).  The result?  Duff wins by about 10%-20% on all
>machines tested.

I then added a third case to the program that calls 'memcpy'.  The
results?  Duff beats a simple loop by 10%.  'memcpy' is 9 times faster
than Duff.  So why do people spend so much time avoiding standard
subroutines?
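
For anyone without the test program at hand, the three copies being
compared look roughly like the sketch below.  This is not the program
that was timed: the function names, the 'short' element type, and the
8-way unrolling are assumptions made for illustration, and the timing
harness is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; element type 'short' is an assumption. */

/* Ordinary loop: one element per iteration, using pointers. */
void copy_loop(short *to, const short *from, size_t count)
{
    while (count-- > 0)
        *to++ = *from++;
}

/* Duff's device adapted for a memory-to-memory copy, unrolled 8 times.
 * The switch jumps into the middle of the do-while to handle the
 * count % 8 leftover elements on the first pass. */
void copy_duff(short *to, const short *from, size_t count)
{
    size_t n;

    if (count == 0)             /* the device itself assumes count > 0 */
        return;

    n = (count + 7) / 8;        /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

/* Standard library routine; note that the length is given in bytes. */
void copy_memcpy(short *to, const short *from, size_t count)
{
    memcpy(to, from, count * sizeof *to);
}
```

All three leave identical contents in the destination for non-overlapping
buffers, so any difference between them is purely a matter of speed.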

-- Chuck