Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!uc!nic.MR.NET!thor.acc.stolaf.edu!mike
From: mike@thor.acc.stolaf.edu (Mike Haertel)
Newsgroups: comp.lang.c
Subject: Re: faster bcopy using duffs device (source)
Keywords: loop unrolling, optimize, hacks
Message-ID: <5603@thor.acc.stolaf.edu>
Date: 8 Sep 89 19:51:43 GMT
References: <5180@portia.Stanford.EDU> <19473@mimsy.UUCP>
Reply-To: mike@thor.stolaf.edu ()
Organization: St. Olaf College, Northfield, MN
Lines: 22

In article <19473@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>bcopy() should be written in assembly (on most processors), put in
>a library, and forgotten about, because---for instance---a dbra loop
>beats a Duff loop on a 68010, every time.  (And on a 68000, a loop
>using movml is best.  68020s have an I-cache, so a hand-coded `Duffish'
>loop is a good bet.  Some VAXen have a special instruction which does
>a good job.  [ . . . ]

I just tried the obvious bcopy "while (n--) *d++ = *s++;" on a 68010
using gcc.  It produced a dbra loop that beat the sh*t out of the
supposedly carefully handcoded one in the C library.  (Which is a
Duffish sort of thing that tries to copy fullwords at a time.  Not
Duff's device, but structurally similar.)

If you have a halfway decent compiler, I bet a lot of the string
routines will compile to excellent code using just the obvious
C implementations.
-- 
Mike Haertel
``There's nothing remarkable about it.  All one has to do is hit the right
keys at the right time and the instrument plays itself.'' -- J. S. Bach
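
For reference, the "obvious" byte-at-a-time loop mentioned in the post can be written out as a complete function.  This is only a minimal sketch, not the code that was actually compiled: the name naive_bcopy, the unsigned char pointers and the size_t count are assumptions made for illustration, and the argument order simply mirrors bcopy(src, dst, n).  Unlike a real library bcopy, it makes no attempt to handle overlapping regions.

```c
#include <stddef.h>

/*
 * Hypothetical helper (name and types are assumptions, not from the post).
 * Copies n bytes from src to dst, one byte per iteration.
 * Does NOT handle overlapping regions.
 */
void naive_bcopy(const void *src, void *dst, size_t n)
{
    const unsigned char *s = src;   /* source cursor */
    unsigned char *d = dst;         /* destination cursor */

    /* The one-line loop: count down n, storing through d while reading s. */
    while (n--)
        *d++ = *s++;
}
```

Whether a particular compiler turns this into a tight decrement-and-branch loop depends on the compiler, the target and the optimization level, so it is worth inspecting the generated assembly rather than assuming.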