Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!uc!nic.MR.NET!thor.acc.stolaf.edu!mike
From: mike@thor.acc.stolaf.edu (Mike Haertel)
Newsgroups: comp.lang.c
Subject: Re: faster bcopy using duffs device (source)
Keywords: loop unrolling, optimize, hacks
Message-ID: <5603@thor.acc.stolaf.edu>
Date: 8 Sep 89 19:51:43 GMT
References: <5180@portia.Stanford.EDU> <19473@mimsy.UUCP>
Reply-To: mike@thor.stolaf.edu ()
Organization: St. Olaf College, Northfield, MN
Lines: 22

In article <19473@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>bcopy() should be written in assembly (on most processors), put in
>a library, and forgotten about, because---for instance---a dbra loop
>beats a Duff loop on a 68010, every time.  (And on a 68000, a loop
>using movml is best.  68020s have an I-cache, so a hand-coded `Duffish'
>loop is a good bet.  Some VAXen have a special instruction which does
>a good job. [ . . . ]

I just tried the obvious bcopy "while (n--) *d++ = *s++;"
on a 68010 using gcc.  It produced a dbra loop that beat
the sh*t out of the supposedly carefully handcoded one
in the C library.  (Which is a Duffish sort of thing that
tries to copy fullwords at a time.  Not Duff's device,
but structurally similar.)
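
For anyone who wants to try this on their own compiler, here is a
minimal sketch of that loop as a complete function.  The name
simple_bcopy is made up to avoid clashing with the library routine,
and I am assuming the usual bcopy argument order (source first, then
destination) and non-overlapping regions.

```c
#include <stddef.h>

/* simple_bcopy is a hypothetical name, not the library routine.
 * Copies n bytes from src to dst, one byte at a time, front to back.
 * Assumes the regions do not overlap in a way that a forward copy
 * would clobber; the real bcopy() handles overlap, this sketch does not. */
void simple_bcopy(const void *src, void *dst, size_t n)
{
    const char *s = src;
    char *d = dst;

    /* The "obvious" loop: a good compiler may turn this into a
     * tight decrement-and-branch loop on its own. */
    while (n--)
        *d++ = *s++;
}
```

Compile it with optimization on and look at the assembly output
(cc -O -S, or the equivalent for your compiler) to see what loop you get.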

If you have a halfway decent compiler, I bet a lot of
the string routines will compile to excellent code using
just the obvious C implementations.
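
By "obvious C implementations" I mean something along these lines.
This is only a sketch: the simple_ names are made up so they do not
collide with the real library functions, and I am not claiming any
particular compiler output for them.

```c
#include <stddef.h>

/* Hypothetical stand-ins for strlen, strcpy and strcmp, written the
 * plain way with no unrolling and no word-at-a-time tricks. */

size_t simple_strlen(const char *s)
{
    const char *p = s;

    while (*p)                  /* walk to the terminating NUL */
        p++;
    return (size_t)(p - s);
}

char *simple_strcpy(char *dst, const char *src)
{
    char *d = dst;

    while ((*d++ = *src++) != '\0')   /* copy up to and including NUL */
        ;
    return dst;
}

int simple_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b) {    /* stop at first difference or at NUL */
        a++;
        b++;
    }
    /* compare as unsigned char, as the standard strcmp does */
    return (unsigned char)*a - (unsigned char)*b;
}
```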
-- 
Mike Haertel <mike@stolaf.edu>
``There's nothing remarkable about it.  All one has to do is hit the right
  keys at the right time and the instrument plays itself.'' -- J. S. Bach