Xref: utzoo comp.lang.c:12219 comp.arch:6203
Path: utzoo!utgpu!water!watmath!clyde!att!cbnews!lvc
From: lvc@cbnews.ATT.COM (Lawrence V. Cipriani)
Newsgroups: comp.lang.c,comp.arch
Subject: Re: Explanation, please!
Message-ID: <1002@cbnews.ATT.COM>
Date: 30 Aug 88 12:56:52 GMT
References: <653@paris.ICS.UCI.EDU> <2877@ttrdc.UUCP> <ac4GLe9fit1010twl3.@amdahl.uts.amdahl.com>
Reply-To: lvc@cbnews.ATT.COM (Lawrence V. Cipriani)
Organization: AT&T Bell Laboratories, Columbus
Lines: 16

In article <ac4GLe9fit1010twl3.@amdahl.uts.amdahl.com> chuck@amdahl.uts.amdahl.com (Charles Simmons) writes:
	[discussion of Duff copy deleted]
>I then added a piece to the program to use 'memcpy'.  The results?
>Duff beats a simple loop by 10%.  'memcpy' is 9 times faster than
>Duff.  So why do people spend so much time avoiding standard subroutines?

Sometimes the standard subroutines are implemented horribly.  I was
horrified when I saw that the machine-dependent version of memcpy
on the AT&T 3Bs is nothing but a byte-by-byte transfer written in
assembly language.  It is tricky, but doable, to speed this up by
roughly a factor of sizeof(long) by moving a long at a time once the
pointers are aligned.  In fact, this has already been done in the 3B
implementation of the UNIX(tm) operating system, in the copyin (?)
routine.  Why wasn't it done in memcpy too?  Sigh.
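
For the curious, here is a rough sketch in C of the kind of
word-at-a-time copy I have in mind.  This is not the 3B code, and
word_copy is a name I made up for illustration.  It assumes the regions
do not overlap, and that reading and writing a long through a suitably
aligned pointer is acceptable on your machine and compiler.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * word_copy: hypothetical illustration, not the 3B memcpy or copyin.
 * Copies n bytes from src to dst; the regions must not overlap.
 */
void *word_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /*
     * Whole-word moves are only possible when both pointers have the
     * same offset within a long; otherwise aligning one of them
     * leaves the other misaligned, and we fall through to bytes.
     */
    if ((uintptr_t)d % sizeof(long) == (uintptr_t)s % sizeof(long)) {
        long *dw;
        const long *sw;

        /* Copy leading bytes until both pointers are long-aligned. */
        while (n > 0 && (uintptr_t)d % sizeof(long) != 0) {
            *d++ = *s++;
            n--;
        }

        /*
         * Move one long per iteration.  The casts assume the compiler
         * tolerates accessing this memory as long (an assumption;
         * strictly, ISO C aliasing rules do not promise it).
         */
        dw = (long *)d;
        sw = (const long *)s;
        while (n >= sizeof(long)) {
            *dw++ = *sw++;
            n -= sizeof(long);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Trailing bytes, or the whole buffer if alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The tricky part is the case where source and destination are aligned
differently.  The sketch just gives up and copies bytes there; a
serious version would shift and merge words instead.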

-- 
Larry Cipriani, AT&T Network Systems, Columbus OH, cbnews!lvc lvc@cbnews.ATT.COM