Path: utzoo!utgpu!water!watmath!clyde!ima!johnl
From: johnl@ima.ima.isc.com (John R. Levine)
Newsgroups: comp.lang.c
Subject: Re: Efficient Coding Practices
Summary: Enough, already.
Message-ID: <2732@ima.ima.isc.com>
Date: 3 Oct 88 21:23:42 GMT
References: <8809191521.AA17824@ucbvax.Berkeley.EDU> <68995@sun.uucp> <23025@amdcad.AMD.COM> <607@ardent.UUCP> <836@proxftl.UUCP> <34112@XA <34196@XAIT.Xerox.COM>
Reply-To: johnl@ima.UUCP (John R. Levine)
Organization: Not much
Lines: 40

In article <34196@XAIT.Xerox.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>! [ first allegedly optimal code ]
>>!	tmp1 = dst;
>>!	tmp2 = src;
>>!	for (i=0;i<n;i++) *tmp1++ = *tmp2++;
>
>> [second allegedly optimal code]
>>	tmp1 = dst;
>>	tmp2 = src;
>>	tmp3 = dst + n;
>>	while (tmp1 != tmp3) {
>>		*tmp1++ = *tmp2++;
>>	}
> [ third allegedly optimal code]
>	register int i;
>	...
>	tmp1 = dst;
>	tmp2 = src;
>	for (i=n;i;--i) *tmp1++ = *tmp2++;

On an Intel 386, assuming your compiler isn't smart enough to recognize such
loops and generate string move instructions, and assuming the
two blocks don't overlap, your best bet probably is:

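	register int i;
	register int *rdst = dst, *rsrc = src;	/* element type assumed to be int */

	for (i = n; i; --i)		/* i counts n..1, so index with i-1 */
		rdst[i-1] = rsrc[i-1];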

This uses the 386's scaled index modes and loop control instructions and
generates a loop only a few instructions long.  Unlike the Vax, the 386 has
no auto-increment addressing mode, so *p++ does not generate particularly
good code there, after all.
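
For anyone who wants to try the count-down indexed form, here is a minimal
sketch wrapped in a function.  The name copy_down and the int element type
are assumptions made for illustration, not anything from the quoted
articles, and whether a given compiler really turns this into scaled-index
addressing is something to check in the generated assembly.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy n ints from src to dst, highest index first.
 * Assumes the two blocks do not overlap.
 */
void copy_down(int *dst, const int *src, size_t n)
{
	size_t i;

	/* i counts n, n-1, ..., 1; the element touched is i-1, so
	 * element 0 is copied last and n == 0 copies nothing. */
	for (i = n; i != 0; --i)
		dst[i - 1] = src[i - 1];
}
```

Counting down to zero lets the loop test fall out of the decrement itself,
which is the property the 386 loop control instructions rely on.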

The message here is that unless you have a specific performance problem in
a specific environment, such micro-optimization is a waste of time since
the "best" code depends heavily on the particular instruction set and
addressing model in use.
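
To make that concrete: the variants quoted above all compute the same
result, so the only way to choose among them is to time them on the machine
you care about.  A rough sketch, assuming int elements and function names
invented here for illustration; memcpy is the standard library routine,
which is often already tuned for the target.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; int elements are an assumption.
 * None of these is safe for overlapping blocks. */

/* Count-up loop with post-increment pointers (first quoted form). */
void copy_up(int *dst, const int *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		*dst++ = *src++;
}

/* Loop terminated by a pointer comparison (second quoted form). */
void copy_ptr(int *dst, const int *src, size_t n)
{
	int *end = dst + n;

	while (dst != end)
		*dst++ = *src++;
}

/* Let the library do it; the byte count is n elements times the element size. */
void copy_lib(int *dst, const int *src, size_t n)
{
	memcpy(dst, src, n * sizeof *dst);
}
```

Since the three are interchangeable, the readable one is a reasonable
default until a profile on the actual target says otherwise.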
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ bbn | think | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw