Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!husc6!uwvax!oddjob!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: Efficient Coding Practices
Message-ID: <13837@mimsy.UUCP>
Date: 3 Oct 88 19:46:58 GMT
References: <8809191521.AA17824@ucbvax.Berkeley.EDU> <68995@sun.uucp> <846@proxftl.UUCP>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 45

>In article <34112@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>Ignoring library routines, inline routines, etc, suppose that we want
>>to copy n bytes from one place to another, say array src to array dst.
>>We might write
>>	for(i=0;i<n;i++) dst[i]=src[i];

>>Let's hand optimize this.

[This was part of an argument *against*, but let that stand:]

>>	tmp1 = dst;
>>	tmp2 = src;
>>	for (i=0;i<n;i++) *tmp1++ = *tmp2++;

In article <846@proxftl.UUCP> francis@proxftl.UUCP (Francis H. Yu) writes:
>The better code is 
>	tmp1 = dst;
>	tmp2 = src;
>	tmp3 = dst + n;
>	while (tmp1 != tmp3) {
>		*tmp1++ = *tmp2++;
>	}

Better for whom?

On a 68010, the following is *much* better:

	register short n1;	/* dbra counts in a 16-bit word */
	if ((n1 = n - 1) >= 0)	/* skip the loop entirely when n is 0 */
		do *tmp1++ = *tmp2++; while (--n1 != -1);

because it can compile into a `dbra' loop and take advantage of the
68010 loop mode.  It is, however, much less efficient on a VAX than
the `movc3' instruction that the compiler might generate for the
original loop.  The second loop, in turn, is better for the Foobar
CPU, which has a `count-up' loop mode, and the third is better for
the BazWoRKS chip.

This is micro-efficiency at its finest: you cannot characterise it
outside of its environment.  Which loop is `best' is heavily
machine-dependent.  If that loop takes much time, go ahead and optimise it,
but if not, you might as well not bother, since everyone else will
just have to re-optimise it differently anyway.
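
For reference, here is a rough sketch that puts the four loops quoted
above side by side as C functions, so that they can be timed or
disassembled on whatever machine is at hand.  The function names, the
`char' element type, and the all_variants_agree() helper are made up
for illustration; none of them comes from the articles quoted.  The
sketch assumes 0 <= n, and for the count-down version that n fits in a
short.  It drops the `register' hint and says nothing about which loop
is fastest anywhere; it only checks that all four copy the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First loop: plain array indexing. */
static void copy_indexed(char *dst, const char *src, int n)
{
	int i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Second loop: pointers, still counting up with i. */
static void copy_pointers(char *dst, const char *src, int n)
{
	char *tmp1 = dst;
	const char *tmp2 = src;
	int i;

	for (i = 0; i < n; i++)
		*tmp1++ = *tmp2++;
}

/* Third loop: compare against an end pointer (assumes n >= 0). */
static void copy_endptr(char *dst, const char *src, int n)
{
	char *tmp1 = dst;
	const char *tmp2 = src;
	char *tmp3 = dst + n;

	while (tmp1 != tmp3)
		*tmp1++ = *tmp2++;
}

/* Fourth loop: count down to -1 (assumes n fits in a short). */
static void copy_countdown(char *dst, const char *src, int n)
{
	char *tmp1 = dst;
	const char *tmp2 = src;
	short n1;

	if ((n1 = n - 1) >= 0)
		do *tmp1++ = *tmp2++; while (--n1 != -1);
}

typedef void (*copy_fn)(char *, const char *, int);

/*
 * Hypothetical helper: returns 1 if every variant copies exactly the
 * first n bytes of a test pattern and leaves the rest of dst alone.
 */
static int all_variants_agree(int n)
{
	static const copy_fn variants[] = {
		copy_indexed, copy_pointers, copy_endptr, copy_countdown
	};
	char src[256], dst[256];
	size_t v;
	int i;

	assert(n >= 0 && n <= (int)sizeof src);
	for (i = 0; i < (int)sizeof src; i++)
		src[i] = (char)(i % 100 + 1);	/* never zero */
	for (v = 0; v < sizeof variants / sizeof variants[0]; v++) {
		memset(dst, 0, sizeof dst);
		variants[v](dst, src, n);
		if (memcmp(dst, src, (size_t)n) != 0)
			return 0;
		for (i = n; i < (int)sizeof dst; i++)
			if (dst[i] != 0)
				return 0;	/* wrote past n bytes */
	}
	return 1;
}
```

Calling all_variants_agree() with a few sizes (0, 1, 255) from a small
main() is enough to confirm the loops are interchangeable; which one
runs fastest still has to be measured on each machine.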
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris