Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!husc6!uwvax!oddjob!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: Efficient Coding Practices
Message-ID: <13837@mimsy.UUCP>
Date: 3 Oct 88 19:46:58 GMT
References: <8809191521.AA17824@ucbvax.Berkeley.EDU> <68995@sun.uucp> <846@proxftl.UUCP>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 45

>In article <34112@XAIT.XEROX.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>Ignoring library routines, inline routines, etc, suppose that we want
>>to copy n bytes from one place to another, say array src to array dst.
>>We might write
>>	for(i=0;i<n;i++) dst[i] = src[i];
>>Let's hand optimize this.

[This was part of an argument *against*, but let that stand:]

>>	tmp1 = dst;
>>	tmp2 = src;
>>	for (i=0;i<n;i++) *tmp1++ = *tmp2++;

In article <846@proxftl.UUCP> francis@proxftl.UUCP (Francis H. Yu) writes:
>The better code is
>	tmp1 = dst;
>	tmp2 = src;
>	tmp3 = dst + n;
>	while (tmp1 != tmp3) {
>		*tmp1++ = *tmp2++;
>	}

Better for whom?  On a 68010, the following is *much* better:

	register short n1;

	if ((n1 = n - 1) >= 0)
		do *tmp1++ = *tmp2++; while (--n1 != -1);

because it can compile into a `dbra' loop and take advantage of the
68010 loop mode.  But this is much less efficient on a VAX than the
`movc3' instruction that the compiler might generate for the original
loop.  But the second way is better for the Foobar CPU, which has a
`count-up' loop mode; but the third is better for the BazWoRKS chip.

This is micro-efficiency at its finest: you cannot characterise it
outside of its environment.  Which loop is `best' is heavily machine
dependent.  If that loop takes much time, go ahead and optimise it,
but if not, you might as well not bother, since everyone else will
just have to re-optimise it differently anyway.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
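
For anyone who wants to time the loops discussed above on their own machine, here is a minimal sketch that wraps three of them in compilable functions. The function names (copy_indexed, copy_endptr, copy_countdown, all_variants_agree), the char element type and the 64-byte test buffer are assumptions made for illustration; none of them comes from the articles. The count-down variant keeps the `short' counter, so it is only meaningful for n between 0 and 32767.

```c
#include <string.h>

/* Indexed loop, as in the first quoted version.  Names are hypothetical. */
static void copy_indexed(char *dst, const char *src, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* End-pointer loop, as in the "better code" quoted above.  Assumes n >= 0. */
static void copy_endptr(char *dst, const char *src, int n)
{
    char *tmp1 = dst;
    const char *tmp2 = src;
    char *tmp3 = dst + n;

    while (tmp1 != tmp3)
        *tmp1++ = *tmp2++;
}

/* Count-down loop in the shape suggested for the 68010.
 * The short counter limits this sketch to 0 <= n <= 32767. */
static void copy_countdown(char *dst, const char *src, int n)
{
    char *tmp1 = dst;
    const char *tmp2 = src;
    short n1;

    if ((n1 = (short)(n - 1)) >= 0)
        do *tmp1++ = *tmp2++; while (--n1 != -1);
}

/* Returns 1 if all three variants copy the same n bytes (0 <= n <= 64). */
static int all_variants_agree(int n)
{
    char src[64], a[64], b[64], c[64];
    int i;

    if (n < 0 || n > 64)
        return 0;
    for (i = 0; i < 64; i++) {
        src[i] = (char)((i * 7 + 1) & 0x7f);    /* arbitrary test pattern */
        a[i] = b[i] = c[i] = 0;
    }
    copy_indexed(a, src, n);
    copy_endptr(b, src, n);
    copy_countdown(c, src, n);
    return memcmp(a, src, (size_t)n) == 0
        && memcmp(a, b, sizeof a) == 0
        && memcmp(a, c, sizeof a) == 0;
}
```

All three do the same work; which one runs fastest depends on the compiler and the CPU, so measure on the target before preferring any of them.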