Path: utzoo!utgpu!water!watmath!clyde!ima!johnl
From: johnl@ima.ima.isc.com (John R. Levine)
Newsgroups: comp.lang.c
Subject: Re: Efficient Coding Practices
Summary: Enough, already.
Message-ID: <2732@ima.ima.isc.com>
Date: 3 Oct 88 21:23:42 GMT
References: <8809191521.AA17824@ucbvax.Berkeley.EDU> <68995@sun.uucp> <23025@amdcad.AMD.COM> <607@ardent.UUCP> <836@proxftl.UUCP> <34112@XA <34196@XAIT.Xerox.COM>
Reply-To: johnl@ima.UUCP (John R. Levine)
Organization: Not much
Lines: 40

In article <34196@XAIT.Xerox.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
>>! [ first allegedly optimal code ]
>>!	tmp1 = dst;
>>!	tmp2 = src;
>>!	for (i=0;i

>> [second allegedly optimal code]
>>	tmp1 = dst;
>>	tmp2 = src;
>>	tmp3 = dst + n;
>>	while (tmp1 != tmp3) {
>>		*tmp1++ = *tmp2++;

> [ third allegedly optimal code]
>	register int i;
>	...
>	tmp1 = dst;
>	tmp2 = src;
>	for (i=n;i;--i) *tmp1++ = *tmp2++;

On an Intel 386, assuming your compiler isn't smart enough to recognize
such loops and generate string move instructions, and assuming the two
blocks don't overlap, your best bet is probably:

	register int i, *rdst = dst, *rsrc = src;

	for (i = n; --i >= 0; )
		rdst[i] = rsrc[i];

This uses the 386's scaled index modes and loop control instructions and
generates a loop two instructions long.  On non-Vax machines, *p++ does
not generate particularly good code, after all.

The message here is that unless you have a specific performance problem
in a specific environment, such micro-optimization is a waste of time,
since the "best" code depends heavily on the particular instruction set
and addressing model in use.
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ bbn | think | decvax | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw
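
For anyone who wants to try the comparison, here is a minimal sketch of the
two loop shapes discussed above, written as standalone functions. The
function names copy_ptr and copy_indexed are hypothetical, and so are the
int element type and the signed int count; the thread does not specify
either. The sketch only shows that both forms copy the same n elements when
the blocks do not overlap. It makes no claim about which one a given
compiler turns into better code.

```c
#include <assert.h>

/* Pointer-bumping form, following the quoted "third" variant.
 * Hypothetical name; element type int is an assumption. */
void copy_ptr(int *dst, const int *src, int n)
{
    int *tmp1 = dst;
    const int *tmp2 = src;
    int i;

    assert(n >= 0);
    for (i = n; i; --i)
        *tmp1++ = *tmp2++;
}

/* Indexed count-down form, following the 386 suggestion.
 * Copies elements n-1 down to 0; the blocks must not overlap.
 * Hypothetical name; element type int is an assumption. */
void copy_indexed(int *dst, const int *src, int n)
{
    int i;

    assert(n >= 0);
    for (i = n; --i >= 0; )
        dst[i] = src[i];
}
```

Calling either function with the same src and n should leave dst with
identical contents, and with n equal to 0 neither one touches dst. To see
what a particular compiler makes of each loop, inspect its assembly output
(for example with cc -S) rather than guessing.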