Xref: utzoo comp.unix.wizards:16879 comp.lang.c:19283
Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!ames!elroy!usc!hacgate!ashtate!dbase!awd
From: awd@dbase.UUCP (Alastair Dallas)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: Re: Optimal for loop on the 68020.
Summary: Microsoft C v 5.1 gives similar results
Keywords: for ( i = COUNT; --i >= 0; )
Message-ID: <96@dbase.UUCP>
Date: 10 Jun 89 20:45:57 GMT
References: <11993@well.UUCP>
Organization: Ashton Tate Devlopment Center Glendale, Calif.
Lines: 57

In article <11993@well.UUCP>, pokey@well.UUCP (Jef Poskanzer) writes:
> SUMMARY
> -------
>
> I compiled the following different kinds of for loops:
>
>     for ( i = 0; i < COUNT; i++ )
>     for ( i = 0; i < COUNT; ++i )
>     for ( i = 0; ++i <= COUNT; )
>     for ( i = 0; i++ < COUNT; )
>     for ( i = COUNT; i > 0; i-- )
>     for ( i = COUNT; i > 0; --i )
>     for ( i = COUNT; --i >= 0; )
>     for ( i = COUNT; i-- > 0; )
>
> on a Sun 3/50 with both SunOS 3.5 cc and gcc 1.35, and looked at the
> generated code and the timings.  COUNT was a small (< 127) compile-time
> constant.  The loop body did not reference i and had no side-effects.
> In theory, all eight of these loops should have optimized to the most
> efficient loop possible, ignoring the otherwise unreferenced variable i
> and simply traversing the loop body the proper number of times.  On the
> 68020, this most efficient loop is a dbra instruction.  But in fact, cc
> never generates dbra
>
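For anyone who wants to check that the eight forms really are
interchangeable, here is a rough sketch that counts how many times each
body runs.  The function name loop_iterations, the numbering 1..8 (the
order of the list above) and the value COUNT = 100 are my own assumptions
for illustration; the original article only says COUNT was less than 127.
Note that the body here increments a counter, so this is not the
side-effect-free body that was timed, and it says nothing about the code
any particular compiler generates.

```c
#define COUNT 100   /* assumed value; the article only says COUNT < 127 */

/*
 * Hypothetical helper: run loop form `kind` (1..8, in the order listed
 * in the quoted summary) and return how many times its body executed.
 * Returns -1 for an unknown kind.
 */
int loop_iterations(int kind)
{
    int i;
    int n = 0;      /* number of times the loop body ran */

    switch (kind) {
    case 1: for ( i = 0; i < COUNT; i++ )  n++; break;
    case 2: for ( i = 0; i < COUNT; ++i )  n++; break;
    case 3: for ( i = 0; ++i <= COUNT; )   n++; break;
    case 4: for ( i = 0; i++ < COUNT; )    n++; break;
    case 5: for ( i = COUNT; i > 0; i-- )  n++; break;
    case 6: for ( i = COUNT; i > 0; --i )  n++; break;
    case 7: for ( i = COUNT; --i >= 0; )   n++; break;
    case 8: for ( i = COUNT; i-- > 0; )    n++; break;
    default: return -1;
    }
    return n;
}
```

Calling loop_iterations(k) for k = 1 through 8 should return COUNT every
time, which is the sense in which an ideal optimizer could treat all
eight alike.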
>
> CONCLUSION
> ----------
>
> For both compilers and all levels of optimization, this loop:
>
>     for ( i = COUNT; --i >= 0; )
>
> gives the lowest overhead.

I tried these eight loops on Microsoft C v5.1 and was very surprised to
get the same results, more or less.  Jef's fastest loop (--i >= 0) on his
Sun was also MSC's fastest loop on an 80386.  And for the same reason: the
compiler wakes up and realizes that the condition flags are already set.
On 80x86, JCXZ is the fast loop instruction, but it is ignored by the MSC
optimizer as well.  (To be fair, I didn't turn on any special optimization,
so I can't say MSC won't do it under some conditions.)  Making i a register
variable would obviously improve the timings, but MSC uses the SI register,
not CX, and therefore can't take advantage of JCXZ.

The code MSC generates for (--i >= 0) takes approx. 47% of the t-states
used by the code it generates for the typical (i=0; i