Xref: utzoo comp.unix.wizards:16674 comp.lang.c:19177
Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!apple!ames!haven!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: Re: Optimal for loop on the 68020.
Keywords: for ( i = COUNT; --i >= 0; )
Message-ID: <17891@mimsy.UUCP>
Date: 5 Jun 89 20:11:24 GMT
References: <11993@well.UUCP>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 41

In article <11993@well.UUCP> pokey@well.UUCP (Jef Poskanzer) writes:
>... COUNT was a small (< 127) compile-time constant.
>    for ( i = COUNT; --i >= 0; )

[all but gcc -O -fstrength-reduce deleted]

>        moveq #COUNT,d0
>        jra tag2
>tag1:
>
>tag2:
>        dbra d0,tag1
>        clrw d0
>        subql #1,d0
>        jcc tag1

>... But wait!  What's that chud after the loop?  Let's see, clear d1
>to zero, subtract one from it giving -1 and setting carry, and jump
>if carry is clear.  Hmm, looks like a three-instruction no-op to me!

No---the problem is that `dbra' decrements a *word*, compares the
result against -1, and (if not -1) branches.  The semantics of the loop
demands a 32 bit comparison.  The only reason it is not necessary in
this particular case is the first quoted line above.

Still, it would be nice if gcc always used the dbra/clrw/subql/jcc
sequence for `--x >= 0' loops, since it does always work.  The `clrw'
fixes up the case where the 16-bit result has gone to -1:

        before decrement:       wxyz 0000
        after decrement:        wxyz FFFF
        after clrw:             wxyz 0000
        after subql:            wxyz-1 FFFF

The dbra loop is so much faster that, even with the extra time and
space for one `unnecessary' dbra+clrw (when the loop really does go
from 0 to -1, and at every 65536 trips when the loop counter is large
and positive), I would make this optimisation unconditional.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path: uunet!mimsy!chris
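
For anyone who wants to check the claim that the four-instruction
sequence `does always work' without a 68020 at hand, here is a rough
sketch in C that models d0 as a 32-bit value and steps through
dbra/clrw/subql/jcc by hand.  The function names (dbra_step,
sequence_step, count_trips_*) are made up for this illustration, and
the model assumes only the behaviour described above: dbra decrements
the low word and branches unless that word has become -1, and subql
sets carry on a borrow.  It is not derived from gcc's output.

```c
#include <stdint.h>

/* dbra d0,tag1: decrement the low 16 bits of d0 only.
 * Returns 1 if the branch to tag1 is taken (low word != -1). */
static int dbra_step(uint32_t *d0)
{
    uint16_t low = (uint16_t)(*d0 & 0xFFFFu);
    low = (uint16_t)(low - 1);
    *d0 = (*d0 & 0xFFFF0000u) | low;
    return low != 0xFFFFu;
}

/* The whole tag2 sequence: dbra, then clrw/subql/jcc as a fix-up.
 * Returns 1 if control goes back to tag1. */
static int sequence_step(uint32_t *d0)
{
    int borrow;

    if (dbra_step(d0))
        return 1;               /* common case: dbra branches      */
    *d0 &= 0xFFFF0000u;         /* clrw d0: wxyz FFFF -> wxyz 0000 */
    borrow = (*d0 == 0);        /* subql borrows only from zero    */
    *d0 -= 1;                   /* subql #1,d0 (full 32 bits)      */
    return !borrow;             /* jcc: branch if carry clear      */
}

/* Trips through the loop body using the four-instruction sequence. */
static uint32_t count_trips_sequence(uint32_t count)
{
    uint32_t d0 = count, trips = 0;
    while (sequence_step(&d0))
        trips++;
    return trips;
}

/* Trips using a bare dbra, i.e. a 16-bit counter. */
static uint32_t count_trips_dbra_only(uint32_t count)
{
    uint32_t d0 = count, trips = 0;
    while (dbra_step(&d0))
        trips++;
    return trips;
}

/* Reference: the C loop itself, for non-negative counts. */
static uint32_t count_trips_plain(int32_t count)
{
    int32_t i;
    uint32_t trips = 0;
    for (i = count; --i >= 0; )
        trips++;
    return trips;
}
```

With a count of 70000 the sequence version and the plain C loop agree,
while the bare-dbra version sees only the low word and stops early.
The fix-up path is entered once per 65536 trips, which is the
`unnecessary' cost mentioned above.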