Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!ucsd!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.arch
Subject: Re: Loop instructions
Message-ID: <12330@dog.ee.lbl.gov>
Date: 22 Apr 91 01:34:58 GMT
References: <1991Apr16.152438.3445@waikato.ac.nz> <12739@pt.cs.cmu.edu> <1991Apr21.210031.16749@leland.Stanford.EDU>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 50
X-Local-Date: Sun, 21 Apr 91 18:34:59 PDT

>In article <12739@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu
>(Donald Lindsay) writes:
>>Compiler writers dislike [the 68000 DBcc] instruction, but not because
>>of the test semantics. The killer is that the count is 16 bits, on a
>>machine where variables and expressions are naturally 32 bits. ...

In article <1991Apr21.210031.16749@leland.Stanford.EDU>
dhinds@elaine18.Stanford.EDU (David Hinds) writes:
>    I'm not familiar with the 68000 instruction set, but couldn't this
>instruction be adapted to 32-bit counts by just splitting the count into
>upper and lower half-words and using a nested pair of 16-bit loops?

No splitting is necessary: instead of compiling

        do { ... } while (i-- != 0); /* i now dead, hence need not = -1 */

as

                jra     Lloop
        Ltop:
                subql   #1,d2
        Lloop:
                ...
                tstl    d2
                jne     Ltop

one compiles it as (approximately):

        Lloop:
                ...
                dbra    d2,Lloop
                /*
                 * at this point, low(d2) == 0xffff;
                 * high(d2) is unchanged but should be decremented;
                 * the loop is finished iff d2==-1 afterward
                 */
                subl    #0x10000,d2
                cmpl    #-1,d2
                jne     Lloop

However, it turns out that on the 68020 it is almost invariably faster
to avoid DBcc anyway (bcopy, for instance, should be unrolled).  Score
0 for fancy instructions :-)

(Note that a dbra bcopy is the fastest available on the 68010, but
not on the 68000!)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA            Domain: torek@ee.lbl.gov
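
For anyone who wants to convince themselves that the dbra-plus-fixup
sequence in the post really runs the body the same number of times as
the plain 32-bit loop, here is a rough C model. It is only a sketch:
the function names (dbra, plain_loop, dbra_loop, same_trip_count) are
made up for illustration, and dbra() models nothing beyond the
documented behaviour of the instruction (decrement the low 16 bits of
the register, branch unless the low word has become 0xffff). It says
nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of 68000 "dbra Dn,label": only the low word of
 * the register is decremented.  Returns nonzero if the branch would
 * be taken, i.e. if the low word is not 0xffff after the decrement.
 */
int dbra(uint32_t *d)
{
    uint16_t low = (uint16_t)(*d & 0xffffu);

    low = (uint16_t)(low - 1);
    *d = (*d & 0xffff0000u) | low;
    return low != 0xffffu;
}

/* Trip count of: do { ... } while (i-- != 0); */
uint64_t plain_loop(uint32_t i)
{
    uint64_t n = 0;

    do {
        n++;                        /* loop body */
    } while (i-- != 0);
    return n;
}

/* Trip count of the dbra sequence with the high-word fixup. */
uint64_t dbra_loop(uint32_t d2)
{
    uint64_t n = 0;

    do {
        do {
            n++;                    /* loop body */
        } while (dbra(&d2));        /* dbra d2,Lloop */
        d2 -= 0x10000u;             /* subl #0x10000,d2 */
    } while (d2 != 0xffffffffu);    /* cmpl #-1,d2; jne Lloop */
    return n;
}

/* Asserts that both forms agree for one starting count. */
void same_trip_count(uint32_t i)
{
    assert(plain_loop(i) == dbra_loop(i));
}
```

Calling same_trip_count() on counts that straddle the 16-bit boundary
(0, 0xffff, 0x10000, 0x10001) exercises the interesting cases: the
fixup path is taken once per 65536 iterations, so its cost is
negligible next to the inner dbra.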