Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!ucsd!dog.ee.lbl.gov!elf.ee.lbl.gov!torek
From: torek@elf.ee.lbl.gov (Chris Torek)
Newsgroups: comp.arch
Subject: Re: Loop instructions
Message-ID: <12330@dog.ee.lbl.gov>
Date: 22 Apr 91 01:34:58 GMT
References: <1991Apr16.152438.3445@waikato.ac.nz> <12739@pt.cs.cmu.edu> <1991Apr21.210031.16749@leland.Stanford.EDU>
Reply-To: torek@elf.ee.lbl.gov (Chris Torek)
Organization: Lawrence Berkeley Laboratory, Berkeley
Lines: 50
X-Local-Date: Sun, 21 Apr 91 18:34:59 PDT

>In article <12739@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu
>(Donald Lindsay) writes:
>>Compiler writers dislike [the 68000 DBcc] instruction, but not because
>>of the test semantics. The killer is that the count is 16 bits, on a
>>machine where variables and expressions are naturally 32 bits. ...

In article <1991Apr21.210031.16749@leland.Stanford.EDU>
dhinds@elaine18.Stanford.EDU (David Hinds) writes:
>    I'm not familiar with the 68000 instruction set, but couldn't this
>instruction be adapted to 32-bit counts by just splitting the count into
>upper and lower half-words and using a nested pair of 16-bit loops?

No splitting is necessary: instead of compiling

	do {
		...
	} while (i-- != 0);
	/* i is now dead, so it need not end up as -1 */

as

		jra	Lloop
	Ltop:
		subql	#1,d2
	Lloop:
		...
		tstl	d2
		jne	Ltop

one compiles it as (approximately):

	Lloop:
		...
		dbra	d2,Lloop
		/*
		 * at this point, low(d2) == 0xffff;
		 * high(d2) is unchanged but should be decremented;
		 * the loop is finished iff d2==-1 afterward
		 */
		subl	#0x10000,d2
		cmpl	#-1,d2
		jne	Lloop

However, it turns out that on the 68020 it is almost invariably faster
to avoid DBcc anyway (bcopy, for instance, should be unrolled).  Score
0 for fancy instructions :-)  (Note that on the 68010 a dbra loop gives
the fastest bcopy available, but not on the 68000!)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov