Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!watserv1!watdragon!rose!ccplumb
From: ccplumb@rose.uwaterloo.ca (Colin Plumb)
Subject: Re:     Lemmings - a tutorial Part V (last)
Message-ID: <1991Apr6.225956.21886@watdragon.waterloo.edu>
Sender: news@watdragon.waterloo.edu (News Owner)
Organization: University of Waterloo
References: <mykes.1028@amiga0.SF-Bay.ORG> <1991Apr2.002631.22799@mintaka.lcs.mit.edu> <20243@cbmvax.commodore.com>
Date: Sat, 6 Apr 1991 22:59:56 GMT
Lines: 50

Gcc (written as a function, arguments passed in):

_foo:
	movel a6@(8),a0
	movel a6@(12),a1
	tstb a0@
	jeq L5
L4:
	moveb a0@+,a1@+
	tstb a0@
	jne L4
L5:
	rts


jesup@cbmvax.commodore.com (Randell Jesup) wrote:

>SAS C: (5.10a)
>       | 0000  48E7 0030                      MOVEM.L   A2-A3,-(A7)
>       | 0004  47EC  0000-02.2                LEA       02.00000000(A4),A3
>       | 0008  45EC  0000-01.2                LEA       01.00000000(A4),A2
>       | 000C  6002                           BRA.B     0010
>       | 000E  16DA                           MOVE.B    (A2)+,(A3)+
>       | 0010  4A12                           TST.B     (A2)
>       | 0012  66FA                           BNE.B     000E
>       | 0014  4CDF 0C00                      MOVEM.L   (A7)+,A2-A3
>       | 0018  4E75                           RTS
>
>	It does use a2/a3 instead of a0/a1.  However it beats the GNU
>version slightly by jumping to the test instead of having two copies of it.

We must disagree on what is good optimisation... I consider gcc's duplication
of the test to be a feature, and SAS's jump-to-the-end a missed optimisation.
It's clearly faster the way gcc does it.  (Gcc saves one untaken branch
in the no-execute case, and one taken branch in the execute case.)
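
To make the two loop shapes concrete, here is a rough C sketch.  The C
source behind the listings isn't shown in this thread, so the function
names and the (src, dst) argument order are assumptions, chosen only to
mirror the control flow of the two listings.

```c
/* Hypothetical names; a sketch of the control flow only. */

/* Shape of the SAS listing: branch to a single test at the bottom. */
void copy_jump_to_test(const char *src, char *dst)
{
    goto test;                /* BRA.B to the test */
body:
    *dst++ = *src++;          /* MOVE.B (A2)+,(A3)+ */
test:
    if (*src != '\0')         /* TST.B (A2) / BNE.B */
        goto body;
}

/* Shape of the gcc listing: the test is duplicated ahead of the loop. */
void copy_duplicated_test(const char *src, char *dst)
{
    if (*src == '\0')         /* tstb a0@ / jeq L5 */
        return;
    do {
        *dst++ = *src++;      /* moveb a0@+,a1@+ */
    } while (*src != '\0');   /* tstb a0@ / jne L4 */
}
```

Both copy the bytes up to, but not including, the terminating NUL.  They
differ only in which branches get executed on the way in.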

However,
	moveb a0@+,d0
	jeq L5
L4:
	moveb d0,a1@+
	moveb a0@+,d0
	jne L4
L5:
	rts

is faster still, by 4 clocks per loop iteration on a 68000.  I'm submitting
this as a bug in gcc.
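
A rough tally of where the 4 clocks come from, as a C sketch.  The cycle
counts are the 68000 timing-table figures as best recalled and should be
read as assumptions, as should all the names.  The copy function is the
C shape of the hand-written loop: each byte goes through a register, and
the load itself sets the condition codes.

```c
/* Hypothetical names; cycle counts are assumed 68000 figures. */
enum {
    MOVEB_POSTINC_TO_POSTINC = 12, /* moveb a0@+,a1@+ */
    TSTB_INDIRECT            = 8,  /* tstb a0@ */
    MOVEB_REG_TO_POSTINC     = 8,  /* moveb d0,a1@+ */
    MOVEB_POSTINC_TO_REG     = 8,  /* moveb a0@+,d0 */
    BCC_SHORT_TAKEN          = 10  /* jne L4, branch taken */
};

/* One trip round gcc's loop: copy, test, branch back. */
int gcc_loop_cycles(void)
{
    return MOVEB_POSTINC_TO_POSTINC + TSTB_INDIRECT + BCC_SHORT_TAKEN;
}

/* One trip round the register version: store, load, branch back. */
int register_loop_cycles(void)
{
    return MOVEB_REG_TO_POSTINC + MOVEB_POSTINC_TO_REG + BCC_SHORT_TAKEN;
}

/* C shape of the register version; c plays the part of d0. */
void copy_via_register(const char *src, char *dst)
{
    char c = *src++;          /* moveb a0@+,d0 */
    while (c != '\0') {       /* jeq L5 / jne L4 */
        *dst++ = c;           /* moveb d0,a1@+ */
        c = *src++;           /* moveb a0@+,d0 */
    }
}
```

With those figures the gcc loop costs 12 + 8 + 10 clocks per iteration
and the register version 8 + 8 + 10, which is where the difference of 4
comes from.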
-- 
	-Colin