Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!watserv1!watdragon!rose!ccplumb
From: ccplumb@rose.uwaterloo.ca (Colin Plumb)
Subject: Re: Lemmings - a tutorial Part V (last)
Message-ID: <1991Apr6.225956.21886@watdragon.waterloo.edu>
Sender: news@watdragon.waterloo.edu (News Owner)
Organization: University of Waterloo
References: <1991Apr2.002631.22799@mintaka.lcs.mit.edu> <20243@cbmvax.commodore.com>
Date: Sat, 6 Apr 1991 22:59:56 GMT
Lines: 50

Gcc (written as a function, arguments passed in):

_foo:
        movel a6@(8),a0
        movel a6@(12),a1
        tstb a0@
        jeq L5
L4:
        moveb a0@+,a1@+
        tstb a0@
        jne L4
L5:
        rts

jesup@cbmvax.commodore.com (Randell Jesup) wrote:
>SAS C: (5.10a)
>       | 0000  48E7 0030          MOVEM.L   A2-A3,-(A7)
>       | 0004  47EC 0000-02.2     LEA       02.00000000(A4),A3
>       | 0008  45EC 0000-01.2     LEA       01.00000000(A4),A2
>       | 000C  6002               BRA.B     0010
>       | 000E  16DA               MOVE.B    (A2)+,(A3)+
>       | 0010  4A12               TST.B     (A2)
>       | 0012  66FA               BNE.B     000E
>       | 0014  4CDF 0C00          MOVEM.L   (A7)+,A2-A3
>       | 0018  4E75               RTS
>
>       It does use a2/a3 instead of a0/a1.  However it beats the GNU
>version slightly by jumping to the test instead of having two copies of it.

We must disagree on what is good optimisation...  I consider gcc's
duplication of the test to be a feature, and SAS's jump-to-the-end a
missed optimisation.  It's clearly faster the way gcc does it.
(Gcc saves one untaken branch in the no-execute case, and one taken
branch in the execute case.)

However,

        moveb a0@+,d0
        jeq L5
L4:
        moveb d0,a1@+
        moveb a0@+,d0
        jne L4
L5:
        rts

is faster still, by 4 clocks per loop iteration on a 68000.

I'm submitting this as a bug in gcc.
-- 
        -Colin
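
For reference, the three loop shapes compared above can be written out in C.
This is only a sketch under assumptions: the C source being compiled is not
shown in this post, so the loop `while (*src) *dst++ = *src++;`, the function
names, and the returned byte count are all hypothetical, reconstructed from
the assembly for illustration.  As in the listings, no terminating NUL is
written.  A compiler is free to emit different instructions for each shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shape 1 (the SAS C listing): enter the loop by jumping to the single
 * test at the bottom.  Costs one extra taken branch on entry. */
size_t copy_jump_to_test(const char *src, char *dst)
{
    size_t n = 0;
    goto test;
body:
    *dst++ = *src++;
    n++;
test:
    if (*src)
        goto body;
    return n;
}

/* Shape 2 (the gcc listing): the test is duplicated, once as a guard
 * before the loop and once at the bottom of the loop. */
size_t copy_dup_test(const char *src, char *dst)
{
    size_t n = 0;
    if (*src) {
        do {
            *dst++ = *src++;
            n++;
        } while (*src);
    }
    return n;
}

/* Shape 3 (the suggested version): load each byte into a temporary, so
 * the load itself provides the zero test and no separate test of memory
 * is needed. */
size_t copy_via_register(const char *src, char *dst)
{
    size_t n = 0;
    char c = *src++;
    while (c) {
        *dst++ = c;
        n++;
        c = *src++;
    }
    return n;
}

/* Usage sketch (hypothetical helper): all three shapes must copy the
 * same bytes and leave the byte after the copy untouched. */
void check_shapes_agree(const char *s)
{
    char a[64], b[64], c[64];
    size_t len = strlen(s);

    assert(len < sizeof a);
    memset(a, '#', sizeof a);
    memset(b, '#', sizeof b);
    memset(c, '#', sizeof c);

    assert(copy_jump_to_test(s, a) == len);
    assert(copy_dup_test(s, b) == len);
    assert(copy_via_register(s, c) == len);

    assert(memcmp(a, b, sizeof a) == 0);
    assert(memcmp(a, c, sizeof a) == 0);
    assert(a[len] == '#');   /* no terminator was written */
}
```

Calling check_shapes_agree("") exercises the no-execute case, and any short
non-empty string exercises the execute case.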