Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!julius.cs.uiuc.edu!rpi!crdgw1!uunet!overload!dillon
From: dillon@overload.Berkeley.CA.US (Matthew Dillon)
Newsgroups: comp.sys.amiga.tech
Subject: Re: C compilers code generation
Message-ID: <dillon.7176@overload.Berkeley.CA.US>
Date: 12 Nov 90 22:13:10 GMT
References: <1149@teslab.lab.OZ>
Lines: 67

>In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C.  From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation.  Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.
>
>Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02
>and PDC 3.34 using several benchmarks.  On disassembling the
> ...
>for this loop (with all optimizations on).
>
>     LATTICE/AZTEC             DICE                      PDC
>
>     MOVE.W  I,K               MOVE.W  I,D0              EXT.L   I
>     ADD.W   I,K               EXT.L   D0                EXT.L   I
>                               MOVE.W  K,D1              MOVE.L  I,D0
>                               EXT.L   D1                ADD.L   I,D0
>                               ADD.L   D0,D1             MOVE.L  D0,K
>                               MOVE.W  D1,D3
>     BRA.B   IN                BRA.B   IN
>
>LOOP LEA     f(A4),A0     LOOP LEA     f(A4),A0     LOOP CMPI.L  #8190,K
>     CLR.B   0(A0,K.W)         ADDA.W  K,A0              BGT.B   OUT
>                               MOVE.B  #0,(A0)           LEA     f(A4),A0
>     ADD.W   I,K               ADD.W   I,K               ADDA.L  K,A0
>IN   CMPI.W  #8190,K      IN   CMPI.W  #8190,K           CLR.B   (A0)
>     BLE.B   LOOP              BLE.B   LOOP              ADD.L   I,K
>                                                         BRA.B   LOOP
>                                                    OUT  ...
>
>I calculated the total 68000 clock cycles for the inner loop
>(excluding initialization) to be: Lattice 48, Aztec 48, DICE 50 and

    Neat!  BTW, DICE now optimizes short adds when the result is also a
    short.  The initialization part of the loop generates:

	move.w	D0,D1
	add.w	D0,D1
	bra	IN

    It also optimizes other arithmetic and logical operations that act
    entirely on shorts (DICE is a 32-bit-int compiler only, BTW.  Were
    Aztec and Lattice run in 32-bit-int mode for the above code?
    Probably, but just wondering...).

    As far as the inner loop goes, I'm kind of proud of DICE in that it
    does a pretty good job without any real optimization at all.  Lattice
    and Aztec could actually get more speed out of their code if they did
    not use CLR: on the 68000, CLR always reads the destination before
    writing the 0.

    PDC looks like it needs a lot of work.

>Andrew.
>--
>Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

					-Matt

--

    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA