Path: utzoo!attcan!uunet!munnari.oz.au!mel.dit.csiro.au!yarra!pta!teti!teslab!andrew
From: andrew@teslab.lab.OZ (Andrew Phillips)
Newsgroups: comp.sys.amiga.tech
Subject: C compilers code generation
Summary: Lattice/Aztec code generation still needs improvement
Keywords: lattice aztec code generation optimization
Message-ID: <1149@teslab.lab.OZ>
Date: 8 Nov 90 06:59:05 GMT
Reply-To: andrew@teslab.lab.oz.au (Andrew Phillips)
Organization: Technology Evaluation Section, L.A.B., Sydney
Lines: 66

Over the years I have been intrigued by the code generated by
different C compilers, and have been comparing Lattice C code with
Aztec C.  From the start it seemed that Lattice performed more
optimizations but that Aztec did better simply because of better code
generation.  Nowadays they seem to be much closer, both producing
reasonable code with simple optimizations, but there is still a lot of
room for improvement.

Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02
and PDC 3.34 using several benchmarks.  On disassembling the
innermost loop of the sieve of Eratosthenes I found that the four
compilers had generated the code shown below.

The C code for this loop was:

    register short i, k;
    ...

        for (k = i + i; k <= 8190; k += i)
            flags[k] = 0;
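
For anyone wanting to reproduce the test, here is a minimal sketch of a
sieve built around that loop.  Only the inner loop comes from the post;
the outer loop, the SIZE macro, the flags declaration and the function
name count_primes are assumptions made for illustration, and the real
benchmark may well differ (for example, it may repeat the sieve several
times for timing purposes).

```c
#define SIZE 8190                /* upper bound used in the posted loop */

static char flags[SIZE + 1];     /* assumed declaration; not shown in the post */

/* Hypothetical wrapper: returns the number of primes in 2..SIZE. */
static int count_primes(void)
{
    register short i, k;
    int count = 0;

    for (i = 0; i <= SIZE; i++)  /* start with every entry marked */
        flags[i] = 1;
    flags[0] = flags[1] = 0;     /* 0 and 1 are not prime */

    for (i = 2; i <= SIZE; i++) {
        if (!flags[i])
            continue;            /* already struck out as a multiple */
        count++;
        /* inner loop exactly as posted: strike out the multiples of i */
        for (k = i + i; k <= SIZE; k += i)
            flags[k] = 0;
    }
    return count;
}
```

With i and k both at most 8190, i + i and k + i stay below 32767, so
the 16-bit shorts do not overflow.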

In the assembler code below, the first part is the loop initialization
(k = i + i), and the names I and K represent the data registers
corresponding to the variables i and k.  Interestingly, Lattice and
Aztec took the same time in the benchmark and generated the same code
for this loop (with all optimizations on).

     LATTICE/AZTEC           DICE                    PDC

     MOVE.W  I,K             MOVE.W  I,D0            EXT.L   I
     ADD.W   I,K             EXT.L   D0              EXT.L   I
                             MOVE.W  K,D1            MOVE.L  I,D0
                             EXT.L   D1              ADD.L   I,D0
                             ADD.L   D0,D1           MOVE.L  D0,K
                             MOVE.W  D1,D3
     BRA.B   IN              BRA.B   IN

LOOP LEA     f(A4),A0   LOOP LEA     f(A4),A0   LOOP CMPI.L  #8190,K
     CLR.B   0(A0,K.W)       ADDA.W  K,A0            BGT.B   OUT
                             MOVE.B  #0,(A0)         LEA     f(A4),A0
     ADD.W   I,K             ADD.W   I,K             ADDA.L  K,A0
IN   CMPI.W  #8190,K    IN   CMPI.W  #8190,K         CLR.B   (A0)
     BLE.B   LOOP            BLE.B   LOOP            ADD.L   I,K
                                                     BRA.B   LOOP
                                                OUT  ...

I calculated the total 68000 clock cycles for the inner loop
(excluding initialization) to be: Lattice 48, Aztec 48, DICE 50 and
PDC 64.  These correspond roughly to the ratios of run times that I
got when timing the whole program.
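
The totals for the first two columns can be rebuilt with a short
tally.  The per-instruction cycle counts below are assumptions taken
from the standard MC68000 instruction timing tables (no wait states,
loop branch taken); they are not the working behind the figures above,
and the PDC column is left out.  The struct and function names are made
up for this sketch.

```c
struct insn {
    const char *text;   /* instruction as it appears in the listing */
    int cycles;         /* assumed 68000 clock cycles */
};

static const struct insn lattice_aztec[] = {
    { "LEA     f(A4),A0",   8 },
    { "CLR.B   0(A0,K.W)", 18 },   /* 8 for CLR to memory + 10 for indexed EA */
    { "ADD.W   I,K",        4 },
    { "CMPI.W  #8190,K",    8 },
    { "BLE.B   LOOP",      10 },   /* branch taken */
};

static const struct insn dice[] = {
    { "LEA     f(A4),A0",   8 },
    { "ADDA.W  K,A0",       8 },
    { "MOVE.B  #0,(A0)",   12 },
    { "ADD.W   I,K",        4 },
    { "CMPI.W  #8190,K",    8 },
    { "BLE.B   LOOP",      10 },   /* branch taken */
};

/* Sum the assumed cycle counts for one pass through the loop body. */
static int total_cycles(const struct insn *seq, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++)
        sum += seq[i].cycles;
    return sum;
}
```

Under those assumed timings, total_cycles(lattice_aztec, 5) gives 48
and total_cycles(dice, 6) gives 50.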

Even with all optimizations on, both Lattice and Aztec left the first
instruction of the loop (LEA f(A4),A0) inside the loop despite the
fact that it is "loop invariant".  They also seem to make poor use of
the available registers.
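
At the C level, moving the invariant address calculation out of the
loop by hand would look something like the sketch below.  It assumes
flags is a global char array of 8191 entries; the function names and
the pointer name base are hypothetical.  Whether any of the four
compilers then keeps base in an address register is untested here.

```c
#define SIZE 8190

static char flags[SIZE + 1];     /* assumed declaration; not shown in the post */

/* The loop as posted: the array base may be reloaded on every pass. */
static void clear_multiples(short i)
{
    register short k;

    for (k = i + i; k <= SIZE; k += i)
        flags[k] = 0;
}

/* Hand-hoisted variant: the base address is taken once, before the loop. */
static void clear_multiples_hoisted(short i)
{
    register char *base = flags;  /* the loop-invariant part */
    register short k;

    for (k = i + i; k <= SIZE; k += i)
        base[k] = 0;
}
```

Both functions clear the same entries for any i from 1 to 8190; the
only difference is where the address of flags is computed.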

It is interesting to note that PDC appears to treat shorts as 32-bit
quantities, like ints and longs.  It also seems that BOTH of the
lines with "EXT.L I" are redundant, as I is already 32 bits.

So I think Lattice and Aztec still have work to do.  I hope someone
finds this of interest.

Andrew.
-- 
Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712