Path: utzoo!attcan!uunet!munnari.oz.au!mel.dit.csiro.au!yarra!pta!teti!teslab!andrew
From: andrew@teslab.lab.OZ (Andrew Phillips)
Newsgroups: comp.sys.amiga.tech
Subject: C compilers code generation
Summary: Lattice/Aztec code generation still needs improvement
Keywords: lattice aztec code generation optimization
Message-ID: <1149@teslab.lab.OZ>
Date: 8 Nov 90 06:59:05 GMT
Reply-To: andrew@teslab.lab.oz.au (Andrew Phillips)
Organization: Technology Evaluation Section, L.A.B., Sydney
Lines: 66

Over the years I have been intrigued by the code generated by
different C compilers, and have been comparing Lattice C code with
Aztec C.  From the first it always seemed that Lattice performed more
optimizations but that Aztec did better simply because of better code
generation.  Nowadays, they seem to be much closer, producing
reasonable code with simple optimizations - but there is a lot of
room for improvement.

Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02
and PDC 3.34 using several benchmarks.  On disassembling the
innermost loop of the sieve of Eratosthenes I found that the four
compilers had generated the code shown below.  The C code for this
loop was:

    register short i, k;
    ...
    for (k = i + i; k <= 8190; k += i)
        flags[k] = 0;

In the assembler code below the first part is the loop initialization
(k = i + i) and the names I and K represent the data registers
corresponding to the variables i and k.  Interestingly, Lattice and
Aztec took the same time in the benchmark and generated the same code
for this loop (with all optimizations on).

      LATTICE/AZTEC               DICE                        PDC

      MOVE.W  I,K                 MOVE.W  I,D0                EXT.L   I
      ADD.W   I,K                 EXT.L   D0                  EXT.L   I
                                  MOVE.W  K,D1                MOVE.L  I,D0
                                  EXT.L   D1                  ADD.L   I,D0
                                  ADD.L   D0,D1               MOVE.L  D0,K
                                  MOVE.W  D1,D3
      BRA.B   IN                  BRA.B   IN
LOOP  LEA     f(A4),A0      LOOP  LEA     f(A4),A0      LOOP  CMPI.L  #8190,K
      CLR.B   0(A0,K.W)           ADDA.W  K,A0                BGT.B   OUT
                                  MOVE.B  #0,(A0)             LEA     f(A4),A0
      ADD.W   I,K                 ADD.W   I,K                 ADDA.L  K,A0
IN    CMPI.W  #8190,K       IN    CMPI.W  #8190,K             CLR.B   (A0)
      BLE.B   LOOP                BLE.B   LOOP                ADD.L   I,K
                                                              BRA.B   LOOP
                                                        OUT   ...

I calculated the total 68000 clock cycles for the inner loop
(excluding initialization) to be: Lattice 48, Aztec 48, DICE 50 and
PDC 64.  These correspond roughly to the ratios of run times that I
got when timing the whole program.

Even with all optimizations on, both Lattice and Aztec left the first
instruction of the loop (the LEA) inside the loop despite the fact
that it is "loop invariant".  They also seem to make poor use of the
available registers.

It is interesting to note that PDC appears to treat shorts as 32-bit
quantities, like ints and longs.  It also seems that BOTH of the
lines with "EXT.L I" are redundant as I is already 32 bits.

So I think Lattice and Aztec still have work to do.  I hope someone
finds this of interest.

Andrew.
-- 
Andrew Phillips (andrew@teslab.lab.oz.au)
Phone +61 (Aust) 2 (Sydney) 289 8712
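
For anyone who wants to experiment with the loop-invariant address
load mentioned above, here is a minimal sketch of hoisting it by hand
in the C source.  The function names, the SIZE macro and the global
flags array are my own assumptions for illustration, and i is assumed
to be positive.  Whether any of the four compilers actually generates
better code from the second form has not been tested here.

```c
#include <assert.h>
#include <string.h>

#define SIZE 8190                  /* loop bound from the post */

static char flags[SIZE + 1];       /* assumed layout of the flags array */

/* The inner loop as written in the post (i assumed > 0). */
static void clear_multiples(short i)
{
    register short k;

    for (k = i + i; k <= SIZE; k += i)
        flags[k] = 0;
}

/* Hand-hoisted variant: the address of flags is formed once, before
   the loop, and a pointer is stepped by i on each pass.  The step
   count is computed up front so the pointer never leaves the array. */
static void clear_multiples_hoisted(short i)
{
    register char *p;
    register int n;

    if (i <= 0 || i + i > SIZE)
        return;                    /* nothing to clear */
    p = flags + i + i;             /* address of flags[2*i] */
    n = (SIZE - (i + i)) / i;      /* further steps after the first */
    *p = 0;
    while (n-- > 0) {
        p += i;
        *p = 0;
    }
}

/* Returns 1 if both variants clear exactly the same elements. */
static int same_result(short i)
{
    static char expected[SIZE + 1];

    memset(flags, 1, sizeof flags);
    clear_multiples(i);
    memcpy(expected, flags, sizeof flags);

    memset(flags, 1, sizeof flags);
    clear_multiples_hoisted(i);
    return memcmp(expected, flags, sizeof flags) == 0;
}

/* Compares the two variants for every step size up to SIZE. */
static void self_check(void)
{
    short i;

    for (i = 1; i <= SIZE; i++)
        assert(same_result(i));
}
```

Calling self_check() compares the two variants for every step size;
to see the effect on code generation, compile each function with the
compiler of interest and disassemble the loop.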