Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!julius.cs.uiuc.edu!rpi!crdgw1!uunet!overload!dillon
From: dillon@overload.Berkeley.CA.US (Matthew Dillon)
Newsgroups: comp.sys.amiga.tech
Subject: Re: C compilers code generation
Message-ID:
Date: 12 Nov 90 22:13:10 GMT
References: <1149@teslab.lab.OZ>
Lines: 67

>In article <1149@teslab.lab.OZ> andrew@teslab.lab.OZ (Andrew Phillips) writes:
>Over the years I have been intrigued by the code generated by
>different C compilers, and have been comparing Lattice C code with
>Aztec C. From the first it always seemed that Lattice performed more
>optimizations but that Aztec did better simply because of better code
>generation. Nowadays, they seem to be much closer, producing
>reasonable code with simple optimizations - but there is a lot of
>room for improvement.
>
>Recently I have been comparing Lattice C 5.04, Aztec C 5.0, DICE 2.02
>and PDC 3.34 using several benchmarks. On disassembling the
> ...
>for this loop (with all optimizations on).
>
>       LATTICE/AZTEC             DICE                      PDC
>
>       MOVE.W  I,K               MOVE.W  I,D0              EXT.L   I
>       ADD.W   I,K               EXT.L   D0                EXT.L   I
>                                 MOVE.W  K,D1              MOVE.L  I,D0
>                                 EXT.L   D1                ADD.L   I,D0
>                                 ADD.L   D0,D1             MOVE.L  D0,K
>                                 MOVE.W  D1,D3
>       BRA.B   IN                BRA.B   IN
>
> LOOP  LEA     f(A4),A0    LOOP  LEA     f(A4),A0    LOOP  CMPI.L  #8190,K
>       CLR.B   0(A0,K.W)         ADDA.W  K,A0              BGT.B   OUT
>                                 MOVE.B  #0,(A0)           LEA     f(A4),A0
>       ADD.W   I,K               ADD.W   I,K               ADDA.L  K,A0
> IN    CMPI.W  #8190,K     IN    CMPI.W  #8190,K           CLR.B   (A0)
>       BLE.B   LOOP              BLE.B   LOOP              ADD.L   I,K
>                                                           BRA.B   LOOP
>                                                     OUT   ...
>
>I calculated the total 68000 clock cycles for the inner loop
>(excluding initialization) to be: Lattice 48, Aztec 48, DICE 50 and

Neat! BTW, DICE now optimizes short adds when the result is also a
short; the initialization part of the loop generates:

    move.w  D0,D1
    add.w   D0,D1
    bra     IN

It also optimizes other arithmetic and logical operations that act
entirely on shorts. (DICE is a 32bit-int compiler only, BTW. In the
above code, were Aztec and Lattice run in 32bit-int mode? Probably,
but just wondering...)
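The short-add optimization can be checked with a little C. A 32-bit-int compiler that follows the promotion rules literally sign-extends both shorts (MOVE.W, EXT.L), adds all 32 bits (ADD.L) and stores only the low word back (MOVE.W), much like the older DICE sequence quoted above. Since the low 16 bits of a sum depend only on the low 16 bits of its operands, a single ADD.W gives the same result. This is a minimal sketch under that reading; the function names are hypothetical, and unsigned conversions are used so the C itself stays well defined.

```c
#include <stdint.h>

/* Literal 32-bit-int code for "short k = i + j": sign-extend both
   operands (MOVE.W + EXT.L), add all 32 bits (ADD.L), then store only
   the low word back into the short (MOVE.W). */
uint16_t add_extended(int16_t a, int16_t b)
{
    int32_t wa = a;                             /* EXT.L */
    int32_t wb = b;                             /* EXT.L */
    uint32_t sum = (uint32_t)wa + (uint32_t)wb; /* ADD.L, wraps mod 2^32 */
    return (uint16_t)(sum & 0xFFFFu);           /* MOVE.W keeps low word */
}

/* Short-only code: add just the low words, as a single ADD.W would.
   The high words produced by EXT.L never reach the stored result. */
uint16_t add_word(int16_t a, int16_t b)
{
    uint32_t lo = (uint32_t)(uint16_t)a + (uint32_t)(uint16_t)b;
    return (uint16_t)(lo & 0xFFFFu);
}

/* Compare both forms for every 16-bit a against 256 evenly spaced b
   values (-32768 .. 32767 in steps of 257); returns the count of
   operand pairs where the two results differ. */
long count_mismatches(void)
{
    long mismatches = 0;
    for (int32_t a = -32768; a <= 32767; a++) {
        for (int32_t b = -32768; b <= 32767; b += 257) {
            if (add_extended((int16_t)a, (int16_t)b) !=
                add_word((int16_t)a, (int16_t)b))
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_mismatches() from a driver should report zero differences. The same low-bits argument covers subtraction, multiplication and the bitwise operations, but not division or right shifts, where the high word does affect the low result.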
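The quoted inner-loop figures for Lattice/Aztec and DICE can be reproduced by adding up per-instruction times from the Motorola MC68000 timing tables. This sketch assumes I and K are held in data registers (the listings only name them symbolically) and that BLE.B is taken on every pass; the struct, array and function names are hypothetical. PDC's figure is cut off in the quote, so it is left out here.

```c
#include <stddef.h>

/* One inner-loop instruction and its 68000 cycle count, assuming I and K
   are data registers and the closing short branch is taken. */
struct insn {
    const char *text;
    int cycles;
};

static const struct insn lattice_loop[] = {
    { "LEA     f(A4),A0",   8 },  /* LEA (d16,An) */
    { "CLR.B   0(A0,K.W)", 18 },  /* 8 for CLR.B mem + 10 for (d8,An,Xn) */
    { "ADD.W   I,K",        4 },  /* Dn,Dn */
    { "CMPI.W  #8190,K",    8 },  /* immediate word compare with Dn */
    { "BLE.B   LOOP",      10 },  /* short branch, taken */
};

static const struct insn dice_loop[] = {
    { "LEA     f(A4),A0",   8 },  /* LEA (d16,An) */
    { "ADDA.W  K,A0",       8 },  /* Dn,An */
    { "MOVE.B  #0,(A0)",   12 },  /* immediate byte to (An) */
    { "ADD.W   I,K",        4 },  /* Dn,Dn */
    { "CMPI.W  #8190,K",    8 },  /* immediate word compare with Dn */
    { "BLE.B   LOOP",      10 },  /* short branch, taken */
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Sum the cycle counts of one pass through a loop body. */
int loop_cycles(const struct insn *loop, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += loop[i].cycles;
    return total;
}
```

With these counts, loop_cycles(lattice_loop, COUNT(lattice_loop)) comes to 48 and loop_cycles(dice_loop, COUNT(dice_loop)) to 50. The 2-cycle gap is DICE's ADDA.W plus MOVE.B #0,(A0) pair (20 cycles) against the single indexed CLR.B (18 cycles).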
As far as the inner loop goes, I'm kind of proud of DICE in that it
does a pretty good job without any real optimization at all. Lattice
and Aztec could actually get more speed out of their code if they did
not use CLR: on the 68000, CLR always reads the location before
writing a 0. PDC looks like it needs a lot of work.

>Andrew.
>--
>Andrew Phillips (andrew@teslab.lab.oz.au) Phone +61 (Aust) 2 (Sydney) 289 8712

                                        -Matt

--

    Matthew Dillon          dillon@Overload.Berkeley.CA.US
    891 Regal Rd.           uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA