Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!ames!ll-xn!cit-vax!newton
From: newton@cit-vax.Caltech.Edu (Mike Newton)
Newsgroups: comp.sys.mac
Subject: Mac C Compilers, Benchmarks, Stupidity
Message-ID: <3560@cit-vax.Caltech.Edu>
Date: Tue, 11-Aug-87 06:08:32 EDT
Article-I.D.: cit-vax.3560
Posted: Tue Aug 11 06:08:32 1987
Date-Received: Wed, 12-Aug-87 07:17:25 EDT
Reply-To: newton@cit-vax.UUCP (Mike Newton)
Distribution: world
Organization: California Institute of Technology
Lines: 294

Hi --

This (rather long) message will hopefully save a fair number of people
some money when buying compilers. It is also a rather strong flame
against current Mac compilers. I suspect this is largely a result of the
marketplace. With 10 times as many customers buying %*&*&&^%& 8xxx86
systems, a lot more time and effort goes into producing more competitive
80(*&^*&^%$^%)86 compilers.

SUMMARY (for those that don't want to read the whole message):

If you're going to get A/UX for your Mac II, DON'T buy a compiler ...
get a copy of Gnu CC and send a contribution. For those planning to run
native Mac OS, bitch to Apple and others about the state of the
compilers. You are wasting your machine.....

(BTW: I have heard that Apple distributes a C compiler with A/UX, but
the compiler that they used for all of their work (the Green Hills
compiler) costs much extra.)

Probably like a lot of Mac II buyers, when I saw the latest issue of
Byte, I was very disappointed. The article causing this disappointment
was the one comparing the Mac II vs. the 80386-based PS/2 Model 80.
First I was disappointed in the article -- I could not tell which
compilers were being used (I may have just not read the article
carefully). From a lot of experience programming the 8086 and the 68020,
I was shocked. The 68020 __should__ be a faster system, and like a lot
of these tests, this seemed to be more of a comparison of compilers than
machines.
(I can provide a couple of references (some good, some bad) on this. One
of them is an IEEE article.)

My first reaction was that there might be some mistake. So, I ported the
Dhrystone benchmark from Unix to the Mac (ie: I changed the calls to the
timer routines and nothing else), and compiled it under MPW C (the Green
Hills Compiler). At least the MPW compiler produced better code than the
compiler used in the Byte article. The benchmark clocked at 2777
Dhrystones (faster than the Sun-3/52 with the Sun 3.2 cc -O!). However,
this was still 10-20 percent slower than the PS/2 Model 80.

I couldn't believe this, so I went and disassembled the compiled
code....

---> NO WONDER THE PS/2 GETS BETTER TIMINGS THAN THE MAC II. <---
---> THE COMPILER PRODUCES SHITTY CODE <---

(and it is the best one currently available. I hate to see what the
others are like!!!!!!!!!!!!!!)

Before I go on, some disclaimers, comments ...:

[a] I am a compiler writer myself. I'm currently working on a peephole
    optimization paper, and have written the code generator and run time
    system for the fastest running version of Prolog.
[b] I know some of the Green Hills people, and am not particularly fond
    of them. I pick on their compiler, but currently their compiler
    produces the best code of any.
[c] I'm thinking of writing my own C compiler for the Mac someday.
    Unlikely to ever occur, but . . .
[d] I DO plan on writing some optimizers.
[e] This code was compiled on release 2.0B (I think)
[f] I FUCKING HATE IT WHEN THE COMPILERS WON'T GIVE YOU ASSEMBLY, BUT
    INSTEAD GO STRAIGHT TO OBJECT CODE. (MPW may have an option to do
    this, but it was NOT listed in the documentation that I had access
    to.)
[g] This message was done after a long day. There is easily the
    possibility that one or two of my samples below are wrong. However,
    that still leaves MANY!
[h] I'll send the full disassembled code to anyone that asks and that I
    can get my mailer to send to...
[i] ALL OF THE SAMPLES SHOWN BELOW COULD BE DETECTED BY A PEEPHOLE
    OPTIMIZER. MORE GLOBAL THINGS ARE HARD TO POINT OUT AND PRODUCE EVEN
    MORE DRAMATIC EFFECTS ON CODE SPEED IF DONE RIGHT.
[j] It's far easier to point out problems with other people's compilers
    than to actually write one yourself.
[k] I haven't included the fact that 68020 code was not being produced.
[l] I hate 8086s. I programmed them for a year.

So, using tests done on Suns and Macs, I concluded that Gnu CC produced
much better code than Green Hills (or LSC or any of the other current
Mac compilers), and that it was also better than the Sun 'cc'.

In particular, it really seems as if there was no peephole optimizer
when the following instruction is generated (the condition codes it sets
were NOT used...):

528: 1000             MOVE.B  D0,D0           ; <<--- STUPID !!!!!!!!!!!!!!!!

Anyway, the 'appendix' contains the gory details for anyone that wants
proof.

- mike

ps: At one of the places that I consult, I had a chance to look at the
Clipper code produced by another version of their compiler. It showed
MANY of the same problems. Considering the price of GH compilers, if I
were Apple or Fairchild, I'd feel a little cheated.

Now, some examples from a disassembled copy of the Dhrystone program

;;; _proc0:
        . . .
01C: 4EBA 0572        JSR     *+$0574         ; 00590 ; ReadDateTime()
020: 301F             MOVE.W  (A7)+,D0        ; <-- STUPID (see comments 10 lines below)
022: 48C0             EXT.L   D0              ; <-- STUPID (see comments 10 lines below)
024: 7A00             MOVEQ   #$00,D5         ; i = 0 in LOOP
026: 6002             BRA.S   *+$4            ; 2A ; <-- STUPID, Branch 1 more
028: 5285             ADDQ.L  #$1,D5          ; i < 50000
02A: 0C85 0000 C350   CMPI.L  #$0000C350,D5
030: 6500 FFF6        BCS     *-$0008         ; 00028 ; no, -- loop
034: 558F             SUBQ.L  #$2,A7
036: 486E FFEC        PEA     $FFEC(A6)
03A: 4EBA 0554        JSR     *+$0556         ; 00590 ; ReadDateTime
03E: 301F             MOVE.W  (A7)+,D0        ; <-- STUPID Since we don't look at the
040: 48C0             EXT.L   D0              ; <-- STUPID return value, why do this
042: 202E FFE8        MOVE.L  $FFE8(A6),D0    ; when we are going to overwrite IT!!!!
046: 91AE FFEC        SUB.L   D0,$FFEC(A6)    ; nulltime = nulltime - starttime
04A: 4878 002A        PEA     $002A
04E: 4EBA 0ACA        JSR     *+$0ACC         ; 00B1A ; malloc
052: 2B40 F68C        MOVE.L  D0,$F68C(A5)
056: 4878 002A        PEA     $002A
05A: 4EBA 0ABE        JSR     *+$0AC0         ; 00B1A ; malloc
05E: 2B40 F688        MOVE.L  D0,$F688(A5)    ; PtrGlb = (RecordPtr)malloc(...)
062: 206D F688        MOVEA.L $F688(A5),A0    ; <-- STUPID just move D0 to A0
        . . .
08E: 4868 000A        PEA     $000A(A0)
092: 4EBA 0D32        JSR     *+$0D34         ; 00DC6 ; <-- STRCPY is a proc call
        . . .
0AE: 4EBA 04E0        JSR     *+$04E2         ; 00590 ; ReadDateTime
0B2: 301F             MOVE.W  (A7)+,D0        ; <-- STUPID (see above)
0B4: 48C0             EXT.L   D0              ; <-- STUPID
        . . .
0D0: 2D48 FFFC        MOVE.L  A0,$FFFC(A6)
0D4: 4FEF 0018        LEA     $0018(A7),A7
0D8: 6000 0130        BRA     *+$0132         ; 0020A ; <-- STUPID (branch to end
                                              ; of loop, even though the
                                              ; compiler can detect that it need not)
0DC: 4EBA 0294        JSR     *+$0296         ; 00372 ; Proc5()
0E0: 4EBA 0278        JSR     *+$027A         ; 0035A ; Proc4()
0E4: 7402             MOVEQ   #$02,D2         ; IntLoc1 = 2;
0E6: 2D42 FFE0        MOVE.L  D2,$FFE0(A6)
1CE: 2D42 FFE4        MOVE.L  D2,$FFE4(A6)
        . . .
1D2: 222E FFE0        MOVE.L  $FFE0(A6),D1    ; This only affects a register so this:
1D6: 202E FFE4        MOVE.L  $FFE4(A6),D0    ; <-- STUPID (!) since we KNOW it is in D2
1DA: 4EBA 078E        JSR     *+$0790         ; 0096A
1DE: 2D40 FFF6        MOVE.L  D0,$FFF6(A6)
1E2: 242E FFE4        MOVE.L  $FFE4(A6),D2    ; <-- STUPID (!) (see above)
1E6: 94AE FFF6        SUB.L   $FFF6(A6),D2
        . . .

;;; _proc1
26E: 2F0A             MOVE.L  A2,-(A7)
270: 246F 0008        MOVEA.L $0008(A7),A2    ; structassign(NextRec,*PtrGlb)
274: 2052             MOVEA.L (A2),A0
276: 226D F688        MOVEA.L $F688(A5),A1
27A: 7014             MOVEQ   #$14,D0         ; 21 word moves = 42 bytes. This should be 9
27C: 30D9             MOVE.W  (A1)+,(A0)+     ; <-- STUPID (plus one trailing MOVE.W) so
27E: 51C8 FFFC        DBF     D0,*-$0002      ; 0027C ; that this could be 32bit moves
282: 7005             MOVEQ   #$05,D0
284: 2540 0006        MOVE.L  D0,$0006(A2)
288: 2052             MOVEA.L (A2),A0
28A: 216A 0006 0006   MOVE.L  $0006(A2),$0006(A0) ; <-- STUPID (!) previous stmt can't
                                              ; affect memory, so: move.l d0,$6(a0) !!
290: 2052             MOVEA.L (A2),A0         ; NextRecord.PtrComp = PtrParIn->PtrComp
292: 2092             MOVE.L  (A2),(A0)
                                              ; A good compiler (but NOT a peephole analyser)
                                              ; could get rid of the next line!!!!!!!!!
294: 2052             MOVEA.L (A2),A0         ; Proc3(NextRecord.PtrComp);
296: 2F10             MOVE.L  (A0),-(A7)
298: 4EBA 008E        JSR     *+$0090         ; 00328
        . . .
2E4: 204A             MOVEA.L A2,A0
2E6: 7014             MOVEQ   #$14,D0
2E8: 30D9             MOVE.W  (A1)+,(A0)+     ; This could have been 32 bit moves!!
2EA: 51C8 FFFC        DBF     D0,*-$0002      ; 002E8
2EE: 245F             MOVEA.L (A7)+,A2
2F0: 4E75             RTS
        . . .

;;; _Proc7
3E6: 202F 0004        MOVE.L  $0004(A7),D0
3EA: 222F 0008        MOVE.L  $0008(A7),D1    ; <-- STUPID
3EE: 206F 000C        MOVEA.L $000C(A7),A0    ; <-- STUPID
3F2: 5480             ADDQ.L  #$2,D0
3F4: D081             ADD.L   D1,D0           ; <-- ADD.L $08(A7),D0
3F6: 2080             MOVE.L  D0,(A0)         ; <-- MOVE.L D0,$0C(A7)
3F8: 4E75             RTS

;;; Proc8()
40A: 2A00             MOVE.L  D0,D5           ; this sequence
40C: 5A85             ADDQ.L  #$5,D5          ; is:
40E: 2005             MOVE.L  D5,D0           ; <-- STUPID STUPID STUPID (see 2 lines above)
410: E580             ASL.L   #$2,D0
412: 2040             MOVEA.L D0,A0
414: D1C3             ADDA.L  D3,A0
416: 20AF 0024        MOVE.L  $0024(A7),(A0)
41A: 2005             MOVE.L  D5,D0           ; <-- see this
41C: 5280             ADDQ.L  #$1,D0          ; <--
41E: E580             ASL.L   #$2,D0          ; <--
420: 2040             MOVEA.L D0,A0
422: D1C3             ADDA.L  D3,A0
424: 2005             MOVE.L  D5,D0           ; <-- STUPID
426: E580             ASL.L   #$2,D0          ; <-- COMMON SUBEXPRESSION ELIM
428: 2240             MOVEA.L D0,A1
42A: D3C3             ADDA.L  D3,A1
42C: 2091             MOVE.L  (A1),(A0)
42E: 2005             MOVE.L  D5,D0
430: 721E             MOVEQ   #$1E,D1         ; <-- STUPID -- just add it to D0 as . . .
432: D081             ADD.L   D1,D0
434: E580             ASL.L   #$2,D0
436: 2040             MOVEA.L D0,A0
438: D1C3             ADDA.L  D3,A0
43A: 2085             MOVE.L  D5,(A0)
43C: 2C05             MOVE.L  D5,D6
43E: 2005             MOVE.L  D5,D0
440: 2200             MOVE.L  D0,D1           ; . . .this destroys D1 before it is used again
442: 2401             MOVE.L  D1,D2
444: C0FC 00CC        MULU.W  #$00CC,D0
        . . .
4A4: 2401             MOVE.L  D1,D2           ; <-- See this??
4A6: C0FC 00CC        MULU.W  #$00CC,D0
4AA: 4841             SWAP    D1
4AC: C2FC 00CC        MULU.W  #$00CC,D1
4B0: 7400             MOVEQ   #$00,D2         ; <-- STUPID LINE ABOVE !! WHY????
4B2: D481             ADD.L   D1,D2           ; <-- STUPID
4B4: 4842             SWAP    D2
        . . .
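
For readers puzzling over the MULU.W / SWAP sequence at 4A4-4B4: MULU.W
only multiplies 16 bits by 16 bits, so a 32-bit index times the constant
$CC has to be built from two of them. The C below is a rough sketch of
what that sequence appears to be computing. The function name mul32x16
is made up for illustration, and since the listing is cut off after
4B4, the final combine step is an assumption about how the elided
instructions finish the job.

```c
#include <stdint.h>

/* Hypothetical helper (not from the Dhrystone source or from MPW):
 * a 32-bit by 16-bit unsigned multiply, truncated to 32 bits, built
 * from two 16x16 multiplies the way the 68000 has to do it. */
uint32_t mul32x16(uint32_t x, uint16_t c)
{
    /* MULU.W #c,D0 : low word of x times c, full 32-bit product */
    uint32_t lo = (x & 0xFFFFu) * (uint32_t)c;

    /* SWAP D1 ; MULU.W #c,D1 : high word of x times c */
    uint32_t hi = (x >> 16) * (uint32_t)c;

    /* Assumed tail: move the high partial product up 16 bits and add.
     * Only its low 16 bits survive the shift, which matches a result
     * truncated to 32 bits. */
    return lo + (hi << 16);
}
```

Seen this way, the copy into D2 at 4A4 is dead: D2 is cleared at 4B0
before anything reads it, which is the point of the "See this??" note.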

;;; Func1
4D8: 102F 0007        MOVE.B  $0007(A7),D0
4DC: 122F 000B        MOVE.B  $000B(A7),D1
4E0: B001             CMP.B   D1,D0
4E2: 6704             BEQ.S   *+$0006         ; 004E8
4E4: 4201             CLR.B   D1
4E6: 6002             BRA.S   *+$0004         ; 004EA ; <-- STUPID -- just put
                                              ; instructions here
4E8: 7201             MOVEQ   #$01,D1         ; Oh no....
4EA: 7000             MOVEQ   #$00,D0         ; ....
4EC: 1001             MOVE.B  D1,D0           ; <-- STUPID (THIS IS RIDICULOUS!)
4EE: 4E75             RTS
        . . .

;;; _Func2()
524: 4EBA FFB2        JSR     *-$004C         ; 004D8
528: 1000             MOVE.B  D0,D0           ; <-- STUPID !!!!!!!!!!!!!!!!
52A: 508F             ADDQ.L  #$8,A7
        . . .

;;; Func3
57E: 102F 0007        MOVE.B  $0007(A7),D0
582: 0C00 0002        CMPI.B  #$02,D0
586: 6604             BNE.S   *+$0006         ; 0058C
588: 7001             MOVEQ   #$01,D0
58A: 6002             BRA.S   *+$0004         ; 0058E ; <-- STUPID Branching uncond. to a RTS??
58C: 7000             MOVEQ   #$00,D0
58E: 4E75             RTS
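
Disclaimer [i] says samples like these could be caught by a peephole
optimizer. As a rough illustration only, here is a toy pass in C that
handles two of the cases shown: the self-move at 528 (when the condition
codes it sets are never read) and the unconditional branch to an RTS at
58A. The Insn record, its fields, and peephole() are all invented for
this sketch and do not come from any real compiler. The flags_used field
is assumed to be filled in by some earlier liveness analysis that the
sketch does not perform.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction kinds: just enough to express the two patterns. */
typedef enum { OP_NOP, OP_MOVE, OP_BRA, OP_RTS, OP_OTHER } Op;

typedef struct {
    Op     op;
    int    src, dst;    /* register numbers, meaningful for OP_MOVE     */
    size_t target;      /* index of the branch target, for OP_BRA       */
    int    flags_used;  /* nonzero if the condition codes set by this   */
                        /* instruction are read later (assumed input)   */
} Insn;

/* Rewrites code[] in place and returns how many instructions changed.
 * Instructions are turned into OP_NOP rather than deleted, so branch
 * target indices stay valid. */
size_t peephole(Insn *code, size_t n)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        Insn *in = &code[i];

        if (in->op == OP_MOVE && in->src == in->dst && !in->flags_used) {
            /* MOVE Dn,Dn with dead condition codes does nothing. */
            in->op = OP_NOP;
            changed++;
        } else if (in->op == OP_BRA) {
            assert(in->target < n);
            if (code[in->target].op == OP_RTS) {
                /* BRA to an RTS: return right here instead. */
                in->op = OP_RTS;
                changed++;
            }
        }
    }
    return changed;
}
```

Run over a model of Func3, the BRA.S at 58A would become an RTS; over a
model of _Func2, the MOVE.B D0,D0 at 528 would become a no-op, provided
the caller marked its flags as unused.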