Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!hao!ames!ll-xn!cit-vax!newton
From: newton@cit-vax.Caltech.Edu (Mike Newton)
Newsgroups: comp.sys.mac
Subject: Mac C Compilers, Benchmarks, Stupidity
Message-ID: <3560@cit-vax.Caltech.Edu>
Date: Tue, 11-Aug-87 06:08:32 EDT
Article-I.D.: cit-vax.3560
Posted: Tue Aug 11 06:08:32 1987
Date-Received: Wed, 12-Aug-87 07:17:25 EDT
Reply-To: newton@cit-vax.UUCP (Mike Newton)
Distribution: world
Organization: California Institute of Technology
Lines: 294


Hi --

This (rather long) message will hopefully save a fair number of people some
money when buying compilers.  It is also a rather strong flame against
current Mac compilers.  I suspect this is largely a result of the
market-place.  With 10 times as many customers buying %*&*&&^%& 8xxx86
systems, a lot more time and effort goes into producing more competitive
80(*&^*&^%$^%)86 compilers.

SUMMARY (for those that don't want to read the whole message): If you're
going to get A/UX for your Mac II, DON'T buy a compiler ... get a copy of
Gnu CC and send a contribution.  For those planning to run native Mac OS,
bitch to Apple and others about the state of the compilers.  You are
wasting your machine.....  (BTW: I have heard that Apple distributes
a C compiler with A/UX, but the compiler that they used for all of their
work (the Green Hills compiler) costs much extra.)

Probably like a lot of Mac II buyers, when I saw the latest issue of Byte,
I was very disappointed.  The article causing this disappointment was the
one comparing the Mac II vs. the 80386-based PS/2 Model 80.  First I was
disappointed in the article -- I could not tell which compilers were being
used (I may have just not read the article carefully).  From a lot of
experience programming the 8086 and the 68020, I was shocked.  The 68020
__should__ be a faster system, and like a lot of these tests, this seemed
to be more of a comparison of compilers than machines.  (I can provide a
couple of references (some good, some bad) on this.  One of them is an IEEE
article.)

My first reaction was that there might be some mistake.  So, I ported the
Dhrystone benchmark from Unix to the Mac (i.e., I changed the calls to the
timer routines and nothing else), and compiled it under MPW C (the Green
Hills Compiler).  At least the MPW compiler produced better code than the
compiler used in the Byte article.  The benchmark clocked at 2777
Dhrystones (faster than the Sun-3/52 with the Sun 3.2 cc -O!).
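
For anyone repeating the experiment: the only machine-specific part of such
a port is the timer.  Here is a minimal sketch in C of the idea, not the
actual Dhrystone source.  It assumes the standard clock() is available;
seconds_now, time_loop, per_second and sample_body are made-up names used
only for illustration.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical wrapper: the one routine to replace on each machine. */
double seconds_now(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Time `loops` calls of body() and subtract the cost of an empty loop
   of the same length, so that only the work itself is measured. */
double time_loop(long loops, void (*body)(void))
{
    volatile long i;    /* volatile keeps the empty loop from being removed */
    double start, nulltime, elapsed;

    assert(loops > 0);

    start = seconds_now();
    for (i = 0; i < loops; ++i)
        ;
    nulltime = seconds_now() - start;

    start = seconds_now();
    for (i = 0; i < loops; ++i)
        body();
    elapsed = seconds_now() - start;

    return elapsed - nulltime;
}

/* Iterations per second; returns 0 if the measured time is not positive. */
double per_second(long loops, double benchtime)
{
    return benchtime > 0.0 ? (double)loops / benchtime : 0.0;
}

/* Trivial stand-in for the benchmark body (illustration only). */
volatile long sink;
void sample_body(void)
{
    ++sink;
}
```

A call such as per_second(loops, time_loop(loops, sample_body)) gives the
rate.  With a coarse timer the loop count has to be large enough that the
measured time is many timer ticks.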

However, this was still 10-20 percent slower than the PS/2 Model 80.  I couldn't
believe this, so I went and disassembled the compiled code....

      ---> NO WONDER THE PS/2 GETS BETTER TIMINGS THAN THE MAC II. <---
               --->    THE COMPILER PRODUCES SHITTY CODE <---

(and it is the best one currently available.  I'd hate to see what the others
are like!!!!!!!!!!!!!!)

Before I go on, some disclaimers, comments ...:
	[a] I am a compiler writer myself.  I'm currently working on a
		peephole optimization paper, and have written the code
		generator and run time system  for the fastest running
		version of Prolog.
	[b] I know some of the Green Hills people, and am not particularly
		fond of them.  I pick on their compiler,  but currently
		their compiler produces the best code of any.
	[c] I'm thinking of writing my own C compiler for the Mac someday.
		Unlikely to ever occur,  but . . .
	[d] I DO plan on writing some optimizers.
	[e] This code was compiled on release 2.0B (I think)
	[f] I FUCKING HATE IT WHEN THE COMPILERS WON'T GIVE YOU ASSEMBLY, BUT
		INSTEAD GO STRAIGHT TO OBJECT CODE.  (MPW may have an option
		to do this, but it was NOT listed in the documentation that
		I had access to.).
	[g] This message was done after a long day.  There is easily the
		possibility that one or two of my samples below are wrong.
		However, that still leaves MANY!
	[h] I'll send the full disassembled code to anyone that asks and
		that I can get my mailer to send to...
	[i] ALL OF THE SAMPLES SHOWN BELOW COULD BE DETECTED BY A PEEPHOLE
		OPTIMIZER.  MORE GLOBAL THINGS ARE HARD TO POINT OUT AND
		PRODUCE EVEN MORE DRAMATIC EFFECTS ON CODE SPEED IF DONE RIGHT.
	[j] It's far easier to point out problems with other people's
		compilers than to actually write one yourself.
	[k] I haven't included the fact that 68020 code was not being produced.
	[l] I hate 8086s.  I programmed them for a year.


So, using tests done on Suns and Macs, I concluded that Gnu CC produced
much better code than Green Hills (or LSC or any of the other current Mac
compilers), and that it was also better than the Sun 'cc'.

In particular, it really seems as if there was no peephole optimizer when
the following instruction is generated: (the condition codes it sets were
NOT used...):

528: 1000 MOVE.B D0,D0 ; <<--- STUPID !!!!!!!!!!!!!!!!
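
To show how little it takes to catch this one, here is a toy peephole pass
in C.  It is only a sketch: the instruction record (struct insn) and the
flags_live field are invented for illustration, and a real pass would have
to work out from the following instructions whether the condition codes are
read.  It handles data registers only; a word-sized MOVEA of an address
register to itself sign-extends and is NOT a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified instruction record. */
enum opcode { OP_MOVE, OP_ADD, OP_JSR, OP_RTS };

struct insn {
    enum opcode op;
    int src;         /* source data register number, -1 if not a register */
    int dst;         /* destination data register number, -1 if none */
    int flags_live;  /* nonzero if a later instruction reads the
                        condition codes that this one sets */
};

/* Delete every "MOVE Dn,Dn" whose condition codes are never read.
   Compacts the array in place and returns the new length. */
size_t drop_self_moves(struct insn *code, size_t n)
{
    size_t in, out = 0;

    assert(code != NULL || n == 0);
    for (in = 0; in < n; ++in) {
        const struct insn *i = &code[in];
        int useless = i->op == OP_MOVE && i->src >= 0 &&
                      i->src == i->dst && !i->flags_live;

        if (!useless)
            code[out++] = *i;   /* keep everything else unchanged */
    }
    return out;
}
```

A self-move whose flags ARE read later (a cheap way to test a register) is
kept, which is why the flags_live check matters.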

Anyway, the 'appendix' contains the gory details for anyone who wants proof.


- mike


ps: At one of the places that I consult, I had a chance to look at the
Clipper code produced by another version of their compiler.  It showed MANY
of the same problems.  Considering the price of GH compilers, if I were
Apple or Fairchild, I'd feel a little cheated.


Now, some examples from a disassembled copy of the Dhrystone program:


;;; _proc0:

. . .
01C: 4EBA 0572       JSR      *+$0574   ; 00590  ; ReadDateTime()
020: 301F            MOVE.W   (A7)+,D0  ; <-- STUPID  (see comments 10 lines below)
022: 48C0            EXT.L    D0        ; <-- STUPID  (see comments 10 lines below)
024: 7A00            MOVEQ    #$00,D5   ; i = 0 in LOOP
026: 6002            BRA.S    *+$4   ; 2A  ; <-- STUPID, Branch 1 more
028: 5285            ADDQ.L   #$1,D5    ; ++i
02A: 0C85 0000 C350  CMPI.L   #$0000C350,D5  ; i < 50000 ($C350)
030: 6500 FFF6       BCS      *-$0008   ; 00028  ; yes -- loop
034: 558F            SUBQ.L   #$2,A7
036: 486E FFEC       PEA      $FFEC(A6)
03A: 4EBA 0554       JSR      *+$0556   ; 00590  ; ReadDateTime
03E: 301F            MOVE.W   (A7)+,D0  ; <-- STUPID   Since we don't look at the
040: 48C0            EXT.L    D0  ; <-- STUPID   return value, why do this
042: 202E FFE8       MOVE.L   $FFE8(A6),D0 ; when we are going to overwrite IT!!!!
046: 91AE FFEC       SUB.L    D0,$FFEC(A6)  ; nulltime = nulltime - starttime
04A: 4878 002A       PEA      $002A
04E: 4EBA 0ACA       JSR      *+$0ACC   ; 00B1A ; malloc
052: 2B40 F68C       MOVE.L   D0,$F68C(A5)
056: 4878 002A       PEA      $002A
05A: 4EBA 0ABE       JSR      *+$0AC0   ; 00B1A ; malloc
05E: 2B40 F688       MOVE.L   D0,$F688(A5)  ; PtrGlb = (RecordPtr)malloc(...)
062: 206D F688       MOVEA.L  $F688(A5),A0  ; <-- STUPID just move D0 to A0

. . .


08E: 4868 000A       PEA      $000A(A0)
092: 4EBA 0D32       JSR      *+$0D34   ; 00DC6  ; <-- STRCPY is a proc call


. . .

0AE: 4EBA 04E0       JSR      *+$04E2   ; 00590 ; ReadDateTime
0B2: 301F            MOVE.W   (A7)+,D0               ; <-- STUPID (see above)
0B4: 48C0            EXT.L    D0                     ; <-- STUPID

. . .

0D0: 2D48 FFFC       MOVE.L   A0,$FFFC(A6)
0D4: 4FEF 0018       LEA      $0018(A7),A7
0D8: 6000 0130       BRA      *+$0132   ; 0020A ; <-- STUPID (branch to the end
                                                        ; of the loop, even though the
                                                        ; compiler can detect it need not)
0DC: 4EBA 0294       JSR      *+$0296   ; 00372 ; Proc5()
0E0: 4EBA 0278       JSR      *+$027A   ; 0035A ; Proc4()
0E4: 7402            MOVEQ    #$02,D2                ; IntLoc1 = 2;
0E6: 2D42 FFE0       MOVE.L   D2,$FFE0(A6)

. . .

1CE: 2D42 FFE4       MOVE.L   D2,$FFE4(A6)
1D2: 222E FFE0       MOVE.L   $FFE0(A6),D1  ; This only affects a register so this:
1D6: 202E FFE4       MOVE.L   $FFE4(A6),D0  ; <-- STUPID (!) since we KNOW it is in D2
1DA: 4EBA 078E       JSR      *+$0790   ; 0096A
1DE: 2D40 FFF6       MOVE.L   D0,$FFF6(A6)
1E2: 242E FFE4       MOVE.L   $FFE4(A6),D2  ; <-- STUPID (!) (see above)
1E6: 94AE FFF6       SUB.L    $FFF6(A6),D2

. . .


;;; _proc1

26E: 2F0A            MOVE.L   A2,-(A7)
270: 246F 0008       MOVEA.L  $0008(A7),A2        ; structassign(NextRec,*PtrGlb)
274: 2052            MOVEA.L  (A2),A0
276: 226D F688       MOVEA.L  $F688(A5),A1
27A: 7014            MOVEQ    #$14,D0             ; <-- STUPID 21 word moves (42 bytes);
27C: 30D9            MOVE.W   (A1)+,(A0)+         ; this could be 10 32bit moves
27E: 51C8 FFFC       DBF      D0,*-$0002  ; 0027C  ; (count 9) plus one word move
282: 7005            MOVEQ    #$05,D0
284: 2540 0006       MOVE.L   D0,$0006(A2)
288: 2052            MOVEA.L  (A2),A0
28A: 216A 0006 0006  MOVE.L   $0006(A2),$0006(A0) ; <-- STUPID (!) previous stmt can't
                                             ; affect memory, so: move.l d0,$6(a0) !!
290: 2052            MOVEA.L  (A2),A0 ; NextRecord.PtrComp = PtrParIn->PtrComp
292: 2092            MOVE.L   (A2),(A0) ; A good compiler (but NOT a peephole analyser)
                             ; could get rid of the next line!!!!!!!!!
294: 2052            MOVEA.L  (A2),A0 ; Proc3(NextRecord.PtrComp);
296: 2F10            MOVE.L   (A0),-(A7)
298: 4EBA 008E       JSR      *+$0090   ; 00328

. . .


2E4: 204A            MOVEA.L  A2,A0
2E6: 7014            MOVEQ    #$14,D0
2E8: 30D9            MOVE.W   (A1)+,(A0)+  ; This could have been 32 bit moves!!
2EA: 51C8 FFFC       DBF      D0,*-$0002  ; 002E8
2EE: 245F            MOVEA.L  (A7)+,A2
2F0: 4E75            RTS      
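
The record being copied is 42 bytes ($2A, the size passed to malloc above),
so the word loop runs 21 times.  As a sketch of the alternative, here is the
same copy done four bytes at a time with a two-byte tail, written in
portable C.  copy_longs is a made-up name; the small fixed-size memcpy calls
are there only to keep the C free of alignment assumptions, on the
expectation that a reasonable compiler turns each into a single move.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy nbytes (must be even) in 4-byte chunks, then one 2-byte chunk
   if two bytes remain. */
void copy_longs(void *dst, const void *src, size_t nbytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t longs = nbytes / 4;

    assert(nbytes % 2 == 0);

    while (longs-- > 0) {
        uint32_t tmp;
        memcpy(&tmp, s, 4);     /* one 32-bit load ... */
        memcpy(d, &tmp, 4);     /* ... and one 32-bit store */
        s += 4;
        d += 4;
    }
    if (nbytes % 4 != 0) {      /* odd number of words: copy the last one */
        uint16_t tail;
        memcpy(&tail, s, 2);
        memcpy(d, &tail, 2);
    }
}
```

For 42 bytes that is 10 long moves and 1 word move instead of 21 word
moves.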

. . .

;;; _Proc7


3E6: 202F 0004       MOVE.L   $0004(A7),D0
3EA: 222F 0008       MOVE.L   $0008(A7),D1 ; <-- STUPID
3EE: 206F 000C       MOVEA.L  $000C(A7),A0 ; needed: the result goes through this pointer
3F2: 5480            ADDQ.L   #$2,D0             
3F4: D081            ADD.L    D1,D0        ; <-- ADD.L $08(A7),D0
3F6: 2080            MOVE.L   D0,(A0)      ; store via pointer (not to $0C(A7) itself)
3F8: 4E75            RTS      
;;; _Proc8()


40A: 2A00            MOVE.L   D0,D5 ; this sequence
40C: 5A85            ADDQ.L   #$5,D5  ; is:
40E: 2005            MOVE.L   D5,D0 ; <-- STUPID STUPID STUPID (see 2 lines above)
410: E580            ASL.L    #$2,D0
412: 2040            MOVEA.L  D0,A0
414: D1C3            ADDA.L   D3,A0
416: 20AF 0024       MOVE.L   $0024(A7),(A0)
41A: 2005            MOVE.L   D5,D0  ; <-- see this
41C: 5280            ADDQ.L   #$1,D0 ; <--
41E: E580            ASL.L    #$2,D0 ; <--
420: 2040            MOVEA.L  D0,A0
422: D1C3            ADDA.L   D3,A0
424: 2005            MOVE.L   D5,D0  ; <-- STUPID
426: E580            ASL.L    #$2,D0 ; <-- COMMON SUBEXPRESSION ELIM
428: 2240            MOVEA.L  D0,A1
42A: D3C3            ADDA.L   D3,A1
42C: 2091            MOVE.L   (A1),(A0)
42E: 2005            MOVE.L   D5,D0
430: 721E            MOVEQ    #$1E,D1  ; <-- STUPID -- just add it to D0 as . . .
432: D081            ADD.L    D1,D0
434: E580            ASL.L    #$2,D0
436: 2040            MOVEA.L  D0,A0
438: D1C3            ADDA.L   D3,A0
43A: 2085            MOVE.L   D5,(A0)
43C: 2C05            MOVE.L   D5,D6
43E: 2005            MOVE.L   D5,D0
440: 2200            MOVE.L   D0,D1  ; . . .this destroys D1 before it is used again
442: 2401            MOVE.L   D1,D2
444: C0FC 00CC       MULU.W   #$00CC,D0

. . .


4A4: 2401            MOVE.L   D1,D2    ; <-- See this??
4A6: C0FC 00CC       MULU.W   #$00CC,D0
4AA: 4841            SWAP     D1
4AC: C2FC 00CC       MULU.W   #$00CC,D1
4B0: 7400            MOVEQ    #$00,D2  ; <-- STUPID LINE ABOVE !! WHY???? 
4B2: D481            ADD.L    D1,D2    ; <-- STUPID 
4B4: 4842            SWAP     D2
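
The first part of this listing recomputes the address of the same array
element again and again (MOVE.L D5,D0 / ASL.L #2,D0 / ADDA.L D3,A0).  Here
is a sketch in C of the three stores involved, as read off the disassembly
rather than quoted from the Dhrystone source, next to a version with the
common subexpression eliminated by hand.  The function and parameter names
are made up for illustration.

```c
#include <assert.h>

/* Straightforward form: every subscript is computed from scratch. */
void store_naive(long *array1, long int1, long int2)
{
    long loc = int1 + 5;

    assert(array1 != 0);
    array1[loc]      = int2;
    array1[loc + 1]  = array1[loc];
    array1[loc + 30] = loc;
}

/* Hand-applied common subexpression elimination: form the address of
   array1[loc] once, then use small constant offsets from it. */
void store_cse(long *array1, long int1, long int2)
{
    long loc = int1 + 5;
    long *p;

    assert(array1 != 0);
    p = &array1[loc];
    p[0]  = int2;
    p[1]  = int2;    /* array1[loc] was just set to int2 */
    p[30] = loc;
}
```

With the address held in one address register, the three stores need one
shift and one add in total, and the rest is displacement addressing.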

. . .


;;; Func1


4D8: 102F 0007       MOVE.B   $0007(A7),D0
4DC: 122F 000B       MOVE.B   $000B(A7),D1
4E0: B001            CMP.B    D1,D0
4E2: 6704            BEQ.S    *+$0006   ; 004E8
4E4: 4201            CLR.B    D1             
4E6: 6002            BRA.S    *+$0004   ; 004EA ; <-- STUPID -- just put 
                                                        ; instructions here
4E8: 7201            MOVEQ    #$01,D1   ; Oh no....
4EA: 7000            MOVEQ    #$00,D0   ; ....
4EC: 1001            MOVE.B   D1,D0     ; <-- STUPID  (THIS IS RIDICULOUS!)
4EE: 4E75            RTS      

. . .

;;; _Func2()

524: 4EBA FFB2       JSR      *-$004C   ; 004D8
528: 1000            MOVE.B   D0,D0   ; <-- STUPID !!!!!!!!!!!!!!!!
52A: 508F            ADDQ.L   #$8,A7


. . .

;;; Func3


57E: 102F 0007       MOVE.B   $0007(A7),D0
582: 0C00 0002       CMPI.B   #$02,D0
586: 6604            BNE.S    *+$0006   ; 0058C
588: 7001            MOVEQ    #$01,D0
58A: 6002            BRA.S    *+$0004   ; 0058E  ; <-- STUPID Branching uncond. to a RTS??
58C: 7000            MOVEQ    #$00,D0
58E: 4E75            RTS      