Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!mit-eddie!ll-xn!ames!ptsfa!ihnp4!homxb!mtuxo!mtune!codas!usfvax2!pdn!alan
From: alan@pdn.UUCP (Alan Lovejoy)
Newsgroups: comp.sys.mac
Subject: Re: Compiler efficiency
Message-ID: <1730@pdn.UUCP>
Date: Thu, 5-Nov-87 01:05:12 EST
Article-I.D.: pdn.1730
Posted: Thu Nov 5 01:05:12 1987
Date-Received: Sat, 14-Nov-87 05:36:39 EST
References: <3987@watdragon.waterloo.edu> <304@fairlight.oz>
Reply-To: alan@pdn.UUCP (0000-Alan Lovejoy)
Organization: Paradyne Corporation, Largo, Florida
Lines: 67

In article <304@fairlight.oz> gary@fairlight.oz (Gary Evesson) writes:
>In article palarson@watdragon.waterloo.edu (Paul Larson) writes:
>
>>I have heard some murmurs (they weren't loud enough to be termed complaints)
>>that some (many? all?) compilers for the mac produce code of questionable
>>efficiency.  I don't know enough assembly to prove or disprove this statement.
>>Is there any merit to these murmurs?

You must remember that it's only been in the last year or two that
really *good* (by mini-/mainframe standards) compilers for the IBM PC
family have become available.  The 68000 in general, and the Mac in
particular, just haven't been successful enough for long enough to
attract the critical mass necessary for state-of-the-art compilers to
be developed.

The failings of most Mac compilers are as follows:

1) Poor register optimization.  This is a killer, since the
architecture of the 68000 *depends* upon efficient register usage for
its performance.  Also, most "C" compilers won't keep a value in a
register unless the programmer explicitly declares that variable with
"register" storage.  This is one of the reasons for the Mac II's
poorer-than-expected showing against the '386 machines in the BYTE
Benchmarks.

2) Poor use of the available addressing modes.  This, too, is a very
serious problem.  The 68000 has some 14 addressing modes (even the 386
only has 9).
There is a tendency to use d8(An, Dn) for iterating over arrays, when
(An)+ would be much more efficient.  Most compilers won't use
'LEA <ea>, An' followed by repeated references to (An) inside a loop,
but instead use <ea> inside the loop.  The expression 'a = b + c'
should be generated as 'move b(A6),D0; add c(A6),D0; move D0,a(A6)'
(assuming that a, b and c start out in memory and are not referenced
again frequently and/or soon enough to be worth keeping in the
registers).  But many compilers can't do things quite that
efficiently.  You'll often see
'move c(A6),D0; move b(A6),D1; add D0,D1; move D1,a(A6)'.
I've seen worse.  I could go on...

3) Poor run-time data structures and/or techniques.  By this I mean
things like procedure activation records, stack frames, procedure
calling conventions, argument passing techniques, local and/or
external procedure calling techniques, local, global and/or external
variable reference techniques, inefficient jump tables and failure to
use in-line procedure expansion for short procedures (some of these
problems are the fault of the C language and/or the Mac's OS, but not
most of them).

4) Few compilers make heavy use of the classical and mostly
machine-independent optimizations such as constant folding, common
sub-expression elimination, copy propagation, dead-code elimination,
code motion, induction variable elimination and strength reduction.

5) I don't know of *any* Mac compilers that attempt multiple
generation strategies for the same code block, and pick the best one
based on the estimated number of machine cycles each strategy will
require for execution.  (This has to be done dynamically for an
optimizing compiler, because the "code template" for each source
language construct is not a static entity when the optimizer is making
changes all over the place).

I have a Modula-2/68000 compiler that does all these optimizations,
but it doesn't run on the Mac.  Its Sieve code does 10 iterations in
1.14 seconds on a 12 MHz 68000 (1 wait state).
You might expect a time of 1.8 seconds if this code were run on an SE,
and 0.45 seconds on the Mac II.  The best Mac compilers I've seen run
around 3 seconds on the SE and around 0.7 seconds on the II for this
benchmark.  Go back to the BYTE benchmark articles and compare these
numbers!

--alan@pdn
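
To make point 2 a little more concrete at the source level, here is a
minimal C sketch of the two array-traversal styles.  The function names
sum_indexed and sum_walking are made up for illustration and do not come
from any compiler or benchmark mentioned in this article.  Which
addressing modes a given compiler really emits for either loop is an
assumption on my part; a good optimizer may well produce identical code
for both.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: every access is "base + i * sizeof(short)".
   A naive 68000 code generator might use an indexed mode such as
   d8(An,Dn) here (assumption: this depends entirely on the compiler). */
long sum_indexed(const short *v, size_t n)
{
    long total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Pointer-walking form: the source steps a pointer itself, which maps
   naturally onto post-increment addressing, (An)+.  The "register"
   keyword is only a hint; the compiler is free to ignore it. */
long sum_walking(const short *v, size_t n)
{
    register const short *p = v;
    register long total = 0;

    assert(v != NULL || n == 0);
    while (n-- > 0)
        total += *p++;
    return total;
}
```

Both functions return the same sum for the same input, so any speed
difference between them comes from code generation alone.  Comparing
the assembly listings your compiler produces for the two is a quick way
to see whether it suffers from the failings described in points 1 and 2.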