Path: utzoo!attcan!cmtl01!matrox!uvm-gen!uunet!lll-winken!ames!pasteur!ucbvax!decwrl!decvax!ima!haddock!suitti
From: suitti@haddock.ima.isc.com (Stephen Uitti)
Newsgroups: comp.sys.mac.programmer
Subject: Re: Code generation in LSC
Message-ID: <11536@haddock.ima.isc.com>
Date: 25 Jan 89 20:59:37 GMT
References: <1112@dogie.edu> <6337@hoptoad.uucp> <5623@phoenix.Princeton.EDU> <11494@haddock.ima.isc.com> <5801@phoenix.Princeton.EDU>
Reply-To: suitti@haddock.ima.isc.com (Stephen Uitti)
Organization: Interactive Systems, Boston
Lines: 149

In article <5801@phoenix.Princeton.EDU> mbkennel@phoenix.Princeton.EDU (Matthew B. Kennel) writes:
>In article <11494@haddock.ima.isc.com> suitti@haddock.ima.isc.com (Stephen Uitti) writes:
>>
>>Some time ago, I ran some C benchmarks on a Mac II & Sun III.
>>...The Mac tended to have run times that
>>were very close to those of the Sun, with some run times actually
>>faster than the Sun.
>
>Hmm. This is surprising.  What kind of benchmarks did you use?
>How did you time them? 
	The benchmarks were non-floating point.  The sieve, for
example, was faster on the Mac.  Embedded time calls were used.  Wall
clock timing, etc., was used to make sure things were being reported
at least approximately correctly.  All runs were greater than 30
seconds.
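
	As a rough illustration of that kind of test (a sketch, not
the code that was actually run), here is the classic sieve with
embedded time calls.  The array size of 8190 and the names sieve_pass
and time_sieve are assumptions.  It reports CPU time from clock() and
wall clock time from time(), so the two can be checked against each
other.

```c
#include <string.h>
#include <time.h>

#define SIEVE_SIZE 8190   /* classic benchmark array size; an assumption here */

static char flags[SIEVE_SIZE + 1];

/* One pass of the sieve; returns the number of primes found. */
int sieve_pass(void)
{
    int i, k, prime, count = 0;

    memset(flags, 1, sizeof flags);
    for (i = 0; i <= SIEVE_SIZE; i++) {
        if (flags[i]) {
            prime = i + i + 3;      /* index i stands for the odd number 2i+3 */
            for (k = i + prime; k <= SIEVE_SIZE; k += prime)
                flags[k] = 0;       /* strike out odd multiples of prime */
            count++;
        }
    }
    return count;
}

/* Run `passes` passes.  Returns CPU seconds from clock() and stores
   wall clock seconds (one second resolution) in *wall_seconds. */
double time_sieve(long passes, double *wall_seconds)
{
    clock_t c0 = clock();
    time_t t0 = time(NULL);
    volatile int sink = 0;          /* keeps the calls from being optimized away */
    long n;

    for (n = 0; n < passes; n++)
        sink = sieve_pass();
    *wall_seconds = difftime(time(NULL), t0);
    return (double)(clock() - c0) / CLOCKS_PER_SEC;
}
```

Pick a pass count large enough that the run lasts 30 seconds or more,
so that the one second resolution of time() stops mattering.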

>>  "Even on microcomputers" is incorrect.
>>The compilers for PCs and Macs are MUCH better than for larger
>>machines.  It even makes sense.  There is more money in it.
>
>In terms of compile time and overall convenience, undoubtedly.
>Code generation?  No.
	PCC-based compilers are still far more common than GNU C for
UNIX.  MSC (for the PC) claims to do all sorts of interesting things.
Global registers for the 8086 are probably a loss, since there aren't
lots of registers...  I find that Turbo C produces code that is
smaller than MSC's and about the same speed (see below), on the same
machine.  It also compiles at least three times faster.
	Of course, "microcomputer" compilers are also closer to
supporting ANSI C (prototypes, etc.).  I say "microcomputer", but my
Mac II is at least 2.5 times faster than a 780 (though an SE will beat
a Mac II in a foot race down the hall).  I really mean "personal" or
"home" computer.

>Just looking at some of my programs with MacsBug, I can see
>many _obvious_ inefficiencies that even a peephole optimizer could remove:
>
>	MOV.L -(SP), D0
>	MOV	D0, D6
>or reloading the same expression which was already in a register.

Can you see labels with MacsBug?  Could it have been:
	MOV.L -(SP), D0
foo:	MOV	D0, D6

Anyway, I've seen this type of thing fall through (even without
intervening labels) with the optimizers of PCC-based compilers.  There are
sequences where, in the above example, the compiler really did want
the value in both registers...  Anyway, LSC is not worse than large
systems compilers here.

>LSC doesn't do other kinds of optimizations such as turning
>
>for(i=0; i<num; i++)
>	d += a[i];
>
>into
>
>register int	*p;
>for(p=a; p< a+num; p++)  /* a+num should _not_ be computed in each iteration */
>	d += *p;	 /* of the loop! */	 

	Some compilers will figure out what to put into registers
and hoist loop invariants on their own (MSC for the PC).  My code
specifies the use of registers, in order of preference, and tends not
to have loop invariants, so those optimizations are wasted on it, as
are others.  Compilers which attempt to do this for me tend to be
broken (MSC for the PC, compilers for Cyber 205, IBM RT, etc.), meaning
that turning off the optimizer tends to allow my code to work.
	Another odd thing that has come up is that when I have written
code that explicitly removes loop invariants, the optimizers of some
of the "smarter" compilers tend to slow my code down.  It seems that
they allocate additional variables to point into the arrays, and even
update the redundant copies.
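
	To make the style concrete, here is a sketch of the same sum
written both ways.  The function names are made up for illustration.
In the second version the registers are declared in order of
preference and the loop bound is computed once, by hand, before the
loop starts.

```c
/* Straightforward indexed loop; an optimizer has to find the
   pointer and the invariant bound for itself. */
long sum_indexed(const int *a, int num)
{
    long d = 0;
    int i;

    for (i = 0; i < num; i++)
        d += a[i];
    return d;
}

/* Hand-tuned loop; nothing is left for the optimizer to hoist. */
long sum_pointer(const int *a, int num)
{
    register const int *p = a;          /* first choice: touched every iteration */
    register const int *end = a + num;  /* second: the bound, computed once */
    register long d = 0;                /* third: the running total */

    while (p < end)
        d += *p++;
    return d;
}
```

Both return the same total.  Which one runs faster depends on the
compiler, which is the point.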

>For these trivial examples, it's no big deal, but in complicated
>computationally-intensive programs, all of these types of optimizations
>combine and can be very significant.
	For large programs, using a profiler will allow you to
concentrate on the correct portion of the program.  This will allow
you to use a quick compiler to outperform an optimizing compiler,
generally speaking.
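
	If no profiler is at hand, a poor man's version can be
sketched with clock().  This is hand instrumentation, not how LSC's
profiler or any other real profiler works, and every name in it is
hypothetical.  Bracket each suspect region with region_enter() and
region_leave(), then ask which region accumulated the most time.

```c
#include <assert.h>
#include <time.h>

#define MAX_REGIONS 8          /* hypothetical limit on instrumented regions */

static clock_t region_ticks[MAX_REGIONS];   /* CPU ticks charged to each region */
static long    region_calls[MAX_REGIONS];   /* how many times each region ran */

/* Call on entry to a region; keep the returned value for region_leave(). */
clock_t region_enter(void)
{
    return clock();
}

/* Call on exit; charges the elapsed ticks to region `id`. */
void region_leave(int id, clock_t started)
{
    assert(id >= 0 && id < MAX_REGIONS);
    region_ticks[id] += clock() - started;
    region_calls[id]++;
}

/* Number of times region `id` has been left so far. */
long region_count(int id)
{
    assert(id >= 0 && id < MAX_REGIONS);
    return region_calls[id];
}

/* Index of the region with the most accumulated ticks
   (the lowest index wins a tie). */
int hottest_region(void)
{
    int i, best = 0;

    for (i = 1; i < MAX_REGIONS; i++)
        if (region_ticks[i] > region_ticks[best])
            best = i;
    return best;
}
```

The region that hottest_region() names is the one worth rewriting by
hand.  The rest can stay as whatever the quick compiler produced.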

>Note that I generally write scientific programs with lots of loops, &
>arrays and such in which good optimization can make a big difference.
	Floating point?

>I suspect that many Mac applications are of the type
>...
>and so these kind of "global" optimizations aren't so important.
>Intelligent register usage should always be a win, though!

	Mac applications mostly spend their time waiting for the next
event (polling).  However, when they do something, they often want to
do it quickly, as they do not want to appear sluggish.

>>Would gcc be better than LSC?  Well, gcc would never produce code as
>>quickly, gdb would never be as nice as LSC 3.0, you'd need "make",
>>etc.  The code *might* be as fast as LSC's code.
>
>If it were _only_ as fast as LSC, I'd think it were broken! :)
	The main reason it wouldn't be faster is that a typical LSC
edit/compile/run/debug loop will be an order of magnitude better.
LSC has a (simple) profiler.

>Seriously, I've looked at the output from good optimizing compilers,
>and in general, they do a better job than I (admittedly a non-expert
>assembly programmer) could do without a very large amount of effort.
	I've yet to see an optimizing compiler that did as well as I
would do.  When doing assembler, I count bytes and cycles.  I optimize
register usage.  One gets used to it.  The painful part of assembly
is when you've got code that works & have discovered a neat new way
of doing the problem...  It is often hard enough to get yourself to
redo it when using a higher level language.

>Looking at the output of the MIPSco C compiler, I am completely 
>astounded as to the transformational magic that it manages to perform
>on my programs.  (Then again, BASIC would scream on a MIPS)
	When I look at what PCC does to my code, and what the optimizer
then undoes, I'm amazed.  Still, optimized code (for me) is easier to
debug... if I'm forced to debug in assembly.

>>  It probably wouldn't
>>be as fast, since one would probably make int's 32 bits.

>Quite true.  But isn't this true only for a 68000, i.e. shouldn't
>a Mac II be as fast w/ 32 bit ints as 16?  (I think that some minis
>are _slower_ accessing 16 bits rather than 32!)
	No.  For whatever reason, 16 bits are considerably quicker.
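
	Anyone who wants to check this on their own machine could
time the same loop at both widths.  A sketch, with made-up names,
using the fixed-width types from <stdint.h> (an assumption; a compiler
of this vintage would use short and long instead).  The data are kept
small so that the 16 bit sum cannot overflow.

```c
#include <stdint.h>
#include <time.h>

#define N 1000                 /* array length; an arbitrary choice */

static int16_t a16[N];
static int32_t a32[N];

/* Fill both arrays with the same small values. */
void fill_arrays(void)
{
    int i;

    for (i = 0; i < N; i++) {
        a16[i] = (int16_t)(i % 7);
        a32[i] = i % 7;
    }
}

/* Sum a16 `passes` times in 16 bit arithmetic; returns CPU seconds. */
double time_sum16(long passes, long *result)
{
    clock_t c0 = clock();
    const int16_t *p;
    int16_t s = 0;
    long n;

    for (n = 0; n < passes; n++) {
        s = 0;
        for (p = a16; p < a16 + N; p++)
            s = (int16_t)(s + *p);
    }
    *result = s;
    return (double)(clock() - c0) / CLOCKS_PER_SEC;
}

/* Sum a32 `passes` times in 32 bit arithmetic; returns CPU seconds. */
double time_sum32(long passes, long *result)
{
    clock_t c0 = clock();
    const int32_t *p;
    int32_t s = 0;
    long n;

    for (n = 0; n < passes; n++) {
        s = 0;
        for (p = a32; p < a32 + N; p++)
            s += *p;
    }
    *result = s;
    return (double)(clock() - c0) / CLOCKS_PER_SEC;
}
```

Call fill_arrays() first, then compare the two times for the same pass
count.  A modern optimizer may collapse the repeated passes, so build
without optimization, or inspect the generated code, before trusting
the numbers.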

>NOTE: My direct experience has _ONLY_ been with LSC 2.15.  If things
>have changed significantly, please correct me!  Thanks.

	LSC 3.0 (etc.) supports floating point on the 68881.  When
used, this is a win.  3.0 has a serious debugger.  It is really
awesome when your system has more than one screen (like mine does).

	Summary: I'd prefer a compiler that spits out reasonable code
infinitely fast (such as LSC) to one that spends all day working on
it.  LSC is not as bad as you think.  "Mainframe" optimizing compilers
are not as good as you think.  I believe that the LSC compiler can be
improved, but that they are going about it in the correct manner.

>Matt Kennel
>mbkennel@phoenix.princeton.edu
>"Assume a spherical cow."

	Stephen.
"Everything in moderation.  Including moderation."