Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!ptimtc!nntp-server.caltech.edu!toddpw
From: toddpw@nntp-server.caltech.edu (Todd P. Whitesel)
Newsgroups: comp.sys.apple2
Subject: Re: ML subroutines (passing parameters in ML)
Message-ID: <1991Apr27.015822.10369@nntp-server.caltech.edu>
Date: 27 Apr 91 01:58:22 GMT
References: <112122@tut.cis.ohio-state.edu> <13954@ucrmath.ucr.edu> <52084@apple.Apple.COM> <13977@ucrmath.ucr.edu>
Organization: California Institute of Technology, Pasadena
Lines: 28

rhyde@feller.ucr.edu (randy hyde) writes:

>A one cycle penalty on a three cycle instruction is 25%.  You should see
>how hard compilers work to get a 25% performance improvement.
>...
>writing code like the C or Pascal compilers do, there is very little
>benefit to using assembly.  You certainly won't get the 5-10x
>performance boosts I've been talking about.

With ORCA/C, you can get about 2x if the code is simple, 4-8x if it uses
lots of arrays or structs. I got a 10x improvement on my LZW decompressor
by writing the code in 'smart' assembly: translating each group of lines,
optimizing the register usage between them, and writing the critical
construction loop very tightly. Aligning the direct page gained me only
about 2% on top of that, and it took me a while to figure out how to
modify the enter/exit code to properly snap the direct page and to make
temporary copies of the function arguments in case they were not within
reach of the aligned DP. I think the code is an appropriate example (it
uses lots of 16-bit DP variables), and I have to conclude that aligning
the direct page wasn't worth the effort in this case.
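
To put rough numbers on how a one-cycle penalty per direct page access can
shrink to a few percent overall, here is a minimal back-of-envelope sketch.
It assumes only the 65816 rule that a direct page access costs one extra
cycle when the low byte of D is non-zero. The function name, the instruction
mix, and the cycle counts are all hypothetical, chosen for illustration; they
are not measurements from the LZW decompressor.

```python
def alignment_gain(instr_count, avg_cycles_unaligned, dp_fraction):
    """Fractional speedup from aligning the direct page.

    instr_count          -- instructions executed (hypothetical)
    avg_cycles_unaligned -- average cycles per instruction with D unaligned
    dp_fraction          -- fraction of instructions using direct page addressing
    """
    total_unaligned = instr_count * avg_cycles_unaligned
    # Each direct page access pays exactly one penalty cycle when unaligned.
    penalty_cycles = instr_count * dp_fraction
    total_aligned = total_unaligned - penalty_cycles
    return total_unaligned / total_aligned - 1.0


# Hypothetical mix: 100 instructions averaging 5 cycles each,
# 10% of which touch the direct page.
gain = alignment_gain(instr_count=100, avg_cycles_unaligned=5.0, dp_fraction=0.10)
print(f"{gain * 100:.1f}% faster")  # prints "2.0% faster"
```

Under these assumed numbers the penalty is 10 cycles out of 500, so the
whole-program gain is far smaller than the per-instruction figure suggests.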

I am not trying to say that Randy is wrong -- this is the only example I
am really familiar with. It's just that I can't think of any _application_
code examples for which the DP alignment would make a significant difference
compared to everything else.

Todd Whitesel
toddpw @ tybalt.caltech.edu