Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!ptimtc!nntp-server.caltech.edu!toddpw
From: toddpw@nntp-server.caltech.edu (Todd P. Whitesel)
Newsgroups: comp.sys.apple2
Subject: Re: ML subroutines (passing parameters in ML)
Message-ID: <1991Apr27.015822.10369@nntp-server.caltech.edu>
Date: 27 Apr 91 01:58:22 GMT
References: <112122@tut.cis.ohio-state.edu> <13954@ucrmath.ucr.edu> <52084@apple.Apple.COM> <13977@ucrmath.ucr.edu>
Organization: California Institute of Technology, Pasadena
Lines: 28

rhyde@feller.ucr.edu (randy hyde) writes:

>A one cycle penalty on a three cycle instruction is 25%. You should see
>how hard compilers work to get a 25% performance improvement.
>...
>writing code like the C or Pascal compilers do, there is very little
>benefit to using assembly. You certainly won't get the 5-10x
>performance boosts I've been talking about.

With Orca/C, you can get about 2x if the code is simple, 4-8x if it uses
lots of arrays or structs. I got a 10x improvement on my LZW decompressor
by writing the code in 'smart' assembly: translating each group of lines,
optimizing the register usage between them, and writing the critical
construction loop very tightly.

I got about 2% on top of that by aligning the direct page, and it took me
a while to figure out how to modify the enter/exit code to properly snap
the direct page as well as make temporary copies of the function arguments
in case they were not in reach of the aligned DP.

I think the code is an appropriate example -- it uses lots of 16 bit DP
variables -- and I have to conclude that aligning the direct page wasn't
worth the effort in this case.

I am not trying to say that Randy is wrong -- this is the only example I
am really familiar with. It's just that I can't think of any _application_
code examples for which the DP alignment would make a significant
difference compared to everything else.

Todd Whitesel
toddpw @ tybalt.caltech.edu
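
The gap between a 25% per-instruction figure and a 2% whole-program gain can be sketched with a little arithmetic. What follows is a back-of-the-envelope model, not a measurement of the LZW code. It assumes the only effect of alignment is removing the one extra cycle that a direct page access costs when the low byte of the direct page register is nonzero. The names aligned_dp_saving and dp_fraction are hypothetical, and the 8% share used in the example is an assumed input chosen to show the scale, not a figure from the post.

```python
def aligned_dp_saving(dp_fraction, base_cycles=3, penalty=1):
    """Estimate the fraction of total run time saved by aligning the DP.

    dp_fraction -- assumed share of the unaligned run time spent in
                   direct page instructions (0.0 to 1.0)
    base_cycles -- cycles for such an instruction with an aligned DP
    penalty     -- extra cycles when the DP low byte is nonzero
    """
    # Each penalized instruction takes base_cycles + penalty cycles, so
    # removing the penalty saves penalty / (base_cycles + penalty) of it.
    saved_per_instruction = penalty / (base_cycles + penalty)
    # Only the direct page share of the program benefits at all.
    return dp_fraction * saved_per_instruction


if __name__ == "__main__":
    # Three cycles aligned, four unaligned: 25% saved on that instruction.
    print(f"per instruction: {aligned_dp_saving(1.0):.1%}")
    # If only 8% of the run time were such instructions (an assumption),
    # the whole program would gain about 2%.
    print(f"whole program:   {aligned_dp_saving(0.08):.1%}")
```

Under this model the per-instruction saving is diluted by everything else the program does, which is consistent with a small overall gain even in code that uses many direct page variables. Longer direct page addressing modes would make base_cycles larger and the saving smaller still.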