Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!usc!jarthur!nntp-server.caltech.edu!toddpw
From: toddpw@nntp-server.caltech.edu (Todd P. Whitesel)
Newsgroups: comp.sys.apple2
Subject: Re: Computer capabilities
Message-ID: <1991Jan4.122840.14246@nntp-server.caltech.edu>
Date: 4 Jan 91 12:28:40 GMT
References: <325@generic.UUCP> <10827@ucrmath.ucr.edu>
Organization: California Institute of Technology, Pasadena
Lines: 92

rhyde@ucrmath.ucr.edu (randy hyde) writes:
>but for complex tasks HLLs are *way* too slow on the '816.

I've always wondered whether that's a fact of life or just a symptom of
the way the existing compilers are designed.

> It was too slow (ORCA/C) and the version I had
>(whatever their second release was) had way too many bugs. The standard

I agree, the standard library is annoying. I have considered writing some
real replacement routines, but I have projects I want done first, and the
1.1 release of the compiler and libraries has been adequate so far.

It's funny: every bug in the compiler or the libraries that I ran into, I
found quickly. Whenever one took more than an hour to track down, it always
turned out to be a genuine bug of mine.

>>> ORCA/C uses the direct page register (as a base pointer)..
>(1) Your stack frames are limited to 256 bytes, max (no big arrays!)

I've declared char a[3000] as a local variable in Orca/C without any
problems. The generated code uses dp,x addressing if the base of the array
lies within the 256-byte direct page, or builds a long pointer if it
doesn't. The real problem is the total size of the stack frame (4K by
default).
However, from looking at Orca's output for a simple program like

    int main (void)
    {
        char a[1000], b[1000];
        int i;

        for (i = 1000; i--; )
            a[i] = b[i] = 0;
        return 0;
    }

it is obvious that Mike is using separate paradigms for accessing
nontrivial variables (like arrays outside the 256-byte direct page) and for
evaluating expressions. The result is that Orca/C has an unbelievably
complex code generator, which produces some really ugly code in many of the
situations that 'show weaknesses of the 65816'.

If Orca/C used a FORTH-like paradigm that kept the top of the stack in the
CPU registers (not hard) for both expression evaluation and nontrivial
variable access, it would simplify things a lot. Maybe things would
simplify enough that Mike could get the whole damn compiler working and
write a real library!!

>(2) You can't use direct page values as "register" variables. Since the frames
> are rarely page aligned, accessing such variables costs an extra cycle on
> each access.

How often does this make a real and practical difference? The theoretical
speed hit is 25% under the absolute worst conditions, i.e. nothing but
8-bit dp accesses (4 cycles each instead of 3).

If you really want an aligned direct page, it is not that hard to force it
down to the nearest page boundary and copy the variables over. If the speed
increase is actually going to make a difference, the code that needed the
register variables in the first place will dominate the setup/takedown
time. I really don't think it's necessary, though, because if you can't
write a tight assembly loop to do the job, dp alignment isn't going to help
you anyway.

Now that I'm working with word-size operands all the time, I really only
notice the speed difference in the most tightly coded assembly loops. One
example is the DiskCopy program: the asm{} statement that computes the
checksum of the disk image data takes 57 cycles per word of data, five of
which are due to direct page nonalignment. With an aligned dp it would take
52 cycles per word, making the code 9.6% faster.
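The FORTH-like scheme suggested above, keeping the top of the evaluation stack in a register, is usually called top-of-stack caching. Here is a minimal sketch of the idea, assuming a tiny postfix instruction set invented purely for illustration (the opcodes, struct insn and eval() are hypothetical, not anything from Orca/C). The interpreter holds the top value in the variable tos, which stands in for the 65816 accumulator, and keeps only the deeper entries in memory. A compiler built on the same idea would emit code rather than interpret it, but the bookkeeping is the same: each binary operator makes one memory access and leaves its result in the "register".

```c
/* Hypothetical opcodes for a tiny postfix (stack-machine) program. */
enum sm_op { PUSH, ADD, SUB, MUL, END };

struct insn {
    enum sm_op op;
    int arg;            /* operand for PUSH; unused otherwise */
};

/* Evaluate a postfix program while caching the top of stack in `tos`
   (the "register"); only the deeper entries live in the memory stack. */
int eval(const struct insn *code)
{
    int stack[64];      /* memory part of the stack (no overflow check) */
    int sp = 0;         /* number of entries currently in `stack` */
    int tos = 0;        /* cached top of stack, like the accumulator */
    int have_tos = 0;   /* nonzero once `tos` holds a live value */

    for (;; code++) {
        switch (code->op) {
        case PUSH:
            if (have_tos)
                stack[sp++] = tos;   /* spill the old top to memory */
            tos = code->arg;
            have_tos = 1;
            break;
        /* Binary operators: one pop from memory, result stays in tos. */
        case ADD: tos = stack[--sp] + tos; break;
        case SUB: tos = stack[--sp] - tos; break;
        case MUL: tos = stack[--sp] * tos; break;
        case END: return tos;
        }
    }
}
```

For example, the postfix program 2 3 + 4 * (that is, (2 + 3) * 4) evaluates to 20, and only the PUSH of a second operand ever writes to the memory stack.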
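The direct page percentages above are easy to check. This is a minimal sketch assuming the 65816 timing rule that a direct page access costs one extra cycle whenever the low byte of D is nonzero, so an 8-bit load from the direct page takes 3 cycles aligned and 4 unaligned. The function names are hypothetical. page_align_down() illustrates only the "force it down" step. Copying the variables to the new location is not shown.

```c
/* Worked cycle arithmetic for the direct page alignment discussion.
   Assumes one extra cycle per direct page access when D's low byte
   is nonzero. All names here are hypothetical. */

/* Fraction of speed lost when every access pays the penalty:
   extra cycles divided by the total (penalized) cycle count. */
double speed_loss(double base_cycles, double penalty_cycles)
{
    return penalty_cycles / (base_cycles + penalty_cycles);
}

/* Relative speedup from removing the penalty from a loop that now takes
   total_cycles, of which penalty_cycles are due to misalignment. */
double speedup(double total_cycles, double penalty_cycles)
{
    return total_cycles / (total_cycles - penalty_cycles) - 1.0;
}

/* Round a 16-bit D value down to the nearest page (256-byte) boundary;
   the frame's variables would then be copied to the new location. */
unsigned page_align_down(unsigned d)
{
    return d & 0xFF00u;
}
```

With these definitions, speed_loss(3, 1) gives 0.25, the worst-case 25% hit. speedup(57, 5) gives 57/52 - 1, about 0.096, which matches the 9.6% figure for the DiskCopy checksum loop.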
If it were really that important, I'd write it as a real assembly-language
routine rather than as an asm{} statement, and then it wouldn't be an issue
any more.

>>> [instruction sequence] isn't THAT bad.
> This is 10 cycles above and beyond the cost of a return.

Point taken.

>Not to mention six extra bytes (for *each* call).

No, for each function. Orca/C uses that kind of sequence to create the
stack frame on function entry and to destroy it on exit.

>For "C" you have no choice but to do this. For Pascal, whose procedures and
>functions have a fixed number of parameters) the subroutine can pop the
>parameters rather than the caller.

Sorry, but C subroutines usually pop the parameters themselves because it's
more efficient for them to do so. Variable-parameter functions have to
"know" how many to pop: the standard library gets that from the format
string, and ANSI C has standardized access to the arguments with the
va_arg macros in <stdarg.h>.

> Given how universities are pushing "modular" code these days, and the
>amount of people writing *tiny* subroutines

That's really sad. Has everybody forgotten the original reasons for
inline() and parameterized #defines? They can't be THAT anal-retentive
about modularity.

Todd Whitesel
toddpw @ tybalt.caltech.edu
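The va_arg mechanism mentioned above can be illustrated with a minimal ANSI C sketch. The function name sum_ints() is hypothetical. The point is that the callee learns the argument count from an explicit parameter, just as printf() learns it from its format string. Whether the caller or the callee actually removes the arguments from the stack is up to the compiler and is not visible at this level.

```c
#include <stdarg.h>

/* Sum `count` int arguments. The callee cannot discover how many
   arguments were passed, so the caller supplies the count explicitly. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);        /* start after the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 1, 2, 3) returns 6.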