Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!watserv1!watdragon!rose!ccplumb
From: ccplumb@rose.uwaterloo.ca (Colin Plumb)
Subject: Re: Mike Farren tutorial
Message-ID: <1991Apr7.000920.25630@watdragon.waterloo.edu>
Sender: news@watdragon.waterloo.edu (News Owner)
Organization: University of Waterloo
References:
Date: Sun, 7 Apr 1991 00:09:20 GMT
Lines: 50

dillon@overload.Berkeley.CA.US (Matthew Dillon) wrote:
> The 1.3 OS was compiled with greenhills, I believe, which is a pretty
> good compiler.  Would you rather the OS not have come out at all?  Do
> you know how many YEARS it would take to write all that stuff in
> hand assembly?  Much less debug it and enhance it.

Not to detract from your point, but I stepped through some of the
graphics.library code the other day and the quality makes me nauseous.
One particularly memorable part in ScrollVPort:

        moveq   #0,d0
        move.w  offset(a2),d0   ; ViewPort->Modes, I recall...
        move.l  d0,d1
        moveq   #0,d0
        move.w  offset2(a2),d0
        (play with d1 a bit)
        move.w  d1,offset3(a3)

Now, if the cast to 32 bits is in the source code, I admit it's a bit
hard for a compiler to look and see that it's not necessary.  But if
it's not, it's pretty inexcusable to extend everything to 32 bits.
And extending in d0 and then moving to d1 so you can clobber d0 again...
well, I'm disgusted.

> 28% is nothing compared to finding an algorithm that gives you an order
> of magnitude better performance, and the chance of finding such an
> algorithm is incredibly high when you have the time to think about it.

A friend has a game in development.  It's written in assembler.  One part
is too slow - it can't scroll smoothly on a vanilla 1000.  So I'm rewriting
the main loop for speed, in C.  I'm just being careful not to do
unnecessary work.  More efficient data structures for keeping track of
which enemies are off-screen, postponing or eliminating status display
updates, that sort of thing.

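To make "postponing or eliminating status display updates" concrete, here
is a minimal C sketch, assuming a status bar that gets redrawn at most once
per frame and only when a displayed value actually changed.  Every name in
it (struct status, status_set_score, status_set_lives, status_flush) is
made up for illustration and is not from the game's source; the printf is
just a stand-in for whatever the real rendering does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical status-bar state; not taken from the actual game. */
struct status {
    int  score;
    int  lives;
    bool dirty;     /* true when the on-screen values are stale */
    int  redraws;   /* counts real redraws, for illustration only */
};

/* Setters only mark the display stale if the value really changed. */
void status_set_score(struct status *s, int score)
{
    if (s->score != score) {
        s->score = score;
        s->dirty = true;
    }
}

void status_set_lives(struct status *s, int lives)
{
    if (s->lives != lives) {
        s->lives = lives;
        s->dirty = true;
    }
}

/* Call once per frame: the expensive redraw runs only when needed. */
void status_flush(struct status *s)
{
    if (!s->dirty)
        return;
    printf("score %d  lives %d\n", s->score, s->lives); /* stand-in for rendering */
    s->dirty = false;
    s->redraws++;
}
```

Any number of setter calls within a frame then costs one redraw at most,
and a frame in which nothing changed costs a single test of the flag.
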
Assembler has its uses - 60% of the frame time is spent in two inner loops
that I first came up with fiendishly clever algorithms for, debugged in C,
then converted to assembler and spent weeks removing every last ounce of
fat from.  I'm very confident that, other than unrolling them even more,
there are no spare cycles anywhere in the implementation.  More importantly,
one loop is fairly straightforward (it's only two-dimensional forward
differencing and a bunch of rendering bit-bashing), but the other involves
rather non-obvious projective geometry, and I'm absolutely positive that
the entire cracker population of West Germany couldn't do the same thing
at half the speed before the year 2000.

Small-scale optimisations like assembler hacks are useful, but algorithms
are where you get order-of-magnitude speedups.
--
	-Colin