Xref: utzoo rec.games.programmer:3408 comp.os.msdos.programmer:4626
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!batcomputer!cornell!rochester!pt.cs.cmu.edu!o.gp.cs.cmu.edu!netnews
From: Ralf.Brown@B.GP.CS.CMU.EDU
Newsgroups: rec.games.programmer,comp.os.msdos.programmer
Subject: Re: 3D int/float optimizations stuff
Message-ID: <280706ef@ralf>
Date: 13 Apr 91 13:26:07 GMT
Sender: netnews@cs.cmu.edu (USENET News Group Software)
Organization: Carnegie Mellon University School of Computer Science
Lines: 123
In-Reply-To: <28002@uflorida.cis.ufl.EDU>

In article <28002@uflorida.cis.ufl.EDU>, jdb@reef.cis.ufl.edu (Brian K. W. Hook) wrote:
}Thanks to everyone who helped with the optimizations.  For those
}interested, I am posting the results of each optimization followed by the
}final source code.
}
}Summary:  WOW!
}
}First pass:  21.86 seconds
}Last pass:   12.20 seconds
}
}I am sure that a couple of optimizations can still be done, most obviously
}those bit shifts (although I really doubt they matter much).  I am not sure
}how accurate these calculations are, but I do know that they don't distort

I don't remember which compiler you said you are using, but if it is a
16-bit compiler (such as MSC, Zortech, or Turbo), then both multiplies and
shifts on longs make calls to the runtime library.  As I recall, you said
the function originally used 90% of the execution time; with the following
assembler version of the function, you should get your execution time down
to under six seconds.

Note that I've rearranged the order of calculations somewhat, that it could
be optimized further by using SI and DI as temporaries to avoid memory
accesses, and that you will have to supply the necessary wrapper for calling
from your C code.  You can also get better precision by scaling the sine and
cosine factors by 16384 (14 bits), since you only need a range of -1..+1
(which would be -16384..16384 after scaling); in that case, change all the
1024s to 16384.

xa      dw ?
                        ; note that these are ints instead of longs!
ya      dw ?
za      dw ?

        neg     WX

        ; za = (yawSinFactor*WX - yawCosFactor*WZ) / 1024
        mov     ax,yawCosFactor
        imul    WZ
        mov     cx,dx
        mov     bx,ax
        mov     ax,yawSinFactor
        imul    WX
        sub     ax,bx
        sbb     dx,cx
        mov     cx,1024
        idiv    cx              ; faster than a loop!
        mov     za,ax

        ; xa = (yawCosFactor*WX - yawSinFactor*WZ) / 1024
        mov     ax,yawSinFactor
        imul    WZ
        mov     cx,dx
        mov     bx,ax
        mov     ax,yawCosFactor
        imul    WX
        sub     ax,bx
        sbb     dx,cx
        mov     cx,1024
        idiv    cx              ; faster than a loop!
        mov     xa,ax

        ; WX = (xa*rollCosFactor + rollSinFactor*WY) / 1024 + MX
        imul    rollCosFactor   ; AX still holds xa here
        mov     cx,dx
        mov     bx,ax
        mov     ax,rollSinFactor
        imul    WY
        add     ax,bx
        adc     dx,cx
        mov     cx,1024
        idiv    cx
        add     ax,MX
        mov     WX,ax

        ; ya = (za*pitchCosFactor - xa*pitchSinFactor) / 1024
        mov     ax,xa
        imul    pitchSinFactor
        mov     bx,ax
        mov     cx,dx
        mov     ax,za
        imul    pitchCosFactor
        sub     ax,bx
        sbb     dx,cx
        mov     cx,1024
        idiv    cx
        mov     ya,ax

        ; WY = (ya*pitchCosFactor + za*pitchSinFactor) / 1024 + MY
        mov     ax,za
        imul    pitchSinFactor
        mov     cx,dx
        mov     bx,ax
        mov     ax,ya
        imul    pitchCosFactor
        add     ax,bx
        adc     dx,cx
        mov     cx,1024
        idiv    cx
        add     ax,MY
        mov     WY,ax

        ; WZ = (za*pitchCosFactor - ya*pitchSinFactor) / 1024 + MZ
        mov     ax,ya
        imul    pitchSinFactor
        mov     cx,dx
        mov     bx,ax
        mov     ax,za
        imul    pitchCosFactor
        sub     ax,bx
        sbb     dx,cx
        mov     cx,1024
        idiv    cx
        add     ax,MZ
        jnz     l_1
        dec     ax              ; WZ came out zero: use -1 to avoid dividing by zero
l_1:
        mov     cx,ax           ; WZ doesn't need to be stored in memory
        mov     ax,word ptr AngularPerspFactor
        mov     dx,word ptr AngularPerspFactor+2
        idiv    cx              ; APF / WZ
        mov     cx,ax           ; store a copy for later
        imul    WX
        add     ax,400          ; tmp*WX+400
        mov     _DX,ax
        mov     ax,WY
        mul     cx              ; only AX is used; low word is the same for mul and imul
        add     ax,300          ; tmp*WY+300
        mov     _DY,ax

--
{backbone}!cs.cmu.edu!ralf   ARPA: RALF@CS.CMU.EDU   FIDO: Ralf Brown 1:129/3.1
BITnet: RALF%CS.CMU.EDU@CMUCCVMA   AT&Tnet: (412)268-3053 (school)   FAX: ask
DISCLAIMER? Did   | It isn't what we don't know that gives us trouble, it's
I claim something?| what we know that ain't so.  --Will Rogers
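
For readers who want to check the scaling arithmetic in C before dropping in
the assembler, here is a minimal sketch of one rotation term using the 14-bit
(16384) scale suggested above.  It is an illustration only: the names
FIX_SHIFT, FIX_ONE, to_fixed, and rotate_term are hypothetical and do not
appear in the posted code.  It assumes 16-bit operands and a 32-bit
intermediate, which is what the imul / sub+sbb / idiv sequence works with.

```c
#include <assert.h>

#define FIX_SHIFT 14                    /* hypothetical name: 14-bit scale   */
#define FIX_ONE   (1L << FIX_SHIFT)     /* 16384 represents 1.0              */

/* Convert a sine or cosine value in -1..+1 to a scaled 16-bit integer
   in -16384..16384, rounding to nearest. */
static short to_fixed(double f)
{
    assert(f >= -1.0 && f <= 1.0);
    return (short)(f * FIX_ONE + (f >= 0 ? 0.5 : -0.5));
}

/* One rotation term, (a*fa - b*fb) / 16384.  The two products are formed
   in a long (at least 32 bits), subtracted, and then divided back down by
   the scale factor.  C division truncates toward zero, as idiv does. */
static short rotate_term(short a, short fa, short b, short fb)
{
    long acc = (long)a * fa - (long)b * fb;
    return (short)(acc / FIX_ONE);
}
```

For example, rotate_term(100, to_fixed(0.5), 0, 0) yields 50.  With 16-bit
coordinates and factors no larger than 16384 in magnitude, the difference of
the two products still fits in 32 bits, so the intermediate cannot overflow.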