Newsgroups: comp.sys.acorn Path: utzoo!utgpu!watserv1!watdragon!daisy.waterloo.edu!gcwillia From: gcwillia@daisy.waterloo.edu (Graeme Williams) Subject: Re: really FAST integer division Message-ID: <1991Jun28.190758.21677@watdragon.waterloo.edu> Summary: Avg. exec. time < 60 cycles - the fastest div in the West? Sender: gcwillia@daisy.waterloo.edu Organization: University of Waterloo Date: Fri, 28 Jun 1991 19:07:58 GMT Lines: 51 Following several posts regarding division routines, in particular one with a lovely 3 instr. central loop I modified my old division routine into the following (supersonic, lightning fast, ZR rated, :-) ) 32bit routine: ;Essentially what's happening here is we are figuring out how far left ;we can shift the numerator before any actual dividing begins and thus save ;a significant number of useless trips through the (unrolled) 3 instr. loop. ; - Note we only go to the nearest 4 bits, trying to save another 2 bits ; results in 3 extra instrs. for a saving of 6 instrs. half the time ; and 0 instrs. the other half - i.e. no nett benefit. MOV cnt,#28 : MOV mod,num,LSR#4 CMP den,mod,LSR#12 : SUBLE cnt,cnt,#16 : MOVLE mod,mod,LSR#16 CMP den,mod,LSR#4 : SUBLE cnt,cnt,#8 : MOVLE mod,mod,LSR#8 CMP den,mod : SUBLE cnt,cnt,#4 : MOVLE mod,mod,LSR#4 MOV num,num,LSL cnt : RSB den,den,#0 ADDS num,num,num ;Now skip over cnt copies of the 3 instr. loop. ADD cnt,cnt,cnt,LSL#1 : ADD pc,pc,cnt,LSL#2 : MOV R0,R0 {ADCS mod,den,mod,LSL#1 : SUBLO mod,mod,den : ADCS num,num,num} x 32 copies The routine behaves as follows: On Entry: num - is a signed numerator A (but +ve) den - is a signed denominator B (but non-zero and +ve) On Exit: mod - contains A MOD B num - contains A DIV B The routine requires 4 registers and 452 bytes of memory. Execution time: Given numbers A and B (both +ve) chosen at random from a distribution in which the most significant bit of the numbers is uniformly distributed between bits 0 and 31 (i.e. the density of numbers goes as roughly log n) then the average execution time is: For A < B : 32 cycles (no division actually required) For A >= B: 58.25 cycles The best and worst cases are 32 cycles and 116 cycles respectively. Have fun. (Stuff like this ought to be worth good money!) Graeme Williams gcwillia@daisy.waterloo.edu P.S. With any luck there's only a couple of typos :-).