Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!rutgers!sun-barr!cs.utexas.edu!samsung!usc!snorkelwacker!bloom-beacon!eru!hagbard!sunic!mcsun!ukc!acorn!abccam!pete
From: pete@abccam.abcl.co.uk (Peter Cockerell)
Newsgroups: comp.arch
Subject: Re: benchmark for evaluating extended precision
Keywords: extended precision,multiply,benchmark,arithmetic
Message-ID: <513@abccam.abcl.co.uk>
Date: 14 Sep 90 15:03:26 GMT
References: <3989@bingvaxu.cc.binghamton.edu>
Organization: Active Book Company Limited, Cambridge, UK
Lines: 58

In article <3989@bingvaxu.cc.binghamton.edu>, vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
>
> After a number of private communications, I've managed to render
> one of the little benchmarks I have presentable enough to post,
> along with some performance figures from different machines
> (basically, its put up or shut up time).

[load of stuff deleted]

> P.S. The benchmark was _deliberately_ kept rather simple; I
> wanted to measure the performance of _multiply_ in
> the context of basic extended precision arithmetic, not
> the memory or i/o subsystems.

The results of the 'benchmark' when run on my ARM (Acorn Risc Machine)
system running BSD 4.3 are:

        DOUBLE defined          3.6
        DOUBLE not defined      8.9
        Ratio                   2.5

All this seems to be telling me is that the conversions required to use
the ARM's 32*32->32 instruction to perform 8*8->16 arithmetic are more
onerous (and so slower) than those required to do 16*16->32.
Short<->int on the ARM requires masking and/or shifting; there are no
explicit conversion instructions, so a C int=short*short compiles to

        MOV R0, R0, LSL #16     ;Sign extend LHS
        MOV R0, R0, ASR #16
        MOV R1, R1, LSL #16     ;Sign extend RHS
        MOV R1, R1, ASR #16
        MUL R2, R0, R1          ;Do the multiply

Similarly, a short=char*char is

        MOV R0, R0, LSL #24     ;Sign extend LHS
        MOV R0, R0, ASR #24
        MOV R1, R1, LSL #24     ;Sign extend RHS
        MOV R1, R1, ASR #24
        MUL R2, R0, R1          ;Do the multiply
        MOV R2, R2, LSL #16     ;Convert to short
        MOV R2, R2, ASR #16

In comparison, int=int*int compiles to

        MUL R2, R0, R1

The benchmark time for the case when LONG and SHORT are both defined to
be int (i.e. the natural length for the processor) is 0.4s!

Or am I missing something...?
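
For anyone wanting to poke at the three cases without an ARM to hand, here
is a rough C sketch of what the instruction sequences above compute. It is
not the benchmark source and not actual compiler output; the function names
(sext16, sext8, mul_short_short, mul_char_char, mul_int_int) are made up for
illustration. It assumes that >> on a negative int32_t is an arithmetic
shift and that converting an out-of-range uint32_t to int32_t wraps, which
holds for the usual compilers but is not guaranteed by the C standard.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of a register value.
   Mirrors: MOV Rn, Rn, LSL #16 ; MOV Rn, Rn, ASR #16
   ASSUMPTION: arithmetic right shift and wrapping unsigned->signed cast. */
static int32_t sext16(uint32_t r)
{
    return (int32_t)(r << 16) >> 16;
}

/* Sign-extend the low 8 bits of a register value.
   Mirrors: MOV Rn, Rn, LSL #24 ; MOV Rn, Rn, ASR #24 */
static int32_t sext8(uint32_t r)
{
    return (int32_t)(r << 24) >> 24;
}

/* int = short * short: four shifts, then one MUL (5 instructions). */
static int32_t mul_short_short(uint32_t r0, uint32_t r1)
{
    return sext16(r0) * sext16(r1);
}

/* short = char * char: four shifts, one MUL, then two more shifts to
   narrow the result back to a short (7 instructions). */
static int32_t mul_char_char(uint32_t r0, uint32_t r1)
{
    int32_t r2 = sext8(r0) * sext8(r1);
    return sext16((uint32_t)r2);
}

/* int = int * int: a single MUL. The multiply is done on unsigned
   values so that overflow wraps like the hardware instead of being
   undefined behaviour in C. */
static int32_t mul_int_int(int32_t r0, int32_t r1)
{
    return (int32_t)((uint32_t)r0 * (uint32_t)r1);
}
```

The upper bits of the arguments to the narrow versions are deliberately
ignored, just as the shift pairs throw them away: mul_short_short(0xFFFE, 3)
treats 0xFFFE as -2 and gives -6.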