Path: utzoo!attcan!uunet!snorkelwacker!usc!zaphod.mps.ohio-state.edu!rpi!leah!bingvaxu!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Newsgroups: comp.arch
Subject: Re: int x int -> long for * (or is it 32x32->64)
Keywords: arithmetic,arbitrary precision,benchmark,modular arithmetic
Message-ID: <4002@bingvaxu.cc.binghamton.edu>
Date: 13 Sep 90 16:46:05 GMT
References: <3984@bingvaxu.cc.binghamton.edu> <41425@mips.mips.COM> <353@kaos.MATH.UCLA.EDU> <119977@linus.mitre.org>
Reply-To: kym@bingvaxu.cc.binghamton.edu.cc.binghamton.edu (R. Kym Horsell)
Organization: SUNY Binghamton, NY
Lines: 35

In article <119977@linus.mitre.org> bs@linus.mitre.org (Robert D. Silverman) writes:
\\\
>Even Peter reports a 5 fold speed increase. One difference between his
                      ^^^^^^
>code and mine that would tend to exaggerate the difference is that he
\\\

#               SUN 3/280            SUN 4/260        (both with FPUs)
#            gcc 1.37.1    cc     gcc 1.37.1    cc
# MUL1         10.40     13.52       3.67      2.92   (simple integer arithmetic)
# MUL2          9.00     10.38       2.03      2.05   (floating point arithmetic)
# MUL3         10.14     11.30       6.11      6.32   (break into 16-bit pieces)
# MUL4          1.88      1.98       1.82      1.83   (assembly code)

Ratio of
MUL3/MUL1:       .97       .83       1.7       2.2

There does not seem to be a 5-fold difference between using ``simple
integer arithmetic'' and ``break into 16-bit pieces'' in Peter's figures:
the largest MUL3/MUL1 ratio above is 2.2.

Of course the ratio between ``simple integer arithmetic'' and ``assembly
code'' is much higher -- but this would be comparing apples & oranges.

Since Peter has various assembly sources to hand, I would be interested
to see figures concerning:

	32x32->64 multiply
and
	32x32->32 via 16-bit pieces multiply

Another experiment, still in assembler, would be to get figures for the
same program using a non-naive multiply and divide, in which the lack of
32x32->64 would be even less marked.

-Kym Horsell
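
For readers unsure what ``break into 16-bit pieces'' involves, here is a rough C sketch of the two multiplies named above. It is an illustration only, not Peter's or Bob's benchmark code; the function names are made up for this example, and it assumes a compiler that provides the fixed-width types in <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical helper: 32x32->64 unsigned multiply built only from
   16x16->32 partial products. The 64-bit result is returned as two
   32-bit halves through *hi and *lo. */
static void mul32x32_64_pieces(uint32_t a, uint32_t b,
                               uint32_t *hi, uint32_t *lo)
{
    uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;   /* low and high halves of a */
    uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;   /* low and high halves of b */

    uint32_t p00 = a0 * b0;                    /* weight 2^0  */
    uint32_t p01 = a0 * b1;                    /* weight 2^16 */
    uint32_t p10 = a1 * b0;                    /* weight 2^16 */
    uint32_t p11 = a1 * b1;                    /* weight 2^32 */

    /* Middle column: at most 0xFFFE0001 + 0xFFFE + 0xFFFF, so the sum
       still fits in 32 bits and no carry is lost. */
    uint32_t mid = p10 + (p00 >> 16) + (p01 & 0xFFFFu);

    *lo = (mid << 16) | (p00 & 0xFFFFu);
    *hi = p11 + (mid >> 16) + (p01 >> 16);
}

/* Hypothetical helper: 32x32->32 (low word only) from 16-bit pieces.
   The a1*b1 term has weight 2^32 and drops out entirely. */
static uint32_t mul32x32_32_pieces(uint32_t a, uint32_t b)
{
    uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
    uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;

    /* Unsigned arithmetic wraps modulo 2^32, which is the result wanted. */
    return a0 * b0 + ((a0 * b1 + a1 * b0) << 16);
}
```

The full-width version needs four partial products plus carry handling, while the truncated one needs three and no carries. That is one reason the two cases are worth timing separately.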