Path: utzoo!attcan!uunet!snorkelwacker!usc!zaphod.mps.ohio-state.edu!rpi!leah!bingvaxu!kym
From: kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell)
Newsgroups: comp.arch
Subject: Re: int x int -> long for * (or is it 32x32->64)
Keywords: arithmetic,arbitrary precision,benchmark,modular arithmetic
Message-ID: <4002@bingvaxu.cc.binghamton.edu>
Date: 13 Sep 90 16:46:05 GMT
References: <3984@bingvaxu.cc.binghamton.edu> <41425@mips.mips.COM> <353@kaos.MATH.UCLA.EDU> <119977@linus.mitre.org>
Reply-To: kym@bingvaxu.cc.binghamton.edu.cc.binghamton.edu (R. Kym Horsell)
Organization: SUNY Binghamton, NY
Lines: 35

In article <119977@linus.mitre.org> bs@linus.mitre.org (Robert D. Silverman) writes:
\\\
>Even Peter reports a 5 fold speed increase. One difference between his
		      ^^^^^^
>code and mine that would tend to exaggerate the difference is that he
\\\

#	       SUN 3/280	     SUN 4/260	   (both with FPUs)
#      gcc 1.37.1     cc     gcc 1.37.1	    cc
# MUL1	    10.40  13.52           3.67	  2.92     (simple integer arithmetic)
# MUL2	     9.00  10.38           2.03   2.05	   (floating point arithmetic)
# MUL3	    10.14  11.30           6.11	  6.32     (break into 16-bit pieces)
# MUL4	     1.88   1.98 	   1.82	  1.83     (assembly code)

Ratio of MUL3/MUL1 (same column order as the table above):

	SUN 3/280 gcc	SUN 3/280 cc	SUN 4/260 gcc	SUN 4/260 cc
	     .97	     .83	     1.7	     2.2

There does not seem to be a 5-fold difference between using ``simple integer
arithmetic'' and ``break into 16-bit pieces'' in Peter's figures.

Of course the ratios between ``simple integer arithmetic'' and
``assembly code'' are much higher -- but this would be comparing apples & oranges.

Since Peter has various assembly sources to hand -- I would be
interested to see figures concerning:
	32x32->64 multiply
and
	32x32->32 via 16-bit pieces multiply
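
To make the two cases concrete, here is a rough C sketch of what I mean by
each. This is my own illustration, not Peter's code: the function names are
made up, and the 64-bit return type is used only to hand back the two 32-bit
halves in one value (the arithmetic itself never needs more than 32 bits).
The first routine builds the full 32x32->64 product from four 16x16->32
partial products; the second keeps only the low 32 bits and so gets away
with three.

```c
#include <stdint.h>

/* Hypothetical helper: full 32x32->64 product from 16-bit pieces.
 * All intermediate arithmetic fits in 32 bits; uint64_t is used only
 * to package the high and low words of the result. */
uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;   /* low and high halves of a */
    uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;   /* low and high halves of b */

    uint32_t p00 = a0 * b0;   /* weight 2^0  */
    uint32_t p01 = a0 * b1;   /* weight 2^16 */
    uint32_t p10 = a1 * b0;   /* weight 2^16 */
    uint32_t p11 = a1 * b1;   /* weight 2^32 */

    /* Sum of everything landing in bits 16..31; at most 0x2FFFC,
     * so it cannot overflow 32 bits. Its upper half is the carry
     * into the high word. */
    uint32_t mid = (p00 >> 16) + (p01 & 0xFFFFu) + (p10 & 0xFFFFu);

    uint32_t lo = (p00 & 0xFFFFu) | (mid << 16);
    uint32_t hi = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: low 32 bits only, from 16-bit pieces.
 * The a1*b1 term has weight 2^32 and drops out entirely, and
 * unsigned wraparound discards the unwanted high bits. */
uint32_t mul32x32_32(uint32_t a, uint32_t b)
{
    uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
    uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;

    return a0 * b0 + ((a0 * b1 + a1 * b0) << 16);
}
```

The extra multiply and the carry handling in the first routine are the cost
of the wider result; how much that matters in practice is exactly what the
timings would show.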

Another experiment, still in assembler, would be to get figures for
the same program using a non-naive multiply and divide, in which
the penalty for lacking 32x32->64 should be even less marked.

-Kym Horsell