Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: benchmark for evaluating extended precision
Summary: Use an appropriate test, using machine instructions
Keywords: extended precision,multiply,benchmark,arithmetic
Message-ID: <2550@l.cc.purdue.edu>
Date: 13 Sep 90 13:05:35 GMT
References: <3989@bingvaxu.cc.binghamton.edu> <1990Sep12.223253.9574@csc.ti.com>
Distribution: usa
Organization: Purdue University Statistics Department
Lines: 31

In article <1990Sep12.223253.9574@csc.ti.com>, bmk@csc.ti.com (Brian M Kennedy) writes:
> =>It has been claimed that a lack of 32x32->64 multiplication
> =>makes a factor of 10 difference in the running time of
> =>typical extended precision arithmetic routines. Although it
> =>obviously makes _a_ difference in run time I do not measure
> =>an order of magnitude difference.

			............................

>                                      Instead I will measure
> an upper-bound on the performance increase by comparing:
> 
>   64*64->64 via 32*32->32      vs.      32*32->32

		[Long description deleted.]

The original problem was 32x32 -> 64 compared to 32x32 -> 32.  For a
reasonable test, consider the general problem of NxN -> 2N
vs. NxN -> N.  Now to do this properly, one should remember that in the
machine with NxN -> N, N is the length available.  Thus, in adding two
N-bit numbers, one must use a test-for-carry to detect a bit in position
N (starting the count from 0).  Also, the comparison should not depend on
the peculiarities of a particular compiler, but should be done at the
machine-language level.  The code involved is not long.
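Here is a minimal C sketch of the carry point. It is an illustration, not the poster's code. It adds multi-word numbers on a machine whose words are N = 16 bits and which has no wider register to catch the carry. The function name add_words and the least-significant-word-first layout are assumptions. The carry out of each word comes from the usual test: an unsigned sum that wrapped is smaller than the operand it started from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a + b for n-word numbers stored least
   significant word first, using only 16-bit word arithmetic.
   Returns the carry out of the most significant word. */
static unsigned add_words(uint16_t *r, const uint16_t *a,
                          const uint16_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t s = (uint16_t)(a[i] + b[i]); /* wraps mod 2^16 */
        unsigned c1 = s < a[i];               /* test-for-carry: wrapped iff smaller */
        uint16_t t = (uint16_t)(s + carry);   /* add incoming carry */
        unsigned c2 = t < s;                  /* second carry test */
        r[i] = t;
        carry = c1 | c2;  /* if a+b wrapped, s <= 0xFFFE, so at most one is set */
    }
    return carry;
}
```

For example, adding {0xFFFF, 0x0001} to {0x0001, 0x0000} (that is, 0x1FFFF + 1) leaves {0x0000, 0x0002} in r and returns a final carry of 0.  The two comparisons per word are the extra work a machine with only NxN -> N must do in C.  On a machine with a carry flag, each comparison becomes a flag test or an add-with-carry in assembly, and C cannot express either directly.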

To carry out the benchmark, one could use N = 16 (or even 8) to get a
general idea.
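Here is a rough sketch of the N = 16 comparison, written in C for portability.  Using C is an assumption.  As the post says, a fair test should be hand-coded in machine language, and a C compiler is free to promote these 16-bit operations to wider ones.  mul_native stands in for a machine that has 16x16 -> 32.  mul_emulated uses only 16x16 -> 16: it splits each operand into 8-bit halves so that every partial product fits in one word, then combines the partial products with 16-bit adds and explicit carry tests.  All names, including the timing helper time_mul, are hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* NxN -> 2N with N = 16: the machine has a widening multiply. */
static uint32_t mul_native(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* NxN -> 2N built from NxN -> N only.  Each operand is split into
   8-bit halves so that all four partial products fit in 16 bits. */
static uint32_t mul_emulated(uint16_t a, uint16_t b)
{
    uint16_t al = a & 0xFF, ah = a >> 8;
    uint16_t bl = b & 0xFF, bh = b >> 8;

    uint16_t p0 = (uint16_t)(al * bl);   /* weight 2^0  */
    uint16_t p1 = (uint16_t)(al * bh);   /* weight 2^8  */
    uint16_t p2 = (uint16_t)(ah * bl);   /* weight 2^8  */
    uint16_t p3 = (uint16_t)(ah * bh);   /* weight 2^16 */

    uint16_t mid = (uint16_t)(p1 + p2);
    uint16_t c_mid = mid < p1;           /* carry out of mid has weight 2^24 */

    /* low 8 bits of mid go into the low word, shifted up by 8 */
    uint16_t lo = (uint16_t)(p0 + (uint16_t)(mid << 8));
    uint16_t c_lo = lo < p0;             /* carry into the high word */

    /* high 8 bits of mid, plus both carries, go into the high word */
    uint16_t hi = (uint16_t)(p3 + (mid >> 8) + (c_mid << 8) + c_lo);
    return ((uint32_t)hi << 16) | lo;
}

/* Time `reps` passes over all 2^16 values of a against b, b+1, ...
   The XOR accumulator is written to *sink so the work is not discarded. */
static double time_mul(uint32_t (*mul)(uint16_t, uint16_t),
                       uint16_t b, int reps, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        for (uint32_t a = 0; a <= 0xFFFF; a++)
            acc ^= mul((uint16_t)a, (uint16_t)(b + r));
    *sink = acc;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To get a crude ratio, call time_mul(mul_emulated, 12345, 100, &sink) and the same call on mul_native, then compare the two times.  The hardware 2N-bit product saves the four multiplies and the carry bookkeeping in mul_emulated.  Both versions are called through a function pointer, so they share the call overhead, but that overhead also dilutes the ratio.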
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)