Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!munnari.oz.au!goanna!ok From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) Newsgroups: comp.arch Subject: Re: benchmark for evaluating extended precision Keywords: extended precision,multiply,benchmark,arithmetic Message-ID: <3794@goanna.cs.rmit.oz.au> Date: 20 Sep 90 08:20:18 GMT References: <3989@bingvaxu.cc.binghamton.edu> <513@abccam.abcl.co.uk> <4037@bingvaxu.cc.binghamton.edu> Organization: Comp Sci, RMIT, Melbourne, Australia Lines: 45 In article <4037@bingvaxu.cc.binghamton.edu>, kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes: > The program attempts to perform one of the things that tend > to take time in XP calcs -- big multiplies -- and have adopted the naive Horsell's benchmark had two parts: part 1 calculated a bunch of factorials, then part 2 added and multiplied them. I noticed that the code to add two bignums was wrong ("cry" is one bit too short...), so concentrated on part 1. part 1 does a lot of bigit * bignum multiplies, except that it turns out to use very small bigits. That's why I looked at factorials, because that was part of Horsell's benchmark, and it was possible to use a technique for factorials which _is_ similar to bignum multiplication. > O'Keefe's figure of 4-5 times speedup when _using_ vs _not using_ > an _actual_ double-sized product is important to note. It turned out that we have GCC here. That makes a difference. New figures: Factorial test (NS32532, Encore Multimax 4.3BSD) version mult.size lang User t System Ratio Horsell "8x8 -> 16" cc 311.2 1.9 1.68 Horsell "16x16 -> 32" cc 228.8 2.6 1.23 My code "32x32 -> 64" as 52.6 0.5 1/3.52 Horsell "8x8->16" gcc 264.1 1.3 1.43 Horsell "16x16->32" gcc 185.1 0.8 1.00 My code "32x32->64" as 52.4 0.3 1/3.53 (.aligned) So compared with Horsell's "DOUBLE" version compiled by a rather better compiler, assembly code using 32x32->64 unsigned multiplication gets a 3.5ish speedup. (The .aligned note refers to the trick of aligning branch targets on word boundaries. On machines with variable size instructions that can often help. Here it didn't, not to speak of.) A figure of about 3.5 sounds right: going from 16-bit to 32-bit chunks we expect to do 1/4 the number of multiplications, and those multiplications may be a wee bit slower. It's worth noting that Horsell's representation for bignums is hairy. I just used pointer-to-(length, bigit 0, bigit 1, ...) which amongst other things let me use add-compare-and-branch. -- Heuer's Law: Any feature is a bug unless it can be turned off.