Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ll-xn!cit-vax!amdahl!nsc!curry
From: curry@nsc.UUCP (Ray Curry)
Newsgroups: net.arch
Subject: re:Floating point performance
Message-ID: <3833@nsc.UUCP>
Date: Wed, 8-Oct-86 16:53:37 EDT
Article-I.D.: nsc.3833
Posted: Wed Oct  8 16:53:37 1986
Date-Received: Thu, 9-Oct-86 03:33:33 EDT
Reply-To: curry@nsc.UUCP (Ray Curry)
Followup-To: <340@euroies.UUCP>
Distribution: net
Organization: National Semiconductor, Sunnyvale
Lines: 50

>Path: nsc!pyramid!decwrl!decvax!ucbvax!ucbcad!nike!lll-crg!seismo!mcvax!euroies!shepherd
>From: shepherd@euroies.UUCP (Roger Shepherd)
>Newsgroups: net.arch
>Subject: Floating point performance
>Message-ID: <340@euroies.UUCP>
 
>dislikes the NS 32310 (four chips); they seem to give the
>same MFLOP rating. (Does anyone have Whetstone figures for
>these two?)
 
>Comparisons against Weiteks (or whatever) are also somewhat
>suspect.  To use their peak data rate you have to use them in
>pipelined mode, their scalar mode tends to be somewhat slower
 
>-- 
>Roger Shepherd
>INMOS Limited, 1000 Aztec West, Almondsbury, Bristol, BS12 4SQ, GB
>USENET: ...!euroies!shepherd
>PHONE: +44 454 616616

Just by coincidence, I have been running some floating point benchmarks
on the NS32081 floating point processor and thought I should respond
with some more up-to-date numbers.  I ran the single precision Whetstone
on the NS32032 and NS32081 at 10 MHz on the DB32000 board, and on the NS32332
and NS32081 at 15 MHz on the DB332 board.  I don't know where the posted
32032-32081 number came from, but I measure better even using our older
compilers.  Our new compilers show marked improvement.

	32032-32081 (10MHz)		189 KWhets (old compiler)
	32032-32081 (10MHz)		390 KWhets (new compiler)
	32332-32081 (15MHz)		728 KWhets (new compiler)
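
For anyone who wants the ratios rather than the raw figures, here is a
minimal sketch that works them out from the three numbers above.  The
variable names are mine and purely illustrative; the per-clock figure
assumes performance scales linearly with clock rate, which is only a rough
approximation.

```python
# Whetstone figures quoted above, in KWhets (single precision).
OLD_COMPILER_032 = 189   # 32032-32081 at 10 MHz, old compiler
NEW_COMPILER_032 = 390   # 32032-32081 at 10 MHz, new compiler
NEW_COMPILER_332 = 728   # 32332-32081 at 15 MHz, new compiler

# Gain from the compiler alone (same hardware, same clock).
compiler_gain = NEW_COMPILER_032 / OLD_COMPILER_032      # about 2.06

# Gain from moving to the 32332 at the higher clock (same compiler).
cpu_gain = NEW_COMPILER_332 / NEW_COMPILER_032           # about 1.87

# Assumption: linear scaling with clock, so divide out the 15/10 ratio
# to estimate what the 32332 contributes per clock.
clock_ratio = 15 / 10
per_clock_gain = cpu_gain / clock_ratio                  # about 1.24

print(f"compiler gain:        {compiler_gain:.2f}x")
print(f"32332 @ 15MHz gain:   {cpu_gain:.2f}x")
print(f"per-clock gain (est): {per_clock_gain:.2f}x")
```

On these figures the compiler change alone roughly doubles the result,
which is larger than the gain from the faster CPU and clock combined.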

I used the 32332-32081 numbers to generate instruction counts and project
worst-case performance for the NS32310 and the NS32381.  Worst case here
means using the identical math routines and minimizing the pipelining of
the 32310.  The projections put the 32332-32381 (15MHz) at approximately
1100-1200 KWhets and the 32332-32310 (15MHz) at 1500-1600 KWhets.
Since both the 32310 and 32381 will have new instructions that will
affect the math libraries, the real performance could be higher.
Just for interest, preliminary analysis says pipelining should
improve performance by at least 15% overall (30% for the floating point
portion of the instruction mix).
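
The relation between the 30% and 15% figures can be sketched with the
usual Amdahl's-law arithmetic.  This is only an illustration, and it rests
on an assumption not stated above: that "30%" means the floating point
portion runs 1.30 times faster and "15%" means the whole benchmark runs
1.15 times faster.  Other readings of the percentages give a somewhat
different fraction.  The function names are hypothetical.

```python
def overall_speedup(fp_fraction, fp_speedup):
    """Amdahl's law: speedup of the whole run when only the floating
    point fraction of the run time gets faster by fp_speedup."""
    return 1.0 / ((1.0 - fp_fraction) + fp_fraction / fp_speedup)

def implied_fp_fraction(overall, fp_speedup):
    """Invert Amdahl's law: fraction of run time that must be floating
    point for fp_speedup to yield the given overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / fp_speedup)

# Assumed reading of the figures: 1.30x on floating point, 1.15x overall.
fp_fraction = implied_fp_fraction(1.15, 1.30)            # about 0.565
print(f"implied floating point share of run time: {fp_fraction:.3f}")

# Projected ranges from the text, relative to the measured 728 KWhets.
BASELINE = 728
projected = {
    "32332-32381": (1100, 1200),
    "32332-32310": (1500, 1600),
}
ratios = {name: (lo / BASELINE, hi / BASELINE)
          for name, (lo, hi) in projected.items()}
for name, (lo, hi) in ratios.items():
    print(f"{name}: {lo:.2f}x to {hi:.2f}x over the 32332-32081")
```

Under that reading, a bit more than half of the Whetstone run time would
be spent in floating point work that benefits from pipelining.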

I would like to add my own question about the value of benchmarks:
how do people on the net feel about transcendental functions?
The Whetstone seems to me to place more emphasis on them than real-life
code does.  One of the reasons for not including them directly in the
32081 was that implementing them in math routines instead of hardware
was felt to be more cost effective.  Is this true, or are transcendentals
important enough to justify the increased cost of implementing them in
hardware?
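
To make the "math routine" option concrete, here is a minimal sketch of
a sine computed from nothing but add, subtract, multiply and divide plus a
range reduction step.  It is not the routine from any National
Semiconductor library; it uses a plain Taylor polynomial for simplicity,
where a production library would more likely use a tuned minimax
polynomial.  The name soft_sin is hypothetical.

```python
import math

def soft_sin(x):
    """Approximate sin(x) with basic arithmetic only (illustrative)."""
    # Range reduction, step 1: subtract the nearest multiple of 2*pi so
    # that r lies in [-pi, pi].  math.floor stands in for a
    # float-to-integer conversion instruction.
    two_pi = 2.0 * math.pi
    k = math.floor(x / two_pi + 0.5)
    r = x - k * two_pi

    # Range reduction, step 2: fold into [-pi/2, pi/2] using
    # sin(pi - r) = sin(r) and sin(-pi - r) = sin(r).
    half_pi = math.pi / 2.0
    if r > half_pi:
        r = math.pi - r
    elif r < -half_pi:
        r = -math.pi - r

    # Taylor polynomial through the r**11 term, in Horner form:
    # 7 multiplies and 5 adds once r is known.
    r2 = r * r
    return r * (1.0 + r2 * (-1.0 / 6.0
              + r2 * (1.0 / 120.0
              + r2 * (-1.0 / 5040.0
              + r2 * (1.0 / 362880.0
              + r2 * (-1.0 / 39916800.0))))))

# Compare against the library sine at a few moderate arguments.
for x in (0.0, 0.5, 1.0, 2.0, 3.0, -4.0, 10.0):
    print(f"x={x:6.2f}  soft={soft_sin(x): .7f}  libm={math.sin(x): .7f}")
```

The point of the sketch is only that, after range reduction, the work is
about a dozen multiplies and adds, which is the kind of sequence a math
library can build from the basic floating point instructions.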