Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!cbmvax!daveh
From: daveh@cbmvax.UUCP (Dave Haynie)
Newsgroups: comp.sys.amiga
Subject: Re: RISC coprocessor for Amiga?
Message-ID: <7174@cbmvax.UUCP>
Date: 29 Jun 89 15:48:39 GMT
References: <26165@amdcad.AMD.COM>
Organization: Commodore Technology, West Chester, PA
Lines: 54

in article <26165@amdcad.AMD.COM>, tim@crackle.amd.com (Tim Olson) says:

> In article <18689@louie.udel.EDU> 451061@uottawa.bitnet writes:
> | When comparing Drystones, indeed RISC technology seems fantastic, but the real
> | world out there, including the simulated ray-traced world talks floating point.
> | And on floating point benchmarks, the RISC and CISC architectures are on par.

> What data do you have to back up your claim?  A look at the MIPS
> Performance Brief, issue 3.6, shows RISC processors consistently
> outperforming CISC processors on both integer and FP applications,

I think a lot of it has more to do with how the various types of CPUs are doing
floating point than with whether you're strictly a RISC or CISC CPU.  The
current bottom line, I suspect, is just what Tim is claiming: that RISC is
currently outperforming CISC at floating point.  But the reasons why make things
a little more interesting.

For instance, just about every current CISC device uses an external math
coprocessor, and most of these are relatively low performance devices, like
68882 or 80387 chips.
But they hook into the CPU with a bare minimum of support hardware, they extend
the CPU instruction set in a standard way, and they're relatively cheap.

RISC's taken two different approaches.  Devices like Moto's 88100 and even the
semi-RISCy Transputer have small but very fast on-chip floating point units
(the 88k actually has two, one for add/subtract, one for multiplies).  This is
kind of a RISCy answer to floating point -- you make the basic floating point
ops so fast, any more complex operations can be coded in software and still go
much faster than their microcoded counterpart external coprocessors.  The other
approach is to hook up a simple but fast FPU externally; SPARC machines do this.
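
To make the "code it in software" idea concrete, here's a minimal sketch of a
reciprocal built from nothing but multiplies and subtracts (Newton-Raphson),
with a divide layered on top.  This is an illustration only, not how any
particular chip's math library does it; the names soft_recip and soft_div are
made up for the example, and it assumes a finite, nonzero operand.

```c
#include <assert.h>
#include <math.h>   /* frexp, ldexp, fabs: used only for scaling and sign */

/* Approximate 1/d by Newton-Raphson.  The loop body is two multiplies and
 * one subtract; no hardware divide is used.  Assumes d is finite, nonzero. */
double soft_recip(double d)
{
    int e, i;
    double m, r;

    assert(d != 0.0);

    /* |d| = m * 2^e with 0.5 <= m < 1, so the first guess is always close. */
    m = frexp(fabs(d), &e);

    /* Linear first guess for 1/m on [0.5, 1): 48/17 - (32/17)*m. */
    r = 2.8235294117647058 - 1.8823529411764706 * m;

    /* Each pass roughly squares the relative error (it starts at <= 1/17). */
    for (i = 0; i < 4; i++)
        r = r * (2.0 - m * r);

    /* Undo the scaling and restore the sign. */
    r = ldexp(r, -e);
    return d < 0.0 ? -r : r;
}

/* Divide as multiply-by-reciprocal (hypothetical helper). */
double soft_div(double a, double b)
{
    return a * soft_recip(b);
}
```

The whole thing costs a handful of multiplies and subtracts, so the faster
those basic ops get, the cheaper the "complex" operation built on them gets.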

Both of these techniques are as applicable to CISC as RISC, and they're both
getting implemented as we speak.  Weitek math chips can be hooked up to '030s
or '386s just as easily as to any RISC machine, with similar performance
increases (at a loss of the standard instruction set and register-extension
model you get with the tightly-coupled coprocessors, though at least in the
'386 world one Weitek chip is emerging as a second FPU standard).  And the
next generation CPUs in both CISC families are bringing a small, fast FPU
on-chip, one that works pretty much the same way the RISCy FPUs work.  The '040
is apparently going to do a floating point add in 3 clocks and a floating point
multiply in 5.

As Tim pointed out, there's lots more to a "FPU-bound" program than the actual
speed of the FPU instructions, though I think pretty much from this year on,
the integer speed differences will be more responsible for floating point
differences between CISC and RISC than the true raw floating point speed.  At
least maybe until all the RISC parts start vectorizing like some fool Cray or
something :-)
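
A back-of-the-envelope sketch of that point, assuming a loop that does one
multiply and one add per pass (a dot product step) and assuming nothing
overlaps.  The 3 and 5 clock figures are the '040 numbers above; the per-pass
overhead for loads, indexing and the branch is a made-up parameter, as are the
names dot_cost and fp_share.

```c
#include <assert.h>

/* Clock counts for a loop, split into FP work and everything else. */
struct loop_cost {
    unsigned long fp_clocks;
    unsigned long other_clocks;
};

/* n passes of "one multiply, one add", plus overhead_clocks per pass for
 * loads, indexing and the branch.  Assumes nothing overlaps, which a real
 * pipeline will not honor. */
struct loop_cost dot_cost(unsigned long n, unsigned fadd_clocks,
                          unsigned fmul_clocks, unsigned overhead_clocks)
{
    struct loop_cost c;
    c.fp_clocks = n * (fadd_clocks + fmul_clocks);
    c.other_clocks = n * overhead_clocks;
    return c;
}

/* Fraction of total clocks spent in FP instructions. */
double fp_share(struct loop_cost c)
{
    unsigned long total = c.fp_clocks + c.other_clocks;
    assert(total > 0);
    return (double)c.fp_clocks / (double)total;
}
```

With a 3 clock add, a 5 clock multiply and a hypothetical 12 clocks of
overhead per pass, the FPU accounts for 8 of every 20 clocks.  Under those
assumptions, shaving the integer side moves the total more than shaving the
FPU does.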

> 	-- Tim Olson
> 	Advanced Micro Devices
-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
           Be careful what you wish for -- you just might get it