Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: RISC as a "technology window"?
Summary: Integer arithmetic is similar to fp
Message-ID: <1188@l.cc.purdue.edu>
Date: 25 Mar 89 13:09:23 GMT
References: <1552@vicom.COM> <15690@cup.portal.com> <1562@vicom.COM> <717@m3.mfci.UUCP>
Organization: Purdue University Statistics Department
Lines: 65

In article <717@m3.mfci.UUCP>, rodman@mfci.UUCP (Paul Rodman) writes:
> In article <22974@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> >
> >So, my question is:  If you ASSUME that you have to have high speed arithmetic,
> >what is the best way to partition functions between chips?  I believe that the
> >best way is Control, ALU/FPU, and instruction cache on one chip, and data
> >cache/MMU on another chip.  Why doesn't the market agree with me?
> >
> 
> Personally, I think the optimal partitioning for large f.p. problems would
> be to split the f.p. unit and registers onto another chip. The amount of
> comms required between the integer domain and floating domain is very
> small and extra cycles to go from one to the other aren't a problem (speaking
> from our experience with partitioning the CPU in just this way).
> 
> I haven't thought about how to solve the problems in splitting integer data
> caches and floating data caches, but I'm sure there would be an acceptable
> solution, assuming your compiler guys are up to it. :-)
> 
> The main advantages here are:
> 
>       - You can get more pins for the f.p. chip, and so more loads/stores per
>         clock on the f-unit. Also you can get more than 16 d.p. registers
>         (16 isn't enough, in our experience, for two pipelined f.u.'s).
> 
>       - The i-chip, which makes no use of the f-unit hardware, has more area
>         for integer goodies, including a larger on-chip data cache for
>         integer data. I would rather have the MMU on this chip to make sure
>         that the memory pipeline for explicit loads is one cycle shorter,
>         i.e. save a chip crossing here.
> 
> Now the guys that don't use floating point can just buy the i-chip; those
> that want screaming f.p. performance buy both.
> 
> I just don't see the point in doing hairy-chested cramming of f.p. hardware
> on the same chip as the integer stuff, when the two functional units
> are so nicely separable, to the benefit of each.

I can see the point of having separate address arithmetic and low-precision
multiplication for address purposes.  But restricting the term "integer
arithmetic" to that is destructive of computing power.

I am not arguing one way or the other on partitioning functions among chips.
I suspect it is a good idea, but this is not the point.  A floating point
operation consists of separating the exponents from the mantissas, differencing
the exponents and shifting for addition and subtraction, performing the
fixed point operation, and performing the necessary shifting and exponent
calculation.  The cost is greatest for multiplication and division, where
the similarities between fixed and floating point are greatest.  Indeed,
many architectures with a floating point accelerator do integer multiplication
in that unit.
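To make the decomposition concrete, here is a minimal sketch of a toy binary format (not IEEE: no rounding, no overflow, no special values). It assumes a value is held as an integer significand m and an exponent e meaning m * 2**e, with a hypothetical significand width P of 24 bits. Addition differences the exponents, aligns, does a plain integer add, and renormalizes; multiplication is an integer multiply plus an exponent add. Hardware usually shifts the smaller operand right and keeps guard bits; this sketch instead shifts the larger one left so the integer add is exact before truncation.

```python
from fractions import Fraction

P = 24  # hypothetical significand width in bits for this toy format

def normalize(m, e):
    """Rescale integer significand m so |m| has exactly P bits, adjusting exponent e.
    Extra low-order bits are truncated toward zero (no rounding)."""
    if m == 0:
        return 0, 0
    sign = -1 if m < 0 else 1
    m = abs(m)
    excess = m.bit_length() - P
    if excess > 0:
        m >>= excess   # too wide: drop low-order bits
    else:
        m <<= -excess  # too narrow: left-justify
    return sign * m, e + excess

def fp_add(a, b):
    """Add two (m, e) pairs using only integer operations."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                      # make a the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    shift = ea - eb                  # difference the exponents
    aligned = ma << shift            # align a to b's exponent (exact in Python ints)
    return normalize(aligned + mb, eb)  # fixed point add, then renormalize

def fp_mul(a, b):
    """Multiply two (m, e) pairs: integer multiply of significands, add exponents."""
    (ma, ea), (mb, eb) = a, b
    return normalize(ma * mb, ea + eb)

def to_fraction(x):
    """Exact value m * 2**e as a Fraction, for checking results."""
    m, e = x
    return m * Fraction(2) ** e
```

Note that the expensive part of fp_mul is the integer product ma * mb; the exponent work is a single small addition.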

But suppose you want high precision arithmetic, integer, fixed point, or
floating point?  You now want a good integer arithmetic machine; if floating
point arithmetic must be used, integer arithmetic must be emulated in it,
which is quite clumsy.  The computational equipment for high precision
multiplication and division is largely the same for integer, fixed point,
and floating point.  For high-precision addition and subtraction, the overlap
is still great.  An architecture, language, or programmer not capable of
taking advantage of this must be considered limited.
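A sketch of that shared core, under stated assumptions: a hypothetical machine whose only multiplier is 16 x 16 -> 32 bits, non-negative operands, and numbers stored as little-endian lists of 16-bit limbs. The one schoolbook routine mp_mul serves integer, fixed point (operands scaled by 2**frac_bits, result truncated), and floating point (significand product plus exponent sum) multiplication; only the bookkeeping around it differs. The helper names here are illustrative, not from any library.

```python
LIMB_BITS = 16              # hypothetical word size of the integer multiplier
BASE = 1 << LIMB_BITS
MASK = BASE - 1

def to_limbs(n):
    """Split a non-negative integer into little-endian LIMB_BITS-bit limbs."""
    limbs = []
    while True:
        limbs.append(n & MASK)
        n >>= LIMB_BITS
        if n == 0:
            return limbs

def from_limbs(limbs):
    """Reassemble little-endian limbs into an integer."""
    n = 0
    for limb in reversed(limbs):
        n = (n << LIMB_BITS) | limb
    return n

def mp_mul(a, b):
    """Schoolbook multiply; each step is one word x word -> double-word product."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = ai * bj + out[i + j] + carry   # at most BASE**2 - 1: fits two words
            out[i + j] = t & MASK
            carry = t >> LIMB_BITS
        out[i + len(b)] = carry
    return out

def int_mul(x, y):
    """High-precision integer multiply built on the word-sized core."""
    return from_limbs(mp_mul(to_limbs(x), to_limbs(y)))

def fixed_mul(x, y, frac_bits):
    """Fixed point: same double-length product, then drop frac_bits fraction bits."""
    return int_mul(x, y) >> frac_bits

def float_mul(a, b):
    """Floating point as (significand, exponent): same core plus an exponent add."""
    (ma, ea), (mb, eb) = a, b
    return int_mul(ma, mb), ea + eb
```

A real library would normalize and round the floating result and handle signs; those steps are small next to the limb products, which is the point of the paragraph above.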
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)