Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!elroy!orion.cf.uci.edu!uci-ics!ucla-cs!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: RISC as a "technology window"?
Message-ID: <22202@shemp.CS.UCLA.EDU>
Date: 24 Mar 89 17:15:59 GMT
References: <1552@vicom.COM> <15690@cup.portal.com> <1562@vicom.COM> <15702@clover.ICO.ISC.COM> <27681@apple.Apple.COM> <15695@winchester.mips.COM> <22974@ames.arc.nasa.gov> <51@microsoft.UUCP>
Sender: news@CS.UCLA.EDU
Reply-To: marc@cs.ucla.edu (Marc Tremblay)
Organization: UCLA Computer Science Department
Lines: 59

In article <51@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes:
>lamaster@ames.arc.nasa.gov (Hugh LaMaster) wrote:
>> So, my question is:  If you ASSUME that you have to have high speed
>> arithmetic, what is the best way to partition functions between chips?
>> I believe that the best way is Control, ALU/FPU, and instruction cache
>> on one chip, and data cache/MMU on another chip.  Why doesn't the market
>> agree with me?

I also believe that putting the Integer unit and the FPU on the same 
chip makes sense. These two units have to communicate quickly, possibly 
sharing registers, and the FPU depends on the core section for its 
flow of instructions. I think that the trend is toward putting them
on the same chip anyway. Floating-point coprocessors were very
detached from the processor when they first came out (although
surprisingly enough the 8087 was a little closer), especially when 
you consider that just setting up FPU instructions could take around
10 cycles! The MIPS approach, i.e. making the coprocessor (R3010)
closely coupled, is a huge improvement, especially regarding the 
instruction-issuing overhead. 
The new trend? Since the FPU needs the core unit, put it on-chip 
(both the Motorola 88000 and the Intel i860 have the FPU on-chip).
Since you *currently* have to go off-chip to access reasonably large
caches, you might as well put the MMU with the caches.
Hugh LaMaster's partitioning above may introduce problems for
accessing the instruction cache though, especially if it is physically
addressed, since each fetch would then need a translation from the
off-chip MMU.

>Well, given that latency to memory is a serious problem these days, and
>that MMU address translation is often on the critical path, moving
>it off-chip doesn't sound like such a good idea.

My reasoning is:
	access to reasonable cache -> need to go off-chip
	MMU is used to access cache -> need to go off-chip
	since you need to go off-chip anyway -> put MMU off-chip

	floating-point computations -> can be done internally
	FPU *needs* the integer unit -> put it close to the processor
	close to the processor -> at least closely coupled, better on-chip.

>I've said it before: I'm *astounded* nobody else has used this idea.
>It's such a great Win.  Cache control is the custom bit, so do it
>in custom logic.  With all the rest of the custom logic: on the
>microprocessor.  Cache RAM is very generic.  So don't re-invent the
>wheel.

FPU is also quite custom! :-)  --> put it on the same chip!

>Has anyone out there (other than MIPS, of course) considered this scheme
>and then rejected it?  Is my enthusiasm blind to some Great Problem?

I think that one of the reasons why some companies have rejected it
is that the size of a chip with integer unit + FPU is HUGE. The R3010,
a great FPU coprocessor, with all its custom logic and its 75,000
transistors, is quite large (about 8.4 x 8.8 mm), especially when you
compare it to an MMU. It is easier (in terms of area) to put an MMU
on-chip than an FPU on-chip, at least for a good FPU!
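
To put a rough number on "quite large", here is a quick sketch of the
arithmetic using only the two R3010 figures quoted above. The function
names are made up for illustration, and since I have no comparable
figures for an MMU at hand, no comparison is attempted.

```python
# R3010 figures as quoted in this post (approximate).
WIDTH_MM = 8.4
HEIGHT_MM = 8.8
TRANSISTORS = 75_000


def die_area_mm2(width_mm, height_mm):
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


def transistors_per_mm2(transistors, area_mm2):
    """Average transistor density over the whole die."""
    return transistors / area_mm2


area = die_area_mm2(WIDTH_MM, HEIGHT_MM)
density = transistors_per_mm2(TRANSISTORS, area)
print(f"area: {area:.1f} mm^2, density: {density:.0f} transistors/mm^2")
```

That works out to roughly 74 mm^2 for the FPU alone, before any
integer unit is added to the same die.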
					Marc Tremblay
					marc@CS.UCLA.EDU
					Computer Science Department, UCLA