Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!shukra!ram
From: ram@shukra.Sun.COM (Renu Raman)
Newsgroups: comp.arch
Subject: Re: Pipelined FP add
Message-ID: <129303@sun.Eng.Sun.COM>
Date: 15 Dec 89 19:57:33 GMT
References: <241@dg.dg.com> <33570@hal.mips.COM> <3740@brazos.Rice.edu> <38132@ames.arc.nasa.gov> <33623@mips.mips.COM>
Sender: news@sun.Eng.Sun.COM
Reply-To: ram@sun.UUCP (Renu Raman)
Organization: Sun Microsystems, Mountain View
Lines: 22

In article <33623@mips.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>In article <38132@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> >(I hope nobody still uses the IBM 360 setup, where you have to push
> >the data to *memory* to get it back and forth between integer and fp
> > units...)
>
>I believe SPARC does this, because there aren't FPR <-> GPR move instructions.
>However, "*memory*" is just a few cycles away (3 for the store, 2 more for
>the load) thanks to cache.

Minor addendum: what mark@mips said is for the Fujitsu/LSI/Cypress
parts. You can at best crunch it down to 3 cycles (2 cycles for the
store and 1 cycle for the load; for doubles, it would be 4) if you can
design a good cache system using the BIT ECL parts (which is left as an
exercise to the reader :-)). So, it is very implementation dependent.
The best case, of course, is 2 cycles (if you can do single-cycle
stores; you can do single-cycle load doubles to FP using the BIT parts).

renu raman
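
To make the arithmetic concrete, here is a rough sketch that just adds up the store and load cycle counts quoted in this thread for moving a value between the integer and FP register files through memory. The dictionary labels and the function name are made up for illustration, and only the single-word cases are modelled, since the thread gives just the total (4 cycles) for doubles and not how it splits between store and load.

```python
# (store_cycles, load_cycles) as quoted in the thread.
# Labels are illustrative only, not official part names.
CASES = {
    "Fujitsu/LSI/Cypress parts": (3, 2),
    "BIT ECL parts, good cache": (2, 1),
    "best case, single-cycle store": (1, 1),
}

def transfer_cycles(store_cycles, load_cycles):
    """Cycles for a register-to-register move done as store + load."""
    return store_cycles + load_cycles

totals = {name: transfer_cycles(*counts) for name, counts in CASES.items()}

for name, total in totals.items():
    print(f"{name}: {total} cycles")
```

This assumes the load issues right after the store with no extra stall in between, which is itself implementation dependent.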