Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!sun-barr!newstop!sun!shukra!ram
From: ram@shukra.Sun.COM (Renu Raman)
Newsgroups: comp.arch
Subject: Re: Pipelined FP add
Message-ID: <129303@sun.Eng.Sun.COM>
Date: 15 Dec 89 19:57:33 GMT
References: <241@dg.dg.com> <33570@hal.mips.COM> <3740@brazos.Rice.edu> <38132@ames.arc.nasa.gov> <33623@mips.mips.COM>
Sender: news@sun.Eng.Sun.COM
Reply-To: ram@sun.UUCP (Renu Raman)
Organization: Sun Microsystems, Mountain View
Lines: 22

In article <33623@mips.mips.COM> mark@mips.COM (Mark G. Johnson) writes:
>In article <38132@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>  >(I hope nobody still uses the IBM 360 setup, where you have to push
>  >the data to *memory* to get it back and forth between integer and fp
>  > units...)
>
>I believe SPARC does this, because there aren't FPR <-> GPR move instructions.
>However, "*memory*" is just a few cycles away (3 for the store, 2 more for
>the load) thanks to cache.

  minor addendum:

  The numbers mark@mips gave apply to the Fujitsu/LSI/Cypress parts.

  You can at best crunch it down to 3 cycles (2 cycles for the store and
  1 cycle for the load; for doubles, it would be 4) if you can design a
  good cache system using the BIT ECL parts (which is left as an exercise
  for the reader :-)).  So it is very implementation dependent.  The best
  case, of course, is 2 cycles, if you can do single-cycle stores (you can
  already do single-cycle load doubles to FP using the BIT parts).
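
  For anyone who wants the arithmetic spelled out, here is a rough tally
  of the cycle counts quoted in this thread.  It is only a sketch: the
  dictionary keys and the function name are made up for illustration, and
  it assumes the store and the load simply run back to back with no
  overlap or stalls.  The 4-cycle figure for doubles is left out because
  the store/load split for that case was not given.

```python
# Cycle counts quoted in this thread for moving a value between the
# integer and FP register files through the cache (one store, one load).
# The keys are illustrative labels, not part numbers.
ROUND_TRIP = {
    "fujitsu_lsi_cypress": {"store": 3, "load": 2},
    "bit_ecl": {"store": 2, "load": 1},
    "single_cycle_store": {"store": 1, "load": 1},
}


def transfer_cycles(impl):
    """Total cycles for one store followed by one load (assumed back to back)."""
    cost = ROUND_TRIP[impl]
    return cost["store"] + cost["load"]


if __name__ == "__main__":
    for name in ROUND_TRIP:
        print(name, transfer_cycles(name))
```

  The spread from 5 cycles down to 2 is the whole point: the cost of the
  trip through memory is set by the cache design, not by the instruction
  set.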

  renu raman