Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!sdd.hp.com!elroy.jpl.nasa.gov!ucla-cs!oahu.cs.ucla.edu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: i860 registers/chip in general
Message-ID: <36150@shemp.CS.UCLA.EDU>
Date: 11 Jun 90 19:51:55 GMT
References: <495@tau.megatek.uucp>
Sender: news@CS.UCLA.EDU
Organization: UCLA Computer Science Department
Lines: 50

In article <495@tau.megatek.uucp> rstewart@megatek.UUCP (Rich Stewart) writes:
>Does anyone out there *know* why intel insisted in creating a distinction
>between floating point and integer registers? Are they planning to
>change this in the ix60 model?? Sure could write faster software if they
>did.

As mentioned in another article, compilers can use that feature to
improve performance. Other reasons, just as important, are related to
the logic and the layout of the chip.

First of all, in order to have two instructions execute in one cycle
(one "core" and one floating-point instruction in dual-instruction
mode), the architecture must allow parallelism between operations using
the core register file and the fpu register file. This could be done by
providing one large register file with many ports and extra logic to
avoid read/write conflicts between the two units. Notice that the fpu
register file already has 5 ports and the core unit has a 3-port
register file (some of these ports are time-multiplexed). Combining
these two register files could lead to extra capacitance on the buses,
slower register cells and possibly a longer clock cycle. Separation of
the two units allows faster separate execution.

The register files of the i860 are disjoint and can thus be tailored
more closely to their own units. The fixed-point unit has a 32-bit
datapath, so its register file needs to be only 32 bits wide, with two
buses running over it to provide the two operands.
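
To put rough numbers on the port argument above, here is a minimal
sketch. The port counts (3 and 5) are the ones quoted in this post. The
cost model is only an assumption, a common rule of thumb and not i860
data: each extra port adds roughly one word line and one bit line to a
register cell, so cell area grows roughly with the square of the port
count. The function names are made up for illustration.

```python
# Rule-of-thumb model (an assumption, not measured i860 data):
# cell area grows roughly with the square of the number of ports.

CORE_PORTS = 3  # core register file ports, as quoted in the post
FPU_PORTS = 5   # fpu register file ports, as quoted in the post


def relative_cell_area(ports: int) -> int:
    """Area of one register cell in arbitrary units, assuming area ~ ports**2."""
    return ports * ports


def merged_ports(core_ports: int = CORE_PORTS, fpu_ports: int = FPU_PORTS) -> int:
    """Ports a single shared file needs if both units keep full access each cycle."""
    return core_ports + fpu_ports


if __name__ == "__main__":
    merged = relative_cell_area(merged_ports())
    for name, ports in (("core", CORE_PORTS), ("fpu", FPU_PORTS)):
        separate = relative_cell_area(ports)
        print(f"{name}: {separate} units per cell separate, "
              f"{merged} merged ({merged / separate:.1f}x)")
```

Under these assumptions a merged 8-port cell comes out about 7 times
the size of a 3-port core cell and about 2.6 times the size of a 5-port
fpu cell. Since some of the real ports are time-multiplexed, the actual
ratios are probably smaller, but the trend is the point: larger cells
mean longer, more heavily loaded buses.
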
The floating-point unit, unlike the 32-bit core, has a 64-bit datapath
going through the adder and multiplier. The registers for the fpu seem
to be organized as a stack of 64-bit registers so that two operands can
be routed to the adder/multiplier in one cycle and match the width of
the datapath. To allow floating-point loads of up to 128 bits to be done
in a single cycle, the datapath was made wider than the core's 32-bit
datapath. Notice that it is not clear, when loading 128 bits, whether
the words are demultiplexed onto two buses to load two rows of registers
(each row containing 64 bits) or whether the register file is organized
as a stack of 128-bit registers which is in turn multiplexed onto the
64-bit datapath.

Finally, the 8-Kbyte (2-way) data cache also has a 128-bit internal path
which fits very well (on the chip) with the fpu, four data cache cells
having the same pitch as one fpu cell. This interaction between the
floating-point register file and the data cache allows the cache to be
used as "vector registers", supplying the bandwidth needed for some
matrix operations.

In a word, YES, it makes a lot of sense, from both the architectural and
the VLSI points of view, to separate the register files.

		Marc Tremblay
		internet: marc@CS.UCLA.EDU
		UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc