Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!sdd.hp.com!elroy.jpl.nasa.gov!ucla-cs!oahu.cs.ucla.edu!marc
From: marc@oahu.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: i860 registers/chip in general
Message-ID: <36150@shemp.CS.UCLA.EDU>
Date: 11 Jun 90 19:51:55 GMT
References: <495@tau.megatek.uucp>
Sender: news@CS.UCLA.EDU
Organization: UCLA Computer Science Department
Lines: 50

In article <495@tau.megatek.uucp> rstewart@megatek.UUCP (Rich Stewart) writes:
>Does anyone out there *know* why intel insisted in creating a distinction
>between floating point and integer registers? Are they planning to
>change this in the ix60 model?? Sure could write faster software if they
>did.

As mentioned in another article, compilers can use that feature to improve
performance. Other reasons, just as important, are related to the logic and
the layout of the chip.

First of all, in order to have two instructions execute in one cycle
(one "core" and one floating-point instruction in the dual-instruction mode),
the architecture must allow parallelism between operations using the core
register file and the fpu register file.
This could be done by providing one large register file with many ports and
extra logic to avoid read/write conflicts between the two units.
Notice that the fpu register file already has 5 ports and the core unit
has a 3-port register file (some of these ports are time-multiplexed).
Combining these two register files could lead to extra capacitance on the
buses, slower register cells and possibly a longer clock cycle. Separation
of the two units allows faster separate execution.
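
A back-of-the-envelope sketch of the port argument follows. It assumes the
common rule of thumb that the area of a multiported register cell grows roughly
with the square of its port count (each port adds a word line and a bit line).
That rule and the function names are assumptions made for illustration, not
figures from Intel; only the port counts 5 and 3 come from the text above.

```python
# Rough sketch, not i860 data: assumes register-cell area scales with
# ports**2 (one extra word line and one extra bit line per port).

FPU_PORTS = 5   # port count quoted in the post
CORE_PORTS = 3  # port count quoted in the post


def relative_cell_area(ports: int) -> int:
    """Relative area of one register cell under the ports**2 assumption."""
    return ports * ports


def compare_split_vs_unified(fpu_ports: int = FPU_PORTS,
                             core_ports: int = CORE_PORTS) -> dict:
    """Compare per-cell cost of two separate files against one merged file."""
    # Worst case for a merged file: every port of both units lands on
    # the same cell, with no sharing or time-multiplexing.
    unified_ports = fpu_ports + core_ports
    return {
        "fpu_cell": relative_cell_area(fpu_ports),
        "core_cell": relative_cell_area(core_ports),
        "unified_cell": relative_cell_area(unified_ports),
    }


if __name__ == "__main__":
    print(compare_split_vs_unified())
```

Under this assumption a merged 8-port cell is several times larger than either
separate cell, which is one way to picture the extra bus capacitance and slower
cells. Time-multiplexing ports would reduce the gap.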

The register files of the i860 are disjoint, so each can be tailored to its
own unit. The fixed-point unit has a 32-bit datapath, so its register file
needs to be only 32 bits wide, with two buses running over it to provide the
two operands.
The floating-point unit, on the other hand, has a 64-bit datapath
going through the adder and multiplier. The registers for the fpu seem to be
organized as a stack of 64-bit registers so that two operands can be routed
to the adder/multiplier in one cycle and match the width of the datapath.
To allow floating-point loads of up to 128 bits to be done in a single cycle,
the fpu datapath was made wider than the 32-bit datapath of the core.
Note that it is not clear whether, on a 128-bit load, the words are
demultiplexed onto two buses to load two rows of registers (each row holding
64 bits) or whether the register file is organized as a stack of 128-bit
registers that is in turn multiplexed onto the 64-bit datapath.
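
The two possible organizations can be modeled in a few lines. This is only a
sketch: the register counts, the function names, and the choice of putting the
low 64 bits in the lower-numbered row are assumptions for illustration. The
point is that both layouts present the same 64-bit registers to the datapath.

```python
MASK64 = (1 << 64) - 1


def load128_as_two_rows(regs64: list, row: int, value128: int) -> None:
    """Organization A: demultiplex a 128-bit load onto two 64-bit rows."""
    regs64[row] = value128 & MASK64              # low half -> row (assumed order)
    regs64[row + 1] = (value128 >> 64) & MASK64  # high half -> row + 1


def read64_from_wide_rows(regs128: list, row: int) -> int:
    """Organization B: 128-bit rows multiplexed onto a 64-bit datapath."""
    wide = regs128[row // 2]   # pick the 128-bit row holding this register
    shift = 64 * (row % 2)     # select its low or high 64-bit half
    return (wide >> shift) & MASK64


if __name__ == "__main__":
    value = (0x1111111111111111 << 64) | 0x2222222222222222

    regs64 = [0] * 16          # hypothetical file of 64-bit rows
    load128_as_two_rows(regs64, 4, value)

    regs128 = [0] * 8          # same storage viewed as 128-bit rows
    regs128[2] = value         # one 128-bit write covers rows 4 and 5

    for r in (4, 5):
        print(r, hex(regs64[r]), hex(read64_from_wide_rows(regs128, r)))
```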

Finally, the 8-Kbyte (2-way set-associative) data cache also has a 128-bit
internal path, which fits very well (on the chip) with the fpu: four data
cache cells have the same pitch as one fpu cell. This interaction between
the floating-point register file and the data cache allows the cache
to be used as "vector registers", supplying the bandwidth needed
for some matrix operations.
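
As a rough illustration of that bandwidth: a 128-bit path delivers two 64-bit
operands per cycle. The sketch below is plain arithmetic; the 40 MHz clock is
a round number assumed for the example, not a figure from this post, and the
function names are hypothetical.

```python
def operands_per_cycle(path_bits: int = 128, operand_bits: int = 64) -> int:
    """How many operands of a given width fit through the path each cycle."""
    return path_bits // operand_bits


def bandwidth_mbytes_per_s(path_bits: int, clock_mhz: float) -> float:
    """Peak cache-to-fpu bandwidth in Mbytes/s, one transfer per cycle."""
    return path_bits / 8 * clock_mhz


if __name__ == "__main__":
    ASSUMED_CLOCK_MHZ = 40.0   # assumption for illustration only
    print(operands_per_cycle(), "double-precision operands per cycle")
    print(bandwidth_mbytes_per_s(128, ASSUMED_CLOCK_MHZ), "Mbytes/s peak")
```

This is a peak figure that assumes every access hits in the cache and a load
issues every cycle.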

In a word, YES it makes a lot of sense, from the architectural and
VLSI point of view, to separate the register files.

			Marc Tremblay
			internet: marc@CS.UCLA.EDU
			UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc