Path: utzoo!attcan!uunet!wuarchive!cs.utexas.edu!rice!titan.rice.edu!preston
From: preston@titan.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: Re: i860 registers/chip in general
Message-ID: <8744@brazos.Rice.edu>
Date: 11 Jun 90 17:03:18 GMT
References: <495@tau.megatek.uucp>
Sender: root@rice.edu
Organization: Rice University, Houston
Lines: 39

In article <495@tau.megatek.uucp> rstewart@megatek.UUCP (Rich Stewart) writes:
>Does anyone out there *know* why intel insisted in creating a distinction
>between floating point and integer registers? Are they planning to
>change this in the ix60 model?? Sure could write faster software if they
>did.

I *like* separate FP and integer register sets.  Am I in a minority among
compiler writers?  I can give a few reasons of varying importance.

- Separate register sets allow us to specify more registers in
  a shorter instruction word.  5 bits can specify 32 int regs or
  32 FP regs, with the opcode implying which set is meant.  On a
  three-address machine, we save 3 bits (3 x 5 = 15 instead of
  3 x 6 = 18) over naming 64 registers in a single unified set.

- Coloring register allocators can handle either style, but it's
  sometimes possible (e.g., the i860, and probably the MIPS and SPARC)
  to run separate allocations, one per register set.  This allows a
  very nice space saving at compile time.  (The size of the
  interference graph is proportional to n^2, where n is the number of
  live ranges to be colored.  By coloring the ints and FPs separately,
  we can save up to a factor of 4 on the interference graph alone: if
  the live ranges split evenly, each graph has only (n/2)^2 entries,
  and only one graph needs to exist at a time.  There is also a
  related time savings for zeroing the graph initially.  Additionally,
  the savings in cache misses and paging overhead will be important.)

- Fancy restructuring to take advantage of the memory hierarchy
  needs to know the size of the cache and the size of the register
  set(s).  Separate register sets increase the precision of our knowledge.
  That is, we don't have an unknown number of addressing temporaries
  out there competing for our FP registers.
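
A rough sketch of the arithmetic behind the first two points, in
Python.  The 2000 live ranges and the even int/FP split are made-up
numbers, and I'm assuming a full n-by-n bit matrix for the
interference graph rather than any particular allocator's
representation.

```python
def operand_bits(num_regs, operands=3):
    """Bits needed to name `operands` registers out of `num_regs`."""
    return operands * (num_regs - 1).bit_length()

def bit_matrix_bits(n):
    """Bits in a full n-by-n interference bit matrix (an assumption)."""
    return n * n

# Three-address instruction: a 32-register set implied by the opcode
# versus one unified set of 64 registers.
split_bits = operand_bits(32)
unified_bits = operand_bits(64)
bits_saved = unified_bits - split_bits

# Hypothetical procedure: 2000 live ranges, evenly split int/FP.
n_int, n_fp = 1000, 1000
unified_graph = bit_matrix_bits(n_int + n_fp)
# Coloring one set at a time, only the larger graph is resident.
peak_split_graph = max(bit_matrix_bits(n_int), bit_matrix_bits(n_fp))
ratio = unified_graph / peak_split_graph

print(bits_saved, ratio)
```

The factor of 4 is the best case.  An uneven split (say, almost all
integer live ranges) saves much less, which is why it's only "up to".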

How do separate sets slow your code?  I can think of integer multiplies
and perhaps the pipelined loads which can only be done in the FP set.

Me?  I vote for a pipelined reciprocal and no pixel operations and
a prefetch for the data cache and ...

--
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu