Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!rice!titan!preston
From: preston@titan.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: paired registers
Message-ID: <4259@kalliope.rice.edu>
Date: 11 Jul 89 16:18:33 GMT
References: <2190@oakhill.UUCP> <13980@lanl.gov> <2199@oakhill.UUCP>
Sender: usenet@rice.edu
Reply-To: preston@titan.rice.edu (Preston Briggs)
Organization: Rice University, Houston
Lines: 84


People have been arguing about how difficult it is to compile
for a RISC or CISC.  The subject of "paired registers" arose
as a difficulty associated with RISCs.

My 1st (bad) experience was with the AP-101, a sixteen bit
version of the 360 used in the Shuttle;  certainly a CISC.  There are
other examples.  However, they aren't hard!

There are 2 common cases:
	shifts
	    perhaps source and destination can be profitably paired (IBM RT)
	floats
	    sometimes single precision requires 1 register,
	    and double requires a pair of (adjacent) registers.

	    The 680x0 family commonly uses a coprocessor to
	    handle FP.  The 68881 has 8x80 bit registers
	    and thus avoids the problem.  Of course, there is
	    a speed penalty in that "single precision" results are
	    determined to more than the required precision.

There are other cases of course; the much discussed "divide and remainder"
operation, or multiply giving a 64 bit result, and so on.
All of these can be handled in a similar fashion.

I use a coloring register allocator, styled after Chaitin (vs. Chow).
After building the interference graph, we make a pass
over the code looking for COPY's that can be removed.
(If the source and destination don't interfere, coalesce them
and remove the COPY instruction.)  Additionally, we attempt to
coalesce the destination and a source operand of 2-address instrctions.
These are well known and mentioned in the original papers by Chaitin.

It's also possible to handle register pairs at the same time.
If, while examining the code, an instruction is discovered that requires
some of its registers to be paired, we find a pair of machine registers
(that don't interfere with the symbolic (uncolored) registers) and
"pre-color" the pair.  

If no pair can be found (because of pre-existing interferences), 
then we may have a problem.  The solution depends on the available 
instruction set.  For the RT (and paired shifts), a COPY must be inserted
later (while handling any remaining 2-address instructions).
For FP-type problems, spill code may be required; circumstances vary.

You may be unconvinced.
  The cost (compile-time and coding time) is minimal;
  we already have all the machinery in place and we're already
  making a pass over the code to coalesce registers.

  The "symbolic registers" had to be assigned to real registers
  eventually, so pre-assigning is ok.  

  It is true that the graph becomes more constrained; 
  but the pairing requirement is the constraint -- pre-assignment 
  just adds this additional constraint to the graph.

  The scheme requires a graph coloring allocator; but you
  should be using one anyway.

I've implemented the idea for a compiler on the RT (paired shifts).
It works pretty well.  It's suprising how tricky the resulting
code can be; makes looking at assembly fun.

The original papers describing register allocation via graph coloring are:
	"Register aloocation via coloring"
	Chaitin, Auslander, Chandra, Cocke, Hopkins, Markstein
	Computer Languages, January 1981
and
	"Register allocation and spilling via graph coloring"
	Chaitin
	Proceedings of the SIGPLAN '82 Symposium on Compiler Construction
	SIGPLAN Notices, June 1982

The scheme for handling paired registers was mentioned by 
Martin Hopkins in a paper, "Compiling for the RT PC ROMP", which is collected
in a small book called "IBM RT Personal Computer Technology", 1986.

Additional papers on graph coloring register allocators appear
in SIGPLAN '84, '86, and '89.

Preston Briggs
preston@titan.rice.edu